Thursday, July 25, 2013

Life Learning - Pivotal HD

I haven't been able to learn as much as people who have spent their whole careers with these products, but I was interested in something new, for example Pivotal HD, and it was easy to download as the Pivotal HD Single Node VM.

Pivotal HD is a 100% Apache-compatible Hadoop distribution featuring a fully SQL compliant query engine for processing data stored in Hadoop. By adding rich, mature SQL processing, Pivotal HD allows enterprises to simplify development, expand Hadoop’s capabilities, increase productivity, and cut costs. It has been tested for scale on the 1000 node Pivotal Analytics Workbench to ensure that the stack works flawlessly in large enterprise deployments.
Wow! I thought it wouldn't be bad to learn a little bit about it.
First of all, I downloaded the Pivotal HD Single Node VM and added it to my VirtualBox, then started the VM and checked it out (ssh in as "gpadmin" with password "password").
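For reference, logging in from the host looks something like this (a minimal sketch; the IP address is hypothetical, use whatever address VirtualBox gives your VM):

# Log in to the VM as gpadmin (password: password)
ssh gpadmin@192.168.56.101   # replace with your VM's actual IP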
[gpadmin@pivhdsne ~]$ pwd
/home/gpadmin
[gpadmin@pivhdsne ~]$ ls
Attic  Desktop  Documents  Downloads  gpAdminLogs  pivotal-samples  workspace
[gpadmin@pivhdsne ~]$ cd Desktop/
[gpadmin@pivhdsne Desktop]$ ls
eclipse  Pivotal_Community_Edition_VM_EULA_20130712_final.pdf  Pivotal_Docs  README  README~  start_piv_hd.sh  stop_piv_hd.sh
[gpadmin@pivhdsne Desktop]$ cat README
Pivotal HD 1.0.1 Single Node (VM)
Version 1

How to use this VM:
1. Start the Hadoop services using start_all.sh on the desktop
2. Follow the tutorials at http://pivotalhd.cfapps.io/getting-started/pivotalhd-vm.html
3. Leverage the Pivotal HD community at http://gopivotal.com/community for support
4. root and gpadmin accounts have password password
5. Command Center login is gpadmin, with password gpadmin
6. gpadmin account has sudo privileges

What is included:
1. Pivotal HD - Hadoop 2.x, Zookeeper, HBase, Hive, Pig, Mahout
2. Pivotal HAWQ
3. Pivotal Extension Framework (PXF)
4. Pivotal DataLoader
5. Product usage documentation

Other installed packages:
1. JDK 6
2. Ant
3. Maven
4. Eclipse

[gpadmin@pivhdsne Desktop]$ ./start_piv_hd.sh
Starting services
SUCCESS: Start complete
Using JAVA_HOME: /usr/java/jdk1.6.0_26
Starting dataloader in standalone mode...
Starting Embedded Zookeeper Server...
Sending output to /var/log/gphd/dataloader/dataloader-embedded-zk.log
Embedded Zookeeper Server started!
Starting dataloader scheduler...
Sending output to /var/log/gphd/dataloader/dataloader-scheduler.log
Dataloader Scheduler Started!
Starting dataloader manager...
Sending output to /var/log/gphd/dataloader/dataloader-manager.log
Dataloader Manager Started!
Dataloader started!
20130725:01:16:18:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Starting gpstart with args: -a
20130725:01:16:19:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Gathering information and validating the environment...
20130725:01:16:32:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (HAWQ) 4.2.0 build 1'
20130725:01:16:41:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Greenplum Catalog Version: '201306170'
20130725:01:16:41:005738 gpstart:pivhdsne:gpadmin-[WARNING]:-postmaster.pid file exists on Master, checking if recovery startup required
20130725:01:16:41:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Commencing recovery startup checks
20130725:01:16:43:005738 gpstart:pivhdsne:gpadmin-[INFO]:-No socket connection or lock file in /tmp found for port=5432
20130725:01:16:43:005738 gpstart:pivhdsne:gpadmin-[INFO]:-No Master instance process, entering recovery startup mode
20130725:01:16:43:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Clearing Master instance pid file
20130725:01:16:43:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Starting Master instance in admin mode
20130725:01:17:18:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20130725:01:17:18:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Obtaining Segment details from master...
20130725:01:17:20:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Setting new master era
20130725:01:17:20:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Commencing forced instance shutdown
20130725:01:17:33:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Starting Master instance in admin mode
20130725:01:17:46:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20130725:01:17:46:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Obtaining Segment details from master...
20130725:01:17:47:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Setting new master era
20130725:01:17:47:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Master Started...
20130725:01:17:47:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Shutting down master
20130725:01:17:59:005738 gpstart:pivhdsne:gpadmin-[INFO]:-No standby master configured.  skipping...
20130725:01:18:00:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...
...............
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Process results...
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-----------------------------------------------------
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-   Successful segment starts                                            = 2
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-----------------------------------------------------
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Successfully started 2 of 2 segment instances
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-----------------------------------------------------
20130725:01:18:15:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Starting Master instance pivhdsne.localdomain directory /data/1/hawq_master/gpseg-1
20130725:01:18:23:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Command pg_ctl reports Master pivhdsne.localdomain instance active
20130725:01:18:23:005738 gpstart:pivhdsne:gpadmin-[INFO]:-Database successfully started
That was the first step.
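Before moving on, it's worth a quick sanity check that HAWQ is really up. It speaks the PostgreSQL protocol on port 5432 (the port shown in the gpstart log above); a minimal sketch, assuming the built-in template1 database:

# Connect to HAWQ locally and ask for its version string
psql -p 5432 -d template1 -c "SELECT version();"
# per the startup log, this should report something like: postgres (HAWQ) 4.2.0 build 1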
After reading a bit, I thought I should try one of the samples. I chose to begin with Setting up the Development Environment and followed the steps from that link.
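Before the job below can run, the sample jar has to be built. Roughly like this (a sketch; the exact path is my guess based on the pivotal-samples directory in the home folder and the shell prompt below):

# Build the sample with the Maven that ships on the VM
cd ~/pivotal-samples/customer_first_and_last_order_dates   # path assumed
mvn clean package   # should produce target/customer_first_and_last_order_dates-1.0.jar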
[gpadmin@pivhdsne customer_first_and_last_order_dates]$ hadoop jar target/customer_first_and_last_order_dates-1.0.jar com.pivotal.hadoop.CustomerFirstLastOrderDateDriver /retail_demo/orders/orders.tsv.gz /output-mr2
13/07/25 04:28:22 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/07/25 04:28:24 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/07/25 04:28:25 INFO input.FileInputFormat: Total input paths to process : 1
13/07/25 04:28:26 WARN snappy.LoadSnappy: Snappy native library is available
13/07/25 04:28:26 INFO snappy.LoadSnappy: Snappy native library loaded
13/07/25 04:28:27 INFO mapreduce.JobSubmitter: number of splits:1
13/07/25 04:28:27 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/07/25 04:28:27 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
13/07/25 04:28:27 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/07/25 04:28:27 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/07/25 04:28:27 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/07/25 04:28:27 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/07/25 04:28:27 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
13/07/25 04:28:27 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/07/25 04:28:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1374729279703_0002
13/07/25 04:28:30 INFO client.YarnClientImpl: Submitted application application_1374729279703_0002 to ResourceManager at pivhdsne/127.0.0.2:8032
13/07/25 04:28:30 INFO mapreduce.Job: The url to track the job: http://pivhdsne:8088/proxy/application_1374729279703_0002/
13/07/25 04:28:30 INFO mapreduce.Job: Running job: job_1374729279703_0002
13/07/25 04:29:01 INFO mapreduce.Job: Job job_1374729279703_0002 running in uber mode : false
13/07/25 04:29:01 INFO mapreduce.Job:  map 0% reduce 0%
13/07/25 04:29:59 INFO mapreduce.Job:  map 3% reduce 0%
13/07/25 04:30:03 INFO mapreduce.Job:  map 10% reduce 0%
13/07/25 04:30:06 INFO mapreduce.Job:  map 22% reduce 0%
13/07/25 04:30:10 INFO mapreduce.Job:  map 33% reduce 0%
13/07/25 04:30:13 INFO mapreduce.Job:  map 45% reduce 0%
13/07/25 04:30:16 INFO mapreduce.Job:  map 52% reduce 0%
13/07/25 04:30:20 INFO mapreduce.Job:  map 59% reduce 0%
13/07/25 04:30:23 INFO mapreduce.Job:  map 66% reduce 0%
13/07/25 04:30:32 INFO mapreduce.Job:  map 100% reduce 0%
13/07/25 04:31:08 INFO mapreduce.Job:  map 100% reduce 33%
13/07/25 04:31:11 INFO mapreduce.Job:  map 100% reduce 66%
13/07/25 04:31:26 INFO mapreduce.Job:  map 100% reduce 67%
13/07/25 04:31:33 INFO mapreduce.Job:  map 100% reduce 68%
13/07/25 04:31:36 INFO mapreduce.Job:  map 100% reduce 69%
13/07/25 04:31:39 INFO mapreduce.Job:  map 100% reduce 74%
13/07/25 04:31:43 INFO mapreduce.Job:  map 100% reduce 78%
13/07/25 04:31:46 INFO mapreduce.Job:  map 100% reduce 82%
13/07/25 04:31:49 INFO mapreduce.Job:  map 100% reduce 87%
13/07/25 04:31:53 INFO mapreduce.Job:  map 100% reduce 91%
13/07/25 04:31:56 INFO mapreduce.Job:  map 100% reduce 96%
13/07/25 04:31:59 INFO mapreduce.Job:  map 100% reduce 98%
13/07/25 04:32:02 INFO mapreduce.Job:  map 100% reduce 100%
13/07/25 04:32:02 INFO mapreduce.Job: Job job_1374729279703_0002 completed successfully
13/07/25 04:32:02 INFO mapreduce.Job: Counters: 43
        File System Counters
                FILE: Number of bytes read=18946633
                FILE: Number of bytes written=38031433
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=72797182
                HDFS: Number of bytes written=11891611
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Rack-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=184466
                Total time spent by all reduces in occupied slots (ms)=257625
        Map-Reduce Framework
                Map input records=512071
                Map output records=512071
                Map output bytes=17922485
                Map output materialized bytes=18946633
                Input split bytes=118
                Combine input records=0
                Combine output records=0
                Reduce input groups=167966
                Reduce shuffle bytes=18946633
                Reduce input records=512071
                Reduce output records=167966
                Spilled Records=1024142
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=2486
                CPU time spent (ms)=67950
                Physical memory (bytes) snapshot=1088774144
                Virtual memory (bytes) snapshot=5095378944
                Total committed heap usage (bytes)=658378752
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=72797064
        File Output Format Counters
                Bytes Written=11891611
[gpadmin@pivhdsne customer_first_and_last_order_dates]$
[gpadmin@pivhdsne customer_first_and_last_order_dates]$ hadoop fs -cat /output-mr2/part-r-00000 | wc -l
167966
[gpadmin@pivhdsne customer_first_and_last_order_dates]$ hadoop fs -cat /output-mr2/part-r-00000 | tail
54992348        6933068175      2010-10-04 02:21:44     6311149380      2010-10-11 18:21:12
54992896        8136297804      2010-10-02 11:33:38     6310999573      2010-10-11 19:01:48
54993581        6311050522      2010-10-11 21:43:05     8122646976      2010-10-14 16:28:00
54993992        6805481711      2010-10-01 15:32:50     8212538352      2010-10-07 00:20:50
54994403        7708210740      2010-10-08 06:41:29     8122646502      2010-10-14 06:36:05
54994814        8136355210      2010-10-02 15:36:27     7708874714      2010-10-08 19:50:08
54994951        6805748378      2010-10-01 05:38:36     8494440118      2010-10-09 04:44:55
54995088        8136355283      2010-10-02 23:29:08     5007019717      2010-10-13 12:32:23
54995225        6805524075      2010-10-01 08:26:01     6933068564      2010-10-04 23:02:24
54995773        6933024646      2010-10-04 11:57:57     5007019751      2010-10-13 03:36:25
[gpadmin@pivhdsne customer_first_and_last_order_dates]$
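Judging by the counters (512,071 input orders, 167,966 reduce groups, 167,966 output records) and the rows above, each output line is one customer: the customer id, then the order id and timestamp of the first order, then the order id and timestamp of the last. To pull out a single customer's row (assuming tab-separated fields; the id below is just one taken from the tail output):

# Filter the job output for one customer id
hadoop fs -cat /output-mr2/part-r-00000 | awk -F'\t' '$1 == "54995773"'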
I forgot to show the URL of the web UI (http://ip:50000).
I thought Pivotal HD was a good product that I could use to learn about Hadoop and related tools (Hadoop 2.x, Zookeeper, HBase, Hive, Pig, Mahout, and Pivotal HAWQ).
