Wednesday, May 29, 2013

Learn a little bit - Pig

I have been following the Introduction to Data Science video lectures, which talk about Pig. The teacher uses Pig to explain more about MapReduce, so I needed to install Pig on my VirtualBox machine (my Hadoop test environment). Anyway, it was not difficult to install and test.
First, I chose to download the binary release and tested it a little bit.
[surachart@linux01 ~]$ wget http://apache.cs.utah.edu/pig/stable/pig-0.11.1.tar.gz
[surachart@linux01 ~]$ tar zxf pig-0.11.1.tar.gz
[surachart@linux01 ~]$ ln -s  pig-0.11.1 pig
[surachart@linux01 ~]$ export PATH=$PATH:$HOME/pig/bin
[surachart@linux01 ~]$ pig -x local
2013-05-29 15:22:27,126 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-05-29 15:22:27,129 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/surachart/pig_1369815747107.log
2013-05-29 15:22:27,264 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/surachart/.pigbootup not found
2013-05-29 15:22:27,717 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> quit
[surachart@linux01 ~]$ pig -x mapreduce
2013-05-29 15:22:46,385 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-05-29 15:22:46,389 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/surachart/pig_1369815766361.log
2013-05-29 15:22:46,531 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/surachart/.pigbootup not found
2013-05-29 15:22:47,436 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2013-05-29 15:22:49,820 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
grunt> quit
It took me about 10 minutes to download Pig and start using it. Then I downloaded pig-wordcount to test a word count.
[surachart@linux01 ~]$ tar xf  pig-wordcount-7-26.tar
[surachart@linux01 ~]$ cd pig-wordcount
[surachart@linux01 pig-wordcount]$ ls
input.txt  readme  wordcount.pig
[surachart@linux01 pig-wordcount]$ cat readme
readme file for Pig tutorial

1) Run Pig Word Count using Local Mode
        bin/pig -x local wordcount.pig
2) Run Pig Word Count using Hadoop Mode
        a.configure Hadoop cluster
        b.bin/pig -x mapreduce wordcount.pig
[surachart@linux01 pig-wordcount]$ cat wordcount.pig
A = load './input.txt';
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
C = group B by word;
D = foreach C generate COUNT(B), group;
store D into './wordcount';
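
To make clear what the script above computes, here is a rough Python sketch of the same steps (tokenize, group by word, count). It is only an illustration, not how Pig runs the job: the names LINES and word_count are made up for this example, the three lines are the contents of input.txt shown later in this post, and line.split() only splits on whitespace, while Pig's TOKENIZE also splits on a few other characters.

```python
from collections import Counter

# Contents of input.txt as printed later in this post.
LINES = [
    "summer school 2012 in indiana",
    "pig tutorial for summer school 2012",
    "word count in pig tutorial",
]


def word_count(lines):
    """Mimic wordcount.pig: tokenize each line, group by word, count each group."""
    # B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
    # (assumption: whitespace splitting is enough for this input)
    words = [word for line in lines for word in line.split()]
    # C = group B by word;  D = foreach C generate COUNT(B), group;
    return Counter(words)


if __name__ == "__main__":
    # Print "count<TAB>word", the same column order the script stores.
    for word, count in word_count(LINES).items():
        print(f"{count}\t{word}")
```

The row order printed here is not meaningful, just as the script itself does not ask Pig for any particular order.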

[surachart@linux01 pig-wordcount]$ pig -x local wordcount.pig
2013-05-29 16:40:25,113 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-05-29 16:40:25,117 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/surachart/pig-wordcount/pig_1369820425090.log
2013-05-29 16:40:26,303 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/surachart/.pigbootup not found
2013-05-29 16:40:26,759 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2013-05-29 16:40:30,606 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2013-05-29 16:40:31,258 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-05-29 16:40:31,348 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2013-05-29 16:40:31,462 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2013-05-29 16:40:31,463 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2013-05-29 16:40:31,605 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-05-29 16:40:31,688 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-05-29 16:40:31,712 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2013-05-29 16:40:31,730 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=93
2013-05-29 16:40:31,731 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-05-29 16:40:31,891 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-05-29 16:40:31,969 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-05-29 16:40:31,969 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2013-05-29 16:40:31,970 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Distributed cache not supported or needed in local mode. Setting key [pig.schematuple.local.dir] with code temp directory: /tmp/1369820431968-0
2013-05-29 16:40:32,404 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-05-29 16:40:32,499 [JobControl] INFO  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2013-05-29 16:40:32,536 [JobControl] WARN  org.apache.hadoop.mapred.JobClient - No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2013-05-29 16:40:32,753 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-05-29 16:40:32,760 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-05-29 16:40:32,841 [JobControl] WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2013-05-29 16:40:32,856 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-05-29 16:40:32,912 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-05-29 16:40:34,450 [Thread-3] INFO  org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2013-05-29 16:40:34,471 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local2053039924_0001_m_000000_0
2013-05-29 16:40:34,728 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local2053039924_0001
2013-05-29 16:40:34,729 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C,D
2013-05-29 16:40:34,729 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],B[2,4],D[4,4],C[3,4] C: D[4,4],C[3,4] R: D[4,4]
2013-05-29 16:40:34,864 [pool-1-thread-1] INFO  org.apache.hadoop.util.ProcessTree - setsid exited with exit code 0
2013-05-29 16:40:34,911 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.Task -  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@5b76de14
2013-05-29 16:40:34,982 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 93
Input split[0]:
   Length = 93
  Locations:

-----------------------

2013-05-29 16:40:35,029 [pool-1-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/home/surachart/pig-wordcount/input.txt:0+93
2013-05-29 16:40:35,101 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2013-05-29 16:40:35,184 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-05-29 16:40:35,186 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-05-29 16:40:35,384 [pool-1-thread-1] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2013-05-29 16:40:35,487 [pool-1-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: A[1,4],B[2,4],D[4,4],C[3,4] C: D[4,4],C[3,4] R: D[4,4]
2013-05-29 16:40:35,558 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
2013-05-29 16:40:35,748 [pool-1-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine - Aliases being processed per job phase (AliasName[line,offset]): M: A[1,4],B[2,4],D[4,4],C[3,4] C: D[4,4],C[3,4] R: D[4,4]
2013-05-29 16:40:35,782 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.MapTask - Finished spill 0
2013-05-29 16:40:35,808 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local2053039924_0001_m_000000_0 is done. And is in the process of commiting
2013-05-29 16:40:35,843 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2013-05-29 16:40:35,846 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local2053039924_0001_m_000000_0' done.
2013-05-29 16:40:35,846 [pool-1-thread-1] INFO  org.apache.hadoop.mapred.LocalJobRunner - Finishing task: attempt_local2053039924_0001_m_000000_0
2013-05-29 16:40:35,847 [Thread-3] INFO  org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2013-05-29 16:40:35,968 [Thread-3] INFO  org.apache.hadoop.mapred.Task -  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3ebc312f
2013-05-29 16:40:35,974 [Thread-3] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2013-05-29 16:40:35,998 [Thread-3] INFO  org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
2013-05-29 16:40:36,075 [Thread-3] INFO  org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 156 bytes
2013-05-29 16:40:36,082 [Thread-3] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2013-05-29 16:40:36,169 [Thread-3] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2013-05-29 16:40:36,235 [Thread-3] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: A[1,4],B[2,4],D[4,4],C[3,4] C: D[4,4],C[3,4] R: D[4,4]
2013-05-29 16:40:36,247 [Thread-3] INFO  org.apache.hadoop.mapred.Task - Task:attempt_local2053039924_0001_r_000000_0 is done. And is in the process of commiting
2013-05-29 16:40:36,257 [Thread-3] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2013-05-29 16:40:36,258 [Thread-3] INFO  org.apache.hadoop.mapred.Task - Task attempt_local2053039924_0001_r_000000_0 is allowed to commit now
2013-05-29 16:40:36,275 [Thread-3] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local2053039924_0001_r_000000_0' to file:/home/surachart/pig-wordcount/wordcount
2013-05-29 16:40:36,291 [Thread-3] INFO  org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2013-05-29 16:40:36,292 [Thread-3] INFO  org.apache.hadoop.mapred.Task - Task 'attempt_local2053039924_0001_r_000000_0' done.
2013-05-29 16:40:36,840 [main] WARN  org.apache.pig.tools.pigstats.PigStatsUtil - Failed to get RunningJob for job job_local2053039924_0001
2013-05-29 16:40:36,853 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-05-29 16:40:36,854 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
2013-05-29 16:40:36,868 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
1.2.0   0.11.1  surachart       2013-05-29 16:40:31     2013-05-29 16:40:36     GROUP_BY

Success!

Job Stats (time in seconds):
JobId   Alias   Feature Outputs
job_local2053039924_0001        A,B,C,D GROUP_BY,COMBINER       file:///home/surachart/pig-wordcount/wordcount,

Input(s):
Successfully read records from: "file:///home/surachart/pig-wordcount/input.txt"

Output(s):
Successfully stored records in: "file:///home/surachart/pig-wordcount/wordcount"

Job DAG:
job_local2053039924_0001


2013-05-29 16:40:36,887 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
[surachart@linux01 pig-wordcount]$ cat wordcount/part-r-00000
2       in
1       for
2       pig
2       2012
1       word
1       count
2       school
2       summer
1       indiana
2       tutorial

[surachart@linux01 pig-wordcount]$
[surachart@linux01 pig-wordcount]$ hadoop dfs -put input.txt .
[surachart@linux01 pig-wordcount]$ hadoop dfs -cat input.txt
summer school 2012 in indiana
pig tutorial for summer school 2012
word count in pig tutorial
[surachart@linux01 pig-wordcount]$ less readme
[surachart@linux01 pig-wordcount]$ pig -x mapreduce wordcount.pig
2013-05-29 16:41:51,628 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-05-29 16:41:51,636 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/surachart/pig-wordcount/pig_1369820511607.log
2013-05-29 16:41:52,631 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/surachart/.pigbootup not found
2013-05-29 16:41:53,493 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2013-05-29 16:41:54,952 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
2013-05-29 16:41:57,669 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2013-05-29 16:41:58,250 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-05-29 16:41:58,340 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2013-05-29 16:41:58,448 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2013-05-29 16:41:58,448 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2013-05-29 16:41:58,806 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-05-29 16:41:58,895 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-05-29 16:41:58,903 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2013-05-29 16:41:58,913 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=93
2013-05-29 16:41:58,913 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-05-29 16:41:58,916 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job7981723318394905164.jar
2013-05-29 16:42:11,650 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job7981723318394905164.jar created
2013-05-29 16:42:11,720 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-05-29 16:42:11,749 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-05-29 16:42:11,750 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2013-05-29 16:42:11,755 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2013-05-29 16:42:12,129 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-05-29 16:42:12,634 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-05-29 16:42:13,430 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-05-29 16:42:13,435 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-05-29 16:42:13,495 [JobControl] INFO  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2013-05-29 16:42:13,496 [JobControl] WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2013-05-29 16:42:13,507 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-05-29 16:42:15,162 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201305291026_0016
2013-05-29 16:42:15,162 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C,D
2013-05-29 16:42:15,162 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],B[2,4],D[4,4],C[3,4] C: D[4,4],C[3,4] R: D[4,4]
2013-05-29 16:42:15,163 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201305291026_0016
2013-05-29 16:42:36,017 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2013-05-29 16:43:05,312 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-05-29 16:43:05,323 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
1.2.0   0.11.1  surachart       2013-05-29 16:41:58     2013-05-29 16:43:05     GROUP_BY

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime        Alias   Feature Outputs
job_201305291026_0016   1       1       10      10      10      10      19      19      19      19      A,B,C,D GROUP_BY,COMBINER       hdfs://localhost:9000/user/surachart/wordcount,

Input(s):
Successfully read 3 records (458 bytes) from: "hdfs://localhost:9000/user/surachart/input.txt"

Output(s):
Successfully stored 10 records (78 bytes) in: "hdfs://localhost:9000/user/surachart/wordcount"

Counters:
Total records written : 10
Total bytes written : 78
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201305291026_0016


2013-05-29 16:43:05,408 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
[surachart@linux01 pig-wordcount]$ hadoop dfs -cat  wordcount/part-r-00000
2       in
1       for
2       pig
2       2012
1       word
1       count
2       school
2       summer
1       indiana
2       tutorial

[surachart@linux01 pig-wordcount]$
The examples above use the "wordcount.pig" script as it is. It didn't sort the words as I wanted, so I changed it a bit and tested again.
[surachart@linux01 ~]$ pig -x mapreduce
2013-05-29 19:07:00,396 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-05-29 19:07:00,412 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/surachart/pig_1369829220356.log
2013-05-29 19:07:00,565 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/surachart/.pigbootup not found
2013-05-29 19:07:01,429 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2013-05-29 19:07:03,754 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
grunt> A = load './input.txt';
grunt> B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;
grunt> C = group B by word;
grunt> D = ORDER C BY $0;
grunt> E = foreach D generate COUNT(B), group;
grunt> store E into './wordcount-opun';
2013-05-29 19:07:48,256 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,ORDER_BY
2013-05-29 19:07:49,056 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-05-29 19:07:49,355 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2013-05-29 19:07:49,357 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2013-05-29 19:07:49,726 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-05-29 19:07:49,799 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-05-29 19:07:49,810 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2013-05-29 19:07:49,819 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=93
2013-05-29 19:07:49,819 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-05-29 19:07:49,821 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job8101458691848291674.jar
2013-05-29 19:08:03,816 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job8101458691848291674.jar created
2013-05-29 19:08:03,914 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-05-29 19:08:03,950 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-05-29 19:08:03,950 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2013-05-29 19:08:03,970 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2013-05-29 19:08:04,331 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-05-29 19:08:04,834 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-05-29 19:08:05,791 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-05-29 19:08:05,797 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-05-29 19:08:05,866 [JobControl] INFO  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2013-05-29 19:08:05,868 [JobControl] WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2013-05-29 19:08:05,885 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-05-29 19:08:07,448 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201305291026_0021
2013-05-29 19:08:07,448 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C
2013-05-29 19:08:07,449 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],B[2,4],C[3,4] C:  R:
2013-05-29 19:08:07,449 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201305291026_0021
2013-05-29 19:08:30,911 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 16% complete
2013-05-29 19:08:50,200 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2013-05-29 19:09:02,519 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-05-29 19:09:02,523 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-05-29 19:09:02,527 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2013-05-29 19:09:02,587 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=112001
2013-05-29 19:09:02,594 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-05-29 19:09:02,604 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job9027641740102345362.jar
2013-05-29 19:09:15,302 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job9027641740102345362.jar created
2013-05-29 19:09:15,354 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-05-29 19:09:15,358 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-05-29 19:09:15,358 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2013-05-29 19:09:15,359 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2013-05-29 19:09:15,507 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-05-29 19:09:16,038 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-05-29 19:09:16,040 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-05-29 19:09:16,047 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-05-29 19:09:17,156 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201305291026_0022
2013-05-29 19:09:17,157 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases D
2013-05-29 19:09:17,157 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: D[4,4] C:  R:
2013-05-29 19:09:17,158 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201305291026_0022
2013-05-29 19:09:41,586 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2013-05-29 19:10:06,039 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2013-05-29 19:10:21,970 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-05-29 19:10:21,977 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-05-29 19:10:21,978 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-05-29 19:10:21,982 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job4575885537034789195.jar
2013-05-29 19:10:35,364 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job4575885537034789195.jar created
2013-05-29 19:10:35,412 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-05-29 19:10:35,415 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-05-29 19:10:35,415 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2013-05-29 19:10:35,416 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2013-05-29 19:10:35,542 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-05-29 19:10:36,182 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-05-29 19:10:36,183 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-05-29 19:10:36,192 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-05-29 19:10:37,290 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201305291026_0023
2013-05-29 19:10:37,290 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases D,E
2013-05-29 19:10:37,291 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: D[4,4] C:  R: E[5,4]
2013-05-29 19:10:37,291 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201305291026_0023
2013-05-29 19:11:02,322 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 83% complete
2013-05-29 19:11:07,401 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 83% complete
2013-05-29 19:11:37,447 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-05-29 19:11:37,458 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
1.2.0   0.11.1  surachart       2013-05-29 19:07:49     2013-05-29 19:11:37     GROUP_BY,ORDER_BY

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime        Alias   Feature Outputs
job_201305291026_0021   1       1       11      11      11      11      19      19      19      19      A,B,C   GROUP_BY
job_201305291026_0022   1       1       13      13      13      13      24      24      24      24      D       SAMPLER
job_201305291026_0023   1       1       12      12      12      12      20      20      20      20      D,E     ORDER_BY        hdfs://localhost:9000/user/surachart/wordcount-opun,

Input(s):
Successfully read 3 records (458 bytes) from: "hdfs://localhost:9000/user/surachart/input.txt"

Output(s):
Successfully stored 10 records (78 bytes) in: "hdfs://localhost:9000/user/surachart/wordcount-opun"

Counters:
Total records written : 10
Total bytes written : 78
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201305291026_0021   ->      job_201305291026_0022,
job_201305291026_0022   ->      job_201305291026_0023,
job_201305291026_0023


2013-05-29 19:11:37,585 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
grunt> quit
[surachart@linux01 ~]$ hadoop fs -cat wordcount-opun/part*
2       2012
1       count
1       for
2       in
1       indiana
2       pig
2       school
2       summer
2       tutorial
1       word
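
The order above is a plain string sort on the word, which is why "2012" comes first. As a cross-check, here is a small Python sketch that produces the same rows. It is an illustration only: LINES and sorted_word_count are names invented for this example, and whitespace splitting stands in for Pig's TOKENIZE, which is good enough for this input.

```python
from collections import Counter

# Contents of input.txt as printed earlier in this post.
LINES = [
    "summer school 2012 in indiana",
    "pig tutorial for summer school 2012",
    "word count in pig tutorial",
]


def sorted_word_count(lines):
    """Count words, then return (count, word) rows ordered by word."""
    counts = Counter(word for line in lines for word in line.split())
    # Sorting the (word, count) pairs orders them by word, like ORDER ... BY $0
    # on the group key; digits sort before lowercase letters.
    return [(count, word) for word, count in sorted(counts.items())]


if __name__ == "__main__":
    for count, word in sorted_word_count(LINES):
        print(f"{count}\t{word}")
```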
Finally, I got the result I wanted ^_________^
Read more in the Pig documentation.
