Thursday, November 01, 2012

Programming Hive - Data Warehouse and Query Language for Hadoop By Edward Capriolo, Dean Wampler, Jason Rutherglen

These days everyone is talking about Big Data, and some production systems already run on Hadoop. That raises a question: how do you move a relational database application to Hadoop? I think Hive is an interesting tool to learn for this, so I wanted to find out more about it.
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
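To make the "custom mappers" idea concrete, here is a minimal sketch of a streaming script that could be plugged into a query through HiveQL's TRANSFORM clause. Hive sends each row to the script as a tab-separated line on standard input and reads tab-separated lines back. The script name (domain_mapper.py), the table (page_views) and the columns (user_id, url) are my own assumptions for illustration; they do not come from the book.

```python
import sys
from urllib.parse import urlparse

# Hypothetical usage from the hive prompt (all names are assumptions):
#   ADD FILE domain_mapper.py;
#   SELECT TRANSFORM (user_id, url)
#          USING 'python3 domain_mapper.py'
#          AS (user_id, domain)
#   FROM page_views;


def map_line(line):
    """Turn one tab-separated row (user_id, url) into (user_id, domain)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        return None  # skip malformed rows instead of guessing
    user_id, url = fields
    domain = urlparse(url).netloc  # host part of the URL
    return "\t".join((user_id, domain))


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Read rows from Hive line by line and write the mapped rows back.
    for line in stdin:
        out = map_line(line)
        if out is not None:
            stdout.write(out + "\n")


if __name__ == "__main__":
    main()
```

Logic like URL parsing is awkward to write in plain HiveQL, which is the kind of case where a small external script may be the easier choice.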

Last time, I downloaded Impala and tested it. I don't want to talk about Impala itself here, but that environment is good for learning and testing Hive:
-bash-4.1$ hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/cloudera/hive_job_log_cloudera_201211010525_336972305.txt
hive> show databases
    > ;
OK
default
Time taken: 8.265 seconds
hive>

If you are interested in Hive, you can start learning from the link. It may also be a good idea to read a book about it, so I want to mention Programming Hive: Data Warehouse and Query Language for Hadoop by Edward Capriolo, Dean Wampler, and Jason Rutherglen. It takes you from newbie to intermediate level with Hive.


This book has 23 chapters. It starts with introduction and getting-started topics, where you learn why you would use Hive and how to install it. Once you know whether you should use it or not, you move on to the core of Hive, for example: Data Types and File Formats, Data Definition, Data Manipulation, and Queries. You also learn to set up and use applications related to Hive, for example Hive Thrift. As you read, you can follow the examples and try the commands on your own test system.
Readers should know a bit about Hadoop, SQL-92, and XML. However, even if you have no idea about Hadoop, you can download the VirtualBox image (Impala), install it, and then run hive to practice the book's examples. Practice and practice! I still believe that is the best way to learn. The book also gathers some interesting case studies. Reading it helped me understand Hive and how to learn, install, configure, and use it further. I learned many Hive command-line commands and a lot of HiveQL, and I could try them out myself. I also learned some Hive parameters that are useful for tuning. If you are planning to implement Hive and are looking for a book, this one might help you.

One thing the book should fix is its URL links, because some of them do not work. Overall, the detail in the book is valuable for learning Hive.

About the authors: they have many years of experience in software development and Big Data.
Edward Capriolo (@edwardcapriolo)
Dean Wampler (@deanwampler)
Jason Rutherglen


