Monday, December 24, 2012

MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems, by Donald Miner and Adam Shook

What is MapReduce? It is a programming model for processing large data sets in parallel on a cluster of machines, often hundreds of computers. You can read more about it on Wikipedia or other sites that go into more depth.
As you know, MapReduce is the heart of Hadoop. If you are interested in Hadoop, you cannot avoid learning MapReduce; it is really important.
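To make the idea concrete, here is a minimal sketch of the three MapReduce phases (map, shuffle, reduce) applied to the classic word-count problem. It runs in a single JVM with plain Java collections instead of Hadoop, which is an assumption made so the example is self-contained. The class and method names are hypothetical; a real Hadoop job would instead extend Hadoop's `Mapper` and `Reducer` classes and let the framework distribute the work.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of a MapReduce word count (not Hadoop code).
class WordCountSketch {

    // Map phase: emit (word, 1) for every whitespace-separated token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key.
    // Hadoop sorts keys before the reducers see them; a TreeMap mimics that ordering.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the counts collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Drive the three phases over all input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick fox", "the lazy dog", "The end");
        System.out.println(run(lines)); // {dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
    }
}
```

On a real cluster, each map call would run on the machine that holds its piece of the input, and the shuffle would move data over the network so that all values for one key reach the same reducer.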
If you are a developer or someone interested in design patterns for the MapReduce framework, I recommend a book from O'Reilly: MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems, by Donald Miner and Adam Shook. All code examples in the book are written for Hadoop, and you will learn a lot from them. The book reads like a "cookbook": for each example you will see the problem, how to approach it, the idea behind the solution, example code, and a comparison with SQL and Pig. However, readers should already know Hadoop and Java programming, or at least be able to read Java code, because all the examples are written in Java. The book also works well as a reference: readers can adapt its code to their own real-world work.
You will see:
  • Summarization patterns: get a top-level view by summarizing and grouping data
  • Filtering patterns: view data subsets such as records generated from one user
  • Data organization patterns: reorganize data to work with other systems, or to make MapReduce analysis easier
  • Join patterns: analyze different datasets together to discover interesting relationships
  • Metapatterns: piece together several patterns to solve multi-stage problems, or to perform several analytics in the same job
  • Input and output patterns: customize the way you use Hadoop to load or store data
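To give a flavor of two of these categories, here is a hypothetical plain-Java sketch (not code from the book) of a filtering pattern and a numerical summarization pattern. The input is assumed to be made-up comment records of the form "userId,commentLength". Filtering is typically a map-only job: each mapper keeps records that match a predicate and no reducer is needed. The summarization computes the minimum, maximum, and count per user; because merging those values is associative and commutative, the same logic could serve as both combiner and reducer. For brevity, the shuffle and reduce steps are collapsed into an in-memory `Map.merge`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of two MapReduce patterns, simulated in memory.
class PatternSketch {

    // Filtering pattern: keep only records produced by one user (map-only, no reduce).
    static List<String> filterByUser(List<String> lines, String userId) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (line.split(",")[0].trim().equals(userId)) {
                kept.add(line);
            }
        }
        return kept;
    }

    // Value type for numerical summarization: min, max and count of comment lengths.
    static final class MinMaxCount {
        final int min;
        final int max;
        final long count;

        MinMaxCount(int min, int max, long count) {
            this.min = min;
            this.max = max;
            this.count = count;
        }

        // Associative and commutative merge, usable as both combiner and reducer logic.
        MinMaxCount merge(MinMaxCount other) {
            return new MinMaxCount(Math.min(min, other.min), Math.max(max, other.max), count + other.count);
        }
    }

    // Numerical summarization: group records by user and merge their statistics.
    static Map<String, MinMaxCount> summarize(List<String> lines) {
        Map<String, MinMaxCount> result = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            int length = Integer.parseInt(fields[1].trim());
            // "Map": emit (userId, (length, length, 1)); "reduce": merge values sharing a key.
            result.merge(fields[0].trim(), new MinMaxCount(length, length, 1), MinMaxCount::merge);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> comments = List.of("u1,10", "u2,5", "u1,30", "u2,7", "u1,20");
        System.out.println(filterByUser(comments, "u2")); // [u2,5, u2,7]
        MinMaxCount u1 = summarize(comments).get("u1");
        System.out.println(u1.min + " " + u1.max + " " + u1.count); // 10 30 3
    }
}
```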
I believe it's a good book about MapReduce design patterns. The template format used for each example is useful: you will see a description of the problem, why you need to solve it, the idea behind the solution, the expected output, and so on. It will inspire you to work through each example and each idea. As I said, the examples might be difficult for readers who don't know Hadoop or can't read Java code.

