Open Source Books: Hadoop, the Definitive Guide

“Hadoop: The Definitive Guide” is the first book covering the now-famous Java framework for data-intensive distributed applications.

Doug Cutting, the project’s author, who now works at Cloudera, wrote that Tom White, the book’s author and a long-time contributor to the Apache top-level project, is the most qualified person to write a book about Hadoop.

The book starts with an introduction to Google’s MapReduce, then looks in depth at HDFS, Hadoop’s own filesystem, and at I/O fundamentals in Hadoop.

The guide also covers Hadoop administration and presents a number of case studies, introducing the reader to Pig (a high-level query language for large-scale data processing), HBase (Hadoop’s database for structured and semi-structured data), and ZooKeeper (a toolkit of coordination primitives for building distributed systems).

To learn more about Hadoop and MapReduce, see also “Getting started with Hadoop and MapReduce”.