Showing posts from December, 2014

Hadoop Quick Reference - Part 2

Beyond the Hadoop fundamentals discussed earlier, what often intrigues a person new to Hadoop is the large number of open source projects and components built on top of it. This article serves as a glossary of those projects and concepts:
YARN: Stands for Yet Another Resource Negotiator. YARN is responsible for managing 'resource requests' from other components and applications. Resources here typically refer to CPU and memory. In the earlier version of Hadoop, MapReduce (MR1) was responsible for both 'Resource Management' and 'Data Processing'. With YARN, these responsibilities are split in a more modular way: YARN handles resource management, while MR2 handles data processing.
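
To make the idea of a 'resource request' concrete, here is a toy Python sketch of a negotiator that grants or refuses requests for memory and CPU. It is only an illustration under simplifying assumptions: the names `Resource`, `ToyResourceManager`, `request` and `release` are hypothetical and are not part of the YARN API, and a real cluster involves schedulers, queues and per-node containers that are left out here.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """A bundle of the two resources mentioned above (hypothetical type)."""
    memory_mb: int
    vcores: int


class ToyResourceManager:
    """Hypothetical stand-in for a resource negotiator; not the YARN API."""

    def __init__(self, total: Resource):
        # Track what is still free in the (imaginary) cluster.
        self.available = Resource(total.memory_mb, total.vcores)

    def request(self, app: str, ask: Resource) -> bool:
        """Grant the request if enough memory and CPU remain, else refuse."""
        fits = (ask.memory_mb <= self.available.memory_mb
                and ask.vcores <= self.available.vcores)
        if fits:
            self.available.memory_mb -= ask.memory_mb
            self.available.vcores -= ask.vcores
        return fits

    def release(self, ask: Resource) -> None:
        """Return previously granted resources to the pool."""
        self.available.memory_mb += ask.memory_mb
        self.available.vcores += ask.vcores


# A cluster with 8 GB of memory and 4 cores (made-up numbers).
rm = ToyResourceManager(Resource(memory_mb=8192, vcores=4))
granted_mr = rm.request("mapreduce-job", Resource(4096, 2))   # fits
granted_other = rm.request("other-app", Resource(6144, 2))    # too much memory left to ask for
```

The point of the sketch is the separation of concerns: the processing job only asks for resources, and a separate component decides whether the cluster can provide them.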

Ambari: Ambari aims to offer a single open source, web-based tool for provisioning, managing and monitoring Hadoop clusters. As part of monitoring, it also offers the means to perform retrospective analysis of the cluster.

Avro: There was a need for applications (in various programming languages) to be able to serialize an…