Beyond the Hadoop fundamentals discussed earlier, what often intrigues newcomers is the large number of open source projects and components built on or around Hadoop. This article serves as a glossary of those projects and concepts:
Cassandra uses the Gossip protocol for internode communication, and the Gossip services are built into the Cassandra software itself. HBase relies on ZooKeeper, an entirely separate distributed application, to handle the corresponding tasks.
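To make the idea of gossip-style communication concrete, here is a minimal sketch of how cluster state can spread when each node periodically exchanges what it knows with one randomly chosen peer. It is a toy simulation, not Cassandra's actual implementation: the names `merge`, `gossip_round`, and `simulate`, the node labels, and the single integer "version" per node are all assumptions made for illustration.

```python
import random


def merge(view_a, view_b):
    """Combine two views, keeping the highest version seen for each node."""
    merged = dict(view_a)
    for name, version in view_b.items():
        if version > merged.get(name, -1):
            merged[name] = version
    return merged


def gossip_round(views, rng):
    """One round: every node swaps state with one random peer (push-pull)."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = merge(views[node], views[peer])
        # Both sides end the exchange with the same merged view.
        views[node] = dict(merged)
        views[peer] = dict(merged)


def simulate(num_nodes=8, seed=0, max_rounds=100):
    """Run rounds until every node knows about every other node.

    Returns (rounds_taken, views); rounds_taken is None if the cluster
    did not converge within max_rounds.
    """
    rng = random.Random(seed)
    # Each node starts out knowing only its own entry.
    views = {f"node{i}": {f"node{i}": 1} for i in range(num_nodes)}
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(len(view) == num_nodes for view in views.values()):
            return round_no, views
    return None, views


if __name__ == "__main__":
    rounds, _ = simulate()
    print(f"All nodes converged after {rounds} round(s)")
```

No node acts as a coordinator here; information spreads through pairwise exchanges alone, which is why this style of protocol can live inside the database software instead of requiring a separate coordination service.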
While neither Cassandra nor HBase supports full ACID transactions, both provide some level of consistency control.
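One example of such consistency control is Cassandra's per-request consistency level. The sketch below shows the usual overlap rule: a read is guaranteed to see the latest acknowledged write when the number of replicas contacted for the read plus the number contacted for the write exceeds the replication factor. The level names `ONE`, `QUORUM`, and `ALL` are real Cassandra consistency levels; the helper functions are hypothetical and cover only those three levels.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when the read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    for write_level, read_level in [("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ONE", "ALL")]:
        ok = read_sees_latest_write(write_level, read_level, replication_factor=3)
        print(f"write={write_level:<6} read={read_level:<6} overlap guaranteed: {ok}")
```

With a replication factor of 3, writing and reading at `QUORUM` contacts two replicas each time, so the two sets must share at least one replica; reading and writing at `ONE` gives no such guarantee.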
Hive provides SQL-style access and looks very much like a traditional database. However, because Hive is built on Hadoop and MapReduce operations, there are key differences. The first is that Hadoop is intended for long sequential scans, so you can expect Hive queries to have very high latency (many minutes). This means that Hive is not appropriate for applications that need very fast response times. The second is that Hive is read-oriented and therefore not appropriate for transaction processing, which typically involves a high percentage of write operations.