Monday, November 10, 2014

Bluemix + Watson = Unlimited Possibilities!


Event: Meetup Event Details

Date: 08-Nov-2014
 
Host / Speakers:
Salil Ahuja, Sr. Product Manager, IBM Watson Developer Cloud.
Rajendra Kamath (myself), Engineering Manager, IBM India Software Labs



Agenda:
1) Overview of Bluemix
2) IBM Watson Services on Bluemix
3) Demo Applications using IBM Watson Services

Summary:
I flagged off the event with an overview of the value proposition of Bluemix, which you can find here. This was followed by a historical perspective on how the computing industry has evolved towards cognitive computing, including the story of how IBM Deep Blue first showed that computing systems could better humans at chess when it beat the then reigning World Champion, Garry Kasparov. We then had an interesting discussion on IBM Watson's first public appearance in a game of Jeopardy!, which is elaborated here. The baton was then handed over to Salil, who covered all 8 of the IBM Watson Services now available on Bluemix, along with live demos that showcased the use-cases and possible usage scenarios for real-world applications.
The event resumed after a short tea/snacks break. Finally, based on popular demand, we also covered an overview of all of the Bluemix services, along with a demo of building an app using Watson services on Bluemix.

Some key take-aways:

1) Attendees were especially excited to learn about the limitless possibilities for next-gen solutions/ideas with cognitive capabilities using Watson Services on Bluemix.

2) Although this meetup's agenda was focused on Watson Services, a few attendees expressed excitement about the other Bluemix services (IoT, DevOps) and have committed to evaluate them and get back to us if they need assistance.

Watch this space for updates on future events!

Cheers to Bluemix!

Monday, November 3, 2014

Hadoop Quick Reference - Part 1

How it all started
Although businesses and organizations are only now waking up to the realities of having to manage large amounts of data, this has long been the #1 focus area for web search companies such as Google and Yahoo. Managing billions of indexes and web page records is a daunting task with traditional RDBMS and high-availability technologies. During the early part of the decade (2002-2005), Yahoo started working on a project to handle this challenge, which culminated in what came to be known as Hadoop. Highly available/reliable hardware was expensive and was starting to break the economics of storing and managing this kind of data. Hadoop was built on the premise that 'hardware failure is imminent', and hence the question was how failure should be handled in the higher software layers. This premise has led to the emergence and maturing of a set of technologies around the Hadoop framework.

What does Hadoop offer?
Hadoop, in essence, offers a software framework for distributed storage of large data sets on clusters of computers (which can be cheap, commodity hardware). This architecture makes it ideal for high availability and rapid data processing (via distributed processing across the cluster).

A good introduction video:


Key Concepts:
Hadoop at its core is defined by:
1) A Data Storage Strategy:
This is called HDFS (Hadoop Distributed File System).
Data in a Hadoop cluster is broken down into smaller pieces called blocks, which are distributed throughout the cluster and stored on data nodes. Each block is replicated on multiple data nodes for reliability. One of the machines, referred to as the 'Name Node', manages the file system namespace and the mapping of files to the blocks that belong to them. HDFS continuously monitors the replicas in the system and checks for corruption and data node/disk failures. Since the HDFS framework (software) automatically recovers data from failed replicas, failed nodes/hardware need not be replaced immediately (unlike traditional HA technologies, where a hardware failure must be addressed immediately).
Image courtesy: IBM
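To make the block/replica mechanics a little more concrete, here is a minimal sketch using the standard Hadoop Java client (org.apache.hadoop.fs.FileSystem). The file paths are purely illustrative, and the code assumes a configured cluster (core-site.xml / hdfs-site.xml on the classpath).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockInfo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();       // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);           // connect to the cluster's default file system
    Path file = new Path("/user/demo/sample.txt");  // illustrative HDFS path

    // Copy a local file into HDFS; the framework splits it into blocks
    // and replicates each block across data nodes.
    fs.copyFromLocalFile(new Path("sample.txt"), file);

    // Ask the Name Node where each block of the file lives.
    FileStatus status = fs.getFileStatus(file);
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation b : blocks) {
      System.out.println("Block at offset " + b.getOffset()
          + " stored on: " + String.join(", ", b.getHosts()));
    }
    fs.close();
  }
}

Notice that the client never deals with block placement or recovery; replication is handled entirely by the framework (the dfs.replication setting, 3 by default, controls how many copies of each block are kept).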


2) A Data Processing Strategy:
This is MapReduce. MapReduce is essentially the 'computation' part that processes the data. MapReduce tasks are co-located with the data on each of the data nodes. The 'Map' part applies a particular query/function across the data on every node, and the 'Reduce' part aggregates all of the results from the 'Map' phase.
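As an illustration, below is a minimal sketch of the classic word-count job written against the standard Hadoop MapReduce Java API (org.apache.hadoop.mapreduce). Class names and input/output paths are illustrative.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // 'Map' part: runs next to each block of data and emits (word, 1) for every word it sees.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // 'Reduce' part: aggregates the counts emitted by all mappers for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // pre-aggregate on each node before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The combiner reuses the reducer to pre-aggregate counts on each data node before the shuffle, which keeps the amount of data moved across the network small; the final 'Reduce' phase then only sums partial counts per word.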




