Wednesday, December 21, 2011

Big Data is more than Map/Reduce

Companies of all sizes are looking for ways to make sense of their unstructured data. Data volumes are growing rapidly, and keeping track of that data is becoming more challenging and expensive. Enter solutions such as Hadoop, which let you make sense of this data using a highly distributed architecture built on, among other things, horizontal scaling.

Products like Hadoop are a critical layer of the solution, but they cover only one part of what is needed to make sense of your Big Data. Two other areas are critical to your overall Big Data solution: getting data in and getting results out.

Imagine your Hadoop environment sitting in the center. Data must be fed into it, and data will flow out. An important aspect of a successful Hadoop deployment is managing these input and output points. Efficiently managing your data as it comes in raw and then leaves as information that is much easier to consume is key to a successful Big Data environment.

Data In - Preparing your Big Data
First, you need to prepare your data for processing by Hadoop. Unstructured data must be prepared and loaded, and sometimes integrated with relational data from sources such as enterprise databases. All of this must be automated and fed into your Hadoop environment. This step is critical, especially as companies move toward more real-time Big Data processing. The need for real-time analysis is growing with the explosion of social information and online commerce.

Feeding and preparing your Hadoop environment requires automation to minimize manual labor and the many potentially error-prone steps involved. Having this fully automated, with as little human intervention as possible, is critical. Solutions such as JobServer make this automation much more manageable. JobServer is ideal for integrating with your back office databases and can meld well with your IT environment using solutions such as SOA, Mule and ETL, to name a few examples. JobServer can be used to centralize the logic and management for this preparation work so that steps like loading data into HDFS are all automated and easy to monitor and track.
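As a rough illustration of what one automated load step might look like, here is a minimal Python sketch that copies local files into HDFS with the standard `hadoop fs -put` shell command. It does not use JobServer's API; the function names, the `dry_run` switch and the example paths are all hypothetical, and a scheduler would be expected to invoke something like this and track whether it succeeded.

```python
import subprocess
from pathlib import Path, PurePosixPath


def build_put_command(local_file, hdfs_dir):
    """Return the argument list for copying one local file into an HDFS directory."""
    # Hypothetical layout: keep the local file name under the target directory.
    target = PurePosixPath(hdfs_dir) / Path(local_file).name
    return ["hadoop", "fs", "-put", str(local_file), str(target)]


def load_into_hdfs(local_files, hdfs_dir, dry_run=True):
    """Build (and optionally run) one put command per file; return the commands."""
    commands = [build_put_command(f, hdfs_dir) for f in local_files]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError when the copy fails,
            # so the calling scheduler can mark the step as failed.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    for cmd in load_into_hdfs(["/data/raw/clicks.log"], "/incoming"):
        print(" ".join(cmd))
```

Keeping the command construction separate from execution makes the step easy to test without a Hadoop cluster, and a non-zero exit from `hadoop fs -put` surfaces as an exception rather than a silent failure.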

Data Out - Data Analytics
As large amounts of information are extracted from solutions like Hadoop, visualizing the results is critical. Here, too, JobServer and its soafaces developer API can step in to address this challenge. soafaces is based on an open source API that allows for building rich reporting and analytic solutions that can capture data as it comes out of your Hadoop environment and visualize it in an easy-to-manage way. soafaces is based on the Google GWT framework for GUI development and can support web-based reporting and graphing technologies to provide rich visualization of your results. JobServer can also be leveraged here to show, in a very organized manner, the results of each Hadoop job run, and can be used to give the right people in your organization access to the final results through web reports, spreadsheets, email alerts and so on.
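To make the spreadsheet case concrete, here is a small Python sketch that converts Hadoop job output into CSV. It assumes the job wrote plain text in the default tab-separated key/value layout; the function name and column headers are hypothetical, and it stands in for whatever reporting step you actually use rather than showing the soafaces or JobServer APIs.

```python
import csv
import io


def job_output_to_csv(lines, header=("key", "value")):
    """Convert tab-separated key/value lines into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so values may themselves contain tabs.
        key, _, value = line.partition("\t")
        writer.writerow([key, value])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = ["hadoop\t3\n", "hdfs\t1\n"]  # made-up example records
    print(job_output_to_csv(sample), end="")
```

The resulting text can be saved with a `.csv` extension and opened directly in a spreadsheet, or attached to an email alert.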

Contact Grand Logic and see how we can help you make better sense of your Big Data environment. JobServer is also partnering with other Big Data solution providers and major distributions to provide a complete Big Data solution for both your in-house and cloud Hadoop deployments.

Please contact Grand Logic for more information.
