Friday, December 30, 2011

Big Data Predictions for 2012

Well, 2011 has been a great year for Hadoop and its supporting ecosystem. There is a growing base of subprojects evolving to fill the many niches in and around Hadoop, and companies are coming out of the woodwork to claim their piece of the pie. Not to mention the VC money pouring into Big Data related startups and the many established tech players changing their business plans to account for Hadoop. So what can we expect in 2012?

Here are seven predictions for what might be in store for Big Data in 2012:

1) Going Mainstream
Big Data analytics in the enterprise is still in its infancy. Right now solutions like Hadoop are the secret weapon of the big web and social players who can afford the investment in time, resources and infrastructure. Companies like Facebook and Twitter are using solutions like Hadoop to do things not possible before with traditional relational BI and analytics solutions. In 2012 we will see that window widen, with more traditional enterprises seeing the potential benefits that Hadoop analytics can offer. We will see more companies in various industries look to leverage Hadoop to ask questions about their operations and customers that were not possible before. Look for Hadoop to go more mainstream and lose some of the exoticness that currently relegates it to the big boys.

2) Put it in the Cloud
The barrier to entry is dropping for Hadoop, with players like Amazon offering low-cost platforms for initial Hadoop deployments. In 2012 we will see continued acceptance of the Cloud as the infrastructure of choice for deploying Hadoop. With Amazon and others improving their virtual private network services, integrating private Cloud solutions for Hadoop will become more palatable for security-conscious enterprises. The Cloud will be the target platform of choice for Hadoop in 2012. This will also open the door for smaller enterprises to dip their toes into Hadoop and discover what they have been missing in their volumes of consumer and operational data.

3) Automation and Integration
Right now most Hadoop deployments are islands of data and processing infrastructure. In the coming year we will see more tech companies offer better tools that let businesses tie their back office data stores and data warehouses to their Hadoop environments in a more seamless fashion. Efficiently moving customer and business data out of traditional data stores such as relational databases, then processing and preparing it for Hadoop consumption, will be critical to successful Hadoop deployments. We anticipate a new category of ETL focused on managing data movement in and out of Hadoop and HDFS, and it will gain more traction in 2012. There are already Hadoop projects focusing on related areas, and we will see more Hadoop connectors popping up from traditional software vendors eager to get their products integrated with Hadoop.
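
To give a flavor of what this kind of ETL step looks like today without dedicated tooling, here is a minimal hand-rolled sketch in Java: it pulls rows out of a relational table over JDBC and writes them into HDFS as tab-delimited text that a MapReduce job can consume. The connection URL, table, columns and HDFS path are made-up placeholders, and a real pipeline would lean on a proper ETL or connector tool rather than code like this.

    import java.io.BufferedWriter;
    import java.io.OutputStreamWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CustomerExport {
        public static void main(String[] args) throws Exception {
            // Hypothetical JDBC source and HDFS target; the JDBC driver jar
            // must be on the classpath for this to connect.
            Connection db = DriverManager.getConnection(
                    "jdbc:mysql://dbhost/crm", "etl_user", "secret");
            FileSystem fs = FileSystem.get(new Configuration());
            Path target = new Path("/data/incoming/customers.tsv");

            try (Statement stmt = db.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT id, name, region FROM customers");
                 BufferedWriter out = new BufferedWriter(
                         new OutputStreamWriter(fs.create(target, true)))) {
                // Write one tab-delimited line per row so a MapReduce job can split on '\t'.
                while (rs.next()) {
                    out.write(rs.getLong("id") + "\t"
                            + rs.getString("name") + "\t"
                            + rs.getString("region"));
                    out.newLine();
                }
            } finally {
                db.close();
            }
        }
    }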

4) Analytics and Visualization
Traditional BI reporting tools are not well geared toward the type of output generated by Big Data environments. A new breed of reporting tools and analytics solutions will emerge to better consume the output coming out of Big Data systems. Look for many traditional BI vendors to begin tailoring their front-end reporting solutions to work with Hadoop and distributed data stores, including NoSQL stores. But much of what traditional BI vendors offer will not be a natural fit, since most BI vendors and their tools are more comfortable dealing with highly structured data. Also, as business analysts start to get a taste of the kinds of problems that can now be solved with Big Data, they will begin to pose new questions, driving the need for more visualization and reporting of the data coming out of Big Data systems. So keep an eye out for startups and tech companies offering native Big Data analytics solutions built from the ground up for visualizing the statistical kinds of data coming out of Hadoop. Turning statistical questions, common when dealing with Big Data, into visual reports that business users can understand will be a big step toward turning the raw data in many enterprises into meaningful value and actionable results.

5) Going Mobile
In 2012 we will see apps and solutions that give business users a glimpse of their Hadoop operations and resulting output on mobile and tablet devices. This one is not a big stretch considering the growing proliferation of mobile computing, but look for Hadoop to get a bit more mobile in 2012. Visualizing BI on mobile is a natural trend, and Big Data is no exception.

6) Going Vertical and Healthcare
Healthcare is the perennial elephant in the room when it comes to needing operational efficiency improvements and managing exploding volumes of patient data (not to mention making sense of that data). From both the billing dimension and the diagnostic patient data aspect, healthcare will benefit greatly from the types of problems that Big Data can solve. In 2012 we will see healthcare providers and healthcare IT companies begin to seriously invest in Big Data to help them solve problems not possible before with traditional healthcare IT. Look for healthcare providers to tap Hadoop to better understand their patients, to deal with the volumes of digital patient data, and to help them handle government regulations and compliance.

7) Real-Time Big Data?
This might be a stretch, but look for early signs of various tech players working to deliver more real-time business solutions around Big Data. Hadoop brings tremendous processing power to bear on problems that were not practical to solve before. With computing power growing and virtualization becoming easier to manage and deploy, look for business users to demand that Big Data problems be solved in near real-time. This will open the door for even more interesting applications of Big Data for businesses and even end consumers.

Let's regroup in twelve months and see how well these predictions panned out :)

Wednesday, December 21, 2011

Big Data is more than Map/Reduce

Companies of all sizes are looking for ways to make sense of their unstructured data. Data volumes are growing tremendously, and keeping track of it all is becoming more challenging and expensive. Enter solutions such as Hadoop, which let you make sense of this data using a highly distributed architecture based on, among other things, horizontal scaling.

Products like Hadoop are a critical layer of the solution, but they solve only one part of the overall picture needed to make sense of your Big Data. There are two key areas that are critical to your overall Big Data solution: getting data in and getting data out.

Imagine your Hadoop environment sitting in the center. Data must obviously be fed into it, and data will flow out of it. An important aspect of a successful Hadoop deployment is managing these input and output points. Efficiently managing your data as it comes in raw and leaves as easier-to-consume information is key to a successful Big Data environment.

Data In - Preparing your Big Data
First, you need to prepare your data for processing by Hadoop. Unstructured data must be prepared and loaded, and sometimes integrated with relational data from enterprise database sources. All of this must be automated as it is fed into your Hadoop environment. This is a critical step, especially as companies get into more real-time Big Data processing. The need for real-time analysis is becoming more pressing with the explosion of social information and online commerce.

This requires automation to feed and prepare your Hadoop environment, minimizing manual labor and the many potentially error-prone steps involved. Having this fully automated, with minimal human intervention, is critical. Solutions such as JobServer make this automation much more manageable. JobServer is ideal for integrating with your back office databases and can mesh well with your IT environment alongside solutions such as SOA, Mule and ETL, to name a few. JobServer can centralize the logic and management for this preparation work so that steps like loading data into HDFS are automated and easy to monitor and track.
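
To make this concrete, here is a minimal sketch of the kind of step such automation wraps: pushing a staging directory of prepared files into HDFS so a downstream Hadoop job can pick them up. The directory paths are made-up placeholders, and this is plain Hadoop FileSystem code rather than the JobServer or soafaces API; a scheduler like JobServer would be responsible for running logic like this on a schedule and tracking its results.

    import java.io.File;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StageToHdfs {
        public static void main(String[] args) throws Exception {
            // Hypothetical paths: a local staging directory of prepared files
            // and the HDFS input directory a downstream Hadoop job will read.
            File stagingDir = new File("/var/staging/prepared");
            Path hdfsInputDir = new Path("/data/incoming/" + System.currentTimeMillis());

            FileSystem fs = FileSystem.get(new Configuration());
            fs.mkdirs(hdfsInputDir);

            File[] files = stagingDir.listFiles();
            if (files == null || files.length == 0) {
                System.out.println("Nothing to load; skipping this run.");
                return;
            }
            for (File f : files) {
                // Copy each prepared file into HDFS; a scheduler would run this
                // on a timer or in response to an upstream event.
                fs.copyFromLocalFile(new Path(f.getAbsolutePath()), hdfsInputDir);
                System.out.println("Loaded " + f.getName() + " into " + hdfsInputDir);
            }
        }
    }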

Data Out - Data Analytics
As large amounts of information are extracted out of solutions like Hadoop, visualizing the results is critical. Here, too, is where JobServer and its soafaces developer API can step in to address the challenge. soafaces is based on an open source API for building rich reporting and analytics solutions that capture data as it comes out of your Hadoop environment and visualize it in an easy-to-manage way. soafaces is built on the Google GWT framework for GUI development and can support rich web-based reporting and graphing technologies for visualizing your results. JobServer can also be leveraged here to show, in an organized manner, the results of each Hadoop job run and to give the right people in your organization access to the final results through web reports, spreadsheets, email alerts and so on.
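
As a rough illustration of the data-out side, the sketch below reads the reducer output files of a finished Hadoop job from HDFS and collects them into key/count pairs that a reporting front end could then render. The output path and the tab-delimited key/value format are assumptions based on Hadoop's default text output, not anything specific to JobServer or soafaces.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.LinkedHashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CollectJobResults {
        public static void main(String[] args) throws Exception {
            // Hypothetical output directory written by a previous MapReduce job.
            Path outputDir = new Path("/data/results/daily-summary");
            FileSystem fs = FileSystem.get(new Configuration());

            Map<String, Long> counts = new LinkedHashMap<String, Long>();
            for (FileStatus status : fs.listStatus(outputDir)) {
                // Reducer output files are conventionally named part-r-00000, part-r-00001, ...
                if (!status.getPath().getName().startsWith("part-")) {
                    continue;
                }
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(fs.open(status.getPath())));
                String line;
                while ((line = reader.readLine()) != null) {
                    // Hadoop's default TextOutputFormat writes "key<TAB>value" per line.
                    String[] fields = line.split("\t");
                    counts.put(fields[0], Long.parseLong(fields[1]));
                }
                reader.close();
            }

            // Hand the collected pairs off to whatever report or GUI consumes them.
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                System.out.println(e.getKey() + " = " + e.getValue());
            }
        }
    }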

Contact Grand Logic and see how we can help you make better sense of your Big Data environment. JobServer is also partnering with other Big Data solution providers and major distributions to provide complete Big Data solutions for both your in-house and cloud Hadoop deployments.

Please contact Grand Logic for more information.

Wednesday, December 7, 2011

Tame your Hadoop - Hadoop Professional Services

Grand Logic is pleased to announce our expanded consulting services specializing in Hadoop solutions. Hadoop is quickly becoming the tool of choice for Big Data analytics, and our expertise with Hadoop, combined with tools like JobServer (a job workflow/scheduling engine) and soafaces (an open source framework), enables us to build complete enterprise solutions around Hadoop for our customers.

Hadoop comes with many great supporting modules but needs additional tools and features to make it easy to manage and organize all the activity and content around your Hadoop operations. With our consulting services we can quickly come in and build a management and integration layer to help you automate and manage your Hadoop deployment. This lets you tie your back office data and IT systems into your Hadoop operations, automating and streamlining data movement between your business data and your Hadoop analytics. This is vital to a successful ROI for Hadoop: if you can't efficiently feed data into Hadoop, extract it, and visualize it, your Hadoop number crunching will be for naught. With JobServer as part of your Hadoop environment and our expertise and professional services you will be able to:
  • Effectively build custom ETL processing between your back office data and your Hadoop data stores
  • Build, package and reuse custom logic and server-side tasks for managing and editing your Hadoop jobs
  • Compose complex Hadoop workflows from multiple simpler Hadoop jobs (and non-Hadoop jobs) to support rich scenarios where data is moved between HDFS and local storage and between multiple Hadoop jobs (see the sketch after this list).
  • Integrate easily with modules like Cascading and Pig.
  • Security on a job-by-job basis to restrict all aspects of job configuration, monitoring, reporting and execution on a user-by-user basis.
  • Application permissions - control what tools are available to which users.
  • Detailed alerting (via email or sms) to report on status and failures by jobs and by job groups.
  • Organize jobs into groups and partitions for user organization and resource management.
  • Powerful job scheduling - any scheduling pattern you can think of for running your Hadoop jobs can be built by our consulting team.
  • Easily create custom reports per Hadoop job. Using the soafaces framework we can build custom reports that let you view the status and results of each of your Hadoop jobs.
  • We are experts with soafaces and GWT and can build rich visualizations for your Hadoop generated data. We can help you visualize this data on web, mobile and tablet devices to deliver rich analytics that decision makers in your company can consume.
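
As a rough sketch of the kind of workflow composition mentioned in the list above, the following Java program chains two Hadoop jobs so that the second job's input is the first job's output directory. The paths are made-up placeholders, and the jobs use Hadoop's identity mapper and reducer purely to keep the example short; it shows the plain Hadoop mechanics that a workflow tool like JobServer orchestrates, not JobServer's own API.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TwoStepWorkflow {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path rawInput = new Path("/data/incoming/raw");   // hypothetical paths
            Path cleansed = new Path("/data/work/cleansed");
            Path summary  = new Path("/data/results/daily-summary");

            // Step 1: a pass-through job standing in for your real cleansing logic.
            // In practice you would plug in your own Mapper/Reducer classes here.
            Job cleanse = new Job(conf, "cleanse-raw-logs");
            cleanse.setJarByClass(TwoStepWorkflow.class);
            cleanse.setMapperClass(Mapper.class);    // identity mapper
            cleanse.setReducerClass(Reducer.class);  // identity reducer
            cleanse.setOutputKeyClass(LongWritable.class);
            cleanse.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(cleanse, rawInput);
            FileOutputFormat.setOutputPath(cleanse, cleansed);
            if (!cleanse.waitForCompletion(true)) {
                System.exit(1); // abort the workflow if the first step fails
            }

            // Step 2: a second job whose input is the first job's output directory,
            // which is the essence of chaining Hadoop jobs into a workflow.
            Job aggregate = new Job(conf, "aggregate-cleansed-logs");
            aggregate.setJarByClass(TwoStepWorkflow.class);
            aggregate.setMapperClass(Mapper.class);
            aggregate.setReducerClass(Reducer.class);
            aggregate.setOutputKeyClass(LongWritable.class);
            aggregate.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(aggregate, cleansed);
            FileOutputFormat.setOutputPath(aggregate, summary);
            System.exit(aggregate.waitForCompletion(true) ? 0 : 1);
        }
    }
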
So if you want to tame your Hadoop environment, Grand Logic has the technical expertise and the tool chest of frameworks to get you going and organized. Contact us today to see what we can do for you.

Thursday, December 1, 2011

Hadoop Meets JobServer - Big Data Job Scheduling

Grand Logic is pleased to announce the release of JobServer 3.4. This release delivers support for Hadoop allowing Hadoop customers to use JobServer as their central access point to schedule, track and report on their Hadoop jobs and environment.

Hadoop is quickly becoming the tool of choice for open source Big Data Analytics and related computing. The JobServer team saw a great opportunity to extend JobServer's awesome job processing, scheduling and reporting capabilities to make the lives of Hadoop users better.

Hadoop and JobServer are a natural fit. JobServer offers, out of the box, extensive customization and extensibility through its soafaces developer API. This allows developers to quickly build simple or complex server-side Java jobs and manage them from a central location. Extending JobServer to schedule, manage, monitor and track Hadoop jobs was a natural next step.

With the 3.4 release, JobServer comes bundled with standard components that let Java developers bring their Hadoop jobs into the JobServer environment. This allows Hadoop users to launch, monitor and track all their Hadoop jobs and processing activity from one place using JobServer's web based monitoring, reporting and tracking tools. JobServer runs alongside your Hadoop environment and provides both developers and administrators with the ability to build, orchestrate, schedule and deploy Hadoop jobs from one central location.

Use JobServer to build, schedule and track a single Hadoop task or to compose multiple Hadoop tasks into more complex jobs and workflows. For example, a job can be composed of multiple Hadoop tasks where one task directs its output to the input of the next Hadoop task in the job chain. JobServer makes it easy to centrally schedule, track and monitor all your Hadoop tasks and jobs from one place, for real-time monitoring as well as for historical reporting and auditing. JobServer's workflow orchestration lets users manage and move files and content between local storage and HDFS. You can also build custom web GUIs (using GWT) for your Hadoop tasks to visualize input and output content, show real-time status as tasks are running, and report on final status and results after a Hadoop task has finished processing.
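
To give a feel for what such a custom GWT view might look like, here is a minimal sketch of a GWT entry point that renders job results in a simple table. The element id and the hard-coded rows are placeholders for illustration only; a real GUI would fetch the data from the server (for example over GWT RPC) rather than embed it, and nothing here should be read as the soafaces API itself.

    import com.google.gwt.core.client.EntryPoint;
    import com.google.gwt.user.client.ui.FlexTable;
    import com.google.gwt.user.client.ui.Label;
    import com.google.gwt.user.client.ui.RootPanel;
    import com.google.gwt.user.client.ui.VerticalPanel;

    public class JobResultsView implements EntryPoint {
        public void onModuleLoad() {
            VerticalPanel panel = new VerticalPanel();
            panel.add(new Label("Hadoop job results: daily-summary"));

            FlexTable table = new FlexTable();
            table.setText(0, 0, "Key");
            table.setText(0, 1, "Count");

            // Placeholder rows for illustration only; a real view would populate
            // these from the server, e.g. through a GWT RPC service that reads
            // the job's output out of HDFS.
            String[][] rows = { { "region-east", "1204" }, { "region-west", "987" } };
            for (int i = 0; i < rows.length; i++) {
                table.setText(i + 1, 0, rows[i][0]);
                table.setText(i + 1, 1, rows[i][1]);
            }
            panel.add(table);

            // Attach to a placeholder element in the host page (hypothetical id).
            RootPanel.get("jobResults").add(panel);
        }
    }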

Start using JobServer today with your Hadoop environment and see all the benefits for yourself. Download and test drive JobServer 3.4 now and improve your Hadoop experience!

About Grand Logic
Grand Logic is dedicated to delivering software solutions to its customers that help them automate their business and manage their processes. Grand Logic delivers automation software and specializes in mobile and web products and solutions that streamline business.