Friday, February 17, 2012

JobServer Support on Mac OS X

Grand Logic is happy to announce the release of JobServer 3.4.4. For all those Apple fans, this release provides support for JobServer on Mac OS X. You can now install and deploy JobServer on your favorite Mac. This release includes minor bug fixes.

Download and test drive JobServer 3.4.4 now, and learn more about JobServer's powerful developer SDK, soafaces, which makes it easier to extend and customize JobServer and to develop custom jobs and backend automated services, all while using some of the best Java/AJAX and web/SOA open source technology available to developers.
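To give a rough feel for what a custom job can look like, here is a minimal sketch. The interface and method names below are illustrative placeholders, not the actual soafaces API, so consult the SDK documentation for the real contract.

```java
// Hypothetical placeholders standing in for the real soafaces SDK types;
// the actual interface and method names are defined by the SDK itself.
interface JobContext {
    String getParameter(String name);
}

interface CustomJob {
    void execute(JobContext ctx) throws Exception;
}

// A minimal custom job: read a parameter configured in the
// JobServer UI, then do the actual work.
public class NightlyCleanupJob implements CustomJob {
    @Override
    public void execute(JobContext ctx) throws Exception {
        String targetDir = ctx.getParameter("target.dir");
        System.out.println("Cleaning up " + targetDir);
    }
}
```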

About Grand Logic
Grand Logic is dedicated to delivering software solutions to its customers that help them automate their business and manage their processes. Grand Logic delivers automation software and specializes in mobile and web products and solutions that streamline business.

Tuesday, February 14, 2012

Enterprise Job Scheduling for Big Data & Hadoop

Businesses of all sizes are looking beyond traditional business intelligence, taking a broader approach to BI that goes beyond the traditional data warehouse and operational database technologies of the past. With the explosion of social communication, mobile device data, and many other forms of unstructured data coming into focus, businesses are more interested than ever in asking questions about their data and their customers that they could not ask before.

Hadoop-style solutions let businesses build out this new BI 2.0 architecture and leverage their data and operations in new ways, asking questions they could not have imagined possible in the past. Hadoop analytics lets businesses build reporting solutions that effectively harness massive (yet commodity) processing power and manipulate terabytes of data, something that was not practical for the average enterprise before.

Hadoop provides a broad stack of solutions: CPU/compute clustering, parallel programming, distributed data management, advanced ETL, and NoSQL-style data management, among others. Hadoop is also moving quickly to build more advanced resource management that allows more efficient job flow processing on larger clusters, for the bigger deployments that may have hundreds or thousands of nodes and need to run many jobs concurrently.
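The parallel-programming layer of that stack is the classic MapReduce model. The canonical word-count job below, written against the standard Hadoop MapReduce Java API, shows its basic shape: a map phase that runs in parallel across input splits and a reduce phase that aggregates the results. Input and output paths are passed as arguments.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every word in this node's input split.
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: sum the per-word counts gathered from across the cluster.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class); // pre-aggregate on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```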

Hadoop comes with a few internal capacity-type schedulers for managing cluster load and resources, but these handle capacity scheduling between nodes within the cluster; they are not functional or calendar-based job scheduling tools. Vanilla Hadoop distributions do not include many of the features enterprises need to manage and automate the full ecosystem and life-cycle of data processing behind an end-to-end BI solution. In most cases, an enterprise's IT group must build the necessary infrastructure itself to integrate Hadoop smoothly into its environment and avoid manual labor and impedance mismatches between Hadoop operations and traditional enterprise operations.

This is where JobServer, an enterprise job scheduler, comes into play. JobServer integrates with Hadoop at the enterprise IT level, letting analysts and IT administrators schedule and integrate their IT operations with the Hadoop stack. JobServer offers a very open and flexible Java plugin API that lets Java developers tie their customizations tightly into JobServer and into Hadoop. Often what is needed is high-level job and workflow automation: scheduling ETL processing that pumps data from operational data stores into the Hadoop stack, and running jobs on regular intervals based on business rules and business needs.
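For instance, the ETL step such a scheduled job runs can be as simple as pushing a nightly extract into HDFS, as in the sketch below using the standard Hadoop FileSystem API. The paths and namenode address are assumptions, and inside JobServer this logic would live in a custom job rather than a main method.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Copy a nightly extract from the local staging area into HDFS so
// downstream MapReduce jobs can pick it up. Paths and the namenode
// address are illustrative assumptions.
public class HdfsLoadStep {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode:9000"); // assumed cluster address

        FileSystem hdfs = FileSystem.get(conf);
        Path extract = new Path("/staging/extracts/orders-2012-02-14.csv");
        Path incoming = new Path("/warehouse/incoming/orders/");

        // copyFromLocalFile is part of the standard FileSystem API.
        hdfs.copyFromLocalFile(extract, incoming);
        hdfs.close();
    }
}
```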

JobServer provides the job automation and job scheduling needed to accomplish this, plus key features such as audit trails that track which jobs were run, when, and who edited them. JobServer can, for example, coordinate and orchestrate a number of Hadoop job flows into one larger flow and then pump the output back into your enterprise reporting systems and data warehouses. It also provides GUI reporting features that let enterprise users, from programmers to IT staff, track what is going on in the Hadoop and IT environment and be alerted quickly when problems arise.
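The orchestration idea boils down to chaining: run one Hadoop job, and only if it succeeds feed its output into the next. The sketch below shows that skeleton with the standard Hadoop Job API; the job names and paths are illustrative, and JobServer would wrap this kind of chaining with calendars, retries, audit trails, and alerts.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Chain two Hadoop jobs: the second runs only if the first succeeds,
// and consumes the first job's output directory as its input.
public class TwoStepFlow {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path raw = new Path("/warehouse/incoming/orders");
        Path cleaned = new Path("/warehouse/cleaned/orders");
        Path report = new Path("/warehouse/reports/orders");

        Job cleanse = new Job(conf, "cleanse-orders");
        // ... set the mapper/reducer classes for the cleansing step ...
        FileInputFormat.addInputPath(cleanse, raw);
        FileOutputFormat.setOutputPath(cleanse, cleaned);
        if (!cleanse.waitForCompletion(true)) {
            System.exit(1); // a real scheduler would alert and retry here
        }

        Job aggregate = new Job(conf, "aggregate-orders");
        // ... set the mapper/reducer classes for the reporting step ...
        FileInputFormat.addInputPath(aggregate, cleaned);
        FileOutputFormat.setOutputPath(aggregate, report);
        System.exit(aggregate.waitForCompletion(true) ? 0 : 1);
    }
}
```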

If you need to tame your Hadoop operations and provide automated and tight integration with your existing IT environment, applications and reporting solutions, give JobServer a look. It can be a great asset to help you run your Big Data operations more efficiently. Visit the JobServer product website for more details.

Contact Grand Logic and see how we can help you make better sense of your Big Data environment. JobServer is also partnering with other Big Data solution providers and major distributions to provide complete Big Data solutions for both in-house and cloud Hadoop deployments. Please contact Grand Logic for more information and to see how our products and services can make your Hadoop deployment a success.

Tuesday, February 7, 2012

Native Multi-Tenant Hadoop - Big Data 2.0

For Hadoop to gain wider adoption and lower its barrier to entry for a broader audience, it must become much more economical for businesses of all sizes to manage and operate a Hadoop processing cluster. Right now it takes a significant upfront investment in hardware, plus considerable IT know-how and admin skills, to provision, configure, and manage a full-blown Hadoop cluster for any significant operation.

Cloud services like Amazon Elastic MapReduce help reduce some of this, but they can quickly become costly if you need to do seriously heavy processing, especially if you need to keep data in HDFS rather than constantly moving it between your HDFS cluster and S3 in order to shut down datanodes and save cost, as is standard practice with Amazon EMR. Utilities like Whirr also help push the infrastructure management onto the EC2 cloud, but here again, for serious data processing, this can quickly become cost prohibitive.

Operating short-lived Hadoop clusters can be a useful option, but many organizations need long-running processing and need to leverage HDFS for longer-term persistence, not just as a transient storage engine during the lifespan of MapReduce processing, as is the case with Amazon EMR. For Hadoop, and Big Data in general, to make the next evolutionary leap for the broader business world, we need a fully secure and multi-tenant Hadoop platform. In such a multi-tenant environment, organizations could share clusters securely, manage the processing load in very controllable ways, and customize their Hadoop job flows and code in isolation from one another.

Hadoop already has various capacity management scheduling algorithms, but what is needed is higher-order resource management that can fully isolate different organizations from one another, for both HDFS security and data processing, to support true multi-tenant capability. This will drive wider adoption within large organizations and by infrastructure service providers, because it will increase the utilization of otherwise idle CPU and storage, in the same way that SaaS has allowed software to achieve greater economies of scale and democratized software for small and big organizations alike.
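One building block for that isolation already exists at the storage layer. The sketch below uses the standard HDFS FileSystem permissions API to give each tenant a home directory owned by its own user and group, readable by no one else; the tenant names are illustrative, and true multi-tenancy would also need isolation at the scheduler and job-execution layers.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Provision an owner-only HDFS home directory per tenant. Running this
// requires HDFS superuser rights, since it changes directory ownership.
public class TenantProvisioner {
    public static void main(String[] args) throws Exception {
        FileSystem hdfs = FileSystem.get(new Configuration());

        for (String tenant : new String[] {"acme", "globex"}) { // illustrative tenants
            Path home = new Path("/tenants/" + tenant);
            hdfs.mkdirs(home, new FsPermission((short) 0700)); // rwx for owner only
            hdfs.setOwner(home, tenant, tenant);               // per-tenant user/group
        }
        hdfs.close();
    }
}
```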

Native multi-tenant support in Hadoop will drastically reduce the upfront cost of rolling out a Hadoop environment, make the long-term costs far more manageable, and open the door for Hadoop and Big Data solutions to go mainstream, in much the same way that Salesforce, for example, has created a rich ecosystem of solutions around business applications and CRM. It will also let organizations keep long-running environments and keep their data in HDFS for longer periods of time, allowing them to be more creative and spontaneous.