Cloud Analytics & ML with Sam Taha

Thursday, July 25, 2013

JobServer 3.4.28 - Isolated JVM Containers

We are happy to announce the release of JobServer 3.4.28 which adds a number of new features for administrators along with supporting the latest version of Google Web Toolkit and expanded remote management APIs.

With this release, JobServer now supports expanded programmatic remote web services APIs. Also included in this release is the capability to run distributed jobs under customizable Linux/Unix userspace accounts on a job-by-job basis, which gives administrators fine-grained control over how they run their jobs. This allows users to run jobs inside isolated JVMs in a more granular fashion.

It has always been our focus to make JobServer the most developer- and IT-friendly scheduling and job processing platform on the planet. We are proud of our focus on taking customer and developer feedback to continuously make JobServer the best scheduling and job processing engine around. JobServer tames your job processing and scheduling environment in a way that is a joy for Java developers to customize, while providing powerful web UI management and administration features for business users and IT operations administrators.

Download and test drive JobServer 3.4.28 today and learn more about JobServer's powerful developer SDK, soafaces, that makes extending and customizing JobServer and developing custom jobs and back-end automated services easier.

Grand Logic delivers software solutions that automate your business processes and tame your IT operations & Big Data analytics. Grand Logic delivers data and job automation software and Hadoop consulting services that maximize your Big Data investment.

Saturday, June 1, 2013

Big Data is More Than Correlation and Causality

There is no discounting that the Big Data movement is getting a lot of attention from all avenues of business and technology. Large scale computing has been around for decades, since the days of supercomputers, and has been brought to the forefront of late by the high-flying internet companies. This has been driven in part by significant advances in the availability of commodity hardware, open source distributed computing software, cloud computing, and virtualization among other things.

Much of the debate about the value and benefits of Big Data centers on how it can help companies analyze large data sets to make marketing-type decisions, such as recommending what movie or product you should buy, and thus improve the bottom line of these businesses. There are also other applications such as the analysis of vast volumes of sensor or transactional data in order to find patterns using machine learning. The possibilities for applying Big Data abound, both for structured and unstructured data, in order to extract information and improve marketing and overall business decision making.

Correlation vs Causality
One common debate about Big Data concerns the effectiveness of the analytics applied in Big Data solutions: can they really discover answers to questions, or are they better suited to finding correlations than to identifying precise causality? These debates are good discussions to have, and in general I think Big Data can serve many purposes, from finding correlations to solving very specific problems from a wide spectrum of data sources. The ability to extract value from Big Data is driven in part by the volume of data available and by applying the right machine learning algorithms. However, I believe there is a much bigger value to be gained from the Big Data computing movement than just correlations, sifting through transactions to calculate some metric, or finding a needle in a haystack among petabytes of data.

Insights are not Enough
Extracting insights from vast volumes of structured and loosely structured data has many applications, but the ultimate application of this is enabling computing systems to make smart and intelligent decisions with less and less human involvement. This is what leads to lower costs and improved productivity, and it has historically been part of human evolution where it relates to technology. We have evolved over the decades to have machines do more work for us, so the smarter and more autonomous our machines get, the more we evolve as a technology-driven society.

Automation and Intelligence
Ultimately Big Data can help us go beyond just a discussion around finding correlations or summarizing metrics to generate visually captivating reports. The ultimate benefit business can gain from Big Data is no different from what it has always been in the past with other computing and communications technology advances. In its simplest form it is about automation, and in its most advanced form it is about enabling software and computers to power artificial intelligence and system autonomy. The smarter and more independent our systems are, the more we advance and the more efficient business becomes. This drives greater productivity and effectiveness in all aspects of business. This will, for example, allow us to build power plants that run themselves much more efficiently, to build computers like IBM Watson that can make human-like decisions, and to build automation software like Siri and Google Now that can understand what we want and deliver the right information to us at the exact time we need it. So Big Data is many things, but ultimately it will turn our computers and data into information that will automate all aspects of our lives and make business more efficient and productive.

The Time for Artificial Intelligence is Now
With advances in distributed computing, networking, and storage, the time has come for AI to be at the heart of what Big Data is all about. Big Data will allow AI to achieve the potential we have all dreamed it could have. AI has never achieved many of the sci-fi capabilities we have all grown up watching on TV and in movies. Big Data will be what allows AI to achieve its full potential, and this will make possible many things we have only dreamt of.

Tuesday, May 14, 2013

Hadoop the New 'T' in ETL

ETL is a common computing paradigm used in a variety of data movement and data management scenarios. As demand for more insight into business data has grown, ETL has been used to move more data from operational data stores into OLAP and data warehousing environments. This has expanded the need for analytics and other solutions that rely on data being reconstituted into forms that are easier to consume or into data models that solve specific problems more efficiently.

So nothing special going on here, but as data volumes have grown and sources of data have exploded, the transformation part of ETL (the "T") is becoming more of a challenge, especially as organizations demand more near real-time analytics and up-to-date information. Transforming the volumes of operational data is becoming a computing bottleneck and often limits what you can do with data after it has been transformed and loaded into downstream data marts. See a typical ETL data flow diagram below.

Big Data to the Rescue
With the evolution of big data and Hadoop, new tools have been brought to bear that can help in the overall ETL computing process. However, with Hadoop, the ETL model needs to be revisited. Hadoop can bring tremendous computing resources to more efficiently transform data into target models. While Hadoop can serve as part of your overall processing fabric, can be leveraged directly for OLAP, and can itself be used for data warehousing (e.g. HBase data store), it can also serve as an intermediate staging area that can be used to populate traditional relational data marts.

Using Hadoop in this way allows it to be used as an intermediate store for data until it can later be transformed into target models. We can accomplish this "load first" approach using Hadoop, by changing the ETL model around a bit. Instead of extracting and transforming data first, we can instead extract and load data into Hadoop storage, for staging, and then take full advantage of the Hadoop compute infrastructure to transform (using Map Reduce, Impala, Drill…etc) the data into target models that can feed traditional relational data marts and OLAP engines. See diagram for example:

Hadoop for Transformation
This essentially allows organizations to use Hadoop as the transformation platform, letting developers perform more complex transformations that were not practical in the normal ETL universe. So think of Hadoop as the new supercharged "T" in the "ELT" paradigm, where data is moved as efficiently as possible from operational stores and loaded ("L") into HDFS (and HBase or Cassandra) as fast as possible. Then the "T" can be performed within the Hadoop ecosystem. This allows Hadoop to be a powerful intermediary layer that can drive new analytics and allow existing analytics to keep up with the deluge of data. This also allows existing OLAP and data warehouses to continue to consume data out of Hadoop for existing analytics.

So let us start getting used to the concept of "ELT" as the new big data cousin of ETL. Hadoop is more than just a historical archive or dumping ground for unstructured data. It can be a powerful transform computing layer that can drive better data warehousing for new and existing analytics solutions.
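
To make the load-first ordering concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `staging` list stands in for HDFS, and `extract`, `load`, and `transform` are made-up function names, not Hadoop or JobServer APIs. The only point is the order of the steps, with raw data landing in staging untouched and the reshaping happening afterwards.

```python
from collections import defaultdict

def extract(source_rows):
    """Pull raw records out of the operational store untouched."""
    return list(source_rows)

def load(staging, raw_rows):
    """Append raw records to the staging area (a stand-in for HDFS)."""
    staging.extend(raw_rows)
    return staging

def transform(staging):
    """Reshape staged records into a target model: total amount per customer."""
    totals = defaultdict(float)
    for row in staging:
        customer, amount = row.split(",")
        totals[customer.strip()] += float(amount)
    return dict(totals)

# Hypothetical operational data: "customer,amount" lines.
orders = ["acme, 10.0", "globex, 5.5", "acme, 2.5"]

staging = []                      # E and L happen first, with no reshaping
load(staging, extract(orders))
data_mart = transform(staging)    # T happens last, on the staging platform
print(data_mart)
```

In a real deployment the transform step would be a Map Reduce job or a SQL-on-Hadoop query rather than a Python loop, but the ordering is the same.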

Friday, April 26, 2013

Data In and Out of the Hybrid Cloud

The continued adoption of cloud computing, and the accelerating interest in it, has created an interesting problem for the large enterprise. Most large enterprises have existing private networks and intranets. As parts of the organization begin to adopt public and private clouds, there is a challenge of moving data in and out of these clouds.

Data movement is more challenging for a number of reasons, including security and network accessibility. For example, you usually can't walk up to these clouds and load a tape or hard drive. And bandwidth is often more restricted than that between internal networks. Applications you have running in your private network may not be able to talk directly to the cloud without going through web services or other new networking schemes.

Then there is the question of cost. When does a business decide it is time to bring back data from the cloud? There is a point where keeping certain data in the cloud could become cost prohibitive. While the cost of cloud computing and storage is always going down and newer services pop up all the time (like Amazon Glacier for example), this issue is not going away.

Also, for Big Data type computation and data storage, at what point does keeping your data running in Amazon EMR or stored in S3 become prohibitively expensive? All these questions are important for organizations to understand as the adoption of cloud computing and Big Data analytics accelerates. There is no simple answer of course, but it is important for organizations to consider these questions from both an IT and a financial perspective.
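
One rough way to frame the cost question is as a break-even calculation. The Python sketch below is only that, a sketch: the function name `break_even_month` and all of the figures are placeholders I made up for illustration, not real cloud or hardware prices. It compares a flat monthly cloud bill against an up-front on-premises purchase plus a smaller monthly running cost.

```python
def break_even_month(cloud_monthly, onprem_upfront, onprem_monthly, horizon=120):
    """Return the first month in which cumulative cloud spend exceeds
    cumulative on-premises spend, or None if that never happens
    within the horizon (in months)."""
    for month in range(1, horizon + 1):
        cloud_total = cloud_monthly * month
        onprem_total = onprem_upfront + onprem_monthly * month
        if cloud_total > onprem_total:
            return month
    return None

# Placeholder figures, not real prices.
print(break_even_month(cloud_monthly=3000, onprem_upfront=20000, onprem_monthly=1000))
```

A real analysis would also need to account for data transfer charges, staffing, and falling cloud prices over time, none of which this toy model includes.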

Saturday, April 20, 2013

Databases are Cool Again

It is definitely interesting what is happening in the database space these days. It is good to see the NoSQL and NewSQL folks spark a fire under the traditional relational vendors. This is the only way to inspire innovation both in the commercial and open source space. To a large extent, the established relational players were slow to jump on the cloud, and even to this day they are not moving as aggressively as they need to in order to reclaim the cloud database market from the NoSQL and NewSQL upstarts.

Fundamentally, the scale-out capabilities of traditional RDBMS engines still don't hit the sweet spot developers and cloud operations people need these days. I do expect the market will consolidate somewhat in the next several years as there are just too many players at the moment, especially on the NoSQL side. But I expect there to remain a large selection of NoSQL engines over time, as many of the NoSQL players target specialized areas, so it is definitely not one size fits all like it has historically been with relational databases. For example, many of the NoSQL engines have made deliberate engineering trade-offs in their products such as in their storage models, consistency, replication, aggregation capabilities, and scale-out…etc. If you need a NoSQL engine with strong aggregation functions you might choose MongoDB, but if you need something that scales out writes and data center replication you might go with Cassandra. So, in the long term I do not see a single NoSQL engine that can rule them all.

Tuesday, April 16, 2013

Cloud Job Scheduler on EC2

Grand Logic is pleased to announce a new release of our JobServer Cloud edition product. JobServer Cloud edition allows our customers to access and use the powerful job scheduling, job processing, workflow and SOA messaging features available in JobServer from a cloud environment. JobServer now supports deployments on Amazon EC2, allowing our customers to lower their IT costs and free themselves to focus on their core applications. If you are using EC2 to host your applications, and you are in need of a job scheduling and processing solution, then JobServer is a perfect choice.

JobServer Cloud delivers all the same great features and capabilities found in our core JobServer software and can now be hosted in the clouds. This frees customers from the IT burden of buying and maintaining hardware and installing and managing their own IT environment for JobServer. With the cloud edition of JobServer, you are freed from dealing with upgrades, managing maintenance tasks, and dealing with hardware issues. We can add more job processing capacity, quickly, as you need it. You just need to focus on building and deploying your jobs and Java Tasklets. We will ensure that your JobServer environment is well maintained, managed, backed up and running efficiently. We will alert you if we notice performance issues or slow running jobs and processes or if more capacity is needed. Daily and weekly reports are delivered that provide detailed job scheduling and processing statistics. Contact our support team to get set up with a fully managed instance of JobServer.

With our fully managed JobServer Cloud solution, there is no hardware or software to install! With JobServer Cloud, your environment will be deployed and run from the Amazon cloud with secure connectivity to your private network using Amazon VPC (Virtual Private Cloud). You can access your JobServer instance securely to run and manage your jobs and apps with full access to your private corporate network. Many customers also need their JobServer environment to connect and access local IT systems and services within their private corporate network. With Amazon's VPC (Virtual Private Cloud) we can securely bridge between a company’s existing IT infrastructure and your JobServer Cloud environment. Amazon VPC enables enterprises to connect their existing infrastructure to a set of isolated JobServer compute resources via a Virtual Private Network (VPN) connection, and to extend their existing management capabilities such as security services, firewalls, and intrusion detection systems to include their JobServer resources that are in the Amazon cloud.

If you want to install and manage your own JobServer instances, then download JobServer today and install on your EC2 environment to start scheduling and processing jobs. Start with one EC2 instance or scale JobServer to run on hundreds of EC2 instances. JobServer scales easily and effectively on EC2 to allow you to run thousands of jobs. JobServer can scale to meet your needs by taking full advantage of your Amazon cloud environment.

About Grand Logic
Grand Logic is dedicated to delivering software solutions to its customers that help them automate their business and manage their processes. Grand Logic delivers automation software and specializes in mobile and web products and solutions that streamline business.

Saturday, March 9, 2013

Putting NoSQL in Perspective

Deciding between a NoSQL database and a relational database system is about understanding the trade-offs that led to the creation of NoSQL to begin with. NoSQL systems have advantages over traditional SQL databases because they give up certain RDBMS features in order to gain performance, scalability and developer usability.

What NoSQL gives up (this varies by NoSQL engine):

Relationships between entities (like tables) are limited to non-existent. For example, you usually can't join tables or models together in a query. Traditional concepts like data normalization don't really apply. But you still must do proper modeling based on the capabilities of the particular NoSQL system. NoSQL data modeling varies by product and whether you are using a document vs column based NoSQL engine. For example, how you might model your data in MongoDB vs HBase varies because each solution offers significantly different capabilities.
Limited ACID transactions. The level of read consistency and atomic write/commit capabilities across one or more tables/entities varies by NoSQL engine.
No standard domain language like SQL for expressing ad-hoc queries. Each NoSQL has its own API and some of the NoSQL vendors have limited ad-hoc query capability.
Less structured and rigid data model. NoSQL typically forces/gives more responsibility at the application layer for the developer to "do the right thing" when it comes to data relationships and consistency. Think of NoSQL as a schema on read instead of the traditional schema on write.
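
The "schema on read" idea in the last point can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the API of any particular NoSQL product: the documents, field names, and the `read_user` helper are all hypothetical. The store accepts whatever shape is written, and the application decides how to interpret each document when it reads it.

```python
import json

# Records written with no enforced schema; fields vary from document to document.
raw_documents = [
    '{"id": 1, "name": "Ada", "email": "ada@example.com"}',
    '{"id": 2, "name": "Grace"}',
    '{"id": 3, "full_name": "Linus", "tags": ["admin"]}',
]

def read_user(raw):
    """Apply the schema at read time: the application decides how to
    interpret missing or renamed fields."""
    doc = json.loads(raw)
    return {
        "id": doc["id"],
        "name": doc.get("name") or doc.get("full_name", "unknown"),
        "email": doc.get("email"),  # None when the writer omitted it
    }

users = [read_user(raw) for raw in raw_documents]
for user in users:
    print(user)
```

With schema on write, the second and third documents would have been rejected or forced into a fixed shape at insert time; here that responsibility moves into `read_user`.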

What NoSQL offers:

Easier to shard and distribute the data across a cluster of servers. Partly because of the lack of data relationships, it is easier to shard and distribute data across a cluster and add capacity more incrementally and horizontally. This can give much higher read/write scalability and fail-over capabilities, for example.
Can more easily deploy on cheaper commodity hardware (and in the cloud) and expand scalability more incrementally and economically.
Don't need as much up-front DBA type of support. But if your NoSQL gets big you will spend a lot of time doing admin work regardless.
NoSQL has a looser data model, so you can have sparser data sets and variable data sets organized in documents or name/value column sets. Data models are not as hard wired.
Schema migrations can be easier but put the burden on the application layer (developer) to adjust to changes in the data model.
Depending on what type of application you are building, NoSQL can make getting started a little easier since you need less time planning for your data model. So for collecting high velocity and variable data, NoSQL can be great. But for modeling a complex ERP application it may not be such a great fit.
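
As a rough illustration of why the lack of relationships makes sharding easier, here is a minimal hash-based sharding sketch in Python. The function names `shard_for` and `distribute` and the key format are hypothetical, and this is not how any specific NoSQL engine places data. Because no record needs to be joined with another, each key can be assigned to a server on its own.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a record key to a shard number using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def distribute(keys, num_shards):
    """Group keys by the shard each one hashes to."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards

keys = [f"user:{n}" for n in range(10)]
shards = distribute(keys, 3)
for shard_id, members in shards.items():
    print(shard_id, members)
```

Simple modulo hashing like this moves most keys when the number of shards changes, which is why production systems tend to use schemes such as consistent hashing instead.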

Like with most things, there is no silver bullet here with NoSQL. There are many different products available these days and each has its own particular specialties and pros/cons. In general, you should think of NoSQL as a complementary data storage solution and not as a complete replacement of your relational/SQL systems, but this will depend on your applications and product functionality requirements. For example, whether you are working in an analytics environment or an OLTP setting can greatly affect how you use NoSQL and which specific NoSQL engine you choose. Keep in mind you may end up using more than one NoSQL product within your environment based on the specific capabilities of each - this is not uncommon.

Relational databases are also evolving; for example, new hybrid NoSQL oriented storage engines are coming out based around MySQL. Also, products like NuoDB and VoltDB (what some are calling NewSQL) are trying to evolve relational databases beyond the vertical scaling and legacy storage computing restrictions of the past, by using a fundamentally different architecture from the ground up. Keep your seat belts fastened, the database landscape has not been this innovative and fast moving in decades.

Tuesday, March 5, 2013

SQL and MPP: The Next Phase in Big Data

Over the past couple of years we have all heard about the Big Data movement. Two key enablers in this remaking of analytics, data warehousing and general computing have been the NoSQL database technology movement and the emerging Hadoop compute stack. While not directly related to each other, both NoSQL and Hadoop have become associated with the rapidly accelerating Big Data revolution as more companies look to manage larger and larger data sets more effectively and economically. NoSQL has been the new kid on the block in the database space, attempting to take applications and data to the promised land of web-scale computing where traditional relational databases have fallen short. Over the past decade, SQL and relational database technology have failed to effectively keep up with developer needs and the scaling demands of a new generation of social and data heavy applications, and this has opened the door to a different approach from the NoSQL camp. Hadoop is in the same position, promising to deliver analytics and offline batch computing power that was not practical or cost effective before the emergence of HDFS and Map Reduce, and that until now has been found mostly in expensive and proprietary analytics products.

As with many proclaimed revolutions such as Big Data, this is just the tip of the iceberg, as they say. Like with most transformations in technology, there is more to come as these technologies penetrate into more industries and gain wider adoption and broader acceptance by the open source community and the established heavy hitters. The next wave of Big Data technology will push the edges into other domains and go beyond the offline computing boundaries of HDFS and Map Reduce. While SQL and relational database centered analytics has taken a back seat lately because of the emergence of NoSQL, SQL as a domain language will get an uplift with the emergence of the next Big Data wave as we move past the basic offline Map Reduce paradigm and look towards more real-time computing engines that can enable MPP (massively parallel processing) computing. This will allow IT organizations to continue to benefit from the low cost of commodity hardware and the horizontal scaling benefits brought about by Map Reduce and HDFS, now generalized further for real-time analytics.

While NoSQL has established itself as a technology that is here to stay, the traditional relational database paradigm is not gone by any stretch and still provides an invaluable ad hoc query function to analysts and developers alike. NoSQL products like Cassandra, HBase and MongoDB (to mention a few) solve a unique problem and are becoming key foundations of any web-scale computing stack, whether for online CRUD apps or for offline analytics. But that does not eliminate the need for, or diminish the power of, relational SQL engines and SQL as a powerful expressive domain language. NoSQL is not a silver bullet but can be a powerful complementary solution to traditional relational data storage models. The NoSQL folks have made the classic engineering trade-off, exchanging certain features found in relational databases for greater horizontal scalability. I will not get into the details of this, but I do not want to oversimplify what NoSQL has done. At the heart of the trade-off is eliminating relationships between data entities for the benefit of greater horizontal scalability. NoSQL also gives the developer a more flexible "on-read" schema model that has its benefits.

So what does this all mean? Well, expect NoSQL and the current 1.0 Hadoop stack to continue to mature and become more mainstream - that is a no-brainer. But for the next phase I see SQL (for ad hoc querying) and real-time MPP becoming part of this Big Data fabric, and this will bring back the ad hoc capabilities of relational databases but now with the horizontal scaling and cost effectiveness found in HDFS and Map Reduce.
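
For readers who have not worked with MPP engines, the basic scatter and gather pattern behind a query like "SELECT region, SUM(amount) ... GROUP BY region" can be sketched in Python. This is a toy model under my own assumptions, not how Impala, Drill, or any other product is implemented: the `partitions` data and the `partial_sum` and `mpp_sum` names are hypothetical, and threads stand in for worker nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data partitions, one per worker node: (region, amount) rows.
partitions = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 1), ("north", 4)],
]

def partial_sum(rows):
    """Each worker aggregates only its own partition."""
    totals = Counter()
    for region, amount in rows:
        totals[region] += amount
    return totals

def mpp_sum(parts):
    """Scatter the aggregation to all partitions in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(partial_sum, parts))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)

result = mpp_sum(partitions)
print(result)
```

The appeal of the approach is that each partition is scanned where it lives and only the small partial results travel to the coordinator.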

You can see this next phase is already happening by just observing all the commercial products rushing to extend their traditional analytics engines to work on top of Hadoop and all the investment going into taking Hadoop beyond its current offline Map Reduce roots. They vary from open source next generation MPP platforms, to cloud providers offering analytics as a service, to traditional data warehouse vendors extending their products to run on top of Hadoop, to next generation relational database start-ups. Here is a sample of some of the players and products to watch:

Hadoop 2.0 Players

Cloudera - Impala
Cloudera is leading the charge to create a next generation open source MPP platform that builds on the core components of Hadoop (HDFS, Zookeeper, MR...etc) to enable real-time analytics of Big Data. The initiative is open source but primarily driven (at least for now) by Cloudera. This is also partly a recognition that Map Reduce and tools like Hive are fine for certain offline analytics and processing but are not a complete solution for real-time reporting and analytics.

MapR - Apache Drill
This is a similar project to Impala but channeled through the Apache organization and primarily driven by MapR (a Cloudera competitor in the Hadoop space).

Hadapt
Vertical solution for Hadoop for organizations wanting a more SQL friendly interface to their Hadoop data sources.

Datameer
Another Hadoop vertical player that is trying to make analytics and reporting easier for the Hadoop stack.

Cloud Players

Google - BigQuery
This is Google's cloud service that combines a distributed data store with a powerful SQL-like ad hoc query engine (based on Google's Dremel technology).

Amazon - Redshift
Amazon service to help businesses more economically build data warehouses in the clouds with ad hoc SQL query interface. Partially based on technology from ParAccel.

Old School Players

IBM - Netezza
While traditionally focused on enterprise data warehousing, IBM is evolving their stack to fit and play nice with Hadoop and other Big Data solutions.

HP - Vertica
HP's Big Data play. Like IBM and Teradata, HP acquired their way into the Big Data space.

Teradata - Aster Data
Teradata is a true old school player, dating from when the Big Data world centered only on relational databases. Their acquisition of Aster Data changed that.

Next Generation SQL Players to Watch

NuoDB
NuoDB is the new kid on the block promising a new way to scale and build relational databases in the cloud. Their approach is more or less based on a peer to peer model that allows them to scale out (as they claim) while still delivering on the traditional capabilities of relational databases such as read consistency and ACID transactions. While NuoDB is more focused on OLTP type processing, its claim that it can scale horizontally while supporting a SQL relational model makes it potentially powerful for real-time analytics as well.

VoltDB
Another new age relational database engine that delivers horizontal scaling while retaining SQL capabilities. Differs from NuoDB by taking a caching approach to meet the scaling challenge.

For the next wave of Big Data innovation, the landscape is rapidly changing with both old and new industry players getting into the game. Big Data will no longer be limited to offline and long latency based analytics processing. The lines between OLTP, OLAP and Enterprise Data Warehousing are blurring as offline computing, real-time analytics and data storage models evolve and converge. Expect better technology options and improved cloud scalability at a lower cost of ownership as the competition heats up and the next evolution of Big Data matures. Pick a horse and run with it. Stay tuned.

Wednesday, February 27, 2013

Grand Logic a Featured Cloudera Partner

We are proud to be selected by Cloudera as part of their featured partner list and to join the largest and fastest growing Apache Hadoop ecosystem. Grand Logic has been a strong proponent of Apache Hadoop and the potential of Big Data computing.

Grand Logic has integrated support for Hadoop and other Big Data technologies into our flagship product, JobServer. This has brought enterprise job processing and automation to Big Data computing and converged traditional business automation and job processing with Big Data analytics and computing. There are no islands or barriers here. JobServer allows enterprises of all sizes, from startups to Fortune 500, to converge their back office business processing, SOA assets, and Big Data computing infrastructure such as MapReduce, BigQuery, Hive queries, Impala queries, Pig…etc, all under one job scheduling management platform.

Download and test drive JobServer today and learn more about JobServer's powerful developer SDK, soafaces, that makes extending and customizing JobServer and developing custom jobs (Hadoop jobs, SOA jobs, ETL jobs, BigQuery jobs, Hive jobs…etc) and back-end automated services easier.

Tuesday, December 18, 2012

Turning Scripts into GUI Web Applications

How would you like to be able to turn any Linux bash script, command line program or Windows batch script into a GUI driven web application that any business user can invoke and manage from a web interface? And all in just a few clicks. Well, JobServer provides some great features to make this happen. JobServer allows you to embed or associate any script or command line program with a Tasklet that can then be run in a server-side job. And the job can be configured to be invoked and run by any business user from JobServer's web UIs. The script can be manually launched by the user (and customized on the fly) from the GUI or scheduled to run at a later time or at a set frequency.

For example, an IT administrator or developer can use JobServer to embed their Linux script into JobServer and then expose it as a user interface based web application for any business user to manually run, schedule, monitor and track. The administrator can easily customize and parameterize the job/tasklet to allow custom input parameters that the business user can pass in directly from the web GUI.
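
To give a feel for what wrapping a script with user-supplied parameters involves, here is a generic Python sketch. It is not JobServer's Tasklet API, which is Java based and not shown in this post; the `run_script` function and the demo command are hypothetical and only illustrate the general pattern of running a command with parameters and capturing what a web UI would need to display.

```python
import subprocess
import sys

def run_script(command, params):
    """Run a command line program with user-supplied parameters and
    capture its exit code and output."""
    # Parameters are passed as a list, never through a shell string,
    # so user input cannot inject extra commands.
    completed = subprocess.run(
        list(command) + [str(p) for p in params],
        capture_output=True,
        text=True,
        timeout=60,
    )
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }

# Stand-in for an administrator's script: a tiny Python one-liner that
# echoes its arguments. A real setup would point at a bash or batch script.
demo_command = [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"]
result = run_script(demo_command, ["east", "nightly"])
print(result["exit_code"], result["stdout"].strip())
```

The exit code and captured output are the raw material for the monitoring and tracking features described above.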

Give JobServer a try, and turn any Linux or Windows script or command line program into a GUI application that can be run and tracked by business users. What is also great about this is that the developer or IT administrator has full control over the scripts being run and can throttle capacity and disable/enable the scripts at any time. And they can track who has been running the scripts/jobs. With just a few clicks and some cut and paste, you can turn your scripts into GUI web applications!

Download and test drive JobServer now and learn more about JobServer's powerful developer SDK, soafaces, that makes extending and customizing JobServer and developing custom jobs and back-end automated services easier.

Friday, December 7, 2012

JobServer 3.4.14 for Oracle RAC

We are happy to announce the release of JobServer 3.4.14 which brings expanded enterprise features to JobServer's scheduling engine and job processing platform. JobServer has always been designed from the ground up for massive job scheduling and processing scalability while being highly resilient in the face of hardware, network and database interruptions. Reliable, repeatable, reportable, and measurable job scheduling, processing and management has always been the centerpiece of our focus with the JobServer platform.

With this release, JobServer now supports Oracle RAC 11g and allows for hot failover at the database layer. By enabling Oracle SCAN configuration, JobServer can leverage Oracle RAC's dynamic failover and database routing capability allowing JobServer to continue to access critical database data and transactions during critical job scheduling and job processing functions.

"We are excited about our Oracle RAC support in JobServer as this brings another level of enterprise fault tolerance into the JobServer Platform". JobServer tames your job processing and scheduling environment in a way that is a joy for Java developers to develop and customize upon while providing powerful management and administration features for business users and IT operations administrators.

Download and test drive JobServer 3.4.14 now and learn more about JobServer's powerful developer SDK, soafaces, that makes extending and customizing JobServer and developing custom jobs and backend automated services easier.

Grand Logic delivers software solutions that automate your business processes and tame your IT operations & Big Data analytics. Grand Logic delivers data and job automation software and Hadoop consulting services that maximize your Big Data investment.

Wednesday, September 26, 2012

BigQuery: Data Warehouse in the Clouds

There are a lot of changes occurring these days with the Big Data revolution such as cloud computing, NoSQL, Columnar stores, and virtualization just to mention a few of the fast moving technologies that are transforming how we manage our data and run our IT operations. Big Data, powered by technologies such as Hadoop and NoSQL, is changing how many enterprises manage their data warehousing and scale their analytics reporting. Storing terabytes of data, and even petabytes, is now in the reach of any enterprise that can afford to spend the money on potentially hundreds or thousands of commodity cores and disks to run parallel and distributed processing engines like MapReduce for instance. But is Hadoop the right fit for everyone? Are there alternatives, especially for those that want more real-time big data analytics? Read on.

A Little Background on Hadoop

With Hadoop and many related types of large distributed clustered systems, managing hundreds if not thousands of CPUs, cores and disks is a serious system administration challenge for any enterprise big or small. Cloud based Hadoop engines like Amazon EMR and Google Hadoop make this a little easier, but these cloud solutions are not ideal for typical long-running data analytics because of the time it takes to set up the virtual instances and spray the data out of S3 and into the virtual data nodes. And then you have to tear down everything after you are done with your MapReduce/HDFS instances to avoid paying big dollars for long running VMs. Not to mention you have to copy your data back out of HDFS and back into S3 before your ephemeral data nodes are shut down - not ideal for any serious Big Data analytics.

Then there is the fact that Hadoop and MapReduce are batch oriented and thus not ideal for real-time analytics. So while we have taken many steps forward in technology evolution, the system administration challenges in managing large Hadoop clusters, for example, are still a problem and cloud based Hadoop has many limitations and restrictions as already mentioned. In their current form, cloud based Hadoop solutions are too expensive for long running cluster processing and not ideal for long-term distributed data storage. Not to mention the fact that virtualization and Hadoop are not a great fit just yet given the current state of virtualization and public cloud hardware and software technology - this is a separate discussion.

The BigQuery Alternative

So if I want to build a serious enterprise scale Big Data Warehouse it sounds like I have to build it myself and manage it on my own. Now, enter into the picture Google BigQuery and Dremel. BigQuery is a serious game changer in a number of ways. First it truly pushes big data into the clouds and even more importantly it pushes the system administration of the cluster (basically a multi-tenant Google super cluster) into the clouds and leaves this type of admin work to people (like Google) that are very good at this sort of thing. Second it is truly multi-tenant from the ground up, so efficient utilization of system resources is greatly improved, something Hadoop is currently weak at.

Put your Data Warehouse in the Cloud

So now given all this, what if you could build your data warehouse and analytics engine in the clouds with BigQuery? BigQuery gives you massive data storage to house your data sets and a powerful SQL-like language called Dremel for building your analytics and reports. Think of BigQuery as one of your datamarts where you can store both fast and slow changing dimensions of your data warehouse in BigQuery's cloud storage tables. Then using Dremel you can build near real-time and complex analytical queries and run all this against terabytes of data. And all of this is available to you without buying or managing any Big Data hardware clusters!

Modeling Your Data

In a classical Data Warehouse (DW), you organize your schema around a set of fact tables and dimension tables using some sort of snowflake schema or perhaps a simplified star schema. This is what is typically done for RDBMS based data warehouses. But for anyone who has worked with HDFS, HBase and other columnar or NoSQL data stores, this relational model of a DW no longer applies. Modeling a DW in a NoSQL or columnar data store requires a different approach. And this is what is needed when modeling your DW in BigQuery's data tables.

Slow Changing Dimensions

Slow Changing Dimensions (SCD) are straightforward to implement with a BigQuery data warehouse, since in a SCD model you are typically inserting new records each time into your DW. SCD models are common when you are creating periodic fixed point in time snapshots from your operational data stores. For example, quarterly sales data is always inserted into the DW tables with some kind of time stamp or date dimension. With a BigQuery data store you would put each record into each BigQuery table with a date/time stamp. So your ETL would look something like this:

Nothing special here with this ETL diagram other than the data is moving from your enterprise to the Google Cloud. The output of the ETL is directed to BigQuery for storage in one or more BigQuery tables (note this can be staged via Google Cloud Storage). But now keep in mind that when creating a Big Data Warehouse, you are typically storing your data in a NoSQL, Columnar or HDFS type data store and thus you don't have a full RDBMS and all the related SQL join capability, so typically you must design your schemas to be much more denormalized than what is normally done in a DW. But BigQuery is a hybrid type data store so it does allow for joins and provides rich aggregate functions. How you model the time dimension is of particular importance - more on this later. So your schema for a SCD table might look something like this:

Key(s)... | Columns... | EffectiveDate

The time dimension in this case is directly collapsed into what would normally be your fact table and you would want, as much as possible, to denormalize the tables so your queries require minimal joins. As noted, Dremel allows for joins but requires that at least one of the tables in the join be "small", where small means less than 8MB of compressed data.

So now in Dremel's SQL language to select a specific record, for a particular point in time, you would simply perform a normal looking SQL statement such as this:

SELECT Column1 FROM MyTable WHERE EffectiveDate=DATE_OF_INTEREST

This query will select a record at a known date. With this approach, you can for example query for quarterly sales data where you know the records must exist for that particular date. But what if you want the most "current" record at any given point in time? This is actually something Dremel and BigQuery excel at, because they give you SQL functionality, such as subselects, that is not typically found in NoSQL type storage engines. The query would look like this:

SELECT Column1 FROM MyTable WHERE EffectiveDate = (SELECT MAX(EffectiveDate) FROM MyTable WHERE EffectiveDate <= DATE_OF_INTEREST)

This query can sometimes be considered bad practice in a standard RDBMS (especially for very large tables), because of performance considerations of the subselect. However, with Dremel, this is not a problem given the way Dremel queries scale out and the fact that they do not rely on indexes.
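To make the difference between the two lookups concrete, here is a minimal sketch that runs both against a tiny table. It uses an in-memory SQLite database purely as a stand-in for a BigQuery table, so it says nothing about Dremel's performance or exact SQL dialect. The table and column names follow the examples above, while the RecordKey column, the sample rows and the helper function names are assumptions made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for a BigQuery table. The sample rows are
# invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (RecordKey TEXT, Column1 TEXT, EffectiveDate TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        ("acct-1", "bronze", "2012-01-01"),
        ("acct-1", "silver", "2012-04-01"),
        ("acct-1", "gold", "2012-07-01"),
    ],
)


def value_on(date_of_interest):
    """Exact match: returns a value only if a record exists for that date."""
    row = conn.execute(
        "SELECT Column1 FROM MyTable WHERE EffectiveDate = ?",
        (date_of_interest,),
    ).fetchone()
    return row[0] if row else None


def value_as_of(date_of_interest):
    """Most current record on or before the given date.

    The subselect finds the latest EffectiveDate that is not after the
    date of interest; the outer query fetches the record for that date.
    """
    row = conn.execute(
        """
        SELECT Column1 FROM MyTable
        WHERE EffectiveDate = (
            SELECT MAX(EffectiveDate) FROM MyTable
            WHERE EffectiveDate <= ?
        )
        """,
        (date_of_interest,),
    ).fetchone()
    return row[0] if row else None
```

Dates are stored as ISO strings here so that plain string comparison orders them correctly. With more than one key in the table, the subselect would also need to be restricted to the key being looked up.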

Fast Changing Dimensions

Fast Changing Dimensions (FCD) require a bit more effort to create in a typical DW, and this is no different with BigQuery. In a FCD, you are often capturing frequent or near real-time changes from your operational data stores and through your ETL moving the new data into your DW. Your ETL engine must normally pay mind to when to insert a new fact or time dimension record and it often involves "terminating" the previously current record in the lineage of a record history set. But by leveraging the power of Dremel, FCD can be supported in BigQuery by just inserting a new record when the on-premises ETL engine detects a change, without terminating existing current records. And because you can perform the effective date based subselect, noted above, there is now no reason to maintain both effective and termination date fields for each record. You only need the effective date.

This makes the FCD schema model, stored in BigQuery, identical to the SCD model for managing the time dimension; however, there is a catch. The ETL process must maintain a "Staging DW" of the records that exist on the BigQuery side. This Staging DW only holds the most current version of each record in your BigQuery table, so it stays lean and does not grow with every change.
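The change detection against the Staging DW can be sketched roughly as follows. This is only an illustration under simple assumptions: the staging store is modeled as a plain in-memory dictionary keyed by record key, records are flat dictionaries of column values, and the function name detect_changes is hypothetical and not part of JobServer or any BigQuery API.

```python
from datetime import date


def detect_changes(staging, incoming, effective_date):
    """Return the rows to append to the cloud table, updating staging in place.

    staging:  dict mapping record key -> most current column values
    incoming: dict mapping record key -> latest values from the source system
    """
    to_send = []
    for key, values in incoming.items():
        if staging.get(key) != values:
            # New or changed record: append a new version stamped with the
            # effective date. The previous version is never terminated.
            row = {"key": key, **values,
                   "EffectiveDate": effective_date.isoformat()}
            to_send.append(row)
            # Staging keeps only the most current version of each record.
            staging[key] = dict(values)
    return to_send
```

Calling this twice with the same incoming data sends nothing the second time, and staging holds one entry per key no matter how many versions have been appended on the cloud side.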

So with this model your ETL will only send changes to the Google Cloud. This overall approach for FCD is useful for modeling ERP type data, for example, where records have effective and termination dates and where tracking changes is critical. Here is a diagram of the FCD ETL flow:

Note, for the case of a FCD model that is not ERP centric (the data model does not depend on effective/termination date semantics), the Staging DW will not be required. This is typically the case when you are just blasting high volume loosely structured data into BigQuery, such as log events or other timestamped action/event data. In this case, you don't have to detect changes and can just send the data to BigQuery for storage as it comes in.

Put your Data Warehouse in the Cloud

At Grand Logic we offer a powerful new way to build and augment your internal data warehouse with a BigQuery datamart in the Google cloud. Leveraging our real-time and batch capable ETL engines we can move your fast or slow moving dimensional data into unlimited capacity BigQuery tables and allow you to run real-time SQL Dremel queries for rich reporting that will scale. And do all this with little upfront costs and infrastructure compared to managing your own HDFS and HBase cluster in Hadoop, for example.

With our flagship automation engine and ETL engine, JobServer, we can help you build a powerful data warehouse in the Google cloud with rich analytics with little upfront investment that will scale to massive levels. Pay as you go with full control over your data and your reporting.

Stay tuned to this blog for more details on how Grand Logic can help you build your Data Warehouse in the clouds. We will be discussing more details of our JobServer product and how our consulting services can get you going with BigQuery.

Contact us to learn how our JobServer product can help you scale your ETL and Data Warehousing into the cloud.

Tuesday, September 18, 2012

The Big Data Evolution Will Continue - No Kidding

Big Data is very much about discovering information locked in your mountains of data that come out of your production center, IT operations, enterprise systems, and back office databases. Information is all in the eye of the beholder so one person's junk is another person's gold. These days with the volumes of social data and device data growing at astronomical levels there is a lot of data to sift through and make sense out of.

While it is true that the more data you can capture the more possible information there is to discover, there is a limit to this. I think we are going through a cycle where capturing and trying to make sense out of vast volumes of data (social data, sensor data....etc) is becoming more economical and somewhat mainstream with respect to technology and tools. However, this is cyclical, I believe; at some point businesses will realize that maybe they are getting diminishing returns on all this data they are capturing and storing. For example, do I really care what I tweeted 20 years ago (20 years from now)? I probably will never have the time to go back and look at it and I am not sure it is valuable to any marketing person (but who knows).

There is definitely gold to be mined in many data sets that now go untapped, and technologies like Hadoop, BigQuery and Storm, to name a few, are good tools to use, but not everything fits into the Big Data tent either.

There has been a lot of hype around Big Data these days and I see a lot of people trying to shoehorn problems into Hadoop that really have no reason to be there, other than it being the cool thing to do. You could do the data crunching in easier ways, for example. However, the tool sets are expanding to give developers, scientists and business people more options when deciding how to store and analyze their data.

When thinking of Big Data first ask yourself the following questions:

1) How much data do I want to capture and store (do I need to persist detailed records/data)?
2) How fast is this data being created (velocity)?
3) How long do I want to keep it (forever)?
4) How long am I willing to wait to get "information" when I run my analysis (batch/hourly/daily or real-time)?
5) What will it cost me to keep all this data around and do I have the system admin muscle to do this?

This might help you determine in which of the particular emerging Big Data technology buckets your problem best fits and which approach to take (cloud cluster, on-premises cluster...etc).
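The first three questions combine into a quick back-of-the-envelope storage estimate. The sketch below is only that: the function name and the example figures are assumptions picked for illustration, and it ignores compression, replication and indexing overhead.

```python
def estimate_storage_gb(records_per_day, bytes_per_record, retention_days):
    """Raw storage (in GB) needed to keep every record for the retention window.

    Ignores compression and replication, so treat the result as a rough
    starting point rather than a capacity plan.
    """
    total_bytes = records_per_day * bytes_per_record * retention_days
    return total_bytes / 1e9


# Example: one million 500-byte events per day, kept for one year.
one_year_gb = estimate_storage_gb(1_000_000, 500, 365)
```

Doubling the retention period or the event rate doubles the estimate, which is a useful sanity check before deciding between a cloud cluster and an on-premises cluster.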

Sunday, July 8, 2012

Big Data Automation in the Cloud

Grand Logic is happy to announce expanded support for cloud analytics and big data automation services through our flagship product, JobServer. With JobServer, enterprises of all sizes, from startups to Fortune 100 companies can leverage the power of the cloud to tap the full potential of cloud based Big Data computing and analytics processing.

With solutions such as Amazon EMR and Google BigQuery growing in adoption and becoming economically advantageous, businesses now more than ever need to automate the flow of data between their enterprise storage systems and the cloud. Moving data and information between corporate intranets and the cloud is vital for efficient cloud based Big Data processing.

JobServer's point and click automation and scheduling tools are ideal for centrally managing the flow of data between your Big Data cloud systems such as Amazon EMR and Google BigQuery. JobServer can orchestrate the loading and retrieval of data between your Big Data processing systems in the cloud while tracking all your Big Data processing jobs to give you one place to see everything that is happening in your Hadoop or BigQuery analytics processing.

In a typical deployment, JobServer sits on your corporate intranet and can load and move data from your in-house storage systems into the cloud for efficient processing, then track all Big Data job processing activity to return the necessary critical data and results back in-house or to move it around in the cloud (for example, move data into and out of S3...etc). Alternatively, JobServer can also be easily deployed on Amazon EC2 or Google Compute Engine instances and run in the cloud. There are multiple topologies possible based on your business operations.

JobServer comes with a built-in and open source plugin API that makes it easy to script and customize Amazon or Google web services APIs and create custom tasks and jobs using Java, web services, GWT and python/ruby/bash scripts. For example, you can create complex map reduce jobs in JobServer and get notified when processing is completed and be alerted of any issues at every stage of processing. JobServer also lets you schedule and track detailed realtime and historical reports on all job processing activities whether you are running a Hadoop job, loading a table into the cloud, pulling data back out of BigQuery temp tables, or tracking the progress of BigQuery batch processing jobs.

JobServer gives you central control over any automation task you want to perform in the cloud or between activities happening in the cloud and your local enterprise storage and database systems. Try JobServer today and you will wonder how you operated without it.

About Grand Logic
Grand Logic delivers software solutions that automate business processes and tame your Big Data operations. Grand Logic delivers automation software and Hadoop consulting services that maximize your Big Data investment.

Friday, February 17, 2012

JobServer Support on Mac OS X

Grand Logic is happy to announce the release of JobServer 3.4.4. For all those Apple fans, this release provides support for JobServer on Mac OS X. You can now install and deploy JobServer on your favorite Mac. This release includes minor bug fixes.

Download and test drive JobServer 3.4.4 now and learn more about JobServer's powerful developer SDK, soafaces, that makes extending and customizing JobServer and developing custom jobs and backend automated services easier, while using some of the best Java/AJAX and web/SOA open source technology available to developers.

About Grand Logic
Grand Logic is dedicated to delivering software solutions to its customers that help them automate their business and manage their processes. Grand Logic delivers automation software and specializes in mobile and web products and solutions that streamline business.

Tuesday, February 14, 2012

Enterprise Job Scheduling for Big Data & Hadoop

Businesses of all sizes are looking beyond traditional business intelligence, taking a broader approach to BI that goes beyond the traditional data warehouse and operational database technologies of the past. With the explosion of social communication, mobile device data and many other forms of unstructured data coming into focus, businesses are now more interested than ever in asking questions about their data and their customers that they could not ask before.

Hadoop type solutions let businesses build out this new BI 2.0 type architecture and begin to leverage their data and operations in new ways in order to ask questions that they could not have imagined possible in the past. Hadoop analytics lets businesses ask questions and build reporting solutions that effectively leverage massive (yet commodity) processing power and manipulate terabytes of data in ways that were not practical for the average enterprise before.

Hadoop provides a broad stack of solutions from cpu/compute clustering, parallel programming, distributed data management, advanced ETL and NoSQL type data management....etc. Hadoop is also moving quickly to build more advanced resource management to allow more efficient job flow processing on larger clusters for the bigger deployments that may have hundreds or thousands of nodes and need to run many jobs concurrently.

Hadoop comes with a few internal capacity type schedulers for managing internal cluster load and resource management, but these are strictly for internal cluster capacity scheduling between nodes and are not functional or calendar based job scheduling tools. Vanilla Hadoop distributions do not include often necessary features required by enterprises to manage and automate the full ecosystem and life-cycle of data processing typically needed by an enterprise to effectively support an end to end BI solution. In most cases an enterprise's IT group must build the necessary infrastructure to smoothly integrate Hadoop into their IT environment and avoid a lot of manual labor and impedance mismatches between their Hadoop operations and their traditional enterprise operations.

This is where JobServer, an enterprise job scheduler, comes into play. JobServer integrates with Hadoop at an enterprise IT level, letting analysts and IT administrators schedule and integrate their IT operations into the Hadoop stack. JobServer leverages a very open and flexible Java plugin API to let Java developers integrate their customizations tightly into JobServer and into Hadoop. Oftentimes what is needed is high level job and workflow automation in order to schedule ETL processing from operational data stores, pump data into your Hadoop stack, and schedule jobs to run at regular intervals based on business rules and business needs.

JobServer provides the job automation and job scheduling needed to accomplish this, plus it offers key features such as audit trails to track what jobs were run, when, and edited by whom. JobServer, for example, can be used to coordinate and orchestrate a number of Hadoop job flows together into a larger job flow and then take the output and pump it back out into your enterprise reporting systems and enterprise data warehouses. JobServer provides a number of GUI reporting features to let enterprise users, from programmers to IT staff, track what is going on in your Hadoop and IT environment and be alerted quickly of problems.

If you need to tame your Hadoop operations and provide automated and tight integration with your existing IT environment, applications and reporting solutions, give JobServer a look. It can be a great asset to help you run your Big Data operations more efficiently. Visit the JobServer product website for more details.

Contact Grand Logic and see how we can help you make better sense of your Big Data environment. JobServer is also partnering with other Big Data solution providers and major distributions to provide a complete Big Data solution for both your in-house and cloud Hadoop deployments. Please contact Grand Logic for more information to see how our products and services can make your Hadoop deployment a success.

Tuesday, February 7, 2012

Native Multi-Tenant Hadoop - Big Data 2.0

For Hadoop to gain wider adoption and lower the barrier of entry to a broader audience it must become much more economical for businesses of all sizes to manage and operate a Hadoop processing cluster. Right now it takes a significant upfront investment in hardware and IT knowhow to provision the hardware and the necessary IT admin skills to configure and manage a full blown Hadoop cluster for any significant operation.

Cloud services like Amazon Elastic Map Reduce help reduce some of this but they can quickly become costly if you need to do seriously heavy processing and especially if you need to manage data in HDFS as opposed to constantly moving it between your HDFS cluster and S3 in order to shut down datanodes to save cost, as is the standard with Amazon EMR. Utilities like Whirr also help push the infrastructure management onto the EC2 cloud but again, for serious data processing, this can quickly become cost prohibitive.

Operating short lived Hadoop clusters can be a useful option, but many organizations need long running processing and need to leverage HDFS for longer-term persistence as opposed to just a transient storage engine during the lifespan of MapReduce processing, as is the case with Amazon EMR. For Hadoop, and Big Data in general, to make the next evolutionary leap for the broader business world, we need a fully secure and multi-tenant Hadoop platform. In such a multi-tenant environment organizations can share clusters securely and manage the processing load in very controllable ways, while each tenant can customize their Hadoop job flows and code in an isolated manner.

Hadoop already has various capacity management scheduling algorithms but what is needed is higher order resource management that can fully isolate different organizations from each other for HDFS security and data processing purposes to support true multi-tenant capability. This will drive wider adoption within large organizations and by infrastructure services providers because it will increase the efficient utilization of unused CPU and storage in just the same way that SaaS has allowed software to achieve greater economies of scale and democratize software for small and big organizations alike.

Native multi-tenant support in Hadoop will drastically reduce the upfront cost of rolling out a Hadoop environment, make the long-term costs much more manageable, and open the door for Hadoop and Big Data solutions to go mainstream in much the same way that Salesforce, for example, has created a rich ecosystem of solutions around business applications and CRM. This will also allow organizations to keep long-running environments and keep their data in HDFS for longer periods of time, allowing them to be more creative and spontaneous.

Thursday, January 12, 2012

End to End Big Data Solution

Grand Logic announces an end to end Big Data solution. Our flagship product, JobServer, and its supporting open source SDKs provide a superior platform for taking your raw data and creating business solutions that will drive ROI and deliver on the promise of Hadoop.

Hadoop is a great solution, but alone it is an island of data processing, algorithms and open source tools. JobServer integrates Hadoop into your enterprise to automate the flow of data and manage ETL processing to efficiently organize and track your Hadoop processing. Then it delivers rich visualization for your Hadoop results to allow you to maximize your business objectives with Big Data. Whether you are targeting mobile, tablets or desktop/web devices, JobServer's powerful GWT based SDK can deliver a rich user experience and visualization for your reports and applications.

All this allows you to manage, monitor and track your Hadoop processing to deliver the control and central management you need to empower your developers and business analysts. JobServer with Hadoop allows you to acquire your data, process it and then visualize it. See this architecture diagram of our end to end JobServer/Hadoop solution stack.

Contact Grand Logic and see how we can help you make better sense of your Big Data environment. JobServer is also partnering with other Big Data solution providers and major distributions to provide a complete Big Data solution for both your in-house and cloud Hadoop deployments. Please contact Grand Logic for more information to see how our products and services can make your Hadoop deployment a success.

Friday, December 30, 2011

Big Data Predictions for 2012

Well, 2011 has been a great year for Hadoop and its supporting ecosystem. There is a growing base of sub projects evolving to fill the many niches in and around Hadoop and there are companies coming out of the woodwork to claim their piece of the pie. Not to mention the VC money pouring into Big Data related startups and many established tech players changing their business plans to account for Hadoop. So what can we expect in 2012?

Here are seven predictions for what might be in store for Big Data in 2012:

1) Going Mainstream
Discovering all of what you can do with Big Data analytics in the enterprise is only in its infancy. Right now solutions like Hadoop are the secret weapon of the rich and social who can afford the investment in time, resources and infrastructure. Companies like Facebook and Twitter are using solutions like Hadoop to do things not possible before with traditional relational BI and analytics solutions. We will see in 2012 the window widening with more traditional enterprises seeing the potential benefits that Hadoop analytics can offer. We will see more companies in various industries look to leverage Hadoop to ask questions about their operations and customers not possible before. Look for Hadoop to go more mainstream and lose some of that exoticness that currently relegates it only to the big boys.

2) Put it in the Cloud
The barrier of entry is lowering for Hadoop with players like Amazon offering low cost of entry platforms for initial Hadoop deployments. In 2012 we will see continued acceptance of using the Cloud as the infrastructure of choice for deploying your Hadoop. With Amazon and others improving virtual private network services it will make integrating private Cloud solutions for Hadoop more palatable for security conscious enterprises. Cloud will be the target platform of choice for Hadoop in 2012. This will also open the door for smaller enterprises to dip their toe into Hadoop to discover what they have been missing in their volumes of consumer and operational data warehouses.

3) Automation and Integration
Right now most Hadoop deployments are islands of data and processing infrastructure. In the coming year we will see more tech companies begin to offer better tools to enable businesses to tie their back office data stores and data warehouses with their Hadoop environments in a more seamless fashion. Efficiently moving customer and business data out of traditional data stores such as relational databases, and processing and preparing it for Hadoop consumption, will be critical for successful Hadoop deployments. We anticipate a new category of ETL that will be focused on the management of data movement in and out of Hadoop and HDFS. This will gain more traction in 2012. There are already Hadoop projects focusing on related areas and we will see more Hadoop type connectors popping up from traditional software vendors eager to get their products integrated with Hadoop.

4) Analytics and Visualization
Traditional BI reporting tools are not geared well toward the type of output generated from Big Data type environments. A new breed of reporting tools and analytics solutions will emerge to better consume the output coming out of Big Data systems. Look for many traditional BI vendors to begin to tailor their front-end reporting solutions to fit with Hadoop and distributed data stores including NoSQL type data stores. But much of what traditional BI vendors will offer will not be a natural fit since most of the BI vendors and their tools are more comfortable dealing with highly structured data. Also, as business analysts in companies start to get a taste of the kind of problems that can now be solved with Big Data, that were not possible before, they will begin to think of new questions to ask that will drive the need for more visualization and reporting of the data coming out of Big Data. So keep an eye out for startups and tech companies offering Big Data native analytics solutions tailored from the ground up for visualizing the statistical kinds of data coming out of Hadoop. Turning statistical questions, common when dealing with Big Data, into visual reports that can be understood by business users will be a big leap forward in turning the raw data in many enterprises into meaningful value and actionable results.

5) Going Mobile
In 2012 we will see apps and solutions that allow business users to get a glimpse of their Hadoop operations and the resulting output on mobile and tablet devices. This one is not a big stretch considering the growing proliferation of mobile computing. But look for Hadoop to get a bit more mobile in 2012. Visualization of BI on mobile is a natural trend and Big Data is no exception.

6) Going Vertical and Healthcare
Healthcare is the perennial elephant in the room when it comes to needing operational efficiency improvements and managing an exploding volume of patient data (not to mention making sense of that data). From both the billing dimension and the diagnostic patient data aspect, healthcare will benefit greatly from the type of problems that Big Data can solve. In 2012 we will see healthcare providers and healthcare IT companies begin to seriously invest in Big Data to help them solve problems not possible before with traditional healthcare IT. Look for healthcare providers to tap Hadoop to better understand their patients, to deal with the volumes of digital patient data, and to help them deal with government regulations and compliance.

7) Real-Time Big Data?
This might be a stretch, but look for some early signs of various tech players looking to deliver more real-time business solutions around Big Data. Hadoop brings tremendous processing power to bear to solve problems that were not practical before. With computing power growing and virtualization easier to manage and deploy, look for business users to demand that Big Data type problems be solved in closer to real time. This will open the door for even more interesting applications of Big Data for business and even end consumers.

Let's regroup in twelve months and see how well these predictions panned out :)

Wednesday, December 21, 2011

Big Data is more than Map/Reduce

Companies of all sizes are looking for ways to make sense of their unstructured data. Data is growing at tremendous volumes and keeping track of it is becoming more challenging and expensive. Enter solutions such as Hadoop that allow you to make sense of this data using a highly distributed architecture that is based on horizontal scaling among other things.

Products like Hadoop are a critical layer of the solution, but they solve only one part of the overall tapestry needed to make sense of your Big Data. There are two key areas that are critical to your overall Big Data solution.

Imagine your Hadoop environment sitting in the center. Data must obviously be fed into it and data will flow out. An important aspect of a successful Hadoop deployment is managing these input and output points. Efficient management of your data as it comes in raw and then leaves as much easier-to-consume information is key to having a successful Big Data environment.

Data In - Preparing your Big Data
First, you need to prepare your data for processing by Hadoop. Unstructured data must be prepared and loaded and sometimes integrated with relational data from enterprise database sources, for example. All this must be automated and fed into your Hadoop environment. This is a critical step especially as companies get into more real-time Big Data processing. The need for real-time analysis is becoming more critical with the explosion of social information and online commerce.

This requires automation to feed and prepare your Hadoop environment, minimizing manual labor and the many potentially error-prone steps. Having this fully automated with as little human intervention as possible is critical. Solutions such as JobServer make this automation much more manageable. JobServer is ideal for integrating with your back office databases and can meld well with your IT environment using solutions such as SOA, Mule and ETL, to name just a few examples. JobServer can be used to centralize the logic and management for this preparation work so that steps like loading data into your HDFS are all automated and easy to monitor and track.

Data Out - Data Analytics
As large amounts of information are extracted out of solutions like Hadoop, visualizing the results is critical. Here too is where JobServer and its soafaces developer API can step in to address this challenge. soafaces is based on an open source API that allows for building rich reporting and analytic solutions that can capture data as it is coming out of your Hadoop environment and visualize it in an easy to manage way. soafaces is based on the Google GWT framework for GUI development and can support web-based reporting and graphing technologies to provide rich visualization of your results. JobServer also can be leveraged here to show, in a very organized manner, the results of each Hadoop job run, and can be used to provide access to the final results to the right people in your organization through web reports, spreadsheets, email alerts, etc.

Contact Grand Logic and see how we can help you make better sense of your Big Data environment. JobServer is also partnering with other Big Data solution providers and major distributions to provide a complete Big Data solution for both your in-house and cloud Hadoop deployments.

Please contact Grand Logic for more information.

Wednesday, December 7, 2011

Tame your Hadoop - Hadoop Professional Services

Grand Logic is pleased to announce our expanded consulting services specializing in Hadoop solutions. Hadoop is quickly becoming the tool of choice for Big Data analytics and our expertise with Hadoop and applying our tools like JobServer (job workflow/scheduling engine) and soafaces (open source framework) enable us to build complete enterprise solutions around Hadoop for our customers.

Hadoop comes with many great supporting modules but needs additional tools and features to make it easy to manage and organize all the activity and content around your Hadoop operations. With our consulting services we can quickly come in and build a management and integration layer to help you automate and manage your Hadoop deployment. This will allow you to tie your back office data and IT systems to your Hadoop operations, automating and streamlining data movement between your business data and your Hadoop analytics. This is vital to having a successful ROI for Hadoop. If you can't efficiently feed data into Hadoop and extract it (and visualize it), your Hadoop number crunching will be for naught. With JobServer as part of your Hadoop environment and our expertise and professional services you will be able to:

Effectively build custom ETL processing between your back office data and your Hadoop data stores
Build, package and reuse custom logic and server-side tasks for managing and editing your Hadoop jobs
Compose complex Hadoop workflows from multiple simpler Hadoop jobs (and non-Hadoop jobs) to build support for rich scenarios where data is moved between HDFS and local storage and between multiple Hadoop jobs.
Integrate easily with modules like Cascading and Pig.
Apply security on a job-by-job basis to restrict all aspects of job configuration, monitoring, reporting and execution on a user-by-user basis.
Application permissions - control what tools are available to which users.
Detailed alerting (via email or sms) to report on status and failures by jobs and by job groups.
Organize jobs into groups and partitions for user organization and resource management.
Powerful job scheduling - any scheduling pattern you can think of for running your Hadoop jobs can be built by our consulting team.
Easily create custom reports per Hadoop job. Using the soafaces framework we can build custom reports that let you view the status and results of each of your Hadoop jobs.
We are experts with soafaces and GWT and can build rich visualization for your Hadoop generated data. And we can help you visualize this on web, mobile and tablet devices to develop rich analytics that can be consumed by decision makers in your company.

So if you want to tame your Hadoop environment, Grand Logic has the technical expertise and the tool chest of frameworks to get you going and organized. Contact us today to see what we can do for you.

Thursday, December 1, 2011

Hadoop Meets JobServer - Big Data Job Scheduling

Grand Logic is pleased to announce the release of JobServer 3.4. This release delivers support for Hadoop allowing Hadoop customers to use JobServer as their central access point to schedule, track and report on their Hadoop jobs and environment.

Hadoop is quickly becoming the tool of choice for open source Big Data Analytics and related computing. The JobServer team saw a great opportunity to extend JobServer's awesome job processing, scheduling and reporting capabilities to make the lives of Hadoop users better.

Hadoop and JobServer are a natural fit. JobServer offers, out of the box, extensive customization and extensibility through its soafaces developer API. This allows developers to quickly build simple or complex server-side Java jobs and manage them from a central location. Extending JobServer to schedule, manage, monitor and track Hadoop jobs was a logical next step.

With the 3.4 release, JobServer comes bundled with standard components that let Java developers bundle their Hadoop jobs into the JobServer environment. This allows Hadoop users to launch and monitor all their Hadoop jobs from one location and track all their Hadoop processing activity from one place using JobServer's web based monitoring, reporting and tracking tools. JobServer runs alongside your Hadoop environment and provides both developers and administrators with the ability to build, orchestrate, schedule and deploy Hadoop jobs from one central location.

Use JobServer to build, schedule and track a single Hadoop task or to compose multiple Hadoop tasks into more complex jobs and workflows. For example, a job can be composed of multiple Hadoop tasks where one task directs its output to the input of other Hadoop tasks in the job chain. JobServer makes it easy to centrally schedule, track and monitor all your Hadoop tasks and jobs from one place for real-time monitoring as well as for historical reporting and auditing. JobServer's workflow orchestration allows users to manage and manipulate files and content between your local storage system and your Hadoop HDFS. You can also build custom web GUIs (using GWT) for your Hadoop tasks to visualize all input and output content and show real-time status as the tasks are running or to report on final status and results after a Hadoop task has finished processing.
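
As a rough illustration of the chaining idea above, the sketch below wires a list of named steps so that each step's output directory becomes the next step's input directory. The ChainedStep and JobChainPlanner classes, the directory naming scheme and the plan method are assumptions made for this example. They are not part of the JobServer or Hadoop APIs, and no Hadoop job is actually launched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types come from JobServer or Hadoop.
class ChainedStep {
    final String name;
    final String inputDir;
    final String outputDir;

    ChainedStep(String name, String inputDir, String outputDir) {
        this.name = name;
        this.inputDir = inputDir;
        this.outputDir = outputDir;
    }
}

class JobChainPlanner {
    // Wire the steps so that each one reads what the previous one wrote.
    static List<ChainedStep> plan(String firstInput, String workDir, List<String> stepNames) {
        List<ChainedStep> steps = new ArrayList<>();
        String input = firstInput;
        for (String name : stepNames) {
            String output = workDir + "/" + name + "-out";
            steps.add(new ChainedStep(name, input, output));
            input = output; // the next step consumes this step's output
        }
        return steps;
    }

    public static void main(String[] args) {
        List<ChainedStep> chain = plan("/data/raw", "/data/work",
                Arrays.asList("clean", "aggregate", "report"));
        for (ChainedStep s : chain) {
            System.out.println(s.name + ": " + s.inputDir + " -> " + s.outputDir);
        }
    }
}
```

Keeping the wiring in one place means a step can be inserted or removed without editing the paths of its neighbors.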

Start using JobServer today with your Hadoop environment and see all the benefits for yourself. Download and test drive JobServer 3.4 now and improve your Hadoop experience!

About Grand Logic
Grand Logic is dedicated to delivering software solutions to its customers that help them automate their business and manage their processes. Grand Logic delivers automation software and specializes in mobile and web products and solutions that streamline business.

Tuesday, September 6, 2011

JobServer 3.2.24 - Advanced scheduling and productivity features

Release 3.2.24 introduces many new features and productivity improvements including advanced scheduling enhancements and GUI based import and export capabilities. JobServer gives you the tools to develop and deploy your jobs and backend services efficiently and reliably, with as little hassle as possible.

The new scheduling features in this release give greater flexibility in scheduling daily, weekly, monthly and interval pattern jobs. You can now schedule jobs at multiple recurring times of the day and vary those recurring times by day of the week. Other enhancements in this release include advanced GUI based import and export tools that allow administrators and developers to copy jobs between Development, QA and Production environments.
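
To make the "different times on different days" pattern concrete, here is a minimal sketch built only on the standard java.time classes. The WeeklySchedule class and its on and nextRun methods are hypothetical names invented for this example. They do not reflect how JobServer stores or evaluates its schedules.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: this is not JobServer's scheduling API.
class WeeklySchedule {
    private final Map<DayOfWeek, List<LocalTime>> timesByDay = new EnumMap<>(DayOfWeek.class);

    // Register the run times for one day of the week.
    WeeklySchedule on(DayOfWeek day, LocalTime... times) {
        LocalTime[] copy = times.clone();
        Arrays.sort(copy); // keep each day's times in chronological order
        timesByDay.put(day, Arrays.asList(copy));
        return this;
    }

    // First run time strictly after 'now', or null if nothing is scheduled.
    LocalDateTime nextRun(LocalDateTime now) {
        // Offset 7 covers the case where today's last run has already passed
        // and the same weekday next week is the next match.
        for (int offset = 0; offset <= 7; offset++) {
            LocalDateTime day = now.plusDays(offset);
            List<LocalTime> times = timesByDay.get(day.getDayOfWeek());
            if (times == null) {
                continue;
            }
            for (LocalTime t : times) {
                LocalDateTime candidate = day.toLocalDate().atTime(t);
                if (candidate.isAfter(now)) {
                    return candidate;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        WeeklySchedule schedule = new WeeklySchedule()
                .on(DayOfWeek.MONDAY, LocalTime.of(9, 0), LocalTime.of(17, 0))
                .on(DayOfWeek.SATURDAY, LocalTime.of(12, 0));
        System.out.println(schedule.nextRun(LocalDateTime.now()));
    }
}
```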

Also included with this release are a number of other minor enhancements and bug fixes including improved reporting and UI usability features. Check release notes for details.

Download and test drive JobServer 3.2.24 now and learn more about JobServer's powerful developer SDK, SOAFaces, that makes extending and customizing JobServer and developing custom jobs and backend automated services easier, while using some of the best Java/AJAX and web/SOA open source technology available to developers.

Sunday, January 16, 2011

JobServer 3.2 - Easier to Deploy and Manage

Grand Logic is pleased to announce the release of JobServer 3.2. This release introduces new capabilities that enable end users to deploy and manage multiple JobServer instances more easily and move jobs and applications between multiple environments (such as Development, QA and Production) using improved import and export capability.

JobServer's super scalable, threaded and distributed architecture delivers powerful job scheduling and job processing. JobServer can be deployed on one machine or on a grid of machines to power your job processing/scheduling and SOA environment. Now, with release 3.2, jobs and applications can be moved between multiple environments more easily. Developers can build and test jobs in their development environment, then export/import them to QA environments for testing, and finally export/import them to the final Production system.

With release 3.2, you can import and export jobs using command line tools or from the JobServer GUI administration panel. JobServer gives you the tools to develop and deploy your jobs and SOA apps efficiently and reliably, with as little hassle as possible.

Included with this release are a number of other minor enhancements and bug fixes including an improved upgrade process that allows easier upgrade of existing JobServer installations with the latest patches. Check release notes for details.

Download and test drive JobServer 3.2 now and learn more about JobServer's powerful developer SDK, SOAFaces, that makes extending and customizing JobServer and developing custom jobs and SOA applications easier, while using some of the best Java/AJAX and web/SOA open source technology available to developers.

Thursday, October 14, 2010

JobServer in the Clouds

Grand Logic is pleased to announce the release of JobServer Cloud edition. JobServer Cloud allows our customers to access and use the powerful job scheduling, job processing, workflow and SOA/Mule messaging features available in JobServer from a hosted on-demand environment. JobServer now supports deployments via a SaaS model, allowing our customers to lower their IT costs and free themselves to focus on their core business.

JobServer Cloud delivers all the same great features and capabilities found in JobServer and can now be hosted in the clouds. This frees customers from the IT burden of buying and maintaining hardware and installing and managing their own IT environment for JobServer. With the cloud edition of JobServer, you are freed from dealing with upgrades, maintenance tasks and hardware issues. We can add more job processing capacity, quickly, as you need it. You just need to focus on building and deploying your jobs and SOA applications and related services. We will ensure that your JobServer environment is well maintained, managed, backed up and running efficiently. We will alert you if we notice performance issues or slow running jobs and processes, or if more capacity is needed.

No hardware or software to install! With JobServer Cloud, your environment will be deployed and run from the Amazon cloud with secure connectivity to your private network using Amazon VPC (Virtual Private Cloud). You can access your JobServer instance securely to run and manage your jobs and apps with full access to your private corporate network. Many customers also need their JobServer environment to connect and access local IT systems and services within their private corporate network. With Amazon's VPC (Virtual Private Cloud) we can securely bridge between a company’s existing IT infrastructure and your JobServer Cloud environment. Amazon VPC enables enterprises to connect their existing infrastructure to a set of isolated JobServer compute resources via a Virtual Private Network (VPN) connection, and to extend their existing management capabilities such as security services, firewalls, and intrusion detection systems to include their JobServer resources that are in the Amazon cloud.

We are looking forward to all the great benefits that JobServer Cloud offers to our customers. JobServer Cloud is a natural evolution for JobServer given the web based nature and highly modular SDK extensibility of JobServer. This takes the JobServer solution to the next evolution in software which is cloud computing and virtualization. The Cloud edition lowers costs for customers and allows them to focus on their vertical business needs. We are looking forward to a great next chapter in the evolution of JobServer in the clouds!

We are also very excited about all the great benefits of Amazon's cloud computing environment such as EC2, S3, RDS, EBS, VPC and other services. Amazon is a proven provider of virtualization and cloud computing services and offers enterprise security and mission critical reliability. Amazon and JobServer Cloud edition are a great combination.

Thursday, September 9, 2010

Grand Logic Partners with Mobile Innovator

Grand Logic is pleased to announce a partnership with Exalt Technologies that will drive mobile innovation. "Through our partnership with Exalt, we hope to bring great mobile enabled software solutions to our customers and into our products. Exalt is a leader in developing great user experience and rich mobile client technology. They will be a great asset and technology partner."

Exalt specializes in innovative solutions for mobile devices and web applications and is a proven leader in delivering unparalleled mobile client technology combined with thoughtful user centered design. "Exalt is happy to be working with Grand Logic and their customers to combine our mobile client expertise and UI design with Grand Logic's proven business automation software and specialized web 2.0 solutions."

About Exalt
Exalt is a proven technology innovator taking business mobile through the application of cutting edge mobile technology and superior UI design. Exalt specializes in web and mobile solutions leveraging a highly skilled team of Software Engineers, Information Architects and UI Designers.

JobServer 3.0 Beta Now Available

Grand Logic is pleased to announce the release of JobServer 3.0. This release introduces key architectural enhancements that enable distributing job processing and clustering within JobServer. Job processing can now be distributed across a network of computers allowing for high-availability and virtually unlimited job processing scalability.

Combining JobServer's super scalable scheduling engine with its new scalable distributed job processing architecture makes JobServer the ideal platform for managing and automating IT environments of any size. From large banks, to EDI processors and SaaS startups, JobServer is a solution that will allow you to automate and manage your IT environment and back-office functions.

With this release, JobServer is now scalable and highly available at all levels, from job scheduling, to job processing to web GUI applications. The 3.0 scalability enhancements allow for dynamically adding new computers (called Agents) to the JobServer environment/pool to incrementally increase job processing capacity.
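
The idea of growing capacity by adding Agents to a pool can be sketched as follows. The post does not describe how JobServer actually picks an Agent for a job, so the round-robin policy here is purely an assumption, and the AgentPool class with its addAgent and assign methods is hypothetical rather than taken from the JobServer SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not JobServer's agent API. Round-robin is an assumed policy.
class AgentPool {
    private final List<String> agents = new ArrayList<>();
    private int next = 0;

    // Agents can join at any time; later jobs start using them immediately.
    synchronized void addAgent(String host) {
        agents.add(host);
    }

    // Pick the agent for the next job, rotating through the current pool.
    synchronized String assign(String jobName) {
        if (agents.isEmpty()) {
            throw new IllegalStateException("no agents registered for " + jobName);
        }
        String agent = agents.get(next % agents.size());
        next++;
        return agent;
    }

    public static void main(String[] args) {
        AgentPool pool = new AgentPool();
        pool.addAgent("agent-1");
        pool.addAgent("agent-2");
        System.out.println(pool.assign("nightly-etl"));
        System.out.println(pool.assign("report"));
        pool.addAgent("agent-3"); // capacity added while the pool is in use
        System.out.println(pool.assign("cleanup"));
    }
}
```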

Download and test drive JobServer 3.0 now and learn more about JobServer's powerful developer SDK, SOAFaces, that makes extending and customizing JobServer and developing custom jobs and SOA applications easier, while using some of the best Java/AJAX and web/SOA open source technology available to developers.

Thursday, September 10, 2009

GWT and Mule Playing Together

Mule is a leading open source solution for organizations to integrate and harmonize disparate systems within their enterprise. It also provides a fast track to SOA enable many of the functions within the organization and bring down barriers to application development and integration.

GWT (Google Web Toolkit) is a popular and leading framework for building AJAX web applications. There are a number of ways for GWT applications to communicate with a server. The standard one is GWT's built-in RPC mechanism, but there is no easy way for your GWT applications to talk with Mule endpoints directly.

SOAFaces is a lightweight framework that ties GWT and Mule together and makes it possible for GWT apps to access the Mule API and services directly from the AJAX client. This is done through the SOAFaces UniversalClient Java interface, which is very similar to the MuleClient interface found in the Mule APIs. The UniversalClient allows messages to be sent to Mule endpoints (SOA endpoints) and objects/data to be passed and returned directly from the GWT client code. All this is done without the need to use the GWT RPC mechanism. Simply use the UniversalClient interface in much the same way you would traditionally use the MuleClient interface in the Mule framework.

SOAFaces provides three sub APIs to develop with:

org.soafaces.services.*

This is the main package that includes the UniversalClient interface and facilitates the communication and marshaling (JSON or GWT/Java serializable objects) between GWT client and Mule/SOA back end services. This package can be used independently with any custom GWT application and in any j2ee app server. Mule endpoints can be invoked and messages will be routed to a MuleClient proxy on the j2ee back end. This is all transparent to the GWT/SOAFaces developer. Mule requests from GWT are proxied through the webserver and out to the Mule services bus, with all marshaling handled by SOAFaces.

org.soafaces.bundle.client.*

This package in SOAFaces makes it easy to build modular GUI applications (called Weblets) in GWT that have convenient access to SOA services like Mule (using UniversalClient interface). This package also provides a simple and optional API to persist the state of GWT POJO objects, so building modular web apps is easier and with less hassle using SOAFaces!

org.soafaces.bundle.workflow.*

This package in SOAFaces allows building back end server components called Tasklets. Tasklets can be used in jobs and workflows to execute back end batch type processing. Tasklets can be chained together to form more complex workflows, with message passing between the Tasklets, which allows rich functionality to be supported. Refer to JobServer for an example of a job scheduling engine that implements the Tasklet API for running and managing jobs.
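
The chaining and message passing described above can be pictured with a small stand-in. The SimpleTasklet interface and SimpleWorkflow class below are invented for this sketch and are deliberately not the real SOAFaces Tasklet interface, whose method signatures are not shown in this post. The shared map is just one assumed way to pass messages from one step to the next.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a workflow component; NOT the real SOAFaces Tasklet interface.
interface SimpleTasklet {
    // Reads messages left by earlier steps and may add its own.
    void run(Map<String, Object> messages);
}

class SimpleWorkflow {
    // Run the tasklets in order, sharing one message map between them.
    static Map<String, Object> execute(List<SimpleTasklet> chain) {
        Map<String, Object> messages = new HashMap<>();
        for (SimpleTasklet tasklet : chain) {
            tasklet.run(messages);
        }
        return messages;
    }

    public static void main(String[] args) {
        // First step publishes some rows for later steps to use.
        SimpleTasklet extract = m -> m.put("rows", Arrays.asList(3, 5, 8));
        // Second step consumes the rows and publishes their total.
        SimpleTasklet total = m -> {
            int sum = 0;
            for (Object row : (List<?>) m.get("rows")) {
                sum += (Integer) row;
            }
            m.put("total", sum);
        };
        Map<String, Object> result = execute(Arrays.asList(extract, total));
        System.out.println(result.get("total")); // prints 16
    }
}
```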

The SOAFaces framework brings together these APIs for building Mule and GWT powered applications making accessing Mule a quick hop away from your AJAX apps.

For more information please refer to the SOAFaces website.

Wednesday, June 24, 2009

JobServer 2.6.0 beta released - packed with enterprise features

Grand Logic is pleased to announce the release of JobServer 2.6.0. This release includes improved SOA and enterprise capabilities along with significant architectural improvements and reliability features.

Some of the new capabilities in this release include:

Support for the latest version of Mule (version 2.2.1)
High-availability features with support for hot fail-over of job scheduling
Tiered architecture allowing web tier, job processing/scheduling tier, and database tier to be on separate host machines
Support for web tier clustering - running JobServer's web GUI Workbench tools across a cluster of Tomcat servers
Support for running Weblets/GWT and Mule/SOA services across a cluster of machines

This release presents a number of key enterprise features that allow JobServer to play a vital role in an organization's IT computing environment. Run and schedule jobs more easily than ever before, and now you have access to the best SOA tools and APIs to build rich web based and SOA applications using GWT and Mule.

Never before has it been easier to build Mule applications and deploy them in your IT environment. Build web GUIs for your Mule services in a highly modular way using JobServer's tools and plugin SDK. Build your Mule services and deploy them in JobServer and easily build GWT powered GUI interfaces to your Mule services. JobServer makes using Mule easy.

JobServer Weblet/Mule architecture diagram

JobServer Job scheduling/processing architecture diagram

Download and test drive JobServer 2.6.0 now and learn more about JobServer's powerful plugin SDK, SOAFaces, that makes extending and customizing JobServer and developing custom jobs and SOA applications easier while using some of the best Java/AJAX and web/SOA open source technology available to developers.

About Grand Logic
Grand Logic is a privately held company focused on delivering quality software to its customers. It is founded on the principle that when innovation, dedication, and hard work come together great things can happen.