Sunday, September 14, 2014

JobServer Release 3.6.14

We are happy to announce the release of JobServer 3.6.14, which introduces LDAP support and improved shell script processing that lets you turn any standalone program or shell script into an application that is easy to automate and track. With JobServer, you give your shell scripts and standalone batch programs a GUI front-end you can use to customize them, and you get powerful reporting and monitoring to easily track all of their input and output.

With this release, JobServer now supports improved tracking of shell script output via the JobServer JobTracker reporting and tracking application. You can preview the standard output of every shell script right from the top-level JobTracker search report. You can also run shell script jobs manually and pass custom input parameters to them. Using JobServer with batch scripts just got a whole lot more fun and productive.
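To give a feel for what is being automated under the hood (this is illustrative only, not JobServer's actual implementation), here is a minimal Java sketch of running a shell script with custom parameters and capturing its standard output for later reporting; the script path, parameters and log location are hypothetical:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative only: run a shell script with custom parameters and capture its
 * standard output to a log file so it can be reported on later.
 */
public class ShellJobRunner {

    public static int runAndCapture(String scriptPath, List<String> params, Path outputLog)
            throws IOException, InterruptedException {
        List<String> command = new ArrayList<>();
        command.add(scriptPath);              // e.g. /opt/jobs/nightly_report.sh (hypothetical)
        command.addAll(params);               // custom input parameters passed at run time

        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);         // merge stderr into stdout for a single trace

        Process process = pb.start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);              // a real scheduler would stream this to its tracker
            }
            Files.write(outputLog, lines, StandardCharsets.UTF_8);
        }
        return process.waitFor();             // exit code drives success/failure reporting
    }

    public static void main(String[] args) throws Exception {
        int exitCode = runAndCapture("/opt/jobs/nightly_report.sh",
                Arrays.asList("--region", "us-east", "--date", "2014-09-14"),
                Paths.get("/var/log/jobs/nightly_report.out"));
        System.out.println("Job finished with exit code " + exitCode);
    }
}

JobServer's JobTracker does this kind of capture for you and surfaces the output directly in its search reports.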

Want to simplify user authentication for you and your JobServer end users? With the new LDAP support, you can integrate JobServer with Active Directory or any other LDAP-compatible directory for more seamless user authentication.
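For those curious what LDAP authentication looks like at the code level, here is a minimal sketch using Java's standard JNDI API to validate credentials with a simple bind. It is not JobServer's implementation, and the server URL and DN pattern are hypothetical placeholders you would normally pull from configuration:

import java.util.Hashtable;

import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

/** Illustrative only: validate a user's credentials with a simple LDAP bind. */
public class LdapAuthenticator {

    public static boolean authenticate(String username, String password) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://ad.example.com:389");   // hypothetical server
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        // Active Directory typically also accepts the userPrincipalName (user@domain) form.
        env.put(Context.SECURITY_PRINCIPAL, "uid=" + username + ",ou=people,dc=example,dc=com");
        env.put(Context.SECURITY_CREDENTIALS, password);

        try {
            DirContext ctx = new InitialDirContext(env);  // a successful bind means valid credentials
            ctx.close();
            return true;
        } catch (NamingException e) {
            return false;                                 // bad credentials or unreachable server
        }
    }
}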

Download and test drive JobServer 3.6.14 today and learn more about JobServer's powerful developer SDK, soafaces, which makes it easier to extend and customize JobServer and to develop custom jobs and backend automated services.

Grand Logic delivers software solutions that automate your business processes and tame your IT operations & Big Data analytics. Grand Logic delivers data and job automation software along with Hadoop and predictive analytics consulting services that maximize your Big Data investment.

Saturday, August 16, 2014

Tableau for Agile Oracle Essbase Financial Reporting

Oracle Hyperion Essbase is an established multidimensional database platform often used by accounting departments to model and store their company's financial data. Essbase ships with some out-of-the-box Oracle web reporting utilities to help you visualize your financials for management, and it also integrates with tools such as MS Office for reporting via Excel. The typical model is that you end up passing lots and lots of Excel spreadsheets around your organization and among your executives - a bit antiquated in this day and age, to say the least. To its credit, Excel is commonly used to build fairly advanced reports against Essbase using home-grown Excel and VB programming. But there has to be a better way, one that does not involve building the complex data warehousing that typical third-party BI tools require.

Tableau is a fast-growing and popular visualization BI solution that enables business analysts (without advanced technical expertise) to perform data discovery and build rich, sophisticated visualizations that are more easily shared than traditional Excel spreadsheets. Tableau has emerged as a powerful replacement for Excel-based reporting and as a challenger to established enterprise BI platforms such as MicroStrategy and Cognos, to name a few. Tableau fits well as an agile replacement for Excel reporting while letting users build very powerful next-generation reporting and dashboards that outperform the traditional enterprise BI vendors.

Tableau still has a way to go on the enterprise end, but it is coming on strong, and if you know how to deploy and implement Tableau Server you can build highly agile and visually rich enterprise-grade BI solutions. For financial reporting, Tableau allows you to take your legacy Essbase reports and spreadsheets out of the dungeon and into the light of day by letting you build sophisticated dashboards that all your executives can easily access across your organization via Tableau Server.

With Tableau you can just say no to building yet another data warehouse and complex ETL when architecting your business intelligence strategy. But be aware: while Tableau can extract data from Essbase directly using the built-in Tableau-to-Essbase connector, say no to this as well - in practice it does not work well (that deserves its own blog post). We strongly suggest not using the Tableau Essbase cube connector for a number of reasons (not all of them Tableau related); the connector has many challenges. A hint: extract your Essbase data using the Essbase Excel plugin, mix in a little ETL, and output to denormalized flat data structures. Say what? Yes, this approach rocks! Remember that Tableau is great at extracting dimensionality out of your data (that is one of its claims to fame, actually).
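As a rough sketch of the "little ETL" step we are hinting at (not our actual tooling), the following Java example unpivots a cross-tab style Essbase extract, with one column per month, into a tall, denormalized file that Tableau consumes naturally. The dimension names, tab-delimited layout and #Missing handling are all hypothetical assumptions about the extract:

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: unpivot a cross-tab Essbase extract into a tall file for Tableau. */
public class EssbaseFlattener {

    private static final List<String> MONTHS =
            Arrays.asList("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");

    public static void flatten(Path crossTabExtract, Path tallOutput) throws IOException {
        List<String> lines = Files.readAllLines(crossTabExtract, StandardCharsets.UTF_8);
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(tallOutput, StandardCharsets.UTF_8))) {
            out.println("Entity,Account,Scenario,Period,Amount");   // one row per cell, one measure column
            for (String line : lines.subList(1, lines.size())) {    // skip the extract's header row
                String[] cols = line.split("\t");                   // hypothetical tab-delimited layout
                String entity = cols[0], account = cols[1], scenario = cols[2];
                for (int m = 0; m < MONTHS.size(); m++) {
                    String amount = cols[3 + m];
                    if (!amount.isEmpty() && !"#Missing".equals(amount)) {  // drop empty Essbase cells
                        out.printf("%s,%s,%s,%s,%s%n", entity, account, scenario, MONTHS.get(m), amount);
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        flatten(Paths.get("essbase_extract.tsv"), Paths.get("financials_tall.csv"));
    }
}

The tall shape is the point: with one measure column and explicit dimension columns, Tableau rediscovers the dimensionality on its own, with no cube connector in the path.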

At Grand Logic, we have developed an elegant and straightforward approach to extracting data from Oracle Essbase for agile and efficient consumption by Tableau. This in turn can be used to build advanced financial reports and dashboards without a huge investment in data warehousing and ETL processing. Our approach to integrating Tableau with Oracle Essbase leads to a powerful solution that will leave your executives wanting more and frees your accountants and financial analysts from building Excel reports that are cumbersome to maintain. Get your financial reporting and dashboarding into Tableau today for centralized access and one governed version of the truth, and put actionable, insightful data in the hands of your executives.

Tableau and Essbase can be a great combination for building rich reporting and dashboards without the overhead and complexity of traditional data warehousing and BI. Get your financial data out of Essbase, into Tableau, and into the hands of your executives and decision makers. Contact Grand Logic to learn more.

Wednesday, February 5, 2014

Machine Learning: The Brains Behind Big Data

The first round of the data revolution has focused on commoditizing computing and storage. Platforms such as Hadoop and NoSQL have helped propel this and have enabled businesses to economically deploy more powerful scale-out infrastructure than ever before. They have also changed and improved the way data warehousing and business intelligence are approached and managed. The storage and performance capabilities of Big Data have been a game changer. Traditional descriptive BI and reporting will never be the same. But this is just step one. The best is yet to come.

The industry is now going through a learning process in how to manage all this data at massive scale. Storing and managing more data is great, but people and businesses will get smarter about how much data to keep as it starts to hurt more (hurt the pocketbook, that is). How much data you keep and mine will depend on statistically driven best practices, not just on data warehousing or how big your HDFS cluster is. The mainstreaming of Big Data has provided the muscle to store and process massive amounts of data at near-linear scale, but we will not see the real value of all this Big Data storage and processing until machine learning and data science tools become more accessible (to the non-PhD data scientists among us) and mainstream, and businesses learn how to apply these tools and disciplines effectively.

Machine Learning will provide the brains to go along with the Big Data muscle. In the long run, businesses will decide how much data to keep around based on statistical measures and best practices as they grow to understand their data and their business better while building out their predictive and prescriptive analytics.

Sunday, October 20, 2013

JobServer and Mesos Make a Great Pair

We are happy to announce the release of JobServer 3.6 beta1 with support for Mesos clustering and distributed job processing. Release 3.6 is an early access release of JobServer with integrated support for Mesos. With this release of JobServer, you can now schedule and run jobs on a Mesos cluster of any size and configuration. Say goodbye to cron jobs!

JobServer has always supported distributed job scheduling and processing and has long been a great replacement for cron. Now, with Mesos integration, JobServer takes this to the next level, adding dynamic resource management and reliability by leveraging all the advantages of Mesos. JobServer also brings powerful scheduling, reporting and monitoring features to Mesos environments. Distributed job scheduling and batch processing just got more interesting!

With this release you can track and manage jobs as they run across a dynamic and highly resilient cluster of servers. JobServer with Mesos lets you run scripts and jobs across your cluster of servers and control how resources are allocated and utilized. If you are a Mesos user today, give JobServer a try and say goodbye to cron. If you are a JobServer user, get your compute resources under control with Mesos.
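To illustrate how a scheduler hands a shell job to Mesos (a minimal sketch using the Apache Mesos Java API, not JobServer's internal integration code), the snippet below turns a resource offer into a single shell-command task; the resource sizes and script path are hypothetical:

import java.util.Collections;

import org.apache.mesos.Protos.CommandInfo;
import org.apache.mesos.Protos.Offer;
import org.apache.mesos.Protos.Resource;
import org.apache.mesos.Protos.TaskID;
import org.apache.mesos.Protos.TaskInfo;
import org.apache.mesos.Protos.Value;
import org.apache.mesos.SchedulerDriver;

/** Sketch: turn a Mesos resource offer into a single shell-command task. */
public class ShellTaskLauncher {

    /** Build a TaskInfo that runs an arbitrary shell command on the offered agent. */
    public static TaskInfo buildShellTask(Offer offer, String taskId, String shellCommand) {
        Resource cpus = Resource.newBuilder()
                .setName("cpus")
                .setType(Value.Type.SCALAR)
                .setScalar(Value.Scalar.newBuilder().setValue(0.5))   // hypothetical sizing
                .build();
        Resource mem = Resource.newBuilder()
                .setName("mem")
                .setType(Value.Type.SCALAR)
                .setScalar(Value.Scalar.newBuilder().setValue(256))
                .build();

        return TaskInfo.newBuilder()
                .setName("shell-job-" + taskId)
                .setTaskId(TaskID.newBuilder().setValue(taskId))
                .setSlaveId(offer.getSlaveId())          // run on the agent that made the offer
                .addResources(cpus)
                .addResources(mem)
                .setCommand(CommandInfo.newBuilder()
                        .setShell(true)
                        .setValue(shellCommand))         // e.g. "/opt/jobs/nightly_etl.sh"
                .build();
    }

    /** Accept the offer and launch the task through the scheduler driver. */
    public static void launch(SchedulerDriver driver, Offer offer, TaskInfo task) {
        driver.launchTasks(Collections.singletonList(offer.getId()),
                           Collections.singletonList(task));
    }
}

In a full framework these calls would live inside a Scheduler implementation's resourceOffers() callback; the sketch just shows the offer-to-task handoff that makes the cron replacement cluster-aware.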

Download the beta release of JobServer v3.6 and tame your IT environment using all the advantages of Mesos and JobServer.

Grand Logic delivers software solutions that automate your business processes and tame your IT operations & Big Data analytics. Grand Logic delivers data and job automation software and Hadoop consulting services that maximize your Big Data investment.

Friday, October 18, 2013

Mesos: The Datacenter is the Computer

The data center is the computer. The pendulum is swinging. Traditional cloud and virtualization-level resource management in the data center is no longer good enough to efficiently meet the growing demand for computing services in the enterprise. The answer to this challenge, offering more compute and storage services more efficiently, lies in solutions such as Mesos and YARN. These emerging cluster management platforms are the next evolution in fine-grained, efficient resource management of your data center infrastructure and services. As the need for more processing and storage grows, solutions like YARN and Mesos take center stage.

Big Data, mobile, and cloud computing have driven a tremendous amount of growth and innovation, but the byproduct has been more and more computing infrastructure needed to service that growth and manage the explosion of data. This has especially been the case as we have moved to using more clustered commodity hardware and distributed storage. You now have start-ups and smaller companies managing complex multi-node computing infrastructure for things like Hadoop, real-time event streaming, and social graphs, as well as for established core services like data warehousing, ETL and batch processing. All of this makes it demanding to effectively manage and administer a dynamic hardware environment, and in many cases it has created isolated silos of resources dedicated to different tasks; for example, your Hadoop cluster is separate from your application services, database servers and legacy batch processing. This does not scale, and it is not cost effective.

These silos have created inefficiencies within the data center and the enterprise environment. For example, if your 10-node Hadoop cluster runs at maximum capacity only 70% of the time, what are those 10 nodes doing the other 30% of the time? The same can be said for the other services running in the data center. Unless you can treat your entire data center as one shared cluster of resources, you will have inefficiencies, and as the number of nodes and services you manage grows, these inefficiencies will only increase. This is where solutions like Mesos can step in and give your applications and services one holistic view of your computing infrastructure. By using Mesos, you can reduce costs, more efficiently utilize the hardware and storage resources you already have, and grow more incrementally as more resources are needed.

Companies like Google, Twitter and Facebook are leading the charge to advance the state of the art in efficient data center and enterprise computing. Mesos is a great platform for reducing costs and improving the reliability and overall efficiency of your IT environment. Give Mesos a look. Cheers!

Thursday, August 15, 2013

Protecting your Hadoop Investment

The hype and buzz around Big Data in the tech industry is at astronomical levels. There are many factors driving this (both technical and human), but I won't get into that here. The fact is that Big Data (define it as you like) is here to stay, and many organizations need to find their path to the wonderland of bottomless data storage and boundless analytical computing, where no byte of data is ever thrown away and any question about your data can be asked and answered. Well, at least if you are Facebook or Google.

Hadoop is the leading contender for enabling organizations to economically and incrementally take advantage of distributed storage and scalable distributed processing to tackle the Big Data challenges ahead. The days of buying expensive vertically scaling servers and expensive storage systems are over. Hadoop started from the humble beginnings of MapReduce and distributed storage (HDFS), and it has now expanded to touch and integrate with all corners of the enterprise computing fabric, from real-time business intelligence to ETL and data warehousing. These days, almost any company with some kind of database or software analytics solution has put the word "Big" in its title and offers some level of Hadoop integration. Nothing really bad about that, and it is great to see everyone gravitating to the Hadoop ecosystem as an open source standard of sorts for Big Data.

Hadoop presents a lot of potential to solve problems that in the past required much more expensive and proprietary systems. Note that Hadoop in many respects is no less complex (and is by no means free) than past and existing proprietary Big Data platforms; it has its own complexity challenges, such as many distributed hardware moving parts, and it is more or less a loose collection of many open source projects. Hadoop has a lot of creative minds and companies driving its fast evolution. But it is not an out-of-the-box, plug-and-play solution, nor a one-size-fits-all solution by any stretch of the imagination. Hadoop does not come cheap by any measure, but with Hadoop you have more opportunity to grow your Big Data system as you go, with the potential for less vendor lock-in and more flexibility over what you pay for (note, I use the word potential here). The value you get out of Hadoop depends on your expectations and on your investment in people and training, along with key decisions you make along the way.

So how does an organization begin down the road of figuring out how Hadoop fits into their existing ecosystem and how much and how fast to invest in Hadoop? Let's see if we can walk through some common questions, challenges and experiences one would go through as they begin their Hadoop quest.

First you need to understand what makes Hadoop tick.
It is important to understand that, out of the gate, Hadoop does not necessarily invent anything that did not exist before in other products. There are some novel concepts and cool innovations in Hadoop, but fundamentally it is about a few key ideas: distributed computing and distributed storage on commodity hardware. Ultimately, Hadoop is about growing your data storage and processing in an incremental and economical way using largely open source technology and off-the-shelf hardware. Note that open source does not mean free, of course.

Okay, so what problem do we want to solve with Hadoop? Please don't say all of them.
One of the nice things about Hadoop is that organizations of any size can adopt it. You can be a small startup with a simple idea running Hadoop on a small cluster on Amazon, or you can be a large enterprise with massive clusters performing high-end processing, such as crawling and indexing the entire web. Hadoop can be used in a variety of situations: to reliably store large volumes of data on commodity storage, or for much more complex computing, ETL, NoSQL and analytical processing.

For larger organizations getting started with Big Data, it is vital to identify a few key problems you want Hadoop to solve, ones that fit and integrate well with existing legacy systems. Hadoop is particularly good at being a holding area for unstructured data, like web or user logs, that you might want to keep in raw format for later analysis and auditing. What is typically important is to start small, solve some specific problems on specific data sets, and then expand your application of Hadoop as you go. This includes getting accustomed to the many programming and DSL packages that can be used to process Hadoop data.
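As a taste of the programming model, here is a toy Hadoop MapReduce job (standard Hadoop Java API, not tied to any vendor distribution) that counts HTTP status codes in raw web logs sitting in HDFS; the log format and field position are hypothetical:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Toy MapReduce job: count occurrences of each HTTP status code in raw web logs kept on HDFS. */
public class StatusCodeCount {

    public static class StatusMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text status = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assume a space-delimited log line; the status code position is hypothetical.
            String[] fields = value.toString().split(" ");
            if (fields.length > 8) {
                status.set(fields[8]);
                context.write(status, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "status-code-count");
        job.setJarByClass(StatusCodeCount.class);
        job.setMapperClass(StatusMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /data/raw/weblogs
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /data/derived/status_counts
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Higher-level DSLs such as Pig and Hive express the same kind of job in a few lines, but they compile down to this model, so it is worth understanding once.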

Hey, in a Big Data universe we never throw anything away.
Some of the talk circling around Big Data often suggests that the typical application of Hadoop is to store everything forever. Obviously this is not practical. Many vendors providing software and hardware for Hadoop would love for you to try, but the reality is that you still need to understand your data limits and have clear aging and time-to-live policies. Hadoop does let you scale your storage out to petabytes, potentially, but there is no free lunch here. A critical aspect of this is understanding the format you store your data in within Hadoop. Here again, you hear a lot of talk about storing all your data in "raw format" so you have all the detail needed to extract deep information from your data in the future. While this sounds great in theory, it is not practical in most cases. In reality, you can keep some data in raw format, but you typically must transform your Hadoop data into formats other than unstructured HDFS sequence files. Structure does matter as you get into more complex analytics in Hadoop. Storing your data in HDFS often means transforming it into semi-structured column stores for use by tools such as Hive, HBase and other query engines, for better performance. So structure matters, and expect to have your data stored in Hadoop in multiple formats, or at least transformed via Hadoop-based ETL into formats other than the "raw" acquisition format. This all adds up to more and more storage requirements, so make sure you understand the math to properly size your Hadoop storage needs.
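As a back-of-the-envelope illustration of that math (every number below is purely hypothetical), a small sketch like this makes the replication and derived-format multipliers explicit:

/** Back-of-the-envelope HDFS capacity estimate with hypothetical inputs. */
public class HadoopSizingEstimate {
    public static void main(String[] args) {
        double rawPerDayTb = 0.5;        // raw acquisition-format data landing each day
        double derivedCopyFactor = 1.5;  // extra space for ETL output and Hive/HBase-friendly formats
        double replicationFactor = 3.0;  // default HDFS block replication
        double retentionDays = 365;      // how long raw + derived data is kept
        double headroom = 1.25;          // keep ~20% free for temp/shuffle space and growth

        double logicalTb = rawPerDayTb * (1 + derivedCopyFactor) * retentionDays;
        double physicalTb = logicalTb * replicationFactor * headroom;

        System.out.printf("Logical data after %d days: %.0f TB%n", (int) retentionDays, logicalTb);
        System.out.printf("Physical HDFS capacity needed: %.0f TB%n", physicalTb);
    }
}

Even a modest half terabyte of raw data per day balloons to well over a petabyte of physical disk within a year under these assumptions, which is exactly why aging and time-to-live policies matter.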

Now this software is open source which means mostly free, right?
Obviously we have all learned by now that open source does not necessarily mean free. Red Hat, as an example, has a pretty good business around open source and is quite successful at making a profit. Hadoop vendors are no different. There are several well-funded start-ups with Red Hat-like business models around Hadoop, not to mention all the big boys trying to retrofit their existing Big Data solutions to be Hadoop friendly. None of them are free, but they all differ from each other, and it is important to understand each Hadoop vendor's strengths and weaknesses and where they are coming from. The vendor's history matters for a lot of reasons that I will discuss in a later post.

Now, in theory you could go it alone and use Hadoop completely free - just download most of the Hadoop packages from Apache (and a few other places). For example, I have downloaded and installed versions of Hadoop from the Apache Foundation and have been able to run basic MapReduce and HDFS jobs on small clusters, all for free and without going through any Hadoop vendor. You can also use the community versions of the various Hadoop distributions from the major Hadoop vendors. This can work, but you are on your own, and how feasible this approach is depends on who you are and how savvy your technical staff are. It is also important to understand how the various Hadoop distributions and players differ from each other and how much you are getting "locked in" with each Hadoop vendor. The retrofitted Hadoop vendors (as I call them) have a lot more polish and savvy when they pitch Hadoop to you, while some of the Hadoop startup vendors have varying degrees of proprietary software embedded in their distributions. It is critical to understand these facts and to consider how much you are willing to build on top of Hadoop yourself versus relying 100% on your Hadoop partner. These are important considerations that can sometimes get lost in internal management jockeying over who will be the Big Data boss. Vendor lock-in is very important to understand, along with clearly planning for sizing, capacity and long-term incremental growth of your cluster.
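For context on what "going it alone" looks like, here is the kind of basic HDFS exercise you can run against a stock Apache Hadoop install using the standard FileSystem API; the namenode address and paths are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Illustrative only: copy a local file into HDFS and list the target directory. */
public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");  // hypothetical namenode

        FileSystem fs = FileSystem.get(conf);
        fs.copyFromLocalFile(new Path("/tmp/weblog-2013-10-18.log"),
                             new Path("/data/raw/weblogs/"));

        for (FileStatus status : fs.listStatus(new Path("/data/raw/weblogs/"))) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }
        fs.close();
    }
}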

This all leads to understanding the cost of Hadoop as you set expectations for what problems you want your Hadoop cluster to solve from day one. Sizing your Hadoop cluster for storage, batch computing, real-time analytics/streaming, and data warehousing must all be considered. How you capacity-plan your storage, HDD spindles, and CPU cores is a critical decision as you plan the nuts and bolts of your Hadoop cluster. Your Hadoop partner/vendor can help you with this sizing and planning, but again, each vendor will approach it differently depending on who they are and who you are (how deep your pockets are). You have to be smart here and know what is in your best interest long-term.

Your Hadoop cluster is not an island.
It is vital to consider how your Hadoop cluster will fit in with your current IT environment and your existing data warehousing and BI systems. Hadoop will typically not totally replace your existing ETL, data warehousing and BI systems; in many cases, it will live alongside them. It is also vital to understand how you will move data efficiently into your Hadoop cluster and how much processing and storage is needed to put data into intermediate formats for optimal performance and efficient consumption by applications. These are critical questions to answer in order to get your Hadoop cluster running efficiently and effectively feeding downstream systems.

You mean my Hadoop cluster does not run itself?
One underestimated area concerning Hadoop is planning for the operations and ongoing management of your Hadoop cluster. Hadoop is good technology, but it is fast evolving and has many moving parts, both at the infrastructure level (lots of nodes and HDDs) and at the software level (lots of fast-evolving packages). This makes running, monitoring and upgrading/patching Hadoop a non-trivial task. Many of the Hadoop vendors offer both open source and proprietary solutions for managing and running your clusters. This obviously requires your operations and production IT staff to be included in the planning and management of your clusters.

Some other important questions and considerations as you get started with Hadoop.
  • How will multi-tenancy and sharing work if more than one group is going to be using your cluster?
  • Should you have one or a few big Hadoop clusters, or many small clusters?
  • Understand your storage, processing, and concurrency needs. Not all Hadoop schedulers are created equal for all situations.
  • Do you need or want to leverage virtualization and/or cloud bursting?
  • Choose your hardware carefully to keep cost per TB low. How you balance TB versus CPUs/cores is important.
  • Understand what you need on your edge nodes for utility and add-on software.
  • Plan your data acquisition and export needs between your Hadoop cluster and the rest of your ecosystem.
  • Understand your security needs at a data and functional level.
  • What are your uptime requirements? Plan for rolling patches and upgrades.

Maybe I should have stated this in the beginning, but the reason I called this blog Protecting your Hadoop Investment is that many organizations enter into this undertaking without a clear understanding of:
  1. Why they are pursuing Big Data (other than it is the hot thing to do).
  2. How Hadoop differs from past proprietary Big Data solutions.
  3. How it can fit alongside existing legacy systems.
  4. How to ultimately manage costs and expectations at both a management and technical level.
If you do not understand these points, you will waste a lot of time and money and fail to take effective advantage of Hadoop. So strap in and enjoy your Hadoop and Big Data adventure. It will be a journey as much as a destination, and it will transform your organization for the better if you plan appropriately and enter into it with your eyes wide open.

Thursday, July 25, 2013

JobServer 3.4.28 - Isolated JVM Containers

We are happy to announce the release of JobServer 3.4.28 which adds a number of new features for administrators along with supporting the latest version of Google Web Toolkit and expanded remote management APIs.

With this release, JobServer now supports expanded remote web services programmatic APIs. Also included in this release is the ability to run distributed jobs under customizable Linux/Unix user accounts on a job-by-job basis, which gives administrators fine-grained control over how their jobs run. This lets users run jobs inside isolated JVMs in a more granular fashion.
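As a rough illustration of the general technique (not JobServer's actual mechanism), the sketch below launches a job in its own JVM under a specific Unix account via sudo, so each job gets its own heap, classpath and file permissions; the account name, classpath, job class and log path are hypothetical:

import java.io.File;
import java.io.IOException;
import java.util.Arrays;

/** Illustrative only: launch a job in its own JVM under a specific Linux/Unix account. */
public class IsolatedJvmLauncher {

    public static Process launch(String unixAccount, String classpath, String jobMainClass)
            throws IOException {
        ProcessBuilder pb = new ProcessBuilder(Arrays.asList(
                "sudo", "-u", unixAccount,        // run the child JVM as the per-job account
                "java", "-Xmx512m",               // each job gets its own heap, GC, and lifecycle
                "-cp", classpath,
                jobMainClass));
        pb.redirectErrorStream(true);             // keep stderr with stdout in one log
        pb.redirectOutput(new File("/var/log/jobs/" + jobMainClass + ".out"));
        return pb.start();
    }

    public static void main(String[] args) throws Exception {
        Process job = launch("etl_user", "/opt/jobs/lib/*", "com.example.jobs.NightlyEtlJob");
        System.out.println("Job exited with code " + job.waitFor());
    }
}

The isolation pays off when one misbehaving job would otherwise exhaust the heap or clobber files owned by another job's account.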

It has always been our focus to make JobServer the most developer- and IT-friendly scheduling and job processing platform on the planet. We are proud of our record of taking customer and developer feedback and continuously making JobServer the best scheduling and job processing engine around. JobServer tames your job processing and scheduling environment in a way that is a joy for Java developers to customize and build upon, while providing powerful web UI management and administration features for business users and IT operations administrators.

Download and test drive JobServer 3.4.28 today and learn more about JobServer's powerful developer SDK, soafaces, which makes it easier to extend and customize JobServer and to develop custom jobs and backend automated services.

Grand Logic delivers software solutions that automate your business processes and tame your IT operations & Big Data analytics. Grand Logic delivers data and job automation software and Hadoop consulting services that maximize your Big Data investment.