Wednesday, November 18, 2015

Understanding Apache Spark - Why it Matters

Apache Spark has come on the scene in the past few years and has taken the computing world by storm. It has been dubbed the replacement for Hadoop and is often seen as the next evolution in Big Data. Spark is one of the most active Apache projects and has developed a strong ecosystem. Even the established Big Data players are adopting it in their stacks and positioning it as a key component of their overall open source and productized solutions.

Why has Spark been so successful? How is it better or different from the first incarnation of Big Data (aka Hadoop)? Well, Spark does not abandon the principles realized by Hadoop and the companies that helped bring the Big Data philosophy to the masses. Spark builds on the same basic building blocks, such as HDFS and programming constructs like Map-Reduce, but it does so in a way that makes building applications on top of Spark much more efficient and effective than on its predecessors.

Spark, like Hadoop, supports building a computing fabric that can be deployed on commodity hardware and inherently supports horizontal scaling. Spark lowers the barrier for application developers to parallelize their applications and spread the computation and data access across a cluster of computers for processing. Hadoop does many of the same things, but Spark does them better, both from a technology implementation perspective (more efficient use of memory, garbage collection handling...) and with a much better application programming API.

What Spark does is raise the bar from a programming interface perspective. It has strong support for Java, Scala, Python and R. Its core abstractions for managing data (such as RDDs) and computation are very well designed interfaces and APIs. When working with Spark you still have to look at your application and the problem you are trying to solve and think about how to parallelize it, but the Spark APIs are intuitive for the typical application programmer to understand and use. Spark gives you access to essentially the same power that a grid computing platform or a distributed database engine has internally, and makes it available to the average programmer to embed that same sophistication in their own applications.
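To make the flavor of the API concrete, here is a rough conceptual sketch, in plain TypeScript, of the chained flatMap/map/reduce dataflow style that Spark's RDD interfaces encourage. To be clear, this is not Spark code (Spark's actual APIs are in Scala, Java, Python and R); the roughly equivalent RDD calls are noted in the comments.

```typescript
// Conceptual sketch only: Spark's real RDD API lives in Scala, Java,
// Python and R. This plain TypeScript mimics the style of chained,
// declarative transformations (equivalent Spark calls in comments).

const lines: string[] = [
  "spark builds on hadoop",
  "spark raises the programming bar",
];

// flatMap -> map -> reduce: the classic word-count dataflow.
const counts = lines
  .flatMap(line => line.split(/\s+/))          // rdd.flatMap(_.split(" "))
  .map(word => [word, 1] as [string, number])  // .map(word => (word, 1))
  .reduce((acc, [word, n]) => {                // .reduceByKey(_ + _)
    acc.set(word, (acc.get(word) ?? 0) + n);
    return acc;
  }, new Map<string, number>());

console.log(counts.get("spark")); // 2
```

The point is that the mental model is just a pipeline of transformations over a collection; Spark's contribution is executing that same pipeline in parallel across a cluster, with partitioning, shuffling and fault tolerance handled for you.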

Spark is a game changer. It can be used for everything from ETL, to basic OLTP computations that drive a GUI, to backend batch processing, to real-time streaming applications and graph modeling. It will bring some of the powerful distributed computing technology pioneered by the internet giants into applications at all levels of the enterprise. Strap on your boots and start learning Spark. It is the next evolution not just in Big Data but in general purpose application programming, taking true distributed grid computing and bringing it to the programming masses.

Monday, July 27, 2015

Unbundling Database Architecture: Turning Databases Inside-Out

Relational database technology has been around for a few decades now. In the last several years we have seen a resurgence of innovation around data storage and data processing. This has pushed us into the realm of thinking outside of traditional SQL and big iron monolithic computing.

NoSQL, NewSQL and distributed commodity/cloud storage are changing how we build persistence into our applications. However, the fundamentals of databases have not changed much. Lower-cost memory and the availability of cheaper cloud computing have created a lot of innovation, but how databases function under the hood remains largely the same.

The fundamentals of transaction atomicity, replication, and considerations such as the CAP theorem are still tackled in much the same way as they were in earlier database engines. But is there a different way to look at how applications manage persistence for OLTP-style transactions? Well, Apache Samza presents an interesting approach to how data is managed. While it takes a streaming-centric approach, it could point to a new way for applications to manage general data storage in the future.

Here is an interesting blog that presents a breakdown of the Apache Samza architecture and how it can facilitate more general purpose application data management by using an "unbundled" architecture at the heart of the database engine. Is this just another specialized storage engine geared toward streaming data and analytics, or a whole new way to think about database architecture?
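The heart of the "unbundled" idea is to pull the database's internal commit log out into the open: writes become appends to a shared log, and queryable state becomes a materialized view derived by replaying that log. Here is a deliberately tiny TypeScript sketch of that log-plus-view pattern; it illustrates the concept only and is not Samza's actual API.

```typescript
// Toy illustration of the "inside-out" idea (not Samza's API): writes go
// to an append-only log, and a consumer folds that log into a queryable
// materialized view -- the commit log is promoted to the application
// level instead of being hidden inside the database engine.

type Event = { key: string; value: number };

const log: Event[] = []; // the append-only commit log

function append(event: Event): void {
  log.push(event); // every write is just a log append
}

// A materialized view is a fold over the log; replaying the log from
// the start rebuilds the view at any time.
function materialize(events: Event[]): Map<string, number> {
  const view = new Map<string, number>();
  for (const e of events) view.set(e.key, e.value);
  return view;
}

append({ key: "account:1", value: 100 });
append({ key: "account:1", value: 250 });
console.log(materialize(log).get("account:1")); // 250
```

Because the view is just a fold over the log, you can replay the same log to rebuild state after a crash, or to derive entirely different views of the same data for different consumers.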

Sunday, June 7, 2015

Isomorphic Web Apps: Back to the Future, Again

As web application development evolves, we continue to see the pendulum swing between client and server. Over the past two decades we have moved from simple multi-page HTML applications rendered exclusively on the server to ultra-fat single page applications (SPAs) containing more JavaScript than anyone would have imagined a few years ago.

Over the past couple of years, many large hosted sites (e.g. Airbnb, Facebook and others) have run into challenges with heavy JavaScript client apps and have rediscovered the value of rendering some of the web content on the server. Technology such as Node.js has made this easier, as has the creation of frameworks such as ReactJS. This rediscovery of server-side UI rendering now has a new cool name, Isomorphic JavaScript. The name seems to have stuck, so we will need to add it to our lexicon :)

The technology around this new approach is gaining steam of late. Here is a good blog from Airbnb on what led them to consider this architecture for their hosted web application services. While the idea of moving away from pure SPAs has been around for a while, it is picking up momentum, and we will surely see more of the established front-end JavaScript frameworks incorporate it in one way or another, alongside newer frameworks such as ReactJS.

ReactJS is one of the more popular frameworks that leverages server-side rendering and advocates this hybrid style of web application development. While Node.js is the leading container for supporting this application delivery model, we will start to see JVM support and integration as well via Java 8's Nashorn JavaScript engine.
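To see what this looks like in practice, here is a minimal server-side rendering sketch with ReactJS under Node.js. It assumes the express, react and react-dom packages are installed, and the Greeting component is made up for illustration; React.createElement is used instead of JSX so no build step is implied.

```typescript
// Minimal SSR sketch with Node.js + ReactJS (assumes esModuleInterop
// and the express, react and react-dom packages).
import express from "express";
import React from "react";
import { renderToString } from "react-dom/server";

// A hypothetical component that can be shared by server and client.
function Greeting(props: { name: string }) {
  return React.createElement("h1", null, `Hello, ${props.name}`);
}

const app = express();

app.get("/", (_req, res) => {
  // Render real HTML on the server instead of shipping an empty shell.
  const html = renderToString(React.createElement(Greeting, { name: "web" }));
  res.send(`<!doctype html><html><body><div id="app">${html}</div></body></html>`);
});

app.listen(3000);
```

The same component code can then be bundled and shipped to the browser, where React attaches to the server-generated markup and takes over event handling, which is precisely the isomorphic win: one codebase, rendered on either side of the wire.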

There are many benefits to building your web application with an isomorphic JavaScript architecture, which I will try to cover in an upcoming blog. There are already some good blogs covering the subject. Also expect AngularJS 2.0 to offer support for server-side rendering, but we will have to wait and see what Google comes up with as AngularJS 2.0 gets further along.

So keep an eye out for this new twist in web application development. It will be a boost for mobile development as well, since mobile can certainly benefit from offloading some processing to the server. But like most things, this new approach is no free lunch. Isomorphic JavaScript does add some complexity to constructing your web applications. Some of this may be alleviated as web application frameworks evolve and as the HTML Web Components standards mature. Stay tuned.

Saturday, May 9, 2015

A Future Written in TypeScript?

Web developers! Get your TypeScript engines started. Sad to say that Dart is dead, but TypeScript is a much more natural evolution toward ECMAScript 6 and a more team-scalable, structured and manageable extension to JavaScript programming (long live static typing :) that can help bring web development out of the wild wild west.
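For anyone who has not tried it, here is a small, entirely made-up example of what that static typing buys: the compiler catches, before the code ever ships, mistakes that plain JavaScript would only surface at runtime.

```typescript
// Interfaces and compile-time checks that plain JavaScript cannot
// enforce across a large team's codebase.
interface User {
  id: number;
  name: string;
  email?: string; // optional property, verified by the compiler
}

function greet(user: User): string {
  return `Hello, ${user.name} (#${user.id})`;
}

greet({ id: 42, name: "Ada" });      // OK
// greet({ id: "42", name: "Ada" }); // compile error: string is not a number
```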

Here is how AngularJS 2.0 is influencing the future of web development:

Thursday, November 13, 2014

Web Components are Real

Web Components are not another internet buzzword. Web Components are a collection of web browser constructs and standards that will modernize client-side web development and improve the web design process overall. They have been a long time in the making, but they are the missing building blocks (along with continued ECMAScript maturity) needed to bring web development on par with traditional structured programming languages and environments, without the need for the crazy hacks we have today.

The key standards behind Web Components include:
  • Shadow DOM: Finally, DOM trees that don't step on each other. Modular DOM structures can exist and interact with each other.
  • Custom HTML Elements: HTML building blocks where each custom element can have encapsulated properties, functions and events. Elements can exist in a hierarchy/nesting and look and act like native HTML elements (see the sketch after this list).
  • HTML Imports: Import HTML pages and source files like other programming languages.
  • CSS Grid Layout: Table and grid layout done in a more intuitive way and more akin to how most client GUI frameworks handle widget layout.
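As a minimal sketch combining the first two standards above (using a hypothetical user-card tag), a custom element registers itself like a native tag and keeps its markup and styles inside its own shadow tree:

```typescript
// Hypothetical custom element: its markup and styles live in a shadow
// tree, so page-level CSS and DOM manipulation cannot step on it.
class UserCard extends HTMLElement {
  connectedCallback() {
    const shadow = this.attachShadow({ mode: "open" });
    shadow.innerHTML = `
      <style>h2 { color: steelblue; }</style>
      <h2>${this.getAttribute("name") ?? "anonymous"}</h2>
    `;
  }
}

// Register the tag; it can now be used like any native HTML element:
// <user-card name="Ada"></user-card>
customElements.define("user-card", UserCard);
```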

These standards will impact low-level frameworks such as jQuery, but will also change the way higher-order client-side frameworks like AngularJS, GWT, Ember and Knockout evolve over time and how they provide wiring, plugin and extension capabilities to their developers.

So get ready for Web Components. They are real and will finally bring modular and structured programming to the web, supporting more robust, scalable, maintainable and extensible client-side development frameworks.

P.S. Keep an eye on the Polymer project if you want to experiment with Web Components today. This client-side framework packages many of the emerging standards into a developer-friendly API and programming model. But keep in mind that Polymer is not Web Components; it is just a project that demonstrates the power of these new Web Component standards.

Sunday, September 14, 2014

JobServer Release 3.6.14

We are happy to announce the release of JobServer 3.6.14, which introduces LDAP support and improved shell script processing, allowing you to turn any standalone program or shell script into an easy-to-automate and easy-to-track application. Yes, with JobServer you can give your shell scripts and standalone batch programs a GUI front-end, customize them, and leverage powerful reporting and monitoring to easily track all of their input and output.

With this release, JobServer now supports improved tracking of shell script output via the JobServer JobTracker reporting and tracking application. You can now preview the standard output of every shell script right from the top-level JobTracker search report. You can also run shell script jobs manually and pass custom input parameters to them. Using JobServer with batch scripts just got a whole lot more fun and productive.

Want to simplify user authentication for you and your JobServer end users? With LDAP support, you can now integrate JobServer with your Active Directory or other LDAP-compatible environment for more seamless user authentication.

Download and test drive JobServer 3.6.14 today, and learn more about JobServer's powerful developer SDK, soafaces, which makes extending and customizing JobServer and developing custom jobs and backend automated services easier.

Grand Logic delivers software solutions that automate your business processes and tame your IT operations and Big Data analytics. We provide data and job automation software, along with Hadoop and predictive analytics consulting services, to maximize your Big Data investment.