Cloud Analytics & ML with Sam Taha

Thursday, August 9, 2018

Choosing Between Spark ML, scikit-learn, and DNNs

Now these aren't the only considerations when deciding on how to build your data science stack and the related tooling you will need around it, but it is a place a lot of organizations tend to begin their opening questions. Sometimes the answer may be, all the above. But you have to first reflect on your organizations goals and the level of your investment in any transformation effort, especially one that involves such a fundamental shift in how you to turn data into business value.

There are a number considerations that can influence your data science architecture that should be examined before establishing your AI platform. They include:

ETL and data prep tools? AI does not work without data. Find it, mine for it and create it.
Cloud, on-prem or hybrid for building your data science stack?
How big is your data? Really how big is your data? Not everyone has "big data".
What are you modeling? What kind of outcomes are you looking to solve for?
Build, buy, partner. What kind of skills do you want to invest in for in-house data science, ML engineering and ML operations?

The bullets above are only touching on much deeper considerations that need to be assessed by any organization looking to transform their business with AI. But let's step back a bit and just discuss the question posed by the title of this blog to avoid turning this blog into a long drawn out analysis that goes down too many rabbit holes.

Spark ML

It is natural for a lot of organizations who have been doing "Big Data" to get their first exposure to data science through Spark's MLlib. Spark ML is a nice module/framework the comes with Spark and comes packaged with most major Hadoop distributions. The ML APIs and algorithms include many of the popular model building options from decision trees, to survival analysis (time-to-live), to allowing you to build recommendations engines (ALS), to unsupervised learning with clustering and topic modeling. Spark ML is nice and convenient for those coming from the Big Data universe. One nice advantage is that you can often leverage Spark's inherent distributed architecture to build models that can operate at large petabyte scale when needed. Is Spark ML ideal for all data sources and outcome objectives and it is the most efficient (you can hack DNN into if you have the stomach for it) - the answer as you might guess is obviously no. Why, well that is for another day to dive into, but suffice it to say that it may always be the most accurate way to build models and may not always be the best bang for CPU/GPU buck.

scikit-learn

Then there is good old scikit-learn. Any Python developer with a math or data background or has done any statistical modeling (or ML work) will know and likely love scikit-learn and all the other related Python packages such as numpy, scipy and pandas to name the most popular. scikit-learn is a treasure trove of algos and APIs. It is an awesome framework for ML developers and data scientists. Does it scale in same ways that Spark can - unfortunately no. But do you always really need it to? Look at your data before you answer that.

Deep Neural Nets

Then there is the new kids on the block, DNNs (back from the future). Tensorflow and PyTorch just to name a couple of the most popular are claiming to be universal function approximators that can model anything and solve for everything. Note, you will need to bring data and lots of it. They are data hungry. They can solve anything from classifications to generating word embeddings to creating generative models. There isn't much a DNN and its offshoots can't do theoretically. Through their natural fit with GPUs, they can scale fairly efficiently, and you can sometimes sort of distribute them with some extra heavy lift.

Taking your Models Live

A lot of what we just reviewed is about building and training models. Now, how do then take what we just trained and turn it into a service that predicts, classifies or generates data? That is also a topic unto its own. Operationalizing machine learning models can be non-trivial but it can also be not so difficult at times. It just depends on the model you are creating. For example, sometimes discrete bounded models can just be exported into a database, but often times the solution (input and output space) is not finite and requires creating distributed your build models as inference engines - and that is a bit more work. Then there is the nagging issue of how and when to update your models. Again another subject all together.

Buy or Build

So should you build or buy? The big boys (google, aws, azure) are all making a lot of what we just described available as MLaaS offerings (to various degrees of completeness). So stay tuned and current as the AI technology world is changing fast.

Saturday, February 10, 2018

Don't Build your NLP Bot like a GUI

There has been an ongoing buzz about conversational user interfaces (CUI) and how they can enhance human-to-computer interaction. Many of us have experienced them to some extent both in play with voice interfaces such as Siri and Alexa and at work with messaging interfaces in tools like Slack. When CUIs are infused with a good dose of NLP (and context aware, machine learning..etc), they have the potential to become more than just a primitive command line or simple minded messaging/voice interface. The potential for building NLP powered virtual assistants and benefiting from the productivity that they can provide does exist, but you will have to get out of your comfort zone as a developer.

The Mental Leap to CUIs

Building CUIs using NLP and AI can take some getting used to when you have spent your career or education building graphical user interfaces (GUI). Be wary of getting caught in the trap of building your natural language interactions like you would build a GUI. A virtual assistant, like how some NLP bots function, can behave in many of the same ways you would interact with another human that would be working on your behalf. Now, a GUI is a very capable and functional interface, but is not a virtual human. And you have to keep reminding yourself that the bot you are building needs to behave like a virtual human, because it is easy to get caught in the trap of trying to build your NLP virtual assistant like you would a traditional GUI. It ends up being the worst of both worlds if you end up doing that.

Not that there is Anything Wrong with GUIs

Now there is nothing wrong with GUIs. For some tasks and functions they will always be superior to natural language. But for many things we do in our daily lives, language is a superior medium especially if the engaging entity is intelligent.

Fundamentally how you would ask (via text or voice) a virtual assistant to perform an action or make an inquiry on your behalf is different than using a GUI. The way you reach a decision or action point in a CUI can be different than a GUI. Let's take a very simple example to compare.

Say you are using a business application to manage pending requests that you must take action upon on a daily basis. And there are different types or classes of requests you must review. Some requests are specific to you, some to your direct team members, and other requests are company wide actions you must review, but you are required to review and make decision/action on all of them at some point during the day or week.

In a GUI you might navigate to the screen and see some quick filters that let's you select which requests types to view or all requests might be shown in some sorted of grouped order on a scrolling page. And you would navigate through the information to decide what to do first. There are many ways to represent the request and perform filtering and sorting. It also depends on the your preferences for what tasks you like to tackle first and how you manage your day. The ways to make the GUI optimal for your particular usage pattern depends on many factors, many of whom are specific to you. This is where GUIs begin to breakdown. They can sometimes overwhelm us with information or not adapt to our unique usage patterns - basically they lack the ability to easily adapt to the extreme personalization you can achieve with a CUI.

All Things being Equal CUIs Win

Now this all depends on your implementation of the GUI and CUI. You can obviously build a horrible CUI and a wonderful ultra personalized GUI, but all things being equal I claim that a CUI will always be superior. The advantage the CUI has is that the user experience does not have to change significantly to improve personalization. The inherent nature of conversational interaction is something natural to all humans. So as a developer, as you improve the conversational flow of your virtual assistant, and as you release new versions of the bot, the user can adapt much more easily because the flow is fundamentally the same, the interaction is still a conversation and it is the bot that is getting smarter/better and always assisting you through the same conversational experience to help you accomplish your objectives.

So let's go back to our request/action application we described earlier. In the CUI case, a smart virtual assistant might look at all the pending requests and tell you have many pending requests of different types and suggest you start with your direct reports first, since there is only one of those, if that was the case. The ability of the CUI to branch off in different directions is much more dynamic than what a GUI can do, and this can be done in a way that does not force the user to learn a brand new interface with each new release of the software, since the virtual assistant is your interface and your personal guide at the same time. The CUI sort of has built in help since it is a virtual assistant by design.

Mixed Mode CUI and GUI

Many of the popular bot platforms such as Slack and Facebook Messenger have incorporated some very convenient visual GUI components into their messaging and flows which allow developers to mix conversation question/answer dialog with short-cut GUI actions and interactions. This can be great way to meld conversational with GUI, but at the same time it can get CUI developers sucked back into the GUI world, and get developers focusing too much on injecting too many GUI interactions into the CUI flow. So use these tools wisely and keep the interactions focused on the conversational flow between the user and the human like virtual assistant.

Grow your Bot Like You are Raising a Child

So be careful, always think like you are building a smart virtual assistant not a GUI. You will not have all the smarts built into the bot from day one, but make that your mission to keep the conversational flow focused on the interactions and dialog between the human user and the virtual assistant, and everything else will follow as your bot gets smarter.

Saturday, February 3, 2018

Can AI Put the Human Back into HR?

Over the past couple of decades Human Resources (HR) has moved from paper and manual processes to more automation and providing employees with more and more web and mobile powered experiences. But has HR lost its human essence and personalization in the process? Nowadays managing a company's "human" assets seems to involve less and less human interaction between employees and the actual people in the HR organization. Try finding an HR person when you need one these day!

In the Beginning there were Humans

I recall my first few jobs at both major corporations and at startups, there was always a HR person I could reach on the phone, or simply walk over to their office to get small and big matters resolved. HR personnel where typically visible and accessible. Human Resources representatives interacted at a personal level with employees and often knew you on a first name basis.

It seems with more technology that HR has lost its human to human interaction and become more impersonal and mechanical. Today, if you need some payroll or benefits issue resolved, you typically need to submit a "ticket" in some online system and wait for someone (you have never met) to contact you back over email or if you are luck over the phone. Or if you are lucky you can click your way through a labyrinth of HR GUI applications to find what you need.

Virtual Assistants are the New Humans

So what is the solution, more or less technology? Maybe the answer is better technology. AI powered HR virtual assistants have the potential to be your personalized guide to do everything from answering general HR questions to requesting time-off and helping guide you to finding the information or actions you need to take quickly. Machine learning powered intelligent assistants could help resolve problems and questions with your payroll, for example. Interactive NLP powered bots could help guide you through, an often times, complicated benefits and open enrollment process by knowing your preferences and history to match you with the optimal recommendations for your particular situation.

The Future is an AI Powered Voice and Messaging-First World

The future of HR is not more technology, but more intelligent technology powered by machine learning, natural language processing, personalized recommendations engines, and other AI enabled technology that bring hyper-personalization and a human-to-computer interaction model that goes beyond impersonal graphical user interfaces. Voice and messaging-first interfaces (endowed with NLP) are a step in the right direction and can bring back a bit of humanity to your technology overloaded workplace.