tag:blogger.com,1999:blog-26464625711244763262024-02-19T11:16:35.958-08:00Cloud Analytics & ML with Sam Taha | Adventures in Big Data, Analytics and MLUnknownnoreply@blogger.comBlogger98125tag:blogger.com,1999:blog-2646462571124476326.post-20670731112718772592022-11-22T04:30:00.006-08:002022-11-22T04:42:53.480-08:00The Web3 and Blockchain Wasteland<p> </p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0Y0KQqb8pv32OinZduqU9PeZSM4fd9Aet22egjtcTY9b9Jjr_jz6qs0NeRZHVSmo0PwW9UavJD6244H5Y_cBoxjOlhGwQybBWaplCRBvfwOSiI_zCV7KbxLienwSqryAjQKg1x7dYBfL9_xZxCHDgtgRJGOwEn0qydF1mogF6FgYgnHv59eyvdreP/s1200/bitcoin-radioactive.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="650" data-original-width="1200" height="216" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0Y0KQqb8pv32OinZduqU9PeZSM4fd9Aet22egjtcTY9b9Jjr_jz6qs0NeRZHVSmo0PwW9UavJD6244H5Y_cBoxjOlhGwQybBWaplCRBvfwOSiI_zCV7KbxLienwSqryAjQKg1x7dYBfL9_xZxCHDgtgRJGOwEn0qydF1mogF6FgYgnHv59eyvdreP/w400-h216/bitcoin-radioactive.jpg" width="400" /></a></div><p></p><p>Bubbles happen. But you hope what is left behind after the bubble supernova is at least useful to science or society. Like the dot.com bubble for example - we at least ended up with the web and internet/mobile innovations after all the dust cleared. There are other bubbles brewing with no clear end game as well such as fusion technology and self driving cars to mention a few but these as well will leave useful science behind as they cycle from winters and summers. Unfortunately with crypto, blockchain and web3 what will be left behind is only radioactive waste and greed. </p><p>Other than ransomware, war zones and failed states where have digital coins been used for commerce? And no, saying “but blockchain is awesome” doesn’t fly either. Ask the same question about the technological mythical Satoshi Nakamoto blockchain marvel. 
Decentralization is great in theory to free us of the shackles of centralized control - but rule by the mob and the digital immutable consensus powered blockchain ledger of decentralized servers and exchanges (btw still ruled by a few) is not going to lead to a happy ending.</p><p><a href="https://americanaffairsjournal.org/2022/11/web3-the-metaverse-and-the-lack-of-useful-innovation/">https://americanaffairsjournal.org/2022/11/web3-the-metaverse-and-the-lack-of-useful-innovation/</a></p><p></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-89837153950373745462021-12-31T08:30:00.007-08:002021-12-31T08:36:11.994-08:00Machine Learning is Not AI<p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEjSB2lwWhkiFyoakziNj27A_YCc8k4EbkJLNSpbCkJ3cQFdxUVfrx_u4v5p6UtW04_ZD1zuv0MQKaFZ3Ea7ka7JDEP3lygacoIZoTMECxf9j-POD8wHcWSOy9diqRwsjUthLsYnaOQtRCFh3ixh3AdecEbKFCdMSOM2a4NYfkFrYzLnQmOD0m0Xl_wN=s2496" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1430" data-original-width="2496" height="366" src="https://blogger.googleusercontent.com/img/a/AVvXsEjSB2lwWhkiFyoakziNj27A_YCc8k4EbkJLNSpbCkJ3cQFdxUVfrx_u4v5p6UtW04_ZD1zuv0MQKaFZ3Ea7ka7JDEP3lygacoIZoTMECxf9j-POD8wHcWSOy9diqRwsjUthLsYnaOQtRCFh3ixh3AdecEbKFCdMSOM2a4NYfkFrYzLnQmOD0m0Xl_wN=w640-h366" width="640" /></a></div><p></p><p>We need to stop referring to today's machine learning as AI. It is marketing techno spin no more and no less. There is nothing intelligent about it. We are nowhere near general or even narrow artificial intelligence. </p><h3 style="text-align: left;">Deep Learning has Come Far </h3><p>Machine learning, largely driven by deep learning has fueled an amazing explosion of solutions from image recognition to game play to language parsing and analysis, however none of this approximates human intelligence. 
Much like engineered systems, there need to be deeper structures and abstractions to model the world with, and not just more parameters, layers and experiments with activation functions. </p><h3 style="text-align: left;">More is Not the Answer <br /></h3><p>Just adding more neurons and more deep layers, experimenting with activation functions and hyper-parameters, or understanding bias vs variance is not going to get us to anything like intelligence. These concepts are all useful for better machine learning and for preparing data to achieve the best predictions possible given the data available, but this is not AI. As an engineer I find it amusing to sit there and fiddle with an LSTM to get it to predict temporal events given a set of historical events and independent variables. You definitely get statistical insights into the problem you are modeling (if your data is in good shape, that is), but again this is not intelligence.</p><p>GPT-X and transformers are an example of how not to do intelligence, with billions of parameters and massive power consumption. There have been some cool results and solutions that have come out of all this, but it is brainless black box pattern recognition and association - nothing wrong with that - but don't call it AI. 
Without incorporating higher order abstractions including causality and relationships, I don't see how today's ML and DL can be called AI.<br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-36767813277734379162021-11-26T12:35:00.009-08:002022-01-03T15:59:28.320-08:00Modern Cloud Data Lake/Warehouse: Don't get Locked-in All Over Again<p style="text-align: left;"></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmHE7c8W61Wdb2z6lKwTrlt6siCHQn1d82R88nx2TkkREnf9_jo9PeajHVBZvDFU6Fa4766sswNcRF_0O8LOOi6w4jL5S4TQSHhJE9TxTj6Y3W8RcqS5RPQiNfjmniXMqqgTf1V2bYcSE/s1974/Screen+Shot+2021-11-26+at+2.31.35+PM.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="792" data-original-width="1974" height="257" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmHE7c8W61Wdb2z6lKwTrlt6siCHQn1d82R88nx2TkkREnf9_jo9PeajHVBZvDFU6Fa4766sswNcRF_0O8LOOi6w4jL5S4TQSHhJE9TxTj6Y3W8RcqS5RPQiNfjmniXMqqgTf1V2bYcSE/w640-h257/Screen+Shot+2021-11-26+at+2.31.35+PM.png" width="640" /></a></div><p></p><p style="text-align: left;">When relational databases, data warehouses, and data marts took root in the late 1990s, our data and our database systems were just down the hall in a rack sitting in a server room. Our data resided on our own property and in server rooms we controlled. While we had physical control over our data, we were tightly bound to the database vendor's proprietary software/hardware systems, storage formats and SQL dialect. Everything from SQL dialects to storage formats was at the time much less standardized than what we have today. 
</p><div style="text-align: left;"><span style="font-size: medium;"><b>Proprietary Storage Engines </b></span></div><div style="text-align: left;"><p>We were entrusting our data storage and query engine interfaces to a software vendor (at the time Oracle, Sybase, IBM Informix, SQL Server...etc). Our data was in a proprietary storage format controlled by the database software vendor, and we were at their mercy for future support and licensing to keep access to our data.</p></div><div style="text-align: left;"><b><span style="font-size: medium;">Colocation and Hosting<br /></span></b></div><div style="text-align: left;"><p>As the internet grew, along came specialized hosting data centers. We moved our OLTP databases, OLAP data warehouses and our app servers to secured cages sitting in a remote, climate controlled and network optimized shared complex. We owned the hardware, but the data was still in proprietary storage formats managed by software from the big database vendors like Oracle, SQL Server, Sybase...etc. With this transition our data moved a bit further from our control, since we were giving up some physical control and access to the infrastructure for the benefits of colocation.</p></div><div style="text-align: left;"><div style="text-align: left;"><b><span style="font-size: medium;">Moving to the Public Cloud<br /></span></b></div></div><div style="text-align: left;"><p>Then came the public cloud and infrastructure as a service. We moved our database systems onto virtual hardware and managed storage and networking controlled by a cloud provider. We no longer owned or controlled the physical infrastructure or managed a physical space in a data center. This had many benefits for provisioning infrastructure and remote/automated management, and it brought the virtually unlimited incremental scalability of cloud compute and storage. However, our data is still locked up in proprietary storage engines. 
Either we are using our own Oracle or Teradata software licenses on virtual machines in the cloud, or we are using more cloud native data warehouse services such as Redshift or BigQuery.</p></div><div style="text-align: left;"><p>Why does all this history matter? The less control we have over our data systems, the more restrictions we will have on future opportunities for using the raw data, metadata and related processing logic (e.g. SQL, DML, UDFs...etc), not to mention on managing costs and licensing. When your data is stored in a vendor's storage engine (Oracle, Teradata...etc), it is stored in their proprietary format and you are typically limited to using only their query engine and tooling to access it. </p></div><div style="text-align: left;"><p>When your servers are in a remote site outside your property, you rely on the data center for security and management. Then when you host your servers in the public cloud, there are additional layers of software involved, from compute/storage virtualization to shared infrastructure services. These all add to the loss of control over your data and your ability to access and utilize it without paying a service fee of some kind. This loss of control can mean a lack of options (end of life scenarios, for example) that impacts portability, scalability and overall cost management. </p></div><div style="text-align: left;"><div style="text-align: left;"><b><span style="font-size: medium;">The Rise of Cloud Data Warehouse Vendors</span> </b></div></div><div style="text-align: left;"><p>Your dependence on a cloud provider and database software vendors (sometimes one and the same) in how you use your data needs to be front of mind in your cloud data warehouse architecture. 
While it is not practical to have 100% portability across cloud infrastructure providers or software vendors, you need to consider how best to leverage open source and storage standards and keep the door open to hybrid cloud options (or cloud provider portability) whenever possible when it comes to your data platform. I am a firm believer in making these conscious decisions upfront; otherwise you will just be repeating the history of the last two decades of Teradata, Netezza and Oracle type lock-in, which many enterprises are trying to unwind today.</p></div><div style="text-align: left;"><p>The lock-in scenarios with proprietary data warehousing storage engines have not changed much with the public cloud providers. Data warehouse engines such as Redshift and BigQuery still store the data in proprietary formats. They offer much greater data integration flexibility with other cloud services than legacy data warehouse vendors do, but you are still at the mercy of their proprietary storage. <br /></p></div><div style="text-align: left;"><span style="font-size: medium;"><b>Does Your Cloud DW Reside in Your Cloud Account</b></span><span style="font-size: medium;"><b>?</b></span></div><div style="text-align: left;"><p>There are now newer players in the cloud data warehousing space, with Snowflake as the leader and others coming online to provide Data-as-a-Service solutions. With SaaS data warehousing providers, your data resides in S3 but is controlled by a data systems SaaS provider such as Snowflake in a different AWS account (or different Azure account). This does not bring you any more control over your data. With the SaaS data-as-a-service vendors you are still at their mercy and locked in, and even worse, your data is not residing in your cloud account. It is one thing to have your data in the public cloud; it is another to have your data in another AWS account. 
This can be fine for some enterprises, but with solutions such as Snowflake you need to clearly understand what kind of control (for better or worse) you are delegating to your database vendor.</p></div><div style="text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpAIuePDsnhFC1bsQIYfGCYvjwWeftaYD5pPlHiSgYAq7eCMAjAPN_gXDE0ke2EDmMHH8NuBlxMK6ozg8zpRwo-qoaL64ZtTSJK8Q2vXk2PzhqbfZzTj3AvneFk1X_QYNqvf0FzFHB_2c/s2526/Screen+Shot+2021-11-26+at+2.34.02+PM.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1020" data-original-width="2526" height="161" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpAIuePDsnhFC1bsQIYfGCYvjwWeftaYD5pPlHiSgYAq7eCMAjAPN_gXDE0ke2EDmMHH8NuBlxMK6ozg8zpRwo-qoaL64ZtTSJK8Q2vXk2PzhqbfZzTj3AvneFk1X_QYNqvf0FzFHB_2c/w400-h161/Screen+Shot+2021-11-26+at+2.34.02+PM.png" width="400" /></a></div><div style="text-align: left;"><br /></div><div style="text-align: left;"><span style="font-size: medium;"><b>So what is the solution to this lock-in?</b></span></div><p style="text-align: left;">Big Data with open source Hadoop attempted to address the proprietary lock-in problem. In the late 2000s, Hadoop took off and started to at least put some of your data into open standard data formats and on commodity hardware with less vendor lock-in (to a fair degree). While Hadoop had its challenges that I won't get into here, it did usher in a new era of Big Data thinking and a Cambrian-like explosion of open source data technology and democratization. Data specs such as ORC, Avro and Parquet and distributed file systems such as HDFS gave transparency to your data and modularity to managing growth and costs. You no longer depended exclusively on proprietary data storage engines, query engines and storage formats. 
So with Hadoop at the time we could claim to be gaining some degrees of freedom and improved control over our data and software.</p><p style="text-align: left;">Now on-premises Hadoop is dying off (it is dead for the most part), and cloud storage engines and data lakes are taking over. Many of these cloud native storage solutions and data lake storage engines have adopted the open data standards that grew up around Hadoop (Parquet, Avro, ORC, Snappy, Arrow...etc). These cloud native data lake house products can keep you close to your data. Solutions such as Athena, Presto, and managed Databricks let you manage your data in open data formats while storing it on highly elastic and scalable cloud object storage.<br /></p><p style="text-align: left;">However, other cloud data warehousing vendors have emerged and are bringing back the lock-in, meaning your data resides in proprietary storage behind proprietary query engines, and in some cases outside your cloud account. Products such as Redshift, Snowflake, BigQuery and Firebolt each have pros and cons in the type and level of lock-in they impose.</p><div style="text-align: left;"><span style="font-size: medium;"><b>It's All About the Data Lake<br /></b></span></div><p style="text-align: left;">Many of these engines do offer decent integration with open standards. For example, Redshift, Snowflake and BigQuery all allow fairly easy ingestion from and export to open data formats such as Parquet and ORC. Lock-in is not a bad thing if the solution rocks and is cost effective in the long term. Sometimes specialized proprietary compression and unique architectures do things not possible with the open standards of the present day. You be the judge. 
Or just let your successor in four years deal with it :)</p><p style="text-align: left;">The one bit of advice I would give when building a cloud data platform is to always base your architecture on a data lake house foundation using open data storage standards, elastic cloud storage and a distributed SQL query engine. Your choices of Redshift, Snowflake, BigQuery and other downstream storage engines and analytics tools are critical but secondary.<br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-32798882998635070862021-04-15T21:18:00.013-07:002021-04-15T21:27:27.808-07:00Data Driven vs Data Model Driven Company<p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgKXj4g7zk4N-XawSe2Ieqf-eqOzRJpqNBUYh2-L81YqkG_5CitFITkAVmVufab-WZLQYhHkjxVcFK4DM_lMgEVV2YLdrZ9PzHpTFZDaU0ZUyJDDTmDJzkJb7OUHZ5djQXZ5hZDRM9T1_0/s899/no_datalake_dumping.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="852" data-original-width="899" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgKXj4g7zk4N-XawSe2Ieqf-eqOzRJpqNBUYh2-L81YqkG_5CitFITkAVmVufab-WZLQYhHkjxVcFK4DM_lMgEVV2YLdrZ9PzHpTFZDaU0ZUyJDDTmDJzkJb7OUHZ5djQXZ5hZDRM9T1_0/s320/no_datalake_dumping.jpg" width="320" /></a></div><p></p><p><span color="rgba(0, 0, 0, 0.9)" face="-apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif" style="background-color: white; font-family: arial;">Somehow along the way data lakes got the rap that you can dump "anything" into them. I think this is a carry-over from the failed hippie free-data-love days of Hadoop and HDFS. 
No, a data lake is not a place where you dump any kind of JSON, text, XML, log data...etc and just crawl it with some magic schema crawler, then rinse and repeat. Sure, you can take the approach of consuming raw sources and then crawling them to catalog the structure. But this is a narrow case that you do NOT do in a thoughtless way. In many cases you don't need a crawler. </span></p><p><span color="rgba(0, 0, 0, 0.9)" face="-apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif" style="background-color: white; font-family: arial;">Now with most data lakes you do want to consume data in raw form (ELT it, more or less), but this does not mean just dumping anything. You still must have expectations on structure and data schema contracts with the source systems you integrate with, including dealing with schema evolution and partition planning. Formats like Avro, Parquet and ORC are there to help you transform your data into normalized and ultimately well curated (and DQ-ed) data models. 
Just because you got a "raw" zone in your data lake does not mean your entire data lake is a dumping ground of data of any type or your data source structures can just change at random.</span></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiEVhBxMTTtStPlSpj5M3JBVo0Yy2aK0DTqDFRJERfxoCoHMISh7ZHW-p9dhZuL1AYwz0SStbXDUSFmsrSRtLsXyZKsV-jNhbMJb3iV0vVzoGeXZV3GeJD5LynuIfg3lY8CqyExEXxIitk/s800/mirical_data_science+.jpeg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="292" data-original-width="800" height="146" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiEVhBxMTTtStPlSpj5M3JBVo0Yy2aK0DTqDFRJERfxoCoHMISh7ZHW-p9dhZuL1AYwz0SStbXDUSFmsrSRtLsXyZKsV-jNhbMJb3iV0vVzoGeXZV3GeJD5LynuIfg3lY8CqyExEXxIitk/w400-h146/mirical_data_science+.jpeg" width="400" /></a></div><p></p><p><span color="rgba(0, 0, 0, 0.9)" face="-apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif" style="background-color: white; font-family: arial;"><span style="color: rgba(0, 0, 0, 0.9);">Miracles required? This is what most of today's strategic AI and even BI/Analytics engineering and planning looks like. If you don't have your data modeled well and your data orchestration modularized and under reins then achieving the promise of cost effective and maintainable ML models and self-service BI is a leap of faith at best. 
Forget about being a data-driven company if you are not yet a data-model-driven company.</span></span></p><p><span color="rgba(0, 0, 0, 0.9)" face="-apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif" style="background-color: white; font-family: arial;">A data lake is a modern DW built on highly scalable cloud storage and compute and based on open data formats and open federated query engines. You can't escape the need for well thought out and curated data models. It does not matter whether you are using Parquet and S3 or Snowflake and Redshift. Data models are what make BI and Analytics function.</span></p><p><span color="rgba(0, 0, 0, 0.9)" face="-apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif" style="background-color: white; font-family: arial;"><br /></span></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-44712657833275929402021-01-21T06:58:00.001-08:002021-01-21T06:58:43.679-08:00The AI Lesson for All of Us<p><span color="rgba(0, 0, 0, 0.9)" face="-apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif" style="-webkit-text-stroke-width: 0px; background-color: white; display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-decoration-thickness: 
initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"></span></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQKJ4Gv9vvWBB1ph9JjCmqpQtsRWhm9oJhvlO65Akpo-hEPBpY6S-lw3VHjzNb2o6rbjbpr-S3YY0wyTtTMUDEuH6hrczJbUC2lOtTl7mv56MoWv2VE0iGgs4esk5_kUFGeZAu0gc6faU/s800/human-robot-hand-head-header.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="249" data-original-width="800" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQKJ4Gv9vvWBB1ph9JjCmqpQtsRWhm9oJhvlO65Akpo-hEPBpY6S-lw3VHjzNb2o6rbjbpr-S3YY0wyTtTMUDEuH6hrczJbUC2lOtTl7mv56MoWv2VE0iGgs4esk5_kUFGeZAu0gc6faU/w640-h200/human-robot-hand-head-header.jpg" width="640" /></a></div><br /><p></p><p><span color="rgba(0, 0, 0, 0.9)" face="-apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif" style="-webkit-text-stroke-width: 0px; background-color: white; display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-decoration-thickness: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">There is no doubt that the brute force ML (aka deep learning) approach to achieve general AI or some level of human decision making by using more and more compute and more data has been successful over the past decade. 
</span></p><p><span color="rgba(0, 0, 0, 0.9)" face="-apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif" style="-webkit-text-stroke-width: 0px; background-color: white; display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-decoration-thickness: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">I am fond of believing that there is more to AI than optimizing an objective function with more data and better hyper parameters - for example, integrating symbolic AI, knowledge graphs, causality...etc. However, trying to build systems to think the way we think we think may not be the future of AI, at least not yet. 
</span></p><p><span color="rgba(0, 0, 0, 0.9)" face="-apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif" style="-webkit-text-stroke-width: 0px; background-color: white; display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-decoration-thickness: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">There is likely something beyond just bigger deep learning models - maybe it is software program synthesis or other genetically founded approaches - no one knows, as there is not enough research in these areas yet. But some form of AI is already here: self driving cars already construct and use 3D world models and combine hand crafted rules with deep learning analysis of sensor data to give us the perception that AI decision making is going on. Efficiency also matters as we get into bigger and bigger models with billions of parameters. It is no joke how much energy (compute resources) the training of many of these models (e.g. GPT-3) requires. 
It is important to make sure we separate the hype (companies selling us on autonomous cars vs the value of some useful ML driver assistance) as companies use the AI hype to raise more capital but the reality is not aligned with the capabilities of generalized AI, at least in this current age of AI.<br /></span></p><p><span color="rgba(0, 0, 0, 0.9)" face="-apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif" style="-webkit-text-stroke-width: 0px; background-color: white; display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-decoration-thickness: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">ML </span><span color="rgba(0, 0, 0, 0.9)" face="-apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif" style="-webkit-text-stroke-width: 0px; background-color: white; display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-decoration-thickness: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"><span color="rgba(0, 0, 0, 0.9)" face="-apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", 
"Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif" style="-webkit-text-stroke-width: 0px; background-color: white; display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-decoration-thickness: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">algorithms from the likes of </span>Youtube and Facebook already manipulate our digital lives and behaviors with massive data they collect about us. Maybe AI is already here and in control and we are just the data simulation to generate more data for our AI overlords :) Anyway, my main point with sharing this post to share the <a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">post from Sutton (The Bitter Lesson)</a> is to make us think about the data we control in business and enterprise world. Curating our data and more of it is what will still continue to drive ML and AI for the foreseeable future. 
So make sure to get your data quality and your data lakehouse BI/analytics in order ;)</span></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-41294564968248485322020-12-16T16:44:00.008-08:002020-12-17T10:13:40.947-08:002021 Data and Analytics Predictions<p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixUiFWf8VlD1lPBSogkVOk_jhDxc49nURKAWobcmeKRzvhjHraQNAosAw7Akn5wDiIVLer_B24JL59jsAzQpNTwD4U-rbHNlVzwMrjCamJUZ__-KfQN790pvqCs2goWX7NhgaagzOmenA/s800/Prediction-1.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="400" data-original-width="800" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixUiFWf8VlD1lPBSogkVOk_jhDxc49nURKAWobcmeKRzvhjHraQNAosAw7Akn5wDiIVLer_B24JL59jsAzQpNTwD4U-rbHNlVzwMrjCamJUZ__-KfQN790pvqCs2goWX7NhgaagzOmenA/w400-h200/Prediction-1.jpg" width="400" /></a></div><span color="rgba(0, 0, 0, 0.9)" face="-apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif" style="-webkit-text-stroke-width: 0px; background-color: white; display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"></span><p></p><p>Cloud data platforms really gained momentum in 2020. It has been a real breakout year for both cloud data lakes and cloud data warehouses (yeah, I am making a distinction). Cloud data warehouses started several years ago with Redshift and the first iteration of BigQuery. 
Databricks, AWS, Presto and others re-established the data lake in the cloud and made it very SQL friendly. Redshift and BigQuery have improved and made it easier to query external data lake storage directly (partitioned Parquet, Avro, CSV, etc.) and started to blend data lakes with data warehouses (somewhat). And to top it off this year, Snowflake put a massive stamp on everything with its financial market boom and accelerating adoption.<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrZ_lkFPRRiFuCXqYmzAnF8F2G5x8_OfyVgMiHxMLoxoC0qoREyxLgefo0OBwuvV1EwQTMNt8DHTmREmddmEiKs6pxkfcxqXsUvDl6uFBE85Fz_yCC7Pa9h4hvBy75ZKVoyvqE6bogyFU/s696/data_viz.jpeg" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="483" data-original-width="696" height="222" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrZ_lkFPRRiFuCXqYmzAnF8F2G5x8_OfyVgMiHxMLoxoC0qoREyxLgefo0OBwuvV1EwQTMNt8DHTmREmddmEiKs6pxkfcxqXsUvDl6uFBE85Fz_yCC7Pa9h4hvBy75ZKVoyvqE6bogyFU/w320-h222/data_viz.jpeg" width="320" /></a></div><p></p><p>But we are still in the early days of the cloud data platform journey. We have a ways to go. Even in the cloud, many of the solutions mentioned, along with others, still lock you into their proprietary data walled gardens. In 2021 we will begin to see the next evolution of cloud data lakes/warehouses. It is not enough to separate compute from storage and just leverage the endless sea of elastic cloud storage and object storage. While this is an important step forward for data and analytics platforms, we need to go still further. We need to separate the query engine itself from the data and storage. 
This is the next step and it will be guided in part by leveraging data virtualization and establishing the physical storage structure of the data itself upon open standards.<br /></p><p>Data virtualization will gain more traction (especially in the cloud) and begin to eclipse and encompass data warehousing, in particular for low-latency BI and analytics, where it already plays a big role. Minimizing data copying in your data lake/warehouse is important, especially for the semantic and BI layers in your lake, which often demand highly curated and optimized models.<br /></p><p>The key building blocks will include an open data lake foundation combined with data federation and high performance virtualization query engines coupled with cloud storage. And all on open standards. Think Apache Iceberg, Apache Hudi, Delta Lake, Apache Arrow, Project Nessie and other emerging open and cloud optimized big data standards.<br /></p><p>Solutions such as Snowflake, Redshift, BigQuery, and Databricks are still potential pluggable building blocks, but should not be mistaken for the sole foundation or centerpiece of your cloud data platform, otherwise you will be walling yourself off all over again with another Teradata, Netezza or Oracle, just this time in the cloud.<br /></p><p></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-90135274087922726992020-11-26T08:12:00.004-08:002020-11-26T08:12:43.911-08:00Are Open Cloud Data Lakes the Future?<p> </p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkmmEinUTyY0sxAMVSs8zZ10jjgtsvSBOgPI8VbjzqC66RCklhuWiyvCCRhpslQFnzIdoBZb77Bdm_9FgndxVj8FDgZYoexuWn_jaW74rLxlVBW_FsQXBjBnALezcWAFTrd5NlhLv8hxw/s800/whirlpool.jpeg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="450" data-original-width="800" height="225" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkmmEinUTyY0sxAMVSs8zZ10jjgtsvSBOgPI8VbjzqC66RCklhuWiyvCCRhpslQFnzIdoBZb77Bdm_9FgndxVj8FDgZYoexuWn_jaW74rLxlVBW_FsQXBjBnALezcWAFTrd5NlhLv8hxw/w400-h225/whirlpool.jpeg" width="400" /></a></div><span style="font-family: georgia;"><span style="font-size: small;"><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline !important; float: none; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">Building a cloud data platform? First question: open Data Lake or proprietary DW or maybe a mix of both? Not a simple question or architecture decision to make given the flood of solutions and players in the space, from the large cloud platforms to new entrants such as Snowflake.<br /></span></span></span><p></p><p><span style="font-family: georgia;"><span style="font-size: small;"><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline !important; float: none; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">I see the Fivetran argument from<span> </span></span></span></span><a href="https://www.blogger.com/u/1/#">George Fraser</a><span style="font-family: georgia;"><span style="font-size: small;"><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline !important; float: none; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: 
normal; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><span> </span>that decoupled storage/compute cloud MPP DW engines such as Snowflake are the way to go. On the flip side I also see Dremio's<span> </span></span></span></span><a href="https://www.datanami.com/2020/11/05/data-lakes-are-legacy-tech-fivetran-ceo-says/" target="_blank">Tomer Shiran</a><span style="font-family: georgia;"><span style="font-size: small;"><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline !important; float: none; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><span> </span>argument that an open data lake on open data storage standards (apache parquet & arrow) along with data virtualization is the way to go. </span></span></span></p><p><span style="font-family: georgia;"><span style="font-size: small;"><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline !important; float: none; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">What is the right answer? Well as with most things in engineering and technology there is no one size fits all. I do believe that data virtualization in the cloud along with cloud storage has been a game changer. Presto paved the way with demonstrating that data and query federation is possible, especially in a cloud environment. 
While HDFS/Hadoop largely fizzled for reasons I won't get into here, Parquet, Arrow and other Apache projects have taken off and brought us the modern data lake. Big data for both compute and storage has proved its scale and manageability in the cloud. </span></span></span></p><p><span style="font-family: georgia;"><span style="font-size: small;"><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline !important; float: none; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">How much of your data to keep in a proprietary cloud DW vs an open cloud data lake is an important decision. There is a balance that does not lock you in totally and at the same time lets you use the best technology of the day while managing costs. 
Be wise.</span></span></span></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-85990751019822169112020-08-12T05:24:00.019-07:002020-11-26T07:56:53.103-08:00The Lost Art of Data Lineage<div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiV2a0WSh8OFKa28gy-jddR0Ypl40R91ZuyHP3QSIgYc74wU7jQQEhnWHLm4Av2kFzKIzYJqKMuB-zf3n1BXOrFUdpvwh9TIGnZ5Up2VBUBfBLOers1irwPG6IeQcqSSy0q1K7D8QYCYo4/s512/data-lineage.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="243" data-original-width="512" height="243" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiV2a0WSh8OFKa28gy-jddR0Ypl40R91ZuyHP3QSIgYc74wU7jQQEhnWHLm4Av2kFzKIzYJqKMuB-zf3n1BXOrFUdpvwh9TIGnZ5Up2VBUBfBLOers1irwPG6IeQcqSSy0q1K7D8QYCYo4/w512-h243/data-lineage.png" width="512" /></a><span style="font-family: arial;"> <br /></span></div><span style="font-family: arial;"><span style="font-size: medium;"> </span></span></div><div><span style="font-family: verdana; font-size: small;"><span>Maybe it is more of a mangled and ill-defined art than a lost art. Data lineage is one of the aspects of data governance that gets lost in the shuffle of data analytics and data warehousing/lake projects. It is vital for many reasons, the least of them being compliance and auditing. Often data lineage never makes it onto the train to the final destination when building large scale data warehousing and analytics solutions.<br /><br />Part of the problem is that it gets turned into an all-or-nothing effort, leading to very little getting delivered relating to data lineage, or it gets turned into a mishmash of concepts and solution features. Often it gets dumped under auditing and logging, irrespective of business or technical metadata and real-time vs historical, and then forgotten.<br /><br />Let's quickly break down what data lineage actually entails. 
There are three dimensions of data lineage to consider as part of a general data governance strategy. These include: <br /><br /></span></span><div style="margin-left: 40px; text-align: left;"><span style="font-family: verdana; font-size: small;"><span>1) Logical Data Processing Flows<span> <i>(logical and/or visual DAG representation)</i></span></span><br /></span></div><div style="margin-left: 80px; text-align: left;"><span style="font-family: verdana; font-size: small;"><span>Defining the high-level visual graph and code-module-level relationships between the data processing stages and steps that produce data models.</span><br /></span></div><div style="margin-left: 40px; text-align: left;"><span style="font-family: verdana; font-size: small;"><br /><span>2) Metadata Relationship Management <i><span>(low level data relationships and logic/code)</span></i></span><br /></span><div style="margin-left: 40px;"><span style="font-family: verdana; font-size: small;"><span>Tracking metadata relationships between data models (schemas/tables/columns) and the related source code used in the transformation. This includes showing the transformation logic/code used going from one or multiple source data models to a target data model. </span><br /></span></div><span style="font-family: verdana; font-size: small;"><br /><span>3) Physical Data Processing <span>History <i>(what happened, is happening, and going to happen)</i></span></span><br /></span></div><div style="margin-left: 40px; text-align: left;"><div style="margin-left: 40px; text-align: left;"><span style="font-family: verdana; font-size: small;"><span>At a data set level (sets of records, rows and columns), this shows the record-level and data-set-level linkages between data sets that have happened in the past or will happen in the future. 
There is a temporal (real-time and historical) aspect to this, showing transaction events from one or multiple data sets feeding a downstream data set and the transactional breadcrumbs and events involved.</span><br /></span></div></div><span style="font-family: verdana; font-size: small;"><span> <br />Note that the term <i>data model</i> denotes a static structure (more or less your schema/table definitions) while <i>data sets</i> are live physical structures (the actual data at a record level) across one or multiple data models.<br /><br />Before you get started on your data lineage journey you need to decide to what extent you will implement one or all of these dimensions of data lineage as part of your overall data governance strategy. There are varying degrees of exactness and completeness to each one of them as well. And make sure to keep them distinct. <br /> <br />No one commercial tool will do the complete job. It usually takes a combination of multiple tools, hand-stitched software services, and best practices/conventions to do the job well, depending on your criteria for success.</span></span><p></p></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-82667479314081508262020-07-02T19:04:00.000-07:002020-07-02T19:04:40.611-07:00Tuning the Snowflake Data Cloud<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjz0XGuXu9uTx8by_gej37XDXX3cEU2VNqxUUvGyfNMoZ23a_y5YHVXE1mgseod76bHG5zcCSl2-QttYwQ-hsDQGl-GgjMnqwaJvKk3BCVyaGitmkOwk_IHrv_2aiudwq-tUxsC51vp3qE/s800/snowflake.jpeg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="500" data-original-width="800" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjz0XGuXu9uTx8by_gej37XDXX3cEU2VNqxUUvGyfNMoZ23a_y5YHVXE1mgseod76bHG5zcCSl2-QttYwQ-hsDQGl-GgjMnqwaJvKk3BCVyaGitmkOwk_IHrv_2aiudwq-tUxsC51vp3qE/s320/snowflake.jpeg" width="320" 
/></a></div><div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><br /></span></div><div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">To be clear, I do not classify Snowflake as an OLAP or MPP database. It has these capabilities for sure, but being born in the cloud and only for the cloud, it has much more to offer. I consider it a "Data Fabric". 
Yes, that is a broad term, but Snowflake is really what Big Data and Hadoop were aspiring to achieve, but never did for reasons I won't get into here.</span></div><div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><br /></span></div><div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">What makes Snowflake a game changer for OLAP engines and data warehousing? The features listed below, shown in the diagram, are all true and not just marketing spin. How can Snowflake accomplish this? It is built in the cloud and only for the cloud. What does that really mean? Snowflake takes full advantage of two key superpowers only available in the cloud: 1) elastic and virtually limitless highly durable immutable storage and 2) spinning up virtually limitless compute. 
It starts with these two things, and there is a lot more in Snowflake that delivers the full package.</span></div><div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><br /></span></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjm9f-tJq3cUyZ4MNxgtSNZVKAg6_gaEjSHKIMbS4v4otPi3L1aMJxHxFq3rKmW7uBhmUJlNc9_kpSexRRVZXLt7nOIcPE5YU09-2MjhmKR9GBS3vJ2PfFdQPkPSX8FZ11kPy6c7n43WHM/s638/snowflake_just+works.jpeg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="359" data-original-width="638" height="244" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjm9f-tJq3cUyZ4MNxgtSNZVKAg6_gaEjSHKIMbS4v4otPi3L1aMJxHxFq3rKmW7uBhmUJlNc9_kpSexRRVZXLt7nOIcPE5YU09-2MjhmKR9GBS3vJ2PfFdQPkPSX8FZ11kPy6c7n43WHM/s320/snowflake_just+works.jpeg" width="435" /></a></div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><br /></span></div><div><br /></div><div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; 
text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">If Snowflake<span></span></span><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline; float: none; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><span> </span>has no developer/DBA configurable indexes, partitioning, distribution keys, vacuuming, stats tuning, storage tuning, etc., like other MPP/OLAP engines, is there anything I can really tune (be careful with auto re-clustering)? With great power comes great responsibility. There are multiple things you can do to tune and optimize for performance. This means being careful to monitor and manage costs, because it can be too easy to scale up and out, and this will cost you. From a schema modeling/design perspective there are some optimizations you can do to minimize compute scale up/out requirements (and thus costs). One of them is using cluster/sort keys, one of the few DDL things you can tune at the metadata level. How you use materialized views and manage joins vs de-normalization are also important considerations. All these things are highly dependent on downstream consumption/usage patterns. 
So yes you still need good data engineers and architects :)</span></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-76705029920479573552020-06-11T09:21:00.004-07:002020-06-12T07:42:40.447-07:00Is ML Curve Fitting The Best We Got?<div class="feed-shared-update-v2__description-wrapper ember-view" id="ember1280" style="-webkit-text-stroke-width: 0px; background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; color: rgba(0, 0, 0, 0.9); font-family: -apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; margin: 0px; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; vertical-align: baseline; white-space: normal; word-spacing: 0px;"><div class="feed-shared-inline-show-more-text feed-shared-update-v2__description feed-shared-inline-show-more-text--minimal-padding feed-shared-inline-show-more-text--expanded ember-view" data-artdeco-is-focused="true" id="ember1281" style="-webkit-line-clamp: initial; background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; display: block; font-size: 16px; line-height: 2rem; margin: 0px 16px; max-height: none; max-width: 928px; outline: currentcolor none medium; overflow: hidden; padding: 0px; position: relative; vertical-align: baseline;" tabindex="-1"><div class="feed-shared-text relative feed-shared-update-v2__commentary ember-view" dir="ltr" id="ember1282" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; color: rgba(0, 0, 0, 0.9); font-size: 1.4rem; font-weight: 400; line-height: 
1.42857; margin: 0px; padding: 0px; position: relative; vertical-align: baseline;"><span class="break-words" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; overflow-wrap: break-word; padding: 0px; vertical-align: baseline; word-break: break-word !important;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJwNQrgVK34wylSVN0xkKAxz4bCgebBJNDnbk1phws_rnsBnaWQ1G-nma36ecVpfhVYq7Xm35wU6KaUwJz0dDi5fschI1cYgIh4V1j6uan8RVJJbVE0hSZedvtQSi3-5NBY1YVICo4eno/s328/causality.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="154" data-original-width="328" height="179" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJwNQrgVK34wylSVN0xkKAxz4bCgebBJNDnbk1phws_rnsBnaWQ1G-nma36ecVpfhVYq7Xm35wU6KaUwJz0dDi5fschI1cYgIh4V1j6uan8RVJJbVE0hSZedvtQSi3-5NBY1YVICo4eno/s320/causality.png" width="383" /></a></div><span dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; padding: 0px; vertical-align: baseline;"><br /></span></span></div><div class="feed-shared-text relative feed-shared-update-v2__commentary ember-view" dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; color: rgba(0, 0, 0, 0.9); font-size: 1.4rem; font-weight: 400; line-height: 1.42857; margin: 0px; padding: 0px; position: relative; vertical-align: baseline;"><span class="break-words" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; overflow-wrap: break-word; padding: 0px; vertical-align: baseline; word-break: break-word !important;"><span 
dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; padding: 0px; vertical-align: baseline;">Curve fitting is, for the most part, what most machine learning boils down to, not that that is a bad thing. How do we go beyond the correlation of the black box? I see the rediscovery of symbolic AI and the introduction of causality into purely probabilistic ML as analogous to what happened in software decades ago, when we evolved from assembler and procedural languages and started to model software/data as richer abstractions with relationships. Not the same thing, but a similar evolution in engineering and computer science.</span></span></div><div class="feed-shared-text relative feed-shared-update-v2__commentary ember-view" dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; color: rgba(0, 0, 0, 0.9); font-size: 1.4rem; font-weight: 400; line-height: 1.42857; margin: 0px; padding: 0px; position: relative; vertical-align: baseline;"><span class="break-words" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; overflow-wrap: break-word; padding: 0px; vertical-align: baseline; word-break: break-word !important;"><span dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; padding: 0px; vertical-align: baseline;"><br /></span></span></div><div class="feed-shared-text relative feed-shared-update-v2__commentary ember-view" dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; color: rgba(0, 0, 0, 0.9); font-size: 1.4rem; font-weight: 400; line-height: 1.42857; margin: 0px; padding: 
0px; position: relative; vertical-align: baseline;"><span class="break-words" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; overflow-wrap: break-word; padding: 0px; vertical-align: baseline; word-break: break-word !important;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNiGzx_0AvG6PaemNKBjofisq9E-KGLjeMIOvr1QXEZjtvJ0J1Dhm5NlY_KT5oDQelFJsQ5Msbc4o8ua_xib0ktTSs58-es-hcit3BFTfL4URRFn6JAW9IZxSRR47tGkdFhx-musSfKYQ/s300/causal_balls_imact.jpeg" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="168" data-original-width="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNiGzx_0AvG6PaemNKBjofisq9E-KGLjeMIOvr1QXEZjtvJ0J1Dhm5NlY_KT5oDQelFJsQ5Msbc4o8ua_xib0ktTSs58-es-hcit3BFTfL4URRFn6JAW9IZxSRR47tGkdFhx-musSfKYQ/" /></a></div><span dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; padding: 0px; vertical-align: baseline;"></span></span></div><div class="feed-shared-text relative feed-shared-update-v2__commentary ember-view" dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; color: rgba(0, 0, 0, 0.9); font-size: 1.4rem; font-weight: 400; line-height: 1.42857; margin: 0px; padding: 0px; position: relative; vertical-align: baseline;"><span class="break-words" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; overflow-wrap: break-word; padding: 0px; vertical-align: baseline; word-break: break-word !important;"><span dir="ltr" style="background: transparent none 
repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; padding: 0px; vertical-align: baseline;">Causal relationships exist in the world and can influence how we collect our data and engineer the features that drive our ML model training. This includes everything from how we analyze covariance in the data to how we manage and monitor data distributions. Collecting data and engineering features is not enough. Causal relationships can sometimes be gleaned from the data we observe, but often we must develop experiments and interventions, with A/B test strategies and multi-armed bandit processes, to uncover the causality in order to better train our models. <br /></span></span></div><div class="feed-shared-text relative feed-shared-update-v2__commentary ember-view" dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; color: rgba(0, 0, 0, 0.9); font-size: 1.4rem; font-weight: 400; line-height: 1.42857; margin: 0px; padding: 0px; position: relative; vertical-align: baseline;"><span class="break-words" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; overflow-wrap: break-word; padding: 0px; vertical-align: baseline; word-break: break-word !important;"><span dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; padding: 0px; vertical-align: baseline;"><br /></span></span></div><div class="feed-shared-text relative feed-shared-update-v2__commentary ember-view" dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; color: rgba(0, 0, 0, 0.9); font-size: 1.4rem; font-weight: 400; 
line-height: 1.42857; margin: 0px; padding: 0px; position: relative; vertical-align: baseline;"><span class="break-words" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; overflow-wrap: break-word; padding: 0px; vertical-align: baseline; word-break: break-word !important;"><span dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; padding: 0px; vertical-align: baseline;">Interventions and experiments can help us answer some "what if" questions. Then there are counterfactuals, which are beyond the reach of most experiments. Yet understanding causal relationships has the potential to offer us insights and help businesses make better sense of the world and their opportunities. We need better tools and engineering processes to incorporate these skills into our ML frameworks and ML processes.<br /></span></span></div><div class="feed-shared-text relative feed-shared-update-v2__commentary ember-view" dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; color: rgba(0, 0, 0, 0.9); font-size: 1.4rem; font-weight: 400; line-height: 1.42857; margin: 0px; padding: 0px; position: relative; vertical-align: baseline;"><span class="break-words" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; overflow-wrap: break-word; padding: 0px; vertical-align: baseline; word-break: break-word !important;"><span dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; padding: 0px; vertical-align: baseline;"><br 
/></span></span></div><div class="feed-shared-text relative feed-shared-update-v2__commentary ember-view" dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; color: rgba(0, 0, 0, 0.9); font-size: 1.4rem; font-weight: 400; line-height: 1.42857; margin: 0px; padding: 0px; position: relative; vertical-align: baseline;"><span class="break-words" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; overflow-wrap: break-word; padding: 0px; vertical-align: baseline; word-break: break-word !important;"><span dir="ltr" style="background: transparent none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; font-size: 14px; line-height: inherit; margin: 0px; outline: currentcolor none 0px; padding: 0px; vertical-align: baseline;">This is starting to happen in AI and ML today across disciplines that are applying ML. 
This is a <a href="https://www.inference.vc/untitled/" target="_blank">good article</a> on the topic that I suggest all ML engineers and data scientists read.<br /></span></span></div></div></div><article class="feed-shared-update-v2__content feed-shared-article ember-view" id="ember1283" style="-webkit-text-stroke-width: 0px; background: rgb(243, 246, 248) none repeat scroll 0% 0%; border: 0px none; box-sizing: inherit; color: rgba(0, 0, 0, 0.9); display: block; font-family: -apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; margin: 8px 0px 0px; overflow: hidden; padding: 0px; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; vertical-align: baseline; white-space: normal; word-spacing: 0px;"></article><br class="Apple-interchange-newline" /><br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-87481237380704711882020-06-05T15:02:00.000-07:002020-06-05T15:02:06.232-07:00Choosing an ML Cloud Platform: GCP vs AWS<div>ML cloud services are evolving fast and furious. GCP and AWS are the leading players. 
Here is a quick visual peek at both ML tech stacks.</div><div><br /></div><div>AWS has SageMaker as the centerpiece: <br /></div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYYuQR_pB5A_H-sDhOu_fIpgnKUvj5frfn8A_T8Hq9f1EFs-iJl0rs54s_dNDjMCMLkLKLv3ZnwSfgGH0YpURzHVbiC3-EiBVxOCM6JBVdgevdmnaQKtXz6Xn3UWURuwvIQxZ7GBHO8hE/s800/aws_ml_tech_stack.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="437" data-original-width="800" height="274" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYYuQR_pB5A_H-sDhOu_fIpgnKUvj5frfn8A_T8Hq9f1EFs-iJl0rs54s_dNDjMCMLkLKLv3ZnwSfgGH0YpURzHVbiC3-EiBVxOCM6JBVdgevdmnaQKtXz6Xn3UWURuwvIQxZ7GBHO8hE/s320/aws_ml_tech_stack.jpeg" width="502" /></a></div><div><br /></div><div>Then there is GCP with its Kubeflow angle and on-premises hybrid cloud options:</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhb40ERdNedTObxhmIXh0qFDRCnqyux-xsxdy70YW-FNs6bXlPMOszfWWQPrUeXOpygoTWdeovuFsNQLRCqnjEPfDsKw-qDPlba0a1iG1Xmxn-Ci5MuD0WkW_YUMDBo0eDZ3Cx5pyH4eis/s1022/gcp_ml_stack.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="494" data-original-width="1022" height="265" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhb40ERdNedTObxhmIXh0qFDRCnqyux-xsxdy70YW-FNs6bXlPMOszfWWQPrUeXOpygoTWdeovuFsNQLRCqnjEPfDsKw-qDPlba0a1iG1Xmxn-Ci5MuD0WkW_YUMDBo0eDZ3Cx5pyH4eis/s320/gcp_ml_stack.jpeg" width="548" /></a></div><div><br /></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-45100745885749588972020-06-02T11:11:00.000-07:002020-06-02T11:11:09.076-07:00Cloud OLAP: Choosing between Redshift, Snowflake, BigQuery or other?<div><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjShoHell7-4Woe0bHd2HgZG1NNtQB0YoswEtKZTVVXILdXEQEm8-Ol13JzP1Jsj5Xje0u8DRGajacY6gvGbnHsPNncKavXq3PqSRWaV_J-Az2LZmixngsDJm-ukBTGvw6j3_DCcSJDDSA/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="519" data-original-width="800" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjShoHell7-4Woe0bHd2HgZG1NNtQB0YoswEtKZTVVXILdXEQEm8-Ol13JzP1Jsj5Xje0u8DRGajacY6gvGbnHsPNncKavXq3PqSRWaV_J-Az2LZmixngsDJm-ukBTGvw6j3_DCcSJDDSA/s320/snowfake_vs_bq.jpeg" width="320" /></a></div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline !important; float: none; font-family: -apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"><br /></span></div><div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline !important; float: none; font-family: -apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; 
text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">Which to choose for your cloud OLAP engine? There are a lot of choices when it comes to cloud-based analytics engines. All the major clouds have their own homemade solution (GCP/BigQuery, AWS/Redshift, Azure), and there are plenty of independent options, Snowflake and Databricks to mention a few.</span></div><div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline !important; float: none; font-family: -apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"><br /></span></div><div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline !important; float: none; font-family: -apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">Which is right for your business and in what situation? 
Needs can vary from internal data exploration to driving downstream analytics with tight SLAs. I am a strong proponent of the approach that, no matter what you do, you have to start with a foundational data lake blueprint. You then choose to build on it with either an open source analytics engine on top of your cloud data lake or a licensed commercial analytics engine such as Redshift, Snowflake or BigQuery.<br /></span></div><div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline !important; float: none; font-family: -apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"><br /></span></div><div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline !important; float: none; font-family: -apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"> There is no one answer without looking at your business needs, existing technical 
foundation and strategic direction, but I have to say I am getting more impressed with Snowflake as the product matures. Without getting too deep into the details, Snowflake is sort of an in memory (backed by public cloud object storage) data lake with a highly elastic in-memory MPP layer. There are many pros and cons in selecting the best option for your business. The edge Snowflake has is that it is cloud agnostic (sort of the Anthos of the data cloud), and I really like their recently released cross-cloud and data center replication feature and their cross-cloud management.</span></div><div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline !important; float: none; font-family: -apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"><br /></span></div><div><span style="-webkit-text-stroke-width: 0px; background-color: white; color: rgba(0, 0, 0, 0.9); display: inline !important; float: none; font-family: -apple-system, system-ui, system-ui, "Segoe UI", Roboto, "Helvetica Neue", "Fira Sans", Ubuntu, Oxygen, "Oxygen Sans", Cantarell, "Droid Sans", "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Lucida Grande", Helvetica, Arial, sans-serif; font-size: 14px; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; 
text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">If you want to discuss how to approach this decision, look me up!<br /></span></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-89592620378240285382020-01-24T08:58:00.002-08:002020-02-06T07:52:54.350-08:00Why Spark is the Wrong Abstraction<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9Hmkp-3L1K6XPhCbTpTuEkbzy9vNia2RZZUcEJHMJaUIqwPNLoQYvgpeP05RIRkB6KHUOmOorlM3avKRUDdGF__x67QP17JTTxq_HlpNM1M9ePHMjIu_JcbG5sggSqloWardrPxsQQbg/s1600/sunset_on_spark.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="560" data-original-width="900" height="248" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9Hmkp-3L1K6XPhCbTpTuEkbzy9vNia2RZZUcEJHMJaUIqwPNLoQYvgpeP05RIRkB6KHUOmOorlM3avKRUDdGF__x67QP17JTTxq_HlpNM1M9ePHMjIu_JcbG5sggSqloWardrPxsQQbg/s400/sunset_on_spark.jpg" width="400" /></a></div>
<br />
Is the sun setting on Spark? I don't want to knock Spark and frameworks like it; they have had their moment in the sun. Spark was a reasonable and important successor to Map/Reduce & HDFS/Hadoop, but the time has come for it to be exiled to the fringes of the big data ecosystem and used only when absolutely necessary. Spark is still useful for some specialized ETL and data processing applications, but overall Spark can be massive overkill and a burden to program and operate (expensive too). In many cases it is inefficient in both development and troubleshooting, and the overhead of infrastructure management can be expensive relative to other options.<br />
<br />
<h4>
Not Everything is a Nail</h4>
I see many projects using Spark (and related tools such as AWS EMR) for transforming and moving data into and out of data lakes when simpler tools would often do. Spark is a pretty big and complex sledgehammer, and many problems can be solved with more effective tooling. The ever-growing ubiquity of serverless technology, especially database and analytics services, has presented engineers with many more options, and it is time to dial back the situations in which we bring out the Spark sledgehammer. <br />
<br />
In a lot of cases, with Spark development, you end up writing a one-off database engine for your specific ETL scenario, and Spark's distributed compute and storage abstractions and DAG engine make that convenient. While it is possible to use Spark as a database engine of sorts, the reality is that databases are better at optimization and at using the available compute/storage resources. This is especially the case with serverless technologies that support SQL as a first-class citizen. For Spark, SQL is really a bolt-on.<br />
<br />
The only big challenge is that most database platforms are not designed for the cloud or for elastic compute/storage the way Spark sort of is. I say sort of, because Spark leaves too much responsibility with the developer/DevOps for data/compute optimization and infrastructure decisions, which is something databases are intrinsically good at.<br />
<br />
<h4>
Declarative vs Imperative Analytics</h4>
Now, there are serverless Spark services as well (like AWS Glue and other
managed Spark services), but given the general-purpose nature of Spark, this
still leaves optimization and resource allocation a challenge for developers. What are the alternatives? I really like Presto, and in particular the serverless AWS flavor of it (Athena), as well as services like BigQuery. Tools like these are the future of big data ETL and analytics. Spark can still be useful in heavy data transformation scenarios and complex feature engineering, but not as a general data analytics and data movement engine. Streaming is one specialized area where solutions like Spark can still play, but there are many other solutions better designed out of the box for streaming and cloud scale-out. Spark has in many respects tried to be all things to all people. It has continuously expanded its support for SQL semantics and has incorporated APIs for streaming, etc. This has made Spark a versatile framework and API for developers, but as a general-purpose ETL and data analytics engine, I think there are now better options.<br />
<br />
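To make the declarative versus imperative contrast concrete, here is a minimal sketch in Python. It is an illustration only: the sample rows, the <i>orders</i> table and the function names are made up for this example, and SQLite (from the Python standard library) stands in for a serverless SQL engine such as Athena or BigQuery. The imperative version spells out how to scan, group and accumulate; the SQL version only states the result wanted and leaves the execution plan to the engine.<br />

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) order rows.
ORDERS = [
    ("east", 10.0),
    ("west", 5.0),
    ("east", 2.5),
    ("west", 7.5),
    ("north", 1.0),
]


def totals_imperative(rows):
    """Imperative style: we spell out how to scan, group and accumulate."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


def totals_declarative(rows):
    """Declarative style: state the result wanted; the engine plans the work.

    SQLite is only a stand-in here for a cloud SQL engine.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM orders GROUP BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(totals_imperative(ORDERS))
    print(totals_declarative(ORDERS))
```

Both functions return the same totals per region. The difference is in who owns the "how": in the first it is the developer, in the second it is the engine.<br />
<br />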
While the sun may not completely set on Spark, and tools like it, the declarative power of SQL will win in the end over the imperative programming model of Spark. This has been proven time and time again in the database and analytics tech space.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-7444761860845699002020-01-21T09:37:00.001-08:002020-01-21T09:37:07.616-08:00Data Lakes before AI/ML/Analytics (cart before horse thing)<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcD0OtEsw5P4LwYt2NOHkA33daFM7b7TA69ISTDW7UTdA21DfQQ0O9GWYcvvZwoIATjtQbpOh2pccYX5ORbWbSzDRpD8VjRSQACSbufHJwnA_S8SWSbu_tLb_h2MAHDgVoWndGPlskxXA/s1600/datalake.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="323" data-original-width="321" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcD0OtEsw5P4LwYt2NOHkA33daFM7b7TA69ISTDW7UTdA21DfQQ0O9GWYcvvZwoIATjtQbpOh2pccYX5ORbWbSzDRpD8VjRSQACSbufHJwnA_S8SWSbu_tLb_h2MAHDgVoWndGPlskxXA/s400/datalake.png" width="397" /></a></div>
<br />
Don't start or continue your AI and predictive analytics journey without building the necessary data infrastructure underpinnings. And that starts, first and foremost, with building a cloud data lake that is designed to meet the data and compute hungry needs of your AI and analytics workloads. Why build a cloud data lake first?<br />
<br />
1) Economics<br />
2) Elastic compute<br />
3) Elastic storage<br />
4) Storing (almost) everything<br />
5) ML Model engineering<br />
6) Feeding downstream analytics<br />
7) Feeding downstream operational data stores<br />
8) Data exploration, experimentation and discovery<br />
<br />
A cloud data lake makes all the above possible at scale.<br />
<br />
Building a cloud data lake securely and in an architecturally effective manner is achievable and will make your downstream AI/ML/Analytics journey attainable and long-term sustainable. Don't start your journey without this foundation.<br />
<br />
<br />
<br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-18863165469484044732019-09-09T18:19:00.002-07:002019-09-21T09:11:30.080-07:00Know Where Your Data Lake Has Been?<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcFBAIA4tZTwUEvomgMVccq30fKZ4f_4Wz5SQimX2DihvB73T3okI_OYjFAUiWheRhZR3ODciZdDBSiA_6xuKx9Zs69T6Dj63wVvvCognv1UWeXZJo3EwhTX6sTDxuqGl8pAhrTlzaAxY/s1600/The+Journey+of+Data+%25287%2529.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="404" data-original-width="596" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcFBAIA4tZTwUEvomgMVccq30fKZ4f_4Wz5SQimX2DihvB73T3okI_OYjFAUiWheRhZR3ODciZdDBSiA_6xuKx9Zs69T6Dj63wVvvCognv1UWeXZJo3EwhTX6sTDxuqGl8pAhrTlzaAxY/s1600/The+Journey+of+Data+%25287%2529.png" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">The
foundation of a good data management strategy is built on a number of skills
including data policies, best practices, and technology/tools as shown
in the diagram above.</span> The
most commonly discussed term of the bunch, from the ones listed in the
diagram, is Data Governance. This term is tossed around a lot when
describing the organizational best practices and technical processes
needed to have a sound data practice. Data governance can often be
ambiguous and all encompassing, and in many cases what exists in
organizations falls short of what is needed in our modern big data and
data lake oriented world where ever increasing volumes of data are
playing an ever more critical role in everything a business does. </span></span><br />
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></span>
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;">What
is often left out and is missing in many modern data lake and data
warehousing solutions are the two lesser known cornerstones I show in
the diagram: <i>Data Lineage</i> and <i>Data Provenance</i>. And
without these additional pieces your data lakes (with ever increasing
volumes and variety of data) can quickly become an unmanageable data
dumping ground.</span></span><br />
<br />
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;">Who needs Data Lineage and Data Provenance management tools, APIs and visualization services?</span></span><br />
<ul>
<li><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;">Data Engineers (building and managing data pipelines)</span></span></li>
<li><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;">Data Scientists (discovering data & understanding relationships)</span></span></li>
<li><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;">Business/System Analysts (data stewardship)</span></span></li>
<li><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;">Data Lake / BI Executives (bird's eye view of health and sources/destinations)</span></span></li>
<li><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;">Data Ops (managing scale and infrastructure)</span></span></li>
<li><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;">Workflow/ETL Process Operators (monitoring & troubleshooting)</span></span></li>
</ul>
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;">Most
of the ETL processing and dataflow orchestration tools out there in the
market (open source and commercial) such as NiFi, Airflow, Informatica,
and Talend among others, do not directly address this gap. What is the
gap? The gap is knowing where your data is coming from, where it has
been and where it is going. Put another way, having visibility into the
journey your data takes through your data lake and overall data fabric.
And doing this in a lightweight fashion without a lot of complex and
expensive commercial tools.</span></span><br />
<br />
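To make the gap concrete, here is a minimal sketch in Python of what a lightweight lineage store could look like. Everything in it is an assumption made for illustration: the <i>LineageGraph</i> class, its methods, and the dataset and job names are hypothetical and do not come from any of the tools mentioned above. Each pipeline step records which dataset it read and which it wrote, and the graph can then answer "where did this come from?" and "where does this go?".<br />

```python
from collections import defaultdict


class LineageGraph:
    """Toy in-memory lineage store (hypothetical, for illustration only)."""

    def __init__(self):
        self._parents = defaultdict(set)   # dataset -> direct upstream datasets
        self._children = defaultdict(set)  # dataset -> direct downstream datasets
        self.events = []                   # (source, destination, process) log

    def record(self, source, destination, process):
        """Record that `process` read `source` and wrote `destination`."""
        self._parents[destination].add(source)
        self._children[source].add(destination)
        self.events.append((source, destination, process))

    @staticmethod
    def _walk(start, neighbors):
        """Collect every dataset reachable from `start` via `neighbors`."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def upstream(self, dataset):
        """Where did this data come from (all ancestors)?"""
        return self._walk(dataset, self._parents)

    def downstream(self, dataset):
        """Where is this data going (all descendants)?"""
        return self._walk(dataset, self._children)


def build_example():
    """Build a small made-up pipeline: raw -> clean -> mart."""
    graph = LineageGraph()
    graph.record("raw/orders", "clean/orders", "dedupe_orders")
    graph.record("raw/customers", "clean/customers", "mask_pii")
    graph.record("clean/orders", "mart/revenue", "join_revenue")
    graph.record("clean/customers", "mart/revenue", "join_revenue")
    return graph


if __name__ == "__main__":
    g = build_example()
    print(sorted(g.upstream("mart/revenue")))
    print(sorted(g.downstream("raw/orders")))
```

A real implementation would persist these events and capture them automatically from the pipeline runtime, but even this small structure shows the two questions a lineage and provenance layer has to answer.<br />
<br />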
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;">Let's
spend a bit of time talking about data lineage and data provenance in
particular and why they are important parts of a modern and healthy
overall data architecture. First let's touch on the broader Data
Governance ecosystem.</span></span><br />
<h2>
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;">Data Governance</span></span></h2>
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">Data
Governance can be an overused term in the industry and sometimes is
all encompassing when describing the data strategy, best practices and
services in an organization. You can read a lot of differing definitions
of what data governance is and is not. I take a simple and broader
definition for data governance which includes:</span></span></span></span></span><br />
<ol>
<li><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">Data Stewardship - org policies for access/permissions/rights</span></span></span></span></span></li>
<li><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">Data Map - location of data</span></span></span></span></span> </span></span></span></span></span></li>
<li><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">MDM - identifying and curating key entities</span></span></span></span></span></li>
<li><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">Common Definitions - tagging and common terminology</span></span></span></span></span></span></span></span></span></span></li>
<li><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">Taxonomy/Ontology - the relationships between data elements </span></span></span></span></span> </span></span></span></span></span></li>
<li><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">Data Quality - accuracy of data</span></span></span></span></span></li>
<li><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">Compliance - HIPAA, GDPR, PCI DSS...</span></span></span></span></span></li>
</ol>
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">These
are all important concepts, yet they do not address the dynamic nature
of data and the ever more complex journey data is taking through modern
data lakes and the relationships between data models residing in your
data lake. This relates to needing to know where your data is going and
where it came from. This is where data lineage and data provenance
come into play to complement data governance and allow data engineers
and analysts to wrangle and track the run-time dynamics of data as it
moves through your systems and gets combined and transformed on its
journey and its many destinations.</span></span></span></span></span><br />
<h4>
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><b>
</b></span></span></span></span></span></h4>
<h4>
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><b>Data Control Plane == Data Lineage and Data Provenance </b></span></span></span></span></span></h4>
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">I
view data lineage and data provenance as two sides of the same coin.
You can't do one well without the other. Just like digital networks have control planes for visualizing data traffic, our data lakes need a
Data Control Plane. And one that is independent of whatever ETL
technology stack you are using.</span></span></span></span></span><br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgw4xp2naX8aPAsF_GnII99teRMylec6EDs1R5rVEB1NPozsn90sFnTNbuFJnkV0uKDRNvpYSO_J2sk4Ws8n_9HahkSe7_SB7W7eYgEOTFKHvmcBpF6unR5ALT1X5GURB2fqFoUQFX_XxI/s1600/datalineage.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="168" data-original-width="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgw4xp2naX8aPAsF_GnII99teRMylec6EDs1R5rVEB1NPozsn90sFnTNbuFJnkV0uKDRNvpYSO_J2sk4Ws8n_9HahkSe7_SB7W7eYgEOTFKHvmcBpF6unR5ALT1X5GURB2fqFoUQFX_XxI/s1600/datalineage.png" /></a><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">In
our modern technical age, data volumes are ever increasing and data is
being mashed and integrated together from all corners of an
organization. Managing this from both a business level and technical
level is a challenge. A lot of ETL tools exist today to allow you to
model your data transformation processes and build data lakes and data
warehouses, but they all consistently fall short of giving you a
comprehensive and I would say orthogonal and independent view of how
your data is interconnected; where your data came from, where it is
right now and where it is going to go (and where it is expected to go
next) in its journey through your data fabric and destinations.</span></span></span></span></span><br />
<br />
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">Many
modern ETL tools let you build visual models and provide some degree of
monitoring and tracking, but these tools are proprietary and can't be
separated from the ETL tools themselves, which creates lock-in and does
not allow one to mix and match best-of-breed ETL tools (Airflow, AWS
Step Functions, Lambda, Glue, Spark, EMR, etc.) that are now prevalent
in the cloud. If you are using multiple tools and cloud solutions it
gets ever more complicated to have a holistic view of your data and its
journey through your platform.</span></span></span></span></span><br />
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><br /></span></span></span></span></span>
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">This
is why I strongly believe that data lineage and data provenance should
be completely independent from what underlying ETL tooling and data
processing technology you are using. And if it is not, then you are both
locking yourself in unnecessarily and greatly limiting the potential of
your data ops teams and data engineers, and limiting your overall management
of the data and the processes carrying your data through its journey in
your data lake.</span></span></span></span></span><br />
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><br /></span></span></span></span></span>
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;">Data
Provenance and Data Lineage are not just fancy words; they are a
framework and tool set for managing your data and having a historical
audit trail, real-time tracing/logging and a control plane graph of where
your data is going and how it is interconnected.</span></span></span></span></span><br />
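To make the idea of a tool-independent control plane a bit more concrete, here is a minimal Python sketch, assuming lineage is recorded as simple edges between dataset names. The `LineageGraph` class, its method names, and the dataset and job names are all hypothetical illustrations, not part of any ETL product. The only point is that the record of what fed what can live outside the tools that did the work.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy, tool-agnostic lineage store: edges point from source to derived dataset."""

    def __init__(self):
        self.downstream = defaultdict(set)  # dataset -> datasets derived from it
        self.upstream = defaultdict(set)    # dataset -> datasets it was built from
        self.jobs = {}                      # (source, target) -> job that produced the edge

    def record(self, source, target, job):
        """Log that `job` (run by any tool) read `source` and wrote `target`."""
        self.downstream[source].add(target)
        self.upstream[target].add(source)
        self.jobs[(source, target)] = job

    def _walk(self, start, edges):
        # Breadth-first traversal collecting every dataset reachable from start.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def provenance(self, dataset):
        """Where did this data come from? All transitive upstream datasets."""
        return self._walk(dataset, self.upstream)

    def impact(self, dataset):
        """Where is this data going? All transitive downstream datasets."""
        return self._walk(dataset, self.downstream)


# Hypothetical pipeline spanning three different tools.
graph = LineageGraph()
graph.record("raw/orders", "clean/orders", job="glue:clean_orders")
graph.record("raw/customers", "clean/customers", job="airflow:clean_customers")
graph.record("clean/orders", "mart/revenue", job="spark:build_revenue")
graph.record("clean/customers", "mart/revenue", job="spark:build_revenue")

print(sorted(graph.provenance("mart/revenue")))
print(sorted(graph.impact("raw/orders")))
```

Each job, whatever tool runs it, only has to call `record` once per input/output pair. The provenance and impact questions are then answered from the graph alone, without asking any of the ETL tools.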
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;"><span style="background-color: white; color: #222222; display: inline; float: none; font-style: normal; font-weight: 400; letter-spacing: normal; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px;"><br /></span></span></span></span></span>
<span style="font-size: small;"><span style="font-family: "arial" , "helvetica" , sans-serif;">So
do not build your data lake without the benefits of a Data Control Plane. Set your
data free on its journey while still maintaining visibility, traceability
and control.</span></span>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-11937298297308644932019-08-15T15:38:00.002-07:002019-08-15T15:40:28.270-07:00Data Lake vs Data Warehouse<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsBrA2AKhyjdeRFp7bS4qByaQ4G8RvlTMCRBnaN-pK7oDecfGfbN2g5ZFNei0jBLPioqtxUAQKk_JuzKOITqe8XvtU9m7AFVAJnBHgkuv5jHMIOn8AT-hkbB6x6JOe2dUsFyewG-mSQP4/s1600/datalake-vs-dw.jpeg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="1035" data-original-width="800" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsBrA2AKhyjdeRFp7bS4qByaQ4G8RvlTMCRBnaN-pK7oDecfGfbN2g5ZFNei0jBLPioqtxUAQKk_JuzKOITqe8XvtU9m7AFVAJnBHgkuv5jHMIOn8AT-hkbB6x6JOe2dUsFyewG-mSQP4/s400/datalake-vs-dw.jpeg" width="308" /></a></div>
<br />
<span style="background-color: white; font-family: , , , "segoe ui" , "roboto" , "helvetica neue" , "fira sans" , "ubuntu" , "oxygen" , "oxygen sans" , "cantarell" , "droid sans" , "apple color emoji" , "segoe ui emoji" , "segoe ui emoji" , "segoe ui symbol" , "lucida grande" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;">Is a data lake part of your data warehouse platform or does the data lake sit beside it? There is a fair amount of ambiguity as to what a data lake is and how it should fit into your overall data strategy. </span><br />
<span style="background-color: white; font-family: , , , "segoe ui" , "roboto" , "helvetica neue" , "fira sans" , "ubuntu" , "oxygen" , "oxygen sans" , "cantarell" , "droid sans" , "apple color emoji" , "segoe ui emoji" , "segoe ui emoji" , "segoe ui symbol" , "lucida grande" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;"><br /></span><span style="background-color: white; font-family: , , , "segoe ui" , "roboto" , "helvetica neue" , "fira sans" , "ubuntu" , "oxygen" , "oxygen sans" , "cantarell" , "droid sans" , "apple color emoji" , "segoe ui emoji" , "segoe ui emoji" , "segoe ui symbol" , "lucida grande" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;">I believe data lakes (coupled with elastic cloud storage and compute) are a game changer in both the DW and BI world. Your data warehousing strategy should be part of the data lake not the other way around. While you don't have to throw away everything you have done or learned in your traditional ETL and DW world, the fundamentals have changed. </span><br />
<span style="background-color: white; font-family: , , , "segoe ui" , "roboto" , "helvetica neue" , "fira sans" , "ubuntu" , "oxygen" , "oxygen sans" , "cantarell" , "droid sans" , "apple color emoji" , "segoe ui emoji" , "segoe ui emoji" , "segoe ui symbol" , "lucida grande" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;"><br /></span><span style="background-color: white; font-family: , , , "segoe ui" , "roboto" , "helvetica neue" , "fira sans" , "ubuntu" , "oxygen" , "oxygen sans" , "cantarell" , "droid sans" , "apple color emoji" , "segoe ui emoji" , "segoe ui emoji" , "segoe ui symbol" , "lucida grande" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;">To take advantage of your data and build better BI/analytics you must build atop a solid data lake foundation. And this goes well beyond the many failed Big Data and Hadoop projects of the recent past that many enterprises have experienced. </span><br />
<span style="background-color: white; font-family: , , , "segoe ui" , "roboto" , "helvetica neue" , "fira sans" , "ubuntu" , "oxygen" , "oxygen sans" , "cantarell" , "droid sans" , "apple color emoji" , "segoe ui emoji" , "segoe ui emoji" , "segoe ui symbol" , "lucida grande" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;"><br /></span><span style="background-color: white; font-family: , , , "segoe ui" , "roboto" , "helvetica neue" , "fira sans" , "ubuntu" , "oxygen" , "oxygen sans" , "cantarell" , "droid sans" , "apple color emoji" , "segoe ui emoji" , "segoe ui emoji" , "segoe ui symbol" , "lucida grande" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;">While Hadoop was a necessary step forward at the time, it was and is an evolutionary dead end - RIP Hadoop. Cloud data lakes are the future and it is more than putting your data into S3 buckets. </span><br />
<span style="background-color: white; font-family: , , , "segoe ui" , "roboto" , "helvetica neue" , "fira sans" , "ubuntu" , "oxygen" , "oxygen sans" , "cantarell" , "droid sans" , "apple color emoji" , "segoe ui emoji" , "segoe ui emoji" , "segoe ui symbol" , "lucida grande" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;"><br /></span>
<span style="background-color: white; font-family: , , , "segoe ui" , "roboto" , "helvetica neue" , "fira sans" , "ubuntu" , "oxygen" , "oxygen sans" , "cantarell" , "droid sans" , "apple color emoji" , "segoe ui emoji" , "segoe ui emoji" , "segoe ui symbol" , "lucida grande" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;">Well architected data lakes are the culmination of a succinct data management strategy that leverages the strengths of cloud services and many traditional DW best practices and data governance policies.</span><br />
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-70689833436742340222019-05-15T22:41:00.000-07:002019-05-15T22:42:31.818-07:00R.I.P. HDFS | The Cloud Wins! <div class="feed-shared-update-v2__description feed-shared-inline-show-more-text feed-shared-inline-show-more-text--expanded ember-view" id="ember1269" style="-webkit-line-clamp: initial; background: 0px 0px; border: 0px; box-sizing: inherit; display: block; font-size: 16px; line-height: 2rem !important; margin: 0px 16px; max-height: none; max-width: 928px; outline: 0px; overflow: hidden; padding: 0px; position: relative; vertical-align: baseline;">
<div class="feed-shared-update-v2__commentary feed-shared-text ember-view" dir="ltr" id="ember1270" style="background: 0px 0px; border: 0px; box-sizing: inherit; color: rgba(0, 0, 0, 0.75); font-size: 1.4rem; font-weight: 400; line-height: 1.42857; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">
<div class="feed-shared-text__text-view feed-shared-text-view white-space-pre-wrap break-words ember-view" id="ember1271" style="background: 0px 0px; border: 0px; box-sizing: inherit; font-size: 14px; line-height: inherit !important; margin: 0px; outline: 0px; overflow-wrap: break-word; padding: 0px; vertical-align: baseline; white-space: pre-wrap; word-break: break-word;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCj2TngeuEk2oSAuUNohY9PuWVlbMFNK3GqkBenFVWJxvfWG0acfRP4nJOR5MoN-rxkmZhYD_onxnp2G5jSh1js1_uOrTV1LLWvlPmFDnxXpHoh2lIWMZw4fDyBeH3gVlDdgBS0cpB6Hg/s1600/hdfs_vs_s3.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="720" data-original-width="1280" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCj2TngeuEk2oSAuUNohY9PuWVlbMFNK3GqkBenFVWJxvfWG0acfRP4nJOR5MoN-rxkmZhYD_onxnp2G5jSh1js1_uOrTV1LLWvlPmFDnxXpHoh2lIWMZw4fDyBeH3gVlDdgBS0cpB6Hg/s400/hdfs_vs_s3.jpg" width="400" /></a></div>
<br />
<span style="background: 0px 0px; border: 0px; box-sizing: inherit; font-size: 14px; line-height: inherit !important; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span class="ember-view" id="ember1274" style="background: 0px 0px; border: 0px; box-sizing: inherit; font-size: 14px; line-height: inherit !important; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="background: 0px 0px; border: 0px; box-sizing: inherit; font-size: 14px; line-height: inherit !important; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">HDFS is an evolutionary dead end in the tree of big data. Data lakes based on S3 object storage deliver on the promise of separating storage from compute and make it possible to scale your processing and downstream analytics/AI and data marts on top of a data lake in an agile and elastic fashion. The HDFS architecture has bugged me ever since it was first released (besides the fact it is written in Java). Moving the code to the Hadoop data node (usually only three replicas available, by the way) seemed inherently limiting to me. It was not really better than using big Unix SMP servers, other than you got to use cheaper commodity hardware and grow incrementally. Good stuff, but not good enough - one step forward and a half step backwards.</span></span></span><br />
<span style="background: 0px 0px; border: 0px; box-sizing: inherit; font-size: 14px; line-height: inherit !important; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span class="ember-view" id="ember1274" style="background: 0px 0px; border: 0px; box-sizing: inherit; font-size: 14px; line-height: inherit !important; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"><span style="background: 0px 0px; border: 0px; box-sizing: inherit; font-size: 14px; line-height: inherit !important; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">While the idea of moving code to the data sounded cool at the time, it is fundamentally a bad data processing design for a truly scalable data lake that allows for spinning up an arbitrary number of ephemeral compute clusters on top of your storage. There is a place for HDFS and traditional Hadoop clusters if you have a big, fixed, slowly evolving and predictable compute/storage environment. For the rest of us, a cloud based data lake architecture will win in the end and allow for agile development to meet the fast paced needs of today's downstream BI, analytics and AI/ML</span><span style="background: 0px 0px; border: 0px; box-sizing: inherit; font-size: 14px; line-height: inherit !important; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;"> applications that need to sit on top of the mythical data lake.</span></span></span></div>
</div>
</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-2249557669088889752018-08-09T11:37:00.002-07:002018-08-09T11:39:13.565-07:00Choosing Between Spark ML, scikit-learn, and DNNs<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-tnv-B0hiPVDEI8UarYNK481_WN5xqxHq9eiWSaZWQ9vkACFx-PD46Iqc0Q6k1VErYYwUV5wE4nSpWZ-h1PuGOw02ZWh4cNiYaJocESLkJUx2fnkMMzTD7tl8dYqTEvsxXS9x-3-GulI/s1600/Screen+Shot+2018-08-09+at+2.32.34+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="465" data-original-width="1600" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-tnv-B0hiPVDEI8UarYNK481_WN5xqxHq9eiWSaZWQ9vkACFx-PD46Iqc0Q6k1VErYYwUV5wE4nSpWZ-h1PuGOw02ZWh4cNiYaJocESLkJUx2fnkMMzTD7tl8dYqTEvsxXS9x-3-GulI/s400/Screen+Shot+2018-08-09+at+2.32.34+PM.png" width="400" /></a></div>
<div>
<br /></div>
Now these aren't the only considerations when deciding how to build your data science stack and the related tooling you will need around it, but this is where a lot of organizations tend to begin their questions. Sometimes the answer may be all of the above. But you first have to reflect on your organization's goals and the level of your investment in any transformation effort, especially one that involves such a fundamental shift in how you turn data into business value.<br />
<div>
<br /></div>
<div>
There are a number of considerations that can influence your data science architecture and that should be examined before establishing your AI platform. They include:<br />
<div>
<br /></div>
<div>
<ol>
<li>ETL and data prep tools? AI does not work without data. Find it, mine for it and create it.</li>
<li>Cloud, on-prem or hybrid for building your data science stack?</li>
<li>How big is your data? Really how big is your data? Not everyone has "big data".</li>
<li>What are you modeling? What kind of outcomes are you looking to solve for?</li>
<li>Build, buy, partner. What kind of skills do you want to invest in for in-house data science, ML engineering and ML operations?</li>
</ol>
<div>
The items above only touch on much deeper considerations that need to be assessed by any organization looking to transform its business with AI. But let's step back a bit and just discuss the question posed by the title of this post, to avoid turning it into a long drawn out analysis that goes down too many rabbit holes.</div>
</div>
</div>
<div>
<br /></div>
<b>Spark ML</b><br />
<div>
It is natural for a lot of organizations who have been doing "Big Data" to get their first exposure to data science through Spark's MLlib. Spark ML is a nice module/framework that comes with Spark and is packaged with most major Hadoop distributions. The ML APIs and algorithms include many of the popular model building options, from decision trees, to survival analysis (time-to-event), to recommendation engines (ALS), to unsupervised learning with clustering and topic modeling. Spark ML is nice and convenient for those coming from the Big Data universe. One nice advantage is that you can often leverage Spark's inherent distributed architecture to build models that can operate at large petabyte scale when needed. Is Spark ML ideal for all data sources and outcome objectives, and is it the most efficient option (you can hack DNNs into it if you have the stomach for it)? The answer, as you might guess, is no. Why? Well, that is for another day to dive into, but suffice it to say that it may not always be the most accurate way to build models and may not always be the best bang for the CPU/GPU buck.</div>
<div>
<br /></div>
<div>
<b>scikit-learn</b></div>
<div>
Then there is good old scikit-learn. Any Python developer who has a math or data background, or who has done any statistical modeling (or ML work), will know and likely love scikit-learn and all the other related Python packages such as numpy, scipy and pandas, to name the most popular. scikit-learn is a treasure trove of algos and APIs. It is an awesome framework for ML developers and data scientists. Does it scale in the same ways that Spark can? Unfortunately no. But do you always really need it to? Look at your data before you answer that.</div>
<div>
<br /></div>
<div>
<b>Deep Neural Nets</b></div>
<div>
Then there are the new kids on the block, DNNs (back from the future). Deep neural nets, built with frameworks such as TensorFlow and PyTorch (to name a couple of the most popular), are billed as universal function approximators that can model anything and solve for everything. Note, you will need to bring data and lots of it. They are data hungry. They can handle anything from classification to generating word embeddings to creating generative models. There isn't much a DNN and its offshoots can't do theoretically. Through their natural fit with GPUs, they can scale fairly efficiently, and you can sometimes distribute them with some extra heavy lifting.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1YA8bRO7H8pvaUyAeGd47qtV8lZ3Se_K3uoi5PMVDJNx_ID2fomZD_u4q6Sv2aTeYjfQyVTec1GnA8KqUg48DjJ42v-DUHHigFBYjb5xUqu2G_AtV-gK1rhaBq7RisBl58-A8_2qPCY8/s1600/data-science-loop.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="650" data-original-width="1024" height="253" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1YA8bRO7H8pvaUyAeGd47qtV8lZ3Se_K3uoi5PMVDJNx_ID2fomZD_u4q6Sv2aTeYjfQyVTec1GnA8KqUg48DjJ42v-DUHHigFBYjb5xUqu2G_AtV-gK1rhaBq7RisBl58-A8_2qPCY8/s400/data-science-loop.png" width="400" /></a></div>
<div>
<b>Taking your Models Live</b></div>
<div>
A lot of what we just reviewed is about building and training models. Now, how do we then take what we just trained and turn it into a service that predicts, classifies or generates data? That is also a topic unto its own. Operationalizing machine learning models can be non-trivial, but it can also be not so difficult at times. It just depends on the model you are creating. For example, sometimes discrete bounded models can just be exported into a database, but oftentimes the solution (input and output space) is not finite and requires deploying your built models as distributed inference engines - and that is a bit more work. Then there is the nagging issue of how and when to update your models. Again, another subject altogether.</div>
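As an illustration of the "export a bounded model into a database" case, here is a minimal sketch that uses Python's built-in `sqlite3` module. The `score` function stands in for a trained model, and the feature names, table name and thresholds are made up for the example. Because the input space is finite, every prediction can be precomputed once and then served as a plain lookup.

```python
import itertools
import sqlite3


def score(plan, region):
    """Stand-in for a trained model; a real model's predict call would go here."""
    risk = {"free": 0.6, "pro": 0.3, "enterprise": 0.1}[plan]
    if region == "emea":
        risk += 0.1
    return "churn" if risk >= 0.5 else "stay"


# The whole (finite) input space of the hypothetical model.
PLANS = ["free", "pro", "enterprise"]
REGIONS = ["amer", "emea", "apac"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions (plan TEXT, region TEXT, label TEXT, "
    "PRIMARY KEY (plan, region))"
)
# Precompute one row per possible input combination.
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?, ?)",
    [(p, r, score(p, r)) for p, r in itertools.product(PLANS, REGIONS)],
)
conn.commit()


def predict(plan, region):
    """Serving-time inference is a keyed lookup; no model runtime is needed."""
    row = conn.execute(
        "SELECT label FROM predictions WHERE plan = ? AND region = ?",
        (plan, region),
    ).fetchone()
    return row[0] if row else None


print(predict("free", "emea"))
print(predict("pro", "amer"))
```

When the input space is open-ended (free text, images, continuous features), this trick no longer applies and the model itself has to be hosted behind a service.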
<div>
<br /></div>
<div>
<b>Buy or Build</b></div>
<div>
So should you build or buy? The big boys (Google, AWS, Azure) are all making a lot of what we just described available as MLaaS offerings (to various degrees of completeness). So stay tuned and current, as the AI technology world is changing fast.</div>
<div>
<br /></div>
<div>
<br /></div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-55629559424367453102018-02-10T14:45:00.001-08:002018-02-11T13:15:20.305-08:00Don't Build your NLP Bot like a GUI<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgv9ScriJcz9puQqXuzWS-MZOwync0oowuf3uR6Qp4gxHf780vOOWskyy2WDMBHJB90rhfmp3CF6NatJkGZaehCPQzYWjlA90cRZjf8TdvlB5bfGhWfHcMTqAJTNsn5TOjmLpWGimwljaU/s1600/nlp_vs_gui.JPEG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="720" data-original-width="1280" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgv9ScriJcz9puQqXuzWS-MZOwync0oowuf3uR6Qp4gxHf780vOOWskyy2WDMBHJB90rhfmp3CF6NatJkGZaehCPQzYWjlA90cRZjf8TdvlB5bfGhWfHcMTqAJTNsn5TOjmLpWGimwljaU/s400/nlp_vs_gui.JPEG" width="400" /></a></div>
<br />
There has been an ongoing buzz about conversational user interfaces (CUI) and how they can enhance human-to-computer interaction. Many of us have experienced them to some extent, both in play with voice interfaces such as Siri and Alexa and at work with messaging interfaces in tools like Slack. When CUIs are infused with a good dose of NLP (plus context awareness, machine learning, etc.), they have the potential to become more than just a primitive command line or simple minded messaging/voice interface. The potential for building NLP powered virtual assistants and benefiting from the productivity that they can provide does exist, but you will have to get out of your comfort zone as a developer.<br />
<br />
<h3>
The Mental Leap to CUIs</h3>
Building CUIs using NLP and AI can take some getting used to when you have spent your career or education building graphical user interfaces (GUI). A virtual assistant, which is how some NLP bots function, can be interacted with in many of the same ways you would interact with another human working on your behalf. Now, a GUI is a very capable and functional interface, but it is not a virtual human. You have to keep reminding yourself that the bot you are building needs to behave like a virtual human, because it is easy to get caught in the trap of building your natural language interactions like you would a traditional GUI. If you do that, you end up with the worst of both worlds.<br />
<br />
<h3>
Not that there is Anything Wrong with GUIs</h3>
Now there is nothing wrong with GUIs. For some tasks and functions they will always be superior to natural language. But for many things we do in our daily lives, language is a superior medium especially if the engaging entity is intelligent.<br />
<br />
Fundamentally how you would ask (via text or voice) a virtual assistant to perform an action or make an inquiry on your behalf is different than using a GUI. The way you reach a decision or action point in a CUI can be different than a GUI. Let's take a very simple example to compare.<br />
<br />
Say you are using a business application to manage pending requests that you must act upon on a daily basis. There are different types or classes of requests you must review. Some requests are specific to you, some to your direct team members, and others are company wide actions, but you are required to review and make a decision or take action on all of them at some point during the day or week.<br />
<br />
In a GUI you might navigate to the screen and see some quick filters that let you select which request types to view, or all requests might be shown in some sorted or grouped order on a scrolling page. And you would navigate through the information to decide what to do first. There are many ways to represent the requests and perform filtering and sorting. It also depends on your preferences for what tasks you like to tackle first and how you manage your day. The ways to make the GUI optimal for your particular usage pattern depend on many factors, many of which are specific to you. This is where GUIs begin to break down. They can sometimes overwhelm us with information or not adapt to our unique usage patterns - basically they lack the ability to easily adapt to the extreme personalization you can achieve with a CUI.<br />
<br />
<h3>
All Things being Equal CUIs Win</h3>
Now this all depends on your implementation of the GUI and CUI. You can obviously build a horrible CUI and a wonderful ultra personalized GUI, but all things being equal I claim that a CUI will always be superior. The advantage the CUI has is that the user experience does not have to change significantly to improve personalization. The inherent nature of conversational interaction is something natural to all humans. So as a developer, as you improve the conversational flow of your virtual assistant, and as you release new versions of the bot, the user can adapt much more easily because the flow is fundamentally the same, the interaction is still a conversation and it is the bot that is getting smarter/better and always assisting you through the same conversational experience to help you accomplish your objectives.<br />
<br />
So let's go back to our request/action application we described earlier. In the CUI case, a smart virtual assistant might look at all the pending requests, tell you how many pending requests of each type you have, and suggest you start with your direct reports first if, say, there is only one of those. The ability of the CUI to branch off in different directions is much more dynamic than what a GUI can do, and this can be done in a way that does not force the user to learn a brand new interface with each new release of the software, since the virtual assistant is your interface and your personal guide at the same time. The CUI sort of has built-in help since it is a virtual assistant by design.<br />
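To illustrate that first conversational turn, here is a small Python sketch, assuming pending requests arrive as simple records with a `kind` field. The request categories, the `opening_message` function and the "start with the smallest group" rule are assumptions made for the example, not a prescribed design.

```python
from collections import Counter

# Hypothetical pending requests; a real bot would load these from the business app.
PENDING = [
    {"id": 1, "kind": "personal"},
    {"id": 2, "kind": "personal"},
    {"id": 3, "kind": "direct report"},
    {"id": 4, "kind": "company-wide"},
    {"id": 5, "kind": "company-wide"},
    {"id": 6, "kind": "company-wide"},
]


def plural(n, word):
    """Format a count with a naively pluralized noun, e.g. '1 request', '3 requests'."""
    return f"{n} {word}" if n == 1 else f"{n} {word}s"


def opening_message(requests):
    """Summarize pending work and suggest a starting point, as an assistant might."""
    if not requests:
        return "You have no pending requests."
    counts = Counter(r["kind"] for r in requests)
    summary = ", ".join(f"{n} {kind}" for kind, n in counts.most_common())
    # Assumed heuristic: suggest the smallest group first, since it is quickest to clear.
    kind, n = min(counts.items(), key=lambda item: item[1])
    return (
        f"You have {plural(len(requests), 'pending request')}: {summary}. "
        f"Shall we start with your {plural(n, kind + ' request')}?"
    )


print(opening_message(PENDING))
```

The user-facing flow stays a single conversational turn. Swapping the heuristic for something learned from the user's past behavior would make the bot smarter without changing the interface they already know.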
<br />
<h3>
Mixed Mode CUI and GUI</h3>
Many of the popular bot platforms such as Slack and Facebook Messenger have incorporated some very convenient visual GUI components into their messaging and flows, which allow developers to mix conversational question/answer dialog with short-cut GUI actions and interactions. This can be a great way to meld conversation with GUI, but at the same time it can suck CUI developers back into the GUI world and get them injecting too many GUI interactions into the CUI flow. So use these tools wisely and keep the interactions focused on the conversational flow between the user and the human-like virtual assistant.<br />
<br />
<h3>
Grow your Bot Like You are Raising a Child</h3>
So be careful: always think like you are building a smart virtual assistant, not a GUI. You will not have all the smarts built into the bot from day one, but make it your mission to keep the conversational flow focused on the interactions and dialog between the human user and the virtual assistant, and everything else will follow as your bot gets smarter.<br />
<br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-9693331787600165632018-02-03T18:11:00.002-08:002018-02-04T21:56:39.372-08:00Can AI Put the Human Back into HR?<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrnBAnOYCgiSFVLWTZfSzlrHGtCOzq6FeYpDVRGR_fNzZtsLZ_WjNEdUVU-VI1X6DjBGafz0xzbHp8JJVDvkXa8SegbfzB-qIHK4271QnpKaSvtvAZPkboQpjVB-WzdLFR5wy8P_Zuxl4/s1600/AI-in-HR.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="423" data-original-width="750" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrnBAnOYCgiSFVLWTZfSzlrHGtCOzq6FeYpDVRGR_fNzZtsLZ_WjNEdUVU-VI1X6DjBGafz0xzbHp8JJVDvkXa8SegbfzB-qIHK4271QnpKaSvtvAZPkboQpjVB-WzdLFR5wy8P_Zuxl4/s400/AI-in-HR.jpg" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
Over the past couple of decades Human Resources (HR) has moved from paper and manual processes to more automation, providing employees with more and more web and mobile powered experiences. But has HR lost its human essence and personalization in the process? Nowadays managing a company's "human" assets seems to involve less and less human interaction between employees and the actual people in the HR organization. Try finding an HR person when you need one these days!<br />
<br />
<h3>
<b>In the Beginning there were Humans</b></h3>
I recall that in my first few jobs, at both major corporations and startups, there was always an HR person I could reach on the phone, or whose office I could simply walk over to, to get small and big matters resolved. HR personnel were typically visible and accessible. Human Resources representatives interacted at a personal level with employees and often knew you on a first name basis.<br />
<br />
It seems that with more technology HR has lost its human-to-human interaction and become more impersonal and mechanical. Today, if you need some payroll or benefits issue resolved, you typically need to submit a "ticket" in some online system and wait for someone (you have never met) to contact you back over email or, if you are lucky, over the phone. Or, if you are lucky, you can click your way through a labyrinth of HR GUI applications to find what you need.<br />
<br />
<h3>
<b>Virtual Assistants are the New Humans</b></h3>
So what is the solution, more or less technology? Maybe the answer is better technology. <a href="http://toffy.work/" target="_blank">AI powered HR virtual assistants</a> have the potential to be your personalized guide to do everything from answering general HR questions to requesting time-off and helping guide you to finding the information or actions you need to take quickly. Machine learning powered intelligent assistants could help resolve problems and questions with your payroll, for example. Interactive NLP powered bots could help guide you through an often complicated benefits and open enrollment process by knowing your preferences and history to match you with the optimal recommendations for your particular situation.<br />
<br />
<h3>
<b>The Future is an AI Powered Voice and Messaging-First World</b></h3>
The future of HR is not more technology, but more intelligent technology powered by machine learning, natural language processing, personalized recommendations engines, and other AI enabled technology that bring hyper-personalization and a human-to-computer interaction model that goes beyond impersonal graphical user interfaces. Voice and messaging-first interfaces (endowed with NLP) are a step in the right direction and can bring back a bit of humanity to your technology overloaded workplace.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-43447696563422985582017-12-25T12:02:00.001-08:002018-02-03T16:14:07.655-08:00Recommending Actions for Your NLP Bot<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4OyISUUP3WQ3bX_I8QD7Ons3D_NljFoyykw0eTXNVttlSAl4hWCA1b2ybuPD7EwG6-iocGSVu_F3VnuOsokjIiEXDmCZ_l4GideJ9Pqs9s0MnyGXIBIb4HPxFIzVeookRE9hrqow5qmo/s1600/nlp.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="165" data-original-width="305" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4OyISUUP3WQ3bX_I8QD7Ons3D_NljFoyykw0eTXNVttlSAl4hWCA1b2ybuPD7EwG6-iocGSVu_F3VnuOsokjIiEXDmCZ_l4GideJ9Pqs9s0MnyGXIBIb4HPxFIzVeookRE9hrqow5qmo/s1600/nlp.png" /></a></div>
At first glance, the machine learning methods typically used in NLP applications (such as chatbots) and those used in recommender systems (for recommending products) are not often leveraged together in the same applications.<br />
<br />
NLP is the machine learning domain that makes your virtual assistant capable of engaging in human language based conversation and <a href="https://www.youtube.com/watch?v=EgE0DUrYmo8" target="_blank">recommender systems</a>, as the name suggests, recommend products/services you will hopefully like (thus saving you the trouble of discovering them on your own); but it is not often you see NLP and recommender systems together.<br />
<br />
<h3>
Where Conversational UI Meets Recommendations</h3>
But let's think about that for a moment. Is there a solution space where NLP and recommender systems intersect and why would anyone want to do such a thing? I will make the case that every so called AI powered virtual assistant (aka chatbot and their kin) needs context and part of that context can be provided by a personalized recommender system that helps guide the conversation and streamline the conversational user experience.<br />
<br />
<h3>
A Messaging First World</h3>
We are in the midst of a messaging application revolution. <a href="https://grandlogic.blogspot.com/2016/10/goodbye-apps-and-hello-bots.html" target="_blank">A new generation</a> of users are making messaging based applications their preferred medium for communicating with people, places and things around them, especially when it comes to the digital world (and contextual world). And there is no lack of applications from fintech to social applications leveraging and rediscovering the command line interface as the new mode (or not so new mode for many <a href="http://grandlogic.blogspot.com/2017/08/conversational-ux-design-is-evolving-as.html" target="_blank">command-line geeks</a>) of communication between humans and computers.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi94WQEj1dnCK_0E6t4jsN76p5bvBqtQaqIdt9fIGjNiOwy2nVFoA8Fw0XIu3LTvtInyjF9QWple_GntPoiEB_ol2nPM3w40uqNmZvzyk5T785XFCy03RGXwwNDWHv3IC_2Z30V9GL8Y9I/s1600/recommender.jpeg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" data-original-height="207" data-original-width="245" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi94WQEj1dnCK_0E6t4jsN76p5bvBqtQaqIdt9fIGjNiOwy2nVFoA8Fw0XIu3LTvtInyjF9QWple_GntPoiEB_ol2nPM3w40uqNmZvzyk5T785XFCy03RGXwwNDWHv3IC_2Z30V9GL8Y9I/s1600/recommender.jpeg" /></a></div>
<br />
<h3>
<span class="markup--strong markup--h4-strong" style="font-weight: inherit;">Your Next Question or Answer is a Recommendation Away</span></h3>
Ok, so where do recommender systems fit into the world of NLP and conversational driven user interfaces? Well, conversational applications are not without their own challenges. Typing and speaking take effort from both the user and the virtual assistant in order to engage in timely and efficient interaction. But what if your virtual assistant knew what you wanted to do next (or what you might/should like to do next)? What if your NLP powered bot could suggest to you actions you might want to take and save you the trouble of verbalizing them - maybe give you a quick one click shortcut (can be voice powered as well) to drive and continue the conversation?<br />
<br />
This is where recommender systems can play a vital role in making your virtual assistant not just clever at understanding intent and named entity recognition from voice or text, but also present a sense of intelligence in remembering your past behavior or predicting what you might do next (or should do next) by relating your behavior to what others in similar roles and situations have done next. So even with no prior knowledge of "you", the virtual assistant might prescribe next actions based on what others in similar roles and situations have done. Does that not sound like a recommendation system?<br />
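<br />
As a rough illustration of suggesting a next action from what other users did, here is a minimal sketch that simply counts which action most often followed the current one across past sessions. The action names, the session data and the function names are all hypothetical, and a real assistant would also condition on role, time and other context.<br />

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each action was followed by each other action."""
    transitions = defaultdict(Counter)
    for actions in sessions:
        # Pair every action with the one that came right after it.
        for current, following in zip(actions, actions[1:]):
            transitions[current][following] += 1
    return transitions


def recommend_next(transitions, last_action, k=2):
    """Return up to k actions that most often followed last_action."""
    counts = transitions.get(last_action, Counter())
    return [action for action, _ in counts.most_common(k)]


# Hypothetical sessions recorded from other users in a similar role.
sessions = [
    ["check_balance", "request_time_off", "view_calendar"],
    ["check_balance", "request_time_off", "notify_manager"],
    ["check_balance", "view_payslip"],
]

transitions = build_transitions(sessions)
suggestions = recommend_next(transitions, "check_balance")
print(suggestions)
```

The bot could surface these suggestions as one click shortcuts, even for a brand new user with no history of their own.<br />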
<br />
<h3>
Prescribing vs Predicting</h3>
Recommender systems are inherently about prescribing things (which can include actions not just items) applicable to your context at a given point in time (time based being a critical context as well). I foresee a <a href="https://www.lemonade.com/" target="_blank">future where both business</a> and consumer application oriented virtual assistants and NLP bots will leverage highly personalized recommender systems to take the human-to-computer interaction to the next logical evolution (as promised by many sci-fi books and movies :)<br />
<br />
<h3>
Matrix Factorization and LSTMs are Your Friend</h3>
So for all you NLP bot developers, make things like matrix factorization and collaborative filtering your friend. Hybrid recommender systems based on collaborative filtering and content filtering (product and customer metadata) have been the state of the art for the past few years (since the <a href="https://en.wikipedia.org/wiki/Netflix_Prize" target="_blank">Netflix contest</a>). However, the future of recommender systems will be powered by deep learning and concepts like LSTMs and product and item embeddings. Research in this space is <a href="https://www.youtube.com/watch?v=vaJOlKxyKhA" target="_blank">evolving fast</a>. A mix of shallow and deep learning techniques is racing to enable this world of intelligent NLP bots and efficient conversational user interfaces.<br />
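<br />
To make matrix factorization a little more concrete, below is a minimal pure Python sketch that learns user and action factors with plain stochastic gradient descent. The toy scores, the hyperparameters and the function names are assumptions made up for illustration; production systems typically rely on a dedicated library and much more data.<br />

```python
import random


def predict(P, Q, user, item):
    """Predicted score: dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


def factorize(ratings, n_users, n_items, n_factors=2,
              lr=0.01, reg=0.02, epochs=200, seed=0):
    """Learn factor matrices P (users) and Q (items) from (user, item, score) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for user, item, score in ratings:
            err = score - predict(P, Q, user, item)
            for f in range(n_factors):
                pu, qi = P[user][f], Q[item][f]
                # Gradient step with L2 regularization on both factors.
                P[user][f] += lr * (err * qi - reg * pu)
                Q[item][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error over the observed triples."""
    total = sum((score - predict(P, Q, user, item)) ** 2
                for user, item, score in ratings)
    return (total / len(ratings)) ** 0.5


# Hypothetical data: 3 users, 4 bot actions, scores on a 1 to 5 scale.
ratings = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 1, 1), (2, 2, 5), (2, 3, 4),
]

P, Q = factorize(ratings, n_users=3, n_items=4)
print("training RMSE:", rmse(ratings, P, Q))
print("score for user 1, unseen action 1:", predict(P, Q, 1, 1))
```

The learned factors can then score actions a user has never taken, which is what lets the assistant rank candidate suggestions.<br />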
<br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-34956557021218404412017-09-13T19:23:00.001-07:002018-02-04T21:40:08.309-08:00AI and Machine Learning Madness<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgooYTdrphYlcj8RJ_hmI8Vwoa04koX2psgZlMMx4t2seXfAqr5h8r1fYrzZvuw6rYAEMRGHwA78fyH7P_NONj-cG-GcVp-katqHSnnB5X3c2-TFMeZ0zU0K3u8QFddp5BH6HUNS9KlLA8/s1600/AI-Artificial-Intelligence-e1483704577565.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="454" data-original-width="640" height="283" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgooYTdrphYlcj8RJ_hmI8Vwoa04koX2psgZlMMx4t2seXfAqr5h8r1fYrzZvuw6rYAEMRGHwA78fyH7P_NONj-cG-GcVp-katqHSnnB5X3c2-TFMeZ0zU0K3u8QFddp5BH6HUNS9KlLA8/s400/AI-Artificial-Intelligence-e1483704577565.jpg" width="400" /></a></div>
<span style="background-color: white; color: #222222; font-family: "arial" , sans-serif; font-size: x-small;"><br /></span>
<span style="background-color: white; color: #222222; font-family: "arial" , sans-serif;">The hype around machine learning and AI is just off the charts these days. This reminds me of several years back when just about every software product started putting the term "Big Data" in front (or at the end) of all of their product literature and company tagline.</span><br />
<br style="color: #222222; font-family: arial, sans-serif;" />
<h3>
AI is the Art of Using ML at the Right Time and in the Right Context</h3>
<span style="background-color: white; color: #222222; font-family: "arial" , sans-serif;">Today, startups are claiming to change every industry and the world just because they put some "machine learning" inside their products. Machine learning has the potential to give software super powers, but alone it does not create great software.</span><br />
<br style="color: #222222; font-family: arial, sans-serif;" />
<span style="background-color: white; color: #222222; font-family: "arial" , sans-serif;">Listen, I am just as excited as the next geek about the potential of deep learning and the democratization of data science, but please stop putting machine learning in your company's slogan, like it was the secret ingredient that you were looking for. Sorry, this is becoming my new pet peeve.</span><br />
<br style="color: #222222; font-family: arial, sans-serif;" />
<span style="background-color: white; color: #222222; font-family: "arial" , sans-serif;">Machine Learning, at the end of the day, is just one more tool in our arsenal of tools and skills needed to build more efficient and more intuitive solutions and services for our customers and for humanity as a whole. Infusing machine learning into your products is more than technology. It takes a <a href="https://grandlogic.blogspot.com/2016/10/bots-ai-and-future-of-augmented-ux.html" target="_blank">new way of thinking</a> and a new way of running engineering teams and engineering processes. </span><br />
<br style="color: #222222; font-family: arial, sans-serif;" />
<h3>
A Great Product is Much More than AI</h3>
<span style="background-color: white; color: #222222; font-family: "arial" , sans-serif;">Please talk about the benefits of using AI/ML/DL; don't just say that "because you are using machine learning" you are changing the world or have a better product. There is that little thing called building a product that delivers value to the end user that you still have to get right. And believe it or not, this still requires human creativity and execution. Claiming you are infusing your product with ML will not necessarily make it a better product or the world a better place.</span><br />
<br style="color: #222222; font-family: arial, sans-serif;" />
<h3>
AI is in the Product - It Should not be the Product</h3>
<span style="background-color: white; color: #222222; font-family: "arial" , sans-serif;">I do believe that machine learning related technology will be in every future product, just like we have transistors in every computer today. Machine learning will allow us to build better and more efficient products. How you use the machine learning to build a great product will matter much more than the fact that you are using machine learning. Keep that in mind when you come up with your next startup tagline and company slogan!</span><br />
<br style="color: #222222; font-family: arial, sans-serif;" />
<span style="background-color: white; color: #222222; font-family: "arial" , sans-serif;">End of rant.</span>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-21321255009615064502017-08-28T06:47:00.001-07:002018-02-04T21:39:03.951-08:00AI vs Paradox of Choice<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_lYhtAo6dQu63ylm_Popw4_04hytGdmqys_k8Zr-bth0qXx4uMtxFKRemXOje88Gwq-C-hDTjoMe8S46EXiPdT6BGXKKcJ2yTYwEG6Q8EjEqiOceN0snn-tuDZgF5VGP5BcWag2p3yFgx/s1600/dilbert-paradox-of-choice.gif" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="280" data-original-width="900" height="123" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_lYhtAo6dQu63ylm_Popw4_04hytGdmqys_k8Zr-bth0qXx4uMtxFKRemXOje88Gwq-C-hDTjoMe8S46EXiPdT6BGXKKcJ2yTYwEG6Q8EjEqiOceN0snn-tuDZgF5VGP5BcWag2p3yFgx/s400/dilbert-paradox-of-choice.gif" width="400" /></a></div>
<span style="font-size: small;"><br /></span>
<span style="color: #222222; font-family: "arial" , sans-serif;">The <a href="https://www.ted.com/talks/barry_schwartz_on_the_paradox_of_choice">paradox of choice</a> is a problem we see more and more of in our modern world. It goes beyond what products Amazon should recommend or friends Facebook should suggest. In the business world and in enterprise applications this is also a challenging problem as our applications and processes grow in complexity. The potential for machine learning powered <a href="https://en.wikipedia.org/wiki/Recommender_system">recommender systems</a> to augment human decision making is one of the next frontiers for AI in the enterprise. </span><br />
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span>
<span style="color: #222222; font-family: "arial" , sans-serif;">Recommender systems can do more than just suggest what articles you should read on LinkedIn or what jobs are most suited for you. In the future machine learning (and more likely deep learning) powered recommender systems will guide enterprise decision making by helping business process owners take the most effective actions and decisions in a timely manner and with hyper-personalization. </span><br />
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span>
<span style="color: #222222; font-family: "arial" , sans-serif;">Recommender systems will move from solving B2C optimization problems (how they are typically used today in our data saturated and over marketed world) to solving problems in B2B and enterprise applications. Ultimately recommender systems are about prescribing (they are not really about predicting) an optimal decision at the right time and place/context, so they can naturally deal with a variety of B2B scenarios such as optimizing workflow paths, streamlining supply chain actions, and augmenting human decisions for common day to day business operational functions. Enterprise decision makers are in vital need of these AI super powers. Stay tuned, they are coming :)</span>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-73573177405903721172017-08-20T18:22:00.003-07:002017-08-20T18:33:51.958-07:00Messaging-First Applications with Slack or FBM?<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXfteWW9OG4tMVtLj3sxcalStIhOrSvJOuhzIFySQ8g6Lxbe4j_lz88MJfkVtJYRUI8WeWgV9NjLDm2u60nljjyFP5su05VMgAYfLoIL40I9FwJ6viqXf_pOqKM4qnkKcnDe00PHiLF2M/s1600/stoneage-bot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="375" data-original-width="698" height="214" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXfteWW9OG4tMVtLj3sxcalStIhOrSvJOuhzIFySQ8g6Lxbe4j_lz88MJfkVtJYRUI8WeWgV9NjLDm2u60nljjyFP5su05VMgAYfLoIL40I9FwJ6viqXf_pOqKM4qnkKcnDe00PHiLF2M/s400/stoneage-bot.png" width="400" /></a></div>
</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
Conversational UX design is evolving as more and more apps begin to incorporate conversational UI functionality. While the concept of a messaging-centric UI can seem simple, the melding of a messaging-first user experience is nothing to underestimate. Conversational user interfaces can be simple for humans to interact with (you are just chatting back and forth), however, blending in and balancing rich visualization and complex interactions is not simple to get right. Just like any other UX, it is a balance of minimalism while allowing for rich expressiveness in the UI without overwhelming the user.</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
Slack is one of the leading platforms for building bots, especially for enterprise applications. However Slack has a number of bot conversational UX features that are still missing relative to other platforms such as FB Messenger and FB Workplace.</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
To give some perspective, here is my compiled list of features I would like to see in Slack's bot framework to improve its messaging UX and bring it on par with platforms like FB Messenger:</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
1) Conversational Streams and UI Alignment</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
Slack bots (especially in direct-messaging one-on-one dialog flows) force the bot and the user to both be left justified in the messaging UI stream. This goes against UI norms found in the majority of messaging applications and related best practices for messaging apps. Typically in a streaming messaging flow, your conversational stream (you being the person interacting with the bot) is on the right of the screen and the party you are talking to (in this case the bot) is on the left side of the screen (or it can be vice versa).</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
This is not supported in Slack, which makes a number of things awkward and cluttered in a bot-to-human dialog, especially when it is one-on-one (as opposed to a Slack group channel). In Slack the entire conversational interaction is left justified, which can make the UI look cluttered when there are visually rich elements and things like "Quick Replies" in the back and forth stream.</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
I hope that Slack will allow for aligning the bot vs the user on different sides of the messaging stream, something more similar to how FB Messenger works. This will allow for a more natural conversational interaction.</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
2) Horizontal Scrolling Carousel UI Components</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
Slack (mobile and desktop/web) does not provide any kind of horizontal card or horizontal scrolling carousel. While some might consider this bad design (to allow for horizontal scrolling of cards), it is often necessary to minimize the vertical area needed to display information in rich messaging interactions. FB Messenger allows for a limited horizontal scrolling carousel that I find to be very useful when building bots. Hopefully Slack will incorporate this. Slack already supports rich "attachments", so it would be a natural fit to allow for some limited level of horizontal scrolling.</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
3) WebView Integration</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
Slack does not have explicit support for messaging buttons that open a webview UI. Sometimes a webview is needed to show rich web content (again, this kind of feature should not be abused). FB Messenger has this ability and allows for controlling how the webview window is opened and closed. This can be mimicked in Slack by embedding links in the "field" elements, for example, but it is a bit of a hack.</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
4) Quick Reply Buttons</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
One particularly nice feature I got accustomed to in Facebook Messenger is "Quick Reply". It allows the bot to display "Quick Reply" buttons that are shortcuts for commands the user would normally have to type.</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
There is a way to mimic quick replies in Slack, but again it is a bit of a hack. Check this open source node/slack project for an example of how this works with Slack. Quick replies are a real necessity in a rich messaging interaction. Again, I hope that Slack adds this feature natively instead of making bot frameworks jump through hoops to emulate it.</div>
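<div>
As a rough illustration of the emulation, the sketch below builds a Messenger message with native quick replies and a Slack message that approximates them with legacy attachment buttons. This is one possible approach and not necessarily how the project mentioned above does it. The function names, the "quick_reply" callback id and the limit of five buttons per attachment are assumptions made for this example.</div>

```python
def messenger_quick_replies(text, options):
    """Build the "message" object for Messenger with native quick replies.

    options is a list of (title, payload) pairs.
    """
    return {
        "text": text,
        "quick_replies": [
            {"content_type": "text", "title": title, "payload": payload}
            for title, payload in options
        ],
    }


def slack_button_replies(text, options, callback_id="quick_reply", per_attachment=5):
    """Approximate quick replies in Slack with attachment buttons.

    Buttons are split across attachments, per_attachment at a time
    (assumed limit for legacy message buttons).
    """
    attachments = []
    for start in range(0, len(options), per_attachment):
        chunk = options[start:start + per_attachment]
        attachments.append(
            {
                "fallback": text,
                "callback_id": callback_id,
                "actions": [
                    {"name": "quick_reply", "text": title, "type": "button", "value": payload}
                    for title, payload in chunk
                ],
            }
        )
    return {"text": text, "attachments": attachments}


if __name__ == "__main__":
    choices = [("Yes", "CONFIRM_YES"), ("No", "CONFIRM_NO")]
    print(messenger_quick_replies("Book the meeting?", choices))
    print(slack_button_replies("Book the meeting?", choices))
```

<div>
One difference the payloads do not show: Messenger removes quick replies once the user taps one, while Slack buttons stay in the message until the bot updates it, so the bot has to do that cleanup itself.</div>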
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
Hopefully the Slack product team will address these issues as Slack is by far the best team and enterprise collaboration/messaging platform on the market today.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5z01odZVPVFpF2P4di2i8Qgz8QrmCxye38Nl44lAJ5AfoWqDXQyidNC1I3-PYr_qLfQdlStPgD0n6Qir-D3AfyDjNCRH1Lr4ePzuvr45hQzplfzLZF4ZDfbgrJIfuNuPzOA8Iprk3h78/s1600/slack_vs_FBM.jpeg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="128" data-original-width="393" height="104" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5z01odZVPVFpF2P4di2i8Qgz8QrmCxye38Nl44lAJ5AfoWqDXQyidNC1I3-PYr_qLfQdlStPgD0n6Qir-D3AfyDjNCRH1Lr4ePzuvr45hQzplfzLZF4ZDfbgrJIfuNuPzOA8Iprk3h78/s320/slack_vs_FBM.jpeg" width="320" /></a></div>
</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
FB Messenger might have some superior bot-to-human interaction and UX capabilities, but it inherently lacks the team collaboration functionality and the many third-party integrations that Slack has to offer.<br />
<br />
I do believe FB Workplace will close the gap over time, and in many ways it has advantages over Slack in terms of out-of-the-box social collaboration functionality. Slack is a bit of a geeky technical tool when it comes to social collaboration and thus not as intuitive to use.</div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
<br /></div>
<div style="-webkit-text-stroke-width: 0px; color: #222222; font-family: arial, sans-serif; font-size: small; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;">
I expect both FB Workplace and Slack to evolve as head-to-head competitors and battle for the hearts and minds of developers, much like Netscape battled Microsoft's Internet Explorer for web domination. For enterprise owners and enterprise end users, intelligent AI-endowed virtual assistants and bots will usher in a new era of innovation not seen since the dot-com days. The battle has moved from the mobile app store to the AI app store, where natural language understanding and deep learning are the killer technologies in the arsenal of AI-sophisticated developers.</div>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-2646462571124476326.post-85748749462580205762017-07-13T07:07:00.002-07:002017-07-13T07:07:21.656-07:00Augmented Reality vs Conversational User Interface<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjKHfcLrODB2eGNDoz8B2LAu9WKLzqAIh7KRK96es0amDVk7HXzUkSH6eehbkDkmoK48DrGKFfiv_E8f4a5Fr1pwkD199m6OZHxO9CNHs3xATvf1Ax4Tym8Cdyvn_f-AOXBvouBxucV_U/s1600/webedia-is-launching-an-innovation-vr-unit-8-638.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="359" data-original-width="638" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjKHfcLrODB2eGNDoz8B2LAu9WKLzqAIh7KRK96es0amDVk7HXzUkSH6eehbkDkmoK48DrGKFfiv_E8f4a5Fr1pwkD199m6OZHxO9CNHs3xATvf1Ax4Tym8Cdyvn_f-AOXBvouBxucV_U/s400/webedia-is-launching-an-innovation-vr-unit-8-638.jpg" width="400" /></a></div>
<span style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: x-small; font-variant-ligatures: normal; orphans: 2; widows: 2;"><br /></span>
<span style="background-color: white; color: #222222; font-family: arial, sans-serif; font-size: x-small; font-variant-ligatures: normal; orphans: 2; widows: 2;"><br /></span>
<span style="font-family: inherit;"><span style="background-color: white; color: #222222; font-variant-ligatures: normal; orphans: 2; widows: 2;">Don't believe all the hype about virtual reality and its close cousin augmented reality. How many examples do we need to see before we get it that humans don't like a big gizmo sitting over their faces and eyes or <a href="http://www.computerworld.com/article/3054580/wearables/apple-watch-fail-no-iphone-itbwcw.html">attached to their body</a>? The failed Google Glass and the <a href="https://www.technologyreview.com/s/608257/another-price-slash-suggests-the-oculus-rift-is-dead-in-the-water/">failing Facebook Oculus</a> are two such examples, with more on the way <a href="https://www.forbes.com/sites/ianaltman/2015/04/28/why-google-glass-failed-and-why-apple-watch-could-too/#8e2453344c4b">if Apple is not careful</a>.</span><br style="color: #222222; font-variant-ligatures: normal; orphans: 2; widows: 2;" /><br style="color: #222222; font-variant-ligatures: normal; orphans: 2; widows: 2;" /><span style="background-color: white; color: #222222; font-variant-ligatures: normal; orphans: 2; widows: 2;">What will win out? It's all about texting and voice, stupid (not you). AI is surely coming, but it will be powered by voice-first and messaging-first applications and services. Visual augmentation with AR and VR is cute, but it will not be what transforms our reality and changes how we interact with technology.</span><br style="color: #222222; font-variant-ligatures: normal; orphans: 2; widows: 2;" /><br style="color: #222222; font-variant-ligatures: normal; orphans: 2; widows: 2;" /><span style="background-color: white; color: #222222; font-variant-ligatures: normal; orphans: 2; widows: 2;">Humans are social animals, and voice and texting are what tap into the part of our brain that drives us to connect with others/things and to share ideas. 
So gear up and start thinking how to turn your applications and services into voice-first and messaging-first user experiences. That is where the future is going.</span></span>Unknownnoreply@blogger.com0