Wednesday, February 5, 2014
The industry is now going through a learning process in how to manage all this data at massive scale. Storing and managing more data is great, but people and businesses will get smarter about how much data to keep as it starts to hurt more (hurt the pocketbook, that is). How much data you keep and mine will depend on statistically driven best practices, not just on data warehousing or how big your HDFS cluster is. The mainstreaming of Big Data has provided the muscle to store and process massive amounts of data at near-linear scale, but we will not see the real value of all this Big Data storage and processing until machine learning and data science tools become more accessible (to the non-PhD data scientists among us) and mainstream, and until businesses learn how to apply these tools and disciplines effectively.
Machine Learning will provide the brains to go along with the Big Data muscle. In the long run, businesses will decide how much data to keep around based on statistical measures and best practices, as they come to understand their data and their business better and build out their predictive and prescriptive analytics.
Posted by Sam Taha at 12:09:00 PM