The first round of the data revolution has focused on commoditizing computing and storage. Platforms such as Hadoop and NoSQL have helped propel this and have enabled businesses to economically deploy more powerful scale-out infrastructure than ever before. It has also changed and improved the way data warehousing and business intelligence are approached and managed. The storage and performance capabilities of Big Data have been a game changer. Traditional descriptive BI and reporting will never be the same. But this is just step one. The best is yet to come.
The industry is now going through a learning process in how to manage all this data at massive scale. Storing and managing more data is great, but people and businesses will get smarter about how much data to keep as it starts to hurt more (hurt the pocketbook). How much data you keep and mine will depend on statistically driven best practices, not just on data warehousing or how big your HDFS cluster is. The mainstreaming of Big Data has provided the muscle to store and process massive amounts of data at near-linear scale, but we will not see the real value of all this Big Data storage and processing until machine learning and data science tools become more accessible (to the non-PhD data scientists among us) and mainstream, and businesses learn how to apply these tools and disciplines effectively.
Machine Learning will provide the brains to go along with the Big Data muscle. In the long run, businesses will decide how much data to keep around based on statistical measures and best practices as they grow to understand their data and their business better and build out their predictive and prescriptive analytics.
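To make the "statistical measures" idea concrete, here is a minimal sketch (assuming Python with scikit-learn and a synthetic stand-in dataset, not any particular business's data) of one such measure: a learning curve that asks whether keeping and mining more data actually improves a model, or whether accuracy has already plateaued.

```python
# A hypothetical learning-curve check: does more data still buy accuracy?
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in for whatever data a business has warehoused.
X, y = make_classification(n_samples=50_000, n_features=20, random_state=0)

# Score the same model on progressively larger slices of the data.
train_sizes, _, test_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
)

for size, scores in zip(train_sizes, test_scores):
    print(f"{size:>6} rows -> mean CV accuracy {scores.mean():.3f}")

# If the curve flattens well before the full dataset, that is a statistical
# argument for how much history is worth keeping (and paying to store).
```

If the scores stop improving long before the full dataset is used, that plateau, rather than the size of the HDFS cluster, is the kind of evidence that should drive retention decisions.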