As you may have read in my past posts, I’m not a Big Fan of the term Big Data. However, I still want to better understand what lies behind this term (since nobody seems to agree on it). It’s still not clear to me whether it means a new way to manage and aggregate data, or analytics on (very) large, heterogeneous and real-time data. So I recently went to a Big Data event in Lausanne, the Swiss Big Data User Group (by the way, the event was nice).
My first conclusion is that Big Data is not analytics. Some presentations, like the one from CERN, showed use cases with huge amounts of data to deal with. Can you imagine: 150 million sensors producing data 40 million times per second! This means 300 TB of data per second. Of course, they first filter the data and eventually collect “only” 300 MB per second. However, there was no example of analytics (predictive or descriptive). The goal for IT is to collect, aggregate and store Big amounts of Data. Of course, analytics can be a second step once you have these data (I’m sure CERN data are the starting point for several analytics projects). But the point is that analytics is not necessarily part of a Big Data initiative.
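To put that filtering step in perspective, here is a back-of-the-envelope calculation based on the figures quoted above (assuming both rates are in bytes, with decimal prefixes):

```python
# Illustrative arithmetic only, using the rates quoted above
# (assumed: 300 TB/s raw, 300 MB/s kept, decimal byte units).
raw_rate_bytes = 300 * 10**12    # 300 TB per second off the sensors
kept_rate_bytes = 300 * 10**6    # 300 MB per second actually collected
reduction_factor = raw_rate_bytes // kept_rate_bytes
print(f"Filtering keeps about 1 byte in {reduction_factor:,}")
# → Filtering keeps about 1 byte in 1,000,000
```

In other words, roughly a million-to-one reduction happens before anything is even stored, let alone analyzed.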
Another presentation, from Urturn, showed how to use Big Data strategies to support a social network. Here again, huge amounts of data, but no mining (at least to my knowledge). The event comprised two other presentations, from Brainserve and Neo Technology. To conclude, I have learned that Big Data can be done without analytics in mind. But this may be a minority of cases. As soon as new data sets, whatever their size and diversity, have been put in place thanks to Big Data projects, analytics should be considered.
Of course, these thoughts are based on my personal view of Big Data, formed through events, blog posts and white papers. Feel free to add your opinions and comments about what Big Data is… or is not 🙂