Now that Big Data is a common buzzword, some people want to launch Big Data projects for the sake of it. First, let’s remember that Big Data is mainly an architecture for storing and processing huge amounts of fast-changing and heterogeneous data. As you can expect, there are several requirements for a Big Data project to be meaningful.
Even if you think that Big Data will solve all your issues, you still need to define your business problem. You should not start a Big Data project without having a clear business objective in mind. Next, you need to actually have those (big) data. If you plan to use internal data, then you need access to these data. This is one of the reasons why most Big Data projects happen within big companies. When using external data, you should think about accessibility, the means of gathering them (for example, an API), pricing, etc.
Whether your data are internal or external, you will have to worry about their quality. Big Data does not mean Big Signal. You cannot just collect all your log files or every tweet you find. You need to think about the information present in these data and the way to unlock it. In practice, this means data aggregation and feature selection/extraction, among other steps.
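To make "data aggregation and feature extraction" a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the log format ("timestamp user action"), the sample lines, and the function name aggregate_features are hypothetical and do not come from any particular system. It only shows the idea of turning noisy raw lines into a small table of per-user counts, while dropping lines that carry no usable signal.

```python
from collections import defaultdict

# Hypothetical raw log lines in an assumed "timestamp user action" format.
RAW_LOGS = [
    "2024-01-01T10:00:00 alice view",
    "2024-01-01T10:01:00 alice purchase",
    "2024-01-01T10:02:00 bob view",
    "malformed line",  # noise: does not match the assumed format
    "2024-01-01T10:05:00 bob view",
]


def aggregate_features(lines):
    """Aggregate raw log lines into per-user action counts.

    Lines that do not have exactly three whitespace-separated fields
    are treated as noise and skipped.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # basic data-quality filter
        _timestamp, user, action = parts
        counts[user][action] += 1
    # Convert nested defaultdicts to plain dicts for easier downstream use.
    return {user: dict(actions) for user, actions in counts.items()}


features = aggregate_features(RAW_LOGS)
print(features)
```

A real project would of course need richer validation and more meaningful features than simple counts, but the principle is the same: most of the raw volume is discarded or condensed before it becomes useful.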
If you can manage your data in a standard database, then a Big Data architecture may be costly and useless. You would be trying to hammer a nail with a shovel. The threshold between a standard and a Big Data architecture is fuzzy, so the choice should be studied carefully. It is also important to understand that Big Data is not analytics. Big Data doesn’t mean extracting knowledge from data, although it may support you in performing analytics on huge data sets.
To conclude, there are several situations where Big Data architectures are justifiable and useful. However, don’t forget that there are plenty of situations where Big Data projects are not appropriate. Whatever vendors of Big Data solutions may tell you, there are conditions for a Big Data project to be profitable. Remember, a Big Data architecture doesn’t solve all the common issues related to data-driven projects. It will allow you to handle larger volumes of data more efficiently, but it is up to you to leverage these data for decision making.