For several years now, Hadoop has been touted as a replacement for traditional enterprise data warehouses (EDWs), a position that has spawned debates between Hadoop providers and EDW vendors. Naturally, Hadoop vendors cite the framework’s ability to store unstructured data, a resource enterprises produce in droves, and its scalability as reasons for adoption. And given the recent Hadoop distributions from IBM, Oracle, and Teradata, it seems quite likely that Hadoop has entrenched itself as part of the EDW ecosystem, though it hasn’t yet totally replaced relational databases or come to serve as an all-encompassing database management system.
For enterprises, the problem with Hadoop was always support: version control, vendor viability, and even basic customer support were all looming question marks over the open-source technology. Essentially, the Hadoop framework wasn’t backed by the right muscle, which made adopting it a complicated, high-risk move for enterprises, despite early adopters such as Yahoo and Facebook.
However, Hadoop didn’t slow down, and with Hadoop-specific vendors like Hortonworks and Cloudera supporting the technology, it was eventually adopted by many of the aforementioned EDW titans. But much of the industry buzz centered on Hadoop’s direct competition with these same data warehouse vendors. Wasn’t Hadoop meant to slay these outmoded dinosaurs?
The truth is that Hadoop is probably never going to truly replace enterprise data warehouses. The main issue with warehouses has always been their tendency to grow too large and become unwieldy. Now, EDW vendors are moving to more modular forms of storage, which suits Hadoop’s ability to process data in multiple locations without having to move it. However, the relational database’s main role of storing and analyzing historical data in transactional form is still necessary, and EDWs still excel in that role.
It’s in this form that Hadoop looks set to continue for the next several years: as a complement to EDW storage clusters rather than a straight swap. Why? Because Hadoop has several use cases that work well alongside EDWs, such as loading unstructured data, which can then be processed in Hadoop before being shipped to the warehouse or stored for later use.
And even with applications like Hive and HBase that create warehouse-like infrastructure on top of Hadoop, it was always going to take the support of megavendors to legitimize the technology in the eyes of enterprise companies. Because enterprises rely heavily on vendor support, Hadoop’s open-source nature made implementing it at scale difficult. While Hadoop is powerful, it’s not exactly intuitive, and signing separate service level agreements to pay for support would be expensive for large organizations that are used to support being built into their software contracts. Megavendor support also eliminates lingering questions about vendor viability.
As it stands, it seems Hadoop will not replace the enterprise data warehouse but rather augment it. It also appears that BI buyers looking for data warehousing tools are growing weary of the Hadoop-replaces-warehouses conversation.