For several years now, Hadoop has been touted as a replacement for traditional enterprise data warehouses (EDWs), a position that has spawned debates between Hadoop providers and EDW vendors. Naturally, Hadoop vendors cite the framework’s ability to store unstructured data, a resource that enterprises produce in droves, and its scalability as reasons for adoption. And given the recent Hadoop distributions from IBM, Oracle, and Teradata, it seems quite likely that Hadoop has entrenched itself as part of the EDW ecosystem, though it hasn’t yet totally replaced relational databases or come to serve as an all-encompassing database management system.
For enterprises, the problem with Hadoop was always support: version control, vendor viability, and even basic customer support all loomed as question marks over the open-source technology. Essentially, the Hadoop framework wasn’t backed by the right muscle, which made adopting it a complicated, high-risk move for enterprises, despite early adopters such as Yahoo and Facebook.
However, Hadoop didn’t slow down, and with Hadoop-specific vendors like Hortonworks and Cloudera supporting the technology, it was eventually adopted by many of the aforementioned EDW titans. Yet much of the industry buzz centered on Hadoop’s direct competition with these same data warehouse vendors. Wasn’t Hadoop meant to slay these outmoded dinosaurs?
The truth is that Hadoop is probably never going to truly replace enterprise data warehouses. The main issue with warehouses has always been their tendency to grow too large and become unwieldy. Now, EDW vendors are moving to more modular forms of storage, which suits Hadoop’s ability to process data in multiple locations without having to move it. However, the relational database’s main role of storing and analyzing historical data in transactional form is still necessary, and EDWs still excel in that role.
It’s in this form that Hadoop looks set to continue for the next several years: as a complement to EDW storage clusters, rather than a straight swap. Why? Because Hadoop has several use cases that work well with EDWs, such as loading unstructured data, which can then be processed in Hadoop before being shipped to the warehouse or stored for later use.
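To make that division of labor concrete, here is a minimal sketch of the pattern in plain Python. It only imitates the map and reduce steps a Hadoop job would run and does not use any Hadoop API. The log format, the sample lines, and the names `map_line`, `reduce_counts`, `to_warehouse_csv`, and `process` are all assumptions made up for illustration.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical raw log format, assumed purely for illustration.
LOG_PATTERN = re.compile(
    r'^(\S+) \[(\d{4}-\d{2}-\d{2})[^\]]*\] "(?:GET|POST) (\S+)'
)


def map_line(line):
    """Map step: emit ((date, path), 1) for a parseable line, nothing otherwise."""
    match = LOG_PATTERN.match(line)
    if match:
        _ip, date, path = match.groups()
        yield (date, path), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each (date, path) key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def to_warehouse_csv(totals):
    """Render the aggregated rows as CSV, a shape a warehouse bulk loader could ingest."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "path", "hits"])
    for (date, path), hits in sorted(totals.items()):
        writer.writerow([date, path, hits])
    return buffer.getvalue()


def process(lines):
    """Run the map and reduce steps, then format the result for loading."""
    pairs = (pair for line in lines for pair in map_line(line))
    return to_warehouse_csv(reduce_counts(pairs))


# Made-up sample input: three parseable lines and one unparseable line.
RAW_LINES = [
    '10.0.0.1 [2014-03-01T10:00:00] "GET /home HTTP/1.1" 200',
    '10.0.0.2 [2014-03-01T10:05:00] "GET /home HTTP/1.1" 200',
    'corrupted entry with no structure',
    '10.0.0.1 [2014-03-02T09:00:00] "POST /login HTTP/1.1" 302',
]

if __name__ == "__main__":
    print(process(RAW_LINES), end="")
```

In a real deployment the map and reduce steps would run across the cluster and the resulting rows would be handed to the warehouse’s own loader; the point of the sketch is only that the messy parsing happens before the data reaches the relational side.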
And even with applications like Hive and HBase that create warehouse-like infrastructures on top of Hadoop, it was always going to take the support of megavendors to legitimize the technology in the eyes of enterprise companies. The open-source nature of Hadoop ensured that implementing the software at scale would be difficult, because enterprises rely on vendor support. While Hadoop is powerful, it’s not exactly intuitive, and signing service level agreements predicated on paying for support would be expensive for large organizations that are used to support being built into their software contracts. Megavendor support also eliminates lingering questions about vendor viability.
As it stands, it seems Hadoop will not replace the enterprise data warehouse, but rather augment it. It also appears that BI buyers looking for data warehousing tools are growing weary of having the Hadoop-replaces-warehouses conversation.