Think you need Hadoop? Think again

January 25, 2014 by Sandro Saitta


Today’s post is written by Rick DelGado. His guest post asks whether you really need Hadoop. Thanks, Rick, for your input.

In the big data buzz, Hadoop has been the big data solution of choice, leaving many feeling that Hadoop is their only option for harnessing big data. However, there are many other big data tools out there that offer different features than Hadoop and may actually fit your business needs better. Flash array storage, in particular, has made it easier to build fast, affordable storage, so check out these other big data solutions before settling on Hadoop.

HPCC Systems

This system, developed by LexisNexis, is similar to Hadoop in that it is used to build clusters of servers for the purpose of analysing large data sets. HPCC uses Enterprise Control Language (ECL) to make writing parallel-processing workflows easier and, also like Hadoop, has an ecosystem of tools built around it, including the Roxie Rapid Data Delivery Cluster, a query cluster closer to Hive than to HBase, and the Thor Data Refinery Cluster, a data processor.

While still in its early stages, HPCC’s similarities to Hadoop make it a serious contender as a big data solution.

Storm

Storm, first developed by BackType before being acquired by Twitter, was dubbed the “Hadoop of real-time processing” in a blog post by Nathan Marz. The distinction matters because Hadoop is a batch processor that works on fixed data sets; Storm, on the other hand, can process data as it streams in. Of course, real-time processing is also being addressed within the Hadoop ecosystem, which gives Storm some competition.
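
To make the batch-versus-streaming distinction concrete, here is a minimal plain-Python sketch (not Storm's actual spout/bolt API): a batch job must see the whole fixed data set before it can answer, while a streaming job updates its answer every time a new record arrives.

```python
from collections import Counter

def batch_count(records):
    """Batch model (Hadoop-style): the full, fixed data set is
    processed in one pass after it has all been collected."""
    return Counter(word for line in records for word in line.split())

class StreamingCounter:
    """Streaming model (Storm-style): each record is processed the
    moment it arrives, so intermediate counts are always available."""
    def __init__(self):
        self.counts = Counter()

    def on_record(self, line):
        # Called once per incoming record (a "tuple", in Storm terms).
        self.counts.update(line.split())
        return self.counts

stream = ["big data", "big hadoop", "storm data"]

# Batch: one result, available only after the whole data set is read.
batch_result = batch_count(stream)

# Streaming: a fresh, up-to-date result after every single record.
sc = StreamingCounter()
for line in stream:
    latest = sc.on_record(line)
```

Once the stream is exhausted, both models agree on the final counts; the difference is that the streaming counter could have been queried after every record.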

Disco Project

Developed by Nokia Research, the Disco Project has been around for a while without attracting much attention. With Disco, data is distributed and replicated in a manner similar to Hadoop’s, and Disco has its own job-scheduling features. However, the Disco Project doesn’t use its own file system. A key advantage of Disco is that its backend is written in Erlang, a language with built-in support for fault tolerance, distribution and concurrency.
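
Disco jobs themselves are written in Python. As an illustration of the MapReduce model that Disco (like Hadoop) implements, here is a hypothetical single-machine sketch, not the Disco API: a map phase emits key–value pairs, a shuffle groups them by key, and a reduce phase combines each group.

```python
from itertools import groupby
from operator import itemgetter

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Map phase: each input record is expanded into (key, value) pairs.
    pairs = [kv for record in inputs for kv in map_fn(record)]
    # Shuffle phase: pairs are grouped by key before reduction.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: each key's values are combined into one result.
    return {key: reduce_fn(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))}

# The classic word-count job expressed as a map and a reduce function.
word_counts = run_mapreduce(
    ["big data", "big hadoop"],
    map_fn=lambda line: [(word, 1) for word in line.split()],
    reduce_fn=lambda word, ones: sum(ones),
)
```

In a real Disco or Hadoop deployment, the map and reduce calls would run in parallel across the cluster, with the framework handling the shuffle over the network.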

Spark

Spark, created at UC Berkeley specifically to make data analytics faster to write and run, is one of the newest options on the market. A key difference from other MapReduce systems is that Spark permits in-memory querying of data rather than relying on disk I/O, which is why it outperforms Hadoop on many iterative algorithms. Spark is written in Scala, an object-oriented language that lets users run queries directly from the Scala interpreter.
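
The in-memory advantage shows up most clearly in iterative algorithms, which revisit the same data set many times. The toy sketch below (plain Python, not Spark's actual RDD API) contrasts the two styles: re-reading from "disk" on every pass versus reading once and iterating over a cached copy.

```python
class Dataset:
    """Toy stand-in for a data set that is expensive to read from disk."""
    def __init__(self, values):
        self._values = values
        self.disk_reads = 0  # how many times we went back to "disk"

    def read(self):
        self.disk_reads += 1
        return list(self._values)

ds = Dataset([1.0, 2.0, 3.0, 4.0])

# MapReduce-style iteration: every pass goes back to disk.
for _ in range(3):
    total = sum(ds.read())

# Spark-style iteration: read once, cache in memory, iterate on the cache.
cached = ds.read()
for _ in range(3):
    total = sum(cached)
```

After the first loop the data set has been read three times; the second loop adds only a single read, no matter how many iterations follow, which is exactly the saving Spark's in-memory model buys for iterative workloads.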

GraphLab

GraphLab was created to make designing and implementing parallel machine learning algorithms easier. It differs from MapReduce in that its update phase can read and modify data sets that overlap, whereas MapReduce requires that all data sets be disjoint. GraphLab also offers its own version of the reduce stage, called the sync operation, whose output is global rather than local.
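
A rough plain-Python sketch of these two ideas (the names and scheduling here are simplified assumptions, not GraphLab's real API): each vertex update reads its neighbours' values, so the "scopes" of adjacent vertices overlap, and a sync pass reduces all vertex values into one global number.

```python
# Tiny triangle graph: each vertex's scope overlaps its neighbours'.
edges = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
value = {"a": 1.0, "b": 2.0, "c": 6.0}

def update(vertex):
    """Update phase: a vertex reads its neighbours' (overlapping) state
    and modifies its own -- something MapReduce's disjoint inputs forbid."""
    nbrs = edges[vertex]
    value[vertex] = sum(value[n] for n in nbrs) / len(nbrs)

# Sequential stand-in for GraphLab's parallel scheduler.
for v in ["a", "b", "c"]:
    update(v)

def sync():
    """Sync operation: a reduce whose output is a single global value."""
    return sum(value.values()) / len(value)

global_mean = sync()
```

Note that updating "b" already sees the new value of "a" from the previous step; managing exactly this kind of overlapping read/write safely in parallel is the problem GraphLab was built to solve.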

Microsoft Alternatives

Microsoft offers three Hadoop alternatives.

Azure Table Storage is offered in Microsoft’s cloud and is meant to serve as an alternative data store to the one provided by Hadoop. It is not an analytics system.

LINQ to HPC allows you to build clusters of servers using Microsoft’s LINQ programming model and run data analytics on unstructured data, just as Hadoop does.

Azure’s Project Daytona is a research project based on MapReduce. It runs as a service that provides ready-to-use algorithms, and its results can be delivered through Excel.

As you can see, there is actually plenty to choose from when looking for a big data solution, and an added benefit is that many of these tools can work together, including with Hadoop, to create a customized solution for your individual needs.

Short bio

“I’ve been blessed to have a successful career and have recently taken a step back to pursue my passion for writing. I’ve started doing freelance writing, and I love to write about new technologies and how they can help us and our planet.” – Rick DelGado


Comments

5 Comments on Think you need Hadoop? Think again

  1. George on Mon, 27th Jan 2014 6:55 am

     You forgot Stratosphere.

  2. Durai on Fri, 31st Jan 2014 8:49 am

     With HPCC, Roxie is comparable to Hive rather than HBase.

  3. Mohamed Goneam on Sun, 2nd Feb 2014 9:13 pm

     Thanks for sharing, it helped me a lot. The site below may also assist you:

     Computek International

     Regards

  4. logesh on Mon, 3rd Feb 2014 7:35 am

     Great blog with lots of tips on data mining. It will be very helpful for beginners who want to learn more about the field. Once again, thanks for sharing!

  5. Nick Wilson on Fri, 18th Apr 2014 2:58 pm

     SciDB also serves as a viable alternative to Hadoop, particularly as an analytical tool. It is a horizontally scalable open-source technology specifically designed for answering data-intensive questions.

     Unlike a traditional DBMS, which stores data in two-dimensional tables of rows and columns, SciDB holds data in multidimensional arrays. This architecture naturally facilitates the linear algebra computations that undergird many statistical and machine learning techniques, such as linear regression, PCA, and SVD. Furthermore, SciDB offers interfaces familiar to analysts: R and Python. See http://www.scidb.org for details.
