Think you need Hadoop? Think again

Today’s guest post is written by Rick DelGado and asks whether you really need Hadoop. Thanks, Rick, for your input.

Amid the big data buzz, Hadoop has become the solution of choice, leaving many feeling that it is their only option for harnessing big data. However, there are many other big data options that offer different features from Hadoop and may actually fit your business needs better. Flash array storage, in particular, has made it easier to create fast, affordable storage options, so check out these alternatives before settling on Hadoop.

HPCC Systems

This system, developed by LexisNexis, is similar to Hadoop in that it is used to build clusters of servers for analysing large data sets. HPCC uses Enterprise Control Language (ECL) to make writing parallel-processing workflows easier and, like Hadoop, has an ecosystem of tools built around it, including the Roxie Rapid Data Delivery Cluster, a data warehouse similar to HBase, and the Thor Data Refinery Cluster, a data processor.

Although HPCC is still in its preliminary stages, its similarities to Hadoop make it a contender as an alternative big data solution.

Storm

Storm, first developed by BackType before the company was acquired by Twitter, was dubbed the “Hadoop of real time processing” in a blog post by Nathan Marz. The distinction matters because Hadoop is a batch processor that works with fixed data sets, whereas Storm can process data as it streams in. Of course, real-time processing is also being addressed within Hadoop, which provides some competition for Storm.
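To make the batch versus streaming distinction concrete, here is a minimal sketch in plain Python. It does not use Storm's or Hadoop's actual APIs; the function names and the sample `events` list are made up for illustration. The batch function needs the whole data set up front, while the streaming function produces an updated result after every record that arrives.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(lines: Iterable[str]) -> Counter:
    """Batch style: consume the complete, fixed data set and return one result."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def streaming_word_count(lines: Iterable[str]) -> Iterator[Counter]:
    """Streaming style: yield an updated snapshot after each arriving record."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
        yield counts.copy()


# Hypothetical input; in a real stream these records would arrive over time.
events = ["storm hadoop", "storm spark", "hadoop"]

batch_result = batch_word_count(events)
snapshots = list(streaming_word_count(iter(events)))
```

Both approaches end with the same totals, but the streaming version has a usable answer after the first record rather than only after the last.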

Disco Project

Developed by Nokia Research, Disco Project has been around for a while without attracting a lot of attention. With Disco Project, data is distributed and replicated in a manner similar to Hadoop, and Disco has job scheduling features. However, Disco Project doesn’t use its own file system. The advantage of Disco is that its backend is written in Erlang, a language with built-in support for fault tolerance, distribution and concurrency.

Spark

Spark was created at UC Berkeley specifically to make data analytics faster to write and run, and it is one of the newest options on the market. A key difference from other MapReduce-style systems is that Spark can keep data in memory and query it there instead of going back to disk for each pass. Spark also performs better than Hadoop on several iterative algorithms and is written in Scala, an object-oriented language that lets users run queries directly from the Scala interpreter.
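The benefit of in-memory data for iterative algorithms can be illustrated with a small plain-Python sketch. This is not Spark's API: the `Dataset` class, its `disk_reads` counter, and the toy update rule are all assumptions invented for the example. The point is only that an iterative job which re-reads its input on every pass pays the read cost once per iteration, while a job that caches the input pays it once.

```python
class Dataset:
    """Hypothetical data source that counts how often it is read from 'disk'."""

    def __init__(self, values):
        self._values = list(values)
        self.disk_reads = 0

    def load(self):
        # Each call stands in for one full read of the input from disk.
        self.disk_reads += 1
        return list(self._values)


def iterate_from_disk(dataset, steps):
    """Re-read the input on every iteration."""
    estimate = 0.0
    for _ in range(steps):
        data = dataset.load()
        estimate = 0.5 * estimate + 0.5 * (sum(data) / len(data))
    return estimate


def iterate_in_memory(dataset, steps):
    """Read the input once and reuse the cached copy on every iteration."""
    cached = dataset.load()
    estimate = 0.0
    for _ in range(steps):
        estimate = 0.5 * estimate + 0.5 * (sum(cached) / len(cached))
    return estimate


disk_ds = Dataset([2.0, 4.0, 6.0])
mem_ds = Dataset([2.0, 4.0, 6.0])
disk_result = iterate_from_disk(disk_ds, steps=10)
mem_result = iterate_in_memory(mem_ds, steps=10)
```

Both runs compute the same estimate, but `disk_ds.disk_reads` ends at 10 while `mem_ds.disk_reads` ends at 1.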

GraphLab

GraphLab was created to make designing and implementing parallel machine learning algorithms easier. It differs from MapReduce in that its update phase can read and modify overlapping sets of data, whereas MapReduce requires that all data sets be kept separate. GraphLab also offers its own version of the reduce stage, called the sync operation, in which the output is global rather than local.
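The update and sync ideas can be sketched in a few lines of plain Python. This is a simplified illustration and not GraphLab's real API: the toy `graph`, the `values` dictionary, and the averaging rule are assumptions, and the updates run sequentially here rather than in parallel. Each update reads a vertex together with its neighbours, so the data touched by different updates overlaps, and the sync step folds every vertex into one global value.

```python
# Hypothetical undirected graph: each vertex maps to its neighbours.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
values = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 6.0}


def scope_of(vertex, graph):
    """The data one update may touch: the vertex plus its neighbours."""
    return [vertex] + graph[vertex]


def update(vertex, values, graph):
    """Update phase: read the whole scope, then modify the vertex in place."""
    scope = scope_of(vertex, graph)
    values[vertex] = sum(values[v] for v in scope) / len(scope)


def sync(values):
    """Sync operation: aggregate over all vertices into one global result."""
    return sum(values.values())


# Run one sweep of updates in a fixed order (a real system would schedule
# these in parallel while keeping overlapping scopes consistent).
for vertex in graph:
    update(vertex, values, graph)

total = sync(values)
```

Note that the scopes of "a" and "b" share vertices, which is exactly the kind of overlap a plain map step would not allow.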

Microsoft Alternatives

Microsoft offers three Hadoop alternatives.

Azure Table Storage is offered in Microsoft’s cloud and is meant to serve as an alternative data store to the one provided by Hadoop. It is not an analytics system.

LINQ to HPC allows you to build clusters of servers in Microsoft’s programming language and run data analytics on unstructured data, just as Hadoop does.

Azure Project Daytona is a research project that is based on MapReduce. It runs as a service that provides ready-to-use algorithms and can be delivered through Excel.

As you can see, there is plenty to choose from when looking for a big data solution. An added benefit is that many of these tools can work together, and with Hadoop, to create a customized solution for your individual needs.

Short bio

“I’ve been blessed to have a successful career and have recently taken a step back to pursue my passion for writing. I’ve started freelance writing, and I love to write about new technologies and how they can help us and our planet.” – Rick DelGado
