Summary: Hadoop has emerged as something of a “poster child” for the Big Data movement. Its ability to store and process massive amounts of data on commodity hardware has caught the eye of many businesses. But, while Hadoop holds massive potential for your business, it’s not without challenges. If you plan on adopting Hadoop in the near future, here are some hurdles you must address.
As data volumes rise, we’re seeing more businesses gravitate towards Hadoop. An open-source software framework, Hadoop helps businesses store and process massive amounts of data without purchasing expensive hardware.
How are businesses using Hadoop? In all sorts of ways. I’ve seen examples of businesses using Hadoop to find ideal prospects, prevent hardware failure, identify warning signs of security breaches, and so much more.
The fact is, this data explosion offers a huge opportunity. But, businesses can only use it as a competitive advantage if they can somehow capture and store this data. Since traditional databases aren’t built for “Big Data”, Hadoop provides the best means of accomplishing this goal.
But, while Hadoop offers numerous advantages, it comes with its fair share of challenges and hurdles. If your business plans on adopting Hadoop, you must first understand these challenges, and how to address each one. What are they? Here are the 5 biggest hurdles to Hadoop adoption:
1. Undefined value proposition
One of the biggest hurdles to Hadoop adoption has nothing to do with Hadoop from a technical standpoint. Business leaders aren’t clear on the value. Why should they devote time and resources to a project, if they don’t understand the payback?
A recent Gartner survey highlights this fact. Nearly half of the respondents claimed they weren’t adopting Hadoop because they weren’t sure how it would provide them with value.
What can you do about this? What value does Hadoop offer, and how can you communicate this value to business leaders?
One of the biggest reasons for this uncertainty boils down to a simple fact: Many businesses don’t believe they have that much data. Yet in reality, they have access to more data than they realize, a fact we explored in a recent article. The first step to capitalizing on this data is capturing and storing it in Hadoop.
But, even if they do have the data volumes to justify Hadoop, many don’t act due to uncertainty. Business leaders aren’t sure how to capitalize on this data.
If you’re wondering the same thing, let’s answer with another question: How are other companies capitalizing on Hadoop? While the list could go on, here’s a past article that explains just 7 real-life use cases of Hadoop. Hopefully that gives you some ideas as to the possibilities.
2. Finding good talent
What is the biggest hurdle to Hadoop adoption? According to the survey mentioned above, it’s the lack of Hadoop skills. Why are businesses having so much trouble finding qualified Big Data talent? As explained below, picking up Hadoop skills is more difficult than learning other technical skills.
“There is a big barrier to learning big data technology,” says Jeffrey Ricker, CEO of Ricker Lyman Robotic. “With most software, a developer just downloads the software to his laptop and starts hacking. You can’t do that with Hadoop. It requires a minimum of 4 servers to work. Most developers do not have four servers lying around that they can play with to learn a new technology. Cloud is an option, but it is not cheap. For most people, it is not a place to experiment. The barrier to learning is preventing the supply of developers from meeting the exploding demand for big data expertise.”
So, how can you bridge this skills gap? Besides the obvious answer of bringing in new talent, you have a couple of options:
1. Create your own skills: Training from within your business is cost-effective, and offers another valuable benefit: The trainees already know your business. This approach results in employees who know your business and Hadoop. To help you get started along this path, here’s a great list of free Hadoop training courses.
2. Find the right software: Big Data and Hadoop are still growing fields, but we’re starting to see products emerge that bridge the skills gap for you. For instance, a product like Splice Machine merges the traditional RDBMS with Hadoop–removing the skills gap entirely. Expect to see more offerings crop up that aim to ease the transition.
3. Hadoop distribution confusion
While Hadoop is free and open source software, some vendors have developed their own distributions. They do this to add new capabilities, improve the code base, and offer support. The problem: With a growing number of distributions, differentiating between all of them presents a challenge. How do you know which one to pick?
“There are many different Hadoop distributions, starting from freely available Hortonworks, Cloudera, MapR and ending with large commercial distributions like IBM InfoSphere BigInsights and Oracle Big Data Appliance,” says Sergey Tryuber, of Grid Dynamics. “Selecting the right distribution is not an easy task (even for experienced staff), since each of them embed different Hadoop components (like Cloudera Impala in CDH), configuration managers (Ambari, Cloudera Manager, etc.), and an overall vision of a Hadoop mission.”
So, how do you know which option works best for your business? Rather than get into all of the details in this article, here is an article that compares different distributions in detail.
1. Comparing the Top Hadoop Distributions
4. Data accessibility
Hadoop provides the framework to store and process data, but that data provides little value for the average business analyst (or business user) unless they can easily transform it into meaningful management information. The problem is, Hadoop was designed as a batch-processing tool. On its own, it offers little in the way of analytics for end users.
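To make the batch-processing point concrete, here is a minimal sketch in plain Python of the map, shuffle, and reduce steps that a classic Hadoop MapReduce job follows. It does not use Hadoop itself: the sample records, the region/amount layout, and the function names (map_phase, shuffle, reduce_phase, run_job) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample records ("region,amount"); a real job would read
# large files from HDFS rather than a small in-memory list.
RECORDS = [
    "east,120",
    "west,80",
    "east,30",
    "north,50",
    "west,20",
]


def map_phase(record):
    """Emit one (region, amount) pair for a single input record."""
    region, amount = record.split(",")
    yield region, int(amount)


def shuffle(pairs):
    """Group all values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse every value for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run the whole batch: all records are mapped and grouped before
    any total is produced."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```

Notice that no totals exist until the whole input has passed through every step. That is why this model suits scheduled, large-scale jobs better than quick, interactive questions from a business user.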
“Hadoop is getting increasingly adopted by enterprises because it provides a cost effective, scalable and flexible platform for bringing in all kinds of data sources and building a data repository or ‘data lake’,” says Ajay Anand, Vice President of products at Kyvos Insights. “However, Hadoop is not very accessible for the business user – it’s hard to use, and not designed to be interactive.”
What can you do about this? Fortunately, Hadoop analytics is a growing area. Traditional BI vendors are adding Hadoop support to their offerings, and new Hadoop analytics vendors are cropping up. Expect this trend to continue in the coming years.
“It can be difficult to see how Hadoop will deliver business value if the perception is that Hadoop is a large and complex system only accessible by an elite group of IT staff,” says Tyler Wassell, Software Development Manager at mrc. “But if you look at the efforts of traditional BI vendors over the past couple years, you will see that they are quickly bringing Hadoop data analytics to the end user. Business users can now access Hadoop data just like they have accessed any other traditional data. They can answer complex questions, and gain new insights using data that has been captured, processed, and transformed in Hadoop.”
5. Hadoop integration and management
Will Hadoop replace your existing database? While some products offer this option, Hadoop is most often used in tandem with existing systems.
What does this mean? It means that you must integrate Hadoop with your existing systems, a challenge that grows more difficult and time-consuming as Hadoop clusters get larger. It also means you must devote more resources to managing your Hadoop infrastructure.
“A large cluster faces more unique problems specific to the organization’s workflow and data volumes,” says Mark Kerzner, Chief Product Architect at LexInnova. “One may have to optimize for performance, integrate with existing systems, correctly distribute the load between current and Hadoop infrastructure, and so on.”
What can you do about this? The answer depends on the database and systems you have in place. Fortunately, most database vendors offer tools and instructions for Hadoop integration. For those looking to manage their Hadoop infrastructure, this article lists some great Hadoop-related tools (and more) that might come in handy.
Now, these are just a few of the most common Hadoop hurdles. If you would like to add anything to this list, I’d love to hear it. Feel free to share in the comments.