cloudera | The JBGeorge Tech Blog

The HP Big Data Reference Architecture: It’s Worth Taking a Closer Look…

January 27, 2015 jbgeorge Leave a comment

This is a duplicate of the blog I’ve authored on the HP blog site at http://h30507.www3.hp.com/t5/Hyperscale-Computing-Blog/The-HP-Big-Data-Reference-Architecture-It-s-Worth-Taking-a/ba-p/179502#.VMfTrrHnb4Z

I recently posted a blog on the value that purpose-built products and solutions bring to the table, specifically around the HP ProLiant SL4540 and how it really steps up your game when it comes to big data, object storage, and other server based storage instances.

Last month, at the Discover event in Barcelona, we announced the revolutionary HP Big Data Reference Architecture – a major step forward in how we, as a community of users, do Hadoop and big data – and it is a stellar example of how purpose-built solutions can revolutionize how you accelerate IT technology, like big data. We’re proud that HP is leading the way in driving this new model of innovation, with the support and partnership of the leading voices in Hadoop today.

Here’s the quick version on what the HP Big Data Reference Architecture is all about:

Think about all the Hadoop clusters you’ve implemented in your environment – they could be pilot or production clusters, hosted by developer or business teams, and hosting a variety of applications. If you’re following standard Hadoop guidance, each instance is most likely a set of general purpose server nodes with local storage.

For example, your IT group may be running a 10 node Hadoop pilot on servers with local drives, your marketing team may have a 25 node Hadoop production cluster monitoring social media on similar servers with local drives, and perhaps similar for the web team tracking logs, the support team tracking customer cases, and sales projecting pipeline – each with their own set of compute + local storage instances.

There’s nothing wrong with that set up – It’s the standard configuration that most people use. And it works well.

However….

Just imagine if we made a few tweaks to that architecture.

What if we replaced the good-enough general purpose nodes, and replaced them with purpose-built nodes?
- For compute, what if we used HP Moonshot, which is purpose-built for maximum compute density and price performance?
- For storage, what if we used HP ProLiant SL4540, which is purpose-built for dense storage capacity, able to get over 3PB of capacity in a single rack?
What if we took all the individual silos of storage, and aggregated them into a single volume using the purpose-built SL4540? This way all the individual compute nodes would be pinging a single volume of storage.
And what if we ensured we were using some of the newer high speed Ethernet networking to interconnect the nodes?

Well, we did.

And the results are astounding.

While there is a very apparent cost benefit and easier management, there is a surprising bump in performance in terms of read and write.

It was a surprise to us in the labs, but we have validated it in a variety of test cases. It works, and it’s a big deal.

And Hadoop industry leaders agree.

“Apache Hadoop is evolving and it is important that the user and developer communities are included in how the IT infrastructure landscape is changing. As the leader in driving innovation of the Hadoop platform across the industry, Cloudera is working with and across the technology industry to enable organizations to derive business value from all of their data. We continue to extend our partnership with HP to provide our customers with an array of platform options for their enterprise data hub deployments. Customers today can choose to run Cloudera on several HP solutions, including the ultra-dense HP Moonshot, purpose-built HP ProLiant SL4540, and work-horse HP Proliant DL servers. Together, Cloudera and HP are collaborating on enabling customers to run Cloudera on the HP Big Data architecture, which will provide even more choice to organizations and allow them the flexibility to deploy an enterprise data hub on both traditional and newer infrastructure solutions.” – Tim Stevens, VP Business and Corporate Development, Cloudera

“We are pleased to work closely with HP to enable our joint customers’ journey towards their data lake with the HP Big Data Architecture. Through joint engineering with HP and our work within the Apache Hadoop community, HP customers will be able to take advantage of the latest innovations from the Hadoop community and the additional infrastructure flexibility and optimization of the HP Big Data Architecture.” – Mitch Ferguson, VP Corporate Business Development, Hortonworks

And this is just a sample of what HP is doing to think about “what’s next” when it comes to your IT architecture, Hadoop, and broader big data. There’s more that we’re working on to make your IT run better, and to lead the communities to improved experience with data.

If you’re just now considering a Hadoop implementation or if you’re deep into your journey with Hadoop, you really need to check into this, so here’s what you can do:

my pal, Greg Battas posted on the new architecture and goes technically deep into it, so give his blog a read to learn more about the details.
Hortonworks has also weighed in with their own blog.

If you’d like to learn more, you can check out the new published reference architectures that follow this design featuring HP Moonshot and ProLiant SL4540:

If you’re looking for even more information, reach out to your HP rep and mention the HP Big Data Reference Architecture. They can connect you with the right folks to have a deeper conversation on what’s new and innovative with HP, Hadoop, and big data. And, the fun is just getting started – stay tuned for more!

Until next time,

JOSEPH

@jbgeorge

Categories: big data, E-Progress, hadoop, Innovation, open source, Tech Tags: big data, cloudera, hadoop, hortonworks, HP, hp moonshot, open source, sl4540

Day 1: Big Data Innovation Summit 2014

April 10, 2014 jbgeorge Leave a comment

Hello from sunny, Santa Clara! BDIS Keynote Day 1

My team and I are here at the BIG DATA INNOVATION SUMMIT representing Dell (the company I work for), and it’s been a great day one.

I just wanted to take a few minutes to jot down some interesting ideas I heard today:

In Daniel Austin’s keynote, he addressed that the “Internet of things” should really be the “individual network of things” – highlighting that the number of devices, their connectivity, their availability, and their partitioning is what will be key in the future.
.
One data point that also came out of Daniel’s talk – every person is predicted to generate 20 PETABYTES of data over the course of a lifetime!
.
Juan Lavista of Bing hit on a number of key myths around big data:
- the most important part of big data is its size
- to do big data, all you need is Hadoop
- with big data, theory is no longer needed
- data scientists are always right 🙂

QUOTE OF THE DAY: “Correlation does not yield causation.” – Juan Lavista (Bing)

Anthony Scriffignano was quick to admonish the audience that “it’s not just about data, it’s not just about the math… [data] relationships matter.”
.
The state of Utah state government is taking a very progressive view to areas that analytics can help drive efficiency in at that level – census data use, welfare system fraud, etc. And it appears Utah is taking a leadership position in doing so.

I also had the privilege of moderating a panel on the topic of the convergence between HPC and the big data spaces, with representatives on the panel from Dell (Armando Acosta), Intel (Brent Gorda), and the Texas Advanced Computing Center (Niall Gaffney). Some great discussion about the connections between the two, plus tech talk on the Lustre plug-in and the SLURM resource management project.

Additionally, Dell product strategists Sanjeet Singh and Joey Jablonski presented on a number of real user implementations of big data and analytics technologies – from university student retention projects to building a true centralized, enterprise data hub. Extremely informative.

All in all, a great day one!

If you’re out here, stop by and visit us at the Dell booth. We’ll be showcasing our hadoop and big data solutions, as well as some of the analytics capabilities we offer.

(We’ll also be giving away a Dell tablet on Thursday at 1:30, so be sure to get entered into the drawing early.)

Stay tuned, and I’ll drop another update tomorrow.

Until next time,

JOSEPH
@jbgeorge

Categories: big data, E-Progress, hadoop, Innovation, open source Tags: analytics, big data, big data innovation summit, cloudera, hadoop, innovation, open source

Highlights from the 2012 Hadoop World

October 29, 2012 jbgeorge Leave a comment

Had a great time at last week’s Hadoop World, so wanted to write up a few of my thoughts from the event.

This year’s Hadoop World was the best attended to date – I believe I heard the attendee number to be at 2500 vs 1400 last year! It’s great to see this kind of growth among the community considering there were only 500 attendees just four years ago.
In some similarities to what I’m seeing in the OpenStack community, this conference seemed to boast more from the “user” ranks as opposed to just developers as in the recent past. It speaks volumes to the general adoption that Hadoop is seeing in the market.
Dell, the company I work for, and our Ecosystem Partner Datameer hosted a networking event for a number of folks at Hadoop World at the prestigious Circo NYC restaurant – great food and a great time with some innovative Hadoop implementers. Got to really get indepth how real people are implementing Hadoop in their enviornments today. Appreciate those that took the time out to attend, and for those who missed out, see you next time!
Cloudera announced their beta project called “Impala”, which allows users to perform real-time queries of their data, a feature that a number of Hadoop users have been anticipating. According to Cloudera, Impala can process queries up to 30 times faster than Hive / MapReduce – very cool, and I look forward to checking it out.
Finally, Dell made an announcement about our donation of “Zinc”, an ARM-based server concept to the Apache Software Foundation, with support from our partner, Calxeda, where we see ARM infrastructures as an interesting technology for Hadoop environments. The donation includes hosting and technical support for the Apache community. and we’re hosting the server concept at an Austin-based co-location. The Apache Hadoop project has actually performed more than a dozen builds within the first 24 hours of the servers’ deployment. (You can check out the full press release here to learn more.)

All in all, Hadoop World is another hit! It was a great event overall and I look forward to next year’s conference.

To learn more about the Dell Apache Hadoop Solution and more about what Dell is doing in this space, visit us at www.Dell.com/Hadoop.

And if you want to chat about how Dell can help you with your Hadoop initiative, drop me an email at Hadoop@Dell.com.

Until next time,

JOSEPH
@jbgeorge

Categories: big data, E-Progress, hadoop, Innovation, open source, Tech Tags: ARM, big data, cloudera, datameer, dell, dell apache hadoop solution, hadoop, hadoop world, open source, Strata, Zinc

THIS JUST IN: Dell Announces ARM Server Ecosystem and Acceleration Programs

May 29, 2012 jbgeorge Leave a comment

Dell's "Copper" ARM server

If you’ve followed some of the technology advances in the processor space, you’ve no doubt heard of the ARM architecture. We’ve seen ARM processors in a number of client devices, but they’ve not been widely adopted for server use due to additional feature needs, performance, and limited software ecosystem.

Well, today, Dell (the company I work for) announced some of the work we are doing behind ARM and ARM-based servers.

ARM, which stands for “advanced RISC machine,” is a 32-bit RISC instruction set architecture developed by ARM Holdings. In the server context, it can allow systems to be deployed at the chip level to reduce space, power consumption and cost.

Dell has been testing with ARM since 2010, and has been working with customers to understand how they could benefit from the ARM architecture, as well as what their expectations were regarding ARM-based servers.

Dell “Copper” ARM-based server

Today, we announced development of “Copper”, an ARM-based microserver, optimized for the current maturity of the ARM server market, which is primarily focused on test / dev and ARM-based technology to test and optimize code. And for this predominant use case, the Copper server is a great fit in terms of size and costwith its lightweight design, low-power-consumption and excellent density.

Dell’s Copper server is specifically designed for this market, with a small acquisition size and price, lower power consumption, and ease of use, and if you’ve followed our open source solution to date, you know that enabling open source development is important to us.

Enabling customers and the ecosystem

Testing to date has found compelling performance per dollar and performance per watt advantages for workloads like LAMP stack based web front ends and Hadoop applications. In that vein, we’ve partnered with TACC, the Texas Advanced Computing Center to help work through workloads, use cases, etc. We are also working with key open source partners like Canonical and Cloudera to help drive this space as well.

And for those who follow our Cloud and Big Data solutions (like the Dell OpenStack-Powered Cloud Solutions, the Dell Cloudera Hadoop Solution, and Crowbar), you’ll be happy to know what our team intends to enable ARM in Crowbar as well. It has proven to be a great tool in deploying our solutions on bare metal and managing the overall solutions that we see a great fit for it in the ARM use case.

We’re obviously very excited about this announcement, so I’d welcome you to check out the links on the ARM / Copper announcement at the end of this blog and in the video below.

Until next time,

JBGeorge
@jbgeorge

More info:

Learn more about what Dell is doing with ARM and the new “Copper Server” – www.Dell.com/ARMserver
Learn more about ARM (Wikipedia)
Learn more about Dell and Hadoop – www.Dell.com/Hadoop
Learn more about Dell Crowbar – www.Dell.com/Crowbar

Categories: big data, hadoop, Innovation, open source, Tech Tags: ARM, canonical, cloudera, crowbar, dell, Dell Cloudera Hadoop Solution, LAMPstack, TACC

Play Ball! Hadoop Players Sponsor Big Data Event in Chicago

May 4, 2012 jbgeorge Leave a comment

A beautiful day at Wrigley Field

What does data analytics have to do with baseball????

Well actually, quite a bit. Moneyball anyone?

(If you haven’t seen it, I highly recommend it. A true story adaption about Billy Beane and the Oakland A’s using intense number crunching to build a solid baseball team in a smaller market, competing with bigger markets – and bigger salaries.)

Great crowd at the ball game! The Technology

Last week, I had the pleasure of representing Dell (the company I work for), as we joined Intel, Cloudera, and Clarity to meet with a number of customers at the Ivy League Baseball Club across from Wrigley Field, right before the Cubs – Cardinals game. It was great to talk to customers who were using Hadoop, as well as those that were just learning about the technology.

The presentation delivered by all four companies focused on the Dell Apache Hadoop Solution, a powerful packaged solution that features

A reference architecture featuring Intel technology
A set of software which includes Cloudera’s CDH distribution (with option to upgrade to Cloudera Enterprise), along with Dell’s innovative Crowbar software framework to enable easy provisioing and management
Services provided by a combination of Dell, Cloudera, and Clarity, to provide our customers with deployment, support, and consulting services

The Experience

Even more impactful than the presentation was the more 1:1 time after the presentation, where many users and newbies shared stories, experiences, best practices, etc. Got to hear about a lot of the struggles around “going it alone”, and enthusiasm that Dell and our partners were delivering a solution that would make that a bit simpler.

Here’s a sampling of some of the topics that came up.

Why should I care about big data / hadoop?

Here’s the thing: you have data. It’s in your sales tracking system, from your website traffic, from your social media outlets, in your customer support databases, and more. And not only do you have data, you have A LOT of data. But here’s the power of data. Your company has strategic objectives, customer strategies, and product plans. Data gives you insight into how to best spend your resources, where to focus your product development, where your customers are buying your products, and what problems they are encountering. This enables your business to make intelligent decisions to better satisfy your customers.

I already have a data warehousing solution – what’s the benefit of hadoop?

Many analytics solutions today require data to be in a format that adheres to the standards of a relational database (aka structured data). This is fine for data that conforms to this format. However, a lot of the new data that is available to us is not formatted in that manner – this is referred to as unstructured data. Unstructured data includes data types, such as audio, video, graphics, log files, etc. Hadoop as a technology handles unstructured data very well, allowing for analysis of those types of data. Additionally, a number of the traditional enterprise level analytics solutions are building hadoop connectors to allow for hadoop processed data to be utilized by the enterprise tool set. Finally, as data scales, using an open source based technology like Hadoop makes things very cost efficient.

How does the Dell Apache Hadoop Solution help me with hadoop?

Before this solution was made available, many of our Dell customers came to us asking, “If Dell was going to build a hadoop solution, how would you design it?” And this was how we started down the path of hadoop. What we discovered was many customers had pockets of hadoop projects in their companies, but progress was at a crawl. Many of the issues were around infrastructure design, deployment, and overall general help around the technology. And that is the basis for the Dell Apache Hadoop Solution – making hadoop accessible, quick, and simple to deploy from bare metal and get to a functional hadoop cluster asap. We’ve enabled many of these customers to go from a science experiment to a productive Hadoop instance very quickly, and provide them the consulting and education they need to maximize its benefit.

You can learn more about what Dell is doing with Hadoop at www.Dell.com/Hadoop or you can drop me an email at Hadoop@Dell.com.

The Game

For those of you not interested in sports, you can now tune your TV’s off – about to talk baseball for a bit.

As far as the game went, it was a doozy. I have ties to Chicago, so I was rooting for the Cubs. Play Ball

The Cubs were up 1-0 most of the game until the top of the 8th when Cardinal Matt Holliday knocked out a 2 run homer
Trailing in the bottom of the 9th, Cubs first baseman Bryan Lahair hit a homer to tie it up 2-2, and take us into extra innings
Here’s where the fireworks really began!
Bottom of the 10th
- Cubs LF Tony Campana gets on base with a single
- Campana then tries to steal 2nd and barely makes it
- Cardinals coach Matt Matheny did not agree and made a federal case out of it with the 2nd base umpire
- And out goes Matheny – ejected!
- Cardinals walked Lahair
- With two men on base, Cubs LF Alfonso Soriano gets a single and drives Campana home for the 3-2 win!
- Prior to this, the Cardinals had beaten the Cubs in the LAST THIRTEEN SERIES between the two clubs. With this win, that streak has been broken.

Great game, great crowd, great partners! Thanks to everyone who came out. I look forward to the next one. 🙂

Until next time,

JBGeorge
@jbgeorge

Categories: big data, E-Progress, hadoop, Innovation, open source, Tech Tags: apache hadoop, big data, cdh, Clarity, cloudera, dell, dell apache hadoop solution, hadoop, intel, open source

THIS JUST IN: Dell Open Sources Hadoop Barclamps for Crowbar

November 29, 2011 jbgeorge Leave a comment

Have you been plugged into what Dell is doing in the open source big data space, specifically around Hadoop?

A few months ago, Dell (the company I work for) announced the Dell | Cloudera Hadoop Solution to provide our customers with superfast data mining, processing and analysis, and featured the Dell-developed Crowbar software framework to deploy / manage it.

To make it even cooler, we went ahead and open sourced Crowbar (Apache 2).

And earlier this month, at Hadoop World, we announced our intent to open source a few Hadoop barclamps (Crowbar modules that perform a specific function like configure BIOS).

Well, that day has arrived – the Hadoop barclamps are now open sourced.

The details

Specifically, the newly open sourced barclamps help with the following Hadoop projects:

Cloudera CDH / Enterprise enables Hadoop administrators to first deploy CDH then easily move to Cloudera Enterprise if/when the business needs warrant.
Zookeeper allows Apache Hadoop administrators to track and coordinate distributed applications.
Apache Pig provides a compiler that produces sequences of Map-Reduce programs.
Flume provides an agent for collecting data and putting into the Hadoop environment.
Sqoop allows rapid connection to external data sources including relational databases. E.g., data can be moved from Oracle to Hadoop and back again for different types of analysis.

Videos

Our team has put together a number of videos to help provide you more detail on what these new barclamps can do – check them out.

Hadoop Demo: http://www.youtube.com/watch?v=eilF16KqRmg
Install Crowbar: http://youtu.be/WAUKMlawrPw
Basic Build: http://youtu.be/FsOBaAiDgYs
Advanced Build: http://youtu.be/qvsfXPH5k5Q

Get those barclamps!

Ready to get the barclamps? Head over to https://github.com/dellcloudedge/barclamp-hadoop to download them.

If you’re interested in learning more about the Dell innovation happening in the Hadoop space, visit us at www.Dell.com/Hadoop or email us at Hadoop@Dell.com

Until next time,

JBGeorge
@jbgeorge

Categories: big data, hadoop, Innovation, open source Tags: barclamp, cloudera, crowbar, dell, dell apache hadoop solution, dell cloudera solution, flume, hbase, sqoop, zookeeper

Big News: #Dell and #Cloudera Partner on #Hadoop Solution

August 4, 2011 jbgeorge 1 comment

Big news!

Today, Dell (the company I work for) has announced a solution offering in conjunction with Cloudera, called the Dell | Cloudera Solution for Apache Hadoop! It’s a fully deployable Hadoop solution made up of Dell PowerEdge C hardware, Dell and Cloudera software, and Dell and Cloudera services – the complete package.

(Note that this is the second solution that my team has put out in the last two weeks – if you missed last week’s OpenStack announcement, you can get more info here.)

But on to the star of today’s show!

We’d often hear from our customers that they were experimenting with Hadoop on Dell hardware to analyze sales data, marketing information, etc in order to better serve their own customers. And pretty soon, we were being asked, “What’s Dell’s point of view? What recommendations do you have for distributions, architecture, and implementation?”

Well, here’s the answer. The Dell | Cloudera Solution for Apache Hadoop.

What’s Under the Hood

Hardware – similar to OpenStack, the reference architecture that this solution is built on is the innovative Dell PowerEdge C server line. Based out of Dell’s well known Data Center Solutions team (aka DCS), PowerEdge C server technology was born out of learnings from custom engagements with some of the biggest cloud and hyperscale environments in the world. What we discovered is that, despite custom requirements and specs, there was some commonality among the requests. Concepts like density, power efficiency, etc were at the top of the list when it came to these environments.

So we took what was common and built a line of servers that was ideal for cloud and hyperscale environments, which became the Dell PowerEdge C server family. And it is an ideal platform for our Hadoop solution, as all the tenets of scale apply when building out an optimal Hadoop instance.
Software – Our partners, Cloudera, who are among the most established Hadoop vendors in the market, will be providing their distributions of the Hadoop software, as well as their CDH management tools, which are top notch. In addition, Dell will be providing our Crowbar software framework and a specific Cloudera barclamp as part of the solution. (As a refresher, we announced availability of Crowbar last week as part of our OpenStack announcement, which helps speed deployment CONSIDERABLY. In the case of the Dell Cloudera Solution for Apache Hadoop, it can speed bare metal deployment of a Hadoop cluster from days or weeks to less than one day!)
Services – Dell and Cloudera are partnering to enable our customers to get going on a validated instance of Hadoop, and offer a variety of services from support of the entire stack of hardware and software, as well as training on the technologies that will enable our customers on Hadoop.

Now’s the time to get started on Hadoop, and the Dell | Cloudera Solution for Apache Hadoop can get you there.

Find out more about this new solution at www.Dell.com/Hadoop or get in touch with us at Hadoop@Dell.com.

More info:

The Dell Cloudera solution website – http://www.Dell.com/Hadoop
Barton George’s blog – http://bartongeorge.net/2011/08/04/introducing-the-dell-cloudera-solution-for-hadoop-harnessing-the-power-of-big-data/
Rob Hirschfeld’s blog – http://www.RobHirschfeld.com/2011/08/04/dell-cloudera-hadoop
My recent blog about Crowbar – https://jbgeorge.net/2011/08/01/the-dell-crowbar-software-framework-who-what-when-where-and-how/
Email us for more info at Hadoop@Dell.com

Until next time,

JBG
@jbgeorge

Categories: Cloud Computing, E-Progress, hadoop, Innovation, open source, Tech Tags: big data, cloudera, crowbar, dell, dell cloudera solution for apache hadoop, hadoop, hdfs, mapreduce

The JBGeorge Tech Blog

Archive

The HP Big Data Reference Architecture: It’s Worth Taking a Closer Look…

Day 1: Big Data Innovation Summit 2014

Highlights from the 2012 Hadoop World

THIS JUST IN: Dell Announces ARM Server Ecosystem and Acceleration Programs

Play Ball! Hadoop Players Sponsor Big Data Event in Chicago

THIS JUST IN: Dell Open Sources Hadoop Barclamps for Crowbar

Big News: #Dell and #Cloudera Partner on #Hadoop Solution

JBG Tweets

Blogroll