Archive

Posts Tagged ‘hadoop’

The HP Big Data Reference Architecture: It’s Worth Taking a Closer Look…

January 27, 2015

This is a duplicate of the blog I’ve authored on the HP blog site at http://h30507.www3.hp.com/t5/Hyperscale-Computing-Blog/The-HP-Big-Data-Reference-Architecture-It-s-Worth-Taking-a/ba-p/179502#.VMfTrrHnb4Z

I recently posted a blog on the value that purpose-built products and solutions bring to the table, specifically around the HP ProLiant SL4540 and how it really steps up your game when it comes to big data, object storage, and other server-based storage instances.

Last month, at the Discover event in Barcelona, we announced the revolutionary HP Big Data Reference Architecture – a major step forward in how we, as a community of users, do Hadoop and big data – and it is a stellar example of how purpose-built solutions can accelerate IT technologies like big data.  We’re proud that HP is leading the way in driving this new model of innovation, with the support and partnership of the leading voices in Hadoop today.

Here’s the quick version on what the HP Big Data Reference Architecture is all about:

Think about all the Hadoop clusters you’ve implemented in your environment – they could be pilot or production clusters, hosted by developer or business teams, and hosting a variety of applications.  If you’re following standard Hadoop guidance, each instance is most likely a set of general purpose server nodes with local storage.

For example, your IT group may be running a 10 node Hadoop pilot on servers with local drives, your marketing team may have a 25 node Hadoop production cluster monitoring social media on similar servers with local drives, and perhaps similar for the web team tracking logs, the support team tracking customer cases, and sales projecting pipeline – each with their own set of compute + local storage instances.

There’s nothing wrong with that setup – it’s the standard configuration that most people use.  And it works well.

However…

Just imagine if we made a few tweaks to that architecture.

  • What if we replaced the good-enough general purpose nodes with purpose-built nodes?
    • For compute, what if we used HP Moonshot, which is purpose-built for maximum compute density and price performance?
    • For storage, what if we used HP ProLiant SL4540, which is purpose-built for dense storage capacity, able to get over 3PB of capacity in a single rack?
  • What if we took all the individual silos of storage, and aggregated them into a single volume using the purpose-built SL4540?  This way all the individual compute nodes would be pinging a single volume of storage.
  • And what if we ensured we were using some of the newer high speed Ethernet networking to interconnect the nodes?

Well, we did.

And the results are astounding.

Beyond the very apparent cost benefit and easier management, there is also a surprising bump in read and write performance.

It was a surprise to us in the labs, but we have validated it in a variety of test cases.  It works, and it’s a big deal.

And Hadoop industry leaders agree.

“Apache Hadoop is evolving and it is important that the user and developer communities are included in how the IT infrastructure landscape is changing.  As the leader in driving innovation of the Hadoop platform across the industry, Cloudera is working with and across the technology industry to enable organizations to derive business value from all of their data.  We continue to extend our partnership with HP to provide our customers with an array of platform options for their enterprise data hub deployments.  Customers today can choose to run Cloudera on several HP solutions, including the ultra-dense HP Moonshot, purpose-built HP ProLiant SL4540, and work-horse HP ProLiant DL servers.  Together, Cloudera and HP are collaborating on enabling customers to run Cloudera on the HP Big Data architecture, which will provide even more choice to organizations and allow them the flexibility to deploy an enterprise data hub on both traditional and newer infrastructure solutions.” – Tim Stevens, VP Business and Corporate Development, Cloudera

“We are pleased to work closely with HP to enable our joint customers’ journey towards their data lake with the HP Big Data Architecture. Through joint engineering with HP and our work within the Apache Hadoop community, HP customers will be able to take advantage of the latest innovations from the Hadoop community and the additional infrastructure flexibility and optimization of the HP Big Data Architecture.” – Mitch Ferguson, VP Corporate Business Development, Hortonworks

And this is just a sample of what HP is doing to think about “what’s next” when it comes to your IT architecture, Hadoop, and broader big data.  There’s more that we’re working on to make your IT run better, and to lead the communities to an improved experience with data.

If you’re just now considering a Hadoop implementation or if you’re deep into your journey with Hadoop, you really need to check into this, so here’s what you can do:

  • My pal, Greg Battas, posted on the new architecture and goes technically deep into it, so give his blog a read to learn more about the details.
  • Hortonworks has also weighed in with their own blog.

If you’d like to learn more, you can check out the newly published reference architectures that follow this design featuring HP Moonshot and ProLiant SL4540.

If you’re looking for even more information, reach out to your HP rep and mention the HP Big Data Reference Architecture.  They can connect you with the right folks to have a deeper conversation on what’s new and innovative with HP, Hadoop, and big data.  And the fun is just getting started – stay tuned for more!

Until next time,

JOSEPH

@jbgeorge


Purpose-Built Solutions Make a Big Difference In Extracting Data Insights: HP ProLiant SL4500

October 20, 2014

This is a duplicate of the blog I’ve authored on the HP blog site at http://h30507.www3.hp.com/t5/Hyperscale-Computing-Blog/Purpose-Built-Solutions-Make-a-Big-Difference-In-Extracting-Data/ba-p/173222#.VEUdYrEo70c

Indulge me as I flash back to the summer of 2012 at the Aquatics Center in London, England – it’s the Summer Olympics, where some of the world’s top swimmers, representing a host of nations, are about to kick off the Men’s 100m Freestyle swimming competition. The starter gun fires, and the athletes give it their all in a heated head to head match for the gold.

And the results of the race are astounding: USA’s Nathan Adrian took the gold medal with a time of 47.52 seconds, with Australia’s James Magnussen finishing a mere 0.01 seconds later to claim the silver medal! It was an incredible display of competition, and a real testament to the power of the human spirit.

For an event demanding such precise timing, we can only assume that very sensitive and highly calibrated measuring devices were used to capture accurate results. And it’s a good thing they were – fractions of a second separated first and second place.

Now, you and I have both measured time before – we’ve checked our watches to see how long it has been since the workday started, we’ve used our cell phones to see how long we’ve been on the phone, and so on. It got the job done. Surely the Olympic judges at the 2012 Men’s 100m Freestyle had some of these less precise options available – why didn’t they simply huddle around one of their wrist watches to determine the winner of the gold, silver and bronze?

OK, I am clearly taking this analogy to a silly extent to make a point.

When you get serious about something, you have to step up your game and secure the tools you need to ensure the job gets done properly.

There is a real science behind using purpose-built tools to solve complex challenges, and the same is true with IT challenges, such as those addressed with big data / scale out storage. There are a variety of infrastructure options to deal with the staggering amounts of data, but there are very few purpose-built server solutions like HP’s ProLiant SL4500 product – a server solution built SPECIFICALLY for big data and scale out storage.

The HP ProLiant SL4500 was built to handle your data. Period.

  • It provides an unprecedented drive capacity with over THREE PB in a single rack
  • It delivers scalable performance across multiple drive technologies like SSD, SAS or SATA
  • It provides significant energy savings with shared cooling and power and reduced complexity with fewer cables
  • It offers flexible configurations:
    • A 1-node, 60 large form factor drive configuration, perfect for large scale object storage with software vendors like Cleversafe and Scality, or with open source projects like OpenStack Swift and Ceph
    • A 2-node, 25 drive per node configuration, ideal for running Microsoft Exchange
    • A 3-node, 15 drive per node configuration, optimal for running Hadoop and analytics applications

If you’re serious about big data and scale out storage, it’s time to consider stepping up your game with the SL4500. Purpose-built makes a difference, and the SL4500 was purpose-built to help you make sense of your data.

You can learn more about the SL4500 by talking to your HP rep or by visiting us online at HP ProLiant SL4500 Scalable Systems or at Object Storage Software for ProLiant.

And if you’re here at Hadoop World this week, come on by the HP booth – we’d love to chat about how we can help solve your data challenges with SL4500 based solutions.

Until next time,

Joseph George

@jbgeorge

Day 2: Big Data Innovation Summit 2014 #DataWest14

April 11, 2014

Hello again, big data fans – from where I’ve learned the San Francisco 49ers will be playing their 2014 NFL season at Levi’s Stadium… Santa Clara!

(BTW, the stadium – from what I could see – is beautiful!  I’m a big NFL fan, and there’s now another reason to come to the San Jose area, other than all the cloud / big data conferences.)

Got a lot of great feedback on yesterday’s “Day 1” post of the summit, so here are some observations from the final day of the conference.

  • Yahoo’s Duru Ahanotu talked through driving efficiency in how data teams are organized, going through the permutations of generalists vs specialists and centralized vs de-centralized, and how to best address teams in each model.
  • PayPal’s Moises Nascimento (who is a very captivating speaker) drove home the point that, though we are now adopting many of the new data technologies like Hadoop and NoSQL, most of our existing data sources and toolsets still provide value – so there is value in leveraging ALL data sources.
  • Moises also made a point of highlighting that data manipulation is best handled at the SYSTEM level, while data analysis is better managed at the ENTERPRISE level.
  • In HP’s discussion, they introduced the concept of the GEOBYTE – 10^30 bytes, a size of data that the human race is expected to hit in the next few years.

To provide context on the magnitude of a GEOBYTE (10^30 bytes), there are estimated to be only 10^19 GRAINS OF SAND ON THE EARTH.  Think about that for a second.
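
For a rough sense of that scale, here is a quick back-of-the-envelope sketch in Python. It takes the two figures quoted in the session (10^30 bytes and 10^19 grains of sand) at face value as estimates; the variable names are mine, for illustration only.

```python
# Figures as quoted in the session; both are estimates, not measurements.
GEOBYTE_BYTES = 10**30          # one "geobyte" as defined in the talk
SAND_GRAINS_ON_EARTH = 10**19   # estimated grains of sand on Earth

# How many bytes would each grain of sand have to "hold"?
bytes_per_grain = GEOBYTE_BYTES // SAND_GRAINS_ON_EARTH

# Convert to decimal gigabytes (1 GB = 10^9 bytes).
gigabytes_per_grain = bytes_per_grain // 10**9

print(f"{bytes_per_grain:,} bytes per grain of sand")
print(f"{gigabytes_per_grain} GB per grain of sand")
```

In other words, if both estimates hold, a geobyte works out to roughly 100 GB of data for every grain of sand on the planet.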

  • The team also highlighted their view on “Big BI” vs “Big Data”
    • Big BI – same types of analysis but on more data; more batch processing; results that were not easily actionable
    • Big Data – joining datasets that have not been previously joined, near real time analysis, action oriented results
  • I thought Ancestry.com had one of the best sessions of the event, as they went deep into the GERMLINE algorithm that was the foundation of their business technology, and how they had to create jermline (now with a “j”) based on Hadoop / HDFS to create a SCALABLE matching engine.  As we all know, SCALE matters. The performance and speed benchmarks between the “G” project and the “j” project were mind-blowing.
  • Finally, I sat in on the Netflix session – in addition to being a big fan of Netflix, as both a consumer and a tech observer, I’ve always been impressed with the way Netflix has evolved their business, and continues to do so.  In this session, they went into great detail on their use of the Amazon cloud services, and their open source projects as a layer above to enhance functionality and deploy features.  Topics touched on included red / black deployment to ease features into production, and the importance of graceful degradation, so that a failure can be less of a catastrophic event for the end user.
    • One very telling statement is really a commentary on the value of use and participation in the open source process – Netflix was clear that the value they see in being an open source contributor / leader is that it preserves the future of their systems – rather than sitting back and letting the industry decide their direction with tools and tech, Netflix uses open source to help drive and lead the industry to where they see value.
  • (I did resist the urge to ask the Netflix presenter when the next season of “House of Cards” would come out. 🙂 )

One of the frequent questions that came up at the Dell booth was “what is Dell doing in big data?”

The answer?  Actually… quite a bit, and for quite a while.

Between the Dell Apache Hadoop HW+SW+Services Solution, the Toad BI suite, the Kitenga analytics toolsets, and our growing HPC business, Dell has been a part of this movement since its early days.  I’d recommend you drop us a line at Hadoop@Dell.com or visit us at http://www.Dell.com/Hadoop to learn more.

If you were out at the show this week, be sure to leave a comment on your thoughts as well.

Hope everyone has safe trips home, and we’ll see you at the next big data get-together!

Until next time,

JBG
@jbgeorge

Day 1: Big Data Innovation Summit 2014

April 10, 2014

Hello from sunny Santa Clara!

My team and I are here at the BIG DATA INNOVATION SUMMIT representing Dell (the company I work for), and it’s been a great day one.

I just wanted to take a few minutes to jot down some interesting ideas I heard today:

  • In Daniel Austin’s keynote, he argued that the “Internet of things” should really be the “individual network of things” – highlighting that the number of devices, their connectivity, their availability, and their partitioning are what will be key in the future.
  • One data point that also came out of Daniel’s talk – every person is predicted to generate 20 PETABYTES of data over the course of a lifetime!
  • Juan Lavista of Bing hit on a number of key myths around big data:
    • the most important part of big data is its size
    • to do big data, all you need is Hadoop
    • with big data, theory is no longer needed
    • data scientists are always right 🙂

QUOTE OF THE DAY:  “Correlation does not yield causation.” – Juan Lavista (Bing)

  • Anthony Scriffignano was quick to admonish the audience that “it’s not just about data, it’s not just about the math…  [data] relationships matter.”
  • Utah’s state government is taking a very progressive view of the areas where analytics can help drive efficiency at the state level – census data use, welfare system fraud, etc.  And it appears Utah is taking a leadership position in doing so.

I also had the privilege of moderating a panel on the topic of the convergence between HPC and the big data spaces, with representatives on the panel from Dell (Armando Acosta), Intel (Brent Gorda), and the Texas Advanced Computing Center (Niall Gaffney).  Some great discussion about the connections between the two, plus tech talk on the Lustre plug-in and the SLURM resource management project.

Additionally, Dell product strategists Sanjeet Singh and Joey Jablonski presented on a number of real user implementations of big data and analytics technologies – from university student retention projects to building a true centralized, enterprise data hub.  Extremely informative.

All in all, a great day one!

If you’re out here, stop by and visit us at the Dell booth.  We’ll be showcasing our Hadoop and big data solutions, as well as some of the analytics capabilities we offer.

(We’ll also be giving away a Dell tablet on Thursday at 1:30, so be sure to get entered into the drawing early.)

Stay tuned, and I’ll drop another update tomorrow.

Until next time,

JOSEPH
@jbgeorge

Michael Dell Comments on the “Data Economy”

March 24, 2014

This is a repost of my blog at  .

In this short interview with Inc., Michael Dell provides an overview of the company’s transformation into a leading player in the “data economy.”   

As Michael notes, with the costs of collecting data decreasing, more companies in a growing number of industries are making better use of existing data sources, and gathering data from new sources. 

And that’s where Dell has been enabling customers for years with solutions built with technologies like Hadoop and NoSQL.  Helping companies and organizations make better use of this data, and assisting them in using it to solve their challenges, are just a few of the ways Dell has changed the Big Data conversation, and built an entirely new enterprise business along the way.

As a member of the Technology CEO Council, Michael also recently joined other tech CEOs to discuss the data economy with policy makers.  As an example of the potential of the data economy, he explained how Dell’s growing health information technology practice includes 7 billion medical images. These images are in an aggregated data set allowing researchers to mine them for patterns and predictive analytics.

“There’s lots that can be done with this data that was very, very siloed in the past,” Michael told Computerworld. “We’re really just kind of scratching the surface.”

It’s certainly an exciting time to be at Dell – and the data revolution continues!


Rock The [OpenStack] Vote!

August 17, 2013

Well, it’s that time of the year again!

(I guess that’s a fairly open-ended statement – I could be talking about the beginning of the school year, the start of football season, or the summer solstice.)

I’m talking about getting your votes in for sessions at the OpenStack Summit coming up in Hong Kong this November!

If you’re a member of the OpenStack community, you should have received a note this past week requesting your help to select which sessions should be represented at the Design Summit and User Conference this fall.

Now, let me be clear – this should not be a popularity contest on presenters (like me) or vendors (like Dell, the company I work for), but rather where you see need for certain experts to discuss a topic that is important to the OpenStack development community or to the OpenStack user community. 

Yours truly has submitted a few sessions as well for your consideration – check them out:

  • Remain Calm and Deploy On! (or How the Crowbar Community Is Innovating for Success with OpenStack)
      
    In this session, I’m planning to highlight the importance of deployment technologies in implementing OpenStack as a cloud option, and how we’ve approached it by developing our own open source project, Crowbar.  I’ll be joined by Crowbar community contributors Intel (who are working on Crowbar capabilities for Intel Hadoop and Intel TXT security) and SUSE (who have incorporated a SUSE skinned version of Crowbar into their SUSE Cloud product).  I expect it will be a great interactive session with the goal of educating the audience on how Crowbar can enable them to get going faster with OpenStack.
      
  • Enterprise Hypervisors: How Three Companies Are Making OpenStack with Hyper-V a Reality
      
    Earlier this year, we announced Dell taking an active role in bringing true Hyper-V hypervisor support to OpenStack.  To provide an update on progress there, I’m proposing a topic to present jointly with peers at SUSE, who we’ve partnered with on the Dell SUSE Cloud Solution, powered by OpenStack, and Cloudbase, who have been pioneers in Hyper-V enablement in OpenStack, to talk through how customers can implement a Hyper-V based OpenStack solution using technology from all three companies.  There has been solid work to date, including Crowbar integration, so I expect this will be a lively one!
      
  • Build in OpenStack Security with Crowbar and Intel TXT
      
    I can’t tell you how excited I am about how the Crowbar project has evolved over the years.  It started as an answer to the problem of “how do I deploy OpenStack on bare metal?” but has now emerged as a broad software platform for innovation covering cloud, Hadoop, and other use cases.  One telltale sign of progress to me is how others are leveraging Crowbar, and cloud security is definitely an interesting area.  This session is one where I’ll present with my friends at Intel to talk through how Intel has developed Crowbar functionality for their Intel TXT secure resource pool solution.  Expect a lot of Q&A on this one.
      

And that’s it!

Appreciate you voting with the community’s best interest in mind!

And you can learn more about the coming OpenStack Summit here – http://www.openstack.org/summit/openstack-summit-hong-kong-2013/ 

Until next time!

JBG
@jbgeorge

NOW HIRING: Cloud and Big Data Solution Marketing Rockstars

June 13, 2013

As Dell (the company that I work for) continues to service customers in all facets of OpenStack and Hadoop implementations, we are beginning another season of growth on the Revolutionary Solutions team.

The Dell Revolutionary Solutions Team delivers the Dell OpenStack-Powered Cloud Solution and the Dell Apache Hadoop Solution, leads the Crowbar open source project, and manages the Emerging Solutions Ecosystem Partner Program that includes a number of key partners such as SUSE, Inktank, Cloudera, Datameer, and Pentaho.

   

We are looking for a variety of engineers and presales teams, but I will focus on the product management and marketing roles in this post.

  • Technical Product Managers – Product managers to be technical SMEs (roadmaps, requirements, etc.) on partner products in the cloud and big data spaces, most notably OpenStack and Hadoop, but could also be focused on other emerging solutions spaces – Link to Job Posting
       
  • Product Marketing Managers – Marketing experts to own and lead go-to-market strategy and deliverables in the cloud and big data spaces (marketing strategy, sales enablement, etc).  Again, this would certainly cover our OpenStack and Hadoop solutions today, but could also focus on future emerging solutions spaces.  – Link to Job Posting
      
  • Open Source Community Manager / Evangelist – Community oriented professionals with strong networks, strong social media presence, and an ability to bring collaborators and customers together to work on common goals – Link to Job Posting
      
  • Marketing Directors – Experienced people managers to drive business objectives, product vision, and go-to-market strategy, specifically in the areas of Product Management and Product Marketing – Link to Job Posting

  

In our experience, the best candidates

  • have a track record of ownership
  • have a technical background
  • are experienced in their discipline
  • are participants in cloud, big data, virtualization, or similar emerging technologies

  

Pass it on to a friend or apply yourself – I look forward to hearing from you!

Until next time,

JBGeorge
@jbgeorge