Archive
Highlights from the 2012 Hadoop World
Had a great time at last week’s Hadoop World, so wanted to write up a few of my thoughts from the event.
- This year’s Hadoop World was the best attended to date – I believe I heard the attendee number to be at 2500 vs 1400 last year! It’s great to see this kind of growth among the community considering there were only 500 attendees just four years ago.
- Much like what I’m seeing in the OpenStack community, this conference seemed to draw more attendees from the “user” ranks, rather than just developers as in the recent past. It speaks volumes about the general adoption that Hadoop is seeing in the market.
- Dell, the company I work for, and our Ecosystem Partner Datameer hosted a networking event for a number of folks at Hadoop World at the prestigious Circo NYC restaurant – great food and a great time with some innovative Hadoop implementers. Got to really dig into how real people are implementing Hadoop in their environments today. I appreciate those who took the time to attend, and for those who missed out, see you next time!
- Cloudera announced their beta project called “Impala”, which allows users to perform real-time queries of their data, a feature that a number of Hadoop users have been anticipating. According to Cloudera, Impala can process queries up to 30 times faster than Hive / MapReduce – very cool, and I look forward to checking it out.
- Finally, Dell announced our donation of “Zinc”, an ARM-based server concept, to the Apache Software Foundation, with support from our partner Calxeda. We see ARM infrastructures as an interesting technology for Hadoop environments. The donation includes hosting and technical support for the Apache community, and we’re hosting the server concept at an Austin-based co-location facility. The Apache Hadoop project actually performed more than a dozen builds within the first 24 hours of the servers’ deployment. (You can check out the full press release here to learn more.)
All in all, Hadoop World was another hit! I look forward to next year’s conference.
To learn more about the Dell Apache Hadoop Solution and more about what Dell is doing in this space, visit us at www.Dell.com/Hadoop.
And if you want to chat about how Dell can help you with your Hadoop initiative, drop me an email at Hadoop@Dell.com.
Until next time,
JOSEPH
@jbgeorge
Dell @ Hadoop World 2012: Experts, Solutions, and Networking Event

- Date: Tuesday, October 23, 2012
- Time: 6:30 – 8:30 p.m. EST
- Place: Circo NYC, 120 W. 55th Street, New York, NY 10019, (212) 265-3636, circonyc.com
- Circo: offers upscale Italian fare built upon a foundation of signature Tuscan recipes from the kitchen of Maccioni matriarch Egidiana and prepared by Executive Chef Michael Galata. The menu is served in a lively, sophisticated setting reminiscent of the old-style European circus tents which inspired the restaurant’s name.
If you’re interested in joining us, be sure to RSVP with Dianna Doan (ddoan@datameer.com) ASAP, as there are only a few spots left.
I’ll be there, so I hope to see you too.
Looking forward to a great week!
Until next time,
JOSEPH
@jbgeorge
Highlights from the Open Source Business Conference 2012
Last week I had the pleasure of heading (back) to San Francisco to spend a few days with other open source believers at this year’s Open Source Business Conference. I was there on behalf of Dell, the company I work for.
Here are some of my thoughts from the sessions / keynotes I sat in on this past week.
- Jim Whitehurst of Red Hat spoke at a keynote and highlighted how the innovation that will be built on IaaS is where the revolution will reside, and that the role vendors will play in this new open source friendly enterprise will focus more on support and services.
- There was a great open source panel with personnel from Yahoo, Warner Music, Black Duck, Acquia, and North Bridge that talked through real use cases at Yahoo and Warner. The panel also shared feedback from their annual open source survey, which covered the rise of open source adoption in the enterprise, how quality and cost are driving that, and how many companies now view open source software as a starting point for projects rather than an alternative option.
- HP’s Biri Singh talked through their cloud strategy, including their tiered strategy of IaaS + ecosystem + marketplace. Turns out they’re using quite a bit of open source as they build out their public cloud with a focus on web services at scale.
- In a panel on “Amazon vs the world”, panelists from Canonical, Eucalyptus, and Citrix talked about open private cloud against the backdrop of Amazon’s dominance as a public cloud provider. AWS API compatibility came up a lot, as well as the need to productize open source technologies more. Some opportunities that were highlighted included the need for vendors who know more than just software, but also the “wiring” of actual working systems, and the importance of staying open as we are just starting to see adoption by the enterprise.
- CloudScaling hosted a great session on why open cloud is winning – how internet companies drove cloud technologies and how they were built with open source, the differences between the “Enterprise IT cloud” and the “Next Gen IT cloud”, and how “no lock-in” + flexibility + scale are the key tenets of open cloud.
Obviously there was a lot more at the event that I was not able to get to – you can check out a few of the presentation slides at https://www.eiseverywhere.com/ehome/31601/50199/?&
If you were out there last week, be sure to leave a comment with your thoughts.
I enjoyed the few days out there – looking forward to the next open source event – likely in San Fran again. 🙂
Until next time,
JBGeorge
@jbgeorge
Play Ball! Hadoop Players Sponsor Big Data Event in Chicago
What does data analytics have to do with baseball????
Well actually, quite a bit. Moneyball anyone?
(If you haven’t seen it, I highly recommend it. It’s an adaptation of the true story of Billy Beane and the Oakland A’s using intense number crunching to build a solid baseball team in a smaller market, competing with bigger markets – and bigger salaries.)
The Technology
Last week, I had the pleasure of representing Dell (the company I work for), as we joined Intel, Cloudera, and Clarity to meet with a number of customers at the Ivy League Baseball Club across from Wrigley Field, right before the Cubs – Cardinals game. It was great to talk to customers who were using Hadoop, as well as those that were just learning about the technology.
The presentation delivered by all four companies focused on the Dell Apache Hadoop Solution, a powerful packaged solution that features:
- A reference architecture featuring Intel technology
- A set of software which includes Cloudera’s CDH distribution (with the option to upgrade to Cloudera Enterprise), along with Dell’s innovative Crowbar software framework to enable easy provisioning and management
- Services provided by a combination of Dell, Cloudera, and Clarity, to provide our customers with deployment, support, and consulting services
The Experience
Even more impactful than the presentation was the 1:1 time afterward, where many users and newbies shared stories, experiences, best practices, etc. Got to hear about a lot of the struggles around “going it alone”, and the enthusiasm that Dell and our partners were delivering a solution that would make that a bit simpler.
Here’s a sampling of some of the topics that came up.
Why should I care about big data / Hadoop?
Here’s the thing: you have data. It’s in your sales tracking system, from your website traffic, from your social media outlets, in your customer support databases, and more. And not only do you have data, you have A LOT of data. But here’s the power of data. Your company has strategic objectives, customer strategies, and product plans. Data gives you insight into how to best spend your resources, where to focus your product development, where your customers are buying your products, and what problems they are encountering. This enables your business to make intelligent decisions to better satisfy your customers.
I already have a data warehousing solution – what’s the benefit of Hadoop?
Many analytics solutions today require data to be in a format that adheres to the standards of a relational database (aka structured data). This is fine for data that conforms to this format. However, a lot of the new data that is available to us is not formatted in that manner – this is referred to as unstructured data. Unstructured data includes data types such as audio, video, graphics, log files, etc. Hadoop as a technology handles unstructured data very well, allowing for analysis of those types of data. Additionally, a number of the traditional enterprise-level analytics solutions are building Hadoop connectors to allow Hadoop-processed data to be utilized by the enterprise tool set. Finally, as data scales, using an open source based technology like Hadoop makes things very cost efficient.
How does the Dell Apache Hadoop Solution help me with Hadoop?
Before this solution was made available, many of our Dell customers came to us asking, “If Dell was going to build a Hadoop solution, how would you design it?” And this was how we started down the path of Hadoop. What we discovered was that many customers had pockets of Hadoop projects in their companies, but progress was at a crawl. Many of the issues were around infrastructure design, deployment, and overall general help around the technology. And that is the basis for the Dell Apache Hadoop Solution – making Hadoop accessible, quick, and simple to deploy from bare metal, and getting to a functional Hadoop cluster ASAP. We’ve enabled many of these customers to go from a science experiment to a productive Hadoop instance very quickly, and provided them with the consulting and education they need to maximize its benefit.
You can learn more about what Dell is doing with Hadoop at www.Dell.com/Hadoop or you can drop me an email at Hadoop@Dell.com.
The Game
For those of you not interested in sports, you can now turn your TVs off – I’m about to talk baseball for a bit.
As far as the game went, it was a doozy. I have ties to Chicago, so I was rooting for the Cubs.
- The Cubs were up 1-0 most of the game until the top of the 8th, when Cardinal Matt Holliday knocked out a 2-run homer
- Trailing in the bottom of the 9th, Cubs first baseman Bryan LaHair hit a homer to tie it up 2-2, and take us into extra innings
- Here’s where the fireworks really began!
- Bottom of the 10th
- Cubs LF Tony Campana gets on base with a single
- Campana then tries to steal 2nd and barely makes it
- Cardinals manager Mike Matheny did not agree and made a federal case out of it with the 2nd base umpire
- And out goes Matheny – ejected!
- Cardinals walked LaHair
- With two men on base, Cubs LF Alfonso Soriano gets a single and drives Campana home for the 3-2 win!
- Prior to this, the Cardinals had beaten the Cubs in the LAST THIRTEEN SERIES between the two clubs. With this win, that streak has been broken.
Great game, great crowd, great partners! Thanks to everyone who came out. I look forward to the next one. 🙂
Until next time,
JBGeorge
@jbgeorge
2012: A year of Cloud Coalescence (whatever that means)
This post is a collaboration between three Dell Cloud activists: Rob Hirschfeld (@zehicle), Joseph B George (@jbgeorge) and Stephen Spector (@SpectoratDell).
We’re not making predictions for the “whole” Cloud market; this is a relatively narrow perspective based on technologies that are on our daily radar. These views are strictly our own and based on publicly available data. They do not reflect plans, commitments, or internal data from our employer (Dell).
The major 2012 theme is cloud coalescence. However, Rob worries that we’ll see slower adoption due to lack of engineers and confusing names/concepts.
Here are our twelve items for 2012:
- Open source continues to be a disruptive technology delivery model. It’s not “free” software – there’s an emerging IT culture that is doing business differently, including a number of large enterprises. The stable of sleeping giant vendors are waking up to this in 2012 but full engagement will take time.
- Linux. It is the cloud operating system and had a great 2011. It seems silly pointing this out since it seems obvious, but it’s the foundation for open source acceleration.
- Tight market for engineering and product development talent will get tighter. The catch-22 of this is that potential mentors are busy breaking new ground and writing code, making it hard for new experts to be developed.
- On track, OpenStack moves into its awkward adolescence. It is still gangly and rebelling against authority, but coming into its own. Expect to see a groundswell of installations and an expected wave of issues and challenges that will drive the community. By the “F” release, expect to see OpenStack cement itself as a serious, stable contender with notable public deployments and a significant international private deployment footprint.
- We’ll start seeing OpenStack Quantum (networking) in near-production pilots by year end. OpenStack Quantum is the glue that holds the big players in OpenStack Nova together. The potential for next generation cloud networking based on open standards is huge, but it will emerge without a killer app (OpenStack Nova in this case) pushing it forward. The OpenStack community will pull together to keep Quantum on track.
- Hadoop will cross into mainstream awareness as the need for big data analysis grows exponentially along with the data. Hadoop is on fire in select circles and completely obscure in others. The challenge for Hadoop is there are not enough engineers who know how to operate it. We suspect that lack of expertise will throttle demand until we get more proprietary tools to simplify analysis. We also predict a lot of very rich entrepreneurs and VCs emerging from this market segment.
- DevOps will enter mainstream IT discussions. Marketers from major IT brands will struggle and fail to find a better name for the movement. Our prediction is that by 2015, it will just be the way that “IT” is done and the name won’t matter.
- KVM continues to gain believers as the open source hypervisor. In 2011, I would not have believed this prediction, but KVM is making great strides and getting a lot of love from the OpenStack community, though Xen is also a key open source technology. I believe that Libvirt compatibility between LXC & KVM will further accelerate both virtualization approaches.
- Big Data and NoSQL will continue to converge. While NoSQL enthusiasm as a universal replacement for structured databases appears to be deflating, real applications will win.
- Java will continue to encounter turbulence as a software platform under Oracle’s overly heavy-handed management.
- PaaS continues to be a confusing term. Cloud players will struggle with a definition but I don’t think a common definition will surface in 2012. I think the big news will be convergence between DevOps and PaaS; however, that will be under the radar since most of the market is still getting educated on both of those concepts.
- Hybrid cloud will continue to make strides but will not truly emerge in 2012 – we’ll try to develop this technology, and expose the gaps we need to close to ultimately get there (see PaaS and Quantum above).
Thoughts? We’d love to hear your comments.
Rob, JBG, and Stephen
You can follow Rob at www.RobHirschfeld.com or @zehicle on Twitter.
You can follow Joseph at www.JBGeorge.net or @jbgeorge on Twitter.
You can follow Stephen at http://en.community.dell.com/members/dell_2d00_stephen-sp/blogs/default.aspx or @SpectoratDell on Twitter.
THIS JUST IN: VMware and Dell Partner to Enable Cloud Foundry via #Crowbar
And the goodness just keeps on coming!
A few weeks ago, Dell (the company I work for) unleashed the power of the Dell-developed, open source Crowbar software framework as a part of the announcement of the Dell OpenStack Cloud Solution. It allows users to deploy a full OpenStack IaaS cloud on bare metal PowerEdge C servers in less than two hours (vs multiple days if done manually), and provides a continuous integration mechanism for the stood-up cloud.
A week later, we announced the Dell | Cloudera Solution for Apache Hadoop, which also leverages the powerful Crowbar software to deploy a running Hadoop cluster onto bare metal PowerEdge C servers in less than a day, where it can take days or even weeks if deployed by other means.
So….
Infrastructure as a Service (IaaS)? Check.
Hadoop / Big Data? Check.
But what about Platform as a Service (PaaS)?
Big time check.
Today, VMware is announcing their development of a Cloud Foundry barclamp for Dell’s Crowbar software!
VMware’s Cloud Foundry is an open platform as a service (PaaS) project designed to support multiple frameworks, multiple cloud providers, and multiple application services, all on a cloud scale platform. It’s a project that is only a few months old, but one that has been getting outstanding interest from enterprises who want PaaS to be the new developer UI to their private cloud. And Cloud Foundry is already powering real solutions.
And now it can be deployed quickly, simply, and in an automated fashion with Dell’s Crowbar software.
Crowbar, software that leverages Opscode’s Chef configuration management tool, allows users to get up and running on powerful technologies like Cloud Foundry, but it does much more. It handles BIOS configuration, RAID configuration, and network discovery, deploys Nagios and Ganglia, and more, to enable an environment ideal for complex technologies. It is also aware of changes in its environment, and adjusts to them in an automated manner. (Learn more about Crowbar here.)
This is another win for open source in my book, and a real indicator of the impact open source is going to have on the next era of IT.
So who’s the next Crowbar barclamp rockstar?
You tell me.
You. Crowbar. Download. Build barclamp. Share.
I’d love to be telling your story here next. 🙂
More info:
- The Cloud Foundry Blog – http://blog.cloudfoundry.com
- Get open source Crowbar – www.github.com/DellCloudEdge
- Rob Hirschfeld’s Blog (Dell)
- Barton George’s Blog (Dell)
Until next time,
JBGeorge
@jbgeorge