This is a duplicate of a blog I authored for SUSE, originally published at the SUSE Blog Site.
Experts predict that our world will generate 44 ZETTABYTES of digital data by 2020.
How about some context?
Now, you may think that these are all teenage selfies and funny cat videos – in actuality, much of it is legitimate data your company will need to stay competitive and to serve your customers.
The Data Explosion Happening in YOUR Industry
Some interesting factoids:
- An automated manufacturing facility can generate many terabytes of data in a single hour.
- In the airline industry, a commercial airplane can generate upwards of 40 TB of data per hour.
- Mining and drilling companies can gather multiple terabytes of data per minute in their day-to-day operations.
- In the retail world, a single store can collect many TB of customer data, financial data, and inventory data.
- Hospitals quickly generate terabytes of data on patient health, medical equipment data, and patient x-rays.
The list goes on and on. Service providers, telecommunications, digital media, law enforcement, energy companies, HPC research groups, governments, the financial world, and many other industries (including yours) are experiencing this data deluge now.
And with terabytes of data being generated by single products by the hour or by the minute, the next stop is coming up quick: PETABYTES OF DATA.
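To put that jump in perspective, here is a quick back-of-the-envelope sketch. It assumes decimal units (1 PB = 1,000 TB) and takes the 40 TB per hour airplane figure above as a steady rate; the function name is just an illustration.

```python
# Assumption: decimal (SI) units, so 1 PB = 1,000 TB.
TB_PER_PB = 1_000


def hours_to_reach_petabyte(tb_per_hour: float) -> float:
    """Hours of continuous data generation needed to accumulate 1 PB."""
    if tb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return TB_PER_PB / tb_per_hour


# The airline figure quoted above: 40 TB per hour from a single aircraft.
airplane_hours = hours_to_reach_petabyte(40)
print(f"{airplane_hours:.0f} hours to reach 1 PB")  # 25 hours to reach 1 PB
```

At that rate, a single aircraft would pass the petabyte mark in roughly a day of flying time.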
Status Quo Doesn’t Cut It
I know what you’re thinking: “What’s the problem? I’ve had a storage solution in place for years. It should be fine.”
- You are going to need to deal with a LOT more data than you are storing today in order to maintain your competitive edge.
- The storage solutions you’ve been using for years have likely not been designed to handle this unfathomable amount of data.
- Merely “adding more” of your current storage solutions to deal with this amount of data can be extremely expensive.
The good news is that there is a way to store data at this scale with better performance at a much better price point.
Open Source Scale Out Storage
Why is this route better?
- It was designed from the ground up for scale.
Much like how mobile devices changed the way we communicate / interact / take pictures / trade stock, scale out storage is a different design for storage. Instead of all-in-one storage boxes, it uses a “distributed model” – farming out the storage to as many servers / hard drives as it has access to, making it very scalable and very performant. (Cloud environments leverage a very similar model for computing.)
- Its cost is primarily commodity servers with hard drives and software.
Traditional storage solutions are expensive to scale in capacity or performance. Instead of expensive engineered black boxes, we are looking at commodity servers and a bit of software that sits on each server – you then just add a “software + server” combo as you need to scale.
- When you go open source, the software benefits get even better.
Much like other open source technologies, like Linux operating systems, open source scale out storage allows users to take advantage of rapid innovation from the developer communities, as well as a cost model that is primarily support or services, as opposed to software license fees.
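To make the “distributed model” described above a bit more concrete, here is a minimal sketch of spreading objects across a pool of commodity servers by hashing their names. This is a simplified illustration only, not how Ceph or any particular product places data (Ceph uses its own CRUSH algorithm); the server names and the `place_object` helper are hypothetical.

```python
import hashlib


def place_object(object_name: str, servers: list[str], replicas: int = 2) -> list[str]:
    """Pick `replicas` distinct servers for an object by hashing its name."""
    if not servers:
        raise ValueError("need at least one server")
    replicas = min(replicas, len(servers))
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a starting position in the list.
    start = int.from_bytes(digest[:8], "big") % len(servers)
    # Walk the server list from that position, wrapping around at the end.
    return [servers[(start + i) % len(servers)] for i in range(replicas)]


servers = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical server names
for name in ["xray-001.dcm", "sensor-log-17.csv", "invoice-2020.pdf"]:
    print(name, "->", place_object(name, servers))
```

Because placement is computed from the object name, any client can find an object without asking a central box. Note that this naive modulo scheme would move most objects whenever a server is added; real scale out systems use more careful placement schemes to limit that reshuffling.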
Ready. Set. Go.
At SUSE, we’ve put this together in an offering called SUSE Enterprise Storage, an intelligent software-defined storage management solution, powered by the open source Ceph project.
It delivers what we’ve talked about: open source scale out storage. It scales, it performs, and it’s open source – a great solution to manage all that data that’s coming your way, that will scale as your data needs grow.
And with SUSE behind you, you’ll get full services and support to any level you need.
OK, enough talk – it’s time for you to get started.
And here’s a great way to kick this off: Go get your FREE TRIAL of SUSE Enterprise Storage. Just click this link, and you’ll be directed to the site (note that you’ll be prompted to do a quick registration). It will give you quick access to the scale out storage tech we’ve talked about, and you can begin your transition over to the new evolution of storage technology.
Until next time,
This is a joint blog I did with John Furrier of SiliconAngle / theCube and Leo Leung from Scality, originally published at http://bit.ly/1E6nQuR
Data is an interesting concept.
During a recent CrowdChat a number of us started talking about server based storage, big data, etc., and the topic quickly developed into a forum on data and its inherent qualities. The discussion led us to realize that data actually has a number of attributes that clearly define it – similar to how a vector has both a direction and magnitude.
Several of the attributes we uncovered as we delved into this notion of data as a vector include:
- Data Gravity: This was a concept developed by my friend, Dave McCrory, a few years ago, and it is a burgeoning area of study today. The idea is that as data is accumulated, additional services and applications are attracted to this data – similar to how a planet’s gravitational pull attracts objects to it. An example would be the number 10. If the “years old” context is “attracted” to that original data point, it adds a certain meaning to it. If the “who” context is applied to a dog vs. a human being, it takes on additional meaning.
- Relative Location with Similar Data: You could argue that this is related to data gravity, but I see it as a poignant point that bears calling out. At a Hadoop World conference many years ago, I heard Tim O’Reilly make the comment that our data is most meaningful when it’s around other data. A good example of this is medical data. Health information of a single individual may lead to some insights, but when placed together with data from members of a family, co-workers on a job location, or the citizens of a town, you are able to draw meaningful conclusions. When grouped with other data, individual pieces of data take on more meaning.
- Time: This came up when someone posed the question “does anyone delete data anymore?” With storage costs at scale becoming more and more affordable, we concluded that there is no longer an economic need to delete data (though there may be regulatory reasons to do so). Then came the question of determining what data was not valuable enough to keep, which led to the epiphany that data that might be viewed as not valuable today may become significantly valuable tomorrow. Medical information is a good example here as well – recording that a certain individual in the 1800s was plagued with a specific medical condition may not seem meaningful at the time, until you’ve tracked data on descendants in his family being plagued by similar ills over the next few centuries. It is difficult to quantify the value of specific data at the time of its creation.
In discussing this with my colleagues, it became very clear how early we are in the evolution of data / big data / software defined storage. With so many angles yet to be discussed and discovered, the possibilities are endless.
This is why it is critical that you start your own journey to salvage the critical insights your data offers. It can help you drive efficiency in product development, it can help you better serve your constituents, and it can help you solve seemingly unsolvable problems. Technologies like object storage, cloud based storage, Hadoop, and more are allowing us to learn from our data in ways we couldn’t imagine 10 years ago.
And there’s a lot happening today – it’s not science fiction. In fact, we are seeing customers implement these technologies and make a turn for the better – figuring out how to treat more patients, enabling student researchers to share data across geographic boundaries, moving media companies to stream content across the web, and allowing financial institutions to detect fraud when it happens. Though the technologies may be considered “emerging,” the results are very, very real.
Over the next few months, we’ll discuss specific examples of how customers are making this work in their environments, tips on implementing these innovative technologies, some unique innovations that we’ve developed in both server hardware and open source software, and maybe even some best practices that we’ve developed after deploying so many of these big data solutions.
Until next time,
Joseph George – @jbgeorge
Director, HP Servers
Leo Leung – @lleung
John Furrier – @furrier
Founder of SiliconANGLE Media
Cohost of @theCUBE
CEO of CrowdChat
This is a duplicate of the blog I’ve authored on the HP blog site at http://h30507.www3.hp.com/t5/Hyperscale-Computing-Blog/The-HP-Big-Data-Reference-Architecture-It-s-Worth-Taking-a/ba-p/179502#.VMfTrrHnb4Z
I recently posted a blog on the value that purpose-built products and solutions bring to the table, specifically around the HP ProLiant SL4540 and how it really steps up your game when it comes to big data, object storage, and other server based storage instances.
Last month, at the Discover event in Barcelona, we announced the revolutionary HP Big Data Reference Architecture – a major step forward in how we, as a community of users, do Hadoop and big data – and it is a stellar example of how purpose-built solutions can revolutionize how you accelerate IT technology, like big data. We’re proud that HP is leading the way in driving this new model of innovation, with the support and partnership of the leading voices in Hadoop today.
Here’s the quick version on what the HP Big Data Reference Architecture is all about:
Think about all the Hadoop clusters you’ve implemented in your environment – they could be pilot or production clusters, hosted by developer or business teams, and hosting a variety of applications. If you’re following standard Hadoop guidance, each instance is most likely a set of general purpose server nodes with local storage.
For example, your IT group may be running a 10 node Hadoop pilot on servers with local drives, your marketing team may have a 25 node Hadoop production cluster monitoring social media on similar servers with local drives, and perhaps similar for the web team tracking logs, the support team tracking customer cases, and sales projecting pipeline – each with their own set of compute + local storage instances.
There’s nothing wrong with that setup – it’s the standard configuration that most people use. And it works well.
Just imagine if we made a few tweaks to that architecture.
- What if we replaced the good-enough general purpose nodes with purpose-built nodes?
- For compute, what if we used HP Moonshot, which is purpose-built for maximum compute density and price performance?
- For storage, what if we used HP ProLiant SL4540, which is purpose-built for dense storage capacity, able to get over 3PB of capacity in a single rack?
- What if we took all the individual silos of storage, and aggregated them into a single volume using the purpose-built SL4540? This way all the individual compute nodes would be pinging a single volume of storage.
- And what if we ensured we were using some of the newer high speed Ethernet networking to interconnect the nodes?
Well, we did.
And the results are astounding.
While the cost benefit and easier management are very apparent, there is also a surprising bump in read and write performance.
It was a surprise to us in the labs, but we have validated it in a variety of test cases. It works, and it’s a big deal.
And Hadoop industry leaders agree.
“Apache Hadoop is evolving and it is important that the user and developer communities are included in how the IT infrastructure landscape is changing. As the leader in driving innovation of the Hadoop platform across the industry, Cloudera is working with and across the technology industry to enable organizations to derive business value from all of their data. We continue to extend our partnership with HP to provide our customers with an array of platform options for their enterprise data hub deployments. Customers today can choose to run Cloudera on several HP solutions, including the ultra-dense HP Moonshot, purpose-built HP ProLiant SL4540, and work-horse HP Proliant DL servers. Together, Cloudera and HP are collaborating on enabling customers to run Cloudera on the HP Big Data architecture, which will provide even more choice to organizations and allow them the flexibility to deploy an enterprise data hub on both traditional and newer infrastructure solutions.” – Tim Stevens, VP Business and Corporate Development, Cloudera
“We are pleased to work closely with HP to enable our joint customers’ journey towards their data lake with the HP Big Data Architecture. Through joint engineering with HP and our work within the Apache Hadoop community, HP customers will be able to take advantage of the latest innovations from the Hadoop community and the additional infrastructure flexibility and optimization of the HP Big Data Architecture.” – Mitch Ferguson, VP Corporate Business Development, Hortonworks
And this is just a sample of what HP is doing to think about “what’s next” when it comes to your IT architecture, Hadoop, and broader big data. There’s more that we’re working on to make your IT run better, and to lead the communities to an improved experience with data.
If you’re just now considering a Hadoop implementation or if you’re deep into your journey with Hadoop, you really need to check into this, so here’s what you can do:
- My pal, Greg Battas, posted on the new architecture and goes technically deep into it, so give his blog a read to learn more about the details.
- Hortonworks has also weighed in with their own blog.
If you’d like to learn more, you can check out the newly published reference architectures that follow this design featuring HP Moonshot and ProLiant SL4540:
- HP Big Data Reference Architecture: Cloudera Enterprise reference architecture implementation
- HP Big Data Reference Architecture: Hortonworks Data Platform reference architecture implementation
If you’re looking for even more information, reach out to your HP rep and mention the HP Big Data Reference Architecture. They can connect you with the right folks to have a deeper conversation on what’s new and innovative with HP, Hadoop, and big data. And, the fun is just getting started – stay tuned for more!
Until next time,
This is a duplicate of the blog I’ve authored on the HP blog site at http://h30507.www3.hp.com/t5/Hyperscale-Computing-Blog/Purpose-Built-Solutions-Make-a-Big-Difference-In-Extracting-Data/ba-p/173222#.VEUdYrEo70c
Indulge me as I flash back to the summer of 2012 at the Aquatics Center in London, England – it’s the Summer Olympics, where some of the world’s top swimmers, representing a host of nations, are about to kick off the Men’s 100m Freestyle swimming competition. The starter gun fires, and the athletes give it their all in a heated head to head match for the gold.
And the results of the race are astounding: USA’s Nathan Adrian took the gold medal with a time of 47.52 seconds, with Australia’s James Magnussen finishing a mere 0.01 seconds later to claim the silver medal! It was an incredible display of competition, and a real testament to the power of the human spirit.
For an event demanding such precise timing, we can only assume that very sensitive and highly calibrated measuring devices were used to capture accurate results. And it’s a good thing they were – a hundredth of a second separated first and second place.
Now, you and I have both measured time before – we’ve checked our watches to see how long it has been since the workday started, we’ve used our cell phones to see how long we’ve been on the phone, and so on. It got the job done. Surely the Olympic judges at the 2012 Men’s 100m Freestyle had some of these less precise options available – why didn’t they simply huddle around one of their wrist watches to determine the winner of the gold, silver and bronze?
OK, I am clearly taking this analogy to a silly extent to make a point.
When you get serious about something, you have to step up your game and secure the tools you need to ensure the job gets done properly.
There is a real science behind using purpose-built tools to solve complex challenges, and the same is true with IT challenges, such as those addressed with big data / scale out storage. There are a variety of infrastructure options to deal with the staggering amounts of data, but there are very few purpose-built server solutions like HP’s ProLiant SL4500 product – a server solution built SPECIFICALLY for big data and scale out storage.
The HP ProLiant SL4500 was built to handle your data. Period.
- It provides an unprecedented drive capacity with over THREE PB in a single rack
- It delivers scalable performance across multiple drive technologies like SSD, SAS or SATA
- It provides significant energy savings with shared cooling and power and reduced complexity with fewer cables
- It offers flexible configurations:
- A 1-node, 60 large form factor drive configuration, perfect for large scale object storage with software vendors like Cleversafe and Scality, or with open source projects like OpenStack Swift and Ceph
- A 2-node, 25 drive per node configuration, ideal for running Microsoft Exchange
- A 3-node, 15 drive per node configuration, optimal for running Hadoop and analytics applications
If you’re serious about big data and scale out storage, it’s time to consider stepping up your game with the SL4500. Purpose-built makes a difference, and the SL4500 was purpose-built to help you make sense of your data.
And if you’re here at Hadoop World this week, come on by the HP booth – we’d love to chat about how we can help solve your data challenges with SL4500 based solutions.
Until next time,
Hello from sunny Santa Clara!
My team and I are here at the BIG DATA INNOVATION SUMMIT representing Dell (the company I work for), and it’s been a great day one.
I just wanted to take a few minutes to jot down some interesting ideas I heard today:
- In Daniel Austin’s keynote, he argued that the “Internet of things” should really be the “individual network of things” – highlighting that the number of devices, their connectivity, their availability, and their partitioning is what will be key in the future.
- One data point that also came out of Daniel’s talk – every person is predicted to generate 20 PETABYTES of data over the course of a lifetime!
- Juan Lavista of Bing hit on a number of key myths around big data:
- the most important part of big data is its size
- to do big data, all you need is Hadoop
- with big data, theory is no longer needed
- data scientists are always right 🙂
QUOTE OF THE DAY: “Correlation does not yield causation.” – Juan Lavista (Bing)
- Anthony Scriffignano was quick to admonish the audience that “it’s not just about data, it’s not just about the math… [data] relationships matter.”
- Utah’s state government is taking a very progressive view of the areas where analytics can help drive efficiency at the state level – census data use, welfare system fraud, etc. And it appears Utah is taking a leadership position in doing so.
I also had the privilege of moderating a panel on the topic of the convergence between HPC and the big data spaces, with representatives on the panel from Dell (Armando Acosta), Intel (Brent Gorda), and the Texas Advanced Computing Center (Niall Gaffney). Some great discussion about the connections between the two, plus tech talk on the Lustre plug-in and the SLURM resource management project.
Additionally, Dell product strategists Sanjeet Singh and Joey Jablonski presented on a number of real user implementations of big data and analytics technologies – from university student retention projects to building a true centralized, enterprise data hub. Extremely informative.
All in all, a great day one!
If you’re out here, stop by and visit us at the Dell booth. We’ll be showcasing our Hadoop and big data solutions, as well as some of the analytics capabilities we offer.
(We’ll also be giving away a Dell tablet on Thursday at 1:30, so be sure to get entered into the drawing early.)
Stay tuned, and I’ll drop another update tomorrow.
Until next time,
Two pieces of important cloud news coming out of Dell (the company I work for) today:
- Dell to Enable Hyper-V and Windows Server 2012 Support for OpenStack
Dell today announced it will enable Microsoft Windows Server Hyper-V as a viable hypervisor choice for the OpenStack cloud platform. This development, which is the first instance of a leading technology vendor enabling Windows Server Hyper-V hypervisor on OpenStack for private clouds, will give customers additional flexibility and choice to run OpenStack workloads within their existing Windows Server environments.
More info here.
- Dell to Deliver Public Cloud through Partner Ecosystem
Dell is launching the Dell Cloud Partner Program to deliver public cloud Infrastructure as a Service (IaaS) through an ecosystem of partners. Acting as a single-source supplier, Dell will offer customers a choice of vendors and technology, freedom from lock-in to a single platform or pricing model and a central point of solution integration and control. Sales of Dell’s current in-house multi-tenant public cloud IaaS will be discontinued in the U.S. in favor of best-in-class partner offerings.
More info here.
Both of these announcements highlight key tenets for our cloud strategy.
- First and foremost, our customers come first when it comes to new products, solutions, and services – they are the most important element in helping the vendor landscape understand where the priorities should be.
- We will continue to collaborate and learn from our customers as we develop products, solutions, and services in the cloud space. It is part of our DNA – plain and simple.
- Though our tactics evolve, our commitment to enabling our customers with cloud technology remains constant. Customers are finding success in cloud when partnering with Dell, and we’ll continue to make that our mission.
- Specifically with OpenStack, our commitment to enabling the innovative cloud technology and community remains as solid as ever. We have been vocal advocates since Day 1 (something no other hardware solutions vendor can claim), and we have no intentions of slowing down.
From the Data Center Solutions (DCS) team, who have built and enabled some of the biggest public clouds in the world, to the Dell Revolutionary Solutions Team, who have been early entrants into the OpenStack and Hadoop spaces, and to Dell’s open source community in Crowbar, we’ve proven our commitment to innovation and customer success.
That is what we are about.
And don’t expect that to ever change.
Until next time,