Archive

Posts Tagged ‘data’

Tech in Real Life: Content Delivery Networks, Big Data Servers and Object Storage

April 6, 2015


This is a duplicate of a blog I authored for HP, originally published at hp.nu/Lg3KF.

In a joint blog authored with theCube’s John Furrier and Scality’s Leo Leung, we pointed out some of the unique characteristics of data that make it act and look like a vector.

At that time, I promised we’d delve into specific customer uses for data and emerging data technologies – so let’s begin with our friends in the telecommunications and media industries, specifically around the topic of content distribution.

But let’s start at a familiar point for many of us…

If you’re like most people, when it comes to TV, movies, and video content, you’re an avid (sometimes binge-watching) fan of video streaming and video on-demand.  More and more people are opting to view content via streaming technologies.  In fact, a growing number of broadcast shows are now viewed on mobile and streaming devices, as are live events such as this year’s NCAA basketball tournament.

These are fascinating data points to ponder, but think about what goes on behind them.

How does all this video content get stored, managed, and streamed?

Suffice it to say, telecom and media companies around the world are addressing this exact challenge with content delivery networks (CDNs).  There are a variety of technologies out there to help build CDNs, and one interesting new option is object storage, especially when it comes to petabytes of data.

Here’s how object storage helps when it comes to streaming content.

  • With streaming content comes a LOT of data.  Managing and moving that data is a key area to address, and object storage handles it well.  It allows telecom and media companies to manage many petabytes of content with ease – an ability to scale that many IT options lack.  Features in object storage like replication and erasure coding let users break large volumes of data into bite-sized chunks and disperse them across several different server nodes, and often across several different geographic locations.  As data is needed, it is rapidly re-assembled and distributed.  (A toy sketch of the chunk-and-parity idea appears just after this list.)
  • Raise your hand if you absolutely love to wait for your video content to load.  (Silence.)  The fact is, no one likes to see the status bar slowly creeping along while waiting for zombies, your futbol club, or the next big singing sensation to show up on the screen.  Because object storage technologies can support very high bandwidth and millions of HTTP requests per minute, anyone looking to distribute media can give their customers access to content with superior performance.  It has a lot to do with the network, but also with the software managing the data behind the network, and object storage fits the bill.

These are just two of the considerations – there are many others – but object storage is an interesting technology to evaluate if you’re looking to get content or media online, especially if you are in the telecom or media space.
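
To make the “break it into chunks and disperse them” idea a bit more concrete, here is a toy Python sketch.  It is not how Scality or any production object store actually implements erasure coding – real systems use Reed-Solomon-style codes with multiple parity chunks, placement policies, and geographic dispersion – but it illustrates the basic principle: split an object into data chunks plus a parity chunk, and rebuild any single lost chunk from the survivors.

```python
from functools import reduce

def split_with_parity(data: bytes, k: int = 4) -> list:
    """Split data into k equal-sized chunks (zero-padded) plus one XOR parity chunk."""
    chunk_len = -(-len(data) // k)                      # ceiling division
    padded = data.ljust(k * chunk_len, b"\x00")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))
    return chunks + [parity]

def rebuild_missing(chunks: list) -> list:
    """Recover a single missing chunk (marked None) by XOR-ing the surviving chunks."""
    missing = chunks.index(None)
    survivors = [c for c in chunks if c is not None]
    recovered = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))
    repaired = list(chunks)
    repaired[missing] = recovered
    return repaired

# Simulate losing one "storage node" and rebuilding its chunk from the rest.
original = split_with_parity(b"petabytes of streaming video, in miniature")
damaged = list(original)
damaged[2] = None            # one node goes offline
assert rebuild_missing(damaged) == original
```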

Want a real life example? Check out how our customer RTL II, a European television station, addressed their video streaming challenge with object storage.  It’s all detailed here in this case study – “RTL II shifts video archive into hyperscale with HP and Scality.”  Using HP ProLiant SL4540 big data servers and object storage software from HP partner Scality, RTL II was able to boost their video transfer speeds by 10x.

Webinars this week! If this is a space you could use more education on, Scality and HP will be hosting a couple of webinars this week, specifically around object storage and content delivery networks.  If you’re looking for more on this, be sure to join us – here are the details:

Session 1 (Time-friendly for European and APJ audiences)

  • Who:  HP’s big data strategist, Sanjeet Singh, and Scality VP, Leo Leung
  • Date:  Wed, Apr 8, 2015
  • Time:  3pm Central Europe Summer / 8am Central US
  • Registration Link

Session 2 (Time-friendly for North American audiences)

  • Who:  HP Director, Joseph George, and Scality VP, Leo Leung
  • Date:  Wed, Apr 8, 2015
  • Time: 10am Pacific US / 12 noon Central US
  • Registration Link

And for any questions at all, you can always send us an email at BigDataEcosystem@hp.com or visit us at www.hp.com/go/ProLiant/BigDataServer.

And now off to relax and watch some TV – via streaming video of course!

Until next time,

JOSEPH
@jbgeorge

Recognizing the Layers of Critical Insight That Data Offers

March 11, 2015

This is a joint blog I did with John Furrier of SiliconAngle / theCube and Leo Leung from Scality, originally published at http://bit.ly/1E6nQuR 

Data is an interesting concept.

During a recent CrowdChat a number of us started talking about server based storage, big data, etc., and the topic quickly developed into a forum on data and its inherent qualities. The discussion led us to realize that data actually has a number of attributes that clearly define it – similar to how a vector has both a direction and magnitude.

Several of the attributes we uncovered as we delved into this notion of data as a vector include:

  • Data Gravity: This was a concept developed by my friend, Dave McCrory, a few years ago, and it is a burgeoning area of study today.  The idea is that as data is accumulated, additional services and applications are attracted to this data – similar to how a planet’s gravitational pull attracts objects to it.  An example would be the number 10.  If the “years old” context is “attracted” to that original data point, it adds a certain meaning to it.  If the “who” context is then applied to a dog vs. a human being, it takes on additional meaning.
  • Relative Location with Similar Data:  You could argue that this is related to data gravity, but I see it as a point poignant enough to bear calling out on its own.  At a Hadoop World conference many years ago, I heard Tim O’Reilly make the comment that our data is most meaningful when it’s around other data.  A good example of this is medical data.  Health information from a single individual (one person) may lead to some insights, but when placed together with data from members of a family, co-workers at a job location, or the citizens of a town, you are able to draw meaningful conclusions.  When grouped with other data, individual pieces of data take on more meaning.
  • Time:  This came up when someone posed the question “does anyone delete data anymore?”  With storage costs at scale becoming more and more affordable, we concluded that there is no longer an economic need to delete data (though there may be regulatory reasons to do so).  Then came the question of determining what data was not valuable enough to keep, which led to the epiphany that data viewed as not valuable today may become significantly valuable tomorrow.  Medical information is a good example here as well – recording that certain individuals in the 1800s were plagued with a specific medical condition may not have seemed meaningful at the time, until you have tracked descendants of those families being plagued by similar ills over the following centuries.  It is difficult to quantify the value of specific data at the time of its creation.

[Image: Data as a vector]

In discussing this with my colleagues, it became very clear how early we are in the evolution of data / big data / software defined storage.  With so many angles yet to be discussed and discovered, the possibilities are endless.

This is why it is critical that you start your own journey to unlock the critical insights your data offers.  It can help you drive efficiency in product development, it can help you better serve your constituents, and it can help you solve seemingly unsolvable problems.  Technologies like object storage, cloud-based storage, Hadoop, and more are allowing us to learn from our data in ways we couldn’t imagine 10 years ago.

And there’s a lot happening today – it’s not science fiction.  In fact, we are seeing customers implement these technologies and take a turn for the better – figuring out how to treat more patients, enabling student researchers to share data across geographic boundaries, helping media companies stream content across the web, and allowing financial institutions to detect fraud as it happens.  Though the technologies may be considered “emerging,” the results are very, very real.

Over the next few months, we’ll discuss specific examples of how customers are making this work in their environments, tips on implementing these innovative technologies, some unique innovations that we’ve developed in both server hardware and open source software, and maybe even some best practices that we’ve developed after deploying so many of these big data solutions.

Stay tuned.

Until next time,

Joseph George – @jbgeorge

Director, HP Servers

Leo Leung – @lleung

VP, Scality

John Furrier – @furrier

Founder of SiliconANGLE Media

Cohost of @theCUBE

CEO of CrowdChat

Day 2: Big Data Innovation Summit 2014 #DataWest14

April 11, 2014


Hello again, big data fans – from where, I’ve learned, the San Francisco 49ers will be playing their 2014 NFL season at Levi’s Stadium… Santa Clara!

(BTW, the stadium – from what I could see – is beautiful!  I’m a big NFL fan, and there’s now another reason to come to the San Jose area, other than all the cloud / big data conferences.)

Got a lot of great feedback on yesterday’s “Day 1” post of the summit, so here are some observations from the final day of the conference.

  • Yahoo’s Duru Ahanotu spoke about driving efficiency in how data teams are organized, going through the permutations of generalists vs. specialists and centralized vs. de-centralized, and how best to structure teams in each model.
  • PayPal’s Moises Nascimento (who is a very captivating speaker) drove home the point that, though we are now adopting many of the new data technologies like Hadoop and NoSQL, most of our existing data sources and toolsets still provide value – so there is value in leveraging ALL data sources.
  • Moises also made a point of highlighting that data manipulation is best handled at the SYSTEM level, while data analysis is better managed at the ENTERPRISE level.
  • In HP’s discussion, they introduced the concept of the GEOBYTE – 10^30 bytes, a size of data that the human race is expected to hit in the next few years.

To provide context on the magnitude of a GEOBYTE (10^30 bytes), there are estimated to be only 10^19 GRAINS OF SAND ON THE EARTH – so a geobyte works out to roughly 100 gigabytes (10^11 bytes) for every grain of sand on the planet.  Think about that for a second.

  • The team also highlighted their view on “Big BI” vs. “Big Data”:
    • Big BI – the same types of analysis but on more data; more batch processing; results that were not easily actionable
    • Big Data – joining datasets that have not previously been joined; near-real-time analysis; action-oriented results
  • I thought Ancestry.com had one of the best sessions of the event, as they went deep into the GERMLINE algorithm that was the foundation of their business technology, and how they had to build jermline (now with a “j”) on Hadoop / HDFS to create a SCALABLE matching engine.  As we all know, SCALE matters.  The performance and speed benchmarks between the “G” project and the “j” project were mind-blowing.
  • Finally, I sat in on the Netflix session – in addition to being a big fan of Netflix, as both a consumer and a tech observer, I’ve always been impressed with the way Netflix has evolved their business, and continues to do so.  In this session, they went into great detail on their use of Amazon cloud services, and their open source projects layered above to enhance functionality and deploy features.  Topics touched on included red / black deployment to ease the rollout of features into production, and the importance of graceful degradation, so that a failure is less of a catastrophic event for the end user.  (A minimal sketch of that fallback idea appears after this list.)

    • One very telling statement was really a commentary on the value of participating in and contributing to the open source process – Netflix was clear that the value they see in being an open source contributor / leader is that it preserves the future of their systems.  Rather than sitting back and letting the industry decide the direction of tools and tech, Netflix uses open source to help drive and lead the industry to where they see value.
  • (I did resist the urge to ask the Netflix presenter when the next season of “House of Cards” would come out. 🙂 )
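
As promised above, here is a minimal Python sketch of the graceful-degradation idea from the Netflix session.  This is a generic illustration, not Netflix’s actual code – their open source Hystrix library implements a far richer version of the pattern, with circuit breakers and metrics – but the core idea is simple: if a downstream call fails, serve a degraded-but-usable fallback rather than an error.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")

def with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Try the primary call; on any failure, log it and serve the fallback instead."""
    try:
        return primary()
    except Exception as exc:
        logging.warning("Primary call failed, degrading gracefully: %s", exc)
        return fallback()

# Hypothetical usage: personalized rows are nice to have, but a generic
# "popular titles" row keeps the page usable when personalization is down.
def fetch_personalized_rows() -> list:
    raise TimeoutError("recommendation service unavailable")   # simulate an outage

def fetch_popular_rows() -> list:
    return ["Trending Now", "New Releases"]

rows = with_fallback(fetch_personalized_rows, fetch_popular_rows)
print(rows)   # ['Trending Now', 'New Releases']
```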

One of the frequent questions that came up at the Dell booth was “what is Dell doing in big data?”

The answer?  Actually… quite a bit, and for quite a while.

Between the Dell Apache Hadoop HW+SW+Services Solution, the Toad BI suite, the Kitenga analytics toolsets, and our growing HPC business, Dell has been a part of this movement since its early days.  I’d recommend you drop us a line at Hadoop@Dell.com or visit us at http://www.Dell.com/Hadoop to learn more.

If you were out at the show this week, be sure to leave a comment on your thoughts as well.

Hope everyone has safe trips home, and we’ll see you at the next big data get-together!

Until next time,

JBG
@jbgeorge
BDIS 2014