Day 2: Big Data Innovation Summit 2014 #DataWest14
Hello again big data fans, from where I’ve learned the San Francisco 49ers will be playing their 2014 NFL season at Levi’s Stadium… Santa Clara!
(BTW, the stadium – from what I could see – is beautiful! I’m a big NFL fan, and there’s now another reason to come to the San Jose area, other than all the cloud / big data conferences.)
Got a lot of great feedback on yesterday’s “Day 1” post of the summit, so here are some observations from the final day of the conference.
- Yahoo’s Duru Ahanotu spoke about driving efficiency in how data teams are organized, going through the permutations of generalists vs specialists and centralized vs de-centralized, and how best to manage teams in each model.
- PayPal’s Moises Nascimento (who is a very captivating speaker) drove home the point that although we are now adopting many of the new data technologies like Hadoop and NoSQL, most of our existing data sources and toolsets still provide value, so there is value in leveraging ALL data sources.
- Moises also made a point of highlighting that data manipulation is best handled at the SYSTEM level, while data analysis is better managed at the ENTERPRISE level.
- In HP’s discussion, they introduced the concept of the GEOBYTE (10^30 bytes), a size of data that the human race is expected to hit in the next few years.
To provide context on the magnitude of a GEOBYTE, there are estimated to be only 10^19 GRAINS OF SAND ON THE EARTH. Think about that for a second.
- The team also highlighted their view on “Big BI” vs “Big Data”:
  - Big BI – same types of analysis but on more data; more batch processing; results that were not easily actionable
  - Big Data – joining datasets that have not been previously joined, near real time analysis, action oriented results
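To put HP's geobyte comparison in numbers, here is a quick back-of-the-envelope sketch in Python. It uses only the two figures quoted in the talk; the variable names are mine, and the "spread evenly" framing is just an illustration.

```python
# Figures as quoted in the talk: 10^30 bytes per "geobyte",
# and an estimated 10^19 grains of sand on Earth.
GEOBYTE_BYTES = 10**30
GRAINS_OF_SAND = 10**19

# Bytes of data per grain of sand if a geobyte were spread evenly.
bytes_per_grain = GEOBYTE_BYTES // GRAINS_OF_SAND

# Express the result in decimal gigabytes (10^9 bytes).
gigabytes_per_grain = bytes_per_grain // 10**9

print(f"{bytes_per_grain:,} bytes per grain")  # 100,000,000,000 bytes per grain
print(f"{gigabytes_per_grain} GB per grain")   # 100 GB per grain
```

In other words, taking both estimates at face value, a geobyte works out to roughly 100 GB of data for every grain of sand on the planet.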
- I thought Ancestry.com had one of the best sessions of the event, as they went deep into the GERMLINE algorithm that was the foundation of their business technology, and how they had to create jermline (now with a “j”) based on Hadoop / HDFS to create a SCALABLE matching engine. As we all know, SCALE matters. The performance and speed differences between the “G” project and the “j” project were mind-blowing.
- Finally, I sat in on the Netflix session. As a big fan of Netflix, both as a consumer and a tech observer, I’ve always been impressed with the way Netflix has evolved its business, and continues to do so. In this session, they went into great detail on their use of the Amazon cloud services, and the open source projects they layer on top to enhance functionality and deploy features. Topics touched on included red / black deployment to ease new features into production, and the importance of graceful degradation, so that a failure is less of a catastrophic event for the end user.
- One very telling statement was really a commentary on the value of using and participating in the open source process. Netflix was clear that the value they see in being an open source contributor / leader is that it preserves the future of their systems: rather than sitting back and letting the industry decide their direction with tools and tech, Netflix uses open source to help drive and lead the industry to where they see value.
- (I did resist the urge to ask the Netflix presenter when the next season of “House of Cards” would come out. 🙂 )
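The graceful degradation idea from the Netflix session can be sketched in a few lines of Python. This is only an illustration of the general pattern, not Netflix's implementation: the function names, the fallback list, and the simulated failure are all hypothetical.

```python
from typing import Callable, List

# Hypothetical static fallback, for illustration only.
POPULAR_TITLES: List[str] = ["Popular Title A", "Popular Title B"]


def get_recommendations(
    user_id: str,
    fetch_personalized: Callable[[str], List[str]],
    fallback: List[str] = POPULAR_TITLES,
) -> List[str]:
    """Return personalized results, or a generic list if the dependency fails."""
    try:
        return fetch_personalized(user_id)
    except Exception:
        # Degrade gracefully: the user still sees something useful
        # instead of an error page.
        return list(fallback)


def healthy_service(user_id: str) -> List[str]:
    # Stand-in for a working recommendation backend.
    return [f"Pick for {user_id}"]


def broken_service(user_id: str) -> List[str]:
    # Stand-in for a backend that is currently down.
    raise ConnectionError("recommendation service unavailable")


print(get_recommendations("u1", healthy_service))
print(get_recommendations("u1", broken_service))
```

The second call still returns a usable (if less personal) list, which is the point: the failure of one dependency becomes a reduced experience rather than an outage for the end user.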
One of the frequent questions that came up at the Dell booth was “what is Dell doing in big data?”
The answer? Actually… quite a bit, and for quite a while.
Between the Dell Apache Hadoop HW+SW+Services Solution, the Toad BI suite, the Kitenga analytics toolsets, and our growing HPC business, Dell has been a part of this movement since its early days. I’d recommend you drop us a line at Hadoop@Dell.com or visit us at http://www.Dell.com/Hadoop to learn more.
If you were out at the show this week, be sure to leave a comment with your thoughts as well.
Hope everyone has safe trips home, and we’ll see you at the next big data get-together!
Until next time,
JBG
@jbgeorge
Day 1: Big Data Innovation Summit 2014
Hello from sunny Santa Clara!
My team and I are here at the BIG DATA INNOVATION SUMMIT representing Dell (the company I work for), and it’s been a great day one.
I just wanted to take a few minutes to jot down some interesting ideas I heard today:
- In Daniel Austin’s keynote, he argued that the “Internet of things” should really be the “individual network of things”, highlighting that the number of devices, their connectivity, their availability, and their partitioning are what will be key in the future.
- One data point that also came out of Daniel’s talk: every person is predicted to generate 20 PETABYTES of data over the course of a lifetime!
- Juan Lavista of Bing hit on a number of key myths around big data:
  - the most important part of big data is its size
  - to do big data, all you need is Hadoop
  - with big data, theory is no longer needed
  - data scientists are always right 🙂
QUOTE OF THE DAY: “Correlation does not yield causation.” – Juan Lavista (Bing)
- Anthony Scriffignano was quick to admonish the audience that “it’s not just about data, it’s not just about the math… [data] relationships matter.”
- Utah’s state government is taking a very progressive view of the areas where analytics can help drive efficiency at the state level: census data use, welfare system fraud, etc. It appears Utah is taking a leadership position in doing so.
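Juan Lavista's quote of the day above is easy to demonstrate with a small simulation. The sketch below is my own illustration, not something from his talk: it assumes a hidden common cause `z` that drives two otherwise unrelated variables `x` and `y`, and all names and noise levels are made up for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# z is a hidden common cause; x and y each depend on z but not on each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

# Strong correlation, even though neither variable causes the other.
r_raw = pearson(x, y)

# Subtract the common cause: what is left is independent noise.
r_adjusted = pearson(
    [xi - zi for xi, zi in zip(x, z)],
    [yi - zi for yi, zi in zip(y, z)],
)

print(f"correlation of x and y: {r_raw:.2f}")
print(f"after removing z:       {r_adjusted:.2f}")
```

The first number comes out high and the second close to zero, which is the myth in miniature: a strong correlation in the data says nothing by itself about which variable, if any, is doing the causing.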
I also had the privilege of moderating a panel on the convergence of the HPC and big data spaces, with representatives on the panel from Dell (Armando Acosta), Intel (Brent Gorda), and the Texas Advanced Computing Center (Niall Gaffney). Some great discussion about the connections between the two, plus tech talk on the Lustre plug-in and the SLURM resource management project.
Additionally, Dell product strategists Sanjeet Singh and Joey Jablonski presented on a number of real user implementations of big data and analytics technologies – from university student retention projects to building a true centralized, enterprise data hub. Extremely informative.
All in all, a great day one!
If you’re out here, stop by and visit us at the Dell booth. We’ll be showcasing our Hadoop and big data solutions, as well as some of the analytics capabilities we offer.
(We’ll also be giving away a Dell tablet on Thursday at 1:30, so be sure to get entered into the drawing early.)
Stay tuned, and I’ll drop another update tomorrow.
Until next time,
JOSEPH
@jbgeorge