XLDB - Extremely Large Databases

XLDB-2016 Conference Program

Tuesday, MAY 24, 2016
08:00 AM   Continental Breakfast  (registration starts 7:30 AM, SLAC SUSB near Panofsky Auditorium)    
9:00 AM 20 Welcome, Conference Introductions, Logistics Jacek Becla
(SLAC XLDB2016 chair)
Real-time Analytics Moderator: Stephen Brobst
   
The world of Big Data Analytics (BDA) is quickly evolving from a paradigm of batch file processing to one of real-time stream processing. New tools, algorithms, and platforms are required to exploit BDA opportunities when processing XLDB-sized data in real time. This workshop will explore emerging design patterns and best-practice experiences from both scientific and commercial implementations.
   
9:20 AM 10 Introduction Stephen Brobst  
9:30 AM 40 Real-time Data at LinkedIn Shirshanka Das
Kapil Surlaker
(LinkedIn)
pdf
10:10 AM 20 Coffee Break
10:30 AM 40 Real-time data analytics at the National Energy Research Scientific Computing Center Prabhat
(NERSC)
pdf
11:10 AM 40 Scaling Blockchain Infrastructure: Multi-party Synchronization and Advanced Analytics Greg Schvey
(Axoni)
11:50 AM 40 Real-time Analytics Discussion Panel  
12:30 PM 60 Lunch  
Decoupling Compute from Storage Moderator: Per Brashers  
   
What are the effects of the different architectural choices on networking, costs, data integrity, and so on? How is cascading failure thwarted in each, and what is the 'blast zone' of a failure? How is the longevity of data preserved, and what methods are available to meet RPO/RTO targets after a failure? In this session we plan to explore these questions and more through a series of talks covering vision, actionable design, and end-user experience. There will also be a group Q&A for broader understanding of the topic.
   
1:30 PM 25 Trends Affecting Disaggregation Per Brashers
(Yttibrium)
pdf
1:55 PM 25 Journey to the Enterprise Cloud Binny Gill
(Nutanix)
pdf
2:20 PM 25 Intel Rack Scale Architecture Mohan Kumar
(Intel)
pdf
2:45 PM 25 Snowflake Marcin Zukowski
(Snowflake)
pdf
3:10 PM 20 Decoupling Compute from Storage Q&A
3:30 PM 40 Coffee Break  
Lightning Talks Moderator: Jacek Becla  
4:10 PM 40
1. Best Practices in Data Lake Deployment Stephen Brobst / CTO Teradata pdf
2. SQL, Scaling, and What's Unique About PostgreSQL Ozgun Erdogan / CTO and Co-Founder Citus Data pdf
3. Big Data Application Development Anti Patterns Steve Gonzales / Principal Manager at Think Big Analytics pdf
4. MyFlashSQL Sang-Won Lee / Professor at SKKU SICE pdf  
5. Presto Matthew Fuller / Engineer, Teradata pdf
6. Adding Analytical Behavioral Intelligence to the Block Storage Layer Andy Mills / President/CEO and Co-Founder of enmotus pdf
7. Implementing Connected Component Labeling as a User Defined Operator for SciDB Amidu O Oloso / Computational Scientist, NASA pdf
8. Using Co-processors to Accelerate Analytics Debabrata Sarkar / Senior Engineering Manager at Oracle pdf
Networking reception  
5:00 PM 90 Networking reception
Light appetizers and drinks served. Included in the conference registration.
     
~6:30 PM   Adjourn      
Wednesday, MAY 25, 2016
08:00 AM   Continental Breakfast  (registration starts 7:30 AM, SLAC SUSB near Panofsky Auditorium)  
9:00 AM 30 Sketching the Future of Data Processing Eric Tschetter
(Yahoo!)
pdf
Big Data Management and Analytics as a Cloud Service Moderator: Magdalena Balazinska
The quantity and variety of cloud services for data management and analytics are growing. Azure now offers Data Lake, HDInsight, and ML services. Amazon deployed Redshift to complement its well-established Elastic MapReduce service. Google offers BigQuery, Dataproc, and Datalab. This session will explore new advances in cloud services for big data management and analytics, together with examples of successful applications that leverage them.
9:30 AM 40 Big Data at Microsoft Raghu Ramakrishnan
(Microsoft)
10:10 AM 40 Data Analytics Services at AWS Mehul Shah, Andrew Caldwell
(Amazon)
10:50 AM 20 Coffee Break
11:10 AM 40 Google Cloud Dataflow / Apache Beam Jelena Pjesivac-Grbovic
(Google)
pdf
11:50 AM 40 Ability and Audacity to scale your science: Building global communities with shared computational infrastructure Nirav Merchant
(CyVerse)
12:30 PM 60 Lunch
Security in Big Data Systems Moderator: Gary Golomb
1:30 PM 10 Data-driven cybersecurity state-of-the-art and future directions Glenn Chisholm
(Cylance)
1:40 PM 20 Security – Insights at Scale  Raffael Marty
(Sophos)
2:00 PM 10 Security and Data Science Luke McConoughey
(Silicon Valley Bank)
2:10 PM 50 Discussion Panel  
Lightning Talks Moderator: Jacek Becla
3:00 PM 40
1. Vertica and Spark: Connecting Computation and Data Edward Ma / Software Engineer, Hewlett Packard Enterprise pdf
2. Big Data and Cyber Security Tom Plunkett / Oracle pdf
3. Fast and Scalable Inequality Joins Zuhair Yarub Khayyat / InfoCloud Research Group pdf  
4. ForestDB: A Fast Key-Value Storage System for Variable-Length String Keys Chiyoung Seo / Software Architect, Couchbase pdf
5. Bridging Oracle with Hadoop Zbigniew Baranowski / Researcher at CERN pdf
6. SQL in Silicon: SQL Processing on Specialized Hardware Weiwei Gong / Senior Member of Technical Staff at Oracle pdf
7. FileDB: Extending Relational Databases For Scientific Data Sets Gerard Lemson / Research Scientist - IDIES, Johns Hopkins University pdf
8. Virginia Connected Corridors - the velocity of data and advanced automotive research Clark Gaylord / Virginia Tech, Virginia Tech Transportation Institute pdf
3:40 PM 35 Coffee Break & Ice Creams
4:15 PM 30 Streaming SQL Julian Hyde
(Hortonworks)
pdf
4:45 PM Adjourn
Thursday, MAY 26, 2016
08:00 AM   Continental Breakfast  (registration starts 7:30 AM, SLAC SUSB near Panofsky Auditorium)  
Late Bindings Moderator: Tim Frazier
The last 15 years have seen the quiet rise of technologies that enable the structure of databases to be specified at query time rather than at ingest time. This new paradigm, “Schema on Need”, is supported by products such as Splunk, Couchbase, MongoDB, Vertica’s Flex Table, and the native support of JSON in RDBMSs. These technologies are being paired with powerful data visualization tools to provide the ability to mine data sources typically not supported by relational databases. This session will explore the motivation for these technologies and some of the largest use cases where they have been deployed.
09:00 AM 30 Design Goals and Architectural Tradeoffs We Made with MongoDB Kelly Stirman
(MongoDB, Inc.)
pdf
09:30 AM 30 Late Binding and optimization Vinayak Borkar
(X15 Software, Inc.)
10:00 AM 30 Late Binding Schema Chris Pride
(Splunk)
10:30 AM 20 Coffee Break
Long-term Storage Moderator: K-T Lim
Storing data for the long term takes significant effort and careful thought in the design. Modern science projects can take decades to run to completion, and millions of dollars are spent to produce the data, but what happens to it after that? Industry and technology companies look more and more like long-term science now: many Internet companies are more than a decade old and still expect to keep and have access to their oldest data. How can these enormous quantities of data be retained and accessed cheaply and efficiently; or conversely, how can it be decided what data to throw away?
10:50 AM 35 BaBar Data Preservation and Access Concetta Cartaro
(SLAC)
pdf
11:25 AM 35 Long-term Data Archiving with Amazon Glacier Henry Zhang
(Amazon)
pdf
Closing Remarks
12:00 PM 15 Closing Remarks
Next conference planning, final conclusions and closeout
Jacek Becla
12:15 PM 60 Lunch
1:15 PM Adjourn

We are planning to hold a small meeting right after lunch to brainstorm about future XLDB events, with a particular focus on topics to cover at the next event. If you are interested in getting involved and helping to organize the next XLDB, please attend this meeting.
