XLDB-2016 Conference Program

Tuesday, MAY 24, 2016
9:00 AM 20 Welcome, Conference Introductions, Logistics Jacek Becla
(SLAC XLDB2016 chair)
Real-time Analytics Moderator: Stephen Brobst
Big Data Analytics (BDA) is quickly evolving from a paradigm of batch file processing to one of real-time stream processing. New tools, algorithms, and platforms are required to exploit BDA opportunities when handling XLDB-sized data processing in real time. This workshop will explore emerging design patterns and best-practice experiences from both scientific and commercial implementations.
9:20 AM 10 Introduction Stephen Brobst  
9:30 AM 40 Real-time Data at LinkedIn Shirshanka Das
Kapil Surlaker
10:30 AM 40 Real-time data analytics at the National Energy Research Scientific Computing Center Prabhat
11:10 AM 40 Scaling Blockchain Infrastructure: Multi-party Synchronization and Advanced Analytics Greg Schvey
11:50 AM 40 Real-time Analytics Discussion Panel  
Decoupling Compute from Storage Moderator: Per Brashers  
What are the effects of the different architectural choices on networking, costs, data integrity, and so on? How is cascading failure thwarted in each, and what is the 'blast zone' of a failure? How is longevity of data preserved, and what methods do I have to meet my RPO/RTO given a failure condition? In this session we plan to explore these questions and more through a series of talks covering vision, actionable design, and end-user experience. There will also be a group Q&A for broader understanding of the topic.
1:30 PM 25 Trends Affecting Disaggregation Per Brashers
1:55 PM 25 Journey to the Enterprise Cloud Binny Gill
2:20 PM 25 Intel Rack Scale Architecture Mohan Kumar
2:45 PM 25 Snowflake Marcin Zukowski
3:10 PM 20 Decoupling Compute from Storage Q&A
Lightning Talks Moderator: Jacek Becla  
4:10 PM 40
1. Best Practices in Data Lake Deployment Stephen Brobst / CTO Teradata
2. SQL, Scaling, and What's Unique About PostgreSQL Ozgun Erdogan / CTO and Co-Founder Citus Data
3. Big Data Application Development Anti Patterns Steve Gonzales / Principal Manager at Think Big Analytics
4. MyFlashSQL Sang-Won Lee / Professor at SKKU SICE
5. Presto Matthew Fuller / Engineer, Teradata
6. Adding Analytical Behavioral Intelligence to the Block Storage Layer Andy Mills / President/CEO and Co-Founder of enmotus
7. Implementing Connected Component Labeling as a User Defined Operator for SciDB Amidu O Oloso / Computational Scientist, NASA
8. Using Co-processors to Accelerate Analytics Debabrata Sarkar / Senior Engineering Manager at Oracle
~6:30 PM Adjourn
Wednesday, MAY 25, 2016
9:00 AM 30 Sketching the Future of Data Processing Eric Tschetter
Big Data Management and Analytics as a Cloud Service Moderator: Magdalena Balazinska
The quantity and variety of cloud services for data management and analytics are growing. Azure now offers Data Lake, HDInsight, and ML services. Amazon deployed Redshift to complement its well-established Elastic MapReduce service. Google offers BigQuery, Dataproc, and Datalab. This session will explore new advances in cloud services for big data management and analytics, together with examples of successful applications that leverage them.
9:30 AM 40 Big Data at Microsoft Raghu Ramakrishnan
10:10 AM 40 Data Analytics Services at AWS Mehul Shah, Andrew Caldwell
11:10 AM 40 Google Cloud Dataflow / Apache Beam Jelena Pjesivac-Grbovic
11:50 AM 40 Ability and Audacity to scale your science: Building global communities with shared computational infrastructure Nirav Merchant
Security in Big Data Systems Moderator: Gary Golomb
1:30 PM 30 Data-driven cybersecurity state-of-the-art and future directions Glenn Chisholm
2:00 PM 30 Security – Insights at Scale  Raffael Marty
2:30 PM 30 Security and Data Science Luke McConoughey
(Silicon Valley Bank)
Lightning Talks Moderator: Jacek Becla
3:00 PM 40
1. Vertica and Spark: Connecting Computation and Data Edward Ma / Software Engineer, Hewlett Packard Enterprise
2. Big Data and Cyber Security Tom Plunkett / Oracle
3. Fast and Scalable Inequality Joins Zuhair Yarub Khayyat / InfoCloud Research Group
4. ForestDB: A Fast Key-Value Storage System for Variable-Length String Keys Chiyoung Seo / Software Architect, Couchbase
5. Bridging Oracle with Hadoop Zbigniew Baranowski / Researcher at CERN
6. SQL in Silicon: SQL Processing on Specialized Hardware Weiwei Gong / Senior Member of Technical Staff at Oracle
7. FileDB: Extending Relational Databases For Scientific Data Sets Gerard Lemson / Research Scientist - IDIES, Johns Hopkins University
8. Virginia Connected Corridors - the velocity of data and advanced automotive research Clark Gaylord / Virginia Tech, Virginia Tech Transportation Institute
4:15 PM 30 Streaming SQL Julian Hyde
4:45 PM Adjourn
Thursday, MAY 26, 2016
Late Bindings Moderator: Tim Frazier
The last 15 years have seen the quiet rise of technologies that enable the structure of a database to be specified at query time rather than at ingest time. This new paradigm, “Schema on Need”, is supported by products such as Splunk, Couchbase, MongoDB, Vertica’s Flex Table, and the native JSON support in RDBMSs. These technologies are being paired with powerful data visualization tools to make it possible to mine data sources typically not supported by relational databases. This session will explore the motivation for these technologies and some of the largest use cases where they have been deployed.
09:00 AM 30 Design Goals and Architectural Tradeoffs We Made with MongoDB Kelly Stirman
(MongoDB, Inc.)
09:30 AM 30 Late Binding and optimization Vinayak Borkar
(X15 Software, Inc.)
10:00 AM 30 Title Coming Soon Chris Pride
Long-term Storage Moderator: K-T Lim
Storing data for the long term takes significant effort and careful thought in the design. Modern science projects can take decades to run to completion, and millions of dollars are spent to produce the data, but what happens to the data after that? Industry and technology companies look more and more like long-term science now: many Internet companies are more than a decade old and still expect to keep and have access to their oldest data. How can these enormous quantities of data be retained and accessed cheaply and efficiently; or conversely, how can it be decided what data to throw away?
10:50 AM 35 BaBar Data Preservation and Access Concetta Cartaro
11:25 AM 35 Long-term Data Archiving with Amazon Glacier Henry Zhang
Closing Remarks
12:00 PM 15 Closing Remarks
Next conference planning, final conclusions and closeout
Jacek Becla
1:15 PM Adjourn

We plan to hold a small meeting right after lunch to brainstorm about future XLDB events, with a particular focus on topics to cover at the next event. If you are interested in getting involved and helping organize the next XLDB, please attend this meeting.

