XLDB-2016 Conference Program

Tuesday, MAY 24, 2016

08:00 AM

Continental Breakfast (registration starts 7:30 AM, SLAC SUSB near Panofsky Auditorium)

9:00 AM

Welcome, Conference Introductions, Logistics

Jacek Becla
(SLAC XLDB2016 chair)

Real-time Analytics

Moderator: Stephen Brobst

The world of Big Data Analytics (BDA) is quickly evolving from a paradigm of batch file processing to one of real-time stream processing. New tools, algorithms, and platforms are required to exploit BDA opportunities when processing XLDB-sized data in real time. This workshop will explore emerging design patterns and best-practice experiences from both scientific and commercial implementations.

9:20 AM

Introduction

Stephen Brobst

9:30 AM

Real-time Data at LinkedIn

LinkedIn has a rich ecosystem of data-driven products, such as People You May Know, Who Viewed My Profile, and a multitude of recommendation products, as well as business-facing insights products. Building a data product end-to-end requires ...

Shirshanka Das
Kapil Surlaker
(LinkedIn)

10:10 AM

Coffee Break

10:30 AM

Real-time data analytics at the National Energy Research Scientific Computing Center

Berkeley Lab and NERSC are at the frontier of scientific research. Historically, NERSC has provided leadership computing for the computational science community, but we now...

Prabhat
(NERSC)

11:10 AM

Scaling Blockchain Infrastructure: Multi-party Synchronization and Advanced Analytics

Blockchain technology has driven a new wave of profound technical innovations, yet remains largely misunderstood, both in its fundamentals and in its enterprise applications. This nascent infrastructure technology opens the door for ...

Greg Schvey
(Axoni)

11:50 AM

Real-time Analytics Discussion Panel

12:30 PM

Lunch

Decoupling Compute from Storage

Moderator: Per Brashers

What are the effects of the different architectural choices on networking, costs, data integrity, etc.? How is cascading failure contained in each, and what is the 'blast zone' of a failure? How is the longevity of data preserved, and what methods are available to meet recovery point and recovery time objectives (RPO/RTO) after a failure? In this session we plan to explore these questions and more through a series of talks covering vision, actionable design, and end-user experience. There will also be a group Q&A for a broader understanding of the topic.

1:30 PM

Trends Affecting Disaggregation

In this talk we will examine the forces, both pro and con, that are enabling a new paradigm of compute and storage in warehouse-style compute centers. We will briefly discuss different viewpoints all along the data lifecycle, as well as ...

Per Brashers
(Yttibrium)

1:55 PM

Journey to the Enterprise Cloud

When storage became faster, it challenged the decades-old three-tier architecture ...

Binny Gill
(Nutanix)

2:20 PM

Intel Rack Scale Architecture

This talk provides an overview of Intel Rack Scale Architecture and discusses how this architecture addresses underutilized and stranded resources in a data center ...

Mohan Kumar
(Intel)

2:45 PM

Snowflake

Snowflake is a multi-tenant, transactional, secure, highly scalable and elastic system with full SQL support and built-in extensions for ...

Marcin Zukowski
(Snowflake)

3:10 PM

Decoupling Compute from Storage Q&A

3:30 PM

Coffee Break

Lightning Talks

Moderator: Jacek Becla

4:10 PM

1.	Best Practices in Data Lake Deployment	Stephen Brobst / CTO Teradata
2.	SQL, Scaling, and What's Unique About PostgreSQL	Ozgun Erdogan / CTO and Co-Founder Citus Data
3.	Big Data Application Development Anti-Patterns	Steve Gonzales / Principal Manager at Think Big Analytics
4.	MyFlashSQL	Sang-Won Lee / Professor at SKKU SICE
5.	Presto	Matthew Fuller / Engineer, Teradata
6.	Adding Analytical Behavioral Intelligence to the Block Storage Layer	Andy Mills / President/CEO and Co-Founder of enmotus
7.	Implementing Connected Component Labeling as a User Defined Operator for SciDB	Amidu O Oloso / Computational Scientist, NASA
8.	Using Co-processors to Accelerate Analytics	Debabrata Sarkar / Senior Engineering Manager at Oracle

Networking reception

5:00 PM

Networking reception

Light appetizers and drinks served. Included in the conference registration.

~6:30 PM

Adjourn

Wednesday, MAY 25, 2016

08:00 AM

Continental Breakfast (registration starts 7:30 AM, SLAC SUSB near Panofsky Auditorium)

9:00 AM

Sketching the Future of Data Processing

The cost of processing data generally scales with the amount of data to be processed. One of the staple techniques for reducing the size of a data set is summarization, where you choose to remove some dimensions and aggregate a priori. Summarization has...

Eric Tschetter
(Yahoo!)

Big Data Management and Analytics as a Cloud Service

Moderator: Magdalena Balazinska

The quantity and variety of cloud services for data management and analytics is growing. Azure now offers Data Lake, HDInsight, and ML services. Amazon deployed Redshift to complement its well-established Elastic MapReduce service. Google offers BigQuery, Dataproc, and Datalab. This session will explore new advances in cloud services for big data management and analytics, together with examples of successful applications that leverage them.

9:30 AM

Big Data at Microsoft

Until recently, data was gathered for well-defined objectives such as auditing, forensics, reporting and line-of-business operations; now, exploratory and predictive analysis is becoming ubiquitous, and the default increasingly is ...

Raghu Ramakrishnan
(Microsoft)

10:10 AM

Data Analytics Services at AWS

With the ubiquity of data sources and cheap storage, today's enterprises want to collect and store a wide variety of data, even before they know what to do with it. ...

Mehul Shah, Andrew Caldwell
(Amazon)

10:50 AM

Coffee Break

11:10 AM

Google Cloud Dataflow / Apache Beam

Unbounded, unordered, global-scale datasets are increasingly common in day-to-day business (e.g., web logs, mobile usage statistics, and sensor networks). At the same time, consumers of these datasets have ...

Jelena Pjesivac-Grbovic
(Google)

11:50 AM

Ability and Audacity to scale your science: Building global communities with shared computational infrastructure

Over the last decade, the discipline of life sciences has benefited tremendously from new, massively parallel, and highly quantitative technologies. These technologies have facilitated ...

Nirav Merchant
(CyVerse)

12:30 PM

Lunch

Security in Big Data Systems

Moderator: Gary Golomb

1:30 PM

Data-driven cybersecurity state-of-the-art and future directions

Breakthroughs in data analytics seem almost commonplace in recent years, and yet the effective application of data analytics to information security seems elusive, as evidenced by ...

Glenn Chisholm
(Cylance)

1:40 PM

Security – Insights at Scale

Ensuring security of a company’s data and infrastructure has largely become a data analytics challenge...

Raffael Marty
(Sophos)

2:00 PM

Security and Data Science

The Security Operations Center (SOC) in a corporation is charged with protecting and defending assets and operations from attackers...

Luke McConoughey
(Silicon Valley Bank)

2:10 PM

Discussion Panel

Lightning Talks

Moderator: Jacek Becla

3:00 PM

1.	Vertica and Spark: Connecting Computation and Data	Edward Ma / Software Engineer, Hewlett Packard Enterprise
2.	Big Data and Cyber Security	Tom Plunkett / Oracle
3.	Fast and Scalable Inequality Joins	Zuhair Yarub Khayyat / InfoCloud Research Group
4.	ForestDB: A Fast Key-Value Storage System for Variable-Length String Keys	Chiyoung Seo / Software Architect, Couchbase
5.	Bridging Oracle with Hadoop	Zbigniew Baranowski / Researcher at CERN
6.	SQL in Silicon: SQL Processing on Specialized Hardware	Weiwei Gong / Senior Member of Technical Staff at Oracle
7.	FileDB: Extending Relational Databases For Scientific Data Sets	Gerard Lemson / Research Scientist - IDIES, Johns Hopkins University
8.	Virginia Connected Corridors - the velocity of data and advanced automotive research	Clark Gaylord / Virginia Tech, Virginia Tech Transportation Institute

3:40 PM

Coffee Break & Ice Cream

4:15 PM

Streaming SQL

Streaming is a paradigm for data processing that is rapidly growing in popularity, because it allows high throughput, low latency responses, and efficiently manages multitudes of IoT devices. Is it an alternative to database processing...

Julian Hyde
(Hortonworks)

4:45 PM

Adjourn

Thursday, MAY 26, 2016

08:00 AM

Continental Breakfast (registration starts 7:30 AM, SLAC SUSB near Panofsky Auditorium)

Late Bindings

Moderator: Tim Frazier

The last 15 years have seen the quiet rise of technologies that allow the structure of a database to be specified at query time rather than at ingest time. This new paradigm, “Schema on Need”, is supported by products such as Splunk, Couchbase, MongoDB, Vertica’s Flex Tables and the native JSON support in RDBMSs. These technologies are being paired with powerful data visualization tools to mine data sources that relational databases typically do not support. This session will explore the motivation for these technologies and some of the largest use cases where they have been deployed.

09:00 AM

Design Goals and Architectural Tradeoffs We Made with MongoDB

Most engineering projects are the end result of a series of compromises...

Kelly Stirman
(MongoDB, Inc.)

09:30 AM

Late Binding and Optimization

Traditional database management systems have mostly been designed and optimized to operate in a "schema-first" setting. Before any data can be loaded ...

Vinayak Borkar
(X15 Software, Inc.)

10:00 AM

Late Binding Schema

Chris Pride
(Splunk)

10:30 AM

Coffee Break

Long-term Storage

Moderator: K-T Lim

Storing data for the long term requires significant effort and careful thought in its design. Modern science projects can take decades to run to completion, and millions of dollars are spent to produce the data, but what happens to it after that? Industry and technology companies increasingly resemble long-term science: many Internet companies are more than a decade old and still expect to keep, and have access to, their oldest data. How can these enormous quantities of data be retained and accessed cheaply and efficiently? Conversely, how can one decide what data to throw away?

10:50 AM

BaBar Data Preservation and Access

Concetta Cartaro
(SLAC)

11:25 AM

Long-term Data Archiving with Amazon Glacier

Preserving PB-scale datasets for the long term is a challenging task that calls for a reliable, scalable, and cost-effective solution...

Henry Zhang
(Amazon)

Closing Remarks

12:00 PM

Closing Remarks

Planning for the next conference, final conclusions, and closeout

Jacek Becla

12:15 PM

Lunch

1:15 PM

Adjourn

We are planning to hold a small meeting right after lunch to brainstorm about future XLDB events, with a particular focus on topics to cover at the next event. If you are interested in getting involved and helping organize the next XLDB, please attend this meeting.