XLDB-2015 Conference Program

Tuesday, May 19, 2015
9:00 AM 20 Welcome, Conference Introductions, Logistics Jacek Becla
(SLAC XLDB2015 chair)
Statistical Tools and Machine Learning Moderator: Martin Kersten
Statistical tools, databases, programming languages and alternatives for advanced analytics and machine learning at scale. How are they used in real-world applications, what are their limitations, what is missing, and how can the XLDB community help?
9:20 AM 10 Introduction Martin Kersten
9:30 AM 30 On the Practice of Predictive Modeling with Big Data: The Extra Steps that Make the Difference Nachum Shacham
10:00 AM 30 ROOT: a Data Storage and Analysis Framework Rene Brun
10:30 AM 40 Sentient Enterprise Oliver Ratzesberger
11:30 AM 7 ElasticR: Connecting the Dots of Scientific Computing, from the pi to the Clouds Karim Chine
(Cloud Era Ltd)
11:37 AM 7 Lessons from your Parent's Big Data War Paul G. Brown
11:44 AM 7 From Walled Kingdom to Toolbox Hannes Muehleisen
11:51 AM 24 Discussion panel
12:15 PM 15 R in the World: Interfaces between Languages John Chambers
Special Keynote Moderator: Jacek Becla
1:30 PM 60
Stephen Wolfram
(Wolfram Research)
Lightning Talks Moderator: Jacek Becla
2:30 PM 50
1. Effective Model Calibration for Terascale Analytics Florin Rusu / University of California at Merced
2. Lessons Learned from the Petabyte Scale Biomedical Data Commons and Clouds Robert Grossman / University of Chicago
3. Data Partitioning in MapReduce Josh Walters / Yahoo
4. MySQL + RocksDB for better storage efficiency than InnoDB Siying Dong / Facebook
5. R as a Query Language Hannes Muehleisen / CWI
6. Data Complex @ Yahoo: Speed, Completeness and Accuracy Sundeep Narravula / Yahoo
7. Extending Vertica with External Analytics Malu Castellanos / HP
8. Reducing optimization time for complex analytical queries in an in-memory distributed data processing system Rajkumar Sen / MemSQL
9. SPEC RG Big Data Working Group: An Introduction John Poelman / IBM
Urban and Civic Science Moderator: Bill Howe
Previous XLDB events explored challenges, needs and lessons learned from a variety of domains in science and industry. This year special emphasis is given to the emerging field of data-intensive urban science and urban informatics -- "smart cities." Researchers, practitioners, and policy makers from a variety of fields are working to help cities become more efficient, more productive, more equitable, and more livable by bringing together new, massive data sources, complex analytics, and new technologies. In addition, there is an open data revolution underway aimed at improving transparency, efficiency, and accountability. The session will walk us through the relevant challenges, needs, approaches, and lessons learned that the urban science community is facing, and how the XLDB community can engage.
4:00 PM 25 DataSF: Open Data Initiatives in the City of San Francisco Joy Bonaguro
(City and County of San Francisco)
4:25 PM 25 Enabling Low Friction Sharing, Discovery and Analysis of Heterogeneous Civic Data Deep Dhillon
4:50 PM 25 Visual Exploration of Big Urban Data Huy Vo
(Center for Urban Science+Progress, New York)
5:15 PM 25 Big Data Analytics in the Utilities Industry (a background article) Timotej Gavrilovic, Colin Kerrigan
5:40 PM 20 Discussion Panel
~7:30 PM Adjourn

Wednesday, May 20, 2015
9:00 AM 5 Announcements XLDB Organizers
9:05 AM 30 Critical Technologies Necessary for Big Data Exploitation Stephen Brobst
9:35 AM 30 Creating an Effective Data Platform Kurt Brown
The New Big Data Ecosystem Moderator: Chris Kemp
Over the past few years, we have seen the introduction of a number of new infrastructure technologies from some of the largest infrastructure providers. From new distributed block and object storage technologies to container technologies and orchestration platforms like Kubernetes and Cloud Foundry, these technologies are reshaping the way large distributed systems are being built. This session will explore the new ecosystem of big data infrastructure technologies that are being used to build some of the world’s largest applications.
10:05 AM 15 How To Create the Google for Earth Data Rainer Sternfeld
(Planet OS)
10:20 AM 35 Kubernetes and the Path to Cloud Native Eric Brewer
(Google, UC Berkeley)
11:15 AM 40 How Not To Use a Cluster Chris Holcombe
Big Data Current Practice & Next Steps Moderator: Stephen Brobst
A description of current best practices and the next-generation problems that remain unsolved in the big data analytics arena. Research directions and start-up opportunities abound: how should these resources be focused to yield maximum benefit to the XLDB community?
11:55 AM 30 Unifying large-scale batch and stream processing at Google William Vambenepe
1:25 PM 30 Big Data Storage: Should We Pop the (Software) Stack? Mike Carey
(UC Irvine)
1:55 PM 30 There’s no data like more data Theo Vassilakis
2:25 PM 30 Accelerating Deep Learning at Facebook Keith Adams
2:55 PM 15 Q&A
Lightning Talks Moderator: Jacek Becla
3:10 PM 45
1. Analyzing Large Scale Genomic Data Using the Google Cloud Platform Cuiping Pan / VA Palo Alto
2. Apache Calcite: One planner fits all Julian Hyde / Hortonworks
3. The Data Exacell and Bridges: Database-Enabling Technologies for the National Research Community Nick Nystrom / Pittsburgh Supercomputing Center
4. Driving the revolution in Personalized Medicine Somalee Datta / Stanford
5. Custom Tooling for Loading Petabytes of Genomic Data into SciDB Douglas J. Slotta
6. Integrative Multi-scale Analysis in Biomedical Informatics Joel Saltz / Cherith Chair of Biomedical Informatics
7. Applying scalable databases at CERN Kacper Surdy / CERN
8. Sketchy Approximations; Exactly Overrated Eric Tschetter / Yahoo!
4:35 PM 30 SeeDB - Towards Automatic Visualization of Query Results Manasi Vartak
5:05 PM 15 XLDB Healthcare Workshop Debrief Mary Saltz
(Stony Brook Univ)
5:20 PM 5 XLDB-Gov Chaitan Baru
5:25 PM 10 Closing Remarks Jacek Becla
Next conference planning, final conclusions, and closeout
5:35 PM Adjourn
