Spark Summit (June 5, 2017)


Spark Summit (June 5–7, 2017, San Francisco): Agenda Posted

1. Official announcement: http://spark.apache.org/news/spark-summit-june-2017-agenda-posted.html

2. Agenda: https://spark-summit.org/2017/schedule/

3. Registration: https://prevalentdesignevents.com/sparksummit/ss17/?_ga=1.211902866.780052874.1433437196

Notably, two engineers from a Chinese company are among the speakers:

Ron Hu (Huawei Technologies)

Zhenhua Wang (Huawei Technologies)

4. The agenda is as follows:

7:00 AM

Registration

TRAINING ROOM 1

TRAINING ROOM 2

TRAINING ROOM 3

TRAINING ROOM 4

TRAINING ROOM 5

TRAINING ROOM 6

TRAINING ROOM 7

9:00 AM

Training: Data Science With Apache Spark 2.x

(9:00 AM–12:00 PM)

Training: Exploring Wikipedia 2 With Apache Spark 2.x

(9:00 AM–12:00 PM)

Training: Apache Spark Intro for Machine Learning and Data Science

(9:00 AM–12:00 PM)

Training: Apache Spark Intro for Data Engineering

(9:00 AM–12:00 PM)

Training: Just Enough Scala for Spark

(9:00 AM–12:00 PM)

Training: Architecting a Data Platform

(9:00 AM–12:00 PM)

Training: Building Your First Big Data Application on AWS

(9:00 AM–12:00 PM)

12:00 PM

Lunch

TRAINING ROOM 1

TRAINING ROOM 2

TRAINING ROOM 3

TRAINING ROOM 4

TRAINING ROOM 5

TRAINING ROOM 6

TRAINING ROOM 7

1:00 PM

Training: Data Science With Apache Spark 2.x

(1:00 PM–5:00 PM)

Training: Exploring Wikipedia 2 With Apache Spark 2.x

(1:00 PM–5:00 PM)

Training: Apache Spark Intro for Machine Learning and Data Science

(1:00 PM–5:00 PM)

Training: Apache Spark Intro for Data Engineering

(1:00 PM–5:00 PM)

Training: Just Enough Scala for Spark

(1:00 PM–5:00 PM)

Training: Architecting a Data Platform

(1:00 PM–5:00 PM)

Training: Building Your First Big Data Application on AWS

(1:00 PM–5:00 PM)

6:00 PM

Meetup

Join us for an evening Bay Area Apache Spark Meetup at the 10th Spark Summit featuring tech-talks about using Apache Spark at scale from Pepperdata’s CTO Sean Suchter, RISELab’s Dan Crankshaw, and Databricks’ Spark committers…

DAY 2 • TUESDAY, JUNE 6 • DEVELOPER DAY

7:00 AM

Registration

9:05 AM

What to Expect in 2017 for Big Data and Apache Spark

Matei Zaharia (Databricks)
Tim Hunter (Databricks)

9:30 AM

Snorkel: Dark Data and Machine Learning

Christopher Ré (Stanford)

Building applications that can read and analyze a wide variety of data may change the way we do science and make business decisions. However, building such applications is challenging: real world data is expressed in…

9:45 AM

Unleashing Data Intelligence with Intel and Apache Spark

Michael Greene (Intel)

Organizations are developing deep learning applications to derive new insights, identify new opportunities and uncover new efficiencies. However, deep learning application development often means tapping into multiple frameworks, libraries, and clusters: a complex, time-consuming, and costly…

9:55 AM

Rise Lab Fireside Chat

Ben Lorica (O’Reilly Media)
Ion Stoica (UC Berkeley AMP/RISE Lab & Databricks)

Ben Lorica and Ion Stoica discuss the growth and new projects taking place at Rise Lab.

10:15 AM

Keynote by Riot Games

Wes Kerr (Riot Games)

10:30 AM

Break

ROOM 1

ROOM 2

ROOM 3

ROOM 4

ROOM 5

ROOM 6

ROOM 7

ROOM 8

ROOM 9

11:00 AM

DEVELOPER

A Deep Dive into Spark SQL's Catalyst Optimizer

Yin Huai(Databricks)

(11:00 AM–11:30 AM)

MACHINE LEARNING

Challenging Web-Scale Graph Analytics with Apache Spark

Xiangrui Meng(Databricks)

(11:00 AM–11:30 AM)

SPARK ECOSYSTEM

Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp Data Fabric and NetApp Private Storage

Karthikeyan Nagalingam(NetApp)
Nilesh Bagad(NetApp)

(11:00 AM–11:30 AM)

SPARK EXPERIENCE AND USE CASES

Scaling Up: How Switching to Apache Spark Improved Performance, Realizability, and Reduced Cost on a Very Large Scale ML Application

Kexin Xie(Salesforce)
Yacov Salomon(Salesforce)

(11:00 AM–11:30 AM)

ENTERPRISE

Spark Compute as a Service at Paypal

Prabhu Kasinathan(PayPal)

(11:00 AM–11:30 AM)

STREAMING

SSR: Structured Streaming on R for Machine Learning

Felix Cheung(Microsoft)

(11:00 AM–11:30 AM)

RESEARCH

Scaling Genetic Data Analysis with Apache Spark

Jonathan Bloom(Broad Institute of MIT and Harvard)
Timothy Poterba(Broad Institute of MIT and Harvard)

(11:00 AM–11:30 AM)

SPONSORED SESSIONS

TBA

(11:00 AM–11:30 AM)

TECHNICAL DEEP DIVES

Data Science Deep Dive: Spark ML with High Dimensional Labels

Michael Zargham(Cadent)
Stefan Panayotov(Cadent)

(11:00 AM–11:30 AM)

11:40 AM

DEVELOPER

TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters

Andy Feng (Yahoo)
Lee Yang (Yahoo)

(11:40 AM–12:10 PM)

MACHINE LEARNING

Needle in the Haystack—User Behavior Anomaly Detection for Information Security

Ping Yan(Salesforce.com)
Wei Deng(Salesforce)

(11:40 AM–12:10 PM)

SPARK ECOSYSTEM

Apache Kylin: Speed Up Cubing with Apache Spark

Luke Han(Kyligence, Inc.)
Shaofeng Shi (Kyligence, Inc.)

(11:40 AM–12:10 PM)

SPARK EXPERIENCE AND USE CASES

Incremental Processing on Large Analytical Datasets

Prasanna Rajaperumal (Uber)
Vinoth Chandar(Uber)

(11:40 AM–12:10 PM)

ENTERPRISE

Using SparkML to Power a DSaaS (Data Science as a Service)

Kiran Muglurmath(Comcast)
Sridhar Alla(Comcast)

(11:40 AM–12:10 PM)

STREAMING

Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling

Jim Dowling (KTH Royal Institute of Technology)

(11:40 AM–12:10 PM)

RESEARCH

Lazy Join Optimizations Without Upfront Statistics

Matteo Interlandi(UCLA)

(11:40 AM–12:10 PM)

SPONSORED SESSIONS

TBA

(11:40 AM–12:10 PM)

TECHNICAL DEEP DIVES

Data Science Deep Dive: Spark ML with High Dimensional Labels (continues)

Michael Zargham(Cadent)
Stefan Panayotov(Cadent)

(11:40 AM–12:10 PM)

12:20 PM

DEVELOPER

Hive Bucketing in Apache Spark

Tejas Patil(Facebook)

(12:20 PM–12:50 PM)

MACHINE LEARNING

Random Walks on Large Scale Graphs with Apache Spark

Min Shen(LinkedIn)

(12:20 PM–12:50 PM)

SPARK ECOSYSTEM

Building a Unified Data Pipeline with Apache Spark and XGBoost

Nan Zhu(Microsoft)

(12:20 PM–12:50 PM)

SPARK EXPERIENCE AND USE CASES

How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2.x

Richard Garris(Databricks)

(12:20 PM–12:50 PM)

ENTERPRISE

How Apache Spark and AI Powers UberEats

Chen Jin (Uber)
Xian Xing Zhang(Uber Technologies)

(12:20 PM–12:50 PM)

STREAMING

The Top Five Mistakes Made When Writing Streaming Applications

Mark Grover(Cloudera)
Ted Malaska(Blizzard, Inc.)

(12:20 PM–12:50 PM)

RESEARCH

Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash

Patrick Stuedi(IBM)

(12:20 PM–12:50 PM)

SPONSORED SESSIONS

TBA

(12:20 PM–12:50 PM)

TECHNICAL DEEP DIVES

Ray: A Cluster Computing Engine for Reinforcement Learning Applications

Philipp Moritz ()
Robert Nishihara ()

(12:20 PM–12:50 PM)

12:50 PM

Lunch

ROOM 1

ROOM 2

ROOM 3

ROOM 4

ROOM 5

ROOM 6

ROOM 7

ROOM 8

ROOM 9

2:00 PM

DEVELOPER

Apache Spark MLlib's Past Trajectory and New Directions

Joseph Bradley(Databricks)

(2:00 PM–2:30 PM)

MACHINE LEARNING

Extending Spark Machine Learning: Adding Your Own Algorithms and Tools

Holden Karau (IBM)
Seth Hendrickson(IBM)

(2:00 PM–2:30 PM)

SPARK ECOSYSTEM

Building Data Product Based on Apache Spark at Airbnb

Jingwei Lu (Airbnb)
Liyin Tang (Airbnb)

(2:00 PM–2:30 PM)

SPARK EXPERIENCE AND USE CASES

Building a Versatile Analytics Pipeline on Top of Apache Spark

Mikhail Chernetsov(Grammarly)

(2:00 PM–2:30 PM)

ENTERPRISE

Herding Cats: Migrating Dozens of Oddball Analytics Systems to Apache Spark

John Cavanaugh(HP)

(2:00 PM–2:30 PM)

STREAMING

Real-Time Machine Learning Analytics Using Structured Streaming and Kinesis Firehose

Caryl Yuhas(Databricks)
Myles Baker(Databricks)

(2:00 PM–2:30 PM)

RESEARCH

Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on Spark and MPI Using Three Case Studies

Michael Mahoney(UC Berkeley)

(2:00 PM–2:30 PM)

SPONSORED SESSIONS

TBA

(2:00 PM–2:30 PM)

TECHNICAL DEEP DIVES

Cost-Based Optimizer in Apache Spark 2.2

Ron Hu (Huawei Technologies)
Sameer Agarwal(Databricks)

(2:00 PM–2:30 PM)

2:40 PM

DEVELOPER

Informational Referential Integrity Constraints Support in Apache Spark

Ioana Delaney(IBM)
Suresh Thalamati(IBM)

(2:40 PM–3:10 PM)

MACHINE LEARNING

Fuzzy Matching on Apache Spark

Jennifer Shin (8 Path Solutions)

(2:40 PM–3:10 PM)

SPARK ECOSYSTEM

Extending the R API for Spark with sparklyr and Microsoft R Server

Ali Zaidi (Microsoft)

(2:40 PM–3:10 PM)

SPARK EXPERIENCE AND USE CASES

Best Practices for Using Alluxio with Apache Spark

Cheng Chang(Alluxio)
Haoyuan Li (Alluxio)

(2:40 PM–3:10 PM)

ENTERPRISE

Scaling Data Science Capabilities with Apache Spark at Stitch Fix

Derek Bennett(Stitch Fix)

(2:40 PM–3:10 PM)

STREAMING

A Practical Approach to Building a Streaming Processing Pipeline for an Online Advertising Platform

Amit Ramesh(Yelp)
Yifan Wang (Yelp)

(2:40 PM–3:10 PM)

RESEARCH

Apache Spark on Supercomputers: A Tale of the Storage Hierarchy

Costin Iancu(Lawrence Berkeley National Laboratory)
Nicholas Chaimov(University of Oregon)

(2:40 PM–3:10 PM)

SPONSORED SESSIONS

TBA

2:40 PM (2:40 PM–2:55 PM)

SPONSORED SESSIONS

TBA

2:55 PM (2:55 PM–3:10 PM)

TECHNICAL DEEP DIVES

Cost-Based Optimizer in Apache Spark 2.2 (continues)

Wenchen Fan(Databricks)
Zhenhua Wang(Huawei Technologies)

(2:40 PM–3:10 PM)

3:20 PM

DEVELOPER

Tricks of the Trade to be an Apache Spark Rock Star

Ted Malaska(Blizzard, Inc.)

(3:20 PM–3:50 PM)

MACHINE LEARNING

Assigning Responsibility for Deteriorations in Video Quality

Henry Milner(Conviva)
Oleg Vasilyev(Conviva)

(3:20 PM–3:50 PM)

SPARK ECOSYSTEM

Apache Spark on Kubernetes

Anirudh Ramanathan(Google)
Tim Chen(Hyperpilot)

(3:20 PM–3:50 PM)

SPARK EXPERIENCE AND USE CASES

Experiences Migrating Hive Workload to SparkSQL

Jie Xiong(Facebook)
Zhan Zhang(Facebook)

(3:20 PM–3:50 PM)

ENTERPRISE

Transforming B2B Sales with Spark-Powered Sales Intelligence

Songtao Guo(LinkedIn)
Wei Di (LinkedIn)

(3:20 PM–3:50 PM)

STREAMING

An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining with Spark Streaming

J White Bear (IBM)

(3:20 PM–3:50 PM)

RESEARCH

Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire!

Tiark Rompf(Purdue University)

(3:20 PM–3:50 PM)

SPONSORED SESSIONS

TBA

3:20 PM (3:20 PM–3:35 PM)

SPONSORED SESSIONS

TBA

3:35 PM (3:35 PM–3:50 PM)

TECHNICAL DEEP DIVES

TBA

(3:20 PM–3:50 PM)

3:50 PM

Break

ROOM 1

ROOM 2

ROOM 3

ROOM 4

ROOM 5

ROOM 6

ROOM 7

ROOM 8

ROOM 9

4:20 PM

DEVELOPER

Improving Python and Spark Performance and Interoperability with Apache Arrow

Julien Le Dem(Dremio)
Li Jing (Two Sigma Investments, LP)

(4:20 PM–4:50 PM)

MACHINE LEARNING

Multi-Label Graph Analysis and Computations Using GraphX

Qiang Zhu(LinkedIn)
Qingbo Hu(LinkedIn)

(4:20 PM–4:50 PM)

SPARK ECOSYSTEM

More Algorithms and Tools for Genomic Analysis on Apache Spark

Ryan Williams(Mount Sinai School of Medicine)

(4:20 PM–4:50 PM)

SPARK EXPERIENCE AND USE CASES

Lessons Learned from Managing Thousands of Production Apache Spark Clusters Daily

Henry Davidge(Databricks)
Josh Rosen(Databricks)

(4:20 PM–4:50 PM)

ENTERPRISE

GoDaddy Customer Success Dashboard Using Apache Spark

Baburao Kamble(GoDaddy)

(4:20 PM–4:50 PM)

STREAMING

Dynamic DDL: Adding Structure to Streaming Data on the Fly

David Winters(GoPro)
Hao Zou (GoPro)

(4:20 PM–4:50 PM)

RESEARCH

Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren

Eric Jonas (UC Berkeley)
Shivaram Venkataraman (UC Berkeley)

(4:20 PM–4:50 PM)

SPONSORED SESSIONS

TBA

4:20 PM (4:20 PM–4:35 PM)

SPONSORED SESSIONS

TBA

4:35 PM (4:35 PM–4:50 PM)

TECHNICAL DEEP DIVES

Easy, Scalable, Fault-Tolerant Stream Processing with Structured Streaming in Apache Spark

Michael Armbrust(Databricks)
Tathagata Das(Databricks)

(4:20 PM–4:50 PM)

5:00 PM

DEVELOPER

Building Robust ETL Pipelines with Apache Spark

Xiao Li (Databricks)

(5:00 PM–5:30 PM)

MACHINE LEARNING

Visualization of Enhanced Spark Induced Naive Bayes Classifier

Barry Becker (ESI Group)

(5:00 PM–5:30 PM)

SPARK ECOSYSTEM

Spark HBase Connector: Feature Rich and Efficient Access to HBase Through Spark SQL

Bikas Saha(Hortonworks)
Weiqing Yang(Hortonworks)

(5:00 PM–5:30 PM)

SPARK EXPERIENCE AND USE CASES

From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets

Avi Aminov(Akamai Technologies)

(5:00 PM–5:30 PM)

ENTERPRISE

Applying Machine Learning to Construction

Charis Kaskiris(Autodesk)
Shubham Goel(Autodesk)

(5:00 PM–5:30 PM)

STREAMING

Building Continuous Application with Structured Streaming and Real-Time Data Source

Arijit Tarafdar(Microsoft)
Nan Zhu(Microsoft)

(5:00 PM–5:30 PM)

RESEARCH

Speeding Up Spark with Data Compression on Xeon+FPGA

David Ojika(University of Florida)

(5:00 PM–5:30 PM)

SPONSORED SESSIONS

TBA

5:00 PM (5:00 PM–5:15 PM)

SPONSORED SESSIONS

TBA

5:15 PM (5:15 PM–5:30 PM)

TECHNICAL DEEP DIVES

Easy, Scalable, Fault-Tolerant Stream Processing with Structured Streaming in Apache Spark (continues)

Michael Armbrust(Databricks)
Tathagata Das(Databricks)

(5:00 PM–5:30 PM)

5:40 PM

DEVELOPER

Behavior-Driven Development (BDD) Testing with Apache Spark

Aaron Colcord (FIS Global)
Zachary Nanfelt(FIS)

(5:40 PM–6:10 PM)

MACHINE LEARNING

The Key to Machine Learning is Prepping the Right Data

Jean Georges Perrin (Zaloni)

(5:40 PM–6:10 PM)

SPARK ECOSYSTEM

Building a Large Scale Recommendation Engine with Spark and Redis-ML

Dvir Volk (Redis Labs)
Shay Nativ (Redis Labs)

(5:40 PM–6:10 PM)

SPARK EXPERIENCE AND USE CASES

Apache Spark and Citizen Science: Using eBird Data to Predict Bird Abundance at Scale

Tom Auer (Cornell University)

(5:40 PM–6:10 PM)

ENTERPRISE

Rental Cars and Industrialized Learning to Rank

Sean Downes(Expedia)

(5:40 PM–6:10 PM)

STREAMING

Scalable Monitoring Using Apache Spark and Friends

Utkarsh Bhatnagar(Tinder)

(5:40 PM–6:10 PM)

RESEARCH

Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform

Zhankun Tang(Intel)
Zhongyue Nah(Intel)

(5:40 PM–6:10 PM)

SPONSORED SESSIONS

TBA

(5:40 PM–6:10 PM)

TECHNICAL DEEP DIVES

TBA

(5:40 PM–6:10 PM)

6:10 PM

Attendee Reception

Have fun mingling with other attendees over hors d’oeuvres and cocktails as you tour the Spark Summit Expo Hall.

DAY 3 • WEDNESDAY, JUNE 7 • ENTERPRISE DAY

8:00 AM

Registration

9:00 AM

Databricks Keynote

Ali Ghodsi (Databricks)
Greg Owen (Databricks)
Michael Armbrust (Databricks)

9:40 AM

Keynote-TBA

9:55 AM

Keynote by Hotels.com

Matt Fryer (Hotels.com)

10:10 AM

Cutting Edge Predictive Analytics

Eric Siegel (Predictive Analytics World)

Apache Spark empowers predictive analytics and machine learning by increasing the reach and potential. But, before jumping to new deployments, it’s critical we 1) get the analytics right and 2) not overlook less conspicuous business…

10:30 AM

Break

ROOM 1

ROOM 2

ROOM 3

ROOM 4

ROOM 5

ROOM 6

ROOM 7

ROOM 8

ROOM 9

11:00 AM

DEVELOPER

Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop

Carl Steinbach(LinkedIn)
Simon King(Pepperdata)

(11:00 AM–11:30 AM)

MACHINE LEARNING

Embracing a Taxonomy of Types to Simplify Machine Learning

Leah McGuire(Salesforce.com)

(11:00 AM–11:30 AM)

SPARK ECOSYSTEM

HDFS on Kubernetes—Lessons Learned

Kimoon Kim(Pepperdata)

(11:00 AM–11:30 AM)

SPARK EXPERIENCE AND USE CASES

Spinach: Providing Ad-Hoc Query Support on Top of Spark SQL

Daoyuan Wang(Intel)
Yuanjian Li (Baidu)

(11:00 AM–11:30 AM)

ENTERPRISE

Archiving, E-Discovery, and Supervision with Spark and Hadoop

Jordan Volz(Cloudera)

(11:00 AM–11:30 AM)

DATA SCIENCE

Yelp Ad Targeting at Scale with Apache Spark

Inaz Alaei-Novin(Yelp)
Joe Malicki (Yelp)

(11:00 AM–11:30 AM)

RESEARCH

Debugging Big Data Analytics in Apache Spark with BigDebug

Matteo Interlandi(UCLA)
Muhammad Ali Gulzar (UCLA)

(11:00 AM–11:30 AM)

SPONSORED SESSIONS

TBA

(11:00 AM–11:30 AM)

TECHNICAL DEEP DIVES

Deep Dive Into Apache Spark Multi-User Performance

Mikhail Genkin(IBM)
Peter Lankford(STAC)

(11:00 AM–11:30 AM)

11:40 AM

DEVELOPER

Productive Use of the Apache Spark Prompt

Sam Penrose(Mozilla)

(11:40 AM–12:10 PM)

MACHINE LEARNING

Identify Disease-Associated Genetic Variants Via 3D Genomics Structure and Regulatory Landscapes Using Deep Learning Frameworks

Yi-Hsiang Hsu ()
Yongsheng Huang(Databricks)

(11:40 AM–12:10 PM)

SPARK ECOSYSTEM

Homologous Apache Spark Clusters Using Nomad

Alex Dadgar(Hashicorp)

(11:40 AM–12:10 PM)

SPARK EXPERIENCE AND USE CASES

Social Media, Spark, Machine Learning, and Data Visualization to Find Patterns and Insight

Erik Schlegel(Microsoft)

(11:40 AM–12:10 PM)

ENTERPRISE

Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark

Bernhard Schlegel(BMW)

(11:40 AM–12:10 PM)

DATA SCIENCE

Data Wrangling with PySpark for Data Scientists Who Know Pandas

Andrew Ray(Silicon Valley Data Science)

(11:40 AM–12:10 PM)

RESEARCH

Building Genomic Data Processing and Machine Learning Workflows Using Apache Spark

Anupama Joshi(Epinomics)
Matt Negulescu(Epinomics)

(11:40 AM–12:10 PM)

SPONSORED SESSIONS

TBA

(11:40 AM–12:10 PM)

TECHNICAL DEEP DIVES

Deep Dive Into Apache Spark Multi-User Performance (continues)

(11:40 AM–12:10 PM)

12:20 PM

DEVELOPER

Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust

David Taieb (IBM)

(12:20 PM–12:50 PM)

MACHINE LEARNING

Large-Scale Ads CTR Prediction with Spark and Deep Learning: Lessons Learned

Yanbo Liang(Hortonworks)

(12:20 PM–12:50 PM)

SPARK ECOSYSTEM

Interoperating a Zoo of Data Processing Platforms Using Rheem

Sebastian Kruse(PhD Student)
Yasser Idris (Qatar Computing Research Institute)

(12:20 PM–12:50 PM)

SPARK EXPERIENCE AND USE CASES

Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for Forensics, Fraud, and Finance

Bryan Cheng(BlockCypher)
Karen Hsu(BlockCypher)

(12:20 PM–12:50 PM)

ENTERPRISE

Big Data at Audi: Root Cause Analysis in an Automotive Paint Shop Using MLlib

Christian Raimann(Audi Business Innovation GmbH)
Christoph Kreibich(Audi)

(12:20 PM–12:50 PM)

DATA SCIENCE

Smart Scalable Feature Reduction With Random Forests

Erik Erlandson (Red Hat)

(12:20 PM–12:50 PM)

RESEARCH

Neuro-Symbolic AI for Sentiment Analysis

Michael Malak(Oracle)

(12:20 PM–12:50 PM)

SPONSORED SESSIONS

Women in Big Data Lunch

(12:20 PM–12:50 PM)

TECHNICAL DEEP DIVES

From Pipelines to Refineries: Building Complex Data Applications with Apache Spark

Tim Hunter(Databricks)

(12:20 PM–12:50 PM)

12:50 PM

Lunch

ROOM 1

ROOM 2

ROOM 3

ROOM 4

ROOM 5

ROOM 6

ROOM 7

ROOM 8

ROOM 9

2:00 PM

DEVELOPER

Improving Apache Spark with S3

Ryan Blue (Netflix)

(2:00 PM–2:30 PM)

MACHINE LEARNING

Building Competing Models Using Apache Spark DataFrames

Abdulla Al-Qawasmeh (Credit Karma)

(2:00 PM–2:30 PM)

SPARK ECOSYSTEM

Cassandra and SparkSQL: You Don't Need Functional Programming for Fun

Russell Spitzer(DataStax)

(2:00 PM–2:30 PM)

SPARK EXPERIENCE AND USE CASES

Tuning Apache Spark for Large-Scale Workloads

Gaoxiang Liu(Facebook)
Sital Kedia(Facebook)

(2:00 PM–2:30 PM)

ENTERPRISE

From Data to Actions and Insights at Conviva

Rui Zhang(Conviva)
Yan Li (Conviva)

(2:00 PM–2:30 PM)

DATA SCIENCE

Fully-Reproducible ML Deployment with Spark, Pachyderm, and MLeap

Daniel Whitenack(Pachyderm)
Hollin Wilkins(Combust, Inc.)

(2:00 PM–2:30 PM)

DATA SCIENCE

Natural Language Processing with CNTK and Apache Spark

Ali Zaidi (Microsoft)

(2:00 PM–2:30 PM)

SPONSORED SESSIONS

TBA

(2:00 PM–2:30 PM)

TECHNICAL DEEP DIVES

Sparklyr: Recap, Updates, and Use Cases

Javier Luraschi(RStudio)

(2:00 PM–2:30 PM)

2:40 PM

DEVELOPER

Demystifying DataFrame and Dataset

Dr. Kazuaki Ishizaki(IBM)

(2:40 PM–3:10 PM)

MACHINE LEARNING

Real-Time Image Recognition with Apache Spark

Nikita Shamgunov(MemSQL)

(2:40 PM–3:10 PM)

SPARK ECOSYSTEM

Applying SparkSQL to Big Spatio-Temporal Data Using GeoMesa

Anthony Fox (CCRi)

(2:40 PM–3:10 PM)

SPARK EXPERIENCE AND USE CASES

Performance Optimization of Recommendation Training Pipeline at Netflix

Hua Jiang (Netflix)

(2:40 PM–3:10 PM)

ENTERPRISE

Changing the Way Viacom Looks at Video Performance

Mark Cohen(Viacom)
Michael Rosencrantz(Viacom, Inc.)

(2:40 PM–3:10 PM)

DATA SCIENCE

Large-Scaled Insurance Analytics Using Tweedie Models in Apache Spark

Yanwei Zhang(Uber)

(2:40 PM–3:10 PM)

DATA SCIENCE

ADMM-Based Scalable Machine Learning on Apache Spark

Mohak Shah(Robert Bosch LLC)
Sauptik Dhar(Robert Bosch LLC)

(2:40 PM–3:10 PM)

SPONSORED SESSIONS

TBA

(2:40 PM–3:10 PM)

TECHNICAL DEEP DIVES

Sparklyr: Recap, Updates, and Use Cases (continues)

(2:40 PM–3:10 PM)

3:20 PM

DEVELOPER

Apache Spark and Apache Ignite: Where Fast Data Meets the IoT

Denis Magda(GridGain)

(3:20 PM–3:50 PM)

MACHINE LEARNING

No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark

Masato Asahara(NEC)
Ryohei Fujimaki(NEC)

(3:20 PM–3:50 PM)

SPARK ECOSYSTEM

Just-in-Time Analytics and the Need for Autonomous Database Administration

Kristian Alexander(Algebraix Data Corp.)
Wes Holler(Algebraix)

(3:20 PM–3:50 PM)

SPARK EXPERIENCE AND USE CASES

Machine Learning as a Service: Apache Spark MLlib Enrichment and Web-Based Codeless Modeling

Zhengyi Le (Suning R&D)

(3:20 PM–3:50 PM)

ENTERPRISE

Leveraging Apache Spark to Disrupt Airline Pricing Distribution

Anton Diego(EveryMundo)
Daniel Santana(EveryMundo)

(3:20 PM–3:50 PM)

DATA SCIENCE

Write Graph Algorithms Like a Boss

Andrew Ray(Silicon Valley Data Science)

(3:20 PM–3:50 PM)

DATA SCIENCE

A Predictive Analytics Workflow on DICOM Images using Apache Spark

Anahita Bhiwandiwalla(Intel)
Karthik Vadla (Intel)

(3:20 PM–3:50 PM)

SPONSORED SESSIONS

TBA

(3:20 PM–3:50 PM)

TECHNICAL DEEP DIVES

TBA

(3:20 PM–3:50 PM)

3:50 PM

Break

ROOM 1

ROOM 2

ROOM 3

ROOM 4

ROOM 5

ROOM 6

ROOM 7

ROOM 8

ROOM 9

4:20 PM

DEVELOPER

A Developer’s View into Spark's Memory Model

Wenchen Fan(Databricks)

(4:20 PM–4:50 PM)

MACHINE LEARNING

Deep Learning in Security—Are We Ready?

Dr. Jisheng Wang(Niara)

(4:20 PM–4:50 PM)

SPARK ECOSYSTEM

Getting Ready to Use Redis with Apache Spark

Tague Griffith(Redis Labs)

(4:20 PM–4:50 PM)

SPARK EXPERIENCE AND USE CASES

Why You Should Care about Data Layout in the Filesystem

Cheng Lian(Databricks)
Vida Ha(Databricks)

(4:20 PM–4:50 PM)

ENTERPRISE

Leveraging Spark in Ecommerce Platform to Democratize Data

Shafaq Abdullah(Honest Company)

(4:20 PM–4:50 PM)

DATA SCIENCE

Using AI for Providing Insights and Recommendations on Activity Data

Alexis Roos(Salesforce)
Sammy Nammari(Salesforce)

(4:20 PM–4:50 PM)

DATA SCIENCE

Apache SparkR Under the Hood: How to Debug your SparkR Applications

Hossein Falaki(Databricks)

(4:20 PM–4:50 PM)

SPONSORED SESSIONS

TBA

(4:20 PM–4:50 PM)

TECHNICAL DEEP DIVES

Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more

Cihan Biyikoglu(Redis Labs)

(4:20 PM–4:50 PM)

5:00 PM

DEVELOPER

Continuous Application with FAIR Scheduler

Robert Xue(Groupon)

(5:00 PM–5:30 PM)

MACHINE LEARNING

Deep Learning to Big Data Analytics on Apache Spark Using BigDL

Xianyan Jia (Intel)
Yuhao Yang (Intel)

(5:00 PM–5:30 PM)

SPARK ECOSYSTEM

From R Script to Production Using rsparkling

Navdeep Gill(H2O.ai)

(5:00 PM–5:30 PM)

SPARK EXPERIENCE AND USE CASES

RubiOne: Apache Spark as the Backbone of a Retail Analytics Development Environment

Adrian Petrescu(Rubikloud)

(5:00 PM–5:30 PM)

ENTERPRISE

Stream All Things—Patterns of Modern Data Integration

Gwen Shapira(Confluent)

(5:00 PM–5:30 PM)

DATA SCIENCE

NLP with MLlib: Global Empire-Building for Fun and Profit

Michelle Casbon(Qordoba)

(5:00 PM–5:30 PM)

DATA SCIENCE

Building Smart IoT Applications Using Spark

Rafael Schultze-Kraft (WATTx)

(5:00 PM–5:30 PM)

SPONSORED SESSIONS

TBA

(5:00 PM–5:30 PM)

TECHNICAL DEEP DIVES

Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more (continues)

(5:00 PM–5:30 PM)

5:40 PM

DEVELOPER

SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitoring

Yiannis Gkoufas(IBM)

(5:40 PM–6:10 PM)

MACHINE LEARNING

Deep Learning with Apache Spark and GPUs

Pierce Spitler(Bitfusion)
Tim Gasper(Bitfusion)

(5:40 PM–6:10 PM)

SPARK ECOSYSTEM

Distributed End-to-End Drug Similarity Analytics and Visualization Workflow

Anahita Bhiwandiwalla(Intel)
Dina Suehiro (Intel Corporation)

(5:40 PM–6:10 PM)

SPARK EXPERIENCE AND USE CASES

The Smart Data Warehouse: Goal-Based Data Production

Sim Simeonov(Swoop)

(5:40 PM–6:10 PM)

ENTERPRISE

TBA

(5:40 PM–6:10 PM)

DATA SCIENCE

Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While Looking for Signs of Extra-Terrestrial Life

Gil Vernik (IBM Corporation)
Graham Mackintosh (IBM)

(5:40 PM–6:10 PM)

DATA SCIENCE

Semantic Search: Fast Results from Large, Non-Native Language Corpora

Rob Lantz (Novetta)

(5:40 PM–6:10 PM)

SPONSORED SESSIONS

TBA

(5:40 PM–6:10 PM)

TECHNICAL DEEP DIVES

TBA

(5:40 PM–6:10 PM)

8:00 PM

JOIN Party

Come close out the 10th edition of Spark Summit at the JOIN attendee party. This rockin’ celebration includes drinks, games, DJs, dancing and a few fun surprises. In the coming weeks, we will announce even…

2 0