BCS SPA2015

Software Practice Advancement Conference

SPA Conference session: Let's Meet Apache Spark

One-line description: Memory Processors... coz Hadoop's too slow
 
Session format: Tutorial (150 minutes)
 
Abstract: Apache Spark is one of the most interesting data processing technologies of recent years. It can consume data from numerous sources and do a great many things with it. At its core, it is a distributed in-memory processing engine: you have a cluster of machines, and you put their memory and cores to work on jobs. That sounds similar to Hadoop, except that the constant reading and writing to HDFS between steps is gone. Data is pipelined through memory where possible, which speeds up jobs considerably. Spark can work with data in files, in HDFS, in Cassandra, in SQL Server, from web services, from message queues...

Spark also comes with several sub-projects that build on the core engine. Spark SQL allows Hive queries, MLlib is a good machine learning library, GraphX allows fast, distributed graph processing, and Spark Streaming provides fault-tolerant micro-batch streaming. All of this comes together in one tool.

In this session, we will look at Spark and see what makes it tick. We will look at the various components and go through some hands-on exercises. While Spark supports Scala, Java, and Python, we will be using Scala. For the exercises, participants will need tooling set up so that they can build and run sbt projects. A Spark server won't be necessary.
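
To give a feel for what an exercise might look like, here is a minimal sketch of a Spark job in Scala that runs without a Spark server. It is not taken from the session materials. The object name, app name, and sample sentences are made up for illustration, and it assumes the sbt build declares the spark-core library (for example `libraryDependencies += "org.apache.spark" %% "spark-core" % "<version>"`, with a version matching your Scala version).

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; not part of the session's exercise files.
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark inside this JVM using all available cores,
    // so no cluster or standalone Spark server is needed.
    val conf = new SparkConf()
      .setAppName("word-count-sketch")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    try {
      // Made-up sample data, distributed as an RDD.
      val lines = sc.parallelize(Seq(
        "spark is fast",
        "hadoop is slow",
        "spark is fun"
      ))

      val counts = lines
        .flatMap(_.split("\\s+"))   // split each line into words
        .map(word => (word, 1))     // pair each word with a count of 1
        .reduceByKey(_ + _)         // sum the counts per word
        .collect()                  // bring the results back to the driver
        .sortBy { case (word, n) => (-n, word) } // most frequent first

      counts.foreach { case (word, n) => println(s"$word: $n") }
    } finally {
      sc.stop() // release local resources
    }
  }
}
```

With the dependency in place, `sbt run` should be enough to execute it. On a real cluster, the master URL would typically be supplied from outside the code rather than hard-coded.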
 
Audience background: Interest in distributed processing. Some pain with Hadoop may act as a motivator.
 
Benefits of participating: Learning about Spark.
 
Materials provided: Slides, exercise files, code repository with commits for steps.
 
Process: Discussions and exercises.
 
Detailed timetable:
00 - 20: Spark: what and why
30 - 50: Exercise: creating Spark jobs
60 - 80: Spark Streaming
90 - 110: MLlib
120 - 135: GraphX + Spark SQL
140 - 150: Q&A and wrap-up
 
Outputs: Slides, exercise files, scripts to spin up servers (not used in tutorial).
 
History: Delivered to a few clients.
 
Presenters
1. Ashic Mahtab
Heartysoft Solutions Limited