Apache Spark

23 Nov 2015

17:45-20:00

Campus de Kirchberg 6, rue Richard Coudenhove-Kalergi L-1359 Luxembourg

Xavier Tordoir

Andy Petrella

Disclaimer: This event will be presented in English.

What is Distributed Computing, Why Apache Spark (45′) – Xavier Tordoir

In this talk, Xavier will first introduce the different concepts and mechanisms of Distributed Computing.
This introduction will help us understand where distributed computing will be increasingly required in the coming years, even without Big Data (whatever that means).

However, this comes with challenges: a mental shift, the programming model, the execution model, resource management, and so on.

This is why the second part of the talk will focus on Apache Spark, which brings plenty of solutions to many of those challenges.
To do so, the Spark Notebook will be used to cover Apache Spark with supporting examples; it will also demonstrate why interactive programming is a must-have.

What is a Distributed Data Science Pipeline, How with Apache Spark and Friends (45′) – Andy Petrella

So far so good, you have a model! Now what?

In this talk, we’ll cover the different steps needed to run your model in your production environment on your fast or cold data.
For this, Apache Spark is clearly an enabler, and thanks to its ecosystem there is hope for better consistency and productivity.

Hence, throughout the talk, Andy will elaborate an architecture that matches the needs of your team, customers, and infrastructure.