Apache Spark
Disclaimer: This event will be presented in English.
What is Distributed Computing, Why Apache Spark (45′) – Xavier Tordoir
In this talk, Xavier will first introduce the different concepts and mechanisms of Distributed Computing.
This introduction will help us understand at which levels distributed computing will be increasingly required in the coming years, even without Big Data (whatever that means).
However, this comes with challenges: mental shift, programming model, execution model, resource management and so on.
This is why the second part of the talk will focus on Apache Spark, which offers plenty of solutions to many of those challenges.
To do so, Xavier will use the Spark Notebook to cover Apache Spark with examples, which will also demonstrate why interactive programming is a must-have.
What is a Distributed Data Science Pipeline, How with Apache Spark and Friends (45′) – Andy Petrella
So far so good, you have a model! Now what?
In this talk, we’ll cover the different steps needed to run your model in your production environment, on your fast or cold data.
For this, Apache Spark is clearly an enabler, and thanks to its ecosystem there is hope for better consistency and productivity.
Hence, over the course of the talk, Andy will lay out an architecture that matches the needs of your team, customers and infrastructure.