Big Data Madrid: Mesos Session
How to build a secure (TLS and Kerberos) machine learning pipeline with Mesosphere DC/OS, Apache NiFi, Kafka, HDFS and Spark, and how to use JupyterLab to provide a good user experience.
Author: Denis Jannot
Denis is a passionate technologist, working on innovative projects leveraging various technologies (Mesos, Kubernetes, Kafka, Spark, …).
Mesosphere DC/OS overview
Challenges when trying to build a secure ML pipeline
Demo of a Twitter sentiment analysis pipeline with the following high-level design:
Apache NiFi is used to listen to the Twitter Streaming API
Apache NiFi persists the tweets in HDFS and produces them to a Kafka topic
A Jupyter notebook is used to create a model using Spark and the tweets stored in HDFS
Another Jupyter notebook is used to apply the model to the tweets consumed from Kafka
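The last step of the design (applying the model to tweets consumed from Kafka over TLS and Kerberos) could be sketched in a notebook roughly as follows. This is a minimal illustration, not the code shown in the session: the function names, broker address, topic name, truststore path and the `text` input column are all hypothetical, and it assumes the model is a PySpark ML `PipelineModel` and that the Kerberos ticket or JAAS configuration is already available to Spark.

```python
def kafka_source_options(bootstrap_servers, topic, truststore_path,
                         service_name="kafka"):
    """Build options for Spark's Kafka source using TLS + Kerberos (SASL_SSL).

    All argument values are supplied by the caller; nothing here is taken
    from the session's actual cluster configuration.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # SASL_SSL = Kerberos (GSSAPI) authentication over a TLS channel
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "GSSAPI",
        "kafka.sasl.kerberos.service.name": service_name,
        "kafka.ssl.truststore.location": truststore_path,
    }


def score_tweets(spark, model, options):
    """Read tweets from Kafka as a stream and apply a fitted model.

    Assumes `spark` is an active SparkSession and `model` is a fitted
    pyspark.ml PipelineModel whose input column is named `text`.
    """
    stream = spark.readStream.format("kafka").options(**options).load()
    # Kafka delivers the payload as bytes; cast it to a string column
    tweets = stream.selectExpr("CAST(value AS STRING) AS text")
    return model.transform(tweets)


# Hypothetical values, for illustration only
opts = kafka_source_options(
    bootstrap_servers="broker.example.internal:9093",
    topic="tweets",
    truststore_path="/path/to/truststore.jks",
)
```

In a notebook, `score_tweets(spark, model, opts)` would return a streaming DataFrame that can be written to a sink with `writeStream`. Keeping the security options in one helper makes it easier to reuse the same settings across notebooks.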