AI/ML Workshop hosted by Sean Owen of Databricks

Wednesday, October 28, 2020
3:15 PM to 4:15 PM (15:15 to 16:15)


Workshop - Part One

Solving a data science problem is about more than building a model. It involves data cleaning, exploration, modeling and tuning, production deployment, and the workflows that govern each of these steps. In this simple example, we'll look at how health data can be used to predict life expectancy. The workshop starts with data engineering in Apache Spark, then moves on to data exploration and to model tuning and logging with Hyperopt and MLflow. It continues with examples of how the model registry governs model promotion, and of simple deployment to production with MLflow as a job or a dashboard.

Download Notebook 1 - Predicting Life Expectancy (MLflow and Databricks demo)

Workshop - Part Two

We hear about "model bias," but models are really just mirrors of the data they were trained on. Can we use them to detect instances of bias in the data, and not just to make predictions? This talk will examine the results of the 2019 StackOverflow Developer Survey and apply Apache Spark and SHAP (Shapley Additive Explanations) to study whether attributes like gender have outsized effects on developer salaries in certain instances.
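SHAP builds on Shapley values, which split a single prediction into per-feature contributions that add up to the difference between that prediction and a baseline prediction. As a rough illustration of the idea, here is a minimal brute-force sketch, assuming a single baseline point stands in for "feature absent". The `shap` library's explainers use faster algorithms and background datasets instead. The function `shapley_values`, the model `toy_salary`, its feature names, and its coefficients are all hypothetical, invented for illustration, and are not fitted to the survey data.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(model: Callable[[Sequence[int]], int],
                   x: Sequence[int],
                   baseline: Sequence[int]) -> List[Fraction]:
    """Exact Shapley values of model(x) relative to one baseline point.

    Features outside a coalition take their baseline value. This enumerates
    every coalition (O(2^n) model calls per feature), so it only suits a
    handful of features.
    """
    n = len(x)

    def value(coalition: Set[int]) -> int:
        # Features in the coalition take x's value; the rest take baseline's.
        point = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(point)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = Fraction(0)
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n!, kept exact with Fraction.
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_salary(features: Sequence[int]) -> int:
    # Hypothetical toy model (salary in $1000s) with invented coefficients.
    # The `group` term only has an effect when in_us == 1, mimicking an
    # attribute that matters only "in certain instances".
    years, in_us, group = features
    return 40 + 3 * years + 20 * in_us + 5 * in_us * group


names = ["years_coding", "in_us", "group"]
baseline = [0, 0, 0]
for x in ([10, 1, 1], [10, 0, 1]):
    phis = shapley_values(toy_salary, x, baseline)
    print(x, {name: float(p) for name, p in zip(names, phis)})
```

For the first hypothetical developer, the 55-point gap over the baseline splits into 30 for `years_coding`, 22.5 for `in_us`, and 2.5 for `group`. The `group` contribution is half of the interaction term, shared equally with `in_us`. For the second developer, `group` contributes nothing, because its only effect in this toy model depends on `in_us`. Reading attributions per instance like this, rather than through one global coefficient, is what lets an effect that shows up only in some subgroups become visible.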

Download Notebook 2 - StackOverflow Developer Survey (SHAP demo)
