Workshop - Part One
Solving a data science problem is about more than building a model. It entails data cleaning, exploration, modeling and tuning, production deployment, and the workflows that govern each of these steps. This simple example looks at how health data can be used to predict life expectancy. It starts with data engineering and data exploration in Apache Spark, then moves on to model tuning and logging with Hyperopt and MLflow. It continues with examples of how the model registry governs model promotion, and ends with simple deployment to production with MLflow as a job or dashboard.
Download Notebook 1 - Predicting Life Expectancy (MLflow and Databricks demo)
Workshop - Part Two
We hear about "model bias," but really models are just mirrors of the data they were trained on. Can we use them to detect instances of bias in data, and not just to make predictions? This talk will examine the results of the 2019 StackOverflow Developer Survey, and apply Apache Spark and SHAP (Shapley Additive Explanations) to study whether attributes like gender have outsized effects on developer salaries in certain instances.
Download Notebook 2 - StackOverflow Developer Survey (SHAP demo)
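To give a feel for the idea behind Part Two, here is a minimal sketch of exact Shapley values for a single prediction, in plain Python. It is not taken from the notebook: the toy salary model, the feature names (`years_experience`, `group`), and all numbers are hypothetical and chosen only for illustration. The SHAP library estimates attributions of this kind far more efficiently; this version simply brute-forces every feature ordering, which is only practical for a handful of features.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction.

    Each feature's value is its marginal contribution to the prediction,
    averaged over every order in which features can be switched from the
    baseline row to the instance row.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)       # start from the reference row
        previous = predict(current)
        for name in order:
            current[name] = instance[name]   # switch one feature on
            value = predict(current)
            totals[name] += value - previous
            previous = value
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


def toy_salary_model(row):
    """Hypothetical model (an assumption, not fitted to any survey data).

    It contains an interaction: membership in `group` lowers the predicted
    salary, but only for respondents with more than 10 years of experience.
    """
    salary = 50_000 + 2_000 * row["years_experience"]
    if row["group"] == 1 and row["years_experience"] > 10:
        salary -= 5_000
    return salary


baseline = {"years_experience": 0, "group": 0}    # made-up reference row
instance = {"years_experience": 12, "group": 1}   # made-up respondent

phi = shapley_values(toy_salary_model, instance, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.0f}")
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline. A clearly nonzero value for `group` on a given row is the kind of per-instance signal the talk looks for, although here it reflects only the rule hard-coded into the toy model.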
