In data science, where changing anything changes everything, it’s becoming increasingly difficult for data scientists to communicate results clearly and with confidence. In this session, we’ll demonstrate the importance of data provenance and data lineage in applied machine learning, and show how data scientists can put “the science” back in data science using open source software. We’ll also walk through building two versions of the same RNN (TensorFlow on Kubernetes): one where data provenance was considered from the start, and one where it was not. We’ll then compare the results and discuss each project’s ability to be deployed in production, reproduce results, and scale with the organization.