It's been over a year since we first introduced the Team Data Science Process (TDSP). The data, technology and practices behind data science continue to evolve, and the TDSP has evolved in parallel. Over the past year, several new facets have been added, including:
- The IDEAR (Interactive Data Exploration, Analysis and Reporting) framework, an open source extension to R and Python designed to standardize the process of data exploration and reporting;
- Guidance for use of Spark 2.0, including an end-to-end Spark v2.0 walkthrough;
- Guidance for use of in-database Python with SQL Server, including an end-to-end in-database Python tutorial;
- Instantiation of TDSP projects and templates within the new Azure Machine Learning Workbench.
For an example of applying the TDSP in practice, check out Buck Woody's 10-part series, which walks through every stage of a typical data science project.
As the practice of data science changes, the TDSP continues to evolve. The TDSP is an open project hosted on GitHub, and your contributions are welcome.
Cortana Intelligence and Machine Learning Blog: The Microsoft Team Data Science Process (TDSP) – Recent Updates