YData: An Introduction to Data Science

YData aims to build students’ understanding of the fundamental ideas and skills of data science. Based on Berkeley’s popular Data 8 course, it is an introduction to the field that emphasizes computational and programming skills along with inferential thinking.

The course is designed to be accessible to students with little or no background in computing, programming, or statistics. At the same time, it is meant to be engaging for more technically oriented students, through the extensive use of examples and hands-on data analysis.

Materials

Much of the content for the course is derived from Data 8, which has made its assignments, textbook, and other materials available online under a Creative Commons license. The textbook is Computational and Inferential Thinking: The Foundations of Data Science, a free online resource that includes interactive Jupyter notebooks and public data sets for all examples. The textbook is maintained as an open-source project.

Computing

The course is based on the Python programming language and a special-purpose cloud computing platform for students to edit and execute their code in Jupyter notebooks. This “levels the playing field” with respect to prior computing and programming experience, and makes it easy for students to get started. The computing platform for YData is being developed by Ben Evans at the Yale Center for Research Computing (YCRC).

(For the code cognoscenti, the platform is a Kubernetes-based deployment of JupyterHub. Notebooks run Python 3 with the standard modules from an Anaconda installation, such as NumPy and Matplotlib, as well as the Berkeley datascience module.)
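
To give a flavor of the kind of notebook cell students might run in this environment, here is a minimal sketch of simulation-based inferential thinking: estimating how often a fair coin would produce a result at least as extreme as one observed. It is an illustration only, not taken from the course materials. The function names, the choice of 60 heads out of 100 flips, and the number of trials are all assumptions made for this example, and it uses only the Python standard library so that it runs anywhere; in the course environment, NumPy or the datascience module would be the more likely tools.

```python
import random


def simulate_heads(num_flips, rng):
    """Count heads in num_flips tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(num_flips))


def estimate_tail_probability(observed_heads, num_flips=100,
                              num_trials=10_000, seed=0):
    """Estimate the chance of seeing at least observed_heads heads.

    Hypothetical helper for illustration: repeats the fair-coin
    experiment num_trials times and returns the fraction of trials
    with at least as many heads as were observed.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    extreme = sum(
        simulate_heads(num_flips, rng) >= observed_heads
        for _ in range(num_trials)
    )
    return extreme / num_trials


if __name__ == "__main__":
    p = estimate_tail_probability(60)
    print(f"Estimated chance of 60 or more heads in 100 fair flips: {p:.3f}")
```

A small estimated probability suggests that the observed result would be unusual for a fair coin. Reasoning from simulated outcomes in this way, rather than from a formula, is the style of analysis the sketch is meant to convey.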

Spring 2022

The calendar for the course, with links to the programming materials, is at ydata123.org/sp22