Programming for Data Science is an introductory course focused on essential programming concepts, structures, and techniques. Students will develop the skills to write efficient code, as well as to read, understand, and debug it. The course is primarily based on Python and covers its basic and intermediate programming features. Additionally, it introduces some of the fundamental Python libraries for Data Science, namely NumPy and Pandas. Given the importance of R in Data Science, a brief introduction to this language is also included. The course concludes with an introduction to the Command Line and GitHub, two essential tools commonly used by Data Scientists in their daily work.
Like most Python code, this book is organized into modules:
- Module 1: Getting Started introduces the course and the tools used throughout it.
- Module 2: Python (Essential) covers basic Python programming features such as variables, expressions, data structures, control flow, and functions.
- Module 3: Python II (Advanced) introduces more advanced Python topics such as error handling, working with files, object-oriented programming, and packages.
- Module 4: Python for Data Science covers NumPy and Pandas, two of the most widely used foundational Python packages for Data Science.
- Module 5: R provides a brief introduction to R programming.
- Module 6: Miscellaneous covers essential everyday tools for Data Scientists, such as the Command Line Interface (CLI) and GitHub.