python-programming/python-data-science at master · nbonacchi/python-programming · GitHub

Python for Data Science

This is an introductory course on using Python for data science applications. In recent years Python has become popular within the data science community, mainly because it is easy to use, open source and completely free. This course aims to introduce newcomers to the most popular packages used today: numpy, pandas and matplotlib. Note that it assumes basic knowledge of Python (i.e. lists, dicts, indexing).

The course content is entirely self-contained and can be studied without the taught course. You can do it in your own time, and it shouldn't take more than 6 hours to go through all of the material. However, this course is by no means a complete guide to using Python for data science applications. It serves as an introduction to the world of data analysis and aims to make you comfortable with looking at seemingly random numbers and trying to extract meaning from them.

The course was originally based on jupyter-notebook and has been updated to use Jupyter Lab. All the notebooks will be run and edited using the environment described in the root folder.

Getting started

For an overview of how Jupyter notebooks work, you can check out the short notebook python-data-jupyter-readme.ipynb.

Structure

The course is split into 4 core notebooks followed by a couple of extra notebooks which allow you to apply everything you have learned to practical tasks in various fields.

Notebook 0

A recap of the assumed Python knowledge for the course. Skip this if you're a pro in Python already.

Notebook 1

Python warm-up with some text analysis exercises.

Notebook 2

Introduction to vectorised computing and dealing with large data with numpy.
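
As a rough illustration of what "vectorised computing" means here, the sketch below converts a small array of temperatures with a Python loop and then with a single numpy expression. The data and variable names are invented for this example and are not taken from the notebook.

```python
import numpy as np

# Hypothetical data: temperatures in degrees Celsius.
celsius = np.array([12.5, 18.0, 21.3, 9.8])

# Loop version: convert one element at a time in pure Python.
fahrenheit_loop = [c * 9 / 5 + 32 for c in celsius]

# Vectorised version: one expression applied to the whole array at once.
fahrenheit_vec = celsius * 9 / 5 + 32

# Both approaches give the same numbers.
print(np.allclose(fahrenheit_loop, fahrenheit_vec))
```

On large arrays the vectorised form is usually much faster, because the per-element work runs in compiled code rather than in the Python interpreter.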

Notebook 3

Introduction to plotting in Python with matplotlib.

Notebook 4

Introduction to pandas and dealing with tabular data.
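
To give a flavour of working with tabular data, here is a minimal sketch that builds a small table, filters its rows and computes a grouped average. The column names and values are made up for illustration and do not come from the course material.

```python
import pandas as pd

# Hypothetical table: yearly rainfall per city (invented values).
df = pd.DataFrame({
    "city": ["Edinburgh", "Glasgow", "Edinburgh", "Glasgow"],
    "year": [2019, 2019, 2020, 2020],
    "rainfall_mm": [700.0, 1100.0, 650.0, 1200.0],
})

# Boolean indexing: keep only the rows above a threshold.
wet = df[df["rainfall_mm"] > 680]

# Split the rows by city and average the rainfall within each group.
mean_by_city = df.groupby("city")["rainfall_mm"].mean()

print(wet)
print(mean_by_city)
```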

Extra notebooks

At the end of the course, there are notebooks with extra in their names, which cover a wide range of data science topics.

  • python-data-extra-machine-learning, introducing machine learning with scikit-learn.
  • python-data-extra-networks, introducing network analysis with NetworkX.
  • python-data-extra-regex, introducing regular expressions, a powerful tool for working with text data.
  • python-data-extra-text-analysis, introducing tools for analysing text and performing sentiment analysis (see also python-data-extra-regex).
  • python-data-extra-scipy, introducing SciPy and exploring its signal processing module.
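
As a small taste of the regular expressions topic listed above, the sketch below pulls a date and some email addresses out of a sentence using only the standard library re module. The sample text and the patterns are illustrative assumptions, not excerpts from the notebook, and the email pattern is deliberately simplistic.

```python
import re

# Hypothetical sample text, invented for this example.
text = "Contact alice@example.com or bob@example.org before 2024-03-15."

# ISO-style date: four digits, two digits, two digits, separated by hyphens.
date_pattern = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
match = date_pattern.search(text)
year, month, day = match.groups()

# A simplified email pattern; real-world address validation is more involved.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)

print(year, month, day)
print(emails)
```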

TODO: python-data-extra-plotting-seaborn, introducing seaborn package for plotting et al.