Python Data Science Handbook

Further Machine Learning Resources
April 18, 2021
This chapter has been a quick tour of machine learning in Python, primarily using the tools within the Scikit-Learn library. As long as the chapter is, it is still too short to cover many interesting and important algorithms, approaches, and discussions. Here I want to suggest some resources to learn more about machine learning for those who are interested.
Machine Learning in Python
To learn more about machine learning in Python, I’d suggest some of the following resources:
 The Scikit-Learn website: The Scikit-Learn website has an impressive breadth of documentation and examples covering some of the models discussed here, and much, much more. If you want a brief survey of the most important and often-used machine learning algorithms, this website is a good place to start.
 SciPy, PyCon, and PyData tutorial videos: Scikit-Learn and other machine learning topics are perennial favorites in the tutorial tracks of many Python-focused conference series, in particular the PyCon, SciPy, and PyData conferences. You can find the most recent ones via a simple web search.
 Introduction to Machine Learning with Python: Written by Andreas C. Mueller and Sarah Guido, this book includes a fuller treatment of the topics in this chapter. If you’re interested in reviewing the fundamentals of machine learning and pushing the Scikit-Learn toolkit to its limits, this is a great resource, written by one of the most prolific developers on the Scikit-Learn team.
 Python Machine Learning: Sebastian Raschka’s book focuses less on Scikit-Learn itself, and more on the breadth of machine learning tools available in Python. In particular, there is some very useful discussion on how to scale Python-based machine learning approaches to large and complex datasets.
General Machine Learning
Of course, machine learning is much broader than just the Python world. There are many good resources to take your knowledge further, and here I will highlight a few that I have found useful:
 Machine Learning: Taught by Andrew Ng (Coursera), this is a very clearly taught free online course which covers the basics of machine learning from an algorithmic perspective. It assumes undergraduate-level understanding of mathematics and programming, and steps through detailed considerations of some of the most important machine learning algorithms. Homework assignments, which are algorithmically graded, have you actually implement some of these models yourself.
 Pattern Recognition and Machine Learning: Written by Christopher Bishop, this classic technical text covers the concepts of machine learning discussed in this chapter in detail. If you plan to go further in this subject, you should have this book on your shelf.
 Machine Learning: A Probabilistic Perspective: Written by Kevin Murphy, this is an excellent graduate-level text that explores nearly all important machine learning algorithms from a ground-up, unified probabilistic perspective.
These resources are more technical than the material presented in this book, but really understanding the fundamentals of these methods requires a deep dive into the mathematics behind them. If you’re up for the challenge and ready to bring your data science to the next level, don’t hesitate to dive in!