Python Data Science Handbook
Further Machine Learning Resources
January 28, 2021
This chapter has been a quick tour of machine learning in Python, primarily using the tools within the Scikit-Learn library. As long as the chapter is, it is still too short to cover many interesting and important algorithms, approaches, and discussions. Here I want to suggest some resources to learn more about machine learning for those who are interested.
Machine Learning in Python
To learn more about machine learning in Python, I’d suggest some of the following resources:
- The Scikit-Learn website: The Scikit-Learn website has an impressive breadth of documentation and examples covering some of the models discussed here, and much, much more. If you want a brief survey of the most important and often-used machine learning algorithms, this website is a good place to start.
- SciPy, PyCon, and PyData tutorial videos: Scikit-Learn and other machine learning topics are perennial favorites in the tutorial tracks of many Python-focused conference series, in particular the PyCon, SciPy, and PyData conferences. You can find the most recent ones via a simple web search.
- Introduction to Machine Learning with Python: Written by Andreas C. Mueller and Sarah Guido, this book includes a fuller treatment of the topics in this chapter. If you’re interested in reviewing the fundamentals of machine learning and pushing the Scikit-Learn toolkit to its limits, this is a great resource, written by one of the most prolific developers on the Scikit-Learn team.
- Python Machine Learning: Sebastian Raschka’s book focuses less on Scikit-Learn itself, and more on the breadth of machine learning tools available in Python. In particular, there is some very useful discussion on how to scale Python-based machine learning approaches to large and complex datasets.
General Machine Learning
Of course, machine learning is much broader than just the Python world. There are many good resources to take your knowledge further, and here I will highlight a few that I have found useful:
- Machine Learning: Taught by Andrew Ng (Coursera), this is a very clearly taught free online course that covers the basics of machine learning from an algorithmic perspective. It assumes undergraduate-level understanding of mathematics and programming, and steps through detailed considerations of some of the most important machine learning algorithms. Homework assignments, which are algorithmically graded, have you actually implement some of these models yourself.
- Pattern Recognition and Machine Learning: Written by Christopher Bishop, this classic technical text covers the concepts of machine learning discussed in this chapter in detail. If you plan to go further in this subject, you should have this book on your shelf.
- Machine Learning: A Probabilistic Perspective: Written by Kevin Murphy, this is an excellent graduate-level text that explores nearly all important machine learning algorithms from a ground-up, unified probabilistic perspective.
These resources are more technical than the material presented in this book, but really understanding the fundamentals of these methods requires a deep dive into the mathematics behind them. If you’re up for the challenge and ready to bring your data science to the next level, don’t hesitate to dive in!