Gå till index
Python Data Science Handbook
0% färdig
0/67 Steps

Introduktion

IPython: Beyond Normal Python8 Ämnen

Introduction to NumPy9 Ämnen

Understanding Data Types in Python

The Basics of NumPy Arrays

Computation on NumPy Arrays: Universal Functions

Aggregations: Min, Max, and Everything In Between

Computation on Arrays: Broadcasting

Comparisons, Masks, and Boolean Logic

Fancy Indexing

Sorting Arrays

Structured Data: NumPy's Structured Arrays

Understanding Data Types in Python

Data Manipulation with Pandas13 Ämnen

Introducing Pandas Objects

Data Indexing and Selection

Operating on Data in Pandas

Handling Missing Data

Hierarchical Indexing

Combining Datasets: Concat and Append

Combining Datasets: Merge and Join

Aggregation and Grouping

Pivot Tables

Vectorized String Operations

Working with Time Series

HighPerformance Pandas: eval() and query()

Further Resources

Introducing Pandas Objects

Visualization with Matplotlib15 Ämnen

Simple Line Plots

Simple Scatter Plots

Visualizing Errors

Density and Contour Plots

Histograms, Binnings, and Density

Customizing Plot Legends

Customizing Colorbars

Multiple Subplots

Text and Annotation

Customizing Ticks

Customizing Matplotlib: Configurations and Stylesheets

ThreeDimensional Plotting in Matplotlib

Geographic Data with Basemap

Visualization with Seaborn

Further Resources

Simple Line Plots

Machine Learning15 Ämnen

What Is Machine Learning?

Introducing ScikitLearn

Hyperparameters and Model Validation

Feature Engineering

In Depth: Naive Bayes Classification

In Depth: Linear Regression

InDepth: Support Vector Machines

InDepth: Decision Trees and Random Forests

In Depth: Principal Component Analysis

InDepth: Manifold Learning

In Depth: kMeans Clustering

In Depth: Gaussian Mixture Models

InDepth: Kernel Density Estimation

Application: A Face Detection Pipeline

Further Machine Learning Resources

What Is Machine Learning?

Appendix: Figure Code
avsnitt 4, Ämne 13
Pågår
Further Resources
oktober 26, 2020
avsnitt Progress
0% färdig
In this chapter, we’ve covered many of the basics of using Pandas effectively for data analysis. Still, much has been omitted from our discussion. To learn more about Pandas, I recommend the following resources:
 Pandas online documentation: This is the goto source for complete documentation of the package. While the examples in the documentation tend to be small generated datasets, the description of the options is complete and generally very useful for understanding the use of various functions.
 Python for Data Analysis Written by Wes McKinney (the original creator of Pandas), this book contains much more detail on the Pandas package than we had room for in this chapter. In particular, he takes a deep dive into tools for time series, which were his bread and butter as a financial consultant. The book also has many entertaining examples of applying Pandas to gain insight from realworld datasets. Keep in mind, though, that the book is now several years old, and the Pandas package has quite a few new features that this book does not cover (but be on the lookout for a new edition in 2017).
 Stack Overflow: Pandas has so many users that any question you have has likely been asked and answered on Stack Overflow. Using Pandas is a case where some GoogleFu is your best friend. Simply go to your favorite search engine and type in the question, problem, or error you’re coming across–more than likely you’ll find your answer on a Stack Overflow page.
 Pandas on PyVideo: From PyCon to SciPy to PyData, many conferences have featured tutorials from Pandas developers and power users. The PyCon tutorials in particular tend to be given by very wellvetted presenters.
Using these resources, combined with the walkthrough given in this chapter, my hope is that you’ll be poised to use Pandas to tackle any data analysis problem you come across!