Statistics in Python

After you have installed Python, you are ready to import your first data set and run some basic statistics. This page briefly summarizes how to load pandas in the Spyder IDE (included with the Anaconda distribution of Python), import a data set, and run basic descriptive statistics. If you need help downloading and installing Anaconda and the major packages, take a look at our Installing Statistical Software Programs page.

If you would like to calculate basic statistics in other statistics packages, such as R, Stata or SPSS, see our step-by-step guide for these programs.

# Code to import a Comma Separated Values (.csv) data set
# (pandas must be imported first so that "pd" is defined)

import pandas as pd

mydata1 = pd.read_csv("C:\\file_path\\file1.csv", header = None)

# Code to import an Excel (.xlsx) data set into pandas in Spyder

mydata = pd.read_excel("", sheet_name="Data 1", skiprows=2)

# If you would like to add column (i.e., variable) names, use this code

mydata2 = pd.read_csv("C:\\file_path\\file1.csv", header = None, names = ['ID', 'first_name', 'salary'])
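To see what header = None and names do without needing a file on disk, here is a minimal sketch. It assumes a small made-up CSV held in a string (the rows and the variable names csv_text, raw and named are hypothetical, not part of the original example), read through io.StringIO in place of a file path.

```python
import io

import pandas as pd

# Hypothetical CSV content with no header row, standing in for file1.csv
csv_text = "1,Ana,52000\n2,Ben,48000\n3,Cara,61000\n"

# header=None tells pandas that the first row is data, not column names,
# so the columns receive integer labels 0, 1, 2
raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(raw.columns.tolist())

# names=[...] supplies the column (variable) names explicitly
named = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["ID", "first_name", "salary"],
)
print(named.head())
```

With a real file, replace io.StringIO(csv_text) with the path to your .csv file.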

# Another really cool feature of pandas is the ease of
# importing .csv files from internet sources

mydata = pd.read_csv("")

Next, you are ready to run some basic statistics: descriptives to show averages and other measures of central tendency, correlations to examine relationships between variables, and visualizations to illustrate these relationships.

# Basic descriptive statistics in Python using pandas

# Ensure that you have loaded each of these packages
# with their typical abbreviation in most data science
# applications

import numpy as np

import scipy as sp

import pandas as pd

import matplotlib.pyplot as plt

# Summary statistics including frequencies, mean, 
# median, mode, standard deviation and range

# Based on an example data frame named "df"
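The summary statistics listed above can be sketched as follows. This is one reasonable way to get them with pandas, not necessarily the exact code originally intended here; the data frame contents and the column names var1 and var2 are made up for illustration.

```python
import pandas as pd

# Made-up example data; replace with your own data frame
df = pd.DataFrame({
    "var1": [1, 2, 2, 4, 6],
    "var2": [2, 4, 5, 8, 11],
})

# Count, mean, standard deviation, min, quartiles and max for each column
summary = df.describe()
print(summary)

# Median and mode (mode can return several rows when values are tied)
medians = df.median()
modes = df.mode()

# Range, computed as max minus min for each column
ranges = df.max() - df.min()

# Frequencies of each distinct value in a single column
freq = df["var1"].value_counts()
print(freq)
```

Note that describe() only summarizes numeric columns by default; value_counts() is usually the better choice for categorical variables.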


# Correlations between all variables in data frame
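A correlation matrix for every pair of columns can be obtained with the DataFrame corr() method. The sketch below assumes a small made-up data frame with numeric columns only; the column names var1, var2 and var3 are placeholders.

```python
import pandas as pd

# Made-up numeric data; var2 rises with var1, var3 falls with var1
df = pd.DataFrame({
    "var1": [1, 2, 3, 4, 5],
    "var2": [2, 4, 6, 8, 10],
    "var3": [5, 4, 3, 2, 1],
})

# Pearson correlation between every pair of columns
corr_matrix = df.corr()
print(corr_matrix)
```

The result is itself a data frame, so a single coefficient can be looked up with corr_matrix.loc["var1", "var2"].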


# Correlations for two specific variable columns
# in data frame

df['Variable 1 name'].corr(df['Variable 2 name'])

# Scatterplot

plot1 = df.plot.scatter(x='var1', y='var2')  # 'var2' is a placeholder for your second column name
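For a complete scatterplot call, here is a minimal sketch under stated assumptions: the data are made up, and the column name var2 and the plot title are placeholders that do not come from the original example. The "Agg" backend line is only there so the script also runs without a display; in Spyder you can drop it and the figure will appear in the Plots pane.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; optional inside Spyder

import pandas as pd

# Made-up example data; replace with your own data frame
df = pd.DataFrame({
    "var1": [1, 2, 3, 4, 5],
    "var2": [2.1, 3.9, 6.2, 7.8, 10.1],
})

# Both x and y are required for a scatterplot; the call returns a matplotlib Axes
ax = df.plot.scatter(x="var1", y="var2")
ax.set_title("var1 vs. var2")
```

Because the return value is an ordinary matplotlib Axes object, labels, titles and limits can be adjusted afterwards with the usual Axes methods.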

There are many additional advanced topics in data science and statistics, and we at Data Science for Anyone are hard at work on creating new expert guides for R, Python, Stata and SPSS. For a really helpful and easy-to-use listing of Python commands for analyzing data and running statistics with the pandas package, check out this amazing guide and search engine for commands in pandas.

Make sure to check back on our site often for updates. We also highly recommend the educational resources on data science and statistics hosted by the Institute for Digital Research & Education Statistical Consulting (IDRE) at UCLA!
