Before you can start your adventures in data science and statistics, you will need to download and install the right software packages. Luckily, we are here to make the process as smooth and easy as possible.
Below, we explain how to install software packages, including the R Project for Statistical Computing (aka R), Python, SPSS and Stata. For anyone looking to develop skills in making data look pretty to a variety of audiences, we also show how to install Tableau, a full-featured platform for data visualization and presentation.
How to install R on your computer
One of the most popular data science platforms is R (aka the R Project for Statistical Computing), which is a full-featured suite of statistical analysis, data management and graphical visualization tools. The base R software itself is all command line based and is known to be difficult to learn for some newcomers. Luckily, the amazing team at RStudio created a graphical user interface (aka GUI) that runs base R in the background and makes everything a lot easier. Here are the steps to install RStudio and base R at the same time:
- First, go to the RStudio website to the page that includes download links.
- Next, download the correct installer file for your operating system (e.g., Windows if you are running Windows 10).
- Because RStudio is only an Integrated Development Environment (IDE) that was built on top of the base R software, you also need to download the installer for base R from this website for Windows (https://cran.r-project.org/bin/windows/base/) and MacOS (https://cran.r-project.org/bin/macosx/).
- After you have done that, double click on the installer package for base R and follow the prompts to install it on your computer.
- Once you have the base R software installed, you are all set to install the RStudio software by double clicking on the RStudio installer package and following the prompts.
- Any time you want to run R, just open RStudio and it will run R in the background with all the super useful features of RStudio, including a code editor with automatic syntax highlighting for R scripts, a console for running individual R commands, a data viewer for examining rows and columns of data frames, as well as an environment viewer that shows all of the objects in R’s memory for a particular session. Always make sure to check for updates to RStudio by clicking on “Help” and then “Check for updates…” on the top menu bar of RStudio. Base R is a little bit more difficult to update and we provide some extra tips below on how to update it.
- That’s it! You should now be ready to follow all of our R resources on our website and should start with our how-to guide to Statistics in R!
Bonus tip for R: A quick and convenient way to update base R from within RStudio is to use the “installr” package. Note that installr only works on Windows; on MacOS, download and run the newest base R installer instead. Try running the code below in RStudio to update base R on your computer.
# Code to install the "installr" package (Windows only)
# and run the package to update base R
# to the newest version
install.packages("installr")
library(installr)
updateR()

# Follow the prompts and click through to
# make sure that the newest version of
# base R is installed on your computer

# You should also restart RStudio (make
# sure to save your workspace) and confirm
# that you are running the newest version of
# R by seeing which version shows up in your
# installation of RStudio (see screenshot below)
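If you already have Python on your computer, one optional way to confirm that base R was installed is to check from a script. The sketch below is only an illustration and is not part of the official R or RStudio tooling: the helper name tool_version is made up for this example, and it assumes that the Rscript program (which ships with base R) is on your system PATH. On Windows the R installer often does not add it to the PATH, so a "not found" result there does not necessarily mean the installation failed.

```python
import shutil
import subprocess


def tool_version(command, version_flag="--version"):
    """Return the first line of a tool's version output, or None if it is not on PATH.

    Hypothetical helper for illustration only.
    """
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run(
        [path, version_flag], capture_output=True, text=True, timeout=30
    )
    # Some tools print version information to stderr instead of stdout
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    version = tool_version("Rscript")
    if version is None:
        print("Rscript was not found on the PATH")
    else:
        print("Found base R:", version)
```

Checking the version shown in the RStudio console when it starts is still the simplest way to confirm which version of R you are running.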
How to install Python on your computer
Python can be downloaded from several different websites, and some installation methods are more difficult than others. To make the process as easy and straightforward as possible, we suggest following these instructions to download Python as part of the Anaconda distribution. This method installs Python along with a suite of several Integrated Development Environments (IDEs), most importantly the Spyder IDE. The Spyder IDE is a software program that includes a Python script editor to write, save and run Python code, a console to run individual Python commands, and data and variable viewer windows that show the items currently in memory. The best part is Python, just like R, is free and open source software! If you want to install Python and the four most popular Python packages for data science and statistics in just a few minutes, follow the steps below to get everything installed and running on your computer.
- First, go to the Anaconda website and download the correct Installer package for your operating system.
- Next, double click on the file you downloaded and follow the prompts to install Python 3.7 and all optional packages.
- After you have finished the install, double click on the Anaconda Navigator and then click on the program named Spyder to open the Integrated Development Environment (IDE) for Python 3.7.
- Next, be sure to check for any updates in Spyder by clicking on “Help” from the top menu bar and then clicking on “Check for updates…”.
- After you have the most up-to-date version of the Spyder IDE software, make sure that the four key packages for data science in Python are installed and loaded, by typing the following:
# The four major data science packages normally come
# preinstalled with Anaconda. If any are missing, run
# these commands in the Anaconda Prompt (Windows) or
# Terminal (MacOS), not inside a Python script
pip install numpy
pip install scipy
pip install pandas
pip install matplotlib

# Next, you need to load each of these packages in Spyder
# to allow you to run their commands on your data set(s)
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt  # pyplot is matplotlib's plotting interface
- There you go! You should be all set to use Python in the Spyder IDE, load the major data science packages and follow all of the Python statistics tutorials on our website!
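To double check that the four packages are actually available, a short script like the one below can be run in Spyder. This is a minimal sketch rather than an official Anaconda check: the function name installed_versions is made up for this example, and it simply tries to import each package and reads its __version__ attribute, reporting None for any package that cannot be imported.

```python
import importlib

# The four data science packages covered in the steps above
PACKAGES = ("numpy", "scipy", "pandas", "matplotlib")


def installed_versions(packages=PACKAGES):
    """Map each package name to its version string, or None if it cannot be imported.

    Hypothetical helper for illustration only.
    """
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Fall back to "unknown" for modules without a __version__ attribute
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:<12}{version or 'not installed'}")
```

Any package reported as not installed can then be added with the matching pip install command from the steps above.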
How to install Stata on your computer
Similar to the other software packages covered on this page, Stata is available as an Installer package for both Windows and MacOS. Following these steps will get it all installed on your laptop or desktop computer.
1. First, check out this page on the official Stata website to see if there are any free trials currently offered for Stata software programs. If not, you can always request a free trial by filling out this form on the official Stata website and waiting until they email you information about a free trial of Stata for evaluation purposes.
2. After you have downloaded the correct Installer package for your operating system, please follow the prompts to install Stata on your computer.
3. Launch Stata for the first time and make sure to check for updates in your Stata program by clicking on “Help” and then “Check for updates”.
4. Once you have opened Stata for the first time and want to run some data analyses, check out our guide to Basic Statistics in Stata.
How to install SPSS on your computer
SPSS is provided in a free trial version as an Installer package for Windows 10 or MacOS operating systems. The following steps will get you all set up with the latest version of IBM’s Statistical Package for the Social Sciences (SPSS) on your desktop or laptop computer.
- First, go to IBM’s website and click on the correct Installer package for your operating system.
- After that, double click on the SPSS Installer package on your computer.
- Third, follow all of the prompts to install SPSS and Python essentials (this extra program lets you run Python commands in SPSS syntax, a great feature!) on your computer.
- Fourth, double click on the SPSS program that was installed and enjoy the trial version! You will be able to use all features for a period ranging from 14 to 60 days, depending on the time of year and whether there are any special software bonuses running when you download it.
- Now you are all set to run point and click commands, as well as syntax text commands in SPSS, and are ready to try our guide to running Basic Statistics in SPSS!
The bottom line
As we discuss on our page that covers all four statistical software packages and our article on the Top 5 Tips to Succeed in Data Science, it is always a smart idea to develop skills in multiple statistical software packages.
This is especially important when conducting many types of analyses, because one package might have a certain function or capability that you need and is not offered by another package. Potential employers also seek candidates with skills in multiple data science software packages, so demonstrating that you can use more than one package is always a plus for landing a position in data science.
If you enjoyed this tutorial on installing R, Python, SPSS and Stata, don’t forget to take a look at our reviews of the Top 5 Books for Learning Data Science and make sure to follow us on Twitter and Instagram for the latest news, tips and more.