To get started with data science, you need data that you can import into a software program and work with. The great news is that numerous resources on the internet make data publicly available for anyone to download.
The best part is the data are free! On this page, we provide a detailed list of datasets from a wide variety of sources that are available to download, use and analyze. Check back often as we keep this page updated and add new datasets!
If you are looking to collect your own data to test a specific hypothesis or multiple hypotheses, make sure to take a look at our detailed reviews of online survey providers, including SurveyMonkey, Google Forms and Qualtrics.
To help make it easier to pick a data set, we have categorized these lists into “Individual-level Data” (data collected from individual people) and “Aggregate-level Data” (data collected for larger areas, such as neighborhoods, cities, states or countries).
Individual-level Data Sets to Download
An excellent resource for individual-level data (and some aggregate-level data as well) is the Inter-University Consortium for Political and Social Research (ICPSR) Data Repository.
This globally-recognized database of secondary data sets is hosted by the University of Michigan and has an extensive collection of data from disciplines ranging from public health to social psychology to political science.
Aggregate-level Data Sets to Download
The United States Census provides numerous datasets with thousands of variables, including demographic, economic, labor market, business, and population indicators. All datasets are free to download in comma-separated values (.csv) or Microsoft Excel (.xlsx) format and come with extensive help documentation.
The U.S. Census provides a wide variety of data tools to examine, organize and prepare data for download, and the CensusData package for Python offers a convenient way to download data directly from the U.S. Census through its API.
For anyone interested in geospatial data sets (check out our QGIS guide here), the U.S. Census also offers shapefiles (.shp) for mapping in ArcGIS and QGIS.
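As a rough illustration of what requesting Census data programmatically can look like, the sketch below assembles a query URL in the style of the Census Bureau's public Data API. It does not use the CensusData package, and it makes no network request. The helper name build_census_url, the year, the dataset path "acs/acs5" and the variable code "B01003_001E" are assumptions chosen for illustration; check the Census documentation for the datasets and variables you actually need.

```python
from urllib.parse import urlencode

# Base address of the Census Bureau's Data API.
CENSUS_API_BASE = "https://api.census.gov/data"


def build_census_url(year, dataset, variables, geography):
    """Assemble a Census Data API query URL (hypothetical helper).

    variables is a list of variable codes; geography is a string such as
    "state:*" meaning "every state".
    """
    # Keep commas, colons and asterisks readable instead of percent-encoding them.
    query = urlencode(
        {"get": ",".join(variables), "for": geography},
        safe=",:*",
    )
    return f"{CENSUS_API_BASE}/{year}/{dataset}?{query}"


# Illustrative values only: a year, a dataset path and a variable code.
url = build_census_url(2019, "acs/acs5", ["NAME", "B01003_001E"], "state:*")
print(url)
```

Opening a URL of this shape in a browser, or fetching it from code, is one way to preview the data before deciding whether to download a full file.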
Kaggle: Free datasets for data science applications
Beyond the two enormous data repositories described above, ICPSR and the U.S. Census, there are other excellent places to download a wide variety of datasets online. Data science is such a fascinating field of study because the proliferation of data collection has made huge data sets available to companies at unprecedented scale.
Kaggle is another excellent online resource for obtaining datasets to work with for data science applications, including formats such as .csv, .xls and .json. The site also hosts a community of data science, analysis and visualization enthusiasts, with learning resources, free notebooks with example code and message boards.
Kaggle hosts frequent data science competitions with real money prizes, which are also a great way to become more acquainted with the online data science community. In terms of the data provided, Kaggle falls somewhere between the two categories above, because it offers both de-identified individual-level data and aggregated data for download.
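Once a .csv or .json file from any of the sources above is on your computer, the first step is usually to load it into a program. The sketch below shows one possible way to do that with only the Python standard library. The function name load_records and the sample file contents are made up for illustration, and Excel files (.xls, .xlsx) are not handled because they require a third-party library.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_records(path):
    """Load a .csv or .json file into a list of dictionaries (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # DictReader uses the header row as the keys of each record.
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    if suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Wrap a single JSON object so the return type is always a list.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"Unsupported file type: {suffix}")


# Demonstration with a tiny made-up file written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text("city,population\nSpringfield,1000\n", encoding="utf-8")
    records = load_records(sample)

print(records)
```

Note that the csv module reads every value as text, so numeric columns such as population need to be converted before any analysis.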
The bottom line on downloading data
One of the most amazing things about our modern society is the remarkable amount of data available and the incredible good that can come out of responsible use of data within ethical guidelines.
The resources provided on this page are excellent starting points for downloading anonymized data on individual people or aggregated data on larger geographic areas.
Before you download or work with any data, always make sure you follow the rules and laws of your place of residence, especially those regarding data privacy.
If you’re more interested in collecting your own survey data from real people, make sure to read our detailed guide to online survey providers, with full reviews of the top survey providers, including SurveyMonkey, Google Forms and Qualtrics.