Research Data Management

Support for storing, sharing, and preserving research data.

Finding Open Data

When working on a research project, it is important to have access to large and diverse data sets. Open data is data that can be freely used and redistributed by anyone, subject at most to a requirement of attribution.

Some general guidelines for working with open data:

  • Consider the source: Is the hosting site an officially recognized agency, research institution, or company? Do other people within your field use this data source?
  • Review information: Before using open data for analysis or synthesis, make sure to clean up the data sets.
  • Make sure to attribute your source: Give proper credit to the source of the data and consider what you could do to contribute to the field you are working in.

Use your best judgment when working with open data. As open data can be redistributed, make sure you are getting data from its source to ensure data fidelity.
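As an illustration of the "review information" guideline above, here is a minimal sketch of a first cleaning pass on a downloaded CSV file. It uses only Python's standard library. The sample data, the column names (site, year, value), and the helper name clean_rows are hypothetical and not taken from any repository listed in this guide; real data sets will usually need checks specific to their field.

```python
import csv
import io


def clean_rows(rows, required):
    """Trim whitespace, drop rows missing a required field, drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Trim stray whitespace around column names and values.
        # csv.DictReader uses None for missing values and for surplus columns.
        tidy = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        # Skip rows where any required field is empty.
        if any(tidy.get(field, "") == "" for field in required):
            continue
        # Skip rows that exactly repeat an earlier row.
        fingerprint = tuple(sorted(tidy.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(tidy)
    return cleaned


# Hypothetical sample standing in for a downloaded open data file.
SAMPLE = """site,year,value
 A ,2001, 3.2
A,2001,3.2
B,,4.1
C,2003,5.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = clean_rows(rows, required=["site", "year", "value"])
for row in cleaned:
    print(row)
```

In this sample, the second row duplicates the first once whitespace is trimmed, and the third row has no year, so two rows remain. Keeping a record of what was dropped and why is a good habit when you later describe your methods.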

Open data can be found in many places. The first step is to refine what you are looking for. This could include subject matter, type of data, time period, particular demographics, etc. It is important to start broad and continually narrow down your search to find data that is appropriate for your research.


  • Cambridge Structural Database – small molecule crystal structures
  • ChemSpider – free-to-access collection of chemical structures and their associated information
  • eCrystals – x-ray crystallographic data
  • PubChem – NCBI’s repository of bioactivity/bioassay data and information for “small” molecules (i.e. not macromolecular). Both text-based and structure-based search tools are provided.
  • ICPSR (Inter-university Consortium for Political and Social Research) A non-profit, membership-based data archive located at the University of Michigan. The UO is a member of ICPSR, which allows students, staff, and faculty to access ICPSR data files and documentation for research.
  • Dataverse Network is a collection of social science research data contained in virtual data archives called “dataverses”. Maintained by the IQSS (Institute for Quantitative Social Sciences at Harvard), you can create your own “dataverse” and upload your data, subject to certain terms.
  • re3data (“REgistry of REsearch REpositories”) List of repositories
  • DataBib List of repositories
  • DataCite List of Repositories Compiled by the British Library, BioMed Central, and the UK’s Digital Curation Centre.
  • Distributed Data Curation Center: Other Data Repositories Managed by Purdue University Libraries, the Distributed Data Curation Center lists more than 50 open data repositories from a range of science disciplines.
  • Gene Expression Omnibus The Gene Expression Omnibus (GEO) is an open data repository which provides access to microarray, next-generation sequencing, and other forms of functional genomic data submitted by the scientific community.
  • Global Change Master Directory The Global Change Master Directory, maintained by the Earth Sciences Directorate at the National Aeronautics and Space Administration (NASA), provides access to more than 25,000 earth and environmental science data sets, relevant to global change and Earth science research.
  • MIT Data Management and Publishing: Sharing Your Data The MIT Libraries’ subject guide on data management and publishing includes a list of open data repositories spanning the disciplines of astronomy, atmospheric science, biology, chemistry, earth science, oceanography and space science.
  • Oceanographic Data Repositories Funded by the National Science Foundation, the Biological and Chemical Oceanography Data Management Office (BCO-DMO) provides access to several oceanographic data repositories created by the US Joint Global Ocean Flux Study and US Global Ocean Ecosystem Dynamic programs.
  • Open Access Directory: Data Repositories Launched in 2008 and hosted by the Graduate School of Library and Information Science at Simmons College, the Open Access Directory is a wiki that lists links to over 50 open data repositories in the disciplines of archaeology, biology, chemistry, environmental sciences, geology, geosciences and geospatial data, marine sciences, medicine and physics, as well as multidisciplinary open data repositories.
  • Public Data Sets on Amazon Web Services Amazon Web Services provides a centralized place to download public domain and non-proprietary astronomy, biology, chemistry and climatology data sets.