This list is far from complete. These are some of the prominent sites that archive data:
Internet Archive featuring the Wayback Machine
"Our mission is to provide Universal Access to All Knowledge. We collect published works and make them available in digital formats. Today our archive contains:
330 billion web pages
20 million books and texts
4.5 million audio recordings (including 160,000 live concerts)
4 million videos (including 1 million Television News programs)
3 million images
200,000 software programs"
End of Term Web Archive
"The End of Term Web Archive captures and saves U.S. Government websites at the end of presidential administrations. Beginning in 2008, the EOT has thus far preserved websites from administration changes in 2008, 2012 and 2016. The End of Term Web Archive contains federal government websites (.gov, .mil, etc) in the Legislative, Executive, or Judicial branches of the government. Websites that were at risk of changing (i.e., whitehouse.gov) or disappearing altogether during government transitions were captured. Local government websites, or any other site not part of the federal government domain were out of scope."
Data Refuge
"Data Refuge is a community-driven, collaborative project to preserve public climate and environmental data. When we document the many ways diverse communities use data, we can also advocate for future data."
DataLumos
"DataLumos is an ICPSR archive for valuable government data resources. ICPSR has a long commitment to safekeeping and disseminating US government and other social science data. DataLumos accepts deposits of public data resources from the community and recommendations of public data resources that ICPSR itself might add to DataLumos."
The Memory Hole 2
"The Memory Hole 2 - run by writer and anthologist Russ Kick - saves important documents from oblivion. Its predecessor, The Memory Hole (2002-2009), posted hundreds of documents, many of which will be reposted on the new site."
AltGov2
"The goals of AltGov2 are to increase government transparency, keep an eye on the powerful through citizen oversight, and dig up important documents. I file hundreds of Freedom of Information Act (FOIA) requests, then post the documents I get. I also repost material the government pulls offline."
Environmental Data and Governance Initiative (EDGI)
"EDGI documents, contextualizes, and analyzes current changes to environmental data and governance practices through multidisciplinary and cross-professional collaborative work. EDGI fosters the stewardship and expansion of public knowledge through building participatory civic technologies and infrastructures to make data and decision-making more accessible. EDGI creates new communities of practice to enable government and industry accountability."
Libraries+ Network
"The Libraries+ Network is a nascent consortium of research libraries, library organizations, and open data communities with a shared interest in saving, preserving, and making accessible born-digital federal government information upon which researchers, citizens, and communities rely."