So much of what we have been able to accomplish over the past two years has been enabled by the Internet Archive, and in particular the Wayback Machine. For example, our first event in December 2016 sought to archive EPA websites, prior to Trump’s inauguration, by nominating key pages and datasets for inclusion in the Wayback Machine. Over the subsequent five months, more than 49 DataRescue events were held across the country, and over 63,000 web pages from environmental agencies like EPA, NOAA, NASA, and OSHA were nominated to the archive. The DataRescue project ended in June 2017, but not before raising important questions about the politics of data accessibility and stewardship.
Through DataRescue we began partnering with the Internet Archive, which has become essential in another EDGI project: tracking ongoing changes at federal agency websites. Initially using a fee-based software program, Versionista, to crawl government web pages (currently crawling 42,000 URLs), we have been able to locate and report on the removal or alteration of web content on climate, non-renewable energy sources, and important environmental treaties.
This kind of work increasingly relies on the Wayback Machine, and our reports systematically include references and screenshots from it. In our commitment to building participatory and responsive civic technologies and data infrastructure (partly inspired by the Internet Archive), we also developed our own free and open-source web monitoring software, called Scanner, which we plan to turn into a public platform. We are partnering with the Internet Archive to develop Scanner's functionality.