The dramatic political threats to science are not just about funding for new research but also about the existing data on government websites. The US EPA's climate pages, for example, were initially taken down by the Trump Administration and then restored, perhaps only temporarily. Threats like these have led scientists and archivists alike to begin their own projects to save the scientific data that anti-climate change fanatics want to hide and have already begun to remove from government sites.

Guerrilla archiving

So-called guerrilla archiving is simply the downloading and storage of websites and data from the Internet by individuals and loosely organised groups.


Although it is unlikely that, for instance, all the vast databases of climatological data and research that have been collected and sometimes created by the Environmental Protection Agency and other Federal agencies will actually be completely lost, it is possible. The last century saw book burning on a vast scale, and even in the US, government reports actively hid the truth about vital topics such as the Vietnam War.

But even if the data itself is stored somewhere, access matters. Sharing data is one of the most important innovations of the Internet, and the very reason scientists developed it. The Web, also created by scientists, made scientific information widely and easily accessible. That is what guerrilla archivists are striving to save: online access for the public as well as for scientists.


Organized archiving

Although few people seem to know about it, The Internet Archive, a massive project to back up the entire Web that began two decades ago, has been quietly capturing every website it could reach. Some sites use tools that prevent the bots used by search engines, and by services such as The Archive, from downloading them automatically.

The Archive itself is a vast resource for academic and scientific research as well as for personal curiosity and entertainment. Individuals who fear that governments anywhere will remove web data can also use The Archive's tools to copy and store a site directly.

The Internet Archive's Wayback Machine is named for the WABAC machine in the Rocky and Bullwinkle cartoon segment featuring Peabody and Sherman, a boy and his scientist dog who travel through history in a time machine.

As of this writing, the Wayback Machine at The Archive has stored 279 billion web pages, many of them with live HTML links.

When I checked the Wayback Machine's record of the EPA's official website, I found that the EPA climate pages had never been archived by this project until last month.


Since January 2017, however, there has been a concerted attempt to capture all the pages and the underlying data, complete with active links, essentially preserving the entire site for future generations even if the official EPA site is taken down.

Because the Web is a living, changing entity, The Archive doesn't back up sites in the ordinary way, where each new copy replaces the previous one. Instead, The Archive keeps copies of a website from many different days, without changing or deleting the earlier copies. In other words, this is an actual history, not merely a constantly overwritten backup.

The Wayback Machine is limited to static web pages. The so-called deep web, made up of sites that search databases and generate pages on demand to display information in new ways, can't be archived like this.

The Archive itself is free to use and also holds 11 million documents, including books, along with 3 million movies, music, and more.

It started in Canada

Michelle Murphy, a University of Toronto professor, began spot-checking that data was still online. Her effort grew out of Canada's experience with former Prime Minister Stephen Harper, who was accused of doing what President Trump is now apparently doing: muzzling government scientists.

"The transition team and Trump administration are very upfront and promoting of this plan; it's not that this is surreptitious," Murphy told Greenwire, part of E&ENews, an environmental watchdog news service.