Virginia Tech is leading a grant-funded project to make web archives more valuable to researchers

July 9, 2018

Students studying with online resources.

The Institute of Museum and Library Services recently awarded a $248,451 grant for a collaborative two-year project, Continuing Education to Advance Web Archiving, that will create materials to teach librarians and archivists around the world how to collect, extract, and analyze archived information from the World Wide Web.

Zhiwu Xie, director of digital library development for the University Libraries at Virginia Tech, is leading the team of library and archive experts that will create a curriculum covering web archiving technology and the challenges archivists and librarians face in gathering the most useful information from archived websites and social media.

“The web is the most prominent channel of communication we have today, and web sites change all the time. The web doesn’t have a memory, so a history of time is hard to construct,” said Xie. “Web archiving is about recording that memory.”

Project team member and Virginia Tech professor of computer science Ed Fox believes in providing individuals and libraries with the tools to better access and analyze the massive amount of information already archived.

“I view information as a fundamental need of humans,” said Fox, who also serves as the director for the Digital Library Research Laboratory. “The most visible information is what’s available over the World Wide Web, and over time, in its archive. This information is invaluable for researchers studying areas such as trends in elections, technology, and the environment.”

Memory institutions have collected and archived tens of petabytes of web content. All of the project collaborators are pioneers in web archiving technology and infrastructure: Xie, Fox, Martin Klein from Los Alamos National Laboratory, Michael Nelson from Old Dominion University, Justin Littman from George Washington University, Ian Milligan from the University of Waterloo, and Jefferson Bailey from the Internet Archive, a nonprofit archiving organization.

“Collectively, we have done a lot of work in creating tools for web archiving; we want to put our work to use and make an impact on society,” said Xie.

“By creating training materials for some of the most innovative and complex tools used in web archiving, it can help lower barriers for institutions wanting to run these technologies locally, either for collecting, or especially, for researcher and user support,” said Bailey, who serves as director of web archiving at the Internet Archive.

“Suites of open source tools are available to assist researchers conducting analyses and extracting knowledge,” said Xie. “However, these tools require the user to be proficient in big-data processing and analysis. Very few librarians or archivists have been trained to understand, utilize, maintain, and manage these tools.”

By the end of the project, the collaborators will provide a collection of educational resources, a series of in-person and online training workshops, and cyberinfrastructure, including source code, for deploying the tools that support the curriculum and workshops.

“The curriculum will include project-based learning because people learn better by doing,” said Fox. “During the training, participants will solve problems like they would face while helping patrons. The curriculum will be need-oriented as opposed to system or technology oriented. All of the training and tools will be free to the user.”

“By educating more people and organizations on the technologies of web archiving, the project can contribute to allowing more organizations to build collections of web-published materials,” said Bailey. “This benefits society by ensuring a greater portion of web-published historical documentation is preserved and accessible into the future.”

“Equipped with these skills, library and archive professionals will be able to go beyond their traditional role as information providers or pointers and form deeper alliances with researchers,” said Xie. “This will continue to transform libraries and archives from information repositories to knowledge producers.”
