Want to suggest a website to add to our collections? Interested in developing a web archive collection for a research project? Contact us through our request form.
For other inquiries, please email us directly at digitization.centre@ubc.ca.
Suggestions from the UBC community are welcome. If you would like to propose a website for archiving, or if you are interested in partnering with us on a new collection or research project, please review our Collection Guidelines and complete our web archiving request form.
Crawl frequency depends on the type of content on a website and how often it is updated. Sites that are updated regularly, such as government news release pages, may be crawled daily or weekly. Sites that are updated less frequently may be crawled monthly, quarterly, or annually.
The Library collects web content that is publicly accessible; this may include pages with personal information.
The Library does not archive password-protected content, except with special permission from the website owner.
If you are the owner of a website that you do not wish to have included in the UBC Library's Web Archive Collections, please complete our takedown request form.
Please note that UBC Library can remove websites only from its own Web Archive Collections; this will not remove your website from the Internet Archive's Wayback Machine. Read here for information on how to remove your website from the Wayback Machine.
We make every effort to ensure that the archived version of a website represents the live site as closely as possible as of the day it is archived. However, web crawlers are limited in what they can capture, and it may not be possible to archive certain types of content.
In some instances, content may be successfully archived for preservation purposes but cannot be replicated or replayed for viewing in its original website format.
For more information on the technical limitations of web crawlers, please see Known Web Archiving Challenges – Archive-It Help Center. If you are uncertain whether a website's content can be archived, a Web Archiving Team member will review the site and its contents to assess the potential for a successful capture, and discuss alternative options if necessary.
A robots.txt file tells web crawlers which paths on a site they may or may not visit, either to keep automated requests from overloading the site or to prevent crawlers from capturing specific content. Because the file can instruct a crawler to avoid a certain path or link, some pages may be inaccessible to the archiving crawler, as the example below shows.
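For illustration, here is a minimal, hypothetical robots.txt file; the paths are invented for this example. A crawler requests this file before fetching anything else on the site and skips every path it is told to avoid:

```
# Hypothetical robots.txt (all paths invented for this example)
User-agent: *        # these rules apply to all crawlers
Disallow: /drafts/   # do not crawl anything under /drafts/
Disallow: /search    # do not crawl search-results pages
Crawl-delay: 10      # non-standard but widely honoured: pause 10 seconds between requests
```

Well-behaved crawlers honour these rules, so in this example pages under /drafts/ would be missing from the archived copy even though they are visible to human visitors.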
The Purdue Online Writing Lab (OWL) has a helpful overview and citation generator that you can use. In general, you should cite an archived webpage the same way you would cite a live webpage, but use the Wayback Machine link in place of the live URL. For example:
APA: Lytton. (2021, July 2). Village of Lytton. https://wayback.archive-it.org/17076/*/https://lytton.ca/
MLA: Angus Reid Institute. “Blame, bullying and disrespect: Chinese Canadians reveal their experiences with racism during COVID-19.” Angus Reid Institute, 24 June 2020. https://wayback.archive-it.org/14208/*/http://angusreid.org/racism-chinese-canadians-covid19/
The way your website is designed can prevent a web crawler from archiving its content. A few basic design choices, made when the site is built, can greatly increase the likelihood that it can be archived in its entirety; the resources below offer detailed guidance, and a brief example follows the list.
Five tips for designing preservable websites | Smithsonian Institution Archives
Guidelines to make websites archivable | European Commission
Designing preservable websites, redux | Library of Congress
How to make websites more archivable | British Library
How to make your website technically compliant | The National Archives (UK)
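As a simple, hypothetical illustration of the advice in these guides (the URL and function name below are invented), the first link uses a standard href that a crawler can discover and follow, while the second computes its destination in a script at click time, leaving the crawler no URL to capture:

```html
<!-- Archivable: a standard anchor with a real href.
     A crawler can discover /about.html and capture it. -->
<a href="/about.html">About us</a>

<!-- Hard to archive: the destination exists only after the
     script runs, so a crawler sees no URL to follow. -->
<span onclick="window.location = getNextPage()">About us</span>
```

Static, crawlable links like the first example are one of the recurring recommendations in the resources listed above.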