Want to suggest a website to add to our collections? Interested in developing a web archive collection for a research project? Contact us through our request form.
For other inquiries, please email us directly at digitization.centre@ubc.ca.
Suggestions from the UBC community are welcome. If you would like to propose a website for archiving, or if you are interested in partnering with us on a new collection or research project, please review our Collection Guidelines and complete our web archiving request form.
Crawl frequency depends on the type of content on a website and how often it is updated. Sites that are updated regularly, such as government news release pages, may be crawled daily or weekly. Sites that are updated less frequently may be crawled monthly, quarterly, or annually.
The Library collects web content that is publicly accessible; this may include pages with personal information.
The Library does not archive password-protected content, except with special permission from the website owner.
If you are the owner of a website that you do not wish to have included in the UBC Library's Web Archive Collections, please complete our takedown request form.
Please note that UBC Library can remove websites only from its own Web Archive Collections; this will not remove your website from the Internet Archive's Wayback Machine. Read here for information on how to remove your website from the Wayback Machine.
We make every effort to ensure that the archived version of a website represents the live site as closely as possible as of the day it is archived. However, web crawlers are limited in what they can capture, and it may not be possible to archive certain types of content.
In some instances, content may be successfully archived for preservation purposes but cannot be replicated or replayed for viewing in its original website format.
For more information on the technical limitations of web crawlers, please see Known Web Archiving Challenges – Archive-It Help Center. If you are uncertain whether a website's content can be archived, a Web Archiving Team member will review the site and its contents to assess the potential for a successful capture, and discuss alternative options if necessary.
A robots.txt file tells web crawlers which paths on a site they may or may not visit, either to keep automated requests from overloading the site or to prevent crawlers from capturing specific content. Because the file can instruct a crawler to avoid a certain path or link, some pages may be inaccessible to the archiving crawler, as the example below shows.
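For illustration, here is a minimal, hypothetical robots.txt file; the paths are invented for this example. A crawler requests this file before fetching anything else on the site and skips every path it is told to avoid:

```
# Hypothetical robots.txt (all paths invented for this example)
User-agent: *        # these rules apply to all crawlers
Disallow: /drafts/   # do not crawl anything under /drafts/
Disallow: /search    # do not crawl search-results pages
Crawl-delay: 10      # non-standard but widely honoured: pause 10 seconds between requests
```

Well-behaved crawlers honour these rules, so in this example pages under /drafts/ would be missing from the archived copy even though they are visible to human visitors.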
The Purdue Online Writing Lab (OWL) has a helpful overview and citation generator that you can use. In general, you should cite an archived webpage the same way you would cite a live webpage, but use the Wayback Machine link in place of the live URL. For example:
APA: Lytton. (2021, July 2). Village of Lytton. https://wayback.archive-it.org/17076/*/https://lytton.ca/
MLA: Angus Reid Institute. “Blame, bullying and disrespect: Chinese Canadians reveal their experiences with racism during COVID-19.” Angus Reid Institute, 24 June 2020. https://wayback.archive-it.org/14208/*/http://angusreid.org/racism-chinese-canadians-covid19/
The way your website is designed can prevent a web crawler from archiving its content. A few basic design choices, made when the site is built, can greatly increase the likelihood that it can be archived in its entirety; the resources below offer detailed guidance, and a brief example follows the list.
Five tips for designing preservable websites | Smithsonian Institution Archives
Guidelines to make websites archivable | European Commission
Designing preservable websites, redux | Library of Congress
How to make websites more archivable | British Library
How to make your website technically compliant | The National Archives (UK)
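As a simple, hypothetical illustration of the advice in these guides (the URL and function name below are invented), the first link uses a standard href that a crawler can discover and follow, while the second computes its destination in a script at click time, leaving the crawler no URL to capture:

```html
<!-- Archivable: a standard anchor with a real href.
     A crawler can discover /about.html and capture it. -->
<a href="/about.html">About us</a>

<!-- Hard to archive: the destination exists only after the
     script runs, so a crawler sees no URL to follow. -->
<span onclick="window.location = getNextPage()">About us</span>
```

Static, crawlable links like the first example are one of the recurring recommendations in the resources listed above.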