Text and Data Mining (TDM)

General guidelines for text and data mining projects at UBC.

What is this guide for?

This guide provides general information on text and data mining (TDM) resources and on support for TDM activities at UBC Library. It also maintains an up-to-date list of library-licensed resources that allow TDM activity, as well as information on legal support.

What is text and data mining (TDM)?

TDM is a broad label for the bulk collection and analysis of a corpus of data. A corpus can be anything from the full text of a set of journal articles to public social media posts to census data. The work of text and data mining is to programmatically extract previously unseen relationships from the data.
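As a rough illustration of what "programmatically extracting relationships" can mean, here is a minimal sketch in Python that counts which words appear together across documents. The three-sentence corpus, the stopword list, and the function names are all made up for this example; a real project would work with a much larger licensed or open corpus and would likely use dedicated TDM tools.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus (an assumption for illustration only).
corpus = [
    "Salmon populations decline as river temperatures rise.",
    "Rising river temperatures affect salmon migration.",
    "Census data show population growth in coastal towns.",
]

# A tiny, hand-picked stopword list; real projects use fuller lists.
STOPWORDS = {"as", "in", "the", "a", "of"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cooccurrences(docs):
    """Count how many documents each unordered pair of words shares."""
    counts = Counter()
    for doc in docs:
        # Sort the unique words so each pair has one canonical ordering.
        words = sorted(set(tokenize(doc)))
        counts.update(combinations(words, 2))
    return counts


pair_counts = cooccurrences(corpus)

# Show the pairs that occur together in the most documents.
for pair, n in pair_counts.most_common(3):
    print(pair, n)
```

Pairs that recur across many documents, such as "river" and "salmon" in this toy corpus, hint at relationships that would be hard to spot by reading items one at a time. Co-occurrence counting is only one simple technique among many used in TDM.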

Getting help with TDM

The library can help get your project off the ground legally and safely. We can help negotiate licenses for access to resources, develop agreements with providers who don't normally allow TDM, consult on project planning and tool selection, and help train project members in the use of TDM tools.

To get in touch, reach out to the subject specialist librarian for your subject or schedule a consultation in the Research Commons. For legal questions, reach out to Michael Serebriakov in the Office of the University Counsel.

For more information about:

Resources that allow TDM

Some resources allow TDM by default. Even then, it is still important to consider legal restrictions, inappropriate use, ethics, and the method of data collection.

This spreadsheet lists resources licensed by UBC Library that allow TDM:

The UBC Library Open Collections are also accessible through a robust API that allows full-text download when the full text is available. Learn more here: