
Data and Statistics

Resources for finding and using data and statistics. This guide focuses on data for the social sciences.

Citing data

The most important thing to remember is that you want your citation to include enough information so that a reader could find the same dataset again in the future, even if the link you provide no longer works. It's necessary to include a mixture of general and specific information to help them be certain that they've found the same dataset that you were referring to.

Citing data has not always been standard practice, especially for data you have collected yourself, but as data becomes more widely shared, proper attribution is increasingly important. Citing datasets helps them become part of the scholarly record and gives proper credit to the creator of the dataset.



Citation elements

Most of the time, the information you need for a citation is not included in the dataset itself, but is located in the item record of the data repository.

Many data repositories provide information about how to cite their products - look closely to see if you can find anything. This is your best bet for relevant information, as the structure of repositories and how they display different elements varies widely.

The most important elements to include are:

Author/Creator - This could either be the personal name of the researcher, or the institution that collected the data.

Title - Include the full title as it appears in the record for the dataset, including table or catalogue numbers if they are provided. If there is more than one title and you want to cite a part within a whole (such as a series within a table), include both titles, just as you would for an article within a journal or a chapter within a book.

Publication date - Most datasets should include some kind of publication date, even if it is hard to find.

Identifier and/or Link - Most published datasets should have some sort of a unique identifier, most commonly a DOI or a URI. This is the most reliable way to identify a particular resource. Many dataset providers will include a permanent URL in addition to or instead of a unique identifier. Link the DOI to the data source if you are working digitally, or include the URL in print.

Other elements that may be good to include:

Edition or Version - This may help to identify your dataset if it is one that undergoes continuous changes.

Resource Type - Include this if the style you are using normally includes a resource type.

Publisher - This could be the repository where it's located, or whoever has verified the data.
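
As an illustration of how these elements fit together, here is a minimal Python sketch that assembles them into a single citation string. The class name, function names, element order, and punctuation are all assumptions made for this example, and the sample values are invented; the arrangement is only one possible layout and is not taken from any particular style manual.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    # Core elements described above
    author: str
    title: str
    year: str
    identifier: str  # DOI or permanent URL
    # Optional elements
    version: Optional[str] = None
    resource_type: Optional[str] = None
    publisher: Optional[str] = None


def doi_to_link(identifier: str) -> str:
    """Turn a bare DOI (which starts with '10.') into a https://doi.org/ link."""
    if identifier.startswith("10."):
        return "https://doi.org/" + identifier
    return identifier  # already a URL, or some other kind of identifier


def format_citation(c: DatasetCitation) -> str:
    """Arrange the elements in one possible order (an assumption, not a style rule)."""
    title = c.title
    if c.version:
        title += f" (Version {c.version})"
    if c.resource_type:
        title += f" [{c.resource_type}]"
    parts = [f"{c.author} ({c.year}).", f"{title}."]
    if c.publisher:
        parts.append(f"{c.publisher}.")
    parts.append(doi_to_link(c.identifier))
    return " ".join(parts)


# Hypothetical record: none of these values refer to a real dataset.
example = DatasetCitation(
    author="Example Research Group",
    title="Sample Household Survey",
    year="2020",
    identifier="10.1234/example",
    version="2",
    resource_type="Data set",
    publisher="Example Data Repository",
)
print(format_citation(example))
```

Keeping the elements in separate fields, rather than typing the citation as one string, makes it easier to rearrange them later for a different style.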

Consult the style manual for your discipline to see how to correctly cite data.

Statistics Canada has a How to Cite Statistics Canada Products guide with examples using APA format. There is also an archived reference building tool that can help you identify which elements to include for a wide range of data and statistics products.



Citation styles

Once you've tracked down all the right elements, you'll need to put them together by using the appropriate style guidelines, consistent with the rest of your citation list.

In order to create a citation for a dataset, you'll need the same basic pieces of information as you would for any other citation. As described in the UBC library's general How to Cite guide, these are found by asking Who/What/When/Where about the item you are citing.

You can view data citation examples for multiple style manuals below:

For other styles, you will need to arrange the elements in the same way as other resources with similar elements. Think about the similarities between the elements you have and the ones in more common resources (for example, a repository may be like a publisher, and a series within a table may be like a chapter within a book). This can help you to build out your citation even if you don't have a specific example to model.

Links to style guide resources can be found on the library's How to Cite guide.




I can't find all the elements I need:

The different elements needed for the citation may be hard to find, depending on the source of your dataset. The information usually provided about datasets is not as standardized as it is for books and articles, which can make things confusing.

  • Many data and statistics repositories collect datasets from a number of different agencies and providers. If you're unable to find enough information about the dataset in the repository, tracking down the dataset where it was originally published may turn up additional information.
  • Many data providers will offer their own guidelines for citing their datasets. This can help to decode some of the language used to describe particular elements. Sometimes these guidelines are general for the whole site, and sometimes they will be linked directly from the dataset record itself.
  • Do your best to include as many elements as possible and keep your data citation consistent with the rest of your list. If some key element is unavailable, try to make sure there is still enough information so that someone else can find it.
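
If you keep your notes about a dataset in a structured form, a short script can flag which elements you still need to track down. The sketch below is only an illustration: the field names and the split into required and optional elements are assumptions based on the element list in this guide, and the sample record is invented.

```python
# Field names are assumptions for this sketch, based on the elements listed in this guide.
REQUIRED = ("author", "title", "publication_date", "identifier")
OPTIONAL = ("version", "resource_type", "publisher")


def missing_elements(record: dict) -> dict:
    """Return the required and optional elements that are absent or blank."""

    def absent(key: str) -> bool:
        return not str(record.get(key) or "").strip()

    return {
        "required": [k for k in REQUIRED if absent(k)],
        "optional": [k for k in OPTIONAL if absent(k)],
    }


# Hypothetical, incomplete record copied from a repository page.
record = {
    "author": "Example Statistical Agency",
    "title": "Sample Table",
    "identifier": "",  # no DOI or permanent URL found yet
}
result = missing_elements(record)
print("Still needed:", result["required"])
print("Nice to have:", result["optional"])
```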

The data I'm using comes from multiple sources:

Sometimes a dataset draws information from multiple sources at once, which can make it confusing to cite. This is particularly common when creating charts and tables, whether you are making them yourself or using online tools built in by the data providers.

You must cite ALL the sources of your data.

  • If you are combining data from several series from the same provider, cite all the series (e.g., combined series from Statistics Canada).
  • If you are combining data from several different providers, cite all the sources (e.g., a table you've made comparing trade data from Industry Canada to employment data from Statistics Canada).
  • If you're including a table or graph in your paper which combines data from multiple sources, include a note describing which data elements came from where, with in-text citations. Give each source an entry in your reference list.
  • If you're not including a table or graph, provide this same information in the text of your paper.
  • If your data is drawing from so many sources that citing each source in the traditional manner is unreasonable, see the section on "Microattribution" in the Digital Curation Centre guide.
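
For a table or graph that combines sources, the note described above can be generated from a simple mapping of each source to the data elements it supplied. This is a hedged sketch only: the function name, the note wording, and the column names are assumptions, and "YEAR" is a placeholder for the publication date of whichever dataset you actually used.

```python
def source_note(columns_by_source: dict) -> str:
    """Build a table note listing each in-text citation with the data elements it supplied."""
    parts = [
        f"{citation} [{', '.join(columns)}]"
        for citation, columns in columns_by_source.items()
    ]
    return "Sources: " + "; ".join(parts) + "."


# Placeholder in-text citations; replace YEAR with the real publication dates.
note = source_note(
    {
        "Industry Canada (YEAR)": ["exports", "imports"],
        "Statistics Canada (YEAR)": ["employment"],
    }
)
print(note)
```

Each citation named in the note still needs its own full entry in your reference list.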