Paul Lesack, Data/GIS Analyst
UBC Library Research Commons, Koerner Library
(604) 822-5587
paul.lesack@ubc.ca
Sophia Papandonatou, Data Librarian
UBC Library Research Commons, Koerner Library
(604) 827-3510
sophia.papandonatou@ubc.ca
Mathew Vis-Dunbar, Data & Digital Scholarship Librarian
UBC Library Okanagan Campus
(250) 807-9861
mathew.vis-dunbar@ubc.ca
The most important thing to remember is that you want your citation to include enough information so that a reader could find the same dataset again in the future, even if the link you provide no longer works. It's necessary to include a mixture of general and specific information to help them be certain that they've found the same dataset that you were referring to.
Citing data has not always been standard practice, especially for data you have collected yourself, but as data becomes more widely shared, proper attribution is increasingly important. Citing datasets helps them become part of the scholarly record and gives proper credit to the creator of the dataset.
The information you need for a citation usually isn't included in the dataset itself; it is more often located in the item record of the data repository.
Many data repositories provide information about how to cite their products - look closely to see if you can find anything. This is your best bet for relevant information, as the structure of repositories and how they display different elements varies widely.
Author/Creator - This could either be the personal name of the researcher, or the institution that collected the data.
Title - Include the full title as it appears in the record for the dataset, including table or catalogue numbers if they are provided. If there is more than one title and you want to cite a part within a whole (such as a series within a table), you can include both titles in the same way that you would for other parts within a whole, such as an article within a journal or a chapter within a book.
Publication date - Most datasets should include some kind of publication date, even if it is hard to find.
Identifier and/or Link - Most published datasets should have some sort of a unique identifier, most commonly a DOI or a URI. This is the most reliable way to identify a particular resource. Many dataset providers will include a permanent URL in addition to or instead of a unique identifier. Link the DOI to the data source if you are working digitally, or include the URL in print.
Edition or Version - This may help to identify your dataset if it is one that undergoes continuous changes.
Resource Type - Include this if the style you are using normally includes a resource type.
Publisher - This could be the repository where it's located, or whoever has verified the data.
Consult the style manual for your discipline to see how to correctly cite data.
Statistics Canada has a How to Cite Statistics Canada Products guide with examples using APA format. There is also an archived reference building tool that can help you identify which elements to include for a wide range of data and statistics products.
Once you've tracked down all the right elements, you'll need to put them together by using the appropriate style guidelines, consistent with the rest of your citation list.
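As a rough illustration of putting the elements together, the sketch below arranges them in an APA-like order (creator, date, title with version and resource type, publisher, identifier). The class and function names are hypothetical, the sample values are placeholders rather than a real dataset, and the exact punctuation is an assumption; plain text also cannot show formatting such as italics, so always check the result against your style manual.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Elements of a dataset citation (hypothetical helper, not a library API)."""
    creator: str                                 # person or institution
    year: str                                    # publication date as given in the record
    title: str                                   # full title from the dataset record
    publisher: str                               # e.g. the repository holding the data
    identifier: str                              # DOI or permanent URL
    version: Optional[str] = None                # edition or version, if any
    resource_type: Optional[str] = "Data set"    # omit if your style has none


def doi_to_link(identifier: str) -> str:
    """Turn a bare DOI into a resolvable link; leave full URLs unchanged."""
    identifier = identifier.strip()
    if identifier.lower().startswith("doi:"):
        identifier = identifier[4:].strip()
    if identifier.startswith("10."):
        return "https://doi.org/" + identifier
    return identifier


def format_apa_like(c: DatasetCitation) -> str:
    """Arrange the elements in an APA-like order (assumed layout, verify before use)."""
    title = c.title
    if c.version:
        title += f" (Version {c.version})"
    if c.resource_type:
        title += f" [{c.resource_type}]"
    creator = c.creator if c.creator.endswith(".") else c.creator + "."
    return f"{creator} ({c.year}). {title}. {c.publisher}. {doi_to_link(c.identifier)}"


# Placeholder values only; substitute the details from your dataset's record.
example = DatasetCitation(
    creator="Example Research Group",
    year="2020",
    title="Placeholder survey data",
    publisher="Example Repository",
    identifier="10.1234/placeholder",
    version="2",
)
print(format_apa_like(example))
```

Keeping the elements separate from the formatting step means the same record can be rearranged for another style by writing a second formatting function, rather than retyping the citation.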
To create a citation for a dataset, you'll need the same basic pieces of information as you would for any other citation. As described in the UBC Library's general How to Cite guide, these are found by asking Who/What/When/Where about the item you are citing.
You can view data citation examples for multiple style manuals below:
For other styles, you will need to arrange the elements in the same way as other resources with similar elements. Think about the similarities between the elements you have and the ones in more common resources (e.g., a repository may be like a publisher, and a series within a table may be like a chapter within a book). This can help you to build your citation even if you don't have a specific example to model.
Links to style guide resources can be found on the library's How to Cite guide.
The different elements needed for the citation may be hard to find, depending on the source of your dataset. The information usually provided about datasets is not as standardized as it is for books and articles, which can make things confusing.
Sometimes a dataset draws information from multiple sources at once, which makes it harder to cite. This is particularly common with charts and tables, whether you make them yourself or use online tools built in by the data providers.
You must cite ALL the sources of your data.