
Data and Statistics

Resources for finding and using data and statistics. This guide focuses on data for the social sciences.

What is Microdata?

Microdata consists of the individual or weighted responses or observations from a survey, before any statistical analysis has been applied. Each record represents an individual, household, business, or other unit of observation, depending on the survey or study. Working at the record level lets researchers perform their own advanced statistical analysis.

What makes microdata important?

Data can be viewed and interpreted in many ways, and statistics are one of them. Statistics are aggregated data (data that has already been analyzed). Sometimes the statistics published for a dataset do not show what we need. Microdata lets us create our own statistics by analyzing the variables we are interested in, using the methods best suited to the research.
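As a rough illustration of creating our own statistics from microdata, the sketch below computes a weighted employment share by province. The records, variable names (province, employed, weight), and values are all invented for the example; a real file defines its own variables and weights in its codebook.

```python
# Hypothetical microdata: one record per respondent. The variable names
# and all values are invented for illustration.
records = [
    {"province": "BC", "employed": 1, "weight": 120.0},
    {"province": "BC", "employed": 0, "weight": 80.0},
    {"province": "AB", "employed": 1, "weight": 150.0},
    {"province": "AB", "employed": 1, "weight": 50.0},
    {"province": "AB", "employed": 0, "weight": 100.0},
]


def weighted_share(rows, variable, weight="weight"):
    """Return the weighted share of rows where `variable` equals 1."""
    total = sum(r[weight] for r in rows)
    hits = sum(r[weight] for r in rows if r[variable] == 1)
    return hits / total


def share_by_group(rows, group, variable):
    """Compute the weighted share separately for each value of `group`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[group], []).append(r)
    return {g: weighted_share(members, variable) for g, members in groups.items()}


employment_rate = share_by_group(records, "province", "employed")
print(employment_rate)
```

Because each record carries a weight, the share is computed from summed weights rather than from a simple count of records.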

The nature of microdata may require considerations for privacy and confidentiality. The data contains information about individuals, households, or businesses, including private information that should not be easily accessed. Many institutions that provide microdata make it available in multiple forms with different levels of confidentiality. Publicly accessible microdata may have variables suppressed, values randomly rounded, or other methods applied to protect privacy, while restricted microdata files have more detailed information and usually require an application process to access.
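To give a sense of what random rounding does, here is a minimal sketch of one common unbiased scheme: a count is moved to a neighbouring multiple of 5, rounding up with probability proportional to the remainder. The base of 5, the function name random_round, and the counts are assumptions for illustration; the exact method a given data provider uses may differ.

```python
import random


def random_round(count, base=5, rng=random):
    """Randomly round a non-negative count to an adjacent multiple of `base`.

    The count is rounded up with probability remainder / base, otherwise
    down, so the expected value of the result equals the original count.
    """
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base, left unchanged
    lower = count - remainder
    if rng.random() < remainder / base:
        return lower + base
    return lower


rng = random.Random(0)  # fixed seed so the illustration is repeatable
true_counts = [3, 12, 20, 47]  # invented counts
rounded = [random_round(c, rng=rng) for c in true_counts]
print(rounded)
```

Every released value is a multiple of 5 and differs from the true count by less than 5, which hides small counts while keeping totals approximately right on average.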

When to use Microdata

You may consider using microdata under the following conditions:

  • There is a need for descriptive statistics or aggregated data
  • There are no data tables or other statistics that fit that need
  • Data is needed for geographic areas with large populations (e.g. province-wide, national)

Working with Microdata

Institutions that provide microdata downloads also provide codebooks alongside the data files that can be used to interpret what is in the files. A codebook describes the structure, content, and layout of a dataset, including descriptions of the variables and variable values. Reading the codebook is a good way to determine if a dataset contains the desired data. 
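The way a codebook gives meaning to coded values can be sketched as follows. The variable names (SEX, AGEGRP), codes, and labels here are hypothetical and do not come from any particular survey.

```python
# Hypothetical codebook entries: variable name -> description and value labels.
# Real codebooks define their own names, codes, and missing-value conventions.
codebook = {
    "SEX": {
        "label": "Sex of respondent",
        "values": {"1": "Male", "2": "Female", "9": "Not stated"},
    },
    "AGEGRP": {
        "label": "Age group",
        "values": {"1": "15 to 24", "2": "25 to 54", "3": "55 and over"},
    },
}

raw_record = {"SEX": "2", "AGEGRP": "3"}  # invented coded record


def decode(record, codebook):
    """Replace variable names and codes with their codebook descriptions."""
    decoded = {}
    for variable, code in record.items():
        entry = codebook[variable]
        decoded[entry["label"]] = entry["values"].get(code, f"Unknown code {code}")
    return decoded


decoded = decode(raw_record, codebook)
print(decoded)
```

Without the codebook, the record is just the values "2" and "3"; with it, the same record reads as a female respondent aged 55 and over.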

At first glance, microdata might not look like anything useful if you aren't familiar with raw data. It may just look like a lot of random numbers with no context. To get meaning from the data, you need statistical analysis software or a programming language such as R or Python, used alongside the codebook. Microdata can be provided in multiple formats, including ASCII, SAS, SPSS, and Stata. Some data downloads also include syntax files. These are plain-text files, readable in any text editor, that contain commands for a statistical analysis program. Running the syntax file in the appropriate program performs operations on the data that can make it easier to work with.
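As a sketch of how raw ASCII microdata becomes usable, the example below splits fixed-width lines into named variables using column positions of the kind a codebook or syntax file supplies. The layout, variable names, sample lines, and the two implied decimal places in WEIGHT are all assumptions made for illustration.

```python
# Hypothetical record layout, as a codebook might give it:
# variable name -> (start column, end column), 1-based and inclusive.
layout = {
    "ID": (1, 4),
    "PROV": (5, 6),
    "AGE": (7, 8),
    "WEIGHT": (9, 14),
}

# Invented lines standing in for a fixed-width ASCII data file.
raw_lines = [
    "00015934012050",
    "00024827008575",
]


def parse_line(line, layout):
    """Slice one fixed-width line into named fields and convert numeric ones."""
    fields = {name: line[start - 1:end] for name, (start, end) in layout.items()}
    fields["AGE"] = int(fields["AGE"])
    # Assumption: WEIGHT is stored with two implied decimal places.
    fields["WEIGHT"] = int(fields["WEIGHT"]) / 100
    return fields


rows = [parse_line(line, layout) for line in raw_lines]
for row in rows:
    print(row)
```

A syntax file does essentially this job inside SAS, SPSS, or Stata: it tells the program where each variable starts and ends and how to interpret it.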

Accessing Software

Statistical software is available on workstations in the following locations at UBC:

Software covered: SAS, SPSS, Stata, RStudio

  • Koerner Library 497
  • Koerner Library 217
  • Koerner Library 218A
  • UBC Okanagan Library (all workstations)
  • Woodward Library B25

The Digital Scholarship Lab in Koerner Library 497 has some workstations that can be connected to remotely. More information on using these workstations is available on the Digital Scholarship Lab – computer workstations page.

Statistics Canada Microdata

Public Use Microdata Files (PUMFs) are one way Statistics Canada provides data from many of its surveys. These files are derived from master files with any potentially identifying information removed, so some variables may be suppressed due to privacy legislation. PUMFs are non-aggregated and allow researchers to perform their own analysis on survey responses. Many PUMFs are available for free through the Statistics Canada data page, and many can also be found in UBC's data repository, Abacus.

Statistics Canada provides an analytical platform called Rich Data Services (RDS) for exploring PUMFs. The RDS has an Explorer section and a Tabulation Engine that allow users to browse, interact with, and download data and metadata.

PUMFs exclude or group some variables to protect respondent confidentiality. Researchers who need the complete data may apply for access through the Research Data Centres (RDC) Program. RDCs provide secure access to Statistics Canada data that would not otherwise be available, including microdata Master Files from population and household surveys, administrative data holdings, and linked data. RDC research may be appropriate when:

  • The variables required are not included in the Public Use Microdata File (PUMF)
  • A PUMF for the dataset is not available
  • The analysis requires longitudinal or linked data

For the Master Files and variables available through the RDC Program, visit the official list of files available.

Note that access requires an application and approval process, which can take some time (weeks or even months), and that access is generally not available for undergraduate work. All analysis must be done within the secure environment of a Research Data Centre. The UBC RDC is located on the second floor of Koerner Library.

Real Time Remote Access (RTRA) allows researchers to get aggregated data from subsets of Statistics Canada's master files without requiring direct access to the microdata through a Research Data Centre. Users run SAS programs that extract results into a table, which makes it possible to calculate descriptive statistics at the national or provincial level. Due to concerns around respondent confidentiality, some variables are suppressed in the queried data. Any output produced through RTRA is released under the Statistics Canada Open Licence.

RTRA may be useful when:

  • There is a need for descriptive statistics or aggregated data
  • There is no published table that already meets the need
  • There is no Public Use Microdata File with the necessary data
  • The aggregated data is needed at the national or provincial level

Requests for RTRA data need to be submitted through the library. Please email the Data/GIS Analyst or Data Librarian for any inquiries about RTRA. Learn more from the Statistics Canada RTRA web page.

Microdata Sources

Canada

United States

International