Data and Statistics

In common speech, "data" and "statistics" are often used interchangeably, but in research they mean different things.

Statistics

Statistics are data that have already been aggregated. Tables (such as years of education versus mean income), bar charts, pie charts, and statements like "20% of people don't understand statistics" are all statistics. Or, to put it a different way, data turns into statistics once the analysis has been completed.

Data

Statistics don't arrive fully formed out of thin air. Their precursor is data: the raw information used to create them. Data can take many forms, but commonly encountered types include:

Raw machine-readable data, such as a 100,000-line log of experimental interactions
Survey data, or the answers each studied unit (such as an individual) gave to every question in the survey
Environmental data, such as temperature and rainfall readings from a particular weather station over the past 100 years

These items are largely impenetrable until they have been analyzed in some way. So, data is the precursor to statistics.
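As a rough illustration of data turning into a statistic, the sketch below aggregates a handful of raw records into mean income by years of education. The records, the field names (years_of_education, income), and the function name are invented for this example and do not come from any real survey.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: one record per person. All values are invented.
records = [
    {"years_of_education": 12, "income": 40000},
    {"years_of_education": 12, "income": 44000},
    {"years_of_education": 16, "income": 60000},
    {"years_of_education": 16, "income": 70000},
    {"years_of_education": 18, "income": 80000},
]


def mean_income_by_education(rows):
    """Aggregate raw records into a statistic: mean income per education level."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["years_of_education"]].append(row["income"])
    # Sort by education level so the summary reads like a small table.
    return {years: mean(incomes) for years, incomes in sorted(groups.items())}


summary = mean_income_by_education(records)
for years, average in summary.items():
    print(f"{years} years of education: mean income {average}")
```

The list of records is the data; the three-row summary it produces is the statistic.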

Singular or plural?

"Data" can be used as a collective noun, so "data is" is perfectly acceptable. "Datum" is the singular of "data", but is less commonly used. You may see "data is" in this guide. If this bothers you, you can create an agendum to discuss it.

Survey data

Surveys conducted by government organizations, private or not-for-profit organizations, or academic researchers are a common source of data for research. Each piece of data collected about a survey respondent is referred to as a variable, and each respondent’s set of responses is referred to as a case. Surveys may be cross-sectional (a snapshot of the respondents at one point in time) or longitudinal (repeated observations of the same respondents over a period of time).
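One way to picture these terms is as a table in which each row is a case and each column is a variable. The sketch below uses invented respondents and hypothetical variable names (respondent_id, age, province, wave, employed) to contrast a cross-sectional layout with a longitudinal one; real survey files will be structured according to their own documentation.

```python
# Hypothetical cross-sectional survey: each dict is one case (one respondent),
# and each key is one variable. Every respondent appears exactly once.
cross_sectional = [
    {"respondent_id": 1, "age": 34, "province": "BC"},
    {"respondent_id": 2, "age": 51, "province": "ON"},
]

# Hypothetical longitudinal survey: the same respondents are observed
# again in later waves, so each respondent appears more than once.
longitudinal = [
    {"respondent_id": 1, "wave": 2020, "employed": True},
    {"respondent_id": 1, "wave": 2022, "employed": False},
    {"respondent_id": 2, "wave": 2020, "employed": True},
    {"respondent_id": 2, "wave": 2022, "employed": True},
]


def variables(rows):
    """Return the sorted names of all variables that appear in the rows."""
    return sorted({name for row in rows for name in row})


def observations_per_respondent(rows):
    """Count how many times each respondent appears in the rows."""
    counts = {}
    for row in rows:
        respondent = row["respondent_id"]
        counts[respondent] = counts.get(respondent, 0) + 1
    return counts


print(variables(cross_sectional))
print(observations_per_respondent(cross_sectional))
print(observations_per_respondent(longitudinal))
```

In the cross-sectional file every respondent is counted once; in the longitudinal file the same respondent identifiers recur across waves.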

Statistics Canada is the primary source of Canadian survey data, which is available in aggregate or microdata forms:

Aggregate data is often presented in tables, reports, or maps that provide a thematic overview or analysis of the survey results. The Statistics Canada Data page is a good source of aggregate data.

Microdata consists of the individual survey responses. Statistics Canada provides data from many of its surveys in Public Use Microdata Files (PUMFs). Some variables are excluded or grouped in PUMFs to protect respondent confidentiality.* UBC's Abacus data repository is a good place to find Statistics Canada PUMFs.
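To give a sense of what excluding or grouping a variable can mean, the sketch below drops an identifying field and replaces an exact age with a ten-year band. This is only an illustration under assumed rules: the record, the field names, the band width, and the functions age_band and to_public_use are all hypothetical, and it is not a description of how Statistics Canada actually prepares PUMFs.

```python
def age_band(age, width=10):
    """Map an exact age to a band label, for example 34 to '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def to_public_use(record, drop=("name",)):
    """Return a copy of the record with identifying detail reduced.

    Fields listed in `drop` are excluded, and the exact age is
    replaced by a grouped age_group variable.
    """
    public = {
        key: value
        for key, value in record.items()
        if key not in drop and key != "age"
    }
    public["age_group"] = age_band(record["age"])
    return public


# Hypothetical detailed record; the values are invented.
detailed_record = {"name": "A. Example", "age": 34, "province": "BC"}
public_record = to_public_use(detailed_record)
print(public_record)
```

The public version still supports analysis by province and age group, but it no longer carries the name or the exact age.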

Survey data is accompanied by documentation to help you interpret it. Consult the user's guide, data dictionary, and other documentation for information about survey methodology and variables.

* Researchers who need more detail than the PUMFs provide may apply for access to Statistics Canada Master Files (access is not generally available for undergraduate work). See Research Data Centres for more information.