Research Guides: Library, Archival, and Information Studies: Metadata

Metadata Basics

This page covers metadata and its practical application. It introduces metadata, the types of metadata you may encounter, and how to use them.

This guide is intended for use by students and practitioners interested in learning more about metadata and its application.

Definition:

What is metadata?

The most common definition of metadata is 'data about data': summary information about something like a book, movie, or car. Metadata about a book could include the title, author, and number of pages; a movie's metadata would include the title, lead actors, and director; a car's metadata could show the make, model, and horsepower. Metadata is everywhere, and can be very powerful!

Metadata consists of statements about an object, each made up of a field and its associated data (like Title: Metadata). It structures and contextualizes data in a predictable, consistent way suited to the data's intended use, which can improve the discoverability of a resource.

It is often easier to consider metadata using an example. Take, for instance, the catalogue record of a book located in the UBC Library stacks.

The Location shows which library on campus holds the book. The Call Number shows where the book is located on the shelves, using the Library of Congress (LOC) classification system. Because the LOC classification system is itself a metadata classification system, the sections of the call number also convey information about the book.

Metadata describes many aspects of an object. If you think about a book, metadata covers information like the title, author, publication date, etc. However, it can also include how the book is organized (chapters, sub-headings within chapters, number of pages), how often the book has been checked out from the library, where it came from and whether there are newer editions, and physical properties that inform how the book should be preserved.

There are five main categories of metadata, each serving a different purpose. Note that these categories apply to the individual elements making up a record, and many records combine several categories of metadata to describe the resource comprehensively.

Descriptive

As it sounds, descriptive metadata describes an object. This is a common form of metadata, useful for resource discovery through Internet searches or the library catalog. It provides a description of the resource, like a book’s title, author, and ISBN.

Administrative

This category of metadata provides information about the origin and maintenance of a resource. This can include copyright information, the publisher, and, for objects, a model number.

Structural

This refers to data about an object's organizational structure. For example, it might identify a book’s chapters, the file format of an image, or the dimensions of an atlas.

Preservation

Preservation metadata identifies information about the object to aid in its preservation, ensuring the longevity of a resource. This might include the type of paper used in a book or the file format of a digital image that could become obsolete as new technology emerges.

Use

Use metadata provides data on how a resource has been used. In a library, this could show how many times a book has been checked out or the number of downloads for a PDF.

Element

An element defines the field that is assigned data; it is the category of information.

Examples of elements that may be required for a descriptive metadata schema for books in a library:

Title
Author
Subject Headings

Value

The data related to an element, or the information that corresponds to the category. Using the above elements, examples of values can include:

One Hundred Years of Solitude
Gabriel Garcia Marquez
Macondo (Imaginary place)--Fiction

Element-Value Pair

The combined element (category) and value (data) comprising a piece of metadata for an object.

An example could be:

Element	Value
Title	One Hundred Years of Solitude
Author	Gabriel Garcia Marquez
Subject Headings	Macondo (Imaginary place)--Fiction; Latin America--Social conditions--Fiction

The values from this example can be found on the UBC Library page for this book here: https://webcat.library.ubc.ca/vwebv/holdingsInfo?bibId=2704629
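To make these terms concrete, the element-value pairs above can be held in a very simple data structure. The sketch below assumes that a record is just an ordered list of (element, value) pairs and that a repeatable element, like Subject Headings, appears once per value (so the two headings in the table become two pairs). The function name `values_for` is hypothetical.

```python
# A record modelled as an ordered list of (element, value) pairs.
# Repeatable elements, such as subject headings, simply appear more than once.
record = [
    ("Title", "One Hundred Years of Solitude"),
    ("Author", "Gabriel Garcia Marquez"),
    ("Subject Headings", "Macondo (Imaginary place)--Fiction"),
    ("Subject Headings", "Latin America--Social conditions--Fiction"),
]


def values_for(record, element):
    """Return every value paired with the given element, in record order."""
    return [value for name, value in record if name == element]


for element, value in record:
    print(f"{element}: {value}")

print(values_for(record, "Subject Headings"))
```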

Record

The complete set of metadata statements about an object, organized according to a predetermined schema. Each record describes a single object.

Example: The record for The Man Who Mistook His Wife for a Hat and Other Clinical Tales by Oliver Sacks, in MARC 21 format

(MARC 21 is described in greater detail under the “Metadata Examples” section)

Schema

A metadata schema establishes the rules about what statements can be made about a resource and how they can be made. It determines the structure of the data, which fields must be included, and how the information is organized for consistency. These rules depend on the purpose of the metadata itself.

Every schema defines its own elements: while the data might be the same, the specific name for each element can differ (like 'Author' in one schema and 'Creator' in another). A schema may also specify the order of elements and, in many cases, recommend how values are standardized (using the ISO 8601 date format for the publication date, for example).

Example: Dublin Core (explained in greater detail under the “Metadata Examples” section)
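A schema's rules can be thought of as a checklist that each record is validated against. The sketch below assumes two hypothetical schemas that require the same information but name the author element differently ('Author' versus 'Creator'), and checks whether a record supplies every required element. The schema definitions and the `missing_elements` function are illustrative only, not taken from any published standard.

```python
# Two hypothetical schemas that require the same information under different names.
SCHEMA_A = {"required": ["Title", "Author"]}
SCHEMA_B = {"required": ["Title", "Creator"]}


def missing_elements(record, schema):
    """List the schema's required elements that the record does not supply."""
    return [element for element in schema["required"] if not record.get(element)]


record = {
    "Title": "One Hundred Years of Solitude",
    "Author": "Gabriel Garcia Marquez",
}

print(missing_elements(record, SCHEMA_A))  # []
print(missing_elements(record, SCHEMA_B))  # ['Creator']
```

The same data passes one schema and fails the other purely because of element naming, which is why records usually have to be mapped or converted when they move between schemas.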

Why Standardize?

Standardizing your metadata values ensures consistency in how information is presented, and the standard you choose can also serve as an authoritative source to consult when creating future entries.

Syntax Encoding Scheme

A controlled standard for representing a value, such as calendar dates, character strings, integers, etc. Using a syntax encoding scheme ensures that values in the element-value pairs of metadata are presented consistently throughout a collection of records. It also removes ambiguity, because the data can be read only one way.

Example: International Organization for Standardization (ISO) 8601 – provides a format for representing dates

Any new date/time data added using a standardized encoding scheme, like ISO 8601, will be consistent.
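As a small illustration of a syntax encoding scheme at work, the sketch below normalizes dates entered in several inconsistent styles into ISO 8601 (YYYY-MM-DD) using Python's standard `datetime` module. The list of accepted input patterns is an assumption for this example; note that "05/03/2021" is read here as day/month/year, which is exactly the kind of ambiguity that storing dates in ISO 8601 removes.

```python
from datetime import datetime

# Input patterns this sketch accepts; chosen for illustration only.
# "%d/%m/%Y" assumes day-first dates such as 05/03/2021 = 5 March 2021.
INPUT_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y", "%d %b %Y"]


def to_iso8601(raw):
    """Convert a date string in any accepted pattern to ISO 8601 (YYYY-MM-DD)."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # this pattern did not match; try the next one
    raise ValueError(f"Unrecognized date: {raw!r}")


for raw in ["March 5, 2021", "05/03/2021", "5 Mar 2021", "2021-03-05"]:
    print(to_iso8601(raw))  # each prints 2021-03-05
```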

Controlled Vocabularies

Like a syntax encoding scheme, a controlled vocabulary dictates which values can be assigned to a metadata element using a finite list of options. In academic libraries, this may be the Library of Congress Subject Headings, which provides a list of subject headings that can be selected to categorize a resource’s subjects.

Controlled vocabularies are also used in some academic and research databases, as they establish the preferred term for a concept.

Example: Library of Congress Subject Headings (LCSH)

An example of the LCSH – Children’s Headings:

These are some possible subject headings that can be selected for a record. These are often used in libraries to provide a brief description of what a book is about.

Other controlled vocabularies include:

Medical Subject Headings (MeSH) - used for medical resources

Homosaurus - an LGBTQ+-specific vocabulary allowing for increased discoverability of LGBTQ+ resources
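Because a controlled vocabulary is a finite list, checking a value against it amounts to a membership test. In the sketch below, the "vocabulary" contains only the two subject headings from the One Hundred Years of Solitude record earlier in this guide; it is a stand-in for illustration, since a real vocabulary such as LCSH contains a vast number of headings. The `check_subjects` function is hypothetical.

```python
# A tiny stand-in for a controlled vocabulary: the only values the element may take.
SUBJECT_VOCABULARY = {
    "Macondo (Imaginary place)--Fiction",
    "Latin America--Social conditions--Fiction",
}


def check_subjects(values, vocabulary=SUBJECT_VOCABULARY):
    """Split values into those on the controlled list and those that are not."""
    accepted = [v for v in values if v in vocabulary]
    rejected = [v for v in values if v not in vocabulary]
    return accepted, rejected


accepted, rejected = check_subjects([
    "Macondo (Imaginary place)--Fiction",
    "books about a family in a made-up town",  # free text, not a controlled term
])
print(accepted)
print(rejected)
```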

Thesaurus

Unlike the everyday use of “thesaurus” to mean a list of a word’s synonyms and antonyms, in resource description a thesaurus is a type of controlled vocabulary that allows different levels of specificity to be applied to a concept or definition. It includes lead-in terms, which may be commonly used outside the discipline and link to the preferred term; broader terms, forming the umbrella under which a term falls; narrower terms, the more specific categories within a term; and related terms that can be used for similar descriptions.

Example: The American Psychological Association’s Thesaurus of Psychological Index Terms

An example of the APA Thesaurus structure:

A search for the term "Justice" returns multiple options to choose from.

Within the term Social Justice, there is information on when the term was introduced to the thesaurus (2006) and its definition. The Broader Terms, Justice and Social Processes, are the larger umbrella terms under which Social Justice can be found. The Narrower Term, Distributive Justice, falls under Social Justice. There are additional Related Terms that may also be selected depending on the object described.
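The relationships described for Social Justice can be modelled as a small graph of terms. The sketch below encodes only what this example names (the year introduced, the broader terms Justice and Social Processes, and the narrower term Distributive Justice). The lead-in term "Fairness" is a hypothetical entry added to show how a non-preferred term redirects to the preferred one; it is not taken from the APA Thesaurus.

```python
# Thesaurus relationships for one term, using the relationships named above.
THESAURUS = {
    "Social Justice": {
        "year_introduced": 2006,
        "broader": ["Justice", "Social Processes"],
        "narrower": ["Distributive Justice"],
        "related": [],  # related terms omitted from this sketch
    },
}
# Hypothetical lead-in (non-preferred) term pointing to its preferred term.
LEAD_IN_TERMS = {"Fairness": "Social Justice"}


def preferred_term(term):
    """Follow a lead-in term to its preferred term; preferred terms map to themselves."""
    return LEAD_IN_TERMS.get(term, term)


def describe(term):
    preferred = preferred_term(term)
    entry = THESAURUS[preferred]
    return (f"{preferred} (added {entry['year_introduced']}); "
            f"broader: {', '.join(entry['broader'])}; "
            f"narrower: {', '.join(entry['narrower'])}")


print(describe("Fairness"))
```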

Other thesauri include:

Library of Congress Genre/Form Terms - provides terms for describing the genres and forms of resources

Art & Architecture Thesaurus - developed to improve discoverability and access to resources around art, architecture, and other cultural materials

Thesaurus of Geographic Names - terms for geographic areas, developed by the Getty

Authority File

Like a controlled vocabulary, an authority file provides a list of options for describing a resource. The Online Computer Library Center (OCLC) provides a searchable list of available authority files that can be used to determine if an authoritative heading exists.

Example: Integrated Authority File (GND) – a German-language authority file

This entry shows the authority file information on author Kurt Vonnegut.

Other available authority files include:

Virtual International Authority File - combines authority data from national libraries around the world, covering the following fields: corporate names; geographic names; personal names; works; expressions; preferred headings; exact heading; and bibliographic titles

Iconography Authority - a vocabulary that covers global iconography topics in artwork

Name Authority File

There are also authority files specific to names, outlining the acceptable presentation of names in a record.

Example: Library of Congress Name Authority File (LCNAF) – this file provides the preferred naming convention (label), its original dataset, type, subdivision (if available), and an identifier.

These are some examples of names, including how to refer to people and buildings.
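At its core, a name authority file maps the many ways a name might be written to one preferred heading plus a stable identifier. The sketch below assumes a hypothetical in-memory authority list: the preferred form "Vonnegut, Kurt" follows the inverted style used elsewhere in this guide, but the identifier `example-0001` and the variant forms are invented placeholders, not a real LCNAF record.

```python
# A hypothetical name authority list: one preferred heading, a placeholder
# identifier (not a real LCNAF ID), and variant forms that point to it.
AUTHORITY = [
    {
        "id": "example-0001",
        "preferred": "Vonnegut, Kurt",
        "variants": ["Kurt Vonnegut", "Vonnegut, Kurt, Jr.", "Kurt Vonnegut Jr."],
    },
]


def normalize(name):
    """Lower-case and collapse whitespace so trivial differences don't block a match."""
    return " ".join(name.lower().split())


# Index every form (preferred and variant) to its authority entry.
INDEX = {}
for entry in AUTHORITY:
    for form in [entry["preferred"], *entry["variants"]]:
        INDEX[normalize(form)] = entry


def authorized_heading(name):
    """Return (preferred heading, identifier) for a known name, or None."""
    entry = INDEX.get(normalize(name))
    return (entry["preferred"], entry["id"]) if entry else None


print(authorized_heading("kurt  vonnegut"))  # ('Vonnegut, Kurt', 'example-0001')
print(authorized_heading("Oliver Sacks"))    # None: not in this toy list
```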

Other name authority files include:

Cultural Objects Name Authority - a compiled resource of information about various works

Union List of Artist Names - preferred format of the names of people and organizations responsible for creating and maintaining art and architecture

Unique Identifiers

An identifier that can only relate to one person or ‘thing,’ like a call number for a book in a library or an ORCID identifier for a researcher. Each is assigned to only one object, so that object cannot be mistaken for any other.

Example: Registered researchers can be assigned an ORCID ID, which is unique to that person.

This is the ORCID profile of a fictitious individual, Josiah Carberry, demonstrating the information included in ORCID entries. No two researchers ever share an ORCID ID.
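Part of what keeps an identifier trustworthy is that it can be checked for typing errors. ORCID iDs end in a check character calculated with the ISO 7064 MOD 11-2 algorithm; the sketch below implements that check and tests it on Josiah Carberry's iD, 0000-0002-1825-0097. The function names are our own, and the format check assumes the usual four hyphen-separated blocks.

```python
import re

# Four blocks of four characters; only the final character may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_char(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the hyphenated format and that the last character matches the checksum."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    chars = orcid.replace("-", "")
    return orcid_check_char(chars[:15]) == chars[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: check digit does not match
```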

Other unique identifiers:

International Standard Music Number - identifies music publications internationally

Digital Object Identifier (DOI) - provides a persistent identifier and stable link for a resource (such as a journal article)

International Standard Name Identifier - uniquely identifies the people and organizations involved in creating works

Metadata Examples

The following examples demonstrate several existing metadata schemas, especially the ones most frequently seen in library studies.

Dublin Core is a standard originally developed for cataloguing web pages. There are two versions of the Dublin Core standard:

Simple – 15 elements
Qualified – 18 elements (the 15 from Simple plus 3 additional elements) and qualifiers

Dublin Core can be expressed in plain text, but because it was designed for electronic resources, it can also be expressed using XML (Extensible Markup Language), HTML, RDF, etc.

The Dublin Core elements advise on what should be included in a record, but the way those elements are encoded is up to the person creating the records. They can choose to use free-text entries or specific controlled vocabularies, name authorities, and/or encoding schemes to standardize the element data.
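Since Dublin Core is often expressed in XML, the sketch below builds a small Simple Dublin Core record with Python's standard `xml.etree.ElementTree`, using the Dublin Core elements namespace (http://purl.org/dc/elements/1.1/) with the conventional `dc:` prefix. The wrapping `<record>` element is an assumption, since the container differs from one application to another. The title and creator reuse the example from the Metadata Translation section, and "Text" is a term from the DCMI Type Vocabulary, showing how a controlled vocabulary can be combined with free-text values.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the familiar dc: prefix


def simple_dc_record(fields):
    """Build a <record> of Simple Dublin Core elements from (element, value) pairs."""
    record = ET.Element("record")  # hypothetical container element
    for element, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return record


record = simple_dc_record([
    ("title", "The New Jim Crow: Mass Incarceration in the Age of Colorblindness"),
    ("creator", "Michelle Alexander"),
    ("type", "Text"),
])
print(ET.tostring(record, encoding="unicode"))
```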

Simple 15 Elements

Element	Description
Title	The name given to a work
Subject	Keywords or phrases about a resource
Description	About the content of a resource
Type	Genre, category, etc. of a work
Source	The original resource from which the resource in question derives
Relation	A related source similar to the resource being described
Coverage	The location, time period, or jurisdiction of the resource
Creator	The individual(s) responsible for a work
Publisher	The name of the person or organization who made the resource available
Contributor	People or organizations who made contributions to the resource
Rights	Who holds legal rights to the use and distribution of the resource (ex: copyright)
Date	Can refer to the date of creation, publication, availability, or modification of the resource
Format	The medium of the resource (physical or digital), media type, dimensions, etc.
Identifier	How the resource can be uniquely identified (ex: ISBN)
Language	The language of the resource

Dublin Core Qualified

Includes the Simple 15 elements, plus these additional elements
Element	Description
Audience	Who the resource is intended for
Provenance	A record of any changes to the ownership of the resource throughout its history
RightsHolder	The person or organization who has rights over the resource

MARC, or MAchine-Readable Cataloging, is a metadata schema widely used in libraries because its records are easily transferred. The schema was developed specifically so that computers could read the data, using data signposts (field tags, indicators, and subfield codes) that signify the element types.

The MARC standards include options for authority, bibliographic, community information, classification, and holdings records. The consistency of data signposts allows each of these record types to be machine readable regardless of the institution using the records, provided they know the meaning of the signposts.

MARC records are available for many academic library materials at the University of British Columbia. For example, this is the full MARC record for a book titled: Justice and social interaction: Experimental and theoretical contributions, from psychology research

Each of the bolded number codes corresponds to an element, which is then defined. For example:

245 a is the title

245 b is the subtitle

300 a is structural metadata, showing the extent (the number of pages)

300 c is structural metadata, giving the dimensions (the size of the book)
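Reading MARC by machine comes down to looking up a field tag and a subfield code. The sketch below assumes a record that has already been parsed into a Python dictionary, which is a simplification: real MARC data is exchanged as ISO 2709 files or MARCXML and read with dedicated tools. The field values are placeholders, not the UBC record; the trailing " :" and " /" mimic the punctuation often carried in MARC title fields.

```python
# A simplified, hypothetical MARC bibliographic record: tag -> list of fields,
# each field a list of (subfield code, value) pairs. Values are placeholders.
record = {
    "245": [[("a", "An example title :"), ("b", "an example subtitle /")]],
    "300": [[("a", "xii, 250 pages ;"), ("c", "24 cm")]],
}

LABELS = {
    ("245", "a"): "Title",
    ("245", "b"): "Subtitle (remainder of title)",
    ("300", "a"): "Extent (number of pages)",
    ("300", "c"): "Dimensions",
}


def subfield(record, tag, code):
    """Return the first value of subfield `code` in any `tag` field, or None."""
    for field in record.get(tag, []):
        for sub_code, value in field:
            if sub_code == code:
                return value
    return None


for (tag, code), label in LABELS.items():
    print(f"{tag} ${code} {label}: {subfield(record, tag, code)}")
```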

More information on the fields for each MARC record type can be found on the MARC website:

Bibliographic Data

Bibliographic Data Lite

Authority

Holdings

Classification

Community Information

MODS stands for Metadata Object Description Schema and uses 20 top-level elements, along with sub-elements and attributes. It is a common standard in library environments. MODS can carry over data from existing MARC records and also has its own unique elements. The schema is richer than Dublin Core, and unlike MARC, which identifies elements with numeric tags, it uses language-based element names. It can also be used to supplement other metadata schemas.

This XML example of a MODS record shows the elements and values used for a journal article:

The metadata listed in MODS for this journal article defines the elements and then specifies their values.
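To show what language-based element names look like in practice, the sketch below builds a minimal MODS record for a journal article with Python's `xml.etree.ElementTree`, in the MODS version 3 namespace (http://www.loc.gov/mods/v3). Only a few top-level elements are used (`titleInfo`, `name`, `typeOfResource`, and `relatedItem` with `type="host"` for the journal), and the article title, author, and journal name are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def m(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def article_record(title, author, journal):
    """Build a minimal MODS record for a journal article (placeholder values)."""
    mods = ET.Element(m("mods"))
    title_info = ET.SubElement(mods, m("titleInfo"))
    ET.SubElement(title_info, m("title")).text = title

    name = ET.SubElement(mods, m("name"), type="personal")
    ET.SubElement(name, m("namePart")).text = author
    role = ET.SubElement(name, m("role"))
    ET.SubElement(role, m("roleTerm"), type="text", authority="marcrelator").text = "author"

    ET.SubElement(mods, m("typeOfResource")).text = "text"

    # The journal the article appeared in is described as a "host" related item.
    host = ET.SubElement(mods, m("relatedItem"), type="host")
    host_title = ET.SubElement(host, m("titleInfo"))
    ET.SubElement(host_title, m("title")).text = journal
    return mods


record = article_record("An Example Article", "Doe, Jane", "Journal of Examples")
print(ET.tostring(record, encoding="unicode"))
```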

Created by the Library of Congress, BIBFRAME (Bibliographic Framework) is intended to replace the MARC schema with a linked-data model designed for the web. It organizes bibliographic information into three distinct levels: Work, Instance, and Item.

The three main levels are intended to organize the properties of a particular work.

There are many possible properties used to describe a resource in BIBFRAME, organized into groups such as category, title, and work description. The complete list of properties can be found here: https://id.loc.gov/ontologies/bibframe-category.html
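The Work, Instance, and Item levels form a chain: an Item is a copy of an Instance, and an Instance is one material form of a Work. The sketch below models that chain with Python dataclasses and lists it as simple subject-property-object statements using the BIBFRAME properties `bf:instanceOf` and `bf:itemOf`. The `ex:` identifiers and the edition and copy labels are hypothetical; a real implementation would use full URIs and an RDF toolkit, and would describe titles with BIBFRAME's own title properties rather than a plain label.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """The conceptual essence of a resource."""
    id: str
    label: str


@dataclass
class Instance:
    """One material embodiment of a Work, such as a particular edition."""
    id: str
    label: str
    instance_of: Work


@dataclass
class Item:
    """A single copy of an Instance, such as one book on a library shelf."""
    id: str
    label: str
    item_of: Instance


def statements(item):
    """Return (subject, property, object) statements linking an Item up to its Work."""
    instance = item.item_of
    work = instance.instance_of
    return [
        (work.id, "rdf:type", "bf:Work"),
        (work.id, "rdfs:label", work.label),
        (instance.id, "rdf:type", "bf:Instance"),
        (instance.id, "rdfs:label", instance.label),
        (instance.id, "bf:instanceOf", work.id),
        (item.id, "rdf:type", "bf:Item"),
        (item.id, "rdfs:label", item.label),
        (item.id, "bf:itemOf", instance.id),
    ]


work = Work("ex:work1", "The New Jim Crow")
instance = Instance("ex:instance1", "A print edition (hypothetical)", work)
item = Item("ex:item1", "One library copy (hypothetical)", instance)

for subject, prop, obj in statements(item):
    print(subject, prop, obj)
```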

Metadata Translation

A metadata standard is often selected based on the needs of a specific organization or situation. For example, a metadata record may originally be in MARC format and need to be converted to BIBFRAME. In these scenarios, it may be necessary to translate existing metadata rather than start from scratch. There are several options available for translation, depending on the existing format and the desired format.

Mapping is the process of identifying which element in a new schema (schema B) is functionally equivalent to an element in the source schema (schema A).

Schema A

Title	The New Jim Crow: Mass Incarceration in the Age of Colorblindness
Creator	Michelle Alexander

The existing data uses the Simple Dublin Core schema.

Schema B

<title>	The New Jim Crow: Mass Incarceration in the Age of Colorblindness
<ObjectCreator>	Michelle Alexander

The target schema has different element names, but the information serves the same purpose, making it easy to identify where information from Schema A fits into Schema B.
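Mapping can be sketched as a lookup table from source element names to target element names. The example below reuses the Schema A and Schema B elements above (writing the Schema B elements without angle brackets). The `map_record` function, and its choice to set unmapped elements aside for review rather than silently drop them, are assumptions of this sketch.

```python
# Element mapping from Schema A (Simple Dublin Core) to the target Schema B.
A_TO_B = {
    "Title": "title",
    "Creator": "ObjectCreator",
}


def map_record(record_a, mapping=A_TO_B):
    """Rename each element via the mapping; return (mapped record, unmapped elements)."""
    mapped, unmapped = {}, {}
    for element, value in record_a.items():
        if element in mapping:
            mapped[mapping[element]] = value
        else:
            unmapped[element] = value  # needs a human decision before migration
    return mapped, unmapped


record_a = {
    "Title": "The New Jim Crow: Mass Incarceration in the Age of Colorblindness",
    "Creator": "Michelle Alexander",
}
mapped, unmapped = map_record(record_a)
print(mapped)
print(unmapped)  # {}
```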

While similar to mapping, conversion refers to expressing a relationship, such as a statement in prose, as an element-value pair.

For example, the statement “The author of this resource is Kurt Vonnegut” expresses a relationship that needs to be translated into a metadata schema to correctly capture the information it provides. In Schema B, this might be expressed as:

Author	Vonnegut, Kurt

Conversion is somewhat less straightforward than mapping, but the relationship between an element and a value can still be determined and expressed accurately.
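Conversion turns a statement into an element-value pair, often reshaping the value along the way. The sketch below handles only the narrow sentence pattern from the example ("The author of this resource is First Last") using a regular expression, and inverts the name into "Last, First" order. The pattern and the `convert_statement` function are illustrative assumptions; real names (with particles, multiple surnames, or suffixes like "Jr.") need far more care, which is where a name authority file helps.

```python
import re

# Recognises only the single sentence pattern used in the example above.
STATEMENT = re.compile(
    r"The (?P<role>\w+) of this resource is (?P<first>.+) (?P<last>\S+?)\.?$"
)


def convert_statement(sentence):
    """Convert 'The author of this resource is First Last' into an element-value pair."""
    match = STATEMENT.match(sentence.strip())
    if not match:
        raise ValueError(f"Unsupported statement: {sentence!r}")
    element = match["role"].capitalize()          # 'author' -> 'Author'
    value = f"{match['last']}, {match['first']}"  # inverted name order
    return element, value


print(convert_statement("The author of this resource is Kurt Vonnegut"))
# ('Author', 'Vonnegut, Kurt')
```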

A crosswalk is a set of equivalency relationships between elements in the original schema A and new schema B.

The element “title” in schema A may be equivalent to element “document name” in schema B.

It’s important to note that the equivalencies in a crosswalk are not reciprocal, so crosswalks are only one-directional: a crosswalk from schema A to schema B cannot simply be read backwards to go from B to A.
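A short sketch shows why a crosswalk runs in only one direction. Assume a hypothetical schema A that distinguishes "creator" from "contributor", and a hypothetical schema B with a single "responsible party" element: both A elements cross over to the same B element, so reading the crosswalk backwards cannot tell which A element a B value came from. Apart from "title" and "document name", the element names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical one-directional crosswalk from schema A to schema B.
CROSSWALK_A_TO_B = {
    "title": "document name",
    "creator": "responsible party",
    "contributor": "responsible party",
}


def invert(crosswalk):
    """Group source elements by target element, as a naive reverse crosswalk would."""
    reverse = defaultdict(list)
    for source, target in crosswalk.items():
        reverse[target].append(source)
    return dict(reverse)


def ambiguous_targets(crosswalk):
    """Target elements that more than one source element maps to."""
    return {t: sources for t, sources in invert(crosswalk).items() if len(sources) > 1}


print(ambiguous_targets(CROSSWALK_A_TO_B))
# {'responsible party': ['creator', 'contributor']}
```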

Many existing crosswalks, such as the Library of Congress's MARC to Dublin Core crosswalk, lay out how each element in an original schema maps to an element in a target schema, allowing quick, easy translation from one metadata schema to another.

Metadata Creation

Creating a metadata standard can seem daunting, especially given the complexity and granularity of some standards. However, in its most basic form, a metadata record is a series of fields and data that makes it possible to search a collection for a specific record.

A simple metadata standard could look like a spreadsheet separating the elements from the data for each resource. Once you have determined what elements are important for identifying and differentiating resources, you can decide whether you want to include standardization for the various elements. If you do, you need to decide what authority would be most appropriate for the collection and use those guidelines to inform how data is entered.
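A home-grown standard like this can start life as a spreadsheet saved as CSV, with one column per element and one row per resource. The sketch below uses Python's standard `csv` module to read such a table and flag values that break two rules a collection might adopt: dates must take the shape of an ISO 8601 year (YYYY) or full date (YYYY-MM-DD), and the Type column must use a small controlled list. The column names, the rules, and the sample rows are all hypothetical.

```python
import csv
import io
import re

# Hypothetical local rules for a small collection.
ALLOWED_TYPES = {"Book", "Map", "Photograph"}
ISO_DATE = re.compile(r"[0-9]{4}(-[0-9]{2}-[0-9]{2})?")  # checks shape only: YYYY or YYYY-MM-DD

# In practice this would be a spreadsheet exported to CSV; inlined here to keep the example runnable.
SPREADSHEET = """Identifier,Title,Type,Date
001,Campus map,Map,1965
002,Graduation portrait,Photo,June 1972
"""


def check_rows(text):
    """Return (identifier, column, value) for every value that breaks a local rule."""
    problems = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["Type"] not in ALLOWED_TYPES:
            problems.append((row["Identifier"], "Type", row["Type"]))
        if not ISO_DATE.fullmatch(row["Date"]):
            problems.append((row["Identifier"], "Date", row["Date"]))
    return problems


for problem in check_rows(SPREADSHEET):
    print(problem)
```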

In some cases, using an existing metadata schema may be the easiest course of action. Alternatively, an application profile can be developed by combining the elements of multiple schemas.