This page covers metadata and its practical application. It introduces metadata, the types of metadata you may encounter, and how to use metadata.
This guide is intended for use by students and practitioners interested in learning more about metadata and its application.
What is metadata?
The most common definition of metadata is 'data about data' - summary information about something like a book, movie, car, etc. Metadata about a book could look like the title, author, number of pages; a movie's metadata would include the title, lead actors, and director; a car's metadata could show the make, model, and horsepower available. Metadata is everywhere, and can be very powerful!
Metadata provides statements, composed of a field and associated data, about an object (like Title: Metadata). Metadata structures and contextualizes the data in a predictable, consistent manner based on the intention of the data. This can improve the discoverability of a resource.
It is often easier to consider metadata using an example. Take, for instance, the catalogue record of a book located in the UBC Library stacks (available here).
The Location shows which library on campus holds the book. The Call Number shows where the book is located on the shelves using the Library of Congress (LOC) classification system. The sections of the LOC classification system also describe information about the book, as it is a metadata classification system itself.
Metadata describes many aspects of an object. If you think about a book, metadata covers information like the title, author, publication date, etc. However, it can also include how the book is organized (chapters, sub-headings within chapters, number of pages), how often the book has been checked out from the library, where it came from and any new editions, and physical properties that can describe how to preserve the book.
There are 5 main categories of metadata, each serving a different purpose. It is important to note that these are the categories for the elements making up a record, and many records include multiple categories of metadata to provide a comprehensive detailing of the resource.
Descriptive
As it sounds, descriptive metadata describes an object. This is a common form of metadata, useful for resource discovery through Internet searches or the library catalog. It provides a description of the resource, like a book’s title, author, and ISBN.
Administrative
This category of metadata is used to provide information about the origin and maintenance of a resource. This can include details like copyright information, the publisher, and a model number for objects.
Structural
This refers to data about an object’s organizational structure. For example, it might identify a book’s chapters, the file format of an image, or the dimensions of an atlas.
Preservation
Preservation metadata identifies information about the object to aid in its preservation, ensuring the longevity of a resource. This might include the type of paper used in a book or the file format of a digital image that could become obsolete as new technology emerges.
Use
Use metadata provides data on how a resource has been used. In a library, this could show how many times a book has been checked out or the number of downloads for a PDF.
Element
An element defines the field that is assigned data; it is the category of information.
Examples of elements that may be required for a descriptive metadata schema for books in a library:
Title
Author
Subject Headings
Value
The data related to an element, or the information that corresponds to the category. Using the above elements, examples of values can include:
Element-Value Pair
The combined element (category) and value (data) comprising a piece of metadata for an object.
An example could be:
Element | Value |
---|---|
Title | One Hundred Years of Solitude |
Author | Gabriel Garcia Marquez |
Subject Headings | Macondo (Imaginary place)--Fiction; Latin America--Social conditions--Fiction |
The values from this example can be found on the UBC Library page for this book here: https://webcat.library.ubc.ca/vwebv/holdingsInfo?bibId=2704629
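One way to picture element-value pairs is as entries in a simple lookup structure. The sketch below is only an illustration: it uses a Python dictionary (an assumption, not the format the catalogue itself uses) to hold the pairs from the example above.

```python
# Each key is an element (the category) and each value is the data for it.
# The values come from the example above; the dictionary layout itself is
# an illustrative assumption.
book_metadata = {
    "Title": "One Hundred Years of Solitude",
    "Author": "Gabriel Garcia Marquez",
    "Subject Headings": [
        "Macondo (Imaginary place)--Fiction",
        "Latin America--Social conditions--Fiction",
    ],
}

# Print every element-value pair as a metadata statement.
for element, value in book_metadata.items():
    print(f"{element}: {value}")
```

Storing the subject headings as a list reflects that one element may carry more than one value.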
Record
The complete set of metadata statements about an object, organized based on the pre-determined schema. Each record is based on a single object.
Example: The record for The Man Who Mistook His Wife for a Hat and Other Clinical Tales by Oliver Sacks, in MARC 21 format
(MARC 21 is described in greater detail under the “Metadata Examples” section)
Schema
A metadata schema establishes the rules about what statements can be made about a resource and how they can be made. They determine the structure of the data, what fields need to be included, and how the information is organized for consistency. This depends on the purpose of the metadata itself.
Every schema has its own definition for the elements that are required - while the data might be the same, the specific name for each element can be different (like 'Author' in one schema and 'Creator' in another). The order of elements is specified, and in many cases, the way the data is standardized can also be recommended (using the ISO 8601 date format for the publication date, for example).
Example: Dublin Core (explained in greater detail under the “Metadata Examples” section)
Choosing to standardize your metadata values ensures consistency in how the information is presented and may indicate a source for where to find new information for future entries.
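To make the idea of a schema more concrete, the following sketch treats a schema as a list of required elements in a fixed order and checks a record against it. The name `SIMPLE_BOOK_SCHEMA`, its three elements, and both helper functions are hypothetical and are not part of Dublin Core or any other published standard.

```python
# Hypothetical schema: the elements a record must contain, in schema order.
SIMPLE_BOOK_SCHEMA = ["Title", "Creator", "Date"]


def missing_elements(record, schema):
    """Return the schema elements that the record does not supply."""
    return [element for element in schema if not record.get(element)]


def ordered_record(record, schema):
    """Return the record's pairs in the order the schema specifies."""
    return [(element, record[element]) for element in schema if element in record]


# This record uses 'Author' where the hypothetical schema expects 'Creator'.
record = {
    "Author": "Oliver Sacks",
    "Title": "The Man Who Mistook His Wife for a Hat and Other Clinical Tales",
}

print(missing_elements(record, SIMPLE_BOOK_SCHEMA))
print(ordered_record(record, SIMPLE_BOOK_SCHEMA))
```

Because the record names its element 'Author', the check reports 'Creator' as missing even though the data is present, which is the kind of naming difference between schemas described above.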
Syntax Encoding Scheme
A controlled standard for representing a value, such as calendar dates, character strings, integers, etc. Using a syntax encoding scheme ensures consistency in the presentation of values in element-value pairs of metadata throughout a collection of records. By using a syntax encoding scheme, ambiguity is removed because the data can be read only one way.
Example: International Organization for Standardization (ISO) 8601 – provides a format for representing dates
Any new date/time data added using a standardized encoding scheme, like ISO 8601, will be consistent.
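As a small illustration of why an encoding scheme removes ambiguity, the sketch below converts two differently written dates into the ISO 8601 calendar date form (YYYY-MM-DD) using Python's standard `datetime` module. The helper name `to_iso_8601` and the sample dates are assumptions made for this example.

```python
from datetime import datetime


def to_iso_8601(text, source_format):
    """Parse a date written in source_format and return it as YYYY-MM-DD."""
    return datetime.strptime(text, source_format).date().isoformat()


# The same day written day-first and month-first (illustrative inputs).
day_first = to_iso_8601("05/03/1970", "%d/%m/%Y")
month_first = to_iso_8601("03/05/1970", "%m/%d/%Y")

print(day_first)
print(month_first)
```

Without knowing the source convention, "05/03/1970" could be read as either 5 March or 3 May. Once both values are stored in ISO 8601 form, they are identical and can be read only one way.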
Controlled Vocabularies
Like a syntax encoding scheme, a controlled vocabulary dictates which values can be assigned to a metadata element using a finite list of options. In academic libraries, this may be the Library of Congress Subject Headings, which provides a list of subject headings that can be selected to categorize a resource’s subjects.
Controlled vocabularies are also used in some academic and research databases, as they demonstrate the preferred terms for an expression.
Example: Library of Congress Subject Headings (LCSH)
An example of the LCSH – Children’s Headings:
These are some possible subject headings that can be selected for a record. These are often used in libraries to provide a brief description of what a book is about.
Other controlled vocabularies include:
Medical Subject Headings (MeSH) - used for medical resources
Homosaurus - an LGBTQ+-specific vocabulary allowing for increased discoverability of LGBTQ+ resources
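The effect of a controlled vocabulary can be sketched as a check against a finite list of permitted values. The four headings in `ALLOWED_SUBJECTS` are made up for this example and are not copied from LCSH, MeSH, or Homosaurus; the function name is also an assumption.

```python
# A tiny, made-up stand-in for a controlled vocabulary. Published vocabularies
# are far larger and are maintained by their own organizations.
ALLOWED_SUBJECTS = {"Cats", "Dogs", "Friendship", "Schools"}


def check_subjects(proposed, allowed):
    """Split proposed headings into accepted and rejected lists."""
    accepted = [term for term in proposed if term in allowed]
    rejected = [term for term in proposed if term not in allowed]
    return accepted, rejected


accepted, rejected = check_subjects(["Cats", "Kittens", "Schools"], ALLOWED_SUBJECTS)
print("Accepted:", accepted)
print("Rejected:", rejected)
```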
Thesaurus
Unlike the well-known use of “thesaurus” referring to a word’s synonyms and antonyms, in resource description, a thesaurus is a type of controlled vocabulary that allows for different levels of specificity to be applied to a concept or definition. This includes lead-in terms, which might be commonly used outside the discipline and link to the preferred term; broad terms, forming the umbrella under which a term falls; narrow terms, which are more specific categories within a term; and related terms that can be used for similar descriptions.
Example: The American Psychological Association’s Thesaurus of Psychological Index Terms
An example of the APA Thesaurus structure:
When searching for terms containing the term "Justice," there are multiple options to choose from.
Within the term Social Justice, there is information on when the term was introduced to the thesaurus (2006) and its definition. The Broader Terms, Justice and Social Processes, refer to the larger umbrella terms under which Social Justice can be found. The Narrow Term of Distributive Justice falls under Social Justice. There are additional Related Terms that may also be selected depending on the object described.
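The relationships in a thesaurus entry can be sketched as a small nested structure. The entry details below are taken from the description of Social Justice above; the dictionary layout and the helper functions are illustrative assumptions and do not reflect how the APA stores its thesaurus.

```python
# One thesaurus entry, using only the details described above.
# Related terms are omitted because the text does not list them.
THESAURUS = {
    "Social Justice": {
        "introduced": 2006,
        "broader": ["Justice", "Social Processes"],
        "narrower": ["Distributive Justice"],
    },
}


def broader_terms(term, thesaurus):
    """Return the umbrella terms recorded for a term, or an empty list."""
    return thesaurus.get(term, {}).get("broader", [])


def narrower_terms(term, thesaurus):
    """Return the more specific terms recorded for a term, or an empty list."""
    return thesaurus.get(term, {}).get("narrower", [])


print(broader_terms("Social Justice", THESAURUS))
print(narrower_terms("Social Justice", THESAURUS))
```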
Other thesauri include:
Library of Congress Genre/Form Terms - provides specific terms to use for various genres
Art & Architecture Thesaurus - developed to improve discoverability and access to resources around art, architecture, and other cultural materials
Thesaurus of Geographic Names - terms for geographic areas developed by Getty
Authority File
Like a controlled vocabulary, an authority file provides a list of options for describing a resource. The Online Computer Library Center (OCLC) provides a searchable list of available authority files that can be used to determine if an authoritative heading exists.
Example: Integrated Authority File (GND) – a German-language authority file
This entry shows the authority file information on author Kurt Vonnegut.
Other available authority files include:
Virtual International Authority File - combines terms from the national libraries of countries around the world, covering the following fields: corporate names; geographic names; personal names; works; expressions; preferred headings; exact heading; and bibliographic titles
Iconography Authority - a vocabulary that covers global iconography topics in artwork
Name Authority File
There are also authority files specific to names, outlining the acceptable presentation of names in a record.
Example: Library of Congress Name Authority File (LCNAF) – this file provides the preferred naming convention (label), its original dataset, type, subdivision (if available), and an identifier.
These are some examples of names, including how to refer to people and buildings.
Other name authority files include:
Cultural Objects Name Authority - a compiled resource of information about various works
Union List of Artist Names - preferred format of the names of people and organizations responsible for creating and maintaining art and architecture
Unique Identifiers
An identifier that can relate to only one person or ‘thing,’ like a call number for a book in a library or an ORCID identifier for a researcher. Each identifier is assigned to only one object, so that object cannot be mistaken for any other.
Example: Registered researchers can be assigned an ORCID ID, which is unique to that person.
This is the ORCID profile of a fictitious individual, Josiah Carberry, demonstrating the information included in ORCID entries. An ORCID ID is never repeated from one researcher to another.
Other unique identifiers:
International Standard Music Number - identifies music publications internationally
Digital Object Identifier (DOI) - provides a stable link to a site (such as a journal article)
International Standard Name Identifier - uniquely identifies the people and organizations involved in creating works
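The defining property of a unique identifier, that it is never assigned twice, can be sketched as a registry that refuses duplicates. The registry, the `register` function, and the error class are hypothetical. The ORCID ID used is the one published for the fictitious Josiah Carberry profile, which is not quoted in the text above.

```python
class DuplicateIdentifierError(ValueError):
    """Raised when an identifier is already assigned to another object."""


def register(registry, identifier, description):
    """Add an identifier to the registry, refusing any that is already in use."""
    if identifier in registry:
        raise DuplicateIdentifierError(f"{identifier} is already assigned")
    registry[identifier] = description


registry = {}
# ORCID ID of the fictitious researcher Josiah Carberry.
register(registry, "0000-0002-1825-0097", "Josiah Carberry")

try:
    # A second attempt to use the same identifier is rejected.
    register(registry, "0000-0002-1825-0097", "Someone else")
except DuplicateIdentifierError as error:
    print("Rejected:", error)
```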
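The following examples present different existing metadata schemas, especially the ones most frequently seen in library studies.
Dublin Core is a standard originally developed for cataloguing web pages. There are two versions of the Dublin Core standard: Simple Dublin Core, which uses the 15 elements in the first table below, and Qualified Dublin Core, which adds further elements such as the three in the second table.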
Dublin Core can be expressed in plain text, but as it was designed for electronic resources, can also be expressed using XML (Extensible Markup Language), HTML, RDF, etc.
The Dublin Core elements advise on what should be included in a record, but the way those elements are encoded is up to the person creating the records. They can choose to use free text entries or specific controlled vocabularies, name authorities, and/or encoding schemes to standardize the element data.
Element | Description |
---|---|
Title | The name given to a work |
Subject | Keywords or phrases about a resource |
Description | About the content of a resource |
Type | Genre, category, etc. of a work |
Source | The original resource from which the resource in question derives |
Relation | A related source similar to the resource being described |
Coverage | The location, time period, or jurisdiction of the resource |
Creator | The individual(s) responsible for a work |
Publisher | The name of the person or organization who made the resource available |
Contributor | People or organizations who supplied information for the resource |
Rights | Who holds legal rights to the use and distribution of the resource (ex: copyright) |
Date | Can refer to the date of creation, publication, availability, or modification of the resource |
Format | The medium of the resource (physical or digital), media type, dimensions, etc. |
Identifier | How the resource can be uniquely identified (ex: ISBN) |
Language | The language of the resource |
Element | Description |
---|---|
Audience | Who the resource is intended for |
Provenance | A record of any changes to the ownership of the resource throughout its history |
RightsHolder | The person or organization who has rights over the resource |
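As an illustration of expressing Dublin Core in XML, the sketch below builds a short record with Python's standard `xml.etree.ElementTree` module, using the Dublin Core elements namespace `http://purl.org/dc/elements/1.1/`. The wrapper element name `record` and the function name are assumptions, and the values are reused from the One Hundred Years of Solitude example earlier on this page.

```python
import xml.etree.ElementTree as ET

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"
# Ask ElementTree to write the namespace with the conventional 'dc' prefix.
ET.register_namespace("dc", DC_NAMESPACE)


def dublin_core_xml(pairs):
    """Build an XML string with one dc: element per (element, value) pair."""
    root = ET.Element("record")  # wrapper element name is an assumption
    for element, value in pairs:
        child = ET.SubElement(root, f"{{{DC_NAMESPACE}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


pairs = [
    ("title", "One Hundred Years of Solitude"),
    ("creator", "Gabriel Garcia Marquez"),
    ("subject", "Macondo (Imaginary place)--Fiction"),
]
xml_text = dublin_core_xml(pairs)
print(xml_text)
```

The same pairs could equally be written as plain text or HTML; XML is shown here only because it is one of the options mentioned above.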
MARC, or MAchine-Readable Cataloging, is a metadata schema widely used in libraries because the records are easily transferred. The schema was developed so that computers can read the data, using data signposts that signify the element types.
The MARC standards include options for authority, bibliographic, community information, classification, and holdings records. The consistency of data signposts allows each of these record types to be machine readable regardless of the institution using the records, provided they know the meaning of the signposts.
MARC records are available for many academic library materials at the University of British Columbia. For example, this is the full MARC record for a book titled: Justice and social interaction: Experimental and theoretical contributions, from psychology research
Each of the bolded number codes correlates to an element, which is then defined. For example:
245 a is the title
245 b is the subtitle
300 a is structural metadata, showing the number of pages
300 c is structural metadata, defining the size of the book
Each of the field categories and more information can be found on the MARC website.
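The way numeric signposts stand in for element names can be sketched as a lookup table. The structure below is a simplified stand-in and not the real MARC 21 transmission format. Only the 245 values are taken from the book title quoted above, and splitting that title at the colon into title and subtitle is an assumption.

```python
# Labels for the subfields discussed above, keyed by (field tag, subfield code).
SUBFIELD_LABELS = {
    ("245", "a"): "Title",
    ("245", "b"): "Subtitle",
    ("300", "a"): "Number of pages",
    ("300", "c"): "Size of the book",
}

# Simplified record fragment: (field tag, subfield code, value).
record = [
    ("245", "a", "Justice and social interaction"),
    ("245", "b", "Experimental and theoretical contributions, from psychology research"),
]


def describe(record, labels):
    """Replace each numeric signpost with its human-readable label."""
    return [(labels.get((tag, code), "Unknown"), value) for tag, code, value in record]


for label, value in describe(record, SUBFIELD_LABELS):
    print(f"{label}: {value}")
```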
MODS stands for Metadata Object Description Schema and uses 20 elements, along with sub-elements and attributes. It is a common standard for use in library environments. MODS contains information from existing MARC records and has its own unique elements. This schema is richer than Dublin Core, in that it contains more description using text-based values where Dublin Core might use numeric values drawn from a standardized resource. It can also be used to supplement other metadata schemas.
This XML example of a MODS record shows the elements and values used for a journal article:
The metadata listed in MODS for this journal article defines the elements and then specifies their values.
Created by the Library of Congress, BIBFRAME (Bibliographic Framework) was intended to replace the MARC schema, as it is specifically geared towards digital objects. It organizes bibliographic information into three distinct levels: Work, Instance, and Item.
The three main levels are intended to organize the properties of a particular work.
There are many possible properties used to describe a resource in BIBFRAME, organized into groups such as category, title, and work description. The complete list of properties can be found here: https://id.loc.gov/ontologies/bibframe-category.html
A metadata standard is often selected based on the needs of a specific organization or a situation for organizing records on objects. For example, a metadata record may originally be in MARC format and need to be converted to BIBFRAME. It may be necessary in these scenarios to translate existing metadata rather than starting from scratch. There are several different options available for translation, depending on the existing information format and the desired format.
Mapping is the process of identifying what element is required for a new schema (schema B) that is functionally equivalent to an element in the source schema (schema A).
Schema A
Element | Value |
---|---|
Title | The New Jim Crow: Mass Incarceration in the Age of Colorblindness |
Creator | Michelle Alexander |
The existing data uses the Simple Dublin Core schema.
Schema B
Element | Value |
---|---|
<title> | The New Jim Crow: Mass Incarceration in the Age of Colorblindness |
<ObjectCreator> | Michelle Alexander |
The target schema has different element names, but the information serves the same purpose, making it easy to identify where information from Schema A fits into Schema B.
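The mapping in this example can be sketched as a lookup from each Schema A element to its Schema B equivalent. The element names come from the example above; the function name and the dictionary representation are assumptions.

```python
# Schema A element -> functionally equivalent Schema B element.
A_TO_B = {"Title": "title", "Creator": "ObjectCreator"}


def map_record(record_a, element_map):
    """Return a Schema B record, carrying each value over unchanged.

    An element with no entry in element_map raises KeyError, so unmapped
    elements are noticed rather than silently dropped.
    """
    return {element_map[element]: value for element, value in record_a.items()}


record_a = {
    "Title": "The New Jim Crow: Mass Incarceration in the Age of Colorblindness",
    "Creator": "Michelle Alexander",
}
record_b = map_record(record_a, A_TO_B)
print(record_b)
```

Only the element names change; the values are carried over exactly as they were.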
While similar to mapping, conversion refers to expressing a relationship in a different element-value pair.
For example, trying to express “The author of this resource is Kurt Vonnegut” is a relationship that needs to be translated into a metadata schema to correctly capture the information provided in the statement. In Schema B, this might be expressed as:
Element | Value |
---|---|
Author | Vonnegut, Kurt |
Unlike mapping, conversion is somewhat less straightforward, but the relationship between an element and value can still be determined and expressed accurately.
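The conversion described here can be sketched as a function that takes the name from the statement and produces the element-value pair. Both function names are hypothetical, and the name inversion assumes a simple "Given Family" name with a single-word family name, which will not hold for every name.

```python
def invert_name(name):
    """Turn 'Given Family' into 'Family, Given'.

    Assumes the last word is the family name; a single-word name is
    returned unchanged.
    """
    given, _, family = name.rpartition(" ")
    return f"{family}, {given}" if given else family


def convert_author_statement(name):
    """Express 'the author of this resource is <name>' as an element-value pair."""
    return ("Author", invert_name(name))


pair = convert_author_statement("Kurt Vonnegut")
print(pair)
```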
A crosswalk is a set of equivalency relationships between elements in the original schema A and new schema B.
The element “title” in schema A may be equivalent to element “document name” in schema B.
It’s important to note that the equivalencies in a crosswalk are not reciprocal, so crosswalks are only one-directional.
There are multiple existing crosswalks that allow quick, easy change from one metadata schema to another. They lay out how the original element maps to the target element.
These existing crosswalks can allow easy translation between the original and target schemas.
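To show why a crosswalk works in one direction only, the sketch below uses a hypothetical crosswalk in which two schema A elements share one schema B element. Only the "title" to "document name" pair comes from the text above; the other entries and the function name are made up for illustration.

```python
# Hypothetical crosswalk from schema A to schema B. Two source elements share
# one target, which is what makes the reverse direction ambiguous.
CROSSWALK_A_TO_B = {
    "title": "document name",
    "creator": "responsible party",
    "contributor": "responsible party",
}


def reverse_candidates(crosswalk):
    """Group schema A elements by the schema B element they map to."""
    candidates = {}
    for source, target in crosswalk.items():
        candidates.setdefault(target, []).append(source)
    return candidates


for target, sources in reverse_candidates(CROSSWALK_A_TO_B).items():
    status = "reversible" if len(sources) == 1 else "ambiguous"
    print(f"{target!r} <- {sources} ({status})")
```

Going from A to B, every element has exactly one destination. Going back, "responsible party" could have been either a creator or a contributor, so a separate crosswalk from B to A would be needed.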
Creating a metadata standard can seem daunting, especially given the complexity and granularity of some standards. However, in their basic form, metadata records are a series of fields and data that make it possible to search a collection for a specific record.
A simple metadata standard could look like a spreadsheet separating the elements from the data for each resource. Once you have determined what elements are important for identifying and differentiating resources, you can decide whether you want to include standardization for the various elements. If you do, you need to decide what authority would be most appropriate for the collection and use those guidelines to inform how data is entered.
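A spreadsheet-style standard of this kind can be sketched with Python's standard `csv` module: one column per element and one row per resource. The two-column layout and the function names are assumptions, and the rows reuse books mentioned elsewhere on this page.

```python
import csv
import io

# Hypothetical standard: the elements chosen to identify each resource.
ELEMENTS = ["Title", "Author"]

records = [
    {"Title": "One Hundred Years of Solitude", "Author": "Gabriel Garcia Marquez"},
    {"Title": "The Man Who Mistook His Wife for a Hat and Other Clinical Tales",
     "Author": "Oliver Sacks"},
    {"Title": "The New Jim Crow: Mass Incarceration in the Age of Colorblindness",
     "Author": "Michelle Alexander"},
]


def to_csv(records, elements):
    """Write the records as spreadsheet-style text, one column per element."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=elements, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def find_by(records, element, value):
    """Return every record whose value for the element matches exactly."""
    return [record for record in records if record.get(element) == value]


print(to_csv(records, ELEMENTS))
print(find_by(records, "Author", "Oliver Sacks"))
```

Exact-match searching like this works reliably only when values are entered consistently, which is where the choice of authority for each element matters.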
In some cases, using an existing metadata schema may be the easiest course of action. Alternatively, an application profile can be developed by combining the elements of multiple schemas.