Tuesday, June 28, 2016

Subject Analysis, Controlled Vocabularies and Thesaurus Structure

Previous Problem

  • Economics of information organization:
    • High cost of human-created descriptive metadata
    • High cost of maintaining authority files and other sources of right side content
  • Solution for description:
    • Centralize cataloging work whenever possible, including authority files
    • Distribute metadata and authority records using industry-standard MARC encoding
New Problem
  • Economics of information organization:
    • High cost of human-created subject metadata
    • High cost of maintaining authority files and other sources of right side content
  • Solution for subject analysis:
    • Centralize cataloging work whenever possible, including authority files
    • Distribute metadata and authority records using industry-standard MARC encoding.
Shared Authority Files
  • Purposes of shared authority files:
    • To map from alternative forms of terms to authorized forms
    • To provide a common source for cataloging/metadata record content (i.e. for data entry purposes)
  • A shared authority file is maintained independently of cataloging/metadata files, and authority files are often shared across different institutions. 
  • For description, authority files include LCNAF and ULAN.
  • For subject analysis, authority files are called controlled vocabularies
Functional Purposes of Authority Work
  • To meet Cutter objectives 1 & 2
    1. To locate an item in a library (a known-item search):
      • By an author
      • On a subject
    2. To show what the library has (a collocating search):
      • By an author (collocation using author criterion)
      • On a subject (collocation using subject criterion)
  • Additional purposes of subject authority work:
    • Synonymy: To relate different words that represent the same concept
    • Homonymy: To distinguish (i.e. disambiguate) between different concepts having the same name. 
Why vocabulary control?
  • Problem of natural language variation:
    • Variation over time in the way concepts are referred to
    • Geographical variation in the way people ask for resources using a subject criterion - The Great Pop vs. Soda Controversy
  • Problem that the same concept (e.g. fizzy liquid) is referred to with different strings of characters (i.e. words)
  • Cataloger's decision: "I know what the subject is (from the resource in hand), but what subject data do I enter into the surrogate record?"

What Vocabulary Control Controls

  • Systems of controlled vocabularies map the various words that describe concepts. 
  • These concepts are maintained in a thesaurus or headings list, where each concept is located in what we'll call a "semantic neighborhood."
  • Problems addressed by controlled vocabularies:
    • Lexical and orthographic variation
    • Word sequences
    • Abbreviations
    • Technical language
    • Synonymy
    • Homonymy 
Lexical and Orthographic Variation
  • Same concept different spellings
    1. Varying word forms
    2. Differences across cultures and countries
    3. Changes over time
  • Examples
  1. Varying word forms
    • clothes vs. clothing
  2. Differences across cultures and countries:
    • Orthopedics vs. orthopaedics
    • Catalog vs. catalogue
  3. Changes over time:
    • on line vs. on-line vs. online
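
The kinds of variation listed above can be sketched in code. This is a minimal Python sketch, not the logic of any real catalog system: `normalize` and `authorized_form` are hypothetical names, and the choice of which spelling counts as authorized is an assumption made only for illustration. Spacing and hyphenation variants can be collapsed by rule, while spelling and word-form variants have to be listed one by one.

```python
import re

def normalize(term: str) -> str:
    """Build a comparison key: lowercase, with hyphens and whitespace removed."""
    return re.sub(r"[\s\-]+", "", term.lower())

# Hypothetical variant table. Spelling and word-form differences cannot be
# derived by rule, so each variant is mapped explicitly to the form that this
# sketch (arbitrarily) treats as authorized.
SPELLING_VARIANTS = {
    "catalogue": "catalog",
    "orthopaedics": "orthopedics",
    "clothes": "clothing",
}

def authorized_form(term: str) -> str:
    """Return the key of the authorized form for a term, or the term's own key."""
    key = normalize(term)
    return SPELLING_VARIANTS.get(key, key)
```

With this table, "on line", "on-line" and "online" all reduce to the same key, and "Catalogue" resolves to "catalog". The key is meant for comparison only, not for display.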
Word sequences
  • Same multiword concept, different word order:
    • Inversion is the process of reversing word order 
    • Used to collocate like with like
  • Examples
    • Educational psychology vs psychology, educational
    • Right of asylum vs asylum, right of
    • BOTTOM LINE: we don't care how the words are ordered; either form just needs to get the searcher to the information.
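
As a rough illustration of treating an inverted heading and its natural-order form as the same thing, here is a small Python sketch. The function name `natural_order` is hypothetical, and the sketch assumes a heading contains at most one inversion comma.

```python
def natural_order(heading: str) -> str:
    """Return a case-insensitive, natural-order key for a heading.

    'Psychology, educational' and 'Educational psychology' give the same key.
    Assumes at most one inversion comma per heading.
    """
    lead, comma, rest = heading.partition(",")
    if not comma:
        # No comma: the heading is already in natural word order.
        return heading.strip().casefold()
    # Move the part after the comma back in front of the lead word(s).
    return f"{rest.strip()} {lead.strip()}".casefold()
```

Comparing `natural_order("Asylum, right of")` with `natural_order("Right of asylum")` shows that both orderings lead to the same key.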
Abbreviations
  • Same concept, different word forms:
    • Depends on the intended use of the controlled vocabulary
    • Abbreviations are normally spelled out
    • Exceptions for those concepts with worldwide recognition
  • Example:
    • Acquired Immunodeficiency Syndrome vs. AIDS
Technical Languages
  • Same concept, different words:
    • Depends on the intended use of the controlled vocabulary
    • Use words that match that of the intended user group
  • Example:
    • Neoplasms (MeSH) vs cancer (LCSH)
Synonymy
  • Same concept, different words. 
    • Examples:
      • Cats vs felines
      • Attire vs dress vs clothing
Homonymy
  • Same word and/or pronunciation, different concepts
    1. Homographs - same word, different concepts
    2. Homophones - same pronunciation, different concepts
  • Examples:
    1. Homographs
      • Mercury (planet) vs Mercury (god)
    2. Homophones
      • Foul vs Fowl
Vocabulary Control Techniques
  • Use of entry terms (mapping):
    • Lexical and orthographic variation
    • Word sequences
    • Abbreviations
    • Technical Language
    • Synonymy
  • Use of various disambiguation techniques:
    • Homonymy
Vocabulary Control - Entry Terms
  • All words or word forms are mapped to the authorized word form for that concept.
  • Quality of controlled vocabulary is directly related to the number of entry terms, which themselves must be maintained over time. 
  • EXAMPLE: searching MeSH for "brain neoplasms" and for "brain cancer" brings up the same information. Good mapping! This is a great example of high-quality subject heading terms (LCSH being an example of lower quality).
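
Entry-term mapping can be pictured as a lookup table. The Python sketch below is an illustration under stated assumptions: the table is toy data built from examples in these notes (plus "brain tumors", which is added here), not actual MeSH or LCSH records, and `build_lookup` and `resolve` are hypothetical names.

```python
from typing import Dict, Optional

# Toy data: entry term -> authorized term. Not real MeSH or LCSH records.
ENTRY_TERMS = {
    "brain cancer": "Brain Neoplasms",
    "brain tumors": "Brain Neoplasms",
    "felines": "Cats",
    "upkeep": "Maintenance",
}

def build_lookup(entry_terms: Dict[str, str]) -> Dict[str, str]:
    """Map every entry term, and every authorized term itself, to the authorized form."""
    lookup = {variant.casefold(): auth for variant, auth in entry_terms.items()}
    for auth in entry_terms.values():
        lookup[auth.casefold()] = auth
    return lookup

LOOKUP = build_lookup(ENTRY_TERMS)

def resolve(query: str) -> Optional[str]:
    """Return the authorized term for a query, or None if no entry term exists."""
    return LOOKUP.get(query.strip().casefold())
```

Here `resolve("Brain cancer")` and `resolve("brain neoplasms")` both return "Brain Neoplasms", while a word with no entry term, such as "soda", returns `None`. That gap is why the number of entry terms matters so much to the quality of the vocabulary.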
Vocabulary Control - Disambiguation
  • Qualification
    • Parenthetical qualifier to disambiguate meaning
      • Mercury (god)
      • Mercury (planet)
    • Scope notes: a specific definition for a term within that domain
  • Domain specification:
    • Limiting controlled vocabulary to a subject or domain
      • Mercury (god) not needed in medical vocabulary because it is out of scope. 
  • Hierarchy
    • Embed concept in hierarchical context
      • Metals 
        • Mercury
    • In the case above, Mercury placed under Metals does not need the parenthetical qualifier (metal) to distinguish it from Mercury the god, because the god has no place in the Metals hierarchy.
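
The effect of qualifiers and of domain specification can be sketched in Python. The heading lists and the function names `split_qualifier` and `candidates` below are hypothetical and for illustration only; they are not drawn from a real vocabulary.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical general-purpose headings: homographs carry a parenthetical qualifier.
GENERAL = ["Mercury (god)", "Mercury (planet)", "Mercury (metal)", "Cats"]

# Hypothetical domain-limited headings: only one Mercury is in scope.
CHEMISTRY = ["Metals", "Mercury"]

def split_qualifier(heading: str) -> Tuple[str, Optional[str]]:
    """Split 'Mercury (god)' into ('Mercury', 'god'); unqualified headings get None."""
    match = re.fullmatch(r"(.+?)\s*\(([^)]+)\)", heading)
    if match:
        return match.group(1), match.group(2)
    return heading, None

def candidates(word: str, headings: List[str]) -> List[str]:
    """Return every heading whose base term matches the word, ignoring case."""
    return [h for h in headings
            if split_qualifier(h)[0].casefold() == word.casefold()]
```

In the general list, `candidates("mercury", GENERAL)` returns three headings that only the qualifiers tell apart. In the domain-limited list, `candidates("mercury", CHEMISTRY)` returns a single unqualified heading, since the other meanings are out of scope.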
The "Semantic Neighborhood" Concept
  • The concepts represented in controlled vocabularies can be thought of as residing in a semantic neighborhood. 
  • Each concept has relationships to other concepts in four directions:
    • "North/South" for broader and narrower concepts
    • "East/West" for related concepts
  • These relationships provide the intellectual infrastructure of a domain to the indexer and the searcher ("riding the rails").
Broader and Narrower Concepts
  • Hierarchical relationships - "riding the rails" in northerly and southerly directions:
    • In classical terms: parent/child relationships - mammal ... dog
    • In ontological terms: "is-a" relationships - dog is a mammal
  • Broader term (BT) relationships (northerly):
    • Also termed "superordinate" relationship.
    • Gives the context of the concept for disambiguation purposes
  • Narrower term (NT) relationships (southerly):
    • Also termed "subordinate" relationship
    • Indexers ALWAYS use the most specific term (specific entry)
Related Concepts - East/West
  • Equivalence and associative relationships - "riding the rails" in east/west directions.
  • Equivalence relationships - "use" & "use for" (UF):
    • Relates authorized terms and entry terms (i.e. synonyms)
      • Maintenance (UF upkeep)
      • Upkeep use Maintenance
        • i.e. maintenance is the preferred term used for searching.
  • Associative relationships - "related terms" (RT):
    • Also called "see also" terms
    • Terms have non-hierarchical relationship (i.e., not synonymous)
      • Birds (RT Ornithology)
        • If a patron asks for a book on birds, ask whether they want a book about birds themselves or about the study of birds; the answer tells you whether to use "Ornithology" as your search term.
Semantic Neighborhood Examples
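
One way to picture a small semantic neighborhood is as a data structure. The Python sketch below is an illustration, not a real thesaurus format: the `Concept` class and the `add` and `ride_north` helpers are hypothetical names, and "Animals" is a top term added here so the hierarchy has more than one level. The other terms come from the examples in these notes.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    label: str
    broader: list = field(default_factory=list)   # BT: "north"
    narrower: list = field(default_factory=list)  # NT: "south"
    related: list = field(default_factory=list)   # RT: "east/west"
    used_for: list = field(default_factory=list)  # UF: entry terms

THESAURUS = {}

def add(label, bt=None, rt=(), uf=()):
    """Add a concept and record reciprocal BT/NT and RT links."""
    concept = THESAURUS.setdefault(label, Concept(label))
    concept.used_for.extend(uf)
    if bt:
        parent = THESAURUS.setdefault(bt, Concept(bt))
        concept.broader.append(bt)
        parent.narrower.append(label)
    for other in rt:
        peer = THESAURUS.setdefault(other, Concept(other))
        concept.related.append(other)
        peer.related.append(label)
    return concept

add("Animals")                                   # hypothetical top term
add("Mammals", bt="Animals")
add("Dogs", bt="Mammals")
add("Birds", bt="Animals", rt=["Ornithology"])
add("Maintenance", uf=["Upkeep"])

def ride_north(label):
    """Follow the first broader term upward until the top of the hierarchy."""
    path = []
    while THESAURUS[label].broader:
        label = THESAURUS[label].broader[0]
        path.append(label)
    return path
```

Starting from "Dogs", `ride_north` walks up through "Mammals" to "Animals", while `THESAURUS["Birds"].related` points sideways to "Ornithology". Each link is stored on both concepts so the neighborhood can be traversed from either end.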
Subject Analysis in Context
  • Obtain resource
  • Describe resource in surrogate record.
  • Subject analyze resource in surrogate record:
    • Verbal
    • Classification
Basic Concepts of Subject Analysis
  • The reader is the focus - how do users ask for resources when using retrieval tools?
  • Collocation using subject criterion
  • Terms must represent common usage among users of the retrieval tool.
  • Specific Entry - professional indexers ALWAYS index to the most specific relevant term.
Performing Subject Analysis - Two Steps
  1. Read for subject analysis to generate concepts that describe the "aboutness" of the information resource:
    • DO NOT read entire work
    • DO read:
      • Preface
      • Introduction
      • Table of Contents
      • Some text, especially noting those words that are bold, italicized, part of a caption, etc. 
      • Bibliographies
  2. Translate concepts into the controlled vocabulary of the information retrieval tool.
Reading for Subject Analysis Purposes
  • Take notes while reading.
  • Write down common themes.
  • Use dictionaries, encyclopedias, etc. to clarify terminologies. 
  • Get a clear picture in mind about primary and secondary topics (the "aboutness") of the work.
Indexing Exhaustivity
  • Addresses the question: How many subject terms per resource?
    • Important to indexers because of the need to maintain consistency across resources indexed in a retrieval tool
    • Important for searchers, who need to know the indexing policy in advance in order to search successfully.
  • Summarization:
    • Purpose is to summarize subject content of resource
    • Used in library cataloging (books contain secondary indexing)
  • Depth Indexing:
    • To exhaustively describe subject content of resource
    • Used in article indexing (and web page indexing) for resources that often do not have secondary indexing structures. 
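
The difference between the two policies can be sketched as a selection rule. This Python sketch rests on assumptions that are not in the notes: that each concept found while reading carries a numeric weight for how central it is, and that summarization is capped at three terms. The name `select_terms`, the weights and the cap are all hypothetical.

```python
SUMMARIZATION_CAP = 3  # hypothetical limit; real policies vary by institution

def select_terms(weighted_concepts, policy):
    """Pick subject terms from {concept: weight} under an exhaustivity policy."""
    # Rank concepts from most to least central to the resource.
    ranked = sorted(weighted_concepts, key=weighted_concepts.get, reverse=True)
    if policy == "summarization":
        # Keep only the few terms that summarize the resource as a whole.
        return ranked[:SUMMARIZATION_CAP]
    if policy == "depth":
        # Keep every concept, including minor ones.
        return ranked
    raise ValueError(f"unknown policy: {policy}")

# Hypothetical concepts and weights noted while reading a resource.
concepts = {"Birds": 0.9, "Ornithology": 0.7, "Migration": 0.5,
            "Nests": 0.2, "Feathers": 0.1}
```

Under "summarization" the example resource gets its three most central terms; under "depth" it gets all five. Applying the same rule to every resource is what keeps the retrieval tool consistent.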
