Showing posts with label Organization of Information. Show all posts

Thursday, July 21, 2016

Classification of Resources IV: Information Architecture, Digital Libraries, and Social Classification

Subject Access Problem, Revisited

  • How does this apply to Web searching?
    • Automatic indexing (earlier in semester), such as what Google does: Does a centralized index address the soda versus pop controversy?
    • Today, we explore possibilities that center on the human organizing of Web resources.
Challenge of Providing Access to Web Resource Collections
  • Can we build true digital *libraries*?
    • Can we systematically arrange Web resource collections by subject in a way that is useful to those seeking information? 
    • Can we replicate the "great reduction" phenomenon?
  • Can we insert these digital libraries into the lives of our users, where the time available for finding information is likely to be a major constraint?
    • While at home
    • While at work
Search vs. Browse of Info Resources
  • Search
    • Of surrogates
    • Bibliographic
    • Cataloging
    • Verbal subject analysis
    • Results list
    • A website's search engine
  • Browse
    • Of actual info resources
    • Bibliothecal
    • Shelf arrangement
    • Classification
    • Navigation
    • A website's architecture
Organizing Hyperlinks to Web Content
  • Hyperlinks to what types of files?
    • Text
    • Audio
    • Video
    • Powerpoint
    • etc.
  • Hyperlinks to whose content and to what kind of structures?
    • Content: Our own content ("stickiness") or someone else's content (traditional library approach)
    • Structure: Websites, sub-sites or webpages
Classification & Hyperlink Organizing
  • Main goal is to transcend the limitations of shelf arrangement in physical libraries:
    • E.g. multiple class numbers in classified catalog
    • Arranging links in multiple hierarchical locations (poly-hierarchy)
  • Another important goal is to take end-user perspective into account:
    • "Views" of a website customized for "Future Students", "Current Students", etc.
    • Social bookmarking (let "pop" be pop)
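A minimal sketch of poly-hierarchy, assuming a hypothetical link directory held in a Python dictionary. The category paths and the URL are invented for illustration. A book occupies one shelf location, but the same link can be filed under several categories at once:

```python
from collections import defaultdict

# Hypothetical directory: category path -> set of links filed there.
directory = defaultdict(set)

def file_link(url, *category_paths):
    """File one link under every category path given (poly-hierarchy)."""
    for path in category_paths:
        directory[path].add(url)

def locations_of(url):
    """Return every category path under which the link is filed, sorted."""
    return sorted(path for path, links in directory.items() if url in links)

# The URL and both categories are placeholders, not real resources.
file_link(
    "https://example.org/rice-diseases",
    "Agriculture/Crops/Rice",
    "Biology/Mycology/Plant pathogens",
)
print(locations_of("https://example.org/rice-diseases"))
```

A user browsing from either category reaches the same resource, which is the effect multiple class numbers have in a classified catalog.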
What is the Documentary Unit?
  • For collections of hyperlinks to internal Web resources (i.e., self-contained websites):
    • Keeping users on the webpages of your website
    • The emerging field of Information Architecture
      • Also referred to as navigation design for self-contained websites with few links to external content.
      • About creating navigational and organizational structures that put users in touch with the information they need in a website as efficiently as possible (similar to physical library signage, but also vocabulary control for labels).
      • Job titles include:
        • Interaction/Interface Designer
        • Usability Engineer
        • User Experience Designer (UX)
  • For collections of hyperlinks to external Web resources:
    • Linking to webpages: Google and other search engines
    • Linking to web sub-sites: Human constructed link indexes such as those on library websites
    • Linking to entire websites: Human constructed link indexes such as those on library websites.
Structural Standards for Web Resources
  • Are websites structured like books? No. Not always. That's why we have to worry about the differing structuring of information.
  • There is a lack of standards for structuring websites across publishers:
    • Many "vanity" web publishers
    • "Commercial" publishers are making inroads
How Libraries Organize Web Resources
  • Standards approach: Cataloging - use of 856 tag:
  • Customization approach - Web lists by subject:
    • Mathematics (list at University of Alabama Rodgers Library)
Faceted Classification (Non-hierarchical)
  • Can be a part of a comprehensive system, e.g. Colon Classification
    • Ranganathan, PMEST and the Colon Classification 
    • Fungal diseases in the rice crops of Madras, 1950-1959: J381,4:433.441'N5
  • Can be a part of a hierarchical system as a non-hierarchical specification of the aspects of a subject:
  • Often Used for web organization:
Social Classification
  • User added metadata
  • Shared resources (commonly used for collections of photos and URLs)
  • Organized via third party collaborative websites.
  • Also known as folksonomy, ethnoclassification and free-tagging.
  • Tagging - the establishment of a relationship between an online resource and a user:
    • No centralized vocabulary control
    • However, intent is to match an individual with other individuals who not only have the same interest, but also share the same way to express the aboutness of that resource (let "pop" be pop and "soda" be soda)
  • Contexts (know what is being organized!) [e.g. Photographs]
    • Virtual Photographic "Shoeboxes"
      • Collections of digital photographs stored at a third party website
      • Users "Tag" their photos with descriptors and descriptors can be searched.
      • Lack of vocabulary control; however, in this "social" context, neither precision nor recall is important.
      • EXAMPLE: third party photo aggregator: Flickr
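The tagging relationship described above can be sketched as a small index, assuming hypothetical user names and photo identifiers. Tags are stored exactly as typed, so nothing merges "pop" with "soda"; matching happens only between people who chose the same word:

```python
from collections import defaultdict

# tag -> set of (user, resource) pairs; tags are kept exactly as entered.
tag_index = defaultdict(set)

def tag(user, resource, label):
    """Record that a user attached a free-text label to a resource."""
    tag_index[label].add((user, resource))

def users_sharing(label):
    """Users who chose this exact word to describe something."""
    return sorted({user for user, _ in tag_index[label]})

# Invented users and photo identifiers, for illustration only.
tag("ana", "photo-001", "pop")
tag("ben", "photo-002", "pop")
tag("cy", "photo-003", "soda")

print(users_sharing("pop"))
print(users_sharing("soda"))
```

Because there is no centralized vocabulary control, a search on "pop" never retrieves the photo tagged "soda". In this social context that is the intended behavior rather than a recall failure.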
Know your Resource Collection!
  • What type of resource?
    • Pages or collection of pages?
      • Webpages, articles, books or other?
    • Are they structured consistently?
    • Do they follow conventions of publishing?
  • How are resources organized?
    • Searchable surrogates?
      • Authority control?
      • Automatically indexed?
      • Human (professional or civilian) indexer?
    • Browsable directly?
    • Both searchable and browseable?

Article Summary for Lecture # 14 - Manuel

Digital as a Hegemonic Medium for Epistemology and Knowledge Organization

Epistemology was born in European Modernity in order to control others’ knowledge. Its study deals with the foundations, criteria, and validation by which scientific knowledge is justified, including historical, political, economic, and social circumstances. Rosa San Segundo Manuel and Daniel Martinez-Avila believe that the approach to epistemology has changed in the 21st century. New social conditions, industrial production, and advances in medical and scientific research reflect the changing involvement of technology in day-to-day life. They cite the internet as a bringer of change in media, scientific, and epistemic contexts.

                Manuel, et al., break down information into three main ages of human history:
  • The Oral Age – Primitive ideas of social organization and structuring were developed. Information was transferred orally, kept internally.
  • The Written Age – Developed a need to organize all the written knowledge. Writing and the role of the printing press allowed ideas to be shared with a wider audience.
  • The Digital Age – Current age. Printed and oral information become digitized and uploaded onto the web.

The authors also note the organization of knowledge in the digital environment is represented, invented, and articulated with two new fundamental instruments: a material one, digital technology, and a symbolic one, the deposited culture.

                According to Manuel, et al., culture is being subordinated to technology, and theorize that soon there will be a digital repository of all constructed objects of a culture – making the digital not only the instrument and the location, but also the content itself. In this new culture, individuals will participate through blogs, wikis, and social networks. They believe digital natives have a new way of thinking in the Digital Age that embraces the hybridization of materials, formats, and texts, globalization of information, connectivity, virtuality, and hypertextuality.

                These digital natives do not read the same way printed text is meant to be read, and have new ways of learning, memorizing, and participating via Web 2.0 and 3.0 technologies. They are immersing themselves in digital reality: they have an electronic mailbox, participate in social networks, communicate via blogs, wikis, etc. The authors believe this will lead to a change in epistemology, which will be a sort of post-epistemology that approaches a new structure of knowledge.

                Overall, I agree with the authors’ ideas about the evolution of technology, and how the digital world is slowly changing the world in which we organize, study, and evaluate information. However, I found Manuel and Martinez-Avila’s article a little too brief. There were several ideas presented that I felt could be expounded upon, and I found their highly technical vocabulary hard to decipher and navigate without a second pass. This being the case, it was a little difficult to extrapolate information to create this post, but I do feel as though the article has its merits. This article does present some very interesting points about the future of information study, so if you are interested in how the digitization of our lives is affecting this, then this article is definitely for you!

_____________________________________________________________________________
For more information and clarification, check out the full article (citation below)!

Manuel, R., & Martinez-Avila, D. (2014). Digital as a hegemonic medium for epistemology and knowledge organization. Advances in Knowledge Organization 14:96-100.

Tuesday, July 19, 2016

Classification of Resources III: Research Library Schemes

Classification Definitions

  • Classification - the act of organizing a body of knowledge into a systematic order
    • In libraries: the systematic arrangement by subject of books and other materials on shelves, or of catalog and index entries, in a manner that is most useful to those who read or those who seek a definite piece of information.
      • Shelving device
      • Organization device
Characteristics of a Classification System
  • Inclusive and comprehensive
  • Systematic
  • Flexible and expansive (i.e. hospitable)
  • Employ terminology that is clear and descriptive
  • Indexed
Classified Shelf Arrangement of Books
  • Collocating objective: bringing like things together on library shelves:
    • Subject criterion: But what about books on multiple topics?
    • Author criterion: But what about books by multiple authors?
  • Need a system for the unique identification of resources in open stack libraries through the use of notational systems and call numbers.
Notational Systems
  • Notation marks (aka "classmarks") represent a book's subject class, including its relation to other subject classes in a classification scheme.
  • Most common types:
    • Pure - e.g., DDC employs Arabic numerals
    • Mixed - e.g., LCC employs an alpha-numeric notation
  • Mnemonics:
    • Repeating class notation patterns throughout classification
    • This technique is also used in the MARC system
Shelf Arrangement and Sub-arrangement
  • Library classification schemes provide:
    • A systematic method for shelf arrangement in open stack libraries
    • A systematic method for sub-arrangement within each class
  • To accomplish this goal, catalogers synthesize (i.e. create) class numbers to represent the subject of a book
  • This process remains transparent to the user (i.e. the user is more interested in the fact that books are collocated on the shelf than in how the numbers were determined).
Cuttering to Create a Call Number
  • Used to create call numbers (UIs) for individual library collections.
  • Provides the link between the surrogate record and the actual item in the collection
  • Provides for both subject and non-subject (e.g. author) oriented sub-arrangement in open stack libraries. 
  • After the class notation is determined, then the cuttering process begins.
  • Use a cutter table:
    • There may be more than one cutter table floating around!
    • Cuttering is a flexible process; only use table as a guide
  • Steps for cuttering:
    • Determine the first letter of main entry (most often taken from 100 tag; but could be 245 tag)
    • Use number associated with second letter of main entry
    • Add additional numbers until call number is unique 
    • Add date
  • You may have 1 or 2 cutter numbers, BUT you may never have 3.
  • In the LCC, cuttering instructions are given in the schedules:
    • Assists in topical, geographical and other non-main entry-based sub-arrangement
    • Occasionally, this involves double cuttering, thus precluding the use of main entry-based cutters. 
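The cuttering steps above can be sketched in code. This is a minimal illustration, not a published cutter table: SIMPLIFIED_TABLE, the function names, and the uniqueness check against an `existing` set are all assumptions made for the example. As the notes say, more than one table is in circulation and any table is only a guide.

```python
# Illustrative stand-in for a cutter table: letter range -> digit.
# Real tables differ (and vary with the first letter); use your library's table.
SIMPLIFIED_TABLE = [("a", "d", "3"), ("e", "h", "4"), ("i", "l", "5"),
                    ("m", "o", "6"), ("p", "s", "7"), ("t", "v", "8"),
                    ("w", "z", "9")]

def digit_for(letter):
    """Look up the digit for one lowercase letter in the simplified table."""
    for low, high, digit in SIMPLIFIED_TABLE:
        if low <= letter <= high:
            return digit
    raise ValueError(f"no table entry for {letter!r}")

def cutter(main_entry, existing=()):
    """First letter, digit for the second letter, then more digits until unique."""
    letters = [c for c in main_entry.lower() if c.isalpha()]
    if len(letters) < 2:
        raise ValueError("main entry needs at least two letters")
    result = letters[0].upper() + digit_for(letters[1])
    position = 2
    # Extend with digits for later letters while the number is already taken.
    while result in existing and position < len(letters):
        result += digit_for(letters[position])
        position += 1
    return result

def call_number(classmark, main_entry, year, existing=()):
    """Class notation first, then the cutter, then the date."""
    return f"{classmark}.{cutter(main_entry, existing)} {year}"

print(call_number("ND1337", "Winter", 1955))                   # ND1337.W5 1955
print(call_number("ND1337", "Winter", 1960, existing={"W5"}))  # ND1337.W56 1960
```

The sketch covers a single main-entry cutter only. Double cuttering and the schedule-specific instructions mentioned above would need their own rules.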
LCC - General Points
  • Mixed notations
  • Not all letters are used (some are reserved for expansion)
  • Subject specialists develop and maintain the class schedules, rather than centralized editors as is the case with the DDC.
  • Designed to meet the needs of the collection of the Library of Congress.
  • Based on literary warrant: Schedules developed with reference to what has been published. 
  • Hierarchical; however, NOT reflected in notation. 
  • LCC Classification outline:
    • Divided into main classes according to academic discipline or areas of study and then into sub-classes representing branches of those disciplines
    • Larger range of letters for History (C-G) and Social Sciences (H-L)
    • Numerals used range from 1-9999 with frequent gaps
LCC is Essentially Enumerative
  • Aspects of subjects are explicitly provided for (i.e. enumerated) in the schedules:
    • More "pigeonholes" created in advance to represent very specific topics and topical aspects.
    • However, tables are also used to synthesize complex class notations (not as often as with the DDC)
  • As an enumerative classification, the LCC schedules are more voluminous than many other schemes:
    • 50 volumes
    • 10,000 pages.
Consistent Structuring: Martel's 7 points
  • Initially, Charles Martel provided basis for consistent structuring across divisions:
    • General forms: periodicals, societies, dictionaries, etc
    • Theory, Philosophy
    • History
    • Treatises, General works
    • Law, Regulations, State relations (now relocated to K)
    • Study and teaching
    • Special subjects and their subdivision from general to specific
LCC has Many Editors 
  • LCC has been referred to as a series of special classifications (though compare with NLM)
  • However, individual LCC schedules are structured identically:
    • Preface
    • Broad outline
    • Detailed outline
    • The schedule, itself
    • Any necessary, auxiliary tables
    • Detailed index
Revising LCC with Topic Subdivisions
  • Constitute the bulk of the expansion of classes and subclasses
  • EXAMPLE: Women's Suffrage in U.S. (JK1880)
    • Note geographical subdivision of "Women's Suffrage. Women's right to vote" - in this case, cutter by location
    • Also note geographic subdivision under subtopic "Biography"
Procedures for Revision and Expansion
  • Proposals for changes originate with LC catalogers:
    • Anomalies
    • New topics
  • Some methods for expansion:
    • Using unused letters (I, O, W, X and Y)
    • Adding a third letter (or sometimes a fourth)
    • Extending existing numbers decimally
    • Expanding use of cuttering
General Steps for Classifying with LCC
  • Because of disciplinary aspects of LCC, first check for appropriate schedule to match subject of item in hand and then determine the best class number within the selected schedule. 
  • Class item in hand with similar works:
    • Consult existing records
    • Consult class numbers mapped from assigned LCSH
    • Consult LCC outlines
Classifying General Works
  • Under most numbers with subdivisions, a number is designated for "general works."
Classifying Works on Single Topics
  • Always use the most specific class number that is co-extensive with the subject matter of the work.
  • If no co-extensive number exists, then the next appropriate broader number should be used. 
Classifying Works by Time Period
  • For works treating a topic with regard to a particular time period. For works spanning two periods, use the earliest.
  • DC History of France:
    • DC725 Earliest to 1515
    • DC727 16th century
    • DC729 17th-18th century
    • DC731 1789-1815
    • DC733 1815-1870
    • DC735 1871-1914
    • DC736 1914-1921
    • DC737 1922-
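A short sketch of the time-period rule, using the DC numbers listed above. The exact start years are an assumption: where the list shows a boundary year in two periods (1815, 1914), this example assigns it to the earlier period, and it treats "16th century" as beginning in 1516 because the preceding period runs to 1515.

```python
from bisect import bisect_right

# (first year covered, class number), following the DC list above.
# Boundary choices (1516, 1816, 1915) are assumptions made for this sketch.
PERIODS = [
    (None, "DC725"),  # earliest to 1515
    (1516, "DC727"),  # 16th century
    (1601, "DC729"),  # 17th-18th century
    (1789, "DC731"),  # 1789-1815
    (1816, "DC733"),  # 1815-1870
    (1871, "DC735"),  # 1871-1914
    (1915, "DC736"),  # 1914-1921
    (1922, "DC737"),  # 1922-
]
STARTS = [start for start, _ in PERIODS[1:]]

def class_for_year(year):
    """Find the period whose range contains the given year."""
    return PERIODS[bisect_right(STARTS, year)][1]

def class_for_span(first_year, last_year):
    """A work spanning two periods is classed with the earliest one."""
    return class_for_year(min(first_year, last_year))

print(class_for_year(1800))        # DC731
print(class_for_span(1780, 1820))  # DC729
```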
Multi-Faceted Single Topic Works
  • Works covering multiple facets of a single topic.
  • If available, use a class number representing all facets:
    • Idea of the English landscape painter: genius as Alibi in the early nineteenth century by Kay Dian Kriz
    • ND1354.5.K75 1997
  • If number covering all facets is not available, go with the emphasized or more prominent facet:
    • Elizabethan miniatures, by Carl Winter
    • ND1337.G7.W5 1955
Classifying Works on Multiple Topics
  • A work on two or three topics treated equally:
    • Use number for the topic treated first
    • QD412.A7 C53 1994 - Chemistry of organic arsenic, antimony, and bismuth compounds
  • A work on four or more topics:
    • Use a general number that encompasses all numbers chosen
    • RA566.3.C48 1992 - Changing U.S. health care; a study of four metropolitan areas
Classifying Works with Phase Relations
  • A work covering the relationship between two topics is classed in the most specific number covering the relationship:
    • QK46.5.H85 P66 2001 - Botany of desire; a plant's eye view of the world
    • QK46.5.H85 - Botany > Human-plant relationship
  • A work covering the influence of one topic on another is classed with the topic being influenced:
    • LA2317.T8 M3 1899x - Henry Tutwiler and the influence of the University of Virginia on education in Alabama
    • LA2317.T8 - History of U.S. Education > Individual biography
Cuttering for Geographical Aspects
  • Geographical cuttering option is often available in parts of the schedules:
    • Use either general or specialized cutter table
  • EXAMPLE:
    • Visual Arts Education:
      • N353.A-W - General works in Visual Arts in U.S.
      • N354.A-Z - General works in Visual Arts in a state
      • N355.A-Z - General works in Visual arts in a city
Cuttering Sub-topics - "Reserve Cutters"
  • Cuttering by topical aspect is also sometimes available - this is termed "reserve cuttering"
  • Style, Composition, Rhetoric:
    • P301 - General Works
    • P301.3.A-Z - General Works by region or country
    • P301.5.A-Z - Special Aspects A-Z
      • P301.5.I34 - Idioms
      • P301.5.P73 - Propaganda

Thursday, July 14, 2016

Classification of Resources II: Public Library Schemes

Classification Definitions:

  • The act of organizing a body of knowledge into a systematic order
  • In libraries: The systematic arrangement by subject of books and other materials on shelves, or of catalog and index entries, in a manner that is most useful to those who read or those who seek a definite piece of information
    • Shelving Device
    • Organization Device
Characteristics of a Classification System 

  • Inclusive and comprehensive
  • Systematic
  • Flexible and expansive/hospitable to new knowledge
  • Employ terminology that is clear and descriptive
  • Indexed
Notational Systems
  • Notation marks (a.k.a. "classmarks") represent a book's subject class, including its relation to other subject classes in a classification scheme. 
  • Most common types:
    • Pure - e.g. DDC employs Arabic numerals
    • Mixed - e.g. LCC employs an alpha-numeric notation
  • Mnemonics:
    • Repeating class notation patterns throughout classification
    • This technique is also used in the MARC system
Shelf Arrangement and Sub-arrangement
  • Library classification schemes provide:
    • A systematic method for shelf arrangement in open stack libraries
    • A systematic method for sub-arrangement within each class
  • To accomplish this goal, catalogers synthesize (i.e., create) class numbers to represent the subject of a book. 
  • This process remains transparent to the user, i.e., the user is more interested in the fact that books are collocated on the shelf rather than how the numbers were determined.
Historical Development of DDC
  • First published in 1876 - "A Classification and Subject Index for Cataloging and Arranging the Books and Pamphlets of a Library."
    • Current edition: 23rd 
    • Electronic version available
  • Most widely used classification scheme in the world (135 countries - translated into 30 languages)
  • Innovations:
    • Relative index
    • Integrity of numbers (with 2nd edition)
Conceptual Framework of DDC
  • Basic classes are organized by discipline (i.e., fields of study).
  • Divisions of DDC:
    • Ten main classes (0XX, 1XX, 2XX, etc), which together cover the entire world of knowledge
    • Each main class is divided into ten divisions (100 total divisions in DDC)
    • Each division is divided into ten sections (1000 total sections in DDC)
    • Class 000 is most general
      • Used for works not limited to any specific discipline (e.g., encyclopedias, newspapers, general periodicals)
      • Used for certain specialized disciplines that deal with knowledge and information (e.g., library science, computer science, journalism)
    • Each of the other main classes (1XX to 9XX) comprises a major discipline or group of related disciplines.
  • DDC IS ARRANGED PRIMARILY BY DISCIPLINE AND NOT BY SUBJECT; therefore, a given subject is likely to appear under more than one class number.
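The ten-by-ten-by-ten structure described above can be read straight off the first three digits of a class number. A minimal sketch, with a hypothetical function name:

```python
def ddc_levels(class_number):
    """Return (main class, division, section) for a DDC number such as '612.1'."""
    digits = class_number.split(".")[0]
    if len(digits) != 3 or not digits.isdigit():
        raise ValueError("expected three digits before the decimal point")
    main_class = digits[0] + "00"   # one of the ten main classes
    division = digits[:2] + "0"     # one of the 100 divisions
    section = digits                # one of the 1000 sections
    return main_class, division, section

print(ddc_levels("612.1"))  # ('600', '610', '612')
```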
Relative Index
  • Disciplinary focus of DDC causes subjects to be scattered across the classification; the Relative Index to the schedules is needed to collocate. 
  • Relative index relates subjects to the various disciplines to which they may belong:
    • Journalism - generally found at 070.4
    • Journalism - civil rights issues at 323.445
    • Journalism - sociology at 302.23
  • Dewey's theoretical contribution to library classification.
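As a rough illustration, the Relative Index behaves like a lookup from a subject to every disciplinary location where it appears. The sketch below uses only the journalism entries listed above; the dictionary layout and function name are assumptions made for the example.

```python
# Subject -> list of (aspect, class number), from the journalism entries above.
RELATIVE_INDEX = {
    "journalism": [
        ("general", "070.4"),
        ("civil rights issues", "323.445"),
        ("sociology", "302.23"),
    ],
}

def lookup(subject):
    """Return every place a subject appears, ordered as the numbers file on the shelf."""
    entries = RELATIVE_INDEX.get(subject.lower(), [])
    # DDC numbers always have three digits before the point, so sorting the
    # strings gives shelf order.
    return sorted(entries, key=lambda entry: entry[1])

for aspect, number in lookup("Journalism"):
    print(number, aspect)
```

The index collocates in one alphabetical entry what the discipline-based schedules scatter across the shelves.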
Principle of Hierarchy
  • Structural Hierarchy (inheritance):
    • Whatever is true of the whole is true of the parts
    • This is termed "hierarchical force"
    • EXAMPLE:
      • All classmarks under 5XX are related to the natural sciences and/or mathematics
      • All classmarks under 612.1... are related to blood and circulation
    • Notational hierarchy (relationships between concepts):
      • Subordinate: 621.4 is subordinate to 621
      • Coordinate: 621.4 is coordinate with 621.6
      • Superordinate: 621 is superordinate to 621.4
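The three notational relationships can be checked mechanically from the digits. This is a sketch under the assumption that the length of the notation expresses the hierarchy, which the schedules do not guarantee everywhere; the function names are hypothetical.

```python
def _key(number):
    """Drop the decimal point and padding zeros: '621.4' -> '6214', '600' -> '6'."""
    return number.replace(".", "").rstrip("0")

def relation(a, b):
    """Describe how DDC number a relates to b, judged from the notation alone."""
    ka, kb = _key(a), _key(b)
    if ka == kb:
        return "same"
    if ka.startswith(kb):
        return "subordinate"
    if kb.startswith(ka):
        return "superordinate"
    if len(ka) == len(kb) and ka[:-1] == kb[:-1]:
        return "coordinate"
    return "no direct relation"

print(relation("621.4", "621"))    # subordinate
print(relation("621.4", "621.6"))  # coordinate
print(relation("621", "621.4"))    # superordinate
```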
Revision Process for DDC
  • Suggestions sent to Library of Congress
  • Reviewed by:
    • LC - Decimal Classification Division
    • OCLC - Forest Press
    • 6 DDC editors
    • Editorial Policy Committee (EPC)
  • Role of EPC (Editorial Policy Committee)
    • Works closely with DDC editors to:
      • Suggest changes
      • Facilitate innovations
      • Monitor general development of the Classification
    • Reviews all versions of the Classification:
      • Print
      • WebDewey
      • Full edition
      • Abridged edition
  • EPC Membership
    • 10 member international committee
    • Elected membership represents Dewey users worldwide
    • Members come from all types of libraries
    • Current representation:
      • American Library Association
      • Australian Committee on Cataloging
      • Chartered Institute of Library and Information Professionals
      • Library of Congress
      • National Library of Canada
      • OCLC
  • Types of Revisions
    • Expansion:
      • Introduction of new subject as part of a class scheme
      • Provides more minute subdivisions
    • Reduction:
      • Rarely used subdivisions are deleted and marked by brackets ([]) in the class scheme listing
      • "Starvation policy" allows DDC to phase out the deleted numbers with the next edition printed
    • Relocations:
      • To rectify an improper placement
      • To eliminate dual provisions
      • To make room for new subjects when no number room is available
      • To realign fields of knowledge
    • Reconstructed Schedules ("Phoenix schedules")
      • An entire schedule is reconstructed without regard to previous divisions
      • Rarely used due to integrity of numbers practice
Classifying with DDC
  • First, determine subject of work
    • Subject analysis is central to library classification
    • Must determine the intent of the author by examining:
      • Title - never the sole source 
      • Table of contents - lists main topics discussed
      • Preface and introduction - can indicate author's intent
      • Scanning text, itself - provides guidance and confirmation
      • Bibliographic references - can also list topics discussed
      • Outside sources - helpful for verifying advanced subject
  • Second, determine discipline of work
    • Guiding principle is that a work is classed in the discipline for which it is intended rather than in the discipline from which the work is derived
    • This enables works that are used together to be shelved together
      • EXAMPLE: Zoologist's book on agricultural pest control would be classed with other books on pest control rather than with other books on zoology.
  • Third, translate findings into appropriate DDC class
     *NB*
  • What to do with multiple subjects/same discipline:
    • Class works covering interrelated subjects with the subject that is being acted upon (rule of application):
      • "Shakespeare's Influence on Keats" with Keats
      • "Great Depression's Impact on American Art" with American Art
    • Class works covering two subjects equally with the subject whose number appears first in the schedules (first-of-two rule):
      • There may be exceptions in instructions in the schedules
    • Class works covering three or more subjects that are all subdivisions of a broader subject with the first higher subject that includes them all (rule of three):
      • "History of Portugal [946.9], Sweden [948.5] and Greece [949.5]" is classed with the history of Europe at 940.
  • What to do with more than one discipline:
    • These works are examples of interdisciplinary research
    • Interdisciplinarity is predictable; therefore, there may already be a place in the schedules for works that are interdisciplinary:
      • Check for interdisciplinary numbers such as 305.231
      • Class works not given an interdisciplinary number in the discipline given the fullest treatment.
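Two of the rules above, first-of-two and the rule of three, can be approximated from the notation. This is a sketch with hypothetical function names; it ignores the exceptions that instructions in the schedules may impose, and it assumes the broader subject is the one whose number is the shared leading digits.

```python
from os.path import commonprefix

def first_of_two(number_a, number_b):
    """First-of-two rule: the number that comes first in the schedules."""
    # String order matches schedule order because every DDC number has
    # three digits before the decimal point.
    return min(number_a, number_b)

def rule_of_three(numbers):
    """Rule of three: the first broader number that includes all the subjects."""
    if len(numbers) < 3:
        raise ValueError("the rule applies to three or more subjects")
    shared = commonprefix(list(numbers)).rstrip(".")
    return shared.ljust(3, "0")  # pad back out to a three-digit class

print(first_of_two("948.5", "946.9"))             # 946.9
print(rule_of_three(["946.9", "948.5", "949.5"]))  # 940
```

The second call reproduces the Portugal, Sweden and Greece example: the three numbers share only "94", so the work is classed at 940.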
Key Features of Schedules and Tables
  • Summaries
    • Summaries provide an overview of the structure of a class
    • "Bird's-eye view" of a section of the DDC to help the classifier focus in on the possible class numbers for a work
    • Because of hierarchical force, summaries at main class, divisional and sectional levels hold for all subordinate class numbers
  • Entries
    • Each entry contains a Dewey class number and descriptive information:
      • Heading
      • Notes
    • Additional Dewey class numbers may be:
      • In parentheses - these numbers provide alternatives to standard practices
      • In brackets - these numbers represent subjects that have been discontinued or relocated
  • Notes
    • Notes provide additional information that is not obvious from a class' position in the notational hierarchy.
    • Classes of notes:
      • Notes that describe what is found in a class
      • Including notes
      • Notes that describe what is found in other classes
      • Notes that explain changes or irregularities
    • Scope notes - definition used within a knowledge organizing system to say what a term means within that system (see 700)
    • Former-heading notes reflect that the formal label for the classmark has changed (see 281.63)
    • Variant-name notes notate that a specific number goes by different names (see 332.32)
    • Class-here notes - confirmation that you're in the correct place (see 371.192)
    • Including notes - "Including" is a code word that notates a specific number is about to go through an expansion (see 362.16)
    • Class-elsewhere notes - offer other places where the book might be placed (see 791.43)
    • See references notes (see 577.7)
    • See-also references (see 584.3)
    • Revision notes are used to indicate when a subdivision or class has been completely or extensively revised.
    • Discontinued notes - (see 004.696)
    • Relocation notes - (see 687.43)
Number Building
  • Synthesis of two numbers to create a complex Dewey class number.
  • Can be multiple Dewey numbers synthesized into a single number:
    • Book on advertising in libraries (659.1902)
      • Use 659.19 for advertising in special organizations
      • 02 for libraries (dropping the trailing 0 from 020)
  • Can be synthesized:
    • Using one of seven Dewey tables
    • Using a "Number-built note" (see 353.13263)
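The advertising-in-libraries example can be worked as a simple concatenation. This sketch shows only the mechanical step; in practice, number building follows the add instructions printed in the schedules, and the function name here is hypothetical.

```python
def build_number(base, added):
    """Append the significant digits of `added` to `base`, keeping one decimal point."""
    suffix = added.replace(".", "").rstrip("0")   # 020 -> 02
    digits = base.replace(".", "") + suffix
    # Re-insert the decimal point after the third digit.
    return digits[:3] + ("." + digits[3:] if len(digits) > 3 else "")

print(build_number("659.19", "020"))  # 659.1902
```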
DDC Tables
  • Table 1 - Standard Subdivisions:
    • Contains mnemonics for standard subdivisions
    • Used to add facets to the class number (education is 072, geographical is 09, etc)
    • Used to connect a subject with a standard facet
      • - 01 Philosophy and theory
      • - 02 Miscellany
      • - 03 Dictionaries, encyclopedias, concordances
      • - 04 Special topics
      • - 05 Serial publications
      • - 06 Organizations and management
      • - 07 Education, research, related topics
      • - 08 History/ description with respect to kinds of persons
      • - 09 Historical, geographic, persons treatment
    • When analyzing Table One numbers (and Table One numbers ONLY), look for the connecting "0" between subject and facet
    • Look up subject class in the schedules
    • Look up facet class in Table One
    • Most of the time, the connecting "0" is the first occurrence:
      • 635.13074 = 635.13 for Carrots and 074 for catalogs
      • But not always: 020.25 = 020 for LIS and 025 for Directories
  • Table 2 - Geographic Areas, Historical Periods, Persons:
    • Contains mnemonics for geographical areas, etc. 
    • Used to add geographic facets to the class number:
      • United States: 973
      • Southeastern States: 975
      • Alabama: 976.1
      • Tuscaloosa County: 976.184
  • Table 3 - Subdivisions for the Arts, for Individual Literatures, for Specific Literary Forms:
    • Contains mnemonics for subdivisions for the arts, for individual literatures, for specific literary forms
    • Tables 3A-3C needed to determine specific notation to be used
  • Table 4 - Subdivisions of individual languages and language families
  • Table 5 - Racial, ethnic, national groups
  • Table 6 - Languages
  • Table 7 - Groups of persons
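For the Table 1 analysis described above, a small helper can list every place where a "0" could be the connector between subject and facet. The function names are hypothetical, and the output is only a set of candidates: each one still has to be checked against the schedules and Table One.

```python
def format_ddc(digits):
    """Pad to three digits and place the decimal point after the third."""
    digits = digits.ljust(3, "0")
    return digits[:3] + ("." + digits[3:] if len(digits) > 3 else "")

def candidate_splits(number):
    """List every (subject, Table 1 facet) split at a '0' inside the number."""
    digits = number.replace(".", "")
    return [
        (format_ddc(digits[:i]), digits[i:])
        for i, ch in enumerate(digits)
        # A leading or final zero cannot connect a subject to a facet.
        if ch == "0" and 0 < i < len(digits) - 1
    ]

print(candidate_splits("635.13074"))  # [('635.13', '074')]
print(candidate_splits("020.25"))     # [('020', '025')]
```

In 020.25 the first zero in the number belongs to the subject itself, which is why the connecting "0" is not always the first occurrence.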

Tuesday, July 12, 2016

Classification of Resources I: Concepts, Problems, Issues

Two Perspectives on Classification
  • Classificationists (i.e. editor) - those who create and edit the conceptual places to put things
    • Need to create workable classifications
      • What is the basis for a classification?
      • What are the ramifications of a classification?
      • How hospitable is a classification to changes in knowledge over time?
  • Classifiers - those who place things into these created places when organizing
    • Need places to put things
      • Is there a place for this thing?
      • Is this thing with other like things?
Classical Classification
  • Aristotle and the natural world:
    • Mutual exclusivity: "in or out" - something is an animal or it is not an animal
    • Inheritance: based on the assumption that there is a natural hierarchy to the world (all animals share same characteristics that differentiate them from plants)
    • Basis for scientific classification
  • Medieval scholastics and their resources:
    • Classification of scholarly materials based on the academic divisions of study of that time
    • This classification reflected those areas taught to scholars. 
    • Classification provides the basis for shelf arrangement.
Scientific (Linnaean) Classification


  • Largest example of formal classification:
    • Managed over time by classificationists
    • Binomial nomenclature (genus/species)
    • Specific rules for classifiers to follow (e.g., members of the same species are capable of interbreeding to produce fertile offspring)
  • Follows classical approach:
    • Mutual exclusivity of classes
    • Inheritance
  • EXAMPLES:
    • Two species contexts:
    • Dealing with changes in the classification itself:
      • Evolution of existing species - e.g. newly emerging drug resistant infectious diseases.
      • Newly discovered species - e.g., the division of Monera into two distinct kingdoms
Library Classification
  • Inspired by scientific classification:
    • Early library classificationists were leaders in the application of classification principles outside of the scientific domain
    • However, book classification provided challenges in comparison to the classification of natural objects
    • Is a book like a starfish?
  • Historically U.S. libraries used classification for shelf arrangement, while European libraries used classification to facilitate retrieval of surrogates:
    • Open stacks require a system for shelf arrangement of books
    • Closed stacks rely on the catalog to provide logical groupings
The Nature of Book Classification
  • Collocating objective: Bringing like things together on library shelves:
    • Subject criterion: What about books on multiple topics?
    • Author criterion: What about books by multiple authors?
    • Subject/author criteria: What about books by the same author on different topics?
  • Solving the need for a system of unique identification in open stack libraries through notational systems and call numbers.
Questions in Book Classification
  • How do libraries provide for the collocation of like books (2nd Cutter objective) while at the same time facilitating the retrieval of known items (1st Cutter objective)?
    • Cutter's solution - two part call number:
      • Call number is made up of the class notation (classmark) and the Cutter number
      • The classmark provides for the fulfillment of the collocation objective
      • The system for Cuttering provides a unique call number within each library, which fulfills the known item retrieval objective by supporting unique identification.
    • A book is not a starfish:
      • Books, as physical objects can only be in one place at a time, even if they are about multiple topics
      • Libraries do not buy multiple copies
      • Conceit of the cataloger revisited
      • What about networked resources: Is a website a starfish?
    • Comparison of U.S. and European libraries:
      • U.S. libraries: open stacks and "mark and park"
      • European libraries: closed stacks and classified cataloging (i.e., the assignment of a book to multiple classes)
  • How many books should be grouped together in the stacks (the efficient browsing problem):
    • Broad classification
    • Detailed ("close") classification
  • How should books within each class be displayed (the subarrangement problem)
    • We will compare the needs of the typical research library, with its larger collection of specialized books, with the typical public library, with its smaller collection of broader books.
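
Cutter's two-part call number, described above, can be illustrated with a minimal sketch. The call numbers below are made up for illustration, and the sketch assumes a DDC-style decimal classmark followed by a single Cutter number (one letter plus digits that file as a decimal). Real call numbers often carry additional elements such as dates or work marks.

```python
from decimal import Decimal

def shelf_key(call_number):
    """Split 'classmark cutter' into a sortable key (hypothetical format)."""
    classmark, cutter = call_number.split()
    # The classmark collocates like books; the Cutter number orders them
    # within the class. Cutter digits file as decimals, so S65 sorts before S7.
    return (Decimal(classmark), cutter[0], Decimal("0." + cutter[1:]))

# Made-up call numbers, deliberately out of shelf order.
shelf = ["636.7 S7", "025.4 C9", "636.8 A2", "636.7 S65"]
in_order = sorted(shelf, key=shelf_key)
print(in_order)
```

The two books classed at 636.7 end up side by side (collocation), while the Cutter number gives each one its own position (known item retrieval).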
Four Types of Library Classifications
    • Universal: Intended to organize all of knowledge:
      • DDC
      • LCC
      • UDC
    • National General: Same as universal, but limited to a specific country:
      • Nederlandse Basisclassificatie
    • Subject Specific: Intended to organize a domain:
      • NLM Classification
    • Homegrown: Built as needed (e.g., Yahoo directory)
Classification Concepts
  • Broad versus Close Classification
    • General strategy employed by individual libraries. 
    • How many class numbers to use?
      • The larger the collection, the closer the classification (i.e., more books require more detailed class numbers)
      • Can vary within a library's collections (i.e., a given library may have larger collections in a certain subject area)
      • Otherwise, there will be too many books classed together or too few books under each class (either of which impedes efficient browsing)
  • Classification of Knowledge versus Classification of a Particular Collection
    • Relates to the intent of the classification:
      • Classification of knowledge approach provides pigeonholes for all subjects in advance of the use of that classification
      • Particular collection approach has mechanism to create new pigeonholes as resources are added to collection (literary warrant)
    • DDC began as universal, but updates to the classification are now through literary warrant.
    • LCC began through literary warrant, but the nature of the LC collection makes the LCC a  de facto universal scheme.
  • Notational Integrity Over Time
    • Attempting to maintain the same meaning of a class notation over time:
      • Response to the problem of accounting for the growth of knowledge over time
      • A challenge for the classificationist
    • Classifications can be designed to handle the growth of knowledge
  • Fixed versus Relative Location in Closed and Open Stack Libraries
    • In terms of efficient storage, what is the most obvious characteristic of an open stack library?
      • When storage space is at a premium, use the fixed location approach of closed stacks libraries. 
      • Resources can be efficiently stored by size
      • The call number is an accession number
    • Relative location approach is employed in open stack libraries:
      • Physical spaces in collection allow for growth of the collection without a lot of shifting
      • Relative addressing is the key
      • A library collection is a single, linear sequence of books
  • The Case of Journal Shelf Arrangement
    • Alphabetic or classified?
    • Alphabetic by title:
      • Less costly to manage
      • What about title changes?
      • Problem: Journals on the same subject are scattered across the collection
    • Classified by subject:
      • Journals arranged by subject
      • Title changes are accommodated
      • Problem: Users must look up call number to find journal (this is not the case with alphabetical arrangement)
  • Faceted Classification (Non-hierarchical)
    • Can be part of a comprehensive system, e.g., Colon Classification:
    • Can be part of hierarchical system as non-hierarchical specification of the aspects of a subject:
      • DDC and LCC have tables for geographic and other facets
    • Often used for web organization:

Article Summary for Lecture # 11 - Barite

The Notion of "Category":
Its Implications in Subject Analysis and in the Construction and Evaluation of 
Indexing Languages

       Mario Guido Barite, a professor and researcher at the School of Librarianship at the University of the Republic in Uruguay, attempts to tackle the notion of category (a basic intellectual tool for the analysis of the existence and changeableness of things) and proposes conceptual and methodological reexamination from a functional standpoint. Barite claims that "most classifiers or indexers assume the role of classificationist since the present state of indexing languages entails minor and major surgery be performed to adapt these languages to users' requirements." 
   
      Categories are necessarily the foundation of any system for organizing knowledge; however, the terms category, characteristic, and class are sometimes used interchangeably. It is not possible to characterize categories in the Theory of Classification, as categories are extremely general abstract expressions. Categories are used as tools to discover certain regularities of the material world, but Barite suggests properties as a possible category to analyze the material world since categories are, in their basic nature, extremely simple notions. Within the Theory of Classification, categories are only relevant as instruments of analysis and organization of objects, phenomena, and knowledge, and they are used in three precise activities that Barite mentions:
  • design, planning, and structuring of indexing languages or systems of knowledge
  • modification or specification of classification tables
  • the evaluation and analysis of indexing languages and systems of concepts through a set of parameters capable of establishing the grade of reciprocal tension among related concepts and their relevance and validity. 
       Since it is not possible to isolate the notion of category from those of object and analyst, there are several object attributes that Barite suggests condition its study:
  • Any object  is naturally dynamic and mutable - that being the case, in order for the analysis to be completed, the object must be captured at a certain time and abstraction from its reality is required at a given moment.
  • The object may be real or ideal - it may have existed as may be corroborated by its existence registers or maybe it only has an immaterial existence, not physical, due to its nature. These particular characteristics seem to obstruct the analysis since analysts are condemned to act by approximation. However, once conventions have been clearly established by consensus, abstract objects are easily systematized: after agreement has been reached regarding what a theorem is, or regarding certain chronological and factual conventions of the French Revolution, the difficulty of giving intellectual access to the concept diminishes.
  • Some objects have delimitation problems - attempts to produce a definition usually create discrepancies and shades of meaning among experts, so much that they may cause a certain aspect of the object to be placed within one category or the other. But we also have the difficulties posed by the concepts that do not attain conventional agreement. To exemplify, think of the difficulty of approving by consensus the basic statements towards the definition of the concept  labor flexibilization from the viewpoint of a sociologist with a Marxist orientation and another one of ultra-liberal ideas.
  • A large part of the objects belong to, or occur in a phase of the time-space continuum, or rather flow along a section of that continuum. - Due to their mutating and dynamic nature, some objects achieve various configurations and undergo a double influence: that of the processes occurring as a result of the action of internal agents, and that of the processes caused by external agents. This double influence is the determinant of each specific configuration, since any object is in a given time and in a given spatial situation, the synthesis of the impacts brought about by such agents. 
      He then goes on to decompose the notion of category to extract its most typical characteristics:
  • Every category is a sectorial one. 
  • Every category implies a specific level of analysis. 
  • Categories are levels of analysis external to the object.
  • Categories are mutually exclusive.
  • Every category is highly generalizable. 
  • Every category may admit, with reference to an object, variable levels of subdivision.
  • Agreement has not been reached regarding a limited collection of categories.
Barite concludes by proposing greater attention to the definition of category, because it involves essential theoretical and practical aspects for the reasonable command of the theory of concepts by specialists. I agree that it is important to dissect, correct, and fully understand terminology. If we can't fully understand the terminology within an organization system, the system will not be efficient enough to get quality information into the hands of patrons with ease. The article is a bit philosophic, which made it somewhat of a struggle to read, but the heart of the article and Barite's ideas are spot on. I'd recommend this for organizers everywhere, as it really gets you to think about the elements of a system and the terminology used.

________________________________________________________________________
 For more information see the full article (citation below!)

 Barite, M. (2000). The notion of "category:" Its implications in subject analysis and in the  construction and evaluation of indexing languages. Knowledge Organization 27:4-10.

Thursday, July 7, 2016

Verbal Subject Analysis III: Webpage Databases (a.k.a. "Search Engines")

Human vs. Automatic Indexing

  • Both are related to the subject analysis of information resources.
  • Human indexing is used to describe the subject analysis of various periodical databases.
  • Automatic indexing is a term used for the subject analysis operations by the computer algorithms of various webpage databases (a.k.a. search engines).
    • Research from the 1960s through the 1980s tried to get a computer to calculate what articles were about. The most frequent words (articles like a, an, the, etc.) don't really tell you much about the article, and neither do the least used words. The key is finding the sweet spot based on what the author usually writes about.
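
The "sweet spot" idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the stop list, the `min_count` threshold, and the sample text are all hypothetical, and real automatic indexing systems are far more elaborate.

```python
from collections import Counter
import re

# Hypothetical stop list standing in for the "most frequent words".
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "are"}

def candidate_terms(text, min_count=2):
    """Return words that are neither stop words nor too rare in `text`."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOP_WORDS]
    counts = Counter(words)
    # Words used only once are dropped as the "least used words".
    return sorted(word for word, n in counts.items() if n >= min_count)

sample = (
    "The dachshund is a breed of dog. Breeding a dachshund requires care. "
    "Dog breeding is regulated in some places, and dog owners are consulted."
)
terms = candidate_terms(sample)
print(terms)
```

On the sample text, the surviving terms are the mid-frequency words that hint at what the passage is about.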
Why Webpage Database?
  • It is always important to know the documentary unit of an information database. 
  • The adjective associated with database is always a cue to the documentary unit. 
  • Webpage databases are informational databases in which a webpage is the documentary unit. 
  • They are also known as search engines and discovered databases.
Analysis of Websites and their Structure
  • What are webpages? What are websites? Webpages:Websites as pages:books
  • Standards (or lack thereof) for the authoring of web sites and webpages
    • HTML and other markup languages
    • Editors
  • What are the implications of the lack of authoring standards for web-based information resources?
Location of Webpage Subject Metadata
  • In webpage headers: For individual webpages, subject metadata can be created by authors and included in HTML headers.
  • In separate metadata record databases:
    • Subject metadata can be created by intermediaries using Dublin Core schema
    • In search engines, subject metadata is inferred "automatically" by computer algorithm.
Search Engine Questions
  • For greater understanding we need to be able to answer:
    • Why do search engines produce different results for the exact same query?
    • What is the principle for ranking the display of search engine records in response to a query?
The Term "Search Engine"
  • The term has become the common designation for webpage databases. However, in actuality, webpage databases have three parts:
    • Spidering/crawling software to collect webpages.
    • Indexing software to build the index of surrogate records.
    • Retrieval software to facilitate retrieval of surrogates.
Automatic Indexing in Context
  1. Obtain information resource - spidering/crawling
    • Steps for spidering/crawling:
      • Computers owned by search engine retrieve documents by clicking on all hyperlinks on each retrieved webpage
      • Determination is made whether a webpage needs to be indexed (because it is new) or reindexed (if it has already been indexed)
      • Determination is made whether reindexing is warranted
      • New webpages and those meeting criteria for reindexing are then placed in the indexing queue
  2. Describe information resource in surrogate record - read off webpages by indexing software
    • Left Side elements must be inferred by searcher:
      • Examine structure of retrieved records
      • Examine advanced search interface
      • Element sets are not standard, i.e., they will vary across search engines.
    • Right Side Content:
      • What is the source for the content?
      • Authority control?
  3. Subject analyze information resource in surrogate record - indexing software:
    • Verbal - inferred by computer algorithm
    • Classification - inferred by computer algorithm
    • Subject Indexing in Search Engines
      • The subject fields of webpage surrogate records include the words that describe what the webpage is about. 
      • Right side subject content is inferred through the application of proprietary algorithms.
      • Subject terms added to surrogate records are weighted:
        • Doc #1: SU = dogs (.99); breeding (.87); dachshund (.30)
        • Doc #2: SU = cats (.92); dogs (.44); dachshund (.03)
        • The weights are computed by proprietary algorithm.
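
The spidering/crawling steps in item 1 above can be sketched with a toy, in-memory "web". Everything here is an assumption made for illustration: the URLs and page texts are invented, and comparing a content hash is just one plausible way to decide whether reindexing is warranted. Actual search engines do not publish their criteria.

```python
from collections import deque
import hashlib

# Hypothetical in-memory "web": URL -> (page text, outgoing hyperlinks).
WEB = {
    "a.example/": ("Dachshund breeding tips", ["a.example/care", "b.example/"]),
    "a.example/care": ("Caring for dogs", ["a.example/"]),
    "b.example/": ("Cats and dogs", []),
}

def crawl(seed, web, seen_hashes):
    """Follow hyperlinks breadth-first; return URLs that need (re)indexing."""
    queue, visited, to_index = deque([seed]), {seed}, []
    while queue:
        url = queue.popleft()
        text, links = web[url]
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # New page, or content changed since last crawl: add to indexing queue.
        if seen_hashes.get(url) != digest:
            seen_hashes[url] = digest
            to_index.append(url)
        for link in links:
            if link in web and link not in visited:
                visited.add(link)
                queue.append(link)
    return to_index

seen = {}
first_pass = crawl("a.example/", WEB, seen)       # every page is new
WEB["b.example/"] = ("Cats, dogs, and dachshunds", [])
second_pass = crawl("a.example/", WEB, seen)      # only the changed page
print(first_pass, second_pass)
```

On the first pass every page is queued because it is new; on the second pass only the page whose content changed is queued for reindexing.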
Retrieval from Search Engines
  • Unlike bibliographic databases, in which the ordering of retrieved surrogate records is reverse chronological, search engines use a relevance-based ranking.
  • The retrieval component of a search engine takes the entered query and compares its terms to the terms in the index.
  • The documents that are retrieved first are those that contain a higher "relevance" score:
    • Doc #1: SU = dogs (.99); breeding (.87); dachshund (.30)
    • Doc #2: SU = cats (.92); dogs (.44); dachshund (.03)
    • "dog" query would rank document #1 ahead of document #2
    • "breeding" query would rank document #1 ahead of document #2
    • "cats" query would rank document #2 ahead of document #1
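
The ranking behavior in the example above can be reproduced with a small sketch. The weights are the ones given in the example; the `rank` function and the exact-match rule are assumptions made for illustration, since real engines compute relevance with proprietary algorithms.

```python
# Weighted subject terms from the example surrogate records above.
index = {
    "doc1": {"dogs": 0.99, "breeding": 0.87, "dachshund": 0.30},
    "doc2": {"cats": 0.92, "dogs": 0.44, "dachshund": 0.03},
}

def rank(query_term, index):
    """Return the documents containing `query_term`, highest weight first."""
    hits = [(doc, weights[query_term])
            for doc, weights in index.items()
            if query_term in weights]
    return [doc for doc, _ in sorted(hits, key=lambda h: h[1], reverse=True)]

print(rank("dogs", index))
print(rank("dachshund", index))
print(rank("cats", index))
```

In this sketch a document that lacks the query term is simply not retrieved, which is why a "cats" query returns only document #2.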
How are Subject Weights Calculated?
  • Conventional methods (dating from the 1950s) for automatically inferring what a document is about include the following three techniques:
    • Frequency of word occurrences
    • Location of word occurrences
    • Size of word occurrences
  • In the web era, however, these techniques did not scale well to meet the needs of databases containing billions of records:
    • Could facilitate retrieval of relevant documents, but could not distinguish between "good" and "bad" documents.
    • Were also subject to manipulation by authors desiring higher search engine retrieval (spamming)
Two Responses to Early Indexing Failure
  • Yahoo! era (late 1990s)
    • Human indexing (website directories)
    • More discussion during lectures on classification. 
  • Google era (since 1999)
    • Additional criteria introduced to infer aboutness, e.g.:
      • $ - paid submissions, such as Alta Vista
      • Quality - PageRank algorithm of Google
Google Approach to Automatic Indexing
  • Issue addressed by Google concerns the quality problem: How to cause the "best" documents to rise to the top of a set of retrieved webpages.
  • Solution concerns identifying additional criteria to include in the subject weighting algorithm.
  • Google maintains additional metadata elements for each surrogate record in its index of webpages:
    • How many other webpages link to a given webpage
      • The more webpages (i.e. linkers) a dachshund webpage has pointing to it, the more quality it has.
      • This factors into the weight assigned to the "dachshund" descriptor in the subject field of its surrogate record
    • Who are the linkers
      • Those linkers that have a higher quality rank are given more weight than those linkers with a lower quality rank.
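
The two link-based criteria above (how many linkers, and how good those linkers are) can be sketched as a simplified iterative computation in the spirit of the published PageRank idea. This is not Google's production algorithm, which is proprietary. The page names, the link graph, and the parameter values (`damping=0.85`, `iterations=50`) are assumptions chosen for illustration.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Simplified link-based quality score; `links` maps page -> linked pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A linker passes its own rank on, split across its links,
                # so higher-ranked linkers count for more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph.
links = {
    "kennel_club": ["dachshund_page"],
    "vet_blog": ["dachshund_page", "kennel_club"],
    "dachshund_page": ["kennel_club"],
    "spam_page": ["spam_page_2"],
    "spam_page_2": [],
}
ranks = page_rank(links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy graph the dachshund page, which is linked to by well-connected pages, ends up with a higher score than the page whose only linker has no inbound links of its own.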

Article Summary for Lecture # 10 - Northedge


Google and beyond:
information retrieval on the World Wide Web
Northedge defines a web directory as “a human compiled list of links to web pages, typically organized into a hierarchical structure of subject categories.” Back in 1994, a mere 3 years after Berners-Lee created the “Web”, there were fewer than 10,000 websites. This number inflated to almost 3.5 million in 1998, and in 2006, it was estimated to be at over 100 million. Imagine if those websites were books. Without anyone to organize and sort through all of them, it would take forever for us users to retrieve any kind of information, let alone navigate the sea of changes that authors and creators make on a daily basis to their sites. If a librarian is involved, the user can submit their queries to the librarian.
                In the case of the internet, search engines are the librarians. Several criteria measure the quality of the search engine, such as:
  • The size of the corpus – the more books the librarian can search, the better.
  • The speed of the answer – if we do not get our information quickly, we will find another search engine.
  • The availability of service – if it is not available when it is needed, the users are going to find another search engine.
  • The accuracy of results – if the information the user gets back is not what they are looking for, then they will find another search engine that will return related results. However, if the three preceding criteria are not met, accurate data is not going to be important. (See this post).
Search engines require their users to submit their searches through a search box, which allows the user to choose whatever terms they like – unlike web directories, which constrain users to search using vocabulary chosen by the indexer. Since it might take a while for a search engine to sift through over 100 million constantly changing websites, it only makes sense to implement an indexing program (called a spider or robot). This program accesses web pages, analyses their contents and records the results in a database (referred to as an “index”), which enables fast access to sought information and bridges the gap between the search engine and the requested content.

                Today, one of the most used search engines is Google. Google’s software agent (indexer), called “Googlebot”, continually locates billions of web pages, analyses the content, and saves the result in the Google index. The algorithms it uses are a company secret, as they are what sets Google apart from its competitors (Bing, Yahoo, etc.). Googlebot breaks down webpages into words and examines their context within the page (position – is it in a header, sub header, body text, etc.), and sources are returned to the user, based on the algorithm's weighted scale, in order of assumed most relevant to least.

                In addition, while Google’s search box may seem to ask “What subject do you want information on?”, in reality it is asking “What word or combination of words will be most likely to appear on web pages that address the subject I am interested in, and least likely to appear on pages that are irrelevant to me?”. This may trip up users who are unfamiliar with how search engines work, and this may be the one negative Northedge presents about search engines – there is no one-to-one correspondence between words and meanings, and a single word may have multiple meanings (search for Java – the island – and only results about the computer programming language are returned). He also offers information on alternatives to search engines, which include META tags (the assignment of subject keywords by the web content creators), and folksonomies/tagging (creation of a taxonomy by the collective actions of users on the Web – see del.icio.us and flickr). These alternatives are somewhat controversial, because users and/or creators can deliberately assign misleading or inaccurate keywords to the content for financial gain or malicious reasons.

                Ultimately, Northedge offers insight into the possible future of web searches: computer-generated indexes in which the data may be driven by data sets produced by human indexing techniques and human linguistic research. I agree with this assertion, because it seems that as more technologies are developed and released, the search process becomes more streamlined and tailored to what the user REALLY wants from their search. This article is very informative, and if you want to know more about the inner workings of search engines, this is a fascinating read. I definitely came away knowing more about what happens once I search for "cat videos" on Google. 
______________________________________
To read the whole article, see the citation below:

Northedge, R. (2007, April). Google and beyond: Information retrieval on the World Wide Web. The Indexer, 25(3), 192-195.

Tuesday, July 5, 2016

Verbal Subject Analysis II: Periodical and Other Databases

Subject Cataloging vs. Indexing
  • Both are related to the subject analysis of resources. 
  • Subject cataloging  is a term used for the subject analysis operations in library cataloging. 
  • Indexing is a term generally used for the subject analysis operations in various other resource organization contexts, including periodical databases and search engines. 
Brief History of Periodical Indexes
  • Around the turn of the 20th century, the library community decided not to add article citations to the catalog. 
  • This development led to the growth of the commercial indexing industry. 
  • The result of this has been:
    • Split files
    • Fees for licensing database content
    • Difficulty fulfilling Cutter's 2nd objective
Analytical Cataloging
  • Analytical cataloging techniques are needed in order to provide access to the component parts of composite information resources, most commonly:
    • Book chapters
    • Proceedings articles (usually of academic meetings)
    • Journal articles
  • Definition from AACR2: Analysis is the process of preparing a bibliographic record that describes a part (or parts) of an item for which a comprehensive entry is made.
Analytical Cataloging Techniques
  • Complex entries made within the record of composite work [cheap]:
    • Analytical added entries:
      • Use 740 tag for second of two works mentioned in title of item
    • Note area for comprehensive entry of larger work:
      • Use 505 tag for structured display of table of contents.
  • Separate records created for the component parts of composite works ("In" Analytics)[expensive]:
    • Use 773 to trace the component part record to parent record
Analytical Access to Journal Content
  • Decision to not provide analytical access to journal content (i.e. directly to articles) was because of the expense:
    • Excessive number of records would have to be created.
    • Additional authority work would need to be done.
  • As a result, through the 20th century, cataloging and periodical indexing/bibliography creation techniques evolved separate approaches. 
Overview Comparison
  • Catalog
    • Authority work
    • Cataloging records represent the holdings of a library
  • Periodical indexes:
    • Subject indexes are extensive topical bibliographies (often include books and book chapters, too), usually covering large swaths of "territory"
    • Domain-wide indexes (e.g. Index Medicus) attempt to capture an entire discipline (may include book chapters, too)
    • No single library could ever own all items referred to in exhaustive bibliographies/indexes, thus leading to ILL (Interlibrary Loan) services
    • Authority work nonexistent (except controlled vocabularies)
Surrogate Records in Periodical Databases
  • As is the case with library catalogs, periodical databases contain structured surrogate records. 
  • This structuring is fairly consistent across periodical databases, both in terms of stored records (two part metadata model holds) and how records are displayed
  • There is some authority control at work, but not in ways that you might think.
Collocation in Periodical Databases
  • By subject - what about vocabulary control?
  • By author - what about authority control?
  • By journal - what about authority control?
  • By language
  • By publication type
  • By date
  • Etc., etc., etc. 
In all Collocation Contexts: MATCH!
  • EXAMPLES:
    • Indexers → author name → match ← author name ← users
    • Indexers → journal name → match ← journal name ← users
    • Indexers → vocabulary → match ←vocabulary ← users
Inverted File Structures
  • How surrogate records are physically stored in the index of a database.
  • Each surrogate record has a unique identifier (UI), also called a pointer
  • Each word and phrase of the index has a record in the index; each record contains the UI for each surrogate record that contains that word or phrase:
    • Dog: 235, 527; 5,345,672; 117,127,923
    • Cat: 127; 2,753; 917,538; 327,543,238
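
A minimal sketch of an inverted file, assuming a handful of made-up surrogate records keyed by UI. The record texts, the tokenization rule, and the `search_all` helper are hypothetical; production databases also index phrases and field labels, which this sketch omits.

```python
from collections import defaultdict
import re

# Hypothetical surrogate records: UI -> indexed text.
records = {
    127: "Cat behavior in shelters",
    235: "Dog breeding basics",
    527: "Dog and cat nutrition",
}

def build_inverted_index(records):
    """Map each word to the sorted list of UIs whose record contains it."""
    postings = defaultdict(set)
    for ui, text in records.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            postings[word].add(ui)
    return {word: sorted(uis) for word, uis in postings.items()}

def search_all(inverted, *words):
    """Boolean AND: UIs of records that contain every query word."""
    if not words:
        return []
    sets = [set(inverted.get(word, [])) for word in words]
    return sorted(set.intersection(*sets))

inverted = build_inverted_index(records)
print(inverted["dog"])
print(inverted["cat"])
print(search_all(inverted, "dog", "cat"))
```

Because the index is keyed by word rather than by record, a query only has to look up each word's list of UIs and combine the lists, instead of scanning every surrogate record.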
How is Surrogate Information Stored?
  • Print periodical indexes and bibliographies. 
  • Online periodical databases:
  • ALWAYS KNOW THE START DATE OF YOUR ONLINE PERIODICAL DATABASE!
    • Manual search of the literature is often needed for exhaustive searches
    • Retrospective conversion of print indexes to online is not generally undertaken by database providers due to expense. 
Periodical Database Characteristics
  • They usually hold more records than a library catalog.
  • Information resources (i.e. journal articles) contain less information than the info resources in library catalogs and there are no detailed secondary navigation aids such as book indexes.
  • More fields (i.e. left side elements) available for search word qualification.
  • Important to distinguish database producing companies/organizations from database interface companies:
    • Some companies/organizations provide both (e.g. PubMed MEDLINE)
    • Other companies provide interface services, such as Dialog or EBSCO.
Unique Identifiers (UI) in Periodical Databases
  • Common UIs employed in periodical databases:
    • ISSN - unique numerical identifier for individual serial publications
    • Internal numbering systems within a periodical database, such as the PMID in MEDLINE (for known item searches)
  • The most important UI is pre-Web and that is the "Address" of an article in the bibliographic universe (also known as the citation data) - also good for known item searches:
    • Journal name
    • Volume number (in some journals, the issue number also)
    • First page number of article. 
Authority Control in Periodical Databases
  • Titles - not controlled
  • Authors - somewhat controlled:
    • Indexers generally enter author name from the information resource
      • Control rests with periodical editors, whose policies on author names may differ from those of other periodical editors
  • Subjects - controlled:
    • Controlled vocabularies are imposed across periodicals and over time by indexers
    • However, subject searching is still subject to the problems associated with the "Great Pop vs. Soda Controversy"
Author Indexes in Periodical Databases
  • Examine how author data is entered into surrogate records:
    • Generally taken from the information resource in hand
    • PubMed MEDLINE is an exception
  • Some databases will provide lists of author names from which to choose:
    • Indicator of authority work?
      • Library Literature is an example
  • Subject Indexing in Periodical Databases
    • Indexing approach is "information access," therefore depth indexing is the general rule. 
    • Indexers index to the most specific, therefore, hierarchies remain important in controlled vocabularies. 
    • Pre-coordination and post-coordination are important concepts.
Subject Indexes in Periodical Databases
  • Depth Indexing:
    • General goal is to provide subject access to the information contained in an article.
    • However, this practice is not universal across database producers; therefore, determine the depth of indexing of the database you are searching by examining existing records
  • Be aware of the lag time related to subject indexing of articles
Management of Controlled Vocabularies
  • Homonymy:
    • Addressed by domain specificity of most periodical databases
    • Various forms of qualification are used, including parenthetical, hierarchy, and scope notes. 
  • New concepts:
    • Literary warrant is generally not employed in periodical database vocabularies
    • New terms are added after new concepts have established themselves in the literature
    • This poses a challenge if you are searching at a "research front" (may need to perform a keyword search strategy of the abstract field)
Some Vocabularies for Periodical Databases
Other Controlled Vocabulary Contexts
Web Content for Human Indexing
Indexing in Context
  1. Obtain information resource
  2. Describe information resource in surrogate record
  3. Subject analyze information resource in surrogate record:
    • Verbal 
    • Classification
Article Indexing Process
  • Two steps:
    • Analyze information resource to generate list of candidate concepts that describe its subject content
    • Translate those concepts into the controlled vocabulary of the database
  • ISO Standard for article indexing - special attention should be paid to certain sources of information:
    • Title
    • Abstract (when provided)
    • Introduction; opening and concluding paragraphs
    • Illustrations, diagrams, etc. and their captions
    • Words or groups of words that are underlined, bold, etc.
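The two-step process above can be sketched roughly as follows. This is only an illustration: the entry vocabulary, preferred terms, and source weights are all hypothetical, and step 1 is simplified to spotting phrases the vocabulary already knows, whereas a human indexer generates candidate concepts freely.

```python
import re

# Hypothetical entry vocabulary: free-text variants mapped to preferred terms.
ENTRY_VOCABULARY = {
    "name ambiguity": "Author identification",
    "author identity": "Author identification",
    "author identification": "Author identification",
    "open access": "Open access publishing",
    "digital repositories": "Institutional repositories",
}

# Hypothetical weights: the title and abstract count for more than the body.
SOURCE_WEIGHTS = {"title": 3, "abstract": 2, "body": 1}


def candidate_concepts(sections):
    """Step 1: score known phrases by where they appear in the article."""
    scores = {}
    for source, text in sections.items():
        lowered = text.lower()
        for phrase in ENTRY_VOCABULARY:
            hits = len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
            if hits:
                weight = SOURCE_WEIGHTS.get(source, 1)
                scores[phrase] = scores.get(phrase, 0) + hits * weight
    return scores


def translate(scores, max_terms=3):
    """Step 2: map candidate phrases to preferred terms and rank them."""
    term_scores = {}
    for phrase, score in scores.items():
        term = ENTRY_VOCABULARY[phrase]
        term_scores[term] = term_scores.get(term, 0) + score
    ranked = sorted(term_scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:max_terms]]


article = {
    "title": "The author challenge: name ambiguity in the scholarly literature",
    "abstract": "Open access and digital repositories complicate author identity.",
    "body": "Name ambiguity affects tenure and funding.",
}
index_terms = translate(candidate_concepts(article))
```

Note how "name ambiguity" and "author identity" collapse into one preferred term in step 2, which is the point of translating into a controlled vocabulary.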

Article Summary for Lecture # 9 - Rotenberg

The Author Challenge: Identification of Self in the Scholarly Literature

                Open access publishing, digital repositories, and contributions to scientific thought outside of traditional publishing reflect a faster-paced research arena. As research output expands, several issues need to be addressed so that contributions are attributed to the correct individual:
  • Individual researchers are under increased pressure to find collaborators and to keep up-to-date with field trends, which is imperative to professional branding and reputation management.
  • The management of researcher identity by universities and other research organizations is a time-consuming process that institutions are required to do for institutional and government research assessment exercises.
  • Publishers, granting organizations, and professional and academic societies need to properly identify the researchers utilizing their systems for tracking and management purposes, and for finding reviewers.

                Name ambiguity matters chiefly because it affects career advancement, tenure, global collaboration between researchers, and grant funding. According to Rotenberg and Kushmerick, proper identification depends on paying attention to the following:
  • Name Variations
    • Many authors share the same first and/or last name.
    • Authors use different names on different papers.
    • Middle initials may or may not be included with the author name.
    • Non-Roman names may have a variety of spellings and word orders. They may also be improperly transliterated.
  • Increase in Global Research Output
    • Scholarly research around the world has been increasing. The number of publications from China has increased from over 20,000 papers in 1998 to 112,000 in 2008.
    • This global research has introduced new forms of communications, and proper identification can help accurately attribute scholars' activities.
  • Funding and Tenure Requirements
    • Academic institutions within Australia and the United Kingdom are required to participate in government-run research evaluations, which require institutions to supply accurate reports of publication output for their university faculty.
    • Highly multi-authored papers (those with more than 50, 100, 200, or even 500 authors) are on the rise.
           Keeping these in mind, and properly identifying these various facets, can tell us who researchers are, what they do, where they work, what kinds of information they publish, who they are connected to, and who they want to know. This comes in handy when making connections for future research or collaboration. Thomson Reuters is at the forefront of normalizing the collection of scholar data. In 2006, it introduced two features into Web of Science:
  • Author finder - a search aid that utilizes the various indexed fields in Web of Science, such as author names, institution names, and subject areas to help a user identify the correct set of records for their chosen author. 
  • Distinct Author Sets - aid that presents sets or clusters of publications that have been computationally grouped together using a proprietary algorithm, which helps users to pinpoint the publications of an individual. The algorithm takes into account a number of data points like author names, institution names, and citing and cited author relationships. 
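The actual Distinct Author Sets algorithm is proprietary, so the following is only a toy sketch of the general idea of grouping publications by combining the data points mentioned above. The matching rule (compatible name plus either a shared institution or a shared cited author) and the sample records are assumptions made up for illustration.

```python
from itertools import combinations


def name_key(name):
    """Normalize 'Surname, Given M.' to (surname, first initial)."""
    surname, _, given = name.partition(",")
    given = given.strip()
    return (surname.strip().lower(), given[:1].lower())


def same_author(a, b):
    """Hypothetical rule: compatible names plus one shared data point."""
    if name_key(a["author"]) != name_key(b["author"]):
        return False
    shared_institution = a["institution"] == b["institution"]
    shared_citations = bool(set(a["cites"]) & set(b["cites"]))
    return shared_institution or shared_citations


def cluster(records):
    """Group record indexes into author sets with a simple union-find."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if same_author(records[i], records[j]):
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Invented sample records, not real Web of Science data.
records = [
    {"author": "Smith, J.", "institution": "Univ A", "cites": ["Lee", "Chan"]},
    {"author": "Smith, John A.", "institution": "Univ B", "cites": ["Chan"]},
    {"author": "Smith, Jane", "institution": "Univ C", "cites": ["Ortiz"]},
    {"author": "Smyth, J.", "institution": "Univ A", "cites": ["Lee"]},
]
author_sets = cluster(records)
```

Here the first two records end up in one set because they share a cited author despite different institutions and name forms, while "Smith, Jane" stays separate even though her normalized name is identical. A spelling variant such as "Smyth" is never merged under this simple rule, which hints at why name variations remain a hard problem.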
           A really neat element of Web of Science is that users can provide feedback on author sets. Combining this feedback with automatic clustering takes advantage of the best information and resources available. Another Thomson Reuters innovation is ResearcherID, which offers scholars and researchers the opportunity to connect their scholarly works to a unique identifier called a ResearcherID number. These numbers stay attached to the individual throughout their career, across disciplines and institutions, and the interface gives scholars the chance to network through a number of search options (geographic heat map, keyword, country tag clouds, and traditional fielded search page). You do not need an account to search public profiles, which allows the budding researcher to make contacts.

         ResearcherID allows users to classify themselves into a given area of research/expertise and to integrate their profile into their website, so those perusing the web can find them and link to their work. As of July 2011, 127,000 unique ResearcherID profiles had been created and over 2 million publication records from Web of Science had been claimed by authors via ResearcherID. The authors conclude that there are many initiatives devoted to adding clarity to the traditionally unclear world of author identity in scholarly literature, but the problem is by no means solved.

         This is a really interesting article that clearly outlines how authorship can be an issue in the modern, fast-paced research environment. I would suggest that researchers and information organizers alike take a look at Rotenberg and Kushmerick's work. I came away from this article with an idea of the types of sources I could use once I have published research of my own, or to find researchers on a specific topic. Fortunately, electronic organization of information has helped to partially resolve some issues in author recognition, and hopefully the information specialists of the future can eliminate errors and misrepresentation in scholarly work.

_________________________________________________________________________
For more information see the full article (citation below)!

Rotenberg, E. & Kushmerick, A. (2011). The author challenge: Identification of self in the scholarly literature. Cataloging & Classification Quarterly 49(6):503-20.