Learning Libraries: Organization of Information


Thursday, July 21, 2016

Classification of Resources IV: Information Architecture, Digital Libraries, and Social Classification

Subject Access Problem, Revisited

How does this apply to Web searching?

Automatic indexing (earlier in semester), such as what Google does: Does a centralized index address the soda versus pop controversy?
Today, we explore possibilities that center on the human organizing of Web resources.

Challenge of Providing Access to Web Resource Collections

Can we build true digital *libraries*?

Can we systematically arrange Web resource collections by subject in a way that is useful to those seeking information?
Can we replicate the "great reduction" phenomenon?

Can we insert these digital libraries into the lives of our users, where the time available for finding information is likely to be a major constraint?

While at home
While at work

Search vs. Browse of Info Resources

Search

Of surrogates
Bibliographic
Cataloging
Verbal subject analysis
Results list
A website's search engine

Browse

Of actual info resources
Bibliothecal
Shelf arrangement
Classification
Navigation
A website's architecture

Organizing Hyperlinks to Web Content

Hyperlinks to what types of files?

Text
Audio
Video
PowerPoint
etc.

Hyperlinks to whose content and to what kind of structures?

Content: Our own content ("stickiness") or someone else's content (traditional library approach)
Structure: Websites, sub-sites or webpages

Classification & Hyperlink Organizing

Main goal is to transcend the limitations of shelf arrangement in physical libraries:

E.g. multiple class numbers in classified catalog
Arranging links in multiple hierarchical locations (poly-hierarchy)

Another important goal is to take the end-user perspective into account:

"Views" of a website customized for audiences such as "Future Students", "Current Students", etc.
Social bookmarking (let "pop" be pop)

What is the Documentary Unit?

For collections of hyperlinks to internal Web resources (i.e., self-contained websites):

Keeping users on the webpages of your website
The emerging field of Information Architecture

Also referred to as navigation design for self-contained websites with few links to external content.
About creating navigational and organizational structures that put users in touch with the information they need in a website as efficiently as possible (similar to physical library signage, but also vocabulary control for labels).
Job titles include:

Interaction/Interface Designer
Usability Engineer
User Experience Designer (UX)

For collections of hyperlinks to external Web resources:

Linking to webpages: Google and other search engines
Linking to web sub-sites: Human constructed link indexes such as those on library websites
Linking to entire websites: Human constructed link indexes such as those on library websites

Structural Standards for Web Resources

Are websites structured like books? No, not always. That's why we have to worry about the differing structuring of information.
There is a lack of standards for structuring websites across publishers:

Many "vanity" web publishers
Commercial publishers are making inroads

How Libraries Organize Web Resources

Standards approach: Cataloging - use of 856 tag:

MARC information
Example: "Data mining your website"

Customization approach - Web lists by subject:

Mathematics (list at University of Alabama Rodgers Library)

Faceted Classification (Non-hierarchical)

Can be a part of a comprehensive system, e.g. Colon Classification

Ranganathan, PMEST and the Colon Classification
Fungal diseases in the rice crops of Madras, 1950-1959: J381,4:433.441'N5

Can be a part of a hierarchical system as a non-hierarchical specification of the aspects of a subject:

DDC and LCC have tables for geographic and other facets
NCSU Catalog
Home Depot

Often used for web organization:

Iokio CameraFinder
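
A minimal Python sketch of faceted (non-hierarchical) browsing of the kind product finders use: each item carries independent facet values, and a query can combine any facets in any order. The camera records, the facet names, and the `filter_by_facets` helper are all hypothetical and made up for illustration.

```python
# Hypothetical catalog: every item is described by independent facets.
CAMERAS = [
    {"name": "Camera A", "brand": "Acme", "type": "compact", "megapixels": 12},
    {"name": "Camera B", "brand": "Acme", "type": "SLR", "megapixels": 24},
    {"name": "Camera C", "brand": "Zenith", "type": "SLR", "megapixels": 24},
]


def filter_by_facets(items, **facets):
    """Return the items whose facet values match every requested facet."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in facets.items())
    ]


# Facets can be applied alone or combined; no fixed hierarchy is imposed.
slr_names = [c["name"] for c in filter_by_facets(CAMERAS, type="SLR")]
acme_slr_names = [
    c["name"] for c in filter_by_facets(CAMERAS, type="SLR", brand="Acme")
]
```

Unlike a hierarchy, no facet has to be chosen first: brand-then-type and type-then-brand reach the same items.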

Social Classification

User added metadata
Shared resources (commonly used for collections of photos and URLs)
Organized via third party collaborative websites.
Also known as folksonomy, ethnoclassification and free-tagging.
Tagging - the establishment of a relationship between an online resource and a user:

No centralized vocabulary control
However, intent is to match an individual with other individuals who not only have the same interest, but also share the same way to express the aboutness of that resource (let "pop" be pop and "soda" be soda)
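
As a rough illustration of tagging without vocabulary control, the Python sketch below stores each tag exactly as the user typed it and matches users only when they chose the same word. The user names, resources, and helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical (user, resource, tag) assignments; tags are stored as typed.
TAGGINGS = [
    ("ann", "photo1", "pop"),
    ("bob", "photo2", "pop"),
    ("cat", "photo3", "soda"),
]


def build_tag_index(taggings):
    """Map each tag to the set of (user, resource) pairs that used it."""
    index = defaultdict(set)
    for user, resource, tag in taggings:
        index[tag].add((user, resource))
    return index


def users_sharing_tag(index, tag):
    """Users who expressed aboutness with exactly this word."""
    return sorted({user for user, _ in index.get(tag, set())})


index = build_tag_index(TAGGINGS)
```

Because nothing maps "soda" to "pop", the two communities stay separate, which is the intended behavior in this social context.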

Contexts (know what is being organized!) [e.g., photographs]

Virtual Photographic "Shoeboxes"

Collections of digital photographs stored at a third party website
Users "Tag" their photos with descriptors and descriptors can be searched.
Lack of vocabulary control; however, in this "social" context, neither precision nor recall is important.
EXAMPLE: third party photo aggregator: Flickr

Know your Resource Collection!

What type of resource?

Pages or collection of pages?

Webpages, articles, books or other?

Are they structured consistently?
Do they follow conventions of publishing?

How are resources organized?

Searchable surrogates?

Authority control?
Automatically indexed?
Human (professional or civilian) indexer?

Browsable directly?
Both searchable and browseable?

Article Summary for Lecture # 14 - Manuel

Digital as a Hegemonic Medium for Epistemology and Knowledge Organization

Epistemology was born in European modernity in order to control others’ knowledge. Its study deals with the foundations, criteria, and validation by which scientific knowledge is justified, including its historical, political, economic, and social circumstances. Rosa San Segundo Manuel and Daniel Martinez-Avila believe that the approach to epistemology has changed in the 21st century. New social conditions, industrial production, and advances in medical and scientific research reflect the changing involvement of technology in day-to-day life. They cite the internet as a bringer of change in media, scientific, and epistemic contexts.

Manuel, et al., break down information into three main ages of human history:

The Oral Age – Primitive ideas of social organization and structuring were developed. Information was transferred orally, kept internally.
The Written Age – Developed a need to organize all the written knowledge. Writing and the role of the printing press allowed ideas to be shared with a wider audience.
The Digital Age – Current age. Printed and oral information become digitized and uploaded onto the web.

The authors also note the organization of knowledge in the digital environment is represented, invented, and articulated with two new fundamental instruments: a material one, digital technology, and a symbolic one, the deposited culture.

According to Manuel, et al., culture is being subordinated to technology, and they theorize that soon there will be a digital repository of all constructed objects of a culture, making the digital not only the instrument and the location, but also the content itself. In this new culture, individuals will participate through blogs, wikis, and social networks. They believe digital natives have a new way of thinking in the Digital Age that embraces the hybridization of materials, formats, and texts, globalization of information, connectivity, virtuality, and hypertextuality.

These digital natives do not read the same way printed text is meant to be read, and have new ways of learning, memorizing, and participating via Web 2.0 and 3.0 technologies. They are immersing themselves in digital reality: they have an electronic mailbox, participate in social networks, communicate via blogs, wikis, etc. The authors believe this will lead to a change in epistemology, which will be a sort of post-epistemology that approaches a new structure of knowledge.

Overall, I agree with the authors’ ideas about the evolution of technology and how the digital world is slowly changing the world in which we organize, study, and evaluate information. However, I found Manuel and Martinez-Avila’s article a little too brief. Several ideas could have been expounded upon, and I found the highly technical vocabulary hard to decipher and navigate without a second pass. This made it a little difficult to extrapolate information for this post, but I do feel the article has its merits. It presents some very interesting points about the future of information study, so if you are interested in how the digitization of our lives is affecting this, then this article is definitely for you!

_____________________________________________________________________________

For more information and clarification, check out the full article (citation below)!

Manuel, R., & Martinez-Avila, D. (2014). Digital as a hegemonic medium for epistemology and knowledge organization. Advances in Knowledge Organization 14:96-100.

Tuesday, July 19, 2016

Classification of Resources III: Research Library Schemes

Classification Definitions

Classification - the act of organizing a body of knowledge into a systematic order

In libraries: the systematic arrangement by subject of books and other materials on shelves, or of catalog and index entries, in a manner that is most useful to those who read or those who seek a definite piece of information.

Shelving device
Organization device

Characteristics of a Classification System

Inclusive and comprehensive
Systematic
Flexible and expansive (i.e. hospitable)
Employ terminology that is clear and descriptive
Indexed

Classified Shelf Arrangement of Books

Collocating objective: bringing like things together on library shelves:

Subject criterion: But what about books on multiple topics?
Author criterion: But what about books by multiple authors?

Need a system for the unique identification of resources in open stack libraries through the use of notational systems and call numbers.

Notational Systems

Notation marks (aka "classmarks") represent a book's subject class, including its relation to other subject classes in a classification scheme.
Most common types:

Pure - e.g., DDC employs Arabic numerals
Mixed - e.g., LCC employs an alpha-numeric notation

Mnemonics:

Repeating class notation patterns throughout classification
This technique is also used in the MARC system

Shelf Arrangement and Sub-arrangement

Library classification schemes provide:

A systematic method for shelf arrangement in open stack libraries
A systematic method for sub-arrangement within each class

To accomplish this goal, catalogers synthesize (i.e. create) class numbers to represent the subject of a book
This process remains transparent to the user (i.e., the user is more interested in the fact that books are collocated on the shelf than in how the numbers were determined).

Cuttering to Create a Call Number

Used to create call numbers (unique identifiers, or UIs) for individual library collections.
Provides the link between the surrogate record and the actual item in the collection
Provides for both subject and non-subject (e.g. author) oriented sub-arrangement in open stack libraries.
After the class notation is determined, then the cuttering process begins.
Use a cutter table:

There may be more than one cutter table floating around!
Cuttering is a flexible process; only use table as a guide

Steps for cuttering:

Determine the first letter of main entry (most often taken from 100 tag; but could be 245 tag)
Use number associated with second letter of main entry
Add additional numbers until call number is unique
Add date

You may have 1 or 2 cutter numbers, BUT you may never have 3.
In the LCC, cuttering instructions are given in the schedules:

Assists in topical, geographical and other non-main entry-based sub-arrangement
Occasionally, this involves double cuttering, thus precluding the use of main entry-based cutters.
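
The main-entry cuttering steps above can be sketched in Python. This is only an illustration: the digit values below are modeled on the "after other initial consonants" and "expansion" rows of the commonly published LC Cutter table, the rows for initial vowels, S, and Qu are deliberately left out, and the `cutter` and `call_number` helpers are hypothetical names. As the notes say, a table is only a guide, so real cutters may differ.

```python
# Assumption: second-letter values for main entries that begin with a
# consonant other than S or Q (one row of the LC Cutter table).
SECOND_LETTER = {"a": 3, "e": 4, "i": 5, "o": 6, "r": 7, "u": 8, "y": 9}

# Assumption: expansion values used for any further letters.
EXPANSION = {}
for letters, digit in [("abcd", 3), ("efgh", 4), ("ijkl", 5), ("mno", 6),
                       ("pqrs", 7), ("tuv", 8), ("wxyz", 9)]:
    for letter in letters:
        EXPANSION[letter] = digit


def cutter(main_entry, digits=2):
    """Build a cutter: first letter, then one digit per following letter."""
    name = "".join(ch for ch in main_entry.lower() if ch.isalpha())
    if not name or name[0] in "aeiouqs":
        raise ValueError("this sketch omits the vowel, S and Qu rows")
    if len(name) < 2 or name[1] not in SECOND_LETTER:
        raise ValueError("second letter not in this simplified row")
    number = str(SECOND_LETTER[name[1]])
    # Add more digits (from the expansion row) until the requested length.
    for ch in name[2:digits + 1]:
        number += str(EXPANSION[ch])
    return name[0].upper() + number


def call_number(classmark, main_entry, year, digits=2):
    """Join classmark, cutter and date into a call number."""
    return f"{classmark}.{cutter(main_entry, digits)} {year}"
```

For example, `call_number("ND1354.5", "Kriz", 1997)` reproduces the call number that appears later in these notes for Kay Dian Kriz's book.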

LCC - General Points

Mixed notations
Not all letters are used (some are reserved for expansion)
Subject specialists develop and maintain the class schedules, rather than centralized editors as is the case with the DDC.
Designed to meet the needs of the collection of the Library of Congress.
Based on literary warrant: Schedules developed with reference to what has been published.
Hierarchical; however, NOT reflected in notation.
LCC Classification outline:

Divided into main classes according to academic discipline or areas of study and then into sub-classes representing branches of those disciplines
Larger range of letters for History (C-G) and Social Sciences (H-L)
Numerals used range from 1-9999 with frequent gaps
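
To make the mixed notation concrete, here is a small Python sketch that splits an LCC call number into its letters (one to three), its whole number (1-9999) with any decimal extension, and the remainder (cutters and date). The regular expression and the `shelf_key` helper are assumptions for illustration, not an official parsing rule; the call numbers in the usage note are the ones used as examples elsewhere in these notes.

```python
import re

# Assumption: a classmark is 1-3 capital letters, 1-4 digits, and an
# optional decimal extension; everything after that is cutters and date.
CLASSMARK = re.compile(r"^([A-Z]{1,3})(\d{1,4})(\.\d+)?")


def shelf_key(call_number):
    """Return (letters, number, remainder) so call numbers sort by class."""
    match = CLASSMARK.match(call_number)
    if match is None:
        raise ValueError(f"not an LCC-style call number: {call_number!r}")
    letters, whole, decimal = match.groups()
    # The class number is compared as a number, not as text.
    return (letters, float(whole + (decimal or "")), call_number[match.end():])


shelf = sorted(
    ["QK46.5.H85 P66 2001", "ND1354.5.K75 1997",
     "ND1337.G7.W5 1955", "LA2317.T8 M3 1899x"],
    key=shelf_key,
)
```

Sorting on this key puts ND1337 before ND1354.5 because the class number is read as a whole number with a decimal extension.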

LCC is Essentially Enumerative

Aspects of subjects are explicitly provided for (i.e. enumerated) in the schedules:

More "pigeonholes" created in advance to represent very specific topics and topical aspects.
However, tables are also used to synthesize complex class notations (not as often as with the DDC)

As an enumerative classification, the LCC schedules are more voluminous than many other schemes:

50 volumes
10,000 pages.

Consistent Structuring: Martel's 7 points

Initially, Charles Martel provided basis for consistent structuring across divisions:

General forms: periodicals, societies, dictionaries, etc
Theory, Philosophy
History
Treatises, General works
Law, Regulations, State relations (now relocated to K)
Study and teaching
Special subjects and their subdivision from general to specific

LCC has Many Editors

LCC has been referred to as a series of special classifications (though compare with NLM)
However, individual LCC schedules are structured identically:

Preface
Broad outline
Detailed outline
The schedule, itself
Any necessary, auxiliary tables
Detailed index

Revising LCC with Topic Subdivisions

Constitute the bulk of the expansion of classes and subclasses
EXAMPLE: Women's Suffrage in U.S. (JK1880)

Note geographical subdivision of "Women's Suffrage. Women's right to vote" - in this case, cutter by location
Also note geographic subdivision under subtopic "Biography"

Procedures for Revision and Expansion

Proposals for changes originate with LC catalogers:

Anomalies
New topics

Some methods for expansion:

Using unused letters (I, O, W, X and Y)
Adding a third letter (or sometimes a fourth)
Extending existing numbers decimally
Expanding use of cuttering

General Steps for Classifying with LCC

Because of disciplinary aspects of LCC, first check for appropriate schedule to match subject of item in hand and then determine the best class number within the selected schedule.
Class item in hand with similar works:

Consult existing records
Consult class numbers mapped from assigned LCSH
Consult LCC outlines

Classifying General Works

Under most numbers with subdivisions, a number is designated for "general works."

Classifying Works on Single Topics

Always use the most specific class number that is co-extensive with the subject matter of the work.
If no co-extensive number exists, then the next appropriate broader number should be used.

Classifying Works by Time Period

Used for works treating a topic with regard to a particular time period. For works spanning two periods, use the earliest.
DC History of France:

DC725 Earliest to 1515
DC727 16th century
DC729 17th-18th century
DC731 1789-1815
DC733 1815-1870
DC735 1871-1914
DC736 1914-1921
DC737 1922-

Multi-Faceted Single Topic Works

Works covering multiple facets of a single topic.
If available, use a class number representing all facets:

Idea of the English landscape painter: genius as alibi in the early nineteenth century, by Kay Dian Kriz
ND1354.5.K75 1997

If number covering all facets is not available, go with the emphasized or more prominent facet:

Elizabethan miniatures, by Carl Winter
ND1337.G7.W5 1955

Classifying Works on Multiple Topics

A work on two or three topics treated equally:

Use number for the topic treated first
QD412.A7 C53 1994 - Chemistry of organic arsenic, antimony, and bismuth compounds

A work on four or more topics:

Use a general number that encompasses all numbers chosen
RA566.3.C48 1992 - Changing U.S. health care; a study of four metropolitan areas

Classifying Works with Phase Relations

A work covering the relationship between two topics is classed in the most specific number covering the relationship:

QK46.5.H85 P66 2001 - Botany of desire; a plant's eye view of the world
QK46.5.H85 - Botany > Human-plant relationship

A work covering the influence of one topic on another is classed with the topic being influenced:

LA2317.T8 M3 1899x - Henry Tutwiler and the influence of the University of Virginia on education in Alabama
LA2317.T8 - History of U.S. Education > Individual biography

Cuttering for Geographical Aspects

Geographical cuttering option is often available in parts of the schedules:

Use either general or specialized cutter table

EXAMPLE:

Visual Arts Education:

N353.A-W - General works in Visual Arts in U.S.
N354.A-Z - General works in Visual Arts in a state
N355.A-Z - General works in Visual arts in a city

Cuttering Sub-topics - "Reserve Cutters"

Cuttering by topical aspect is also sometimes available - this is termed "reserve cuttering"
Style, Composition, Rhetoric:

P301 - General Works
P301.3.A-Z - General Works by region or country
P301.5.A-Z - Special Aspects A-Z

P301.5.I34 - Idioms
P301.5.P73 - Propaganda

Thursday, July 14, 2016

Classification of Resources II: Public Library Schemes

Classification Definitions:

The act of organizing a body of knowledge into a systematic order
In libraries: The systematic arrangement by subject of books and other materials on shelves, or of catalog and index entries, in a manner that is most useful to those who read or those who seek a definite piece of information

Shelving Device
Organization Device

Characteristics of a Classification System

Inclusive and comprehensive
Systematic
Flexible and expansive/hospitable to new knowledge
Employ terminology that is clear and descriptive
Indexed

Notational Systems

Notation marks (a.k.a. "classmarks") represent a book's subject class, including its relation to other subject classes in a classification scheme.
Most common types:

Pure - e.g. DDC employs Arabic numerals
Mixed - e.g. LCC employs an alpha-numeric notation

Mnemonics:

Repeating class notation patterns throughout classification
This technique is also used in the MARC system

Shelf Arrangement and Sub-arrangement

Library classification schemes provide:

A systematic method for shelf arrangement in open stack libraries
A systematic method for sub-arrangement within each class

To accomplish this goal, catalogers synthesize (i.e., create) class numbers to represent the subject of a book.
This process remains transparent to the user, i.e., the user is more interested in the fact that books are collocated on the shelf rather than how the numbers were determined.

Historical Development of DDC

First published in 1876 - "A Classification and Subject Index for Cataloging and Arranging the Books and Pamphlets of a Library."

Current edition: 23rd
Electronic version available

Most widely used classification scheme in the world (135 countries - translated into 30 languages)
Innovations:

Relative index
Integrity of numbers (with 2nd edition)

Conceptual Framework of DDC

Basic classes are organized by discipline (i.e., fields of study).
Divisions of DDC:

Ten main classes (0XX, 1XX, 2XX, etc), which together cover the entire world of knowledge
Each main class is divided into ten divisions (100 total divisions in DDC)
Each division is divided into ten sections (1000 total sections in DDC)
Class 000 is most general

Used for works not limited to any specific discipline (e.g., encyclopedias, newspapers, general periodicals)
Used for certain specialized disciplines that deal with knowledge and information (e.g., library science, computer science, journalism)

Each of the other main classes (1XX to 9XX) comprises a major discipline or group of related disciplines.

DDC IS ARRANGED PRIMARILY BY DISCIPLINE AND NOT BY SUBJECT; therefore, a given subject is likely to appear under more than one class number.

Relative Index

Disciplinary focus of DDC causes subjects to be scattered across the classification; the Relative Index to the schedules is needed to collocate.
Relative index relates subjects to the various disciplines to which they may belong:

Journalism - generally found at 070.4
Journalism - civil rights issues at 323.445
Journalism - sociology at 302.23

Dewey's theoretical contribution to library classification.
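
A Relative Index can be pictured as a lookup from one subject to every disciplinary context where it is classed. The Python sketch below uses only the journalism entries listed above; the dictionary layout and the `lookup` helper are hypothetical.

```python
# Subject -> list of (disciplinary aspect, DDC number), taken from the
# journalism example above. The data structure itself is an assumption.
RELATIVE_INDEX = {
    "journalism": [
        ("general", "070.4"),
        ("civil rights issues", "323.445"),
        ("sociology", "302.23"),
    ],
}


def lookup(subject):
    """Collocate the scattered class numbers for one subject."""
    return RELATIVE_INDEX.get(subject.lower(), [])


journalism_numbers = [number for _, number in lookup("Journalism")]
```

The index brings together in one entry what the discipline-based schedules scatter across classes 000, 300, and so on.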

Principle of Hierarchy

Structural Hierarchy (inheritance):

Whatever is true of the whole is true of the parts
This is termed "hierarchical force"
EXAMPLE:

All classmarks under 5XX are related to the natural sciences and/or mathematics
All classmarks under 612.1... are related to blood and circulation

Notational hierarchy (relationships between concepts):

Subordinate: 621.4 is subordinate to 621
Coordinate: 621.4 is coordinate with 621.6
Superordinate: 621 is superordinate to 621.4
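
Because DDC notation is decimal, the three relationships above can often be read straight off the digits. The Python sketch below treats a shared prefix as hierarchy, which is an assumption: the schedules contain places where notation does not express the hierarchy, so this is a rough illustration only. The `relation` helper is a hypothetical name.

```python
def _digits(number):
    """Drop the decimal point and trailing zeros: '500' -> '5'."""
    digits = number.replace(".", "").rstrip("0")
    return digits or "0"


def relation(a, b):
    """Describe how class number a relates to class number b."""
    da, db = _digits(a), _digits(b)
    if da == db:
        return "same"
    if da.startswith(db):
        return "subordinate"      # a is a subdivision of b
    if db.startswith(da):
        return "superordinate"    # a contains b
    if len(da) == len(db) and da[:-1] == db[:-1]:
        return "coordinate"       # same parent, same level
    return "unrelated"
```

The same prefix test expresses hierarchical force: any number that is subordinate to 500 inherits whatever is true of 500.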

Revision Process for DDC

Suggestions sent to Library of Congress
Reviewed by:

LC - Decimal Classification Division
OCLC - Forest Press
6 DDC editors
Editorial Policy Committee (EPC)

Role of EPC (Editorial Policy Committee)

Works closely with DDC editors to:

Suggest changes
Facilitate innovations
Monitor general development of the Classification

Reviews all versions of the Classification:

Print
WebDewey
Full edition
Abridged edition

EPC Membership

10 member international committee
Elected membership represents Dewey users worldwide
Members come from all types of libraries
Current representation:

American Library Association
Australian Committee on Cataloging
Chartered Institute of Library and Information Professionals
Library of Congress
National Library of Canada
OCLC

Types of Revisions

Expansion:

Introduction of new subject as part of a class scheme
Provides more minute subdivisions

Reduction:

Rarely used subdivisions are deleted and marked by brackets ([]) in the class scheme listing
"Starvation policy" allows DDC to phase out the deleted numbers with the next edition printed

Relocations:

To rectify an improper placement
To eliminate dual provisions
To make room for new subjects when no number room is available
To realign fields of knowledge

Reconstructed Schedules ("Phoenix schedules")

An entire schedule is reconstructed without regard to previous divisions
Rarely used due to integrity of numbers practice

Classifying with DDC

First, determine subject of work

Subject analysis is central to library classification
Must determine the intent of the author by examining:

Title - never the sole source
Table of contents - lists main topics discussed
Preface and introduction - can indicate author's intent
Scanning text, itself - provides guidance and confirmation
Bibliographic references - can also list topics discussed
Outside sources - helpful for verifying advanced subject

Second, determine discipline of work

Guiding principle is that a work is classed in the discipline for which it is intended rather than in the discipline from which the work is derived
This enables works that are used together to be shelved together

EXAMPLE: A zoologist's book on agricultural pest control would be classed with other books on pest control rather than with other books on zoology.

Third, translate findings into appropriate DDC class

*NB*

What to do with multiple subjects/same discipline:

Class works covering interrelated subjects with the subject that is being acted upon (rule of application):

"Shakespeare's Influence on Keats" with Keats
"Great Depression's Impact on American Art" with American Art

Class works covering two subjects equally with the subject whose number appears first in the schedules (first-of-two rule):

There may be exceptions in instructions in the schedules

Class works covering three or more subjects that are all subdivisions of a broader subject with the first higher subject that includes them all (rule of three):

"History of Portugal [946.9], Sweden [948.5] and Greece [949.5]" is classed with the history of Europe at 940.

What to do with more than one discipline:

These works are examples of interdisciplinary research
Interdisciplinarity is predictable; therefore, there may already be a place in the schedules for works that are interdisciplinary:

Check for interdisciplinary numbers such as 305.231
Class works not given an interdisciplinary number in the discipline given the fullest treatment.
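
The first-of-two rule and the rule of three above lend themselves to a small Python sketch. It assumes that schedule order equals digit order and that the "first higher subject" is the longest shared prefix of the numbers, padded to three digits; exceptions given in schedule instructions are not modeled, and the helper names are hypothetical.

```python
from os.path import commonprefix


def _digits(number):
    return number.replace(".", "")


def _format(digits):
    """Pad to three digits and restore the decimal point."""
    digits = digits.ljust(3, "0")
    return digits if len(digits) == 3 else digits[:3] + "." + digits[3:]


def first_of_two(a, b):
    """Two subjects treated equally: use the number that comes first."""
    return min(a, b, key=_digits)


def rule_of_three(numbers):
    """Three or more subdivisions: use the first number that includes all."""
    prefix = commonprefix([_digits(n) for n in numbers])
    if not prefix:
        raise ValueError("numbers share no broader class")
    return _format(prefix)
```

With the example above, `rule_of_three(["946.9", "948.5", "949.5"])` gives 940, the history of Europe.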

Key Features of Schedules and Tables

Summaries

Summaries provide an overview of the structure of a class
A "bird's-eye view" of a section of the DDC that helps the classifier focus in on the possible class numbers for a work
Because of hierarchical force, summaries at main class, divisional and sectional levels hold for all subordinate class numbers

Entries

Each entry contains a Dewey class number and descriptive information:

Heading
Notes

Additional Dewey class numbers may be:

In parentheses - these numbers provide alternatives to standard practices
In brackets - these numbers represent subjects that have been discontinued or relocated

Notes

Notes provide additional information that is not obvious from a class' position in the notational hierarchy.
Classes of notes:

Notes that describe what is found in a class
Including notes
Notes that describe what is found in other classes
Notes that explain changes or irregularities

Scope notes - definition used within a knowledge organizing system to say what a term means within that system (see 700)
Former-heading notes reflect that the formal label for the classmark has changed (see 281.63)
Variant-name notes notate that a specific number goes by different names (see 332.32)
Class-here notes - confirmation that you're in the correct place (see 371.192)
Including notes - "Including" is a code word that notates a specific number is about to go through an expansion (see 362.16)
Class-elsewhere notes - offer other places where the book might better be placed (see 791.43)
See references notes (see 577.7)
See-also references (see 584.3)
Revision notes are used to indicate when a subdivision or class has been completely or extensively revised.
Discontinued notes - (see 004.696)
Relocation notes - (see 687.43)

Number Building

Synthesis of two numbers to create a complex Dewey class number.
Can be multiple Dewey numbers synthesized into a single number:

Book on advertising in libraries (659.1902)

Use 659.19 for advertising in special organizations
02 for libraries (dropping the trailing 0 from 020)
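
The advertising-in-libraries example can be traced in a few lines of Python. This sketch only mirrors that one example (append the added number's digits after dropping its trailing zeros); actual number building follows the add instructions in the schedules, and `build_number` is a hypothetical helper.

```python
def build_number(base, added):
    """Append the digits of `added` (minus trailing zeros) to `base`."""
    base_digits = base.replace(".", "")
    added_digits = added.replace(".", "").rstrip("0")
    digits = (base_digits + added_digits).ljust(3, "0")
    # A Dewey number has its decimal point after the third digit.
    return digits if len(digits) == 3 else digits[:3] + "." + digits[3:]


advertising_in_libraries = build_number("659.19", "020")
```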

Can be synthesized:

Using one of seven Dewey tables
Using a "Number-built note" (see 353.13263)

DDC Tables

Table 1 - Standard Subdivisions:

Contains mnemonics for standard subdivisions
Used to add facets to the class number (education is 072, geographical is 09, etc)
Used to connect a subject with a standard facet

- 01 Philosophy and theory
- 02 Miscellany
- 03 Dictionaries, encyclopedias, concordances
- 04 Special topics
- 05 Serial publications
- 06 Organizations and management
- 07 Education, research, related topics
- 08 History/description with respect to kinds of persons
- 09 Historical, geographic, persons treatment

When analyzing Table One numbers (and Table One numbers ONLY), look for the connecting "0" between subject and facet
Look up subject class in the schedules
Look up facet class in Table One
Most of the time, the connecting "0" is the first occurrence:

635.13074 = 635.13 for Carrots and 074 for catalogs
But not always: 020.25 = 020 for LIS and 025 for Directories
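
One way to picture the search for the connecting "0" is to list every place a built number could be split at a zero, and then check each candidate against the schedules and Table One. The Python sketch below does only the listing; `candidate_splits` is a hypothetical helper, and deciding which candidate is right still requires looking up the subject and the facet.

```python
def _format(digits):
    """Pad to three digits and restore the decimal point."""
    digits = digits.ljust(3, "0")
    return digits if len(digits) == 3 else digits[:3] + "." + digits[3:]


def candidate_splits(class_number):
    """List (subject class, Table One facet) pairs, one per possible zero."""
    digits = class_number.replace(".", "")
    splits = []
    for position, digit in enumerate(digits):
        # A leading zero cannot be the connector: no subject would precede it.
        if digit == "0" and position > 0:
            splits.append((_format(digits[:position]), digits[position:]))
    return splits
```

For 635.13074 the only candidate is 635.13 plus 074. For 020.25 the leading zero is skipped, leaving 020 plus 025, which matches the exception noted above.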

Table 2 - Geographic Areas, Historical Periods, Persons:

Contains mnemonics for geographical areas, etc.
Used to add geographic facets to the class number:

United States: 973
Southeastern States: 975
Alabama: 976.1
Tuscaloosa County: 976.184

Table 3 - Subdivisions for the Arts, for Individual Literatures, for Specific Literary Forms:

Contains mnemonics for subdivisions for the arts, for individual literatures, for specific literary forms
Tables 3A-3C needed to determine specific notation to be used

Table 4 - Subdivisions of individual languages and language families
Table 5 - Racial, ethnic, national groups
Table 6 - Languages
Table 7 - Groups of persons

Tuesday, July 12, 2016

Classification of Resources I: Concepts, Problems, Issues

Two Perspectives on Classification

Classificationists (i.e., editors) - those who create and edit the conceptual places to put things

Need to create workable classifications

What is the basis for a classification?
What are the ramifications of a classification?
How hospitable is a classification to changes in knowledge over time?

Classifiers - those who place things into these created places when organizing

Need places to put things

Is there a place for this thing?
Is this thing with other like things?

Classical Classification

Aristotle and the natural world:

Mutual exclusivity: "in or out" - something is an animal or it is not an animal
Inheritance: based on the assumption that there is a natural hierarchy to the world (all animals share same characteristics that differentiate them from plants)
Basis for scientific classification

Medieval scholastics and their resources:

Classification of scholarly materials based on the academic divisions of study of that time.
This classification reflected those areas taught to scholars.
Classification provides the basis for shelf arrangement.

Scientific (Linnaean) Classification

Largest example of formal classification:

Managed over time by classificationists
Binomial nomenclature (genus/species)
Specific rules for classifiers to follow (e.g., members of the same species are capable of interbreeding to produce fertile offspring)

Follows classical approach:

Mutual exclusivity of classes
Inheritance

EXAMPLES:

Two species contexts:

Dealing with changes in the classification itself:

Evolution of existing species - e.g., newly emerging drug-resistant infectious diseases.
Newly discovered species - e.g., the division of Monera into two distinct kingdoms

Library Classification

Inspired by scientific classification:

Early library classificationists were leaders in the application of classification principles outside of the scientific domain
However, book classification provided challenges in comparison to the classification of natural objects
Is a book like a starfish?

Historically U.S. libraries used classification for shelf arrangement, while European libraries used classification to facilitate retrieval of surrogates:

Open stacks require a system for shelf arrangement of books
Closed stacks rely on the catalog to provide logical groupings

The Nature of Book Classification

Collocating objective: Bringing like things together on library shelves:

Subject criterion: What about books on multiple topics?
Author criterion: What about books by multiple authors?
Subject/author criteria: What about books by the same author on different topics?

Solving the need for a system of unique identification in open stack libraries through notational systems and call numbers.

Questions in Book Classification

How do libraries provide for the collocation of like books (2nd Cutter objective) while at the same time facilitating the retrieval of known items (1st Cutter objective)?

Cutter's solution - two part call number:

Call number is made up of the class notation (classmark) and the Cutter number
The classmark provides for the fulfillment of the collocation objective
The system for Cuttering provides a unique call number within each library, which fulfills the known item retrieval objective (i.e., unique identification).

A book is not a starfish:

Books, as physical objects can only be in one place at a time, even if they are about multiple topics
Libraries do not buy multiple copies
Conceit of the cataloger revisited
What about networked resources: Is a website a starfish?

Comparison of U.S. and European libraries:

U.S. libraries: open stacks and "mark and park"
European libraries: closed stacks and classified cataloging (i.e., the assignment of a book to multiple classes)

How many books should be grouped together in the stacks (the efficient browsing problem):

Broad classification
Detailed ("close") classification

How should books within each class be displayed (the subarrangement problem)

We will compare the needs of the typical research library, with its larger collection of specialized books, with the typical public library, with its smaller collection of broader books.

Four Types of Library Classifications

Universal: Intended to organize all of knowledge:

National General: Same as universal, but limited to a specific country:

Nederlandse Basisclassificatie

Subject Specific: Intended to organize a domain:

NLM Classification

Homegrown: Built as needed (e.g., Yahoo directory)

Classification Concepts

Broad versus Close Classification

General strategy employed by individual libraries.
How many class numbers to use?

The larger the collection, the closer the classification (i.e., more books require more detailed class numbers)
Can vary within a library's collections (i.e., a given library may have larger collections in a certain subject area)
Otherwise, there will be too many books classed together or too few books under each class (either of which impedes efficient browsing)

Classification of Knowledge versus Classification of a Particular Collection

Relates to the intent of the classification:

Classification of knowledge approach provides pigeonholes for all subjects in advance of the use of that classification
Particular collection approach has mechanism to create new pigeonholes as resources are added to collection (literary warrant)

DDC began as universal, but updates to the classification are now through literary warrant.
LCC began through literary warrant, but the nature of the LC collection makes the LCC a de facto universal scheme.

Notational Integrity Over Time

Attempting to maintain the same meaning of a class notation over time:

Response to the problem of accounting for the growth of knowledge over time
A challenge for the classificationist

Classifications can be designed to handle the growth of knowledge

Fixed versus Relative Location in Closed and Open Stack Libraries

In terms of efficient storage, what is the most obvious characteristic of an open stack library?

When storage space is at a premium, use the fixed location approach of closed stack libraries.
Resources can be efficiently stored by size
The call number is an accession number

Relative location approach is employed in open stack libraries:

Physical spaces in collection allow for growth of the collection without a lot of shifting
Relative addressing is the key
A library collection is a single, linear sequence of books

The Case of Journal Shelf Arrangement

Alphabetic or classified?
Alphabetic by title:

Less costly to manage
What about title changes?
Problem: Journals on the same subject are scattered across the collection

Classified by subject:

Journals arranged by subject
Title changes are accommodated
Problem: Users must look up call number to find journal (this is not the case with alphabetical arrangement)

Faceted Classification (Non-hierarchical)

Can be part of a comprehensive system, e.g., Colon Classification:
Can be part of hierarchical system as non-hierarchical specification of the aspects of a subject:

DDC and LCC have tables for geographic and other facets

Often used for web organization:

Article Summary for Lecture # 11 - Barite

The Notion of "Category": Its Implications in Subject Analysis and in the Construction and Evaluation of Indexing Languages

Mario Guido Barite, a professor and researcher at the School of Librarianship at the University of the Republic in Uruguay, attempts to tackle the notion of category (a basic intellectual tool for the analysis of the existence and changeableness of things) and proposes a conceptual and methodological reexamination from a functional standpoint. Barite claims that "most classifiers or indexers assume the role of classificationist since the present state of indexing languages entails minor and major surgery be performed to adapt these languages to users' requirements."

Categories are necessarily the foundation of any system for organizing knowledge; however, the terms category, characteristic, and class are sometimes used interchangeably. It is not possible to characterize categories in the Theory of Classification, as categories are extremely general abstract expressions. Categories are used as tools to discover certain regularities of the material world, but Barite suggests properties as a possible category to analyze the material world since categories are, in their basic nature, extremely simple notions. Within the Theory of Classification, categories are only relevant as instruments of analysis and organization of objects, phenomena, and knowledge, and classificationists use them in three precise activities that Barite mentions:

design, planning, and structuring of indexing languages or systems of knowledge
modification or specification of classification tables
the evaluation and analysis of indexing languages and systems of concepts through a set of parameters capable of establishing the grade of reciprocal tension among related concepts and their relevance and validity.

Since it is not possible to isolate the notion of category from those of object and analyst, there are several object attributes that Barite suggests condition its study:

Any object is naturally dynamic and mutable - that being the case, in order for the analysis to be completed, the object must be captured at a certain time and abstraction from its reality is required at a given moment.
The object may be real or ideal - it may have existed, as may be corroborated by the records of its existence, or it may have only an immaterial, non-physical existence due to its nature. These particular characteristics seem to obstruct the analysis, since analysts are condemned to act by approximation. However, once conventions have been clearly established by consensus, abstract objects are easily systematized: after agreement has been reached regarding what a theorem is, or regarding certain chronological and factual conventions of the French Revolution, the difficulty of giving intellectual access to the concept diminishes.
Some objects have delimitation problems - attempts to produce a definition usually create discrepancies and shades of meaning among experts, so much that they may cause a certain aspect of the object to be placed within one category or the other. But we also have the difficulties posed by the concepts that do not attain conventional agreement. To exemplify, think of the difficulty of approving by consensus the basic statements towards the definition of the concept labor flexibilization from the viewpoint of a sociologist with a Marxist orientation and another one of ultra-liberal ideas.
A large part of the objects belong to, or occur in, a phase of the time-space continuum, or rather flow along a section of that continuum - due to their mutating and dynamic nature, some objects achieve various configurations and undergo a double influence: that of the processes occurring as a result of the action of internal agents, and that of the processes caused by external agents. This double influence is the determinant of each specific configuration, since any object is, in a given time and in a given spatial situation, the synthesis of the impacts brought about by such agents.

He then goes on to decompose the notion of category to extract its most typical characteristics:

Every category is a sectorial one.
Every category implies a specific level of analysis.
Categories are levels of analysis external to the object.
Categories are mutually exclusive.
Every category is highly generalizable.
Every category may admit, with reference to an object, variable levels of subdivision.
Agreement has not been reached regarding a limited collection of categories.

Barite concludes by proposing greater attention to the definition of category, because it involves essential theoretical and practical aspects for the reasonable command of the theory of concepts by specialists. I agree that it is important to dissect, correct, and fully understand terminology. If we can't fully understand the terminology within an organization system, the system will not be efficient enough to get quality information into the hands of patrons with ease. The article is a bit philosophic, which made it somewhat of a struggle to read, but the heart of the article and Barite's ideas are spot on. I'd recommend this for organizers everywhere, as it really gets you to think about the elements of a system and the terminology used.

________________________________________________________________________

For more information see the full article (citation below!)

Barite, M. (2000). The notion of "category:" Its implications in subject analysis and in the construction and evaluation of indexing languages. Knowledge Organization 27:4-10.

Thursday, July 7, 2016

Verbal Subject Analysis III: Webpage Databases (a.k.a. "Search Engines")

Human vs. Automatic Indexing

Both are related to the subject analysis of information resources.
Human indexing is used to describe the subject analysis of various periodical databases.
Automatic indexing is a term used for the subject analysis operations by the computer algorithms of various webpage databases (a.k.a. search engines).

Research from the 1960s through the 1980s tried to get computers to calculate what articles were about. The most frequent words (articles such as a, an, and the) don't really tell you much about an article, and neither do the least used words. The key is finding the sweet spot between the two, based on what the author usually writes about.
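The "sweet spot" idea can be sketched roughly as follows. This is only an illustration, assuming a tiny hand-made stop word list and a hypothetical minimum count; the function name and sample text are made up for the example and do not reproduce any particular historical system.

```python
from collections import Counter
import re

# Hypothetical stop word list: very frequent words that say little about aboutness
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "is", "are", "to"}

def candidate_terms(text, min_count=2):
    """Return words that are neither stop words nor too rare to matter."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return sorted(w for w, n in counts.items() if n >= min_count)

sample = (
    "The dachshund is a breed of dog. Dachshund breeding began in Germany, "
    "and the dachshund is a popular dog."
)
terms = candidate_terms(sample)  # ['dachshund', 'dog']
```

Words such as "the" are dropped at the top end, words used only once are dropped at the bottom end, and what remains in the middle is treated as a candidate description of the text.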

Why Webpage Database?

It is always important to know the documentary unit of an information database.
The adjective associated with database is always a cue to the documentary unit.
Webpage databases are informational databases in which a webpage is the documentary unit.
They are also known as search engines and discovered databases.

Analysis of Websites and their Structure

What are webpages? What are websites? Webpages are to websites as pages are to books.
Standards (or lack thereof) for the authoring of web sites and webpages

HTML and other markup languages
Editors

What are the implications of the lack of authoring standards for web-based information resources?

Location of Webpage Subject Metadata

In webpage headers: For individual webpages, subject metadata can be created by authors and included in HTML headers.
In separate metadata record databases:

Subject metadata can be created by intermediaries using Dublin Core schema
In search engines, subject metadata is inferred "automatically" by computer algorithm.

Search Engine Questions

For greater understanding we need to be able to answer:

Why do search engines produce different results for the exact same query?
What is the principle for ranking the display of search engine records in response to a query?

The Term "Search Engine"

The term has become the common designation for webpage databases. However, in actuality, webpage databases have three parts:

Spidering/crawling software to collect webpages.
Indexing software to build the index of surrogate records.
Retrieval software to facilitate retrieval of surrogates.

Automatic Indexing in Context

Obtain information resource - spidering/crawling

Steps for spidering/crawling:

Computers owned by the search engine retrieve documents by following all hyperlinks on each retrieved webpage
Determination is made whether a webpage needs to be indexed (because it is new) or reindexed (if it has already been indexed)
Determination is made whether reindexing is warranted
New webpages and those meeting criteria for reindexing are then placed in the indexing queue
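The crawling steps above can be sketched with a toy, in-memory "web". Everything here is an assumption made for illustration: the URLs, the `WEB` dictionary, and the rule that a page warrants reindexing when its text differs from what was seen last time. Real crawlers use their own, usually undisclosed, criteria.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing hyperlinks)
WEB = {
    "a.example/": ("home v2", ["a.example/dogs", "a.example/cats"]),
    "a.example/dogs": ("dogs", ["a.example/"]),
    "a.example/cats": ("cats", []),
}

def crawl(seed, web, indexed):
    """Follow hyperlinks from seed and return the indexing queue.

    indexed maps already-indexed URLs to the text seen at the last visit.
    """
    to_visit, seen, queue = deque([seed]), {seed}, []
    while to_visit:
        url = to_visit.popleft()
        text, links = web[url]
        if url not in indexed:        # new page: needs indexing
            queue.append((url, "index"))
        elif indexed[url] != text:    # changed page: reindexing is warranted
            queue.append((url, "reindex"))
        for link in links:            # follow every hyperlink, once each
            if link in web and link not in seen:
                seen.add(link)
                to_visit.append(link)
    return queue

previously_indexed = {"a.example/": "home v1", "a.example/dogs": "dogs"}
indexing_queue = crawl("a.example/", WEB, previously_indexed)
```

In this run the home page is queued for reindexing because its text changed, the cats page is queued because it is new, and the unchanged dogs page is skipped.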

Describe information resource in surrogate record - read off webpages by indexing software

Left Side elements must be inferred by searcher:

Examine structure of retrieved records
Examine advanced search interface
Element sets are not standard, i.e., they will vary across search engines.

Right Side Content:

What is the source for the content?
Authority control?

Subject analyze information resource in surrogate record - indexing software:

Verbal - inferred by computer algorithm
Classification - inferred by computer algorithm

Subject Indexing in Search Engines

The subject fields of webpage surrogate records include the words that describe what the webpage is about.
Right side subject content is inferred through the application of proprietary algorithms.
Subject terms added to surrogate records are weighted:

Doc #1: SU = dogs (.99); breeding (.87); dachshund (.30)
Doc #2: SU = cats (.92); dogs (.44); dachshund (.03)
The weights are computed by proprietary algorithm.
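A minimal sketch of how weighted subject fields like these might be stored and used to order results for a one-word query. The `INDEX` structure and `rank` function are hypothetical; the weights are the ones from the two sample documents, and the ordering rule is simply "higher weight first".

```python
# Weighted subject fields for the two sample documents
INDEX = {
    "doc1": {"dogs": 0.99, "breeding": 0.87, "dachshund": 0.30},
    "doc2": {"cats": 0.92, "dogs": 0.44, "dachshund": 0.03},
}

def rank(term, index):
    """Return the documents whose subject field contains term, best first."""
    hits = [(doc, weights[term]) for doc, weights in index.items() if term in weights]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return [doc for doc, _ in hits]

dogs_ranking = rank("dogs", INDEX)            # ['doc1', 'doc2']
dachshund_ranking = rank("dachshund", INDEX)  # ['doc1', 'doc2']
cats_ranking = rank("cats", INDEX)            # ['doc2']
```

Note that in this sketch a document lacking the query term is not retrieved at all, and the query must match the stored term exactly ("dogs", not "dog"); real systems handle word variants in their own ways.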

Retrieval from Search Engines

Unlike bibliographic databases, in which the ordering of retrieved surrogate records is reverse chronological, search engines use a relevance-based ranking.
The retrieval component of a search engine takes the entered query and compares its terms to the index.
The documents that are retrieved first are those that have a higher "relevance" score:

Doc #1: SU = dogs (.99); breeding (.87); dachshund (.30)
Doc #2: SU = cats (.92); dogs (.44); dachshund (.03)
"dogs" query would rank document #1 ahead of document #2
"breeding" query would rank document #1 ahead of document #2
"cats" query would rank document #2 ahead of document #1

How are Subject Weights Calculated?

Conventional methods (dating from the 1950s) for automatically inferring what a document is about include the following three techniques:

Frequency of word occurrences
Location of word occurrences
Size of word occurrences
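As a rough illustration of the first two techniques, the sketch below adds up one score per word occurrence, with occurrences in more prominent locations counting for more, and then scales the totals so the top term gets 1.0. The location weights, the function name, and the sample page are all assumptions made for the example; size of occurrence could be folded in the same way with another multiplier.

```python
from collections import defaultdict

# Hypothetical weights: a word in the title counts more than one in the body
LOCATION_WEIGHT = {"title": 3.0, "heading": 2.0, "body": 1.0}

def subject_weights(occurrences):
    """occurrences: iterable of (word, location) pairs found on one page."""
    raw = defaultdict(float)
    for word, location in occurrences:
        raw[word] += LOCATION_WEIGHT[location]  # frequency and location combined
    top = max(raw.values())
    return {word: round(score / top, 2) for word, score in raw.items()}

page = [
    ("dogs", "title"), ("dogs", "body"), ("dogs", "body"),
    ("breeding", "heading"), ("breeding", "body"),
    ("dachshund", "body"),
]
weights = subject_weights(page)
```

A scheme this simple also shows why such methods were easy to manipulate: an author who repeats a word in the title and body can push its weight up at will.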

In the web era, however, these techniques did not scale well to meet the needs of databases containing billions of records:

Could facilitate retrieval of relevant documents, but could not distinguish between "good" and "bad" documents.
Were also subject to manipulation by authors desiring higher search engine retrieval (spamming)

Two responses to Early Indexing Failure

Yahoo! era (late 1990s)

Human indexing (website directories)
More discussion during lectures on classification.

Google era (since 1999)

Additional criteria introduced to infer aboutness, e.g.:

$ - paid submissions, such as Alta Vista
Quality - PageRank algorithm of Google

Google Approach to Automatic Indexing

Issue addressed by Google concerns the quality problem: How to cause the "best" documents to rise to the top of a set of retrieved webpages.
Solution concerns identifying additional criteria to include in the subject weighting algorithm.
Google maintains additional metadata elements for each surrogate record in its index of webpages:

How many other webpages link to a given webpage

The more webpages (i.e. linkers) a dachshund webpage has pointing to it, the higher its quality is judged to be.
This factors into the weight assigned to the "dachshund" descriptor in the subject field of its surrogate record

Who are the linkers

Those linkers that have a higher quality rank are given more weight than those linkers with a lower quality rank.
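The two ideas above (count the linkers, and give more weight to high-quality linkers) can be sketched with a simplified version of the published PageRank idea. This is not Google's production algorithm, which is proprietary; the link graph, page names, damping value, and iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its rank over all pages
            targets = outlinks or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share  # better-ranked linkers pass on more
        rank = new_rank
    return rank

# Hypothetical link graph: three pages point, directly or indirectly, at "dachshund"
LINKS = {
    "dachshund": [],
    "kennel_club": ["dachshund"],
    "hobby_page": ["dachshund", "kennel_club"],
    "obscure": ["hobby_page"],
}
scores = pagerank(LINKS)
best = max(scores, key=scores.get)
```

The page with the most incoming links ends up with the highest score, and the page nobody links to ends up with the lowest, which is the behavior the notes describe.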

Article Summary for Lecture # 10 - Northedge

Google and beyond: information retrieval on the World Wide Web

Northedge defines a web directory as “a human compiled list of links to web pages, typically organized into a hierarchical structure of subject categories.” Back in 1994, a mere 3 years after Berners-Lee created the “Web,” there were fewer than 10,000 websites. This number inflated to almost 3.5 million in 1998, and in 2006 it was estimated at over 100 million. Imagine if those websites were books. Without anyone to organize and sort through all of them, it would take forever for us users to retrieve any kind of information, let alone navigate the sea of changes that authors and creators make to their sites on a daily basis. If a librarian is involved, the user can submit their queries to the librarian.

In the case of the internet, search engines are the librarians. Several criteria measure the quality of the search engine, such as:

The size of the corpus – the more books the librarian can search, the better.
The speed of the answer – if we do not get our information quickly, we will find another search engine.
The availability of service – if it is not available when it is needed, the users are going to find another search engine.

The accuracy of results – if the information the user gets back is not what they are looking for, then they will find another search engine that will return related results. However, if the three preceding criteria are not met, accurate data is not going to be important. (See this post).

Search engines require their users to submit their searches through a search box, which allows the user to choose whatever terms they like – unlike web directories, which constrain users to search using vocabulary chosen by the indexer. Since it might take a while for a search engine to sift through over 100 million constantly changing websites, it only makes sense to implement an indexing program (called a spider or robot). This program accesses web pages, analyses their contents and records the results in a database (referred to as an “index”), which enables fast access to sought information and bridges the gap between the search engine and the requested content.

Today, one of the most used search engines is Google. Google’s software agent (indexer), called “Googlebot,” continually locates billions of web pages, analyses the content, and saves the result in the Google index. The algorithms it uses are a company secret, as they are what sets Google apart from its competitors (Bing, Yahoo, etc.). Googlebot breaks down webpages into words and examines their context within the page (position – is it in a header, sub header, body text, etc.), and sources are returned to the user, based on the algorithm’s weighted scale, in order from assumed most relevant to least.

In addition, while Google’s search box may seem to ask “What subject do you want information on?” in reality, it is asking “What word or combination of words will be most likely to appear on web pages that address the subject I am interested in, and least likely to appear on pages that are irrelevant to me?”. This may trip up users who are unfamiliar with how search engines work, and this may be the one negative Northedge presents about search engines – there is no one-to-one correspondence between words and meanings, and a single word may have multiple meanings (search for Java – the island – and only results about the computer programming language are returned). He also offers information on alternatives to search engines, which include META tags (the assignment of subject keywords by the web content creators), and folksonomies/tagging (creation of a taxonomy by the collective actions of users on the Web – see del.icio.us and flickr). These alternatives are somewhat controversial, because users and/or creators can deliberately assign misleading or inaccurate keywords to the content for financial gain or malicious reasons.

Ultimately, Northedge offers insight into the possible future of web searches: indexes will be computer-generated, but the data contained in those indexes may be driven by data sets produced by human indexing techniques and human linguistic research. I agree with this assertion, because it seems that as more technologies are developed and released, the search process becomes more streamlined and tailored to what the user REALLY wants from their search. This article is very informative, and if you want to know more about the inner workings of search engines, this is a fascinating read. I definitely came away knowing more about what happens once I search for "cat videos" on Google.

______________________________________

To read the whole article, see the citation below:

Northedge, R. (2007, April). Google and beyond: Information retrieval on the World Wide Web. The Indexer, 25(3), 192-195.

Tuesday, July 5, 2016

Verbal Subject Analysis II: Periodical and Other Databases

Subject Cataloging vs. Indexing

Both are related to the subject analysis of resources.
Subject cataloging is a term used for the subject analysis operations in library cataloging.
Indexing is a term generally used for the subject analysis operations in various other resource organization contexts, including periodical databases and search engines.

Brief History of Periodical Indexes

Around the turn of the 20th century, the library community decided not to add article citations to the catalog.
This development led to the growth of the commercial indexing industry.
The result of this has been:

Split files
Fees for licensing database content
Difficulty fulfilling Cutter's 2nd objective

Analytical Cataloging

Analytical cataloging techniques are needed in order to provide access to the component parts of composite information resources, most commonly:

Book chapters
Proceedings articles (usually of academic meetings)
Journal articles

Definition from AACR2: Analysis is the process of preparing a bibliographic record that describes a part (or parts) of an item for which a comprehensive entry is made.

Analytical Cataloging Techniques

Complex entries made within the record of composite work [cheap]:

Analytical added entries:

Use 740 tag for second of two works mentioned in title of item

Note area for comprehensive entry of larger work:

Use 505 tag for structured display of table of contents.

Separate records created for the component parts of composite works ("In" Analytics)[expensive]:

Use 773 to trace the component part record to parent record

Analytical Access to Journal Content

Decision to not provide analytical access to journal content (i.e. directly to articles) was because of the expense:

Excessive number of records would have to be created.
Additional authority work would need to be done.

As a result, through the 20th century, cataloging and periodical indexing/bibliography creation techniques evolved separate approaches.

Overview Comparison

Catalog

Authority work
Cataloging records represent the holdings of a library

Periodical indexes:

Subject indexes are extensive topical bibliographies (often include books and book chapters, too), usually covering large swaths of "territory"
Domain-wide indexes (e.g. Index Medicus) attempt to capture an entire discipline (may include book chapters, too)
No single library could ever own all items referred to in exhaustive bibliographies/indexes, thus leading to ILL (Interlibrary Loan) services
Authority work nonexistent (except controlled vocabularies)

Surrogate Records in Periodical Databases

As is the case with library catalogs, periodical databases contain structured surrogate records.
This structuring is fairly consistent across periodical databases, both in terms of stored records (two part metadata model holds) and how records are displayed
There is some authority control at work, but not in ways that you might think.

Collocation in Periodical Databases

By subject - what about vocabulary control?
By author - what about authority control?
By journal - what about authority control?
By language
By publication type
By date
Etc., etc., etc.

In all Collocation Contexts: MATCH!

EXAMPLES:

Indexers → author name → match ← author name ← users
Indexers → journal name → match ← journal name ← users
Indexers → vocabulary → match ← vocabulary ← users

Inverted File Structures

How surrogate records are physically stored in the index of a database.
Each surrogate record has a unique identifier, or UI (also called a pointer)
Each word and phrase of the index has a record in the index; each record contains the UI for each surrogate record that contains that word or phrase:

Dog: 235,527; 5,345,672; 117,127,923
Cat: 127; 2,753; 917,538; 327,543,238
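A minimal sketch of an inverted file, assuming three made-up surrogate records whose unique identifiers and titles are purely illustrative. Each word gets one entry holding the UIs of every record that contains it.

```python
from collections import defaultdict

# Hypothetical surrogate records: unique identifier (UI) -> indexed text
RECORDS = {
    127: "cat care basics",
    235: "dog breeding handbook",
    527: "dog and cat nutrition",
}

def build_inverted_file(records):
    """Map each word to the sorted list of UIs of records containing it."""
    index = defaultdict(list)
    for ui in sorted(records):
        for word in set(records[ui].lower().split()):
            index[word].append(ui)
    return dict(index)

inverted = build_inverted_file(RECORDS)

# A two-word search only has to intersect two short lists of pointers
both = sorted(set(inverted["dog"]) & set(inverted["cat"]))
```

Because a search consults the word entries rather than scanning every record, retrieval stays fast even when the database holds millions of surrogates.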

How is Surrogate Information Stored?

Print periodical indexes and bibliographies.
Online periodical databases:

Examine sample database record for left side metadata elements
Examine advanced search interface
EXAMPLES:

ALWAYS KNOW THE START DATE OF YOUR ONLINE PERIODICAL DATABASE!

Manual search of the literature is often needed for exhaustive searches
Retrospective conversion of print indexes to online is not generally undertaken by database providers due to expense.

Periodical Database Characteristics

They usually hold more records than a library catalog.
Information resources (i.e. journal articles) contain less information than the info resources in library catalogs and there are no detailed secondary navigation aids such as book indexes.
More fields (i.e. left side elements) available for search word qualification.
Important to distinguish database producing companies/organizations from database interface companies:

Some companies/organizations provide both (e.g. PubMed MEDLINE)
Other companies provide interface services, such as Dialog or EBSCO.

Unique Identifiers (UI) in Periodical Databases

Common UIs employed in periodical databases:

ISSN - unique numerical identifier for individual serial publications
Internal numbering systems within a periodical database, such as the PMID in MEDLINE (for known item searches)

The most important UI is pre-Web and that is the "Address" of an article in the bibliographic universe (also known as the citation data) - also good for known item searches:

Journal name
Volume number (in some journals, the issue number also)
First page number of article.

Authority Control in Periodical Databases

Titles - not controlled
Authors - somewhat controlled:

Indexers generally enter author name from the information resource
Control rests with periodical editors, who often have policies on author names that may be different than other periodical editors

Subjects - controlled:

Controlled vocabularies are imposed across periodicals and over time by indexers
However, subject searching is still subject to the problems associated with the "Great Pop vs. Soda Controversy"

Author Indexes in Periodical Databases

Examine how author data is entered into surrogate records:

Generally taken from the information resource in hand
PubMed MEDLINE is an exception

Some databases will provide lists of author names from which to choose:

Indicator of authority work?

Library Literature is an example

Subject Indexing in Periodical Databases

Indexing approach is "information access," therefore depth indexing is the general rule.
Indexers index to the most specific, therefore, hierarchies remain important in controlled vocabularies.
Pre-coordination and post-coordination are important concepts.

Subject Indexes in Periodical Databases

Depth Indexing:

General goal is to provide subject access to the information contained in an article.
However, this practice is not universal across database producers; therefore, determine the depth of indexing of the database you are searching by examining existing records

Be aware of the lag time related to subject indexing of articles

Management of Controlled Vocabularies

Homonymy:

Addressed by domain specificity of most periodical databases
Various forms of qualification are used, including parenthetical, hierarchy, and scope notes.

New concepts:

Literary warrant is generally not employed in periodical database vocabularies
New terms are added after new concepts have established themselves in the literature
This poses a challenge if you are searching at a "research front" (may need to perform a keyword search strategy of the abstract field)

Some Vocabularies for Periodical Databases

MeSH

NLM MeSH Browser
PubMed MeSH Database

Library Literature

EBSCO interface

Other Controlled Vocabulary Contexts

Getty Museum (the folks who created ULAN)

Subject Terms (search "cubist")
Thesaurus of Geographic Names (compare "Soviet Union" and "Russia")

Dublin Core permits various subject vocabularies:

Specification of subject element
DC Vocabulary Qualifiers

Web Content for Human Indexing

A-Z Web Indexes:

MEDLINEPlus Topics A-Z (organizes access to Web-based information for patients)

Indexing in Context

Obtain information resource
Describe information resource in surrogate record
Subject analyze information resource in surrogate record:

Verbal
Classification

Article Indexing Process

Two steps:

Analyze information resource to generate list of candidate concepts that describe its subject content
Translate those concepts into the controlled vocabulary of the database
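The second step, translating candidate concepts into the database's controlled vocabulary, might be sketched as a lookup in an entry vocabulary. The mapping below is hypothetical and only illustrates the pop versus soda problem; it does not reproduce the terms of any real thesaurus.

```python
# Hypothetical entry vocabulary: variant term -> preferred descriptor
ENTRY_VOCABULARY = {
    "pop": "carbonated beverages",
    "soda": "carbonated beverages",
    "soft drinks": "carbonated beverages",
    "carbonated beverages": "carbonated beverages",
}

def translate(candidates, vocabulary):
    """Return (descriptors, unmatched) for a list of candidate concepts."""
    descriptors, unmatched = [], []
    for concept in candidates:
        preferred = vocabulary.get(concept.strip().lower())
        if preferred is None:
            unmatched.append(concept)       # no descriptor exists yet
        elif preferred not in descriptors:
            descriptors.append(preferred)   # variants collapse to one descriptor
    return descriptors, unmatched

descriptors, unmatched = translate(["Soda", "pop", "bottling"], ENTRY_VOCABULARY)
```

Concepts that land in the unmatched list are the ones an indexer would have to handle by judgment, for example by choosing a broader descriptor or flagging a possible new term.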

ISO Standard for article indexing - special attention should be paid to certain sources of information:

Title
Abstract (when provided)
Introduction; opening and concluding paragraphs
Illustrations, diagrams, etc., and their captions
Words or groups of words that are underlined, bold, etc.

Article Summary for Lecture # 9 - Rotenberg

The Author Challenge: Identification of Self in the Scholarly Literature

Open access publishing, digital repositories, and contribution to scientific thought outside of traditional publishing are a reflection of the faster paced research arena. In the era of expanding research, to enable proper attribution of contributions to the correct individual some items need to be addressed:

Individual researchers are under increased pressure to find collaborators and to keep up-to-date with field trends, which is imperative to professional branding and reputation management.
The management of researcher identity by universities and other research organizations is a time-consuming process that institutions are required to do for institutional and government research assessment exercises.
Publishers, granting organizations, and professional and academic societies need to properly identify the researchers utilizing their systems for tracking and management purposes, and for finding reviewers.

The main issue with name ambiguity is how it affects career advancement, tenure, global collaboration between researchers, and grant funding. According to Rotenberg and Kushmerick, proper identification depends on paying attention to the following:

Name Variations

Many authors share the same first and/or last name.
Authors use different names on different papers.
Middle initials may or may not be included with the author name.
Non-Roman names may have a variety of spellings and word orders. They may also be improperly transliterated.

Increase in Global Research Output

Scholarly research around the world has been increasing. The number of publications from China has increased from over 20,000 papers in 1998 to 112,000 in 2008.
This global research has introduced new forms of communications, and proper identification can help accurately attribute scholars' activities.

Funding and Tenure Requirements

Academic institutions within Australia and the United Kingdom are required to participate in government run research evaluations, which require institutions to supply accurate reports of publication output for their university faculty.
Highly multi-authored papers (those with more than 50, 100, 200, or even 500 authors) are on the rise.

Keeping these in mind, and properly identifying various facets, can let us know who researchers are, what they do, where they work, what kinds of information they publish, who they are connected to, and who they want to know. This comes in handy when making connections for future research or collaboration. Thomson Reuters is at the forefront of normalizing the collection of scholar data. In 2006, it introduced two features into Web of Science:

Author finder - a search aid that utilizes the various indexed fields in Web of Science, such as author names, institution names, and subject areas to help a user identify the correct set of records for their chosen author.
Distinct Author Sets - aid that presents sets or clusters of publications that have been computationally grouped together using a proprietary algorithm, which helps users to pinpoint the publications of an individual. The algorithm takes into account a number of data points like author names, institution names, and citing and cited author relationships.

A really neat element of Web of Science is that users can provide feedback on author sets. Combining this feedback with automatic clustering takes advantage of the best information and resources available. Another Thomson Reuters innovation is ResearcherID, which offers scholars and researchers the opportunity to connect their scholarly works to a unique identifier called a ResearcherID number. These numbers stay attached to the individual throughout their career, across disciplines and institutions, and the interface gives scholars the chance to network through a number of search options (geographic heat map, keyword, country tag clouds, and a traditional fielded search page). You do not need an account to search public profiles, which allows the budding researcher to make contacts.

ResearcherID allows users to classify themselves into a given area of research or expertise and to integrate their profile into their website, so those perusing the web can find them and link to their work. As of July 2011, 127,000 unique ResearcherID profiles had been created, and over 2 million publication records from Web of Science had been claimed by authors via ResearcherID. The authors conclude that there are many initiatives devoted to adding clarity to the traditionally unclear world of author identity in scholarly literature, but the problem is by no means solved.

This is a really interesting article that outlines how authorship can be an issue in the modern, fast-paced research environment. I would suggest that researchers and information organizers alike take a look at Rotenberg and Kushmerick's work. I came away from this article with an idea of the types of sources I could use after publishing research of my own, or to find researchers working on a specific topic. Fortunately, electronic organization of information has helped to partially resolve some issues in author recognition, and hopefully the information specialists of the future can completely eradicate any form of error or misrepresentation in scholarly work.

_________________________________________________________________________

For more information see the full article (citation below)!

Rotenberg, E. & Kushmerick, A. (2011). The author challenge: Identification of self in the scholarly literature. Cataloging & Classification Quarterly 49(6):503-20.
