The Thesaurus: A Historical Viewpoint, with a Look to the Future
Four decades of use,
experimentation, and development have allowed users, researchers, and
catalogers to refine thesauri into very effective search tools. Aitchison and
Clarke trace that history, noting the landmark events that
helped create what we now know as the thesaurus. They draw on earlier
printed histories of thesauri (specifically Gilchrist’s The Thesaurus in Retrieval) and go on to define a thesaurus as “a
treasury or storehouse of knowledge, as a dictionary, encyclopedia and the like.”
The primary purpose of a thesaurus is to match the vocabulary used by the indexer
with the language of the searcher.
The word “thesaurus” was first used
in an information retrieval sense in 1957 by Peter Luhn of IBM,
and the concept itself took shape over the course of the 1950s. One highlight on the
timeline of the thesaurus is the Uniterm System, which used uncontrolled single
words taken from the text of documents. This ultimately proved difficult, since
single-word terms alone could not cope with synonyms, homonyms, etc.
Fortunately, it was superseded by vocabularies containing significant numbers
of compound terms. In addition, thesauri of this period listed terms in
alphabetical order, a practice that carried over into the standardization of
format in 1967, when the Thesaurus of
Engineering and Scientific Terms (TEST) was published.
The predominant feature of this format
is the alphabetical display of descriptors and non-descriptors, with synonyms and
broader, narrower, and related terms shown under each descriptor. A subject overview
or systematic display was of secondary importance, and the idea of a detailed
classified arrangement was considered too complex. An example of a systematic display is the
Descriptor Group Display, within which main groups are divided into subgroups,
and within the subgroups, descriptors are further organized into clusters. For
example:
14 DEMOGRAPHY. POPULATIONS
  14.01 POPULATION DYNAMICS
    14.01.01
      CIVIL REGISTRATION
      DEMOGRAPHIC STATISTICS
      POPULATION DATA
        USE: DEMOGRAPHIC STATISTICS
      etc.
In order to section thesaurus information
in this way, the classification scheme is an indispensable tool. When the
editor works only with an alphabetical list, it can feel like working blind,
but if a rigorous classification is developed, the compiler has a better chance
of building accurate and meaningful relationships between the terms. In the
early days, most thesauri were compiled manually, which was a massive
investment of both time and space (e.g. Thesaurofacet was held in more than 20 shoeboxes
containing cards for 16,000 descriptors and 7,000 non-descriptors) and was highly
vulnerable to human error and mid-process interruptions. This is where
computer-aided compilation came in handy. By the late 1970s, computer
compilation had become more common; however, there was no software to maintain a
systematic display in the faceted thesaurus style.
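To make the two displays discussed above concrete, here is a minimal Python sketch that generates both an alphabetical display and a systematic (grouped) display from the same set of records. It is only an illustration, not how any of the historical systems worked: the `Term` class, its field names, and the two functions are hypothetical, the "UF" (used for) label follows common thesaurus convention rather than anything stated in the article, and the data is limited to the small demographic example quoted earlier.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """One thesaurus entry (hypothetical structure for illustration)."""
    label: str
    group: str                 # notation of the cluster the term is filed under
    use: Optional[str] = None  # non-descriptors only: the preferred descriptor


# Group headings keyed by notation; the 14.01.01 cluster has no heading.
GROUPS = {
    "14": "DEMOGRAPHY. POPULATIONS",
    "14.01": "POPULATION DYNAMICS",
    "14.01.01": "",
}

TERMS = [
    Term("CIVIL REGISTRATION", "14.01.01"),
    Term("DEMOGRAPHIC STATISTICS", "14.01.01"),
    Term("POPULATION DATA", "14.01.01", use="DEMOGRAPHIC STATISTICS"),
]


def alphabetical_display(terms):
    """List every term A-Z with USE references and reciprocal UF entries."""
    used_for = defaultdict(list)
    for term in terms:
        if term.use:
            used_for[term.use].append(term.label)
    lines = []
    for term in sorted(terms, key=lambda t: t.label):
        lines.append(term.label)
        if term.use:
            lines.append(f"  USE: {term.use}")
        for synonym in sorted(used_for[term.label]):
            lines.append(f"  UF: {synonym}")
    return lines


def systematic_display(groups, terms):
    """List terms under their classified groups, indented by depth.

    Assumes notations sort correctly as plain strings (zero-padded codes).
    """
    by_group = defaultdict(list)
    for term in terms:
        by_group[term.group].append(term)
    lines = []
    for code in sorted(groups):
        depth = code.count(".")
        lines.append("  " * depth + f"{code} {groups[code]}".rstrip())
        indent = "  " * (depth + 1)
        for term in sorted(by_group[code], key=lambda t: t.label):
            lines.append(indent + term.label)
            if term.use:
                lines.append(indent + f"  USE: {term.use}")
    return lines


if __name__ == "__main__":
    print("\n".join(alphabetical_display(TERMS)))
    print()
    print("\n".join(systematic_display(GROUPS, TERMS)))
```

Because both displays are derived from one list of records, a change to a term only has to be made once, which is the kind of bookkeeping that was so error-prone with cards in shoeboxes.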
During this time, access was
usually limited to one workplace. There was typically a single large tome that stood by
the bank of filing cards or the optical coincidence viewer, and even computerized
thesauri were limited in space. However, trained searchers became fluent with
the process and, alongside trained indexers, were able to fully
harness the power of thesauri to perform effective searches. Nowadays, PCs
are everywhere, and each of them provides access to seemingly unlimited networks. In order
to apply thesauri to information retrieval in this setting, the authors feel the following
challenges need to be addressed:
- Access to information proceeds through any number of different portals, gateways, and search engines, many geared to particular audiences and subject areas. There is no universal thesaurus, but a multitude of different vocabularies for different applications.
- In the ‘publish once, re-utilize many times’ environment, it is hard to predict in which systems or networks a given document may eventually appear. Indexers must struggle to foresee all the needs that may arise for a given document.
- With the data entry/indexing task distributed among a vast number of authors, webmasters, system administrators, etc., quality control cannot be enforced across organizational boundaries.
- How can we train end-users to use a thesaurus properly? The experience of most information providers is that users do not want to cope with anything complicated, and the thesaurus is perceived as very complicated. Those beautifully presented systematic displays, carefully designed for selecting the right term(s) for each required concept, are often rejected as an unnecessary impediment and delay between the user and goal.
Confronting these challenges has
recently led to two major trends in thesaurus development:
- A hunt for adaptations that will make a controlled vocabulary much quicker, easier, and more intuitive to use.
- A drive toward interoperability of systems, that is, designing vocabularies for easy integration into downstream applications such as content management systems, indexing/meta-tagging interfaces, search engines, and portals.
Today's users
are more than happy to browse through a simple classified directory, using
point-and-click interaction with established headings instead of actively
thinking of search terms. This is why some companies are working on developing
taxonomies that will make things easier for the searcher, and perhaps even the
indexer. There is even talk of hiding the vocabulary altogether by
implementing synonym sets for selected terms that can be used to drive
automatic expansion of free-text search queries.
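The idea of hidden synonym sets driving query expansion can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any particular product: the function names are hypothetical, the first synonym set reuses the demographic example quoted earlier, the second is invented purely for the demonstration, and matching is plain case-insensitive phrase matching.

```python
import re

# Each set groups phrases treated as interchangeable for searching.
# The first set comes from the example above; the second is invented.
SYNONYM_SETS = [
    {"demographic statistics", "population data"},
    {"pc", "personal computer"},
]


def build_index(synonym_sets):
    """Map every phrase to the full set of phrases it belongs to."""
    index = {}
    for group in synonym_sets:
        normalized = frozenset(phrase.lower() for phrase in group)
        for phrase in normalized:
            index[phrase] = normalized
    return index


def expand_query(query, index):
    """Return, for each known phrase found in the query, its alternatives.

    The searcher never sees the vocabulary; the system quietly widens
    the free-text query with every phrase from the matching set.
    """
    # Normalize to lowercase words separated by single spaces.
    text = " ".join(re.findall(r"\w+", query.lower()))
    expansions = {}
    for phrase, group in index.items():
        if re.search(rf"\b{re.escape(phrase)}\b", text):
            expansions[phrase] = sorted(group)
    return expansions


if __name__ == "__main__":
    index = build_index(SYNONYM_SETS)
    print(expand_query("Population data for Alabama", index))
```

A search engine could then OR together the alternatives for each matched phrase, so a searcher who types "population data" also retrieves documents indexed under "demographic statistics" without ever opening the thesaurus.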
Another topic current thesaurus developers
need to think about is interoperability, which makes things easier for users. Gone
are the days of looking up terms in a printed thesaurus and then keying the selected terms
into the indexing system. Now that users expect to copy and paste terms or simply click on them, a search system
has to be capable of interacting with the thesaurus database. These newest
concerns were reflected in the recommendations on updated standards from the Workshop on Electronic Thesauri held in
1999, which state, “The standard should provide for a broader group of controlled
vocabularies than those that fit the standard definition of ‘thesaurus.’ This
includes, for example, ontologies, classifications, taxonomies and subject
headings, in addition to standard thesauri. The primary concern is with
shareability (interoperability), rather than with construction or display.
Therefore, this new standard will probably not supersede Z39.19, but supplement
it.”
Overall, I think Aitchison and
Clarke’s article is very thorough and offers a lot of insight into the world of
thesauri. This is a great read for anyone interested in the development of
thesauri or the organization of information. They have a ton of information and
references to support their examples, and they write in a way that makes it easy, even for
beginners, to pull information from the article and form new ideas
about and appreciation for the thesaurus in their lives!
_______________________________________________________________________
For more information, check out the full article (citation below)!
Aitchison, J., & Clarke, S.D. (2004). The thesaurus: A historical viewpoint, with a look to the future. Cataloging & Classification Quarterly 37(3/4):5-21.