This is particularly problematic when the search question involves terms tangential enough to the subject area that the indexer might have tagged the document with a different term, even though the searcher would consider the two terms equivalent. Essentially, this can be avoided only by an experienced
are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area. Controlled vocabulary terms can accurately describe what a given document is actually about, even if the terms themselves do not occur within the document's
Controlled vocabularies may become outdated rapidly in fast-developing fields of knowledge unless the preferred terms are updated regularly. Even in an ideal scenario, a controlled vocabulary is often less specific than the words of the text itself. Indexers trying to choose the appropriate index
Word choice in controlled vocabularies is not neutral, and indexers must carefully consider the ethics of their word choices. For example, colonialist terms have traditionally often been the preferred terms in controlled vocabularies when discussing First Nations issues, which has caused controversy.
It is unlikely that a single metadata scheme will ever succeed in describing the content of the entire Web. To create a
Semantic Web, it may be necessary to draw from two or more metadata systems to describe a Web page's contents. The eXchangeable Faceted Metadata Language (XFML) is designed to
The use of controlled vocabularies can be costly compared to free text searches because human experts or expensive automated systems are necessary to index each entry. Furthermore, the user has to be familiar with the controlled vocabulary scheme to make best use of the system. But as already
Another possibility is that the article is just not tagged by the indexer because indexing exhaustivity is low. For example, an article might mention football as a secondary focus, and the indexer might decide not to tag it with "football" because it is not important enough compared to the main
When indexing a document, the indexer also has to choose the level of indexing exhaustivity, that is, the level of detail in which the document is described. For example, with low indexing exhaustivity, minor aspects of the work will not be described with index terms. In general, the higher the indexing
databases appeared; these databases contain the full text of the index articles as well as the bibliographic information. Online bibliographic databases have migrated to the
Internet and are now publicly available; however, most are proprietary and can be expensive to use. Students enrolled in
Subject headings tend to use more pre-coordination of terms: the designer of the controlled vocabulary combines various concepts to form one preferred subject heading (e.g., children and terrorism), while thesauri tend to use singular direct terms. Thesauri list not only
define the concepts and relationships (terms) used to describe a field of interest or area of concern. For instance, to declare a person in a machine-readable format, a vocabulary is needed that has the formal definition of "Person", such as the Friend of a Friend
(a subject heading system that uses a controlled vocabulary), preferred terms (subject headings in this case) have to be chosen to handle choices between variant spellings of the same word (American versus British), choices between scientific and popular terms (
Compared to free text searching, the use of a controlled vocabulary can dramatically increase the performance of an information retrieval system, if performance is measured by precision (the percentage of documents in the retrieval list that are actually
On the other hand, free text searches have high exhaustivity (every word is searched), so although they have much lower precision, they have the potential for high recall, as long as the searcher overcomes the problem of synonyms by entering every combination.
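The two measures discussed here can be made concrete with a small calculation. The following is a minimal sketch, assuming precision means the fraction of retrieved documents that are relevant and recall means the fraction of relevant documents that are retrieved; the function name and the document identifiers are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search result.

    precision = relevant retrieved documents / all retrieved documents
    recall    = relevant retrieved documents / all relevant documents
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four documents retrieved, three relevant overall,
# two of the relevant ones among the retrieved.
retrieved_docs = {"d1", "d2", "d3", "d4"}
relevant_docs = {"d1", "d2", "d5"}
precision, recall = precision_recall(retrieved_docs, relevant_docs)
```

In this made-up case half of the retrieved list is relevant, while one relevant document is missed entirely, which is the kind of recall failure a synonym mismatch can cause.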
Lastly, the amount of pre-coordination (in which case the degree of enumeration versus synthesis becomes an issue) and post-coordination in the system is another important issue. Controlled vocabulary elements (terms/phrases) employed as
. Subsequently, for-profit firms (called abstracting and indexing services) emerged to index the fast-growing literature in every field of knowledge. In the 1960s, an online bibliographic database industry developed based on dialup
In some cases controlled vocabulary can enhance recall as well, because unlike natural language schemes, once the correct preferred term is searched, there is no need to search for other terms that might be synonyms of that term.
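The effect of searching by preferred term can be sketched in a few lines. This is an illustration only: the vocabulary, the entry term "motorcar", the document identifiers, and the `search` function are all hypothetical, and a real system would hold far richer term relationships.

```python
# Hypothetical controlled vocabulary: every entry term maps to one preferred term.
PREFERRED_TERM = {
    "automobile": "automobile",
    "car": "automobile",
    "motorcar": "automobile",
    "association football": "association football",
    "soccer": "association football",
}

# Hypothetical index: document id -> preferred terms assigned by an indexer.
INDEX = {
    "doc1": {"automobile"},
    "doc2": {"association football"},
    "doc3": {"automobile", "association football"},
}


def search(query: str) -> list[str]:
    """Return ids of documents tagged with the preferred term for `query`."""
    preferred = PREFERRED_TERM.get(query.strip().lower())
    if preferred is None:
        return []  # the query is not an entry term in this vocabulary
    return sorted(doc_id for doc_id, terms in INDEX.items() if preferred in terms)
```

Because synonyms are resolved once, in the vocabulary, a search for "car" and a search for "automobile" return the same documents without the searcher having to enumerate variants.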
equivalent terms but also narrower, broader terms and related terms among various preferred and non-preferred (but potentially synonymous) terms, while historically most subject headings did not. For example, the
Because of the card catalog system, subject headings tend to have terms that are in indirect order (though with the rise of automated systems this is being removed), while thesaurus terms are always in direct
between concepts and preferred terms. In short, controlled vocabularies ensure consistency and reduce the unwanted ambiguity inherent in normal human languages, where the same concept can be given different names.
) vocabulary, which has a Person class that defines typical properties of a person including, but not limited to, name, honorific prefix, affiliation, email address, and homepage, or the Person vocabulary of
networking. These services were seldom made available to the public because they were difficult to use; specialist librarians called search intermediaries handled the searching job. In the 1980s, the first
When selecting terms for a controlled vocabulary, the designer has to consider the specificity of the term chosen, whether to use direct entry, inter consistency and stability of the language.
. The use of controlled vocabulary ensures that everyone is using the same word to mean the same thing. This consistency of terms is one of the most important concepts in
as a means of access to documents has become popular. This involves using natural language indexing with indexing exhaustivity set to maximum (every word in the text is
Web searching could be dramatically improved by the development of a controlled vocabulary for describing Web pages; the use of such a vocabulary could culminate in a
). These methods have been compared in some studies, such as the 2007 article "A Comparative Evaluation of Full-text, Concept-based, and Context-sensitive Search."
colleges and universities may be able to access some of these services without charge; some of these services may be accessible without charge at a public library.
. In the 1950s, government agencies began to develop controlled vocabularies for the burgeoning journal literature in specialized fields; an example is the
. Controlled vocabulary schemes mandate the use of predefined, preferred terms that have been preselected by the designers of the schemes, in contrast to
Subject headings tend to be broader in scope, describing whole books, while thesauri tend to be more specialized, covering very specific disciplines.
To use machine-readable terms from any controlled vocabulary, web designers can choose from a variety of annotation formats, including RDFa,
units of information (document or work) so that they may be more easily retrieved by a search. Controlled vocabularies solve the problems of
, to aid in the content identification process of documents or other information system entities (e.g. DBMS, Web Services), qualify as
terms might misinterpret the author, whereas this problem does not arise in free text searching, which uses the author's own words.
therefore will retrieve documents that are about several completely different sports. Controlled vocabulary solves this problem by
itself did not have much syndetic structure until 1943, and it was not until 1985 that it began to adopt the thesaurus-type terms "
Links to examples of thesauri and classification schemes used in the domain of
Agriculture, Fisheries, Forestry etc.
Controlled vocabularies are often claimed to improve the accuracy of free text searching, such as to reduce
Numerous methodologies have been developed to assist in the creation of controlled vocabularies, including
user of controlled vocabulary whose understanding of the vocabulary coincides with that of the indexer.
Controlled indexing language – only approved terms can be used by the indexer to describe the document
, in that it will fail to retrieve some documents that are actually relevant to the search question.
Introduction to controlled vocabularies: terminology for art, architecture, and other cultural works
798:
Introduction to controlled vocabularies: terminology for art, architecture, and other cultural works
Free indexing language – any term (not only from the document) can be used to describe the document
by catalogers while thesauri were used by indexers to apply index terms to documents and articles.
. While the differences between the two are diminishing, there are still some minor differences.
enable controlled vocabulary creators to publish and share metadata systems. XFML is designed on
focus. But it turns out that for the searcher that article is relevant and hence recall fails. A
There are two main kinds of controlled vocabulary tools used in libraries: subject headings and
746: – Transformation aided by semantic equivalence statements within a controlled vocabulary.
indexing language – any term from the document in question can be used to describe the document
914:. Getty Research Institute (1st ed.). Los Angeles, California: Getty Research Institute.
Moskovitch, Robert; Martins, Susana B.; Behiri, Eytan; Weiss, Aviram; Shahar, Yuval (2007).
710: – Extraction of named entity mentions in unstructured text into pre-defined categories
704: – Mark-up language – or grammar – for controlled vocabularies developed by IMS Global
937:"A Comparative Evaluation of Full-text, Concept-based, and Context-sensitive Search"
524:, which enables a given data record or document to be described in multiple ways.
238:(terms chosen by considering the structure, scope of the controlled vocabulary).
provide a way to organize knowledge for subsequent retrieval. They are used in
In large organizations, controlled vocabularies may be introduced to improve
698: – List of words used by lexicographers to write dictionary definitions
mentioned, the control of synonyms and homographs can help increase precision.
716: – System of names or terms in a particular field of arts or sciences
606:, in which the content of Web pages is described using a machine-readable
544:, the study and classification of books. They were initially developed in
to ensure that each preferred term or heading refers to only one concept.
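Qualifying a homograph can be pictured as splitting one ambiguous word into several unambiguous headings. The sketch below is illustrative only: the heading forms "pool (game)" and "pool (swimming)" and the `resolve` function are hypothetical and do not reproduce the entries of any actual subject heading list.

```python
# Hypothetical qualified headings: one ambiguous word, several distinct concepts.
QUALIFIED_HEADINGS = {
    "pool": ["pool (game)", "pool (swimming)"],
}


def resolve(term: str) -> list[str]:
    """Return the qualified headings for `term`.

    An unambiguous term is returned unchanged as a single-item list;
    a homograph yields one heading per concept it can refer to.
    """
    key = term.strip().lower()
    return QUALIFIED_HEADINGS.get(key, [key])
```

An indexer (or a search interface) would then have to pick exactly one of the returned headings, so each heading attached to a document refers to a single concept.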
Initiative. An example of a controlled vocabulary which is usable for
994:"Controlled Vocabularies: Past, Present and Future of Subject Access"
234:(what terms are generally used in the literature and documents), and
serializations (RDF/XML, Turtle, N3, TriG, TriX) in external files.
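As an illustration of one such annotation format, the sketch below builds a JSON-LD description of a person using the Schema.org Person vocabulary and wraps it in the script element commonly used to embed JSON-LD in a page. The person, organization, and URLs are invented placeholders, and the helper name `to_json_ld_script` is hypothetical.

```python
import json

# Hypothetical person described with Schema.org Person properties.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Doe",
    "honorificPrefix": "Dr.",
    "affiliation": {"@type": "Organization", "name": "Example University"},
    "email": "mailto:jane.doe@example.org",
    "url": "https://example.org/~jane",
}


def to_json_ld_script(data: dict) -> str:
    """Serialize `data` as JSON-LD inside an HTML script element."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


markup = to_json_ld_script(person)
```

The same facts could instead be expressed inline with RDFa or Microdata attributes; JSON-LD is used here only because it keeps the vocabulary terms separate from the visible markup.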
646:. Similarly, a book can be described using the Book vocabulary of
Historically, subject headings were designed to describe books in
1059:
731: – Academic discipline studying terms and their general uses
the documents in such a way that the ambiguities are eliminated.
instead of slightly different ones to refer to the same thing.
801:(1st ed.). Los Angeles, Calif: Getty Research Institute.
586:, where effort is expended to use the same word throughout a
241:
Controlled vocabularies also typically handle the problem of
934:
876:"Controlled Vocabularies | Librarians | Library of Congress"
610:
scheme. One of the first proposals for such a scheme is the
686: – Unique headings used for bibliographic information
489:
A controlled vocabulary search may lead to unsatisfactory
226:
Choices of preferred terms are based on the principles of
Links to examples of thesauri and classification schemes.
374:
exhaustivity, the more terms indexed for each document.
171:, controlled vocabulary is a carefully selected list of
941:
Journal of the
American Medical Informatics Association
would automatically pick up that article regardless.
419:. Worldwide the most popular of these team sports is
397:
items in the retrieval list. These irrelevant items (
309:
text. Well known subject heading systems include the
654:vocabulary, an event with the Event vocabulary of
356:There are three main types of indexing languages.
1192:
401:) are often caused by the inherent ambiguity of
1132:"Dublin Core Metadata Element Set, Version 1.1"
159:vocabularies, which have no such restriction.
906:"3. Relationships in Controlled Vocabularies"
722: – Specification of a conceptualization
1186:Directory of Linked Open Vocabularies (LOV)
415:is the name given to a number of different
569:
319:United States National Library of Medicine
260:
998:Cataloging & Classification Quarterly
851:"Karl Fast, Fred Leise and Mike Steckel"
790:
650:and general publication terms from the
249:has to be qualified to refer to either
245:with qualifiers. For example, the term
1060:eXchangeable Faceted Metadata Language
793:"2. What Are Controlled Vocabularies?"
230:(what terms users are likely to use),
1211:Library cataloging and classification
1080:"The Person vocabulary of Schema.org"
991:
536:Controlled vocabularies, such as the
351:
1158:"The Event vocabulary of Schema.org"
692: – Subset of a natural language
538:Library of Congress Subject Headings
205:Library of Congress Subject Headings
58:adding citations to reliable sources
29:
1138:from the original on 16 August 2013
1106:"The Book vocabulary of Schema.org"
290:Library of Congress Subject Heading
24:
1164:from the original on 13 March 2015
1112:from the original on 11 March 2015
702:IMS Vocabulary Definition Exchange
423:, which also happens to be called
325:. Well known thesauri include the
163:In library and information science
1086:from the original on 28 July 2015
781:
554:U.S. National Library of Medicine
223:), among other difficult issues.
215:), and choices between synonyms (
1206:Information retrieval techniques
735:Universal Data Element Framework
598:Semantic web and structured data
540:, are an essential component of
992:Smith, Catherine (2021-04-03).
886:from the original on 2019-11-16
857:from the original on 2017-11-17
744:Vocabulary-based transformation
633:Controlled vocabularies of the
546:library and information science
531:
429:in several countries. The word
169:library and information science
45:needs additional citations for
1231:Ontology (information science)
831:A taxonomy primer // dead link
327:Art and Architecture Thesaurus
153:knowledge organization systems
27:Method of organizing knowledge
1010:10.1080/01639374.2021.1881007
751:
737: – controlled vocabulary
388:
7:
904:Harpring, Patricia (2010).
791:Harpring, Patricia (2010).
720:Ontology (computer science)
690:Controlled natural language
311:Library of Congress system
451:Australian rules football
1216:Knowledge representation
708:Named-entity recognition
552:(MeSH) developed by the
550:Medical Subject Headings
405:. Take the English word
315:Medical Subject Headings
1221:Technical communication
1201:Controlled vocabularies
880:The Library of Congress
776:Controlled Vocabularies
764:Controlled Vocabularies
576:technical communication
570:Technical communication
261:Types used in libraries
133:Controlled vocabularies
69:"Controlled vocabulary"
18:Controlled vocabularies
628:faceted classification
522:faceted classification
477:to the search topic).
317:(MeSH) created by the
213:Periplaneta americana
853:. 16 December 2002.
584:knowledge management
421:association football
203:For example, in the
179:, which are used to
54:improve this article
1236:Information science
953:10.1197/jamia.M1953
696:Defining vocabulary
433:is also applied to
1065:2012-02-08 at the
1045:2007-05-08 at the
836:2016-03-05 at the
669:in the markup, or
616:indexing web pages
352:Indexing languages
236:structural warrant
921:978-1-60606-150-3
808:978-1-60606-018-6
684:Authority control
580:technical writing
459:Canadian football
447:American football
1004:(2–3): 186–202.
503:free text search
403:natural language
379:free text search
377:In recent years
364:Natural language
275:library catalogs
232:literary warrant
157:natural language
141:subject headings
137:subject indexing
1067:Wayback Machine
1057:
1053:
1047:Wayback Machine
1038:Cory Doctorow,
838:Wayback Machine
663:HTML5 Microdata
461:. A search for
455:Gaelic football
399:false positives
1180:External links
1058:Mark Pilgrim,
947:(2): 164–174.
435:rugby football
658:, and so on.
411:for example.
251:swimming pool
1226:Semantic Web
1166:. Retrieved
1152:
1140:. Retrieved
1126:
1114:. Retrieved
1100:
1088:. Retrieved
1074:
1054:
1034:
1001:
997:
987:
944:
940:
930:
910:
899:
888:. Retrieved
879:
870:
859:. Retrieved
845:
829:Amy Warner,
825:
797:
771:
759:
714:Nomenclature
660:
635:Semantic Web
632:
630:principles.
624:
604:Semantic Web
601:
592:organization
573:
542:bibliography
535:
532:Applications
526:
519:
515:
511:
507:
499:
495:
488:
479:
471:
462:
443:rugby league
430:
424:
412:
407:
392:
382:
376:
372:
355:
338:
335:
303:
294:Broader term
264:
254:
253:or the game
250:
246:
240:
235:
231:
228:user warrant
227:
225:
220:
216:
212:
208:
202:
166:
132:
131:
116:
107:
97:
90:
83:
76:
64:
52:Please help
47:verification
44:
1241:Identifiers
729:Terminology
652:Dublin Core
612:Dublin Core
439:rugby union
417:team sports
333:Thesaurus.
298:Narrow term
1195:Categories
890:2018-05-22
861:2014-09-15
752:References
656:Schema.org
648:Schema.org
644:Schema.org
395:irrelevant
389:Advantages
243:homographs
217:automobile
185:homographs
151:and other
149:taxonomies
80:newspapers
1026:233205938
1018:0163-9374
961:1067-5027
817:456174098
563:full text
209:cockroach
197:bijection
193:polysemes
139:schemes,
110:June 2012
1168:13 March
1162:Archived
1142:13 March
1136:Archived
1116:13 March
1110:Archived
1090:13 March
1084:Archived
1063:Archived
1043:Archived
1040:Metacrap
979:17213502
884:Archived
855:Archived
834:Archived
677:See also
608:metadata
588:document
485:Problems
475:relevant
463:football
431:football
413:Football
408:football
346:metadata
329:and the
267:thesauri
189:synonyms
145:thesauri
970:2213470
667:JSON-LD
467:tagging
383:indexed
296:" and "
219:versus
211:versus
177:phrases
94:scholar
1024:
1016:
977:
967:
959:
918:
815:
805:
491:recall
457:, and
426:soccer
321:, and
284:order.
96:
89:
82:
75:
67:
1022:S2CID
665:, or
323:Sears
306:terms
195:by a
173:words
101:JSTOR
87:books
1170:2015
1144:2015
1118:2015
1092:2015
1014:ISSN
975:PMID
957:ISSN
916:ISBN
813:OCLC
803:ISBN
640:FOAF
582:and
558:X.25
441:and
342:tags
331:ERIC
304:The
255:pool
247:pool
191:and
175:and
73:news
1006:doi
965:PMC
949:doi
671:RDF
620:PSH
618:is
590:or
445:),
221:car
181:tag
167:In
56:by