Controlled vocabulary

Controlled vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri, taxonomies and other knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, preferred terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction.

In library and information science

In library and information science, a controlled vocabulary is a carefully selected list of words and phrases that are used to tag units of information (a document or work) so that they can be retrieved more easily by a search. Controlled vocabularies solve the problems of homographs, synonyms and polysemes by establishing a bijection between concepts and preferred terms. In short, controlled vocabularies reduce the unwanted ambiguity inherent in normal human languages, where the same concept can be given different names, and they ensure consistency.

For example, in the Library of Congress Subject Headings (a subject heading system that uses a controlled vocabulary), preferred terms (subject headings, in this case) have to be chosen to handle choices between variant spellings of the same word (American versus British), choices between scientific and popular terms (cockroach versus Periplaneta americana), and choices between synonyms (automobile versus car), among other difficult issues.

Choices of preferred terms are based on the principles of user warrant (what terms users are likely to use), literary warrant (what terms are generally used in the literature and documents), and structural warrant (terms chosen by considering the structure and scope of the controlled vocabulary).

Controlled vocabularies also typically handle the problem of homographs with qualifiers. For example, the term pool has to be qualified to refer to either swimming pool or the game pool, so that each preferred term or heading refers to only one concept.

Types used in libraries

There are two main kinds of controlled vocabulary tools used in libraries: subject headings and thesauri. While the differences between the two are diminishing, there are still some minor differences.

Historically, subject headings were designed to describe books in library catalogs by catalogers, while thesauri were used by indexers to apply index terms to documents and articles. Subject headings tend to be broader in scope, describing whole books, while thesauri tend to be more specialized, covering very specific disciplines. Because of the card catalog system, subject headings tend to have terms in indirect order (though this is disappearing with the rise of automated systems), while thesaurus terms are always in direct order. Subject headings also tend to use more pre-coordination of terms: the designer of the controlled vocabulary combines various concepts to form one preferred subject heading (e.g., children and terrorism), while thesauri tend to use single, direct terms. Thesauri list not only equivalent terms but also narrower, broader and related terms among various preferred and non-preferred (but potentially synonymous) terms, while historically most subject headings did not. For example, the Library of Congress Subject Headings had little syndetic structure until 1943, and it was not until 1985 that it began to adopt the thesaurus-type terms "Broader term" and "Narrower term".

The terms are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area. Controlled vocabulary terms can accurately describe what a given document is actually about, even if the terms themselves do not occur within the document's text. Well-known subject heading systems include the Library of Congress system, Medical Subject Headings (MeSH), created by the United States National Library of Medicine, and Sears. Well-known thesauri include the Art and Architecture Thesaurus and the ERIC Thesaurus.

When selecting terms for a controlled vocabulary, the designer has to consider the specificity of the chosen term, whether to use direct entry, and the consistency and stability of the language. Finally, the amount of pre-coordination (in which case the degree of enumeration versus synthesis becomes an issue) and post-coordination in the system is another important issue. Controlled vocabulary elements (terms or phrases) employed as tags to aid the content identification of documents or other information system entities (e.g., DBMS, Web services) qualify as metadata.

Indexing languages

There are three main types of indexing language:

Controlled indexing language – only approved terms can be used by the indexer to describe the document.
Natural language indexing language – any term from the document in question can be used to describe the document.
Free indexing language – any term (not only from the document) can be used to describe the document.

When indexing a document, the indexer also has to choose the level of indexing exhaustivity, that is, the level of detail at which the document is described. For example, with low indexing exhaustivity, minor aspects of the work will not be described with index terms. In general, the higher the indexing exhaustivity, the more terms are indexed for each document.

In recent years, free text search as a means of access to documents has become popular. This involves natural language indexing with indexing exhaustivity set to maximum (every word in the text is indexed). These methods have been compared in some studies, such as the 2007 article "A Comparative Evaluation of Full-text, Concept-based, and Context-sensitive Search" by Moskovitch et al. in the Journal of the American Medical Informatics Association.

Advantages

Controlled vocabularies are often claimed to improve the accuracy of free text searching, for example by reducing irrelevant items in the retrieval list. These irrelevant items (false positives) are often caused by the inherent ambiguity of natural language. Take the English word football, for example. Football is the name given to a number of different team sports. Worldwide, the most popular of these team sports is association football, which also happens to be called soccer in several countries. The word football is also applied to rugby football (rugby union and rugby league), American football, Australian rules football, Gaelic football, and Canadian football. A search for football will therefore retrieve documents that are about several completely different sports. Controlled vocabulary solves this problem by tagging documents in such a way that the ambiguities are eliminated.

Compared to free text searching, the use of a controlled vocabulary can dramatically increase the performance of an information retrieval system, if performance is measured by precision (the percentage of documents in the retrieval list that are actually relevant to the search topic).

In some cases, a controlled vocabulary can enhance recall as well, because, unlike with natural language schemes, once the correct preferred term is searched, there is no need to search for other terms that might be synonyms of that term.

Problems

A controlled vocabulary search may lead to unsatisfactory recall, in that it will fail to retrieve some documents that are actually relevant to the search question.

This is particularly problematic when the search question involves terms that are sufficiently tangential to the subject area that the indexer might have decided to tag the document using a different term (but the searcher might consider the two the same). Essentially, this can be avoided only by an experienced user of controlled vocabulary whose understanding of the vocabulary coincides with that of the indexer.

Another possibility is that the article is simply not tagged by the indexer because indexing exhaustivity is low. For example, an article might mention football as a secondary focus, and the indexer might decide not to tag it with "football" because it is not important enough compared to the main focus. But it turns out that the article is relevant to the searcher, and hence recall fails. A free text search would pick up that article regardless.

On the other hand, free text searches have high exhaustivity (every word is searched), so although they have much lower precision, they have the potential for high recall, as long as the searcher overcomes the problem of synonyms by entering every combination.

Controlled vocabularies may become outdated rapidly in fast-developing fields of knowledge unless the preferred terms are updated regularly. Even in an ideal scenario, a controlled vocabulary is often less specific than the words of the text itself. Indexers trying to choose the appropriate index terms might misinterpret the author, whereas this problem does not arise in free text searching, which uses the author's own words.

The use of controlled vocabularies can be costly compared to free text searches, because human experts or expensive automated systems are needed to index each entry. Furthermore, the user has to be familiar with the controlled vocabulary scheme to make best use of the system. But, as already mentioned, the control of synonyms and homographs can help increase precision.

Numerous methodologies have been developed to assist in the creation of controlled vocabularies, including faceted classification, which enables a given data record or document to be described in multiple ways.

Word choice in controlled vocabularies is not neutral, and the indexer must carefully consider the ethics of their word choices. For example, traditionally colonialist terms have often been the preferred terms in controlled vocabularies when discussing First Nations issues, which has caused controversy.

Applications

Controlled vocabularies, such as the Library of Congress Subject Headings, are an essential component of bibliography, the study and classification of books. They were initially developed in library and information science. In the 1950s, government agencies began to develop controlled vocabularies for the burgeoning journal literature in specialized fields; an example is the Medical Subject Headings (MeSH) developed by the U.S. National Library of Medicine. Subsequently, for-profit firms (called abstracting and indexing services) emerged to index the fast-growing literature in every field of knowledge. In the 1960s, an online bibliographic database industry developed based on dial-up X.25 networking. These services were seldom made available to the public because they were difficult to use; specialist librarians called search intermediaries handled the searching. In the 1980s, the first full text databases appeared; these databases contain the full text of the indexed articles as well as the bibliographic information. Online bibliographic databases have migrated to the Internet and are now publicly available; however, most are proprietary and can be expensive to use. Students enrolled in colleges and universities may be able to access some of these services without charge, and some may be accessible without charge at a public library.

Technical communication

In large organizations, controlled vocabularies may be introduced to improve technical communication. The use of controlled vocabulary ensures that everyone is using the same word to mean the same thing. This consistency of terms is one of the most important concepts in technical writing and knowledge management, where effort is expended to use the same word throughout a document or organization instead of slightly different ones to refer to the same thing.

Semantic web and structured data

Web searching could be dramatically improved by the development of a controlled vocabulary for describing Web pages; the use of such a vocabulary could culminate in a Semantic Web, in which the content of Web pages is described using a machine-readable metadata scheme. One of the first proposals for such a scheme is the Dublin Core Initiative. An example of a controlled vocabulary that is usable for indexing web pages is PSH.

It is unlikely that a single metadata scheme will ever succeed in describing the content of the entire Web. To create a Semantic Web, it may be necessary to draw from two or more metadata systems to describe a Web page's contents. The eXchangeable Faceted Metadata Language (XFML) is designed to enable controlled vocabulary creators to publish and share metadata systems. XFML is designed on faceted classification principles.

Controlled vocabularies of the Semantic Web define the concepts and relationships (terms) used to describe a field of interest or area of concern. For instance, to declare a person in a machine-readable format, a vocabulary is needed that has the formal definition of "Person", such as the Friend of a Friend (FOAF) vocabulary, which has a Person class that defines typical properties of a person including, but not limited to, name, honorific prefix, affiliation, email address, and homepage, or the Person vocabulary of Schema.org. Similarly, a book can be described using the Book vocabulary of Schema.org and general publication terms from the Dublin Core vocabulary, an event with the Event vocabulary of Schema.org, and so on.

To use machine-readable terms from any controlled vocabulary, web designers can choose from a variety of annotation formats, including RDFa, HTML5 Microdata, or JSON-LD in the markup, or RDF serializations (RDF/XML, Turtle, N3, TriG, TriX) in external files.
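A controlled vocabulary's bijection between concepts and preferred terms, with qualifiers that separate homographs such as pool, can be sketched as a simple lookup table. The Python below is a minimal illustration under stated assumptions: the vocabulary, the `PREFERRED` and `HOMOGRAPHS` tables and the `normalize` helper are all hypothetical, and the qualified forms are made-up labels rather than actual Library of Congress Subject Headings.

```python
# Hypothetical toy vocabulary (not taken from any real subject heading list).
# Many entry terms may point to one preferred term, but each preferred term
# stands for exactly one concept.
PREFERRED = {
    "automobile": "Automobiles",
    "automobiles": "Automobiles",
    "car": "Automobiles",
    "cars": "Automobiles",
    "cockroach": "Cockroaches",
    "periplaneta americana": "Cockroaches",
    "color": "Color",    # American spelling
    "colour": "Color",   # British spelling
    "swimming pool": "Pool (Swimming)",
    "pool (swimming)": "Pool (Swimming)",
    "pool (game)": "Pool (Game)",
}

# An unqualified homograph is never a preferred term on its own: each sense
# gets a qualified term, so one heading refers to only one concept.
HOMOGRAPHS = {
    "pool": ("Pool (Swimming)", "Pool (Game)"),
}


def normalize(term: str) -> str:
    """Return the single preferred term for a user-supplied term.

    Raises ValueError for an unqualified homograph and KeyError for a term
    the vocabulary does not know.
    """
    key = term.strip().lower()
    if key in HOMOGRAPHS:
        senses = ", ".join(HOMOGRAPHS[key])
        raise ValueError(f"{term!r} is ambiguous; use one of: {senses}")
    return PREFERRED[key]


print(normalize("Car"))                    # Automobiles
print(normalize("Periplaneta americana"))  # Cockroaches
try:
    normalize("pool")
except ValueError as err:
    print(err)
```

Indexers and searchers who both pass their terms through the same table end up using the same preferred term, which is the consistency the vocabulary is meant to guarantee.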
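The syndetic structure that distinguishes thesauri (broader, narrower, related and equivalent terms) can be modelled as a small graph. This sketch assumes a hypothetical `Thesaurus` class and an invented hierarchy of football codes. It stores every broader-term link together with its narrower-term reciprocal, keeps related-term links symmetric, and routes a non-preferred term to its preferred term before expanding a search to all narrower terms. This follows the reciprocal-link convention found in thesaurus standards (for example ANSI/NISO Z39.19), but it is not an implementation of any standard.

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus keeping BT/NT and USE/UF reciprocal and RT symmetric."""

    def __init__(self) -> None:
        self.broader = defaultdict(set)   # BT: term -> broader terms
        self.narrower = defaultdict(set)  # NT: term -> narrower terms
        self.related = defaultdict(set)   # RT: term -> related terms
        self.use = {}                     # USE: non-preferred -> preferred term
        self.used_for = defaultdict(set)  # UF: preferred -> non-preferred terms

    def add_broader(self, term: str, broader_term: str) -> None:
        # Every BT link is stored together with its reciprocal NT link.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def add_related(self, a: str, b: str) -> None:
        # RT links are symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def add_synonym(self, non_preferred: str, preferred: str) -> None:
        # USE and UF are reciprocal equivalence links.
        self.use[non_preferred] = preferred
        self.used_for[preferred].add(non_preferred)

    def all_narrower(self, term: str) -> set:
        """Collect narrower terms at every level below `term`."""
        term = self.use.get(term, term)  # search under the preferred term
        found, stack = set(), [term]
        while stack:
            for child in self.narrower.get(stack.pop(), ()):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found


# Hypothetical hierarchy built from the football codes mentioned in the text.
t = Thesaurus()
t.add_broader("Association football", "Football codes")
t.add_broader("American football", "Football codes")
t.add_broader("Rugby football", "Football codes")
t.add_broader("Rugby union", "Rugby football")
t.add_broader("Rugby league", "Rugby football")
t.add_related("Rugby football", "American football")
t.add_synonym("Soccer", "Association football")

print(sorted(t.all_narrower("Football codes")))
print(t.use["Soccer"], sorted(t.related["American football"]))
```

Expanding a query from a broad term to all of its narrower terms is one way a thesaurus-backed search can widen recall while still matching only preferred terms.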
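The precision and recall trade-off, including the football ambiguity and the recall failure caused by low indexing exhaustivity, can be illustrated on a tiny collection. In this sketch the five documents, their assigned index terms and the relevance judgements are invented for illustration, and the helper names are hypothetical. Precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that are retrieved.

```python
import re

# Hypothetical collection: free text plus the index terms an indexer assigned.
DOCS = {
    1: ("World Cup final: the national team won the football match",
        {"Association football"}),
    2: ("The Super Bowl decides the American football championship",
        {"American football"}),
    3: ("Six Nations preview: rugby football at its best in rugby union",
        {"Rugby union"}),
    # Mentions football only in passing; with low indexing exhaustivity
    # the indexer tagged only its main topic.
    4: ("City budget report; also funds a football stadium for the soccer club",
        {"Municipal budgets"}),
    5: ("Soccer transfer news from the Premier League",
        {"Association football"}),
}

# Hypothetical relevance judgements for a searcher interested in association football.
RELEVANT = {1, 4, 5}


def free_text_search(*words: str) -> set:
    """Return ids of documents whose text contains any of the given words."""
    wanted = {w.lower() for w in words}
    return {doc_id for doc_id, (text, _) in DOCS.items()
            if wanted & set(re.findall(r"[a-z]+", text.lower()))}


def controlled_search(preferred_term: str) -> set:
    """Return ids of documents tagged with the preferred term."""
    return {doc_id for doc_id, (_, tags) in DOCS.items() if preferred_term in tags}


def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Compute (precision, recall) for one result set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


for label, result in [
    ("free text 'football'", free_text_search("football")),
    ("free text 'football' OR 'soccer'", free_text_search("football", "soccer")),
    ("controlled 'Association football'", controlled_search("Association football")),
]:
    p, r = precision_recall(result, RELEVANT)
    print(f"{label}: retrieved={sorted(result)} precision={p:.2f} recall={r:.2f}")
```

On this toy data, searching the preferred term Association football gives precision 1.00 but recall 0.67, because document 4 was never tagged. A free text search for football alone scores 0.50 and 0.67, and adding the synonym soccer raises recall to 1.00 at a precision of 0.60.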
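Faceted classification, which lets one record be described in several independent ways, can be sketched with plain dictionaries. The facets (`sport`, `form`, `region`), the records and the `facet_filter` and `facet_counts` helpers are hypothetical. Combining values from several facets at search time, as done here, is a form of post-coordination.

```python
from collections import Counter

# Hypothetical records, each described by one value in several independent facets.
RECORDS = [
    {"id": "r1", "sport": "Association football", "form": "News", "region": "Europe"},
    {"id": "r2", "sport": "Rugby union", "form": "News", "region": "Europe"},
    {"id": "r3", "sport": "Association football", "form": "Statistics", "region": "South America"},
    {"id": "r4", "sport": "American football", "form": "News", "region": "North America"},
]


def facet_filter(records, **selected):
    """Return ids of records matching every selected facet value (AND)."""
    return [r["id"] for r in records
            if all(r.get(facet) == value for facet, value in selected.items())]


def facet_counts(records, facet):
    """Count records per value of one facet, as a faceted browser might display."""
    return Counter(r[facet] for r in records)


# The same record r1 is reachable through any of its facets.
print(facet_filter(RECORDS, sport="Association football"))  # ['r1', 'r3']
print(facet_filter(RECORDS, form="News", region="Europe"))    # ['r1', 'r2']
print(facet_counts(RECORDS, "sport").most_common())
```

Because each facet is a small controlled list of its own, new combinations need no new pre-coordinated headings; the searcher assembles them from the facet values.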
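Applying a Semantic Web vocabulary in page markup can be sketched by emitting JSON-LD, one of the annotation formats used to carry controlled vocabulary terms on the Web. This minimal Python example assumes the Schema.org `Person` type with its `name`, `honorificPrefix`, `affiliation`, `email` and `url` properties; the person, organization and addresses are invented.

```python
import json

# Hypothetical person described with Schema.org's Person type.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Doe",
    "honorificPrefix": "Dr.",
    "affiliation": {"@type": "Organization", "name": "Example University"},
    "email": "jane.doe@example.org",
    "url": "https://example.org/~jdoe",
}

# Wrap the JSON-LD in the script element a web page would embed in its HTML.
snippet = '<script type="application/ld+json">\n{}\n</script>'.format(
    json.dumps(person, indent=2)
)
print(snippet)
```

The same description could instead be written as RDFa or HTML5 Microdata attributes on the visible markup; JSON-LD keeps it in a separate block, which leaves the page's HTML unchanged.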
