Knowledge

Controlled vocabulary

This is particularly problematic when the search question involves terms that are tangential enough to the subject area that the indexer might have tagged the document with a different term, one that the searcher would consider equivalent. Essentially, this can be avoided only by an experienced
are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area. Controlled vocabulary terms can accurately describe what a given document is actually about, even if the terms themselves do not occur within the document's
Controlled vocabularies may become outdated rapidly in fast-developing fields of knowledge unless the preferred terms are updated regularly. Even in an ideal scenario, a controlled vocabulary is often less specific than the words of the text itself. Indexers trying to choose the appropriate index
Word choice in controlled vocabularies is not neutral, and indexers must carefully consider the ethics of their word choices. For example, traditionally colonialist terms have often been the preferred terms in controlled vocabularies when discussing First Nations issues, which has caused controversy.
It is unlikely that a single metadata scheme will ever succeed in describing the content of the entire Web. To create a Semantic Web, it may be necessary to draw from two or more metadata systems to describe a Web page's contents. The eXchangeable Faceted Metadata Language (XFML) is designed to
The use of controlled vocabularies can be costly compared to free text searches because human experts or expensive automated systems are necessary to index each entry. Furthermore, the user has to be familiar with the controlled vocabulary scheme to make best use of the system. But as already
Another possibility is that the article is just not tagged by the indexer because indexing exhaustivity is low. For example, an article might mention football as a secondary focus, and the indexer might decide not to tag it with "football" because it is not important enough compared to the main
When indexing a document, the indexer also has to choose the level of indexing exhaustivity, that is, the level of detail in which the document is described. For example, with low indexing exhaustivity, minor aspects of the work will not be described with index terms. In general, the higher the indexing
databases appeared; these databases contain the full text of the indexed articles as well as the bibliographic information. Online bibliographic databases have migrated to the Internet and are now publicly available; however, most are proprietary and can be expensive to use. Students enrolled in
Subject headings tend to use more pre-coordination of terms, such that the designer of the controlled vocabulary combines various concepts to form one preferred subject heading (e.g., children and terrorism), while thesauri tend to use singular direct terms. Thesauri list not only
define the concepts and relationships (terms) used to describe a field of interest or area of concern. For instance, to declare a person in a machine-readable format, a vocabulary is needed that has the formal definition of "Person", such as the Friend of a Friend
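For example, in the Library of Congress Subject Headings (a subject heading system that uses a controlled vocabulary), preferred terms (subject headings in this case) have to be chosen to handle choices between variant spellings of the same word (American versus British), choices between scientific and popular terms (cockroach versus Periplaneta americana), and choices between synonyms (automobile versus car), among other difficult issues.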
Compared to free text searching, the use of a controlled vocabulary can dramatically increase the performance of an information retrieval system, if performance is measured by precision (the percentage of documents in the retrieval list that are actually
On the other hand, free text searches have high exhaustivity (every word is searched), so although they have much lower precision, they have the potential for high recall as long as the searcher overcomes the problem of synonyms by entering every combination.
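The trade-off between precision and recall can be made concrete with a small worked example. The sketch below is illustrative only: the document ids and both result sets are invented, and `precision_recall` is a hypothetical helper, not part of any retrieval system discussed here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: ids of the documents returned by the search
    relevant:  ids of the documents that actually answer the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: share of the retrieval list that is relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of the relevant documents that were retrieved.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented collection: documents 1-4 are relevant to the query.
relevant = {1, 2, 3, 4}

# Assumed controlled vocabulary result: short list, mostly relevant.
controlled = {1, 2, 5}
# Assumed free text result: long list that happens to contain every relevant item.
free_text = {1, 2, 3, 4, 5, 6, 7, 8}

print(precision_recall(controlled, relevant))
print(precision_recall(free_text, relevant))
```

With these invented numbers the controlled search has the higher precision and the free text search has the higher recall, which is the pattern the text describes.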
Lastly, the amount of pre-coordination (in which case the degree of enumeration versus synthesis becomes an issue) and post-coordination in the system is another important issue. Controlled vocabulary elements (terms/phrases) employed as
Subsequently, for-profit firms (called abstracting and indexing services) emerged to index the fast-growing literature in every field of knowledge. In the 1960s, an online bibliographic database industry developed based on dialup
In some cases controlled vocabulary can enhance recall as well, because unlike natural language schemes, once the correct preferred term is searched, there is no need to search for other terms that might be synonyms of that term.
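The recall benefit described here comes from mapping every variant term to one preferred term before the index is consulted. A minimal sketch follows, assuming a hypothetical entry vocabulary (`USE`) and a hypothetical index (`INDEX`); the terms and document ids are invented for illustration.

```python
# Hypothetical entry vocabulary: every variant (non-preferred) term points
# to exactly one preferred term.
USE = {
    "automobile": "automobile",
    "car": "automobile",
    "motor car": "automobile",
}

# Hypothetical index: preferred term -> ids of documents tagged with it.
INDEX = {"automobile": {101, 102, 103}}


def search(term):
    """Look up a user's term via its preferred term."""
    preferred = USE.get(term.strip().lower())
    if preferred is None:
        # The term is not in the vocabulary, so nothing can be retrieved.
        return set()
    return INDEX.get(preferred, set())


# Both variants reach the same documents; no synonym list is needed.
print(search("car") == search("Automobile"))
```

The last branch also shows the dependency noted elsewhere in the article: a searcher who uses a term outside the vocabulary retrieves nothing.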
equivalent terms but also narrower, broader, and related terms among various preferred and non-preferred (but potentially synonymous) terms, while historically most subject headings did not. For example, the
Because of the card catalog system, subject headings tend to have terms that are in indirect order (though with the rise of automated systems this is being removed), while thesaurus terms are always in direct
between concepts and preferred terms. In short, controlled vocabularies ensure consistency and reduce the ambiguity inherent in ordinary human languages, where the same concept can be given different names.
(FOAF) vocabulary, which has a Person class that defines typical properties of a person including, but not limited to, name, honorific prefix, affiliation, email address, and homepage, or the Person vocabulary of
networking. These services were seldom made available to the public because they were difficult to use; specialist librarians called search intermediaries handled the searching job. In the 1980s, the first
When selecting terms for a controlled vocabulary, the designer has to consider the specificity of the term chosen, whether to use direct entry, inter consistency and stability of the language.
The use of controlled vocabulary ensures that everyone is using the same word to mean the same thing. This consistency of terms is one of the most important concepts in
as a means of access to documents has become popular. This involves using natural language indexing with indexing exhaustivity set to maximum (every word in the text is
Web searching could be dramatically improved by the development of a controlled vocabulary for describing Web pages; the use of such a vocabulary could culminate in a
indexed). These methods have been compared in some studies, such as the 2007 article "A Comparative Evaluation of Full-text, Concept-based, and Context-sensitive Search."
colleges and universities may be able to access some of these services without charge; some of these services may be accessible without charge at a public library.
In the 1950s, government agencies began to develop controlled vocabularies for the burgeoning journal literature in specialized fields; an example is the Medical Subject Headings (MeSH) developed by the U.S. National Library of Medicine.
Controlled vocabulary schemes mandate the use of predefined, preferred terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction.
Subject headings tend to be broader in scope describing whole books, while thesauri tend to be more specialized covering very specific disciplines.
To use machine-readable terms from any controlled vocabulary, web designers can choose from a variety of annotation formats, including RDFa,
units of information (document or work) so that they may be more easily retrieved by a search. Controlled vocabularies solve the problems of
tags, to aid in the content identification process of documents or other information system entities (e.g. DBMS, Web services), qualify as metadata.
terms might misinterpret the author, while this precise problem is not a factor in a free text, as it uses the author's own words.
therefore will retrieve documents that are about several completely different sports. Controlled vocabulary solves this problem by
itself did not have much syndetic structure until 1943, and it was not until 1985 that it began to adopt the thesaurus-type terms "Broader term" and "Narrow term".
Links to examples of thesauri and classification schemes used in the domain of Agriculture, Fisheries, Forestry etc.
Controlled vocabularies are often claimed to improve the accuracy of free text searching, for example by reducing the number of irrelevant
Numerous methodologies have been developed to assist in the creation of controlled vocabularies, including
user of controlled vocabulary whose understanding of the vocabulary coincides with that of the indexer.
Controlled indexing language – only approved terms can be used by the indexer to describe the document
recall, in that it will fail to retrieve some documents that are actually relevant to the search question.
Introduction to controlled vocabularies: terminology for art, architecture, and other cultural works
Free indexing language – any term (not only from the document) can be used to describe the document
by catalogers while thesauri were used by indexers to apply index terms to documents and articles.
While the differences between the two are diminishing, there are still some minor differences.
enable controlled vocabulary creators to publish and share metadata systems. XFML is designed on
focus. But it turns out that for the searcher that article is relevant and hence recall fails. A
There are two main kinds of controlled vocabulary tools used in libraries: subject headings and
Vocabulary-based transformation – Transformation aided by semantic equivalence statements within a controlled vocabulary.
Natural language indexing language – any term from the document in question can be used to describe the document
Getty Research Institute (1st ed.). Los Angeles, California: Getty Research Institute.
Moskovitch, Robert; Martins, Susana B.; Behiri, Eytan; Weiss, Aviram; Shahar, Yuval (2007).
Named-entity recognition – Extraction of named entity mentions in unstructured text into pre-defined categories
Controlled vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri, taxonomies and other knowledge organization systems.
In large organizations, controlled vocabularies may be introduced to improve
Defining vocabulary – List of words used by lexicographers to write dictionary definitions
mentioned, the control of synonyms and homographs can help increase precision.
to ensure that each preferred term or heading refers to only one concept.
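The qualifier mechanism for homographs can be sketched as a lookup that refuses to return a heading until the ambiguity is resolved. The `HEADINGS` table and the `resolve` function below are hypothetical, and the parenthetical qualifier format is an assumption chosen for illustration, not a rule taken from any particular scheme.

```python
# Hypothetical qualified headings; the qualifier in parentheses separates
# the two senses of the homograph "pool".
HEADINGS = {
    "pool": ["pool (swimming)", "pool (game)"],
    "cockroach": ["cockroach"],
}


def resolve(term):
    """Return the one heading for `term`, or raise if a qualifier is needed."""
    candidates = HEADINGS.get(term.strip().lower(), [])
    if not candidates:
        raise KeyError(f"{term!r} is not in the vocabulary")
    if len(candidates) > 1:
        raise ValueError(f"{term!r} is ambiguous; choose one of {candidates}")
    return candidates[0]


print(resolve("cockroach"))
try:
    resolve("pool")
except ValueError as err:
    print(err)
```

Because each qualified heading names exactly one concept, a document tagged "pool (game)" is never returned for a search on "pool (swimming)".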
Initiative. An example of a controlled vocabulary which is usable for
"Controlled Vocabularies: Past, Present and Future of Subject Access"
(what terms are generally used in the literature and documents), and
serializations (RDF/XML, Turtle, N3, TriG, TriX) in external files.
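As one illustration of machine-readable terms embedded in markup, the sketch below builds a JSON-LD description of a person using the Schema.org Person type and wraps it in a script element. The person, the organization, and the URLs are invented; only the property names (`name`, `honorificPrefix`, `affiliation`, `email`, `url`) come from the Schema.org vocabulary.

```python
import json

# Invented example data described with Schema.org terms.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Doe",
    "honorificPrefix": "Dr.",
    "affiliation": {"@type": "Organization", "name": "Example University"},
    "email": "mailto:jane.doe@example.org",
    "url": "https://example.org/~jdoe",
}

# JSON-LD is commonly embedded in an HTML page inside a script element
# with the media type application/ld+json.
markup = (
    '<script type="application/ld+json">\n'
    + json.dumps(person, indent=2)
    + "\n</script>"
)
print(markup)
```

The same description could be expressed with RDFa or Microdata attributes instead; JSON-LD is used here only because it keeps the vocabulary terms separate from the visible page content.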
Similarly, a book can be described using the Book vocabulary of
Historically, subject headings were designed to describe books in
the documents in such a way that the ambiguities are eliminated.
instead of slightly different ones to refer to the same thing.
(1st ed.). Los Angeles, Calif: Getty Research Institute.
, where effort is expended to use the same word throughout a
Controlled vocabularies also typically handle the problem of
"Controlled Vocabularies | Librarians | Library of Congress"
scheme. One of the first proposals for such a scheme is the
Authority control – Unique headings used for bibliographic information
A controlled vocabulary search may lead to unsatisfactory
Choices of preferred terms are based on the principles of
Links to examples of thesauri and classification schemes.
exhaustivity, the more terms indexed for each document.
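The effect of indexing exhaustivity can be sketched as a cut-off on how central a topic must be before it receives an index term. This is a simplification under stated assumptions: in practice the decision is an indexer's judgement, and the numeric weights, the threshold rule, and the function `index_terms` below are all hypothetical.

```python
def index_terms(topic_weights, exhaustivity):
    """Select index terms for one document.

    topic_weights: maps a candidate term to how central it is to the
                   document (1.0 = main focus, near 0 = passing mention).
    exhaustivity:  number between 0 and 1; higher values admit more
                   minor topics.
    """
    threshold = 1.0 - exhaustivity
    return sorted(t for t, w in topic_weights.items() if w >= threshold)


# Invented article whose main focus is urban planning and which mentions
# football only as a secondary topic.
article = {"urban planning": 0.9, "stadiums": 0.5, "football": 0.2}

low = index_terms(article, exhaustivity=0.3)
high = index_terms(article, exhaustivity=0.9)
print(low)
print(high)
```

At the low setting the article is not tagged with "football", so a controlled vocabulary search on that term would miss it; at the high setting every listed topic is indexed.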
In library and information science, controlled vocabulary is a carefully selected list of words and phrases, which are used to tag
Journal of the American Medical Informatics Association
would automatically pick up that article regardless.
Worldwide, the most popular of these team sports is association football, which also happens to be called soccer in several countries.
items in the retrieval list. These irrelevant items (
text. Well known subject heading systems include the
597: 60:. Unsourced material may be challenged and removed. 654:vocabulary, an event with the Event vocabulary of 356:There are three main types of indexing languages. 1192: 401:) are often caused by the inherent ambiguity of 1132:"Dublin Core Metadata Element Set, Version 1.1" 159:vocabularies, which have no such restriction. 906:"3. Relationships in Controlled Vocabularies" 722: – Specification of a conceptualization 1186:Directory of Linked Open Vocabularies (LOV) 415:is the name given to a number of different 569: 319:United States National Library of Medicine 260: 998:Cataloging & Classification Quarterly 968: 120:Learn how and when to remove this message 903: 851:"Karl Fast, Fred Leise and Mike Steckel" 790: 650:and general publication terms from the 249:has to be qualified to refer to either 245:with qualifiers. For example, the term 14: 1193: 1060:eXchangeable Faceted Metadata Language 793:"2. What Are Controlled Vocabularies?" 230:(what terms users are likely to use), 1211:Library cataloging and classification 1080:"The Person vocabulary of Schema.org" 991: 536:Controlled vocabularies, such as the 351: 1158:"The Event vocabulary of Schema.org" 786: 784: 692: – Subset of a natural language 538:Library of Congress Subject Headings 205:Library of Congress Subject Headings 58:adding citations to reliable sources 29: 1138:from the original on 16 August 2013 1106:"The Book vocabulary of Schema.org" 290:Library of Congress Subject Heading 24: 1164:from the original on 13 March 2015 1112:from the original on 11 March 2015 702:IMS Vocabulary Definition Exchange 423:, which also happens to be called 325:. Well known thesauri include the 163:In library and information science 25: 1252: 1179: 1086:from the original on 28 July 2015 781: 554:U.S. National Library of Medicine 223:), among other difficult issues. 
215:), and choices between synonyms ( 1206:Information retrieval techniques 735:Universal Data Element Framework 598:Semantic web and structured data 540:, are an essential component of 34: 1150: 1124: 1098: 1072: 1052: 1032: 992:Smith, Catherine (2021-04-03). 886:from the original on 2019-11-16 857:from the original on 2017-11-17 744:Vocabulary-based transformation 633:Controlled vocabularies of the 546:library and information science 531: 429:in several countries. The word 169:library and information science 45:needs additional citations for 1231:Ontology (information science) 985: 928: 897: 868: 843: 831:A taxonomy primer // dead link 823: 769: 757: 327:Art and Architecture Thesaurus 153:knowledge organization systems 27:Method of organizing knowledge 13: 1: 1010:10.1080/01639374.2021.1881007 751: 737: – controlled vocabulary 388: 7: 904:Harpring, Patricia (2010). 791:Harpring, Patricia (2010). 720:Ontology (computer science) 690:Controlled natural language 676: 484: 10: 1257: 311:Library of Congress system 451:Australian rules football 1216:Knowledge representation 708:Named-entity recognition 552:(MeSH) developed by the 550:Medical Subject Headings 405:. Take the English word 315:Medical Subject Headings 1221:Technical communication 1201:Controlled vocabularies 880:The Library of Congress 776:Controlled Vocabularies 764:Controlled Vocabularies 576:technical communication 570:Technical communication 261:Types used in libraries 133:Controlled vocabularies 69:"Controlled vocabulary" 18:Controlled vocabularies 628:faceted classification 522:faceted classification 477:to the search topic). 317:(MeSH) created by the 213:Periplaneta americana 853:. 16 December 2002. 
584:knowledge management 421:association football 203:For example, in the 179:, which are used to 54:improve this article 1236:Information science 953:10.1197/jamia.M1953 696:Defining vocabulary 433:is also applied to 1065:2012-02-08 at the 1045:2007-05-08 at the 836:2016-03-05 at the 669:in the markup, or 616:indexing web pages 352:Indexing languages 236:structural warrant 921:978-1-60606-150-3 808:978-1-60606-018-6 684:Authority control 580:technical writing 459:Canadian football 447:American football 130: 129: 122: 104: 16:(Redirected from 1248: 1174: 1173: 1171: 1169: 1154: 1148: 1147: 1145: 1143: 1128: 1122: 1121: 1119: 1117: 1102: 1096: 1095: 1093: 1091: 1076: 1070: 1056: 1050: 1036: 1030: 1029: 1004:(2–3): 186–202. 989: 983: 982: 972: 932: 926: 925: 901: 895: 894: 892: 891: 872: 866: 865: 863: 862: 847: 841: 827: 821: 820: 788: 779: 773: 767: 761: 740: 725: 503:free text search 403:natural language 379:free text search 377:In recent years 364:Natural language 275:library catalogs 232:literary warrant 157:natural language 141:subject headings 137:subject indexing 125: 118: 114: 111: 105: 103: 62: 38: 30: 21: 1256: 1255: 1251: 1250: 1249: 1247: 1246: 1245: 1191: 1190: 1182: 1177: 1167: 1165: 1156: 1155: 1151: 1141: 1139: 1130: 1129: 1125: 1115: 1113: 1104: 1103: 1099: 1089: 1087: 1078: 1077: 1073: 1067:Wayback Machine 1057: 1053: 1047:Wayback Machine 1038:Cory Doctorow, 1037: 1033: 990: 986: 933: 929: 922: 902: 898: 889: 887: 874: 873: 869: 860: 858: 849: 848: 844: 838:Wayback Machine 828: 824: 809: 789: 782: 774: 770: 762: 758: 754: 749: 738: 723: 679: 663:HTML5 Microdata 600: 572: 534: 487: 461:. A search for 455:Gaelic football 399:false positives 391: 354: 263: 165: 126: 115: 109: 106: 63: 61: 51: 39: 28: 23: 22: 15: 12: 11: 5: 1254: 1244: 1243: 1238: 1233: 1228: 1223: 1218: 1213: 1208: 1203: 1189: 1188: 1181: 1180:External links 1178: 1176: 1175: 1149: 1123: 1097: 1071: 1058:Mark Pilgrim, 1051: 1031: 984: 947:(2): 164–174. 


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.