Thesaurus (information retrieval)

In the context of information retrieval, a thesaurus (plural: "thesauri") is a form of controlled vocabulary that seeks to dictate semantic manifestations of metadata in the indexing of content objects. A thesaurus serves to minimise semantic ambiguity by ensuring uniformity and consistency in the storage and retrieval of the manifestations of content objects. ANSI/NISO Z39.19-2005 defines a content object as "any item that is to be described for inclusion in an information retrieval system, website, or other source of information". The thesaurus aids the assignment of preferred terms to convey semantic metadata associated with the content object.

A thesaurus serves to guide both an indexer and a searcher in selecting the same preferred term or combination of preferred terms to represent a given subject. ISO 25964, the international standard for information retrieval thesauri, defines a thesaurus as a “controlled and structured vocabulary in which concepts are represented by terms, organized so that relationships between concepts are made explicit, and preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms.”

A thesaurus is composed of at least three elements: (1) a list of words (or terms); (2) the relationships among those words (or terms), indicated by their relative position in the hierarchy (e.g. parent/broader term, child/narrower term, synonym); and (3) a set of rules on how to use the thesaurus.

History

Wherever there have been large collections of information, whether on paper or in computers, scholars have faced a challenge in pinpointing the items they seek. The use of classification schemes to arrange the documents in order was only a partial solution. Another approach was to index the contents of the documents using words or terms, rather than classification codes. In the 1940s and 1950s some pioneers, such as Calvin Mooers, Charles L. Bernier, Evan J. Crane and Hans Peter Luhn, collected up their index terms in various kinds of list that they called a “thesaurus” (by analogy with the well known thesaurus developed by Peter Roget). The first such list put seriously to use in information retrieval was the thesaurus developed in 1959 at the E I Dupont de Nemours Company.

The first two of these lists to be published were the Thesaurus of ASTIA Descriptors (1960) and the Chemical Engineering Thesaurus of the American Institute of Chemical Engineers (1961), a descendant of the Dupont thesaurus. More followed, culminating in the influential Thesaurus of Engineering and Scientific Terms (TEST), published jointly by the Engineers Joint Council and the US Department of Defense in 1967. TEST did more than just serve as an example; its Appendix 1 presented Thesaurus rules and conventions that have guided thesaurus construction ever since. Hundreds of thesauri have been produced since then, perhaps thousands. The most notable innovations since TEST have been: (a) extension from monolingual to multilingual capability; and (b) addition of a conceptually organized display to the basic alphabetical presentation.

Here we mention only some of the national and international standards that have built steadily on the basic rules set out in TEST:

UNESCO: Guidelines for the establishment and development of monolingual thesauri. 1970 (followed by later editions in 1971 and 1981)
DIN 1463: Guidelines for the establishment and development of monolingual thesauri. 1972 (followed by later editions)
ISO 2788: Guidelines for the establishment and development of monolingual thesauri. 1974 (revised 1986)
ANSI: American National Standard for Thesaurus Structure, Construction, and Use. 1974 (revised 1980 and superseded by ANSI/NISO Z39.19-1993)
ISO 5964: Guidelines for the establishment and development of multilingual thesauri. 1985
ANSI/NISO Z39.19: Guidelines for the construction, format, and management of monolingual thesauri. 1993 (revised 2005 and renamed Guidelines for the construction, format, and management of monolingual controlled vocabularies)
ISO 25964: Thesauri and interoperability with other vocabularies. Part 1 (Thesauri for information retrieval) published 2011; Part 2 (Interoperability with other vocabularies) published 2013.

The most clearly visible trend across this history of thesaurus development has been from the context of small-scale isolation to a networked world. Access to information was notably enhanced when thesauri crossed the divide between monolingual and multilingual applications. More recently, as can be seen from the titles of the latest ISO and NISO standards, there is a recognition that thesauri need to work in harness with other forms of vocabulary or knowledge organization system, such as subject heading schemes, classification schemes, taxonomies and ontologies. The official website for ISO 25964 gives more information, including a reading list.

Purpose

In information retrieval, a thesaurus can be used as a form of controlled vocabulary to aid in the indexing of appropriate metadata for information bearing entities. A thesaurus helps with expressing the manifestations of a concept in a prescribed way, to aid in improving precision and recall. This means that the semantic conceptual expressions of information bearing entities are easier to locate due to uniformity of language. Additionally, a thesaurus is used for maintaining a hierarchical listing of terms, usually single words or bound phrases, that aids the indexer in narrowing the terms and limiting semantic ambiguity.

The Art & Architecture Thesaurus, for example, is used by many museums around the world to catalogue their collections. AGROVOC, the thesaurus of the UN's Food and Agriculture Organization, is used to index and/or search its AGRIS database of worldwide literature on agricultural research.

Structure

Information retrieval thesauri are formally organized so that existing relationships between concepts are made clear. For example, "citrus fruits" might be linked to the broader concept of "fruits" and to the narrower ones of "oranges", "lemons", etc. When the terms are displayed online, the links between them make it very easy to browse the thesaurus, selecting useful terms for a search. When a single term could have more than one meaning, like tables (furniture) or tables (data), these are listed separately so that the user can choose which concept to search for and avoid retrieving irrelevant results. For any one concept, all known synonyms are listed, such as "mad cow disease", "bovine spongiform encephalopathy", "BSE", etc. The idea is to guide all the indexers and all the searchers to use the same term for the same concept, so that search results will be as complete as possible. If the thesaurus is multilingual, equivalent terms in other languages are shown too. Following international standards, concepts are generally arranged hierarchically within facets or grouped by themes or topics. Unlike a general thesaurus that is used for literary purposes, information retrieval thesauri typically focus on one discipline, subject or field of study.

See also

Controlled vocabulary
ISO 25964
Thesaurus
BARTOC

References

1. ANSI & NISO 2005, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, NISO, Maryland, U.S.A, p.11
2. ANSI & NISO 2005, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, NISO, Maryland, U.S.A, p.12
3. Roberts, N. The pre-history of the information retrieval thesaurus. Journal of Documentation, 40(4), 1984, p.271-285.
4. Aitchison, J. and Dextre Clarke, S. The thesaurus: a historical viewpoint, with a look to the future. Cataloging & Classification Quarterly, 37 (3/4), 2004, p.5-21.
5. Krooks, D.A. and Lancaster, F.W. The evolution of guidelines for thesaurus construction. Libri, 43(4), 1993, p.326-342.
6. Dextre Clarke, Stella G. and Zeng, Marcia Lei. From ISO 2788 to ISO 25964: the evolution of thesaurus standards towards interoperability and data modeling. Information standards quarterly, 24(1), 2012, p.20-26.
7. ISO 25964 – the international standard for thesauri and interoperability with other vocabularies. National Information Standards Organization, 2013.

External links

Thesauri: Introduction and Recent Development, books.infotoday.com
Official site for ISO 25964 – the international standard for thesauri and interoperability with other vocabularies
TemaTres, vocabularyserver.com: Web application for managing formal representations of knowledge, thesauri, taxonomies and multilingual vocabularies
Taxonomy Warehouse, taxonomywarehouse.com
BARTOC, Basic Register of Thesauri, Ontologies & Classifications, bartoc.org
Wikiversity: Thesaurus (information retrieval)
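
The relationship types this article describes (preferred terms, lead-in synonyms, broader and narrower concepts, and qualifiers that separate homographs) map naturally onto a small data structure. The Python sketch below is only an illustration under stated assumptions: the class and method names (Concept, Thesaurus, preferred_term, expand) are hypothetical and do not come from ISO 25964 or any thesaurus software, and the choice of "bovine spongiform encephalopathy" as the preferred term is an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept: a preferred term plus its relationships (hypothetical model)."""
    preferred: str                                      # term indexers and searchers should use
    synonyms: set[str] = field(default_factory=set)     # lead-in (non-preferred) entry terms
    broader: set[str] = field(default_factory=set)      # preferred terms of broader concepts
    narrower: set[str] = field(default_factory=set)     # preferred terms of narrower concepts


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}
        self._entry: dict[str, str] = {}  # lower-cased entry term -> preferred term

    def add(self, preferred: str, synonyms=(), broader=()) -> None:
        """Register a concept, its synonyms, and its links to broader concepts."""
        concept = self.concepts.setdefault(preferred, Concept(preferred))
        concept.synonyms.update(synonyms)
        for term in (preferred, *synonyms):
            self._entry[term.lower()] = preferred
        for parent in broader:
            parent_concept = self.concepts.setdefault(parent, Concept(parent))
            self._entry.setdefault(parent.lower(), parent)
            concept.broader.add(parent)
            parent_concept.narrower.add(preferred)  # keep both directions in step

    def preferred_term(self, term: str) -> str | None:
        """Map any known entry term to its preferred term, or None if unknown."""
        return self._entry.get(term.lower())

    def expand(self, term: str) -> set[str]:
        """Return the preferred term plus every narrower preferred term beneath it."""
        start = self.preferred_term(term)
        if start is None:
            return set()
        seen: set[str] = set()
        stack = [start]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.concepts[current].narrower)
        return seen


# Example data taken from the article's own illustrations.
thesaurus = Thesaurus()
thesaurus.add("fruits")
thesaurus.add("citrus fruits", broader=["fruits"])
thesaurus.add("oranges", broader=["citrus fruits"])
thesaurus.add("lemons", broader=["citrus fruits"])
# Assumption: the scientific name is chosen as the preferred term.
thesaurus.add("bovine spongiform encephalopathy", synonyms=["mad cow disease", "BSE"])
# Homographs are kept apart by a parenthetical qualifier.
thesaurus.add("tables (furniture)")
thesaurus.add("tables (data)")

print(thesaurus.preferred_term("BSE"))
print(sorted(thesaurus.expand("citrus fruits")))
```

In this sketch an indexer and a searcher who both call preferred_term arrive at the same term whichever synonym they start from, and expand lets a search on a broad concept also cover its narrower concepts. A real thesaurus would also need related-term links, scope notes and multilingual equivalents, which are omitted here.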


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
