In the context of information retrieval, a thesaurus (plural: "thesauri") is a form of controlled vocabulary that seeks to dictate semantic manifestations of metadata in the indexing of content objects. A thesaurus serves to minimise semantic ambiguity by ensuring uniformity and consistency in the storage and retrieval of the manifestations of content objects. ANSI/NISO Z39.19-2005 defines a content object as "any item that is to be described for inclusion in an information retrieval system, website, or other source of information". The thesaurus aids the assignment of preferred terms to convey semantic metadata associated with the content object.

A thesaurus serves to guide both an indexer and a searcher in selecting the same preferred term or combination of preferred terms to represent a given subject. ISO 25964, the international standard for information retrieval thesauri, defines a thesaurus as a “controlled and structured vocabulary in which concepts are represented by terms, organized so that relationships between concepts are made explicit, and preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms.”

A thesaurus is composed of at least three elements: (1) a list of words (or terms); (2) the relationships among the words (or terms), indicated by their hierarchical relative position (e.g. parent/broader term, child/narrower term, synonym, etc.); and (3) a set of rules on how to use the thesaurus.

History

Wherever there have been large collections of information, whether on paper or in computers, scholars have faced a challenge in pinpointing the items they seek. The use of classification schemes to arrange the documents in order was only a partial solution. Another approach was to index the contents of the documents using words or terms, rather than classification codes. In the 1940s and 1950s some pioneers, such as Calvin Mooers, Charles L. Bernier, Evan J. Crane and Hans Peter Luhn, collected up their index terms in various kinds of list that they called a “thesaurus” (by analogy with the well known thesaurus developed by Peter Roget). The first such list put seriously to use in information retrieval was the thesaurus developed in 1959 at the E I Dupont de Nemours Company.

The first two of these lists to be published were the Thesaurus of ASTIA Descriptors (1960) and the Chemical Engineering Thesaurus of the American Institute of Chemical Engineers (1961), a descendant of the Dupont thesaurus. More followed, culminating in the influential Thesaurus of Engineering and Scientific Terms (TEST) published jointly by the Engineers Joint Council and the US Department of Defense in 1967. TEST did more than just serve as an example; its Appendix 1 presented Thesaurus rules and conventions that have guided thesaurus construction ever since. Hundreds of thesauri have been produced since then, perhaps thousands. The most notable innovations since TEST have been: (a) extension from monolingual to multilingual capability; and (b) addition of a conceptually organized display to the basic alphabetical presentation.

Here we mention only some of the national and international standards that have built steadily on the basic rules set out in TEST:

UNESCO Guidelines for the establishment and development of monolingual thesauri. 1970 (followed by later editions in 1971 and 1981)
DIN 1463 Guidelines for the establishment and development of monolingual thesauri. 1972 (followed by later editions)
ISO 2788 Guidelines for the establishment and development of monolingual thesauri. 1974 (revised 1986)
ANSI American National Standard for Thesaurus Structure, Construction, and Use. 1974 (revised 1980 and superseded by ANSI/NISO Z39.19-1993)
ISO 5964 Guidelines for the establishment and development of multilingual thesauri. 1985
ANSI/NISO Z39.19 Guidelines for the construction, format, and management of monolingual thesauri. 1993 (revised 2005 and renamed Guidelines for the construction, format, and management of monolingual controlled vocabularies.)
ISO 25964 Thesauri and interoperability with other vocabularies. Part 1 (Thesauri for information retrieval) published 2011; Part 2 (Interoperability with other vocabularies) published 2013.

The most clearly visible trend across this history of thesaurus development has been from the context of small-scale isolation to a networked world. Access to information was notably enhanced when thesauri crossed the divide between monolingual and multilingual applications. More recently, as can be seen from the titles of the latest ISO and NISO standards, there is a recognition that thesauri need to work in harness with other forms of vocabulary or knowledge organization system, such as subject heading schemes, classification schemes, taxonomies and ontologies. The official website for ISO 25964 gives more information, including a reading list.

Purpose

In information retrieval, a thesaurus can be used as a form of controlled vocabulary to aid in the indexing of appropriate metadata for information bearing entities. A thesaurus helps with expressing the manifestations of a concept in a prescribed way, to aid in improving precision and recall. This means that the semantic conceptual expressions of information bearing entities are easier to locate due to uniformity of language. Additionally, a thesaurus is used for maintaining a hierarchical listing of terms, usually single words or bound phrases, that aid the indexer in narrowing the terms and limiting semantic ambiguity.

The Art & Architecture Thesaurus, for example, is used by countless museums around the world to catalogue their collections. AGROVOC, the thesaurus of the UN's Food and Agriculture Organization, is used to index and/or search its AGRIS database of worldwide literature on agricultural research.

Structure

Information retrieval thesauri are formally organized so that existing relationships between concepts are made clear. For example, "citrus fruits" might be linked to the broader concept of "fruits" and to the narrower ones of "oranges", "lemons", etc. When the terms are displayed online, the links between them make it very easy to browse the thesaurus, selecting useful terms for a search. When a single term could have more than one meaning, like tables (furniture) or tables (data), these are listed separately so that the user can choose which concept to search for and avoid retrieving irrelevant results. For any one concept, all known synonyms are listed, such as "mad cow disease", "bovine spongiform encephalopathy", "BSE", etc. The idea is to guide all the indexers and all the searchers to use the same term for the same concept, so that search results will be as complete as possible. If the thesaurus is multilingual, equivalent terms in other languages are shown too. Following international standards, concepts are generally arranged hierarchically within facets or grouped by themes or topics. Unlike a general thesaurus that is used for literary purposes, information retrieval thesauri typically focus on one discipline, subject or field of study.

See also

Controlled vocabulary
ISO 25964
Thesaurus
BARTOC

References

ANSI & NISO 2005, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, NISO, Maryland, U.S.A, p.11
ANSI & NISO 2005, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, NISO, Maryland, U.S.A, p.12
Roberts, N. The pre-history of the information retrieval thesaurus. Journal of Documentation, 40(4), 1984, p.271-285.
Aitchison, J. and Dextre Clarke, S. The thesaurus: a historical viewpoint, with a look to the future. Cataloging & Classification Quarterly, 37(3/4), 2004, p.5-21.
Krooks, D.A. and Lancaster, F.W. The evolution of guidelines for thesaurus construction. Libri, 43(4), 1993, p.326-342.
Dextre Clarke, Stella G. and Zeng, Marcia Lei. From ISO 2788 to ISO 25964: the evolution of thesaurus standards towards interoperability and data modeling. Information standards quarterly, 24(1), 2012, p.20-26.
ISO 25964: the international standard for thesauri and interoperability with other vocabularies. National Information Standards Organization, 2013.

External links

Thesauri: Introduction and Recent Development, books.infotoday.com
Official site for ISO 25964: the international standard for thesauri and interoperability with other vocabularies
TemaTres, vocabularyserver.com: Web application for managing formal representations of knowledge, thesauri, taxonomies and multilingual vocabularies
Taxonomy Warehouse, taxonomywarehouse.com
BARTOC, Basic Register of Thesauri, Ontologies & Classifications, bartoc.org
Wikiversity: Thesaurus (information retrieval)
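As a rough illustration of the structure this article describes (preferred terms, lead-in entries for synonyms, and broader/narrower links between concepts), the following is a minimal Python sketch (Python 3.9 or later). It is not taken from ISO 25964 or any other standard: the class names `Concept` and `Thesaurus` and the methods `add_concept`, `link`, `resolve` and `expand` are hypothetical names chosen for this example, and the choice of "bovine spongiform encephalopathy" as the preferred term among its synonyms is an assumption made only for demonstration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept, labelled by a single preferred term (hypothetical model)."""
    preferred: str
    lead_ins: set[str] = field(default_factory=set)   # synonyms or quasi-synonyms
    broader: set[str] = field(default_factory=set)    # preferred terms of broader concepts
    narrower: set[str] = field(default_factory=set)   # preferred terms of narrower concepts


class Thesaurus:
    def __init__(self):
        self.concepts = {}   # preferred term -> Concept
        self._lookup = {}    # any known term, lower-cased -> preferred term

    def add_concept(self, preferred, lead_ins=()):
        concept = Concept(preferred, set(lead_ins))
        self.concepts[preferred] = concept
        for term in (preferred, *lead_ins):
            self._lookup[term.lower()] = preferred
        return concept

    def link(self, broader, narrower):
        """Record a broader/narrower relationship in both directions."""
        self.concepts[broader].narrower.add(narrower)
        self.concepts[narrower].broader.add(broader)

    def resolve(self, term):
        """Map an indexer's or searcher's term to its preferred term, or None."""
        return self._lookup.get(term.lower())

    def expand(self, term):
        """Return the preferred term followed by all narrower terms, transitively."""
        start = self.resolve(term)
        if start is None:
            return []
        seen, stack = [], [start]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                # Reverse-sorted push so that terms pop in alphabetical order.
                stack.extend(sorted(self.concepts[current].narrower, reverse=True))
        return seen


# Example data drawn from the terms used in this article.
thesaurus = Thesaurus()
for name in ("fruits", "citrus fruits", "oranges", "lemons"):
    thesaurus.add_concept(name)
# Assumption: the full disease name is treated as the preferred term.
thesaurus.add_concept("bovine spongiform encephalopathy",
                      lead_ins=["mad cow disease", "BSE"])
thesaurus.link("fruits", "citrus fruits")
thesaurus.link("citrus fruits", "oranges")
thesaurus.link("citrus fruits", "lemons")

print(thesaurus.resolve("mad cow disease"))
print(thesaurus.expand("fruits"))
```

In this sketch, `resolve` plays the role of the lead-in entries: an indexer and a searcher who start from different synonyms arrive at the same preferred term. `expand` shows one way a search interface might use the hierarchy, by widening a query on a broad term to include its narrower terms. Homographs such as tables (furniture) and tables (data) would be stored as separate concepts whose preferred terms carry the qualifier.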
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.