Automatic indexing

Automatic indexing is the computerized process of scanning large volumes of documents against a controlled vocabulary, taxonomy, thesaurus or ontology and using those controlled terms to quickly and effectively index large electronic document repositories. The controlled terms are applied by training a system on rules that determine which words to match. Depending on the system and on what the indexing requires, these rules can also take syntax, usage, proximity and other algorithmic criteria into account, expressed as Boolean statements that capture the indexing information from the text. As the number of documents grows exponentially with the proliferation of the Internet, automatic indexing will become essential to maintaining the ability to find relevant information in a sea of irrelevant information. To help with this sea of irrelevant information, natural language systems train an indexing system using seven kinds of analysis: morphological, lexical, syntactic, numerical, phraseological, semantic and pragmatic. Each looks at different parts of speech and terms to build a domain for the specific subject matter being indexed, and the result is used in the automated indexing process.

The automated process can encounter problems, which are primarily caused by two factors: 1) the complexity of language; and 2) the lack of intuitiveness of computing technology and its difficulty in extrapolating concepts from statements. These are primarily linguistic challenges involving the semantic and syntactic aspects of language. Such problems become visible when a system's output is compared against a set of defined keywords, which makes it possible to measure accuracy in terms of hits, misses and noise. Hits are exact matches between the system's keywords and a human indexer's; misses are keywords a human would have selected but the computerized system missed; and noise is keywords the computer selected that a human would not have. Taking human indexing as the 100% baseline, a good system should score above 85% in hits, which puts misses and noise combined at 15% or less. This scale provides a basis for what counts as a good automatic indexing system and shows where problems are being encountered.

History

Some scholars date the interest in automatic indexing to as early as the 1950s, driven particularly by the demand for faster and more comprehensive access to scientific and engineering literature. Work on automatic indexing began with text processing between 1957 and 1959, through a series of papers published by H. P. Luhn. Luhn proposed that a computer could handle keyword matching, sorting and content analysis. This was the beginning of automatic indexing and of the formula for pulling keywords from text based on frequency analysis. It was later determined that frequency alone was not sufficient for good descriptors; nevertheless, this work began the path to present-day automatic indexing.

The importance of automatic indexing was highlighted by the information explosion, which was predicted in the 1960s and arrived with the emergence of information technology and the World Wide Web. The prediction came from Mooers, who outlined the role computing was expected to play in text processing and information retrieval: machines would be used to store large collections of documents, and people would use these machines to run searches. Mooers also predicted online access and the retrieval environment for indexing databases, which led him to predict an Induction Inference Machine that would revolutionize indexing. The information explosion required indexing systems that could cope with storing and organizing vast amounts of data and could facilitate access to information. New electronic hardware further advanced automatic indexing because it overcame the barrier imposed by old paper archives, allowing information to be encoded at the molecular level. Along with this hardware, tools were developed to help users manage files. These fall into categories such as PDM suites like Outlook or Lotus Notes and mind-mapping tools such as MindManager and FreeMind, and they let users focus on storage and on building a cognitive model.

Automatic indexing was also driven in part by the emergence of the field of computational linguistics, which steered research that eventually produced techniques such as the computer analysis of the structure and meaning of languages. It was further spurred by research and development in artificial intelligence and self-organizing systems, also referred to as thinking machines.

Medicine

Automatic indexing has many practical applications, for instance in medicine. A 2009 study by Sakji et al., "Automatic indexing in a drug information portal", describes how automatic indexing can be used to create an information portal where users can find reliable information about a drug. CISMeF is one such health portal, designed to give information about drugs. The website uses the MeSH thesaurus to index scientific articles from the MEDLINE database, and it uses Dublin Core metadata. The system creates a meta-term "drug" and uses it as a search criterion to find all information about a specific drug. The website offers simple and advanced search. Simple search lets users search by brand name or by any code assigned to the drug, while advanced search allows a more specific query by letting users enter every attribute that describes the drug they are looking for.
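
The rule-based matching described at the start of this article, in which a system checks document text against controlled terms and combines matches through proximity and Boolean conditions, can be sketched in a few lines of Python. The three-entry vocabulary, the `index_document` function and its NEAR/NOT rule are hypothetical illustrations rather than the design of any real indexing product; production systems rely on much larger thesauri and far richer rules.

```python
import re

# Hypothetical controlled vocabulary: each preferred index term lists the entry
# terms (synonyms) that should trigger it, as a thesaurus would.
VOCABULARY = {
    "Information Retrieval": ["information retrieval", "document retrieval", "search engine"],
    "Indexing": ["indexing", "index terms", "subject headings"],
    "Natural Language Processing": ["natural language processing", "computational linguistics"],
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_phrase(tokens, phrase):
    """Return every start position at which the (possibly multi-word) phrase occurs."""
    words = phrase.split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def near(tokens, a, b, window):
    """Proximity test: True if phrases a and b start within `window` tokens of each other."""
    return any(abs(i - j) <= window
               for i in find_phrase(tokens, a)
               for j in find_phrase(tokens, b))


def index_document(text):
    """Return the sorted controlled terms assigned to a document."""
    tokens = tokenize(text)
    # Plain matching: assign a term if any of its entry terms occurs in the text.
    terms = {term for term, entries in VOCABULARY.items()
             if any(find_phrase(tokens, entry) for entry in entries)}
    # Hypothetical Boolean rule combining proximity with negation:
    # (automatic NEAR/3 indexing) AND NOT manual  ->  "Automatic Indexing"
    if near(tokens, "automatic", "indexing", window=3) and not find_phrase(tokens, "manual"):
        terms.add("Automatic Indexing")
    return sorted(terms)


print(index_document("Automatic indexing supports search engine design."))
# ['Automatic Indexing', 'Indexing', 'Information Retrieval']
```

Mapping several entry terms to one preferred term is what lets a thesaurus normalize synonyms, while the proximity and negation test shows how a Boolean statement can narrow an assignment that plain keyword matching would make too broadly.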
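
The hits, misses and noise scale can be made concrete by comparing the terms a system assigns to a document with the terms a human indexer assigned to the same document. In this Python sketch the human terms are the 100% baseline; measuring noise as a percentage of the human term count and treating 85% as an inclusive threshold are assumptions, because the text does not pin either down. `score` and `IndexingScore` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class IndexingScore:
    hits: float    # % of the human indexer's terms that the system also assigned
    misses: float  # % of the human indexer's terms that the system failed to assign
    noise: float   # system-only terms, as a % of the human term count (assumption)

    def is_acceptable(self, threshold=85.0):
        """Rule of thumb from the text: hits of at least 85%, misses + noise of 15% or less."""
        return self.hits >= threshold and self.misses + self.noise <= 100.0 - threshold


def score(human_terms, machine_terms):
    """Compare machine-assigned terms against human indexing, the 100% baseline."""
    human, machine = set(human_terms), set(machine_terms)
    if not human:
        raise ValueError("human indexing must contain at least one term")
    base = len(human)
    return IndexingScore(
        hits=100.0 * len(human & machine) / base,
        misses=100.0 * len(human - machine) / base,
        noise=100.0 * len(machine - human) / base,
    )


human = [f"term{i}" for i in range(20)]
machine = human[:18] + ["unrelated"]  # 18 hits, 2 misses, 1 noise term
result = score(human, machine)
print(result)                  # IndexingScore(hits=90.0, misses=10.0, noise=5.0)
print(result.is_acceptable())  # True
```

Under this reading hits and misses always sum to 100%, so the 15% budget for misses plus noise is what actually constrains the system: at 90% hits, it can add noise amounting to at most 5% of the human term count before it fails the test.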
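
Luhn's frequency-based approach to pulling keywords from text can be illustrated with a minimal Python sketch. This is a simplified frequency ranking in the spirit of his work, not his exact formula; the stop list, the `min_count` cut-off for rare words and the `frequency_keywords` name are assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative stop list standing in for the cut-off on overly common words.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "is", "are",
              "for", "on", "that", "by", "with", "as", "it", "be"}


def frequency_keywords(text, min_count=2, top_n=5):
    """Rank candidate keywords by raw frequency after removing stop words.

    Words occurring fewer than `min_count` times are treated as too rare to be
    significant. Ties are broken alphabetically so the output is deterministic.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    ranked = sorted(((w, c) for w, c in counts.items() if c >= min_count),
                    key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in ranked[:top_n]]


sample = ("Indexing systems index documents. Automatic indexing assigns terms to "
          "documents. Terms come from a vocabulary; indexing uses the terms.")
print(frequency_keywords(sample))  # ['indexing', 'terms', 'documents']
```

Because "index" and "indexing" are counted as separate words, the sketch also shows why frequency alone yields weak descriptors, one reason later systems added morphological, semantic and other analysis.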

See also

Subject indexing: the process which is automated by automatic indexing
Tag (metadata)
Web indexing

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.