Named-entity recognition

"Named entities" redirects here. For HTML, XML, and SGML named entities, see List of XML and HTML character entity references.

Named-entity recognition (NER) (also known as (named) entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
Most research on NER/NEE systems has been structured as taking an unannotated block of text, such as this one:

    Jim bought 300 shares of Acme Corp. in 2006.

And producing an annotated block of text that highlights the names of entities:

    [Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.

In this example, a person name consisting of one token, a two-token company name and a temporal expression have been detected and classified.

State-of-the-art NER systems for English produce near-human performance. For example, the best system entering MUC-7 scored 93.39% of F-measure while human annotators scored 97.60% and 96.95%.
Named-entity recognition platforms

Notable NER platforms include:

- GATE supports NER across many languages and domains out of the box, usable via a graphical interface and a Java API.
- OpenNLP includes rule-based and statistical named-entity recognition.
- SpaCy features fast statistical NER as well as an open-source named-entity visualizer (see the usage sketch after this list).
- Transformers features token classification using deep learning models.
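These platforms expose NER through short programmatic calls. As a minimal sketch (assuming the small English model "en_core_web_sm" has been installed with "python -m spacy download en_core_web_sm"), spaCy tags the article's running example like this:

    # Minimal spaCy NER sketch: load a pretrained pipeline and read off entities.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Jim bought 300 shares of Acme Corp. in 2006.")

    for ent in doc.ents:
        # ent.label_ is the predicted entity type, e.g. PERSON, ORG, or DATE
        print(ent.text, ent.label_)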
Problem definition

In the expression named entity, the word named restricts the task to those entities for which one or many strings, such as words or phrases, stand (fairly) consistently for some referent. This is closely related to rigid designators, as defined by Kripke, although in practice NER deals with many names and referents that are not philosophically "rigid". For instance, the automotive company created by Henry Ford in 1903 can be referred to as Ford or Ford Motor Company, although "Ford" can refer to many other entities as well (see Ford). Rigid designators include proper names as well as terms for certain biological species and substances, but exclude pronouns (such as "it"; see coreference resolution), descriptions that pick out a referent by its properties (see also De dicto and de re), and names for kinds of things as opposed to individuals (for example "Bank").

Full named-entity recognition is often broken down, conceptually and possibly also in implementations, as two distinct problems: detection of names, and classification of the names by the type of entity they refer to (e.g. person, organization, or location). The first phase is typically simplified to a segmentation problem: names are defined to be contiguous spans of tokens, with no nesting, so that "Bank of America" is a single name, disregarding the fact that inside this name, the substring "America" is itself a name. This segmentation problem is formally similar to chunking, and the tagging scheme sketched below is one common way to realize it. The second phase requires choosing an ontology by which to organize categories of things.
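A common way to implement this reduction is the BIO (begin/inside/outside) encoding, in which detection becomes per-token segmentation and classification becomes a type suffix on each tag. The sketch below is illustrative only; tag sets and decoding conventions vary across corpora:

    # Illustrative BIO encoding: B- opens a span, I- continues it, O is outside.
    tokens = ["Jim", "bought", "300", "shares", "of", "Acme", "Corp.", "in", "2006", "."]
    tags   = ["B-PER", "O", "O", "O", "O", "B-ORG", "I-ORG", "O", "B-DATE", "O"]

    def bio_to_spans(tags):
        """Decode BIO tags into (start, end, type) spans; end is exclusive."""
        spans, start, etype = [], None, None
        for i, tag in enumerate(tags + ["O"]):   # the "O" sentinel flushes the last span
            if tag.startswith("B-") or tag == "O":
                if start is not None:
                    spans.append((start, i, etype))
                    start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
        return spans

    print(bio_to_spans(tags))  # [(0, 1, 'PER'), (5, 7, 'ORG'), (8, 9, 'DATE')]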
Temporal expressions and some numerical expressions (e.g., money, percentages, etc.) may also be considered as named entities in the context of the NER task. While some instances of these types are good examples of rigid designators (e.g., the year 2001), there are also many invalid ones (e.g., "I take my vacations in June"). In the first case, the year 2001 refers to the 2001st year of the Gregorian calendar. In the second case, the month June may refer to the month of an undefined year (past June, next June, every June, etc.). It is arguable that the definition of named entity is loosened in such cases for practical reasons. The definition of the term named entity is therefore not strict and often has to be explained in the context in which it is used.

Certain hierarchies of named entity types have been proposed in the literature. The BBN hierarchy of categories, proposed in 2002 for question answering, consists of 29 types and 64 subtypes. Sekine's extended hierarchy, also proposed in 2002, is made of 200 subtypes. More recently, in 2011, Ritter used a hierarchy based on common Freebase entity types in ground-breaking experiments on NER over social media text.
Formal evaluation

To evaluate the quality of an NER system's output, several measures have been defined. The usual measures are called precision, recall, and F1 score. However, several issues remain in just how to calculate those values.

These statistical measures work reasonably well for the obvious cases of finding or missing a real entity exactly, and for finding a non-entity. However, NER can fail in many other ways, many of which are arguably "partially correct" and should not be counted as complete successes or failures. For example, identifying a real entity, but:

- with fewer tokens than desired (for example, missing the last token of "John Smith, M.D.")
- with more tokens than desired (for example, including the first word of "The University of MD")
- partitioning adjacent entities differently (for example, treating "Smith, Jones Robinson" as 2 vs. 3 entities)
- assigning it a completely wrong type (for example, calling a personal name an organization)
- assigning it a related but inexact type (for example, "substance" vs. "drug", or "school" vs. "organization")
- correctly identifying an entity, when what the user wanted was a smaller- or larger-scope entity (for example, identifying "James Madison" as a personal name when it is part of "James Madison University")

Some NER systems impose the restriction that entities may never overlap or nest, which means that in some cases one must make arbitrary or task-specific choices.

One overly simple method of measuring accuracy is merely to count what fraction of all tokens in the text were correctly or incorrectly identified as part of entity references (or as being entities of the correct type). This suffers from at least two problems: first, the vast majority of tokens in real-world text are not part of entity names, so the baseline accuracy (always predict "not an entity") is extravagantly high, typically >90%; and second, mispredicting the full span of an entity name is not properly penalized (finding only a person's first name when his last name follows might be scored as ½ accuracy).

In academic conferences such as CoNLL, a variant of the F1 score has been defined as follows:

- Precision is the number of predicted entity name spans that line up exactly with spans in the gold standard evaluation data, i.e. when [Person Hans] [Person Blick] is predicted but [Person Hans Blick] was required, precision for the predicted name is zero. Precision is then averaged over all predicted entity names.
- Recall is similarly the number of names in the gold standard that appear at exactly the same location in the predictions.
- The F1 score is the harmonic mean of these two.

It follows from the above definition that any prediction that misses a single token, includes a spurious token, or has the wrong class, is a hard error and does not contribute positively to either precision or recall. Thus, this measure may be said to be pessimistic: it can be the case that many "errors" are close to correct, and might be adequate for a given purpose. For example, one system might always omit titles such as "Ms." or "Ph.D.", but be compared to a system or ground-truth data that expects titles to be included. In that case, every such name is treated as an error. Because of such issues, it is important actually to examine the kinds of errors, and decide how important they are given one's goals and requirements.
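To make the strictness of this exact-match metric concrete, here is a minimal sketch that scores the [Person Hans] [Person Blick] case from the definition above; entities are represented as hypothetical (start, end, type) tuples over token offsets:

    # Exact-span precision/recall/F1: a prediction counts only if it matches a
    # gold span exactly in both boundaries and type.
    def span_f1(predicted, gold):
        predicted, gold = set(predicted), set(gold)
        true_pos = len(predicted & gold)
        precision = true_pos / len(predicted) if predicted else 0.0
        recall = true_pos / len(gold) if gold else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    # "[Person Hans] [Person Blick]" predicted where "[Person Hans Blick]" was
    # required: neither predicted span matches the gold span, so all scores are 0.
    print(span_f1([(0, 1, "PER"), (1, 2, "PER")], [(0, 2, "PER")]))  # (0.0, 0.0, 0.0)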
Evaluation models based on token-by-token matching have been proposed. Such models may be given partial credit for overlapping matches (such as using the Intersection over Union criterion). They allow a finer-grained evaluation and comparison of extraction systems.
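A minimal sketch of such partial credit, assuming spans are given as (start, end) token offsets with exclusive ends, is:

    # Intersection over Union (IoU) for two spans: 1.0 for an exact match,
    # a fraction for partial overlap, 0.0 for disjoint spans.
    def span_iou(pred, gold):
        intersection = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
        union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - intersection
        return intersection / union if union else 0.0

    # Predicting only "John" for the gold span "John Smith" scores 0.5
    # instead of the hard 0 that an exact-match metric would assign.
    print(span_iou((0, 1), (0, 2)))  # 0.5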
Approaches

NER systems have been created that use linguistic grammar-based techniques as well as statistical models such as machine learning. Hand-crafted grammar-based systems typically obtain better precision, but at the cost of lower recall and months of work by experienced computational linguists. Statistical NER systems typically require a large amount of manually annotated training data. Semisupervised approaches have been suggested to avoid part of the annotation effort.

Many different classifier types have been used to perform machine-learned NER, with conditional random fields being a typical choice.
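As an illustrative sketch of the CRF approach (using the third-party sklearn-crfsuite package as one possible toolkit; the feature set shown is deliberately simplified, and a real system needs far more training data):

    # Sketch of CRF-based NER: each token becomes a feature dict, and the CRF
    # jointly learns emission weights and BIO tag transitions.
    import sklearn_crfsuite

    def token_features(sent, i):
        word = sent[i]
        return {
            "word.lower": word.lower(),
            "word.istitle": word.istitle(),
            "word.isdigit": word.isdigit(),
            "prev.lower": sent[i - 1].lower() if i > 0 else "<s>",
            "next.lower": sent[i + 1].lower() if i < len(sent) - 1 else "</s>",
        }

    # Toy training data: a single sentence with BIO labels.
    sents = [["Jim", "visited", "Acme", "Corp."]]
    labels = [["B-PER", "O", "B-ORG", "I-ORG"]]

    X = [[token_features(s, i) for i in range(len(s))] for s in sents]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, labels)
    print(crf.predict(X))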
Problem domains

In 2001, research indicated that even state-of-the-art NER systems were brittle, meaning that NER systems developed for one domain did not typically perform well on other domains. Considerable effort is involved in tuning NER systems to perform well in a new domain; this is true for both rule-based and trainable statistical systems.

Early work in NER systems in the 1990s was aimed primarily at extraction from journalistic articles. Attention then turned to processing of military dispatches and reports. Later stages of the automatic content extraction (ACE) evaluation also included several types of informal text styles, such as weblogs and text transcripts from conversational telephone speech. Since about 1998, there has been a great deal of interest in entity identification in the molecular biology, bioinformatics, and medical natural language processing communities. The most common entity of interest in that domain has been names of genes and gene products. There has also been considerable interest in the recognition of chemical entities and drugs in the context of the CHEMDNER competition, with 27 teams participating in this task.
Current challenges and research

Despite high F1 numbers reported on the MUC-7 dataset, the problem of named-entity recognition is far from being solved. The main efforts are directed to reducing the annotation labor by employing semi-supervised learning, robust performance across domains, and scaling up to fine-grained entity types. In recent years, many projects have turned to crowdsourcing, which is a promising solution to obtain high-quality aggregate human judgments for supervised and semi-supervised machine learning approaches to NER. Another challenging task is devising models to deal with linguistically complex contexts such as Twitter and search queries.

Researchers have compared NER performance across different statistical models, such as HMM (hidden Markov model), ME (maximum entropy), and CRF (conditional random fields), and across feature sets; graph-based semi-supervised learning models have also been proposed for language-specific NER tasks.

A recently emerging task of identifying "important expressions" in text and cross-linking them to Wikipedia can be seen as an instance of extremely fine-grained named-entity recognition, where the types are the actual Wikipedia pages describing the (potentially ambiguous) concepts. Below is an example output of a Wikification system:

    <ENTITY url="https://en.wikipedia.org/Michael_I._Jordan">Michael Jordan</ENTITY> is a professor at <ENTITY url="https://en.wikipedia.org/University_of_California,_Berkeley">Berkeley</ENTITY>

Another field that has seen progress but remains challenging is the application of NER to Twitter and other microblogs, considered "noisy" due to the non-standard orthography, shortness, and informality of the texts. NER challenges in English tweets have been organized by research communities to compare the performance of various approaches, such as bidirectional LSTMs, Learning-to-Search, or CRFs.
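As a sketch of the deep-learning route mentioned above, the Transformers library exposes a pretrained token-classification pipeline. Note the assumption here: the pipeline's default model is a generic newswire-trained one, so noisy domains such as tweets generally call for a model fine-tuned on in-domain data:

    # Pretrained Transformer NER via the token-classification ("ner") pipeline;
    # aggregation_strategy="simple" merges word pieces into whole entities.
    from transformers import pipeline

    ner = pipeline("ner", aggregation_strategy="simple")
    for entity in ner("Jim bought 300 shares of Acme Corp. in 2006."):
        print(entity["word"], entity["entity_group"], round(entity["score"], 2))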
See also

- Controlled vocabulary
- Coreference resolution
- Entity linking (aka named entity normalization, entity disambiguation)
- Information extraction
- Knowledge extraction
- Onomastics
- Record linkage
- Smart tag (Microsoft)
2146: 2018: 1883: 1646: 1629: 1487: 1404:
Partalas, Ioannis; Lopez, CĂ©dric; Derbas, Nadia; Kalitvianski, Ruslan (December 2016).
1362: 1073: 723: 696: 276: 205: 801: 2151: 1863: 1671: 1582: 1233: 1201: 969: 945: 728: 512:, which is a promising solution to obtain high-quality aggregate human judgments for 441: 390: 68: 1366: 1077: 993: 412:
Many different classifier types have been used to perform machine-learned NER, with
27:
Extraction of named entity mentions in unstructured text into pre-defined categories
2028: 1913: 1888: 1689: 1592: 1352: 1314: 1193: 1141:
Proceedings of the Thirteenth Conference on Computational Natural Language Learning
1065: 1005: 718: 708: 456:
and gene products. There has been also considerable interest in the recognition of
394: 271: 177: 1024: 899: 452:
communities. The most common entity of interest in that domain has been names of
2140: 2101: 2096: 1964: 1694: 1567: 1542: 1524: 1010: 217: 208:), and names for kinds of things as opposed to individuals (for example "Bank"). 151:
features fast statistical NER as well as an open-source named-entity visualizer.
86:
And producing an annotated block of text that highlights the names of entities:
1848: 1828: 1552: 839: 781: 713: 628: 607: 535: 445: 386: 1437: 1069: 432:(ACE) evaluation also included several types of informal text styles, such as 2202: 2111: 1923: 1903: 1684: 1326: 509: 374: 362: 30:"Named entities" redirects here. For HTML, XML, and SGML named entities, see 1272: 534:
A recently emerging task of identifying "important expressions" in text and
259:
is loosened in such cases for practical reasons. The definition of the term
131:
supports NER across many languages and domains out of the box, usable via a
71:
into pre-defined categories such as person names, organizations, locations,
2091: 1709: 1357: 1262: 824: 732: 285: 168: 64: 1100:
Proceedings of the Fourth BioCreative Challenge Evaluation Workshop vol. 2
2048: 1928: 1641: 1534: 1482: 697:"Precision information extraction for rare disease epidemiology at scale" 181: 1197: 1185: 938:
Kapetanios, Epaminondas; Tatar, Doina; Sacarea, Christian (2013-11-14).
304:. However, several issues remain in just how to calculate those values. 1651: 1192:. Lecture Notes in Computer Science. Vol. 4182. pp. 581–587. 623: 567:"https://en.wikipedia.org/University_of_California,_Berkeley" 409:
approaches have been suggested to avoid part of the annotation effort.
402: 401:. Statistical NER systems typically require a large amount of manually 1351:. Beijing, China: Association for Computational Linguistics: 126–135. 1519: 1425:"Bidirectional LSTM for Named Entity Recognition in Twitter Messages" 1380: 746:
Kripke, Saul (1971). "Identity and Necessity". In M.K. Munitz (ed.).
267: 204:), descriptions that pick out a referent by its properties (see also 1424: 1405: 1344: 328:
that in some cases one must make arbitrary or task-specific choices.
1994: 1974: 1959: 1938: 1908: 1853: 1818: 1699: 1429:
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
1410:
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
337: 301: 221: 113: 75:, time expressions, quantities, monetary values, percentages, etc. 1220: 2131: 1989: 1969: 1843: 1587: 1502: 1235:
A Two-Step Named Entity Recognizer for Open-Domain Search Queries
1137:
Design challenges and misconceptions in named entity recognition.
580: 142: 196:, although "Ford" can refer to many other entities as well (see 1497: 1492: 1403: 1025:
Jenny Rose Finkel; Trond Grenager; Christopher Manning (2005).
994:"Learning multilingual named entity recognition from Knowledge" 433: 1431:. Osaka, Japan: The COLING 2016 Organizing Committee: 145–152. 1412:. Osaka, Japan: The COLING 2016 Organizing Committee: 171–177. 1406:"Learning to Search for Recognizing Named Entities in Twitter" 2187: 1823: 1097: 885: 148: 145:
includes rule-based and statistical named-entity recognition.
1315:
Local and Global Algorithms for Disambiguation to Knowledge.
270:
of named entity types have been proposed in the literature.
1381:"COLING 2016 Workshop on Noisy User-generated Text (W-NUT)" 927:. Cross-Language Evaluation Forum (CLEF). pp. 100–111. 453: 678:
Transformers: State-of-the-art natural language processing
347:
is the number of predicted entity name spans that line up
1984: 1143:(pp. 147–155). Association for Computational Linguistics. 907:. Proc. Empirical Methods in Natural Language Processing. 901:
Named Entity Recognition in Tweets: An Experimental Study
897: 157:
features token classification using deep learning models.
132: 1349:
Proceedings of the Workshop on Noisy User-generated Text
1342: 822: 694: 336:
In academic conferences such as CoNLL, a variant of the
284:
entity types in ground-breaking experiments on NER over
800:
Carreras, Xavier; MĂ rquez, LluĂ­s; PadrĂł, LluĂ­s (2003).
783:
A survey of named entity recognition and classification
750:. New York: New York University Press. pp. 135–64. 676:
Lhoest, Quentin; Wolf, Thomas; Rush, Alexander (2020).
610:(aka named entity normalization, entity disambiguation) 937: 550:"https://en.wikipedia.org/Michael_I._Jordan" 674: 119: 1051:"Proper Name Extraction from Non-Journalistic Texts" 898:
Ritter, A.; Clark, S.; Mausam; Etzioni., O. (2011).
799: 1423:Limsopatham, Nut; Collier, Nigel (December 2016). 1422: 823:Tjong Kim Sang, Erik F.; De Meulder, Fien (2003). 463: 385:NER systems have been created that use linguistic 881: 879: 116:while human annotators scored 97.60% and 96.95%. 2200: 1670: 186:automotive company created by Henry Ford in 1903 32:List of XML and HTML character entity references 1467: 917: 255:, etc.). It is arguable that the definition of 1231: 1048: 963: 961: 876: 818: 816: 803:A simple named entity extractor using AdaBoost 1453: 1232:Eiselt, Andreas; Figueroa, Alejandro (2013). 1183: 971:Phrase clustering for discriminative learning 941:Natural Language Processing: Semantic Aspects 243:may refer to the month of an undefined year ( 1273:Linking Documents to Encyclopedic Knowledge. 918:Esuli, Andrea; Sebastiani, Fabrizio (2010). 860:. Linguistic Data Consortium. Archived from 779: 82:Jim bought 300 shares of Acme Corp. in 2006. 958: 813: 274:categories, proposed in 2002, are used for 224:by which to organize categories of things. 1460: 1446: 1135:Ratinov, L., & Roth, D. (2009, June). 888:. Nlp.cs.nyu.edu. Retrieved on 2013-07-21. 308:example, identifying a real entity, but: 1356: 1107: 1049:Poibeau, Thierry; Kosseim, Leila (2001). 1036:Association for Computational Linguistics 1009: 722: 712: 886:Sekine's Extended Named Entity Hierarchy 854:"Annotation Guidelines for Answer Types" 220:. The second phase requires choosing an 991: 842:. Webknox.com. Retrieved on 2013-07-21. 780:Nadeau, David; Sekine, Satoshi (2007). 767:The Stanford Encyclopedia of Philosophy 760: 664:MUC-07 Proceedings (Named Entity Tasks) 14: 2201: 1154:"Frustratingly Easy Domain Adaptation" 745: 1441: 1331:Information Processing and Management 967: 851: 237:2001st year of the Gregorian calendar 231:“June”). In the first case, the year 161: 2214:Tasks of natural language processing 1919:Simple Knowledge Organization System 467: 291: 992:Nothman, Joel; et al. (2013). 425:and trainable statistical systems. 24: 419: 120:Named-entity recognition platforms 63:that seeks to locate and classify 25: 2225: 1934:Thesaurus (information retrieval) 1285:"Learning to link with Knowledge" 968:Lin, Dekang; Wu, Xiaoyun (2009). 921:Evaluating Information Extraction 701:Journal of Translational Medicine 1190:Information Retrieval Technology 472: 239:. In the second case, the month 1416: 1397: 1373: 1336: 1319: 1308: 1277: 1266: 1254: 1242: 1225: 1214: 1177: 1146: 1129: 1116: 1091: 1042: 1018: 985: 981:and IJCNLP. pp. 1030–1038. 931: 911: 891: 845: 833: 789:. Lingvisticae Investigationes. 588:, Learning-to-Search, or CRFs. 536:cross-linking them to Knowledge 464:Current challenges and research 124:Notable NER platforms include: 1515:Natural language understanding 793: 773: 754: 739: 688: 668: 657: 645: 13: 1: 2039:Optical character recognition 1034:. 43rd Annual Meeting of the 639: 389:-based techniques as well as 380: 340:has been defined as follows: 1732:Multi-document summarization 1011:10.1016/j.artint.2012.03.006 430:automatic content extraction 7: 2062:Latent Dirichlet allocation 2034:Natural language generation 1899:Machine-readable dictionary 1894:Linguistic Linked Open Data 1469:Natural language processing 1238:. IJCNLP. pp. 829–833. 591: 450:natural language processing 10: 2230: 1814:Explicit semantic analysis 1563:Deep linguistic processing 1249:LNCS Vol. 7912, pp. 57–68 944:. CRC Press. p. 298. 