279:
elements represented in only one corpus in order to extract cleaner parallel fragments of bilingual elements. Comparable corpora are used to directly obtain knowledge for translation purposes. High-quality parallel data is difficult to obtain, however, especially for under-resourced languages.
338:
have some similarities with translation memories. The most salient difference is that a translation memory loses the original context, while a bitext retains the original sentence order. That said, some implementations of translation memory, such as
361:
In his original 1988 article, Harris also posited that bitext represents how translators hold their source and target texts together in their mental working memories as they progress. However, this hypothesis has not been followed up.
485:
Abdallah, A. (2021). Impact of using parallel text strategy on teaching reading to intermediate II level students. International
Journal on Social and Education Sciences (IJonSES), 3(1), 95-108.
313:, which automatically aligns the original and translated versions of the same text. The tool generally matches these two texts sentence by sentence. A collection of bitexts is called a
241:
contains bilingual sentences that are not perfectly aligned or have poor quality translations. Nevertheless, most of its contents are bilingual translations of a specific document.
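As an illustration of how misaligned or poorly translated pairs might be screened out of a noisy parallel corpus, the sketch below applies a simple token-length-ratio heuristic. The function name `filter_pairs` and the thresholds are assumptions made for this example only; they are not taken from any specific tool, and practical filtering pipelines usually combine several signals.

```python
from typing import Iterable, List, Tuple


def filter_pairs(
    pairs: Iterable[Tuple[str, str]],
    max_ratio: float = 2.0,   # hypothetical threshold, tune per language pair
    min_tokens: int = 1,      # drop pairs in which either side is (nearly) empty
) -> List[Tuple[str, str]]:
    """Keep sentence pairs whose whitespace-token counts are roughly comparable."""
    floor = max(min_tokens, 1)  # guards against division by zero below
    kept = []
    for src, tgt in pairs:
        s_len, t_len = len(src.split()), len(tgt.split())
        if s_len < floor or t_len < floor:
            continue  # one side is missing or too short
        if max(s_len, t_len) / min(s_len, t_len) > max_ratio:
            continue  # lengths differ too much to be a plausible translation
        kept.append((src, tgt))
    return kept


if __name__ == "__main__":
    candidates = [
        ("the cat sleeps", "le chat dort"),
        ("yes", "oui , bien sûr , tout à fait"),
        ("", "rien"),
    ]
    print(filter_pairs(candidates))
```

A length ratio is only a coarse signal: it removes obviously mismatched pairs but cannot detect a fluent sentence of similar length that is simply the wrong translation.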
268:
algorithms are usually extracted from large bodies of similar sources, such as databases of news articles written in the first and second languages describing similar events.
234:
contains translations of the same document in two or more languages, aligned at least at the sentence level. These tend to be rarer than less-comparable corpora.
921:
1081:
211:
research. During translation, sentences can be split, merged, deleted, inserted or reordered by the translator. This makes alignment a non-trivial task.
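To make the difficulty concrete, here is a minimal sketch of length-based sentence alignment by dynamic programming. It is a simplification in the spirit of length-based aligners, not the Gale and Church algorithm itself: the cost is the absolute difference in character length plus a fixed penalty per alignment type, and the penalty values are assumptions chosen for illustration. Reordered sentences are not handled.

```python
from typing import List, Optional, Tuple

# Allowed alignment shapes: (source sentences consumed, target sentences consumed).
# (1, 0) models a deleted sentence, (0, 1) an inserted one,
# (2, 1) a merge and (1, 2) a split.
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]

# Hypothetical fixed penalties; a real aligner would estimate these from data.
PENALTY = {(1, 1): 0, (1, 0): 50, (0, 1): 50, (2, 1): 10, (1, 2): 10}


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return the cheapest monotone alignment as (source group, target group) pairs."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + abs(s_len - t_len) + PENALTY[(di, dj)]
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)
    # Walk the back-pointers from the end to recover the chosen alignment.
    result = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        result.append((src[i - di:i], tgt[j - dj:j]))
        i, j = i - di, j - dj
    result.reverse()
    return result


if __name__ == "__main__":
    source = ["aaaa", "bbbbbbbbbbbb", "cccc"]
    target = ["AAAA", "BBBBBB", "BBBBBB", "CCCC"]
    for s_group, t_group in align(source, target):
        print(s_group, "<->", t_group)
```

Character length is used here only because it is cheap to compute; production aligners typically add lexical or translation-based evidence on top of length.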
358:, not by a machine. As such, small alignment errors or minor discrepancies that would cause a translation memory to fail are of no importance.
1059:
365:
Online bitexts and translation memories may also be called online bilingual concordances. Several are available on the public Web, including
880:
271:
However, extracted fragments may be noisy, with extra elements inserted in each corpus. Extraction techniques can differentiate between
500:"Noisy-Parallel and Comparable Corpora Filtering Methodology for the Extraction of Bi-Lingual Equivalent Data at Sentence Level"
176:
may contain the original languages and a translation, or several translations by themselves, for ease of comparison and study;
1675:
Proceedings of the 5th
International Conference on Language Resources and Evaluation (LREC'2006). Genoa, Italy, 24–26 May 2006
470:
443:
124:
1669:
Steinberger, Ralf; Pouliquen, Bruno; Widiger, Anna; Ignat, Camelia; Erjavec, Tomaž; Tufiş, Dan; Varga, Dániel (2006).
722:
TERMSEARCH – English/Russian/French parallel corpora (major international treaties, conventions, agreements, etc.)
248:
is built from non-sentence-aligned and untranslated bilingual documents, but the documents are topic-aligned.
255:
includes very heterogeneous and non-parallel bilingual documents that may or may not be topic-aligned.
184:(Greek for "sixfold") placed six versions of the Old Testament side by side. A famous example is the
609:
and Their
Reliability Through a Contrastive Analysis of Complex Prepositions from French to English
164:
is the identification of the corresponding sentences in both halves of the parallel text. The
302:
is a merged document composed of both source- and target-language versions of a given text.
207:). Alignment of parallel corpora at the sentence level is a prerequisite for many areas of
649:
WeBiText: Building Large
Heterogeneous Translation Memories from Parallel Web Content
799:
Language Grid – Multilingual service platform that includes parallel text services
647:
Désilets, Alain; Farley, Benoît; Stojanović, Marta; Patenaude, Geneviève (2008).
651:. Proceedings of Translating and the Computer. Vol. 30. pp. 27–28.
185:
43:
31:
843:
An implementation of the Gale and Church sentence alignment algorithm (2005)
612:(M.A. thesis). Université catholique de Louvain & Universitetet i Oslo.
1553:
1171:
766:, concordancer (open source AGPL) with online search on JRC and UNO corpus
1671:
The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages
821:
Proceedings of the 2005 Workshop on
Building and Using Parallel Texts
816:
Proceedings of the 2003 Workshop on
Building and Using Parallel Texts
811:
Parallel text processing bibliography by J. Veronis and M.-D. Mahimon
700:
The Opus project aims at collecting freely available parallel corpora
593:
How
Reliable Are Online Bilingual Concordancers? An investigation of
459:
Williams, Philip; Sennrich, Rico; Post, Matt; Koehn, Philipp (2016).
351:(CAT) programs, allow the original order of sentences to be preserved.
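As a rough illustration of how a TMX file can be read while keeping its translation units in document order, the sketch below parses a tiny TMX-style sample with Python's standard `xml.etree.ElementTree`. The helper name `read_tmx` and the sample content are assumptions for this example; the element names `tu`, `tuv` and `seg` and the `xml:lang` attribute follow the TMX format, but the sample is abbreviated and is not meant as a validated TMX document.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(tmx_text: str) -> List[Dict[str, str]]:
    """Return one {language: segment text} dict per translation unit, in file order."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):          # translation units, in document order
        unit = {}
        for tuv in tu.findall("tuv"):   # one variant per language
            seg = tuv.find("seg")
            lang = tuv.get(XML_LANG)
            if seg is not None and lang is not None:
                unit[lang] = "".join(seg.itertext())
        units.append(unit)
    return units


# Hypothetical sample data, written for this example only.
SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Good morning.</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Thank you.</seg></tuv>
      <tuv xml:lang="fr"><seg>Merci.</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    for unit in read_tmx(SAMPLE):
        print(unit)
```

Because the units are returned in the order they appear in the file, a tool that writes them in source order can reconstruct the sentence sequence of the original text, provided the file itself was produced that way.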
868:
Bleualign – machine translation based sentence alignment (2010)
705:
Japanese-English
Bilingual Corpus of Knowledge's Kyoto Articles
177:
775:
226:
Parallel corpora can be classified into four main categories:
1649:
1285:
721:
699:
172:
are two examples of dual-language series of texts. Reference
160:
is a text placed alongside its translation or translations.
35:
736:
726:
1668:
1446:
741:
737:
ParaSol – A parallel corpus of Slavic and other languages
695:
European
Parliament Proceedings Parallel Corpus 1996–2011
458:
344:
778:
multilingual parallel corpora, online search interface.
305:
Bitexts are generated by a piece of software called an
633:
798:
763:
727:
TradooIT – English/French/Spanish – Free Online tools
16:
Text placed alongside its translation or translations
838:
Uplug – tools for processing parallel corpora (2003)
732:
Nunavut Hansard – English/Inuktitut parallel corpus
452:
347:format for exchanging translation memories between
324:
1685:
1132:
435:Routledge Encyclopedia of Translation Technology
354:Bitexts are designed to be consulted by a human
929:
199:Large collections of parallel texts are called
559:"Bi-Text, A New Concept in Translation Theory"
915:
717:COMPARA – Portuguese/English parallel corpora
38:engraved with the same decree in both of the
895:Web Alignment Tool at University of Grenoble
890:Vecalign sentence alignment algorithm (2019)
462:Syntax-based Statistical Machine Translation
221:
679:The JRC-Acquis Multilingual Parallel Corpus
321:, and can be consulted with a search tool.
922:
908:
755:InterCorp: A multilingual parallel corpus
617:
533:
515:
275:elements represented in both corpora and
878:Hierarchical alignment tool (HAT) (2018)
264:Large corpora used as training sets for
25:
589:
425:
1686:
742:Glosbe: Multilanguage parallel corpora
556:
903:
1381:Simple Knowledge Organization System
848:The Hunalign sentence aligner (2005)
497:
431:
53:
672:
634:"TradooIT – Concordancier bilingue"
487:https://doi.org/10.46328/ijonses.48
259:
13:
826:
792:and other public documents of the
14:
1715:
1396:Thesaurus (information retrieval)
863:Gargantua sentence aligner (2010)
757:40 languages aligned with Czech,
667:
393:Example-based machine translation
804:
325:Bitexts and translation memories
58:
772:, with online search interface.
413:Statistical machine translation
1662:
977:Natural language understanding
640:
626:
583:
550:
491:
479:
214:Parallel texts may be used in
188:, whose discovery allowed the
50:the Ancient Egyptian language.
1:
1501:Optical character recognition
418:
349:computer-assisted translation
1194:Multi-document summarization
833:GIZA++ alignment tool (1999)
751:with online search interface
7:
1524:Latent Dirichlet allocation
1496:Natural language generation
1361:Machine-readable dictionary
1356:Linguistic Linked Open Data
931:Natural language processing
398:Natural language processing
388:Computer-assisted reviewing
376:
341:Translation Memory eXchange
46:. Its discovery was key to
10:
1720:
1276:Explicit semantic analysis
1025:Deep linguistic processing
535:10.7494/csci.2015.16.2.169
328:
287:
18:
557:Harris, B. (March 1988).
465:. Morgan & Claypool.
283:
222:Types of parallel corpora
190:Ancient Egyptian language
691:with 231 language pairs.
498:Wołk, Krzysztof (2015).
40:Ancient Egyptian scripts
19:Not to be confused with
1549:Automated essay scoring
1519:Document classification
1186:Automatic summarization
782:EUR-Lex Corpus – corpus
759:online search interface
590:Genette, Marie (2016).
253:quasi-comparable corpus
162:Parallel text alignment
572:: 8–10. Archived from
432:Chan, Sin-Wai (2015).
166:Loeb Classical Library
51:
1694:Translation databases
1167:Sentence segmentation
788:database consists of
681:of the total body of
438:. London: Routledge.
383:Bilingual inscription
290:Bitext word alignment
239:noisy parallel corpus
170:Clay Sanskrit Library
29:
1699:Language acquisition
1619:Voice user interface
1330:datasets and corpora
1271:Document-term matrix
1124:Word-sense induction
688:Acquis Communautaire
858:mALIGNa (2008–2020)
526:2015arXiv151004500W
296:translation studies
266:machine translation
1704:Corpus linguistics
1609:Question answering
1481:Speech recognition
1346:Corpus linguistics
1326:Language resources
1109:Textual entailment
1092:Sentiment analysis
883:2020-07-05 at the
853:Champollion (2006)
790:European Union law
747:2013-05-27 at the
710:2012-08-22 at the
343:(TMX), a standard
331:Translation memory
216:language education
472:978-1-62705-502-4
445:978-1-315-74912-9
246:comparable corpus
784:built up of the
673:Parallel corpora
566:Language Monthly
563:
554:
548:
547:
537:
519:
504:Computer Science
495:
489:
483:
477:
476:
456:
450:
449:
429:
373:, and Tradooit.
319:bilingual corpus
294:In the field of
260:Noise in corpora
201:parallel corpora
885:Wayback Machine
829:
827:Alignment tools
807:
749:Wayback Machine
712:Wayback Machine
403:Polyglot (book)
379:
333:
327:
315:bitext database
292:
286:
262:
232:parallel corpus
224:
192:to be
794:European Union
779:
773:
767:
764:myCAT – Olanto
761:
752:
739:
734:
729:
724:
719:
714:
702:
697:
692:
683:European Union
674:
671:
669:
668:External links
666:
663:
662:
639:
625:
607:ReversoContext
582:
579:on 2018-03-02.
549:
510:(2): 169–184.
490:
478:
471:
451:
444:
423:
422:
420:
417:
416:
415:
410:
408:Ruby character
405:
400:
395:
390:
385:
378:
375:
329:Main article:
326:
323:
307:alignment tool
288:Main article:
285:
282:
261:
258:
257:
256:
249:
242:
235:
223:
220:
152:
151:
66:
64:
57:
21:Parallel novel
805:Documentation
186:Rosetta Stone
183:
179:
175:
171:
167:
163:
159:
158:parallel text
44:Ancient Greek
41:
37:
33:
32:Rosetta Stone
28:
22:
1670:
1664:
1554:Concordancer
1365:
950:Bag-of-words
686:
648:
642:
628:
608:
604:
600:
596:
592:
585:
574:the original
873:YASA (2013)
776:linguatools
619:10852/51577
311:bitext tool
277:monolingual
205:text corpus
48:deciphering
42:as well as
1688:Categories
1232:Rule-based
1114:Truecasing
982:Stop words
685:(EU) law:
517:1510.04500
419:References
356:translator
209:linguistic
194:deciphered
104:newspapers
1541:reviewing
1339:standards
1337:Types and
273:bilingual
881:Archived
745:Archived
708:Archived
657:14586900
603:WeBiText
599:TradooIT
544:12860633
377:See also
168:and the
786:EUR-Lex
595:Linguee
522:Bibcode
371:Reverso
367:Linguee
336:Bitexts
309:, or a
182:Hexapla
655:
542:
469:
442:
300:bitext
284:Bitext
178:Origen
174:Bibles
120:
113:
106:
99:
91:
1650:spaCy
1295:large
1286:GloVe
653:S2CID
577:(PDF)
562:(PDF)
540:S2CID
512:arXiv
317:or a
203:(see
125:JSTOR
111:books
36:stele
1415:Data
1266:BERT
770:TAUS
605:and
467:ISBN
440:ISBN
97:news
34:, a
30:The
1447:UBY
614:hdl
530:doi
345:XML
180:'s
80:by
1690::
1673:.
601:,
597:,
570:54
568:.
564:.
538:.
528:.
520:.
508:16
506:.
502:.
369:,
298:a
251:A
244:A
237:A
230:A
218:.
196:.
156:A
1677:.
659:.
636:.
622:.
616::
546:.
532::
524::
514::
475:.
448:.
23:.
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.