Parallel Text
A text presented alongside its translation into one or more other languages, used for language learning, translation studies, philology, and as the foundation for statistical and machine translation systems.
Also known as: Bitext, Parallel Corpus, Bilingual Text
Category: Learning & Education
Tags: language, languages, linguistics, learning, translation-studies, natural-language-processing
Explanation
A parallel text is a document in which the same content appears in two or more languages, arranged so that corresponding passages can be compared. A collection of such texts, usually aligned at the level of sentences or smaller units, is often called a **parallel corpus**. Parallel texts are among the oldest tools for crossing language barriers: the [[rosetta-stone|Rosetta Stone]] is one, and modern neural machine translation is trained on them.
## Historical Forms
Parallel texts have a long history in religious, legal, and scholarly contexts:
- **Trilingual and bilingual inscriptions** issued by ancient states to publish edicts in the languages of their subjects (e.g., the Rosetta Stone, the Behistun Inscription).
- **Polyglot Bibles** such as the Complutensian Polyglot (1517) and the London Polyglot (1657), which printed Hebrew, Greek, Latin, Aramaic, and other versions side by side for scholarly comparison.
- **Interlinear glosses** in medieval manuscripts, with word-for-word translations between the lines of a sacred or classical text.
- **Bilingual editions** of literary works, with original and translation on facing pages—still a common publishing format.
## Uses
### Language Learning
Parallel texts let learners read meaningful content in a target language while keeping a known-language safety net at hand. Methods built around them include the **Schliemann method** (reading the same text first in a known language and then in an unknown one), the **Assimil** approach (graded parallel dialogues), and the **LR (Listening-Reading) method** popularized by language learners online. Proponents value parallel reading for supplying [[comprehensible-input|comprehensible input]] in larger volumes than grammar and vocabulary study alone usually provides.
### Translation and Philology
Translators use parallel texts as references for terminology, style, and idiom. Philologists and historical linguists use them to reconstruct lost vocabulary, trace semantic change, and study how concepts move between languages.
### Computational Linguistics
Large parallel corpora underpin both **statistical machine translation** (which learns word and phrase alignments from aligned bitext) and **neural machine translation** (which trains sequence-to-sequence models on the same data). Widely used examples include the Europarl corpus (European Parliament proceedings), the UN Parallel Corpus, the OPUS collection, and many Bible translations. Sentence-aligned parallel corpora remain among the most valuable training resources for translation-related NLP.
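To make "learns word alignments from aligned bitext" concrete, here is a minimal sketch of expectation-maximization training in the style of IBM Model 1. It is an illustration under simplifying assumptions, not a production recipe: the three-sentence German-English corpus `TOY_BITEXT` and the helper names `train_ibm_model1` and `best_translation` are invented for this example, and the NULL source word used in the full model is omitted.

```python
from collections import defaultdict

# Hypothetical toy corpus: (source tokens, target tokens) per sentence pair.
TOY_BITEXT = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]


def train_ibm_model1(bitext, iterations=10):
    """Estimate t(target_word | source_word) with EM on sentence pairs.

    Simplified IBM Model 1: no NULL word, no smoothing.
    Returns a dict keyed by (target_word, source_word).
    """
    # Uniform start: every co-occurring word pair gets the same weight.
    t = {(tw, sw): 1.0 for src, tgt in bitext for sw in src for tw in tgt}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word

        # E-step: spread each target word over the source words in its sentence,
        # in proportion to the current translation probabilities.
        for src, tgt in bitext:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac

        # M-step: renormalize expected counts into probabilities.
        t = {pair: c / total[pair[1]] for pair, c in count.items()}

    return t


def best_translation(t, source_word):
    """Return the target word with the highest t(target | source_word)."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    table = train_ibm_model1(TOY_BITEXT)
    for word in ("das", "haus", "buch", "ein"):
        print(word, "->", best_translation(table, word))
```

The point of the toy corpus is that no single sentence pair says which word translates which; the model resolves the ambiguity only because the same words recur across differently composed pairs, which is why corpus size matters so much in practice.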
## Alignment
A raw parallel text is more useful once aligned, that is, once corresponding units in each language are explicitly linked. Alignment can be:
- **Document-level** — two texts known to be translations of each other.
- **Paragraph- or section-level** — coarse correspondences.
- **Sentence-level** — the standard granularity for most computational uses.
- **Word- or sub-word-level** — needed for fine-grained lexical work and as a byproduct of many translation models.
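Automatic alignment algorithms make it feasible to build large aligned corpora from raw bitext. The Gale–Church algorithm aligns sentences using length statistics, the IBM Models align words, and modern neural aligners rely on learned multilingual representations.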
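As an illustration of sentence-level alignment, the following sketch pairs sentences by character length using dynamic programming, in the spirit of the Gale–Church approach. It is deliberately simplified: the published algorithm scores candidate pairings with a probabilistic model of length differences, whereas the `length_cost` formula and the penalties in `BEADS` below are hand-picked assumptions chosen only to keep the example short. The function name `align_sentences` is likewise hypothetical.

```python
# Allowed groupings (source sentences, target sentences) and a fixed penalty
# for each. These values are illustrative assumptions, not published priors.
BEADS = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 2.0, (1, 2): 2.0}


def length_cost(src_group, tgt_group):
    """Scaled relative mismatch in total character length (0 when equal)."""
    la = sum(len(s) for s in src_group)
    lb = sum(len(s) for s in tgt_group)
    return 10.0 * abs(la - lb) / max(la + lb, 1)


def align_sentences(src, tgt):
    """Align two sentence lists; return a list of (src_indices, tgt_indices)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j]: cheapest way to align the first i source and j target sentences.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + penalty + length_cost(src[i:ni], tgt[j:nj])
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Walk the back-pointers from the end to recover the chosen groupings.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    english = ["The committee met on Monday.", "It approved the budget."]
    french = ["Le comité s'est réuni lundi.", "Il a approuvé le budget."]
    for src_ids, tgt_ids in align_sentences(english, french):
        print(src_ids, "<->", tgt_ids)
```

Length alone works surprisingly often because translations of long sentences tend to be long and translations of short ones short, but it carries no lexical information, so a sketch like this can drift when the two sides diverge in structure.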
## Limits
Parallel texts are constrained by the styles and genres they cover (parliamentary proceedings and religious texts are over-represented), by translation quality, and by the fact that some languages have very little aligned data available. For low-resource languages, building usable parallel corpora is itself an active research problem.