The Parallel Corpus German / Spanish (PaGeS) is a bilingual parallel corpus consisting of German and Spanish original texts and their translations, as well as a small percentage of German and Spanish translations from a third language, all of which have been linked together sentence by sentence.
The texts form a growing collection of fiction (novels and short stories, roughly 90% of the corpus) and nonfiction (essays and popular science texts). Many of the selected books are represented by samples rather than full texts, which allows a better cross-section of texts.
The creation of PaGeS is part of a broader research project that aims to study and analyze the expression of spatial relations in Spanish and German. The project is carried out by the research group SpatiAlEs, led by Prof. Irene Doval at the University of Santiago de Compostela, Galicia.
Even though the corpus was created for this purpose, efforts are being made toward interoperability and standardization in order to design a multifunctional resource able to meet the needs of diverse user groups. The main idea behind this effort is to build a representative language resource for German and Spanish that can be exploited for multiple purposes. Applications can include general research in contrastive linguistics, linguistic typology, translation studies and bilingual lexicography, as well as the training of automatic translation systems. The corpus is also useful for learners of German or Spanish at intermediate to advanced levels, who can find a multitude of translation suggestions shown in usage examples.
At the current stage (November 2018), PaGeS contains ca. 25,000,000 words (counting punctuation marks and other symbols as well would yield more than 28,000,000 tokens) and 858,470 bisegments, i.e. pairs of aligned text chunks (sentences or smaller segments).
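The difference between the word count and the token count, and the notion of a bisegment, can be illustrated with a minimal Python sketch. The `Bisegment` class, the regular expressions and the example sentence pair are assumptions made purely for illustration; they do not describe the tokenizer or the data format actually used for PaGeS.

```python
import re
from dataclasses import dataclass

# Hypothetical tokenization rule (an assumption, not the PaGeS tokenizer):
# a run of word characters is one word; every other non-space character
# (punctuation mark or symbol) counts as a token of its own.
WORD_RE = re.compile(r"\w+")
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


@dataclass
class Bisegment:
    """A pair of aligned text chunks (sentences or smaller segments)."""
    german: str
    spanish: str


def count_words(text: str) -> int:
    """Count words only, ignoring punctuation marks and other symbols."""
    return len(WORD_RE.findall(text))


def count_tokens(text: str) -> int:
    """Count words plus punctuation marks and other symbols."""
    return len(TOKEN_RE.findall(text))


# Invented example pair, not taken from the corpus.
pair = Bisegment(
    german="Er ging ins Haus, ohne zu zögern.",
    spanish="Entró en la casa sin dudar.",
)

print(count_words(pair.german), count_tokens(pair.german))
print(count_words(pair.spanish), count_tokens(pair.spanish))
```

Under this simple rule the token count is always at least as large as the word count, which is the same relationship as between the two corpus-level figures given above.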
To guarantee quality, the corpus has been manually verified at different levels, including preprocessing, sentence splitting and sentence alignment. Each text is supplied with information about author, title, year of first publication and, where applicable, the edition used.
Statistics (Release: 15/11/2018)
|German Translation < Spanish||54||319,315||136,543||5,057,274|
|German Translation < 3rd language||18||152,841||74,339||2,143,959|
|Spanish Translation < German||62||386,314||102,431||5,311,191|
|Spanish Translation < 3rd language||18||152,841||57,509||2,161,909|