
About PaGeS


The Parallel Corpus German/Spanish (PaGeS) is a bilingual parallel corpus of German and Spanish original texts and their translations, together with a small percentage of German and Spanish translations from a third language. All texts have been aligned sentence by sentence.

The texts form a growing collection of fiction (roughly 90%, made up of novels and short stories) and nonfiction (essays and popular science texts). Many of the selected books are represented by samples rather than by full texts, which allows for a broader cross-section of texts.

The creation of PaGeS is part of a broader research project that studies the expression of spatial relations in Spanish and German. The project is carried out by the research group SpatiAlEs, led by Prof. Irene Doval at the University of Santiago de Compostela, Galicia.

Although the corpus was created for this purpose, efforts are being made towards interoperability and standardization so that it can serve as a multifunctional resource for diverse user groups. The aim is to build a representative language resource for German and Spanish that can be used for multiple purposes. Possible applications include general research in contrastive linguistics, linguistic typology, translation studies and bilingual lexicography, as well as the training of machine translation systems. The corpus is also useful for learners of German or Spanish at intermediate to advanced levels, who can find a wide range of translation suggestions shown in usage examples.

At the current stage (November 2018), PaGeS contains approx. 25,000,000 words (counting punctuation marks and other symbols as well would give more than 28,000,000 tokens) and 858,470 bisegments, i.e. pairs of aligned text chunks (sentences or smaller segments).
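The difference between words and tokens, and the notion of a bisegment, can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `Bisegment` class, the two regular expressions and the example sentence pair are hypothetical and do not reflect the actual PaGeS data format or tokenizer.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Bisegment:
    """One aligned pair of text chunks (hypothetical structure, not the PaGeS format)."""
    german: str
    spanish: str


# Assumption: a word is a run of word characters.
WORD_RE = re.compile(r"\w+")
# Assumption: a token is either a word or a single punctuation mark / symbol.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def count_words(text: str) -> int:
    """Count words only, ignoring punctuation and other symbols."""
    return len(WORD_RE.findall(text))


def count_tokens(text: str) -> int:
    """Count words plus punctuation marks and other symbols."""
    return len(TOKEN_RE.findall(text))


# Invented example pair, not taken from the corpus.
pair = Bisegment(
    german="Das Buch liegt auf dem Tisch.",
    spanish="El libro está sobre la mesa.",
)

for side in (pair.german, pair.spanish):
    print(count_words(side), "words,", count_tokens(side), "tokens")
```

With this simple definition, the final full stop is counted as a token but not as a word, which is why token counts are higher than word counts.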

To guarantee its quality, the corpus has been manually verified at several levels, including preprocessing, sentence splitting and sentence alignment. Each text is supplied with information about the author, the title, the year of first publication and, where applicable, the edition used.

Statistics (Release: 15/11/2018)

LANGUAGE                             WORKS      BISEGMENTS   TYPES     WORDS
German Original                      62         386,314      158,198   5,081,806
German Translation < Spanish         54         319,315      136,543   5,057,274
German Translation < 3rd language    18         152,841      74,339    2,143,959
Spanish Original                     54         319,315      103,674   5,073,514
Spanish Translation < German         62         386,314      102,431   5,311,191
Spanish Translation < 3rd language   18         152,841      57,509    2,161,909
Total                                134 (x2)   858,470                24,829,653
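
The totals in the last row can be reproduced from the individual rows. The short Python sketch below assumes that "(x2)" means that each work is present in two language versions, so that works and bisegments are counted once per language pair while words are summed over all six rows. This reading is an interpretation of the table, not a statement by the corpus authors.

```python
# (works, bisegments, words) per row, copied from the statistics table.
german_rows = {
    "German Original": (62, 386_314, 5_081_806),
    "German Translation < Spanish": (54, 319_315, 5_057_274),
    "German Translation < 3rd language": (18, 152_841, 2_143_959),
}
spanish_rows = {
    "Spanish Original": (54, 319_315, 5_073_514),
    "Spanish Translation < German": (62, 386_314, 5_311_191),
    "Spanish Translation < 3rd language": (18, 152_841, 2_161_909),
}

# Works and bisegments are shared by both language sides,
# so they are summed over one side only.
total_works = sum(works for works, _, _ in german_rows.values())
total_bisegments = sum(bisegs for _, bisegs, _ in german_rows.values())

# Words are language-specific, so both sides are added up.
total_words = sum(
    words for _, _, words in list(german_rows.values()) + list(spanish_rows.values())
)

print(total_works, total_bisegments, total_words)
```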
                                                              
PaGeS Vers. 2.0
Last updated: 26.04.2019
ISSN 2605-5228 ©SpatiAlEs
Creative Commons License
University of Santiago de Compostela
This project is funded by the State Research Agency (AEI) of the Spanish Ministry of Science, Innovation and Universities (FFI2017-85938-R) and by the Department of Economy and Industry of the Galician Government (2017-PG023).