
About PaGeS


The Parallel Corpus German/Spanish (PaGeS) is a bilingual parallel corpus consisting of original and translated German and Spanish texts, together with a small percentage of German and Spanish translations from a third language, all aligned sentence by sentence. The texts form a growing collection of fiction (roughly 90%, consisting of novels and short stories) and nonfiction (essays and popular science texts). Many of the selected books are represented by samples rather than full texts, which allows a broader cross-section of texts.

The creation of PaGeS is part of a broader research project that aims to study and analyze the expression of spatial relations in Spanish and German. The project is carried out by the research group SpatiAlEs, led by Prof. Irene Doval at the University of Santiago de Compostela, Galicia.

Although the corpus was created for this purpose, efforts are being made toward interoperability and standardization so that it can serve as a multifunctional resource for diverse user groups. The main idea is to build a representative language resource for German and Spanish that can be used for multiple purposes. Possible applications include general research in contrastive linguistics, linguistic typology, translation studies and bilingual lexicography, as well as the training of machine translation systems. The corpus is also useful for learners of German or Spanish at intermediate to advanced levels, who can find a wide range of translation suggestions shown in usage examples.

At the current stage, PaGeS contains 19,017,837 words and 655,321 bisegments, i.e. pairs of aligned text chunks (sentences or smaller segments). To guarantee quality, the corpus has been manually verified at different levels, including preprocessing, sentence splitting and sentence alignment. Each text is supplied with information about its author, title, year of first publication and, where applicable, the edition used.

Statistics (Release: 31/07/2017)

LANGUAGE                            WORKS    TYPES      TOKENS
German Original                        54  140,750   4,253,900
Spanish Translation < German           54   89,109   4,507,832
Spanish Original                       38   99,710   3,584,908
German Translation < Spanish           38  126,225   3,564,688
German Translation < 3rd language      12   49,333   1,577,794
Spanish Translation < 3rd language     12   61,925   1,528,715
Total                                               19,017,837
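
As a quick consistency check on the statistics above, the token counts can be summed per row and per language. The following is only an illustrative sketch: the figures are copied from the table, while the names ROWS, total_tokens and tokens_by_language are hypothetical and not part of any PaGeS tool or interface.

```python
from collections import defaultdict

# Each entry: (language, text type, works, types, tokens).
# Figures are copied from the statistics table; the layout is an assumption.
ROWS = [
    ("German", "Original", 54, 140_750, 4_253_900),
    ("Spanish", "Translation < German", 54, 89_109, 4_507_832),
    ("Spanish", "Original", 38, 99_710, 3_584_908),
    ("German", "Translation < Spanish", 38, 126_225, 3_564_688),
    ("German", "Translation < 3rd language", 12, 49_333, 1_577_794),
    ("Spanish", "Translation < 3rd language", 12, 61_925, 1_528_715),
]


def total_tokens(rows):
    """Sum the token column over all rows."""
    return sum(tokens for _lang, _kind, _works, _types, tokens in rows)


def tokens_by_language(rows):
    """Sum the token column separately for each language."""
    totals = defaultdict(int)
    for language, _kind, _works, _types, tokens in rows:
        totals[language] += tokens
    return dict(totals)


if __name__ == "__main__":
    print(total_tokens(ROWS))        # 19017837, matching the reported total
    print(tokens_by_language(ROWS))  # {'German': 9396382, 'Spanish': 9621455}
```

Type counts are not summed here, because a word form can occur in several subcorpora, so adding the per-row type figures would not yield a meaningful overall count.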
                                                              
PaGeS Vers. 1.0.0
Last updated: 30.07.2017
©SpatiAlEs
University of Santiago de Compostela
This project is funded by the Spanish Ministry of Economy and Competitiveness (FFI2013-42571-P).