Multilingual corpus

Source: Internet  Editor: 程序博客网  Time: 2024/05/18 14:12

A multilingual corpus is a collection of texts in electronic form (a written-language corpus) in which texts in different languages are brought together on the basis of either parallelism or comparability. Multilingual corpora built on parallelism and on comparability are known as parallel corpora and comparable corpora, respectively. A parallel corpus can be developed using overt translation or covert translation. Overt translation possesses a directional relationship between the paired texts in two languages: a text in language A (the source text) is translated into a text in language B (the translated text) (Rose, 1981). Covert translation is non-directional: multilingual documents expressing the same content in different languages are generated from the same source (Leonardi, 2000). Therefore, neither text in a pair from such a parallel corpus is marked as the source text or the translated text.
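
The distinction between overt and covert translation can be illustrated with a small data structure. The following is only a minimal sketch in Python: the class name `TextPair`, its fields, and the sample sentences are hypothetical choices made for illustration and do not come from the cited works or from any corpus toolkit. A pair records a source language when the translation is overt and leaves it unset when the translation is covert.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TextPair:
    """One aligned pair in a parallel corpus (hypothetical structure)."""

    # Maps a language code to the text in that language.
    texts: Dict[str, str]
    # Language of the source text for overt translation;
    # None for covert translation, where no direction is recorded.
    source_lang: Optional[str] = None

    def __post_init__(self) -> None:
        # A declared source language must be one of the paired languages.
        if self.source_lang is not None and self.source_lang not in self.texts:
            raise ValueError("source_lang must be one of the paired languages")

    def is_overt(self) -> bool:
        return self.source_lang is not None

    def role(self, lang: str) -> str:
        """Return 'source', 'translated', or 'unmarked' for a language."""
        if lang not in self.texts:
            raise KeyError(lang)
        if self.source_lang is None:
            return "unmarked"
        return "source" if lang == self.source_lang else "translated"


# Overt translation: English is the source, French is the translation.
overt = TextPair({"en": "Good morning.", "fr": "Bonjour."}, source_lang="en")

# Covert translation: both texts come from the same source, so neither is marked.
covert = TextPair({"en": "Good morning.", "fr": "Bonjour."})

print(overt.role("en"), overt.role("fr"), covert.role("en"))
```

In this sketch the only difference between the two kinds of pair is whether a direction is stored, which mirrors the statement that texts in a covertly translated pair carry no source or translated label.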
