Using Cross-Lingual Distributional Word Vectors for Bitext Alignment

dc.contributor.advisor: Herendi, Tamás
dc.contributor.author: Pethő, Gergely Tamás
dc.contributor.department: DE--Informatikai Kar [hu_HU]
dc.date.accessioned: 2020-12-03T07:21:07Z
dc.date.available: 2020-12-03T07:21:07Z
dc.date.created: 2020-12-02
dc.description.abstract: After introducing the necessary background through a review of the literature, this paper presents a case study examining how techniques developed in deep learning for natural language processing can be used for bitext alignment. More specifically, the case study explores how various types of dense vector representations of words, popularly called word embeddings, which are usually used to approximate the meanings of words in a single language, can be utilised for bitext alignment, an application that, to my knowledge, has not been considered in the literature so far. To this end, several new variants of cross-lingual lexical vector space representations are proposed. One specific goal of the case study is to examine whether cross-lingual vector space models that use subword information can improve alignment results compared to cross-lingual embeddings that represent whole words as discrete units and ignore their form and internal structure. The ability of the new word-vector-based solutions to identify corresponding segments in a bitext is measured on a small, manually checked gold standard parallel corpus. The linguistic data on which the study is based comprise several volumes of the English and Hungarian versions of the Official Journal of the European Union. [hu_HU]
dc.description.course: Computer Science / Programtervező informatikus [hu_HU]
dc.description.degree: MSc/MA [hu_HU]
dc.format.extent: 121 [hu_HU]
dc.identifier.uri: http://hdl.handle.net/2437/299192
dc.language.iso: en [hu_HU]
dc.subject: natural language processing [hu_HU]
dc.subject: word embeddings [hu_HU]
dc.subject: bitext alignment [hu_HU]
dc.subject.dspace: DEENK Témalista::Informatika [hu_HU]
dc.title: Using Cross-Lingual Distributional Word Vectors for Bitext Alignment [hu_HU]