Corpus ID: 6315414

A Simple Approach to Use Bilingual Information Sources for Word Alignment

@article{EsplGomis2012ASA,
  title={A Simple Approach to Use Bilingual Information Sources for Word Alignment},
  author={M. Espl{\`a}-Gomis and Felipe S{\'a}nchez-Mart{\'i}nez and Mikel L. Forcada},
  journal={Proces. del Leng. Natural},
  year={2012},
  volume={49},
  pages={93--100},
  url={https://api.semanticscholar.org/CorpusID:6315414}
}
In this paper we present a new and simple method for using sources of bilingual information for word alignment between parallel segments of text. This method can be used "on the fly", since it does

Bilingual phrase-to-phrase alignment for arbitrarily-small datasets

A novel system for sub-sentential alignment of bilingual sentence pairs, however few, using readily-available machine-readable bilingual dictionaries is presented, showing results that are a considerable improvement on a comparable system and on GIZA++ performance for the same corpus.

Using external sources of bilingual information for word-level quality estimation in translation technologies

Experiments conducted in the translation of Spanish texts into English show that this approach is able to predict which target words have to be changed or kept unedited with an accuracy above 94% for fuzzy-match scores greater than or equal to 60%.

Online Word Alignment for Online Adaptive Machine Translation

The application of popular state-of-the-art word aligners to this scenario and their poor performance in aligning unknown words are discussed and a fast procedure to refine their outputs and to get more reliable and accurate alignments for unknown words is proposed.

Translation Alignment and Extraction Within a Lexica-Centered Iterative Workflow

The methods proposed in this thesis were designed to take advantage of knowledge accumulated in human-validated bilingual lexica and translation tables obtained by unsupervised methods to revisit the alignment and extraction problems in the context of a lexica-centered iterative workflow that includes human validation.

Using Machine Translation to Provide Target-Language Edit Hints in Computer Aided Translation Based on Translation Memories

This paper explores the use of general-purpose machine translation (MT) in assisting the users of computer-aided translation (CAT) systems based on translation memory (TM) to identify the target

Online Chinese-Vietnamese Bilingual Topic Detection Based on RCRP Algorithm with Event Elements

A Chinese-Vietnamese bilingual topic model based on the Recurrent Chinese Restaurant Process and integrated with event elements is proposed, which performs well on topic detection.

Ranking suggestions for black-box interactive translation prediction systems with multilayer perceptrons

This paper proposes a more principled suggestion ranking approach using a regressor (a multilayer perceptron) that achieves significantly better results.

Black-box interactive translation prediction

In a globalized world such as today's, in which, moreover, many societies are inherently multilingual, translation and interpreting between different languages require a considerable effort

Finding Terminology Translations from Non-parallel Corpora

We present a statistical word feature, the Word Relation Matrix, which can be used to find translated pairs of words and terms from non-parallel corpora, across language groups. Online dictionary

Translating Named Entities Using Monolingual and Bilingual Resources

A novel algorithm for translating named entity phrases using easily obtainable monolingual and bilingual resources is presented, and its evaluation on translating Arabic named entities into English is reported.

Automatic Identification of Word Translations from Unrelated English and German Corpora

The current study, based on the assumption that there is a correlation between the patterns of word co-occurrences in corpora of different languages, achieves a significant improvement, with about 72% of word translations identified correctly.

Guidelines for Word Alignment Evaluation and Manual Alignment

Standard scoring metrics for full text alignment and explanations on how to use them better are reviewed, and it is shown that the ratio between ambiguous and unambiguous links in the reference has a great impact on scores measured with these metrics.

Europarl: A Parallel Corpus for Statistical Machine Translation

A corpus of parallel text in 11 languages is collected from the proceedings of the European Parliament, with a focus on its acquisition and its application as training data for statistical machine translation (SMT).

Robust Bilingual Word Alignment for Machine Aided Translation

Because word_align and char_align were designed to work robustly on texts that are smaller and more noisy than the Hansards, it has been possible to successfully deploy the programs at AT&T Language Line Services, a commercial translation service, to help them with difficult terminology.

HMM-Based Word Alignment in Statistical Translation

A new model for word alignment in statistical translation is presented, using a first-order hidden Markov model for the word alignment problem, as such models are used successfully in speech recognition for the time alignment problem.

UAlacant: Using Online Machine Translation for Cross-Lingual Textual Entailment

A new method for cross-lingual textual entailment (CLTE) detection based on machine translation (MT) is described, which was presented at SemEval 2012 task 8, obtaining an accuracy of up to 59.8% on the English-Spanish test set, the second best performing approach in the contest.

Using word alignments to assist computer-aided translation users by marking which target-side words to change or keep unedited

Experiments conducted in the translation of Spanish texts into English show that this approach is able to predict which target words have to be changed or kept unedited with an accuracy above 94% for fuzzy-match scores greater than or equal to 60%.

Moses: Open Source Toolkit for Statistical Machine Translation

We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c)