This project uses BERT-derived contextual word embeddings to measure the similarity between selected vocabulary words as they are used under different subject tags. I will identify a subset of phrases or tags that appear in both the English translation and the French text. Using these contextual embeddings, I would like to determine whether translation loss can be measured with word embeddings and, if so, how much loss occurred in the translation of this manuscript.
Given the effort that was put into the translation process, and specifically into maintaining the translator’s voice, this project will provide a potential metric for semantic faithfulness. The intratext vocabulary comparisons will provide an additional level of insight into the author-practitioner’s lexicon in different contexts.
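The similarity measurement described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a settled method: it assumes the Hugging Face Transformers library and PyTorch are installed, it assumes a single multilingual checkpoint (`bert-base-multilingual-cased`) so that English and French words land in one embedding space, and it assumes that averaging the last-layer vectors of a word's subword tokens is an acceptable word representation. The function names `load_model`, `embed_word`, `mean_pool`, and `cosine_similarity` are hypothetical helpers introduced only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pool(vectors):
    """Average several subword vectors into a single word-level vector."""
    if not vectors:
        raise ValueError("no vectors to pool")
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def load_model(model_name="bert-base-multilingual-cased"):
    """Load a tokenizer and encoder (assumed checkpoint; downloads on first use)."""
    # Imported here so the pure-Python helpers above work without these packages.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # inference only, disables dropout
    return tokenizer, model


def embed_word(sentence, word, tokenizer, model):
    """Contextual embedding of the first occurrence of `word` in `sentence`."""
    import torch

    start = sentence.index(word)  # raises ValueError if the word is absent
    end = start + len(word)

    # Offset mappings require a "fast" tokenizer, which BERT checkpoints provide.
    encoded = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = encoded.pop("offset_mapping")[0].tolist()

    with torch.no_grad():
        hidden = model(**encoded).last_hidden_state[0]  # (seq_len, hidden_size)

    # Keep subword tokens whose character span overlaps the target word.
    # Special tokens such as [CLS] have an empty (0, 0) span and are skipped.
    rows = [
        hidden[i].tolist()
        for i, (s, e) in enumerate(offsets)
        if e > s and s < end and e > start
    ]
    return mean_pool(rows)


# Example usage (the sentences are placeholders, not manuscript text):
#   tokenizer, model = load_model()
#   fr = embed_word("Le sable est fin.", "sable", tokenizer, model)
#   en = embed_word("The sand is fine.", "sand", tokenizer, model)
#   print(cosine_similarity(fr, en))
```

Raw cosine scores from a multilingual model are only a rough signal, so any number produced this way would need to be compared against a baseline (for example, unrelated word pairs) before being read as evidence of translation loss. A French-only model would require a separate alignment step, since its vectors are not directly comparable with those of an English model.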
- Review existing literature on BERT and translation loss
- Review the manuscript and its translations/editions
- Set up Google Colab with the Hugging Face Transformers library for BERT
- Determine what type of preprocessing the text needs
- Narrow down examples to work with from within the manuscript
- Some textual analysis
- Develop more formal hypotheses about translation loss
- “Making and Knowing: Encoding BnF Ms. Fr. 640” https://edition640.makingandknowing.org/#/essays/ann_335_ie_19
- “Turning Turtle: The Process of Translating BnF Ms. Fr. 640” https://edition640.makingandknowing.org/#/essays/ann_318_ie_19
- “Principles of Encoding” https://edition640.makingandknowing.org/#/content/resources/principles
- “Dictionaries” https://edition640.makingandknowing.org/#/content/resources/principles#linguistic-resources-dictionaries-and-technical-encyclopedias
- “Training Neural Machine Translation using Word Embedding-based Loss” https://arxiv.org/abs/1807.11219
- “Understanding BERT” https://www.techtarget.com/searchenterpriseai/definition/BERT-language-model
- “BERT Word Embeddings Tutorial” https://mccormickml.com/2019/05/14/BERT-word-embeddings-tutorial/
- https://towardsdatascience.com/google-drive-google-colab-github-dont-just-read-do-it-5554d5824228
- https://huggingface.co/flaubert/flaubert_base_uncased