Commit db7fee4

chk

Roman Koshkin authored and Roman Koshkin committed on Jun 21, 2024
Parent: 384c8ab

Showing 2 changed files with 20 additions and 5 deletions.
data/projects.mdx (5 changes: 4 additions & 1 deletion)

@@ -16,7 +16,10 @@ title: Projects
 ## Recent
 
 <div className="grid sm:grid-cols-2 gap-6">
-  <ProjectWithBadges url="https://github.com/RomanKoshkin/transllama" title="🦙TransLLaMa" badges={["NLP", "LLMOps", "machine translation", "MLOps"]}>
+  <ProjectWithBadges url="https://github.com/RomanKoshkin/toLLMatch" title="🔪toLLMatch" badges={["NLP", "pytorch", "machine translation", "AWS"]}>
     Zero-shot context-aware simultaneous machine translation leveraging an open-source ASR engine and LLM.
   </ProjectWithBadges>
+  <ProjectWithBadges url="https://github.com/RomanKoshkin/transllama" title="🦙TransLLaMa" badges={["NLP", "LLMOps", "machine translation", "MLOps"]}>
+    LLM-based simultaneous speech-to-text machine translation. Decoder-only large language models (LLMs) have recently demonstrated impressive capabilities in text generation and reasoning. Nonetheless, they have limited applications in simultaneous machine translation (SiMT), currently dominated by encoder-decoder transformers. This study demonstrates that, after fine-tuning on a small dataset comprising causally aligned source and target sentence pairs, a pre-trained open-source LLM can control input segmentation directly by generating a special "wait" token. This obviates the need for a separate policy and enables the LLM to perform English-German and English-Russian SiMT tasks with BLEU scores that are comparable to those of specific state-of-the-art baselines. We also evaluated closed-source models such as GPT-4, which displayed encouraging results in performing the SiMT task without prior training (zero-shot), indicating a promising avenue for enhancing future SiMT systems.
+  </ProjectWithBadges>
   <ProjectWithBadges url="https://github.com/RomanKoshkin/conv-seq" title="🧠convSeq" badges={["DL", "computational neuroscience"]}>
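For context on the mechanism the TransLLaMa entry above describes: the fine-tuned LLM controls input segmentation by emitting a special "wait" token during generation. A minimal Python sketch of such a read/write loop, assuming a hypothetical `generate_one_token` interface (this is not the repository's actual code):

```python
# Minimal sketch of a wait-token read/write policy for simultaneous MT.
# The model interface (generate_one_token) and prompt format are assumptions,
# not the TransLLaMa implementation.

WAIT_TOKEN = "<wait>"  # assumed special token the fine-tuned LLM emits

def format_prompt(src_words, tgt_words):
    # Hypothetical prompt layout; the paper's causal alignment is not reproduced here.
    return f"Source: {' '.join(src_words)}\nTarget: {' '.join(tgt_words)}"

def simultaneous_translate(model, source_stream):
    """Interleave READ (take a source word) and WRITE (emit target tokens)."""
    src, tgt = [], []
    for word in source_stream:          # READ: a new source word arrives
        src.append(word)
        while True:                     # WRITE until the model asks to wait
            next_tok = model.generate_one_token(format_prompt(src, tgt))
            if next_tok == WAIT_TOKEN:  # model defers until more source is read
                break
            tgt.append(next_tok)
    return " ".join(tgt)
```

Because the "wait" decision comes from the same model that translates, no separately trained segmentation policy is needed.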
data/publications.mdx (20 changes: 16 additions & 4 deletions)

@@ -5,6 +5,18 @@ title: Publications & Patents

 ## Pre-prints
 
+<Publication
+  title="LLMs Are Zero-Shot Context-Aware Simultaneous Translators"
+  year="2024"
+  journal="arXiv"
+  issue=""
+  pageRange="2406.13476"
+  authors="Roman Koshkin, Katsuhito Sudoh & Satoshi Nakamura"
+  abstract="The advent of transformers has fueled progress in machine translation. More recently, large language models (LLMs) have come into the spotlight thanks to their generality and strong performance in a wide range of language tasks, including translation. Here we show that open-source LLMs perform on par with or better than some state-of-the-art baselines in simultaneous machine translation (SiMT) tasks, zero-shot. We also demonstrate that injecting minimal background information, which is easy with an LLM, brings further performance gains, especially on challenging technical subject matter. This highlights LLMs' potential for building the next generation of massively multilingual, context-aware and terminologically accurate SiMT systems that require no resource-intensive training or fine-tuning."
+  pdf="https://arxiv.org/abs/2406.13476"
+/>
+
+
 <Publication
   title="TransLLaMa: LLM-based Simultaneous Translation System."
   year="2024"
@@ -16,19 +28,19 @@ title: Publications & Patents
   pdf="https://arxiv.org/pdf/2402.04636.pdf"
 />
 
+## Peer-reviewed Publications
+
 <Publication
   title="convSeq: Fast and Scalable Method for Detecting Patterns in Spike Data."
   year="2024"
-  journal="arXiv"
+  journal="ICML"
   issue=""
-  pageRange="2402.01130"
+  pageRange="PMLR 235, 2024"
   authors="Roman Koshkin, Tomoki Fukai"
   abstract="Spontaneous neural activity, crucial in memory, learning, and spatial navigation, often manifests itself as repetitive spatiotemporal patterns. Despite their importance, analyzing these patterns in large neural recordings remains challenging due to a lack of efficient and scalable detection methods. Addressing this gap, we introduce convSeq, an unsupervised method that employs backpropagation for optimizing spatiotemporal filters that effectively identify these neural patterns. Our method's performance is validated on various synthetic data and real neural recordings, revealing spike sequences with unprecedented scalability and efficiency. Significantly surpassing existing methods in speed, convSeq sets a new standard for analyzing spontaneous neural activity, potentially advancing our understanding of information processing in neural circuits."
   pdf="https://arxiv.org/pdf/2402.01130.pdf"
 />
 
-## Peer-reviewed Publications
-
 <Publication
   title="Unsupervised Detection of Cell Assemblies with Graph Neural Networks."
   journal="ICLR"

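A note on the context-injection idea in the zero-shot SiMT abstract above: the background information is simply supplied in the prompt alongside the partial source. A Python sketch of what such a prompt builder could look like (the wording and the helper are illustrative assumptions, not the paper's actual prompts):

```python
# Sketch of background-information injection for zero-shot SiMT prompting.
# The prompt wording and build_simt_prompt helper are illustrative assumptions,
# not reproduced from the paper.

def build_simt_prompt(source_prefix: str, target_prefix: str, background: str = "") -> str:
    """Assemble a zero-shot SiMT prompt, optionally with domain context."""
    context_block = f"Background (domain terminology):\n{background}\n\n" if background else ""
    return (
        f"{context_block}"
        "You are a simultaneous interpreter. Continue the translation of the "
        "partial source text. Output only the next target words, or nothing "
        "if more source context is needed.\n"
        f"Source so far: {source_prefix}\n"
        f"Translation so far: {target_prefix}\n"
    )

prompt = build_simt_prompt(
    source_prefix="The mitochondrial membrane potential",
    target_prefix="Das mitochondriale",
    background="Talk on cell biology; keep terms like 'membrane potential' technical.",
)
```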
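Similarly, the convSeq abstract describes spatiotemporal filters optimized by backpropagation so that their convolution with the spike raster peaks where a pattern recurs. A heavily simplified PyTorch sketch of that idea, with an assumed peakiness objective standing in for the published loss:

```python
# Toy sketch of backprop-optimized spatiotemporal filters for spike data.
# Shapes and the variance-based objective are illustrative assumptions;
# see the convSeq paper for the actual formulation.
import torch

n_neurons, n_timebins, filt_len = 100, 5000, 50
spikes = (torch.rand(1, n_neurons, n_timebins) < 0.02).float()  # binary raster

# One learnable spatiotemporal filter: neurons x time window.
filt = torch.randn(1, n_neurons, filt_len, requires_grad=True)
opt = torch.optim.Adam([filt], lr=1e-2)

for step in range(200):
    response = torch.conv1d(spikes, filt)  # shape (1, 1, T - filt_len + 1)
    loss = -response.var()                 # encourage a few sharp response peaks
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        filt /= filt.norm()                # fix filter norm so the objective stays bounded
```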