Welcome to the Telugu Natural Language Processing (NLP) Project! This open-source initiative is dedicated to developing resources and tools for language learners, teachers, linguists, and researchers in the Telugu language community. Our mission is to enhance the capabilities and accessibility of NLP technologies for Telugu.
To create a comprehensive database of publicly available linguistic resources for Telugu, including dictionaries, grammar guides, texts, audio-visual materials, and research papers, and ensure they are accessible and properly formatted for NLP applications.
To develop a range of interactive educational tools, such as language learning apps, grammar checkers, and writing aids, specifically designed to support and enhance the learning experience for Telugu learners and educators.
To significantly improve the machine translation capabilities for Telugu, facilitating better translation both from and into other languages, thus enhancing communication and understanding.
To provide advanced tools and comprehensive datasets for linguistic research, including sophisticated corpus analysis tools and extensive databases, to document and analyze the unique linguistic features of Telugu.
To leverage the platform to actively promote and preserve the Telugu language and culture on a global scale, thereby contributing to its broader appreciation and recognition.
These goals aim to foster a vibrant ecosystem around the Telugu language, enabling better learning, research, and cultural exchange.
- Telugu Dictionary Words by Anusha Motamarri.
- Telugu Newspaper Article Dataset by Anusha Motamarri.
- Telugu News Articles by Shubham Jain.
- Telugu Books Dataset by Anusha Motamarri.
- Telugu Wikipedia Dataset by Shubham Jain.
- Parallel Corpus for Indian Languages by Kartik.
- Indic NLP Catalog by AI4Bharat.
- Indic Tagger (Indian Language Tagger) by Avinesh PVS.
- All words in all languages - Telugu by Eymen Efe Altun.
- Thousand most common Telugu words by Samuel Menigat
- Likitham - Repo containing scripts and datasets for processing Telugu language data by Chillar Anand.
- LOIT (Lot of Indic Tweets) by Bedapudi Praneeth
- English Telugu Bilingual Sentence Pairs by Sai Kumar Yava. The English-Telugu-Bilingual-Sentence-Pairs dataset contains English sentences translated into Telugu, 155,798 sentences in total.
- Telugu Terms by etymology by Wiktionary
- Word Level Language Identification in English Telugu Code Mixed Data
- EN-TE Transliteration Dataset
- Code Switched Papers
- A Tale of Two Languages: The Code Mixing story by Arindam Chatterjee
1. [en_te_wiki_titles](https://github.com/notAI-tech/Datasets/releases/download/En-Te_Transliteration/v1.en_te_wiki_titles.txt):
contains 13,811 en-te word pairs, generated from Wikipedia by comparing the titles of parallel articles.
2. [ni_bondha_comments](https://github.com/notAI-tech/Datasets/releases/download/En-Te_Transliteration/v1.ni_bondha_comment_words.txt):
contains 24,757 en-te word pairs.
The English versions of the Telugu words are taken from the subreddit [r/Ni_Bondha](https://www.reddit.com/r/Ni_Bondha/).
The corresponding Telugu words are obtained by ranking transliterations of the subreddit comments from multiple models and APIs,
using a [flair](https://github.com/zalandoresearch/flair)-based character language model trained on Telugu text.
Please note that the English words are not lower-cased in this data. Since the English words are human-written, we decided to retain the capitalization information in this release. Only punctuation was removed.
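
The ranking step described above (scoring candidate transliterations with a character language model and keeping the best one) can be illustrated with a small sketch. This is not the dataset authors' code: the flair character LM is replaced here by a tiny add-one-smoothed character bigram model, and the names `CharBigramLM` and `rank_candidates` as well as the toy training words are assumptions made purely for illustration.

```python
import math
from collections import Counter


class CharBigramLM:
    """Toy character bigram model with add-one smoothing.

    A stand-in for the flair character LM mentioned above, not a
    reimplementation of it.
    """

    def __init__(self, corpus_words):
        self.bigrams = Counter()   # counts of (previous char, current char)
        self.contexts = Counter()  # counts of each char used as a context
        self.vocab = set()
        for word in corpus_words:
            chars = ["<s>"] + list(word) + ["</s>"]
            self.vocab.update(chars)
            for prev, cur in zip(chars, chars[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def score(self, word):
        """Average log-probability per character transition (higher is better)."""
        chars = ["<s>"] + list(word) + ["</s>"]
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen characters
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            prob = (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + v)
            total += math.log(prob)
        return total / (len(chars) - 1)


def rank_candidates(lm, candidates):
    """Order candidate transliterations from most to least plausible."""
    return sorted(candidates, key=lm.score, reverse=True)


if __name__ == "__main__":
    # Hypothetical training words; a real model would be trained on a large Telugu corpus.
    lm = CharBigramLM(["తెలుగు", "తెలుసు", "తెలుపు", "వెలుగు"])
    # Hypothetical candidates, e.g. outputs of different transliteration systems.
    print(rank_candidates(lm, ["గుతెలు", "తెలుగు"]))
```

Averaging the log-probability over transitions keeps longer candidates from being penalized merely for their length; a real setup would also need to decide how to combine the LM score with each system's own confidence.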
- IndicASR (Speech Recognition for Indian Languages) by notAI-tech.
- Topic Modeling/Extraction for Telugu articles by Nirupam Purushothama - Medium (Topic Modeling — 2: Performing LDA on Telugu (తెలుగు) Articles).
- Telugu Text classification — Part 1 by Pradeep Miriyala.
- The Banti Framework (Comprehensive OCR System for Telugu Language)
- Telugu Tokenizer and Stemmer by chraghavendra
- Telugu Language — Lemmatization & POS Tag Extraction by Nirupam Purushothama
- Sangita. A Natural Language Toolkit for Indian Languages. (currently supports only Hindi).
- Program for tokenizing Indian language input by Anoop Kunchukuttan.
- Language Modeling for (తెలుగు) Telugu by Karthik Uppuluri.
- Telugu Experiments by Karthik Uppuluri
- Telugu Language Research Project by Luke Carlson.
- NLP for Telugu by Shubham Jain.
- TTD Selenium Crawler by Pradeep Miriyala.
- Telugu POS by Pradeep Miriyala.
- Deep Learning Language Model for Telugu Corpus using LSTM by Akanksha Telagamsetty.
- advertools
- UGC-NET/JRF Code 103 Indian Knowledge System (IKS) Syllabus by Heera Samvaya.
- Sentiment Analysis of Twitter Data using NLTK in Python.
- Text Analytics with Python.
- Project Chalam - Telugu Books.
- Memorize - Code and real data for "Enhancing Human Learning via Spaced Repetition Optimization", PNAS 2019
- మాతృభాషే ఎందుకు?
- The digital language divide
- Creative Writing and Translation - An interdisciplinary approach
- Large language models: A guide on its benefits, use cases, and types
- Bhasha - MT.
- Tirumala Tirupati Devasthanams - TTD E-books.
- archive.org - Telugu : Books by Language.
- Free Gurukul - ఉచిత గురుకుల విద్య ఫౌండేషన్.
- స్తోత్రనిధి - A project to collect Sanskrit stotras and translate them into Telugu.
- ai4bharat
  - Areas
    - Translation
    - Transliteration
    - Speech Recognition
    - Language Understanding
    - Language Generation
    - Sign Language
    - Text to Speech
    - Shoonya
    - Chitralekha
    - Anuvaad
  - Applications
    - Shoonya - https://ai4bharat.iitm.ac.in/shoonya/
    - Chitralekha - https://ai4bharat.iitm.ac.in/chitralekha/
    - Anuvaad - https://ai4bharat.iitm.ac.in/anuvaad/
  - Data Collection
  - Models
- https://niceorg.in/
- హార్ట్ ఫుల్ నెస్ - ప్రేమతో పురోగమనం - ప్రేమపూర్వక సంభాషణ
- సైకో థెరపీ అంటే ఏమిటి?
- కళ ఆధారిత అభ్యాసన
- FDR - Foundation for Democratic Reforms
- వ్యవసాయం - నష్టాల బాట నుంచి లాభాల బాటలోకి - Dr. జయప్రకాష్ నారాయణ
- ప్రధాన మంత్రి మైక్రో ఫుడ్ ప్రాసెసింగ్ ఎంట్రప్రెస్స్ స్కీం
- Charaka Samhita Vimanasthanamu
- Farming a Few Acres of Herbs: An Herb Growers Handbook
- Herbal Gardens
- Natural Farming: Oriental Herbal Nutrient
- Ayurveda Offering - Herbal Healing
- Ayurvedic Business Opportunity in India
- The ayurvedic medicine industry: Current status and sustainability
- Exploring Potential for Medicinal Plants Cultivation in Telangana
- Self-employment - agcas
- Bizup - Self-employment skills for young people
- We are all self-employed
- The Skills You Need Guide to SELF-EMPLOYMENT AND RUNNING YOUR OWN BUSINESS
- Self-employment, Small Firms and Enterprise
- The Handbook of Research on Freelancing and Self-Employment
- Enhancing human learning via spaced repetition optimization by Behzad Tabibian, Utkarsh Upadhyay, Abir De, Ali Zarezade, Bernhard Schölkopf, and Manuel Gomez-Rodriguez - Networks Learning Group, Max Planck Institute for Software Systems, 67663 Kaiserslautern, Germany; and Empirical Inference Department, Max Planck Institute for Intelligent Systems, 72076 Tübingen, Germany. Edited by Richard M. Shiffrin, Indiana University, Bloomington, IN, and approved December 14, 2018 (received for review September 3, 2018).
- Memorize - Code and real data for "Enhancing Human Learning via Spaced Repetition Optimization", PNAS 2019
- Hindi Shabdamitra - A Wordnet based Tool for Enhancing Teaching-Learning Process by Hanumant Redkar, Nilesh Joshi, Sayali Khare, Lata Popale, Malhar Kulkarni and Pushpak Bhattacharyya - Center for Indian Language Technology, Indian Institute of Technology Bombay, India.
- Hindi Shabdamitra - A Wordnet based E-Learning Tool for Language Learning and Teaching by Hanumant Redkar, Sandhya Singh, Meenakshi Somasundaram, Dhara Gorasia, Malhar Kulkarni and Pushpak Bhattacharyya - Center for Indian Language Technology, Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai, India.
- Using Parallel Corpora for Language Learning by Michael H. Brown, Kanda Institute of Foreign Languages in Tokyo, Japan.
- (another article) Language Learning via Parallel Corpora.
- Learning in Parallel: Using Parallel Corpora to Enhance Written Language Acquisition at the Beginning Level by Brody Bluemel, The Pennsylvania State University.
- CoMeT: Towards Code-Mixed Translation Using Parallel Monolingual Sentences by Devansh Gautam, Prashant Kodali, Kshitij Gupta, Anmol Goel, Manish Shrivastava, Ponnurangam Kumaraguru - International Institute of Information Technology Hyderabad & Indraprastha Institute of Information Technology Delhi & Guru Gobind Singh Indraprastha University, Delhi.
- The LTRC Hindi-Telugu Parallel Corpus by Vandan Mujadia, Dipti Misra Sharma, MT-NLP Lab, LTRC, KCIS, IIIT-Hyderabad, India.
- Authors: Sofie Kastelli, Napsugár Takács. Supervisor: Ulf Linnman. Field of research: Informatics. Date: 1st of June 2023. Jönköping University, 2023.
- The Duolingo Method for App-based Teaching and Learning by Cassie Freeman, Audrey Kittredge, Hope Wilson, and Bozena Pajak - Duolingo Research Report.
- A Novel Approach to Telugu Stemming Using N-gram Process by N.V. Ganapathi Raju (Associate Professor, Dept. of CSE, GRIET, Hyderabad, India), Chinta Someswara Rao (Assistant Professor, Dept. of CSE, SRKR Engineering College, Bhimavaram, India) and G. Meghana (P.G. Scholar, GRIET, Hyderabad, India).
- Telugu OCR Framework using Deep Learning by Rakesh Achanta and Trevor Hastie - Stanford University.
- Building specialised corpora for translation studies by Sattar Izwaini, Centre for Computational Linguistics, UMIST, PO Box 88, Manchester M60 1QD, UK.
- Building parallel corpora for eContent professionals by M. Gavrilidou, P. Labropoulou, E. Desipri, V. Giouli, V. Antonopoulos, S. Piperidis, Institute for Language and Speech Processing.
- Text Simplification - Building a Monolingual Parallel Corpus for Text Simplification Using Sentence Similarity Based on Alignment between Word Embeddings.
- Text Summarisation - Using Parallel Corpora for Multilingual (Multi-document) Summarisation Evaluation.
- Co-Writing Screenplays and Theatre Scripts with Language Models.
- Identifying Context-Dependent Translations for Evaluation Set Production by Rachel Wicks (Human Language Technology Center of Excellence and Center of Language and Speech Processing, Johns Hopkins University) and Matt Post (Human Language Technology Center of Excellence and Center of Language and Speech Processing, Johns Hopkins University; Microsoft).
- Enabling Code-Mixed Translation: Parallel Corpus Creation and MT Augmentation Approach by Mrinal Dhar, Vaibhav Kumar, and Manish Shrivastava - IIIT Hyderabad, Gachibowli, Hyderabad, Telangana, India.
- Sentiment Analysis in Code-Mixed Telugu-English Text with Unsupervised Data Normalization by Kusampudi Siva Subrahamanyam Varma, Preetham Sathineni, and Radhika Mamidi - Language Technologies Research Centre, IIIT Hyderabad, India.
- By K. Parameswari, Centre for Applied Linguistics and Translation Studies, University of Hyderabad.
- A Rule-based Dependency Parser for Telugu: An Experiment with Simple Sentences by Sangeetha P., Parameswari K., and Amba Kulkarni.
- Computational Morphology for Telugu by B. Srinivasu and R. Manivannan - Department of Computer Science and Engineering, Stanley College of Engineering and Technology for Women, Hyderabad 500001, India.
- Neural Dependency Parsing of Low-resource Languages: A Case Study on Marathi
- Telugu dependency parsing using different statistical parsers
- Two-stage Approach for Hindi Dependency Parsing Using MaltParser
- Hindi Dependency Parsing using a combined model of Malt and MST
- Ensembling Various Dependency Parsers: Adopting Turbo Parser for Indian Languages
- TeSum: Human-Generated Abstractive Summarization Corpus for Telugu
- The Use of Corpora in Translation Into the Second Language: A Project-Based Approach
- The Duolingo Method for App-based Teaching and Learning (whitepaper)
- A Conceptual Framework for Interactive Cartographic Storytelling
- How to build a controllable writing assistant for novel authors - Use Transfer Learning and OpenAI GPT2 to build a state-of-the-art text generation tool, embedded in an open source interface
- Unsupervised Stemming based Language Model for Telugu Broadcast News Transcription
- The Linguacuisine Project: A Cooking-based Language Learning Application
- How to Use Your Love of Cooking and Food to Learn a New Language