Skip to content

austin-agronick/CSC211-language-project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Design Document for CSC 211 Final Project: Language Detection Daniel Aleksa [email protected] Austin Agronick [email protected]

Problems to Solve: Milestone 1: -Given a string of text, create a vector of integers that stores the frequency of each trigram

Milestone 2: -Create a compile script -Implement file IO -Implement the similarity formula -Distinguish which file's frequency profile is most similar to test file's

Classes Needed: Milestone 2: -LANG class that represents the language data -private instance variable for language data of type std::string -Need a constructor method to create an object from a Language filename -Needs accessors for the instance variables on which frequency evaluations need to be done

Other Functions: Milestone 1: -getFrequency() to get a frequency profile -takes string of language data and starting at first index, incrementing by one and finding the trigram value for every 3 characters -add one to the frequency counter vector at that trigram value index

Milestone 2:
-getData() to convert the language file to a string
    -Increment through language file adding each character to a string with the exception of newline characters
-compareFrequencies() to compare language object's trigram frequencies
    -Implement the cosine similarity formula to determine which training file's frequency profile is most similar to the given test file's

Files Needed: -language.h and language.cpp for LANG -main.cpp to test for language detection

Libraries Needed: -vector -string -iostream -ifstream -cmath

Compile Script -Will need to compile language.cpp and main.cpp -Will need to make sure not to optimize code before debugging by using the flag -g and not -02

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published