Sai: Lowercase converter in Python #190
base: master
Changes from all commits
341c74c
84c113a
3aefce8
702f028
5c4a138
ea00c69
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
|
||
def to_lowercase(word: str, language: str): | ||
""" | ||
word: str, the string to be converted to lowercase | ||
language: str, the language of the string, in BCP 47 format | ||
""" | ||
result = "" | ||
|
||
if language.startswith(("zh", "th", "ja")): | ||
return word.lower() | ||
Review comment: And the point in these cases was to not bother calling lower() at all, as an optimization. |
||
|
||
for idx, letter in enumerate(word): | ||
|
||
lower_letter = letter.lower() | ||
if language == 'tr' or language == 'az': | ||
if letter == 'I': | ||
lower_letter = 'ı' | ||
elif language.startswith(('gd', 'gv', 'ga')): | ||
Review comment: Same startswith issue as above; this won't work. |
||
is_2nd_letter = idx == 1 | ||
is_exception_letter = letter in [ | ||
'A', 'E', 'I', 'O', 'U', 'Á', 'É', 'Í', 'Ó', 'Ú', "Ó"] | ||
is_letter_o_latin = ord(letter) in [211] | ||
Review comment: Magic numbers, here and 771 below! Unreadable. |
||
is_beginning_exception = word[0] in ['n', 't'] | ||
is_not_last = len(word)-idx > 1 | ||
if is_2nd_letter and (is_exception_letter or is_letter_o_latin) and is_beginning_exception and (is_not_last and ord(word[idx+1]) != 771): | ||
Review comment: The Unicode business needs a bit more work; will discuss in class. |
||
lower_letter = "-"+letter.lower() | ||
elif language.startswith('el'): | ||
if letter == 'Σ' and idx == len(word)-1: | ||
lower_letter = 'ς' | ||
|
||
result += lower_letter | ||
|
||
return result | ||
|
||
|
||
with open("tests.tsv", "r", encoding="utf-8") as f: | ||
tests = f.read().splitlines() | ||
|
||
num_correct = 0 | ||
for test in tests: | ||
word, language, actual = test.split("\t") | ||
predicted = to_lowercase(word, language) | ||
if predicted != actual: | ||
print(f"COuldn't convert {word} in {language}!") | ||
Review comment: Small typo. |
||
print(f"Actual: {actual}") | ||
print(f"Predicted: {predicted}") | ||
else: | ||
num_correct += 1 | ||
|
||
print(f"Successfully completed {num_correct}/{len(tests)} tests") |
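The review notes on magic numbers and Unicode handling could be addressed roughly as follows. This is only a sketch, not the author's code: the constant and function names are hypothetical, and it assumes the bare values 211 and 771 in the diff are meant as the code points of Ó (U+00D3) and the combining tilde (U+0303). It uses the standard `unicodedata` module to name the characters and to normalize input to NFC, so that a decomposed "O" followed by a combining acute accent compares equal to the precomposed "Ó".

```python
import unicodedata

# Named constants in place of the bare code points 211 and 771.
# The constant names are hypothetical; the character names are the
# standard Unicode character names.
CAPITAL_O_ACUTE = unicodedata.lookup("LATIN CAPITAL LETTER O WITH ACUTE")  # U+00D3
COMBINING_TILDE = unicodedata.lookup("COMBINING TILDE")  # U+0303

# Precomposed uppercase vowels: A E I O U and their acute-accented forms.
UPPERCASE_VOWELS = frozenset("AEIOU\u00c1\u00c9\u00cd\u00d3\u00da")


def is_uppercase_vowel(letter: str) -> bool:
    """Check membership after NFC normalization, so a decomposed
    'O' + COMBINING ACUTE ACCENT matches the precomposed form."""
    return unicodedata.normalize("NFC", letter) in UPPERCASE_VOWELS
```

With constants like these, a check such as `ord(word[idx+1]) != 771` could instead read `word[idx+1] != COMBINING_TILDE`, which says what is being tested.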
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
### b-sai lowercase converter | ||
|
||
This is a simple tool to convert uppercase letters to lowercase letters in a text file in any language | ||
|
||
To run the python script simply run ```python3 main.py``` from the S23/b-sai/ directory. |
Review comment: BCP 47 permits 3-letter language codes, so "startswith" won't work here; e.g. "jam" is "Jamaican Creole English".
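One way to avoid the prefix problem is to compare the primary language subtag exactly rather than testing a string prefix. The sketch below assumes tags use "-" as the subtag separator; the helper names `primary_language` and `is_caseless` are hypothetical, and the set of caseless languages is copied from the tuple in the diff.

```python
# Languages the diff treats as having no case distinction.
CASELESS_LANGUAGES = frozenset({"zh", "th", "ja"})


def primary_language(tag: str) -> str:
    """Return the lowercased primary language subtag of a language tag."""
    return tag.split("-", 1)[0].lower()


def is_caseless(tag: str) -> bool:
    """Exact match on the primary subtag, not a string prefix test."""
    return primary_language(tag) in CASELESS_LANGUAGES


print("jam".startswith("ja"))  # True: the prefix test wrongly matches "jam"
print(is_caseless("jam"))      # False
print(is_caseless("ja-JP"))    # True
```

When `is_caseless` returns true, the function could return `word` unchanged instead of calling `lower()`, which is the optimization the earlier review comment asks for.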