-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sai: Lowercase converter in Python #190
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I had a hard time finding anything to improve on!
S23/b-sai/main.py
Outdated
if letter == 'I': | ||
lower_letter = 'ı' | ||
elif language.startswith(('gd', 'gv', 'ga')): | ||
if idx == 1 and (letter in ['A', 'E', 'I', 'O', 'U', 'Á', 'É', 'Í', 'Ó', 'Ú', "Ó"] or ord(letter) in [211]) and word[0] in ['n', 't'] and (len(word)-idx >= 2 and ord(word[idx+1]) != 771): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this conditional statement is a little long - it took me a minute to tease out what it's saying. Consider breaking it up on multiple lines or adding some inline documentation to explain.
S23/b-sai/main.py
Outdated
elif language.startswith('el'): | ||
if letter == 'Σ' and idx == len(word)-1: | ||
lower_letter = 'ς' | ||
elif language.startswith(("zh", "th", "ja")): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not going to save a ton of time, but if you aren't making any changes to the letters in these languages -- you don't need to loop through each character. Consider moving this conditional to the top of your loop and exiting early.
""" | ||
result = "" | ||
|
||
if language.startswith(("zh", "th", "ja")): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 3-letter language code permitted in BCP-47, so "startswith" won't work here, e.g. "jam" is "Jamaican Creole English".
result = "" | ||
|
||
if language.startswith(("zh", "th", "ja")): | ||
return word.lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the point was in these cases to not bother calling lower() as an optimization.
if language == 'tr' or language == 'az': | ||
if letter == 'I': | ||
lower_letter = 'ı' | ||
elif language.startswith(('gd', 'gv', 'ga')): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as above; this won't work.
is_2nd_letter = idx == 1 | ||
is_exception_letter = letter in [ | ||
'A', 'E', 'I', 'O', 'U', 'Á', 'É', 'Í', 'Ó', 'Ú', "Ó"] | ||
is_letter_o_latin = ord(letter) in [211] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic numbers, here and 771 below! Unreadable.
word, language, actual = test.split("\t") | ||
predicted = to_lowercase(word, language) | ||
if predicted != actual: | ||
print(f"COuldn't convert {word} in {language}!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small typo.
is_letter_o_latin = ord(letter) in [211] | ||
is_beginning_exception = word[0] in ['n', 't'] | ||
is_not_last = len(word)-idx > 1 | ||
if is_2nd_letter and (is_exception_letter or is_letter_o_latin) and is_beginning_exception and (is_not_last and ord(word[idx+1]) != 771): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Unicode business needs a bit more work; will discuss in class.
No description provided.