Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sai: Lowercase converter in Python #190

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

b-sai
Copy link

@b-sai b-sai commented Feb 16, 2023

No description provided.

Copy link

@austin-carnahan austin-carnahan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I had a hard time finding anything to improve on!

if letter == 'I':
lower_letter = 'ı'
elif language.startswith(('gd', 'gv', 'ga')):
if idx == 1 and (letter in ['A', 'E', 'I', 'O', 'U', 'Á', 'É', 'Í', 'Ó', 'Ú', "Ó"] or ord(letter) in [211]) and word[0] in ['n', 't'] and (len(word)-idx >= 2 and ord(word[idx+1]) != 771):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this conditional statement is a little long - it took me a minute to tease out what it's saying. Consider breaking it up on multiple lines or adding some inline documentation to explain.

elif language.startswith('el'):
if letter == 'Σ' and idx == len(word)-1:
lower_letter = 'ς'
elif language.startswith(("zh", "th", "ja")):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not going to save a ton of time, but if you aren't making any changes to the letters in these languages -- you don't need to loop through each character. Consider moving this conditional to the top of your loop and exiting early.

"""
result = ""

if language.startswith(("zh", "th", "ja")):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 3-letter language code permitted in BCP-47, so "startswith" won't work here, e.g. "jam" is "Jamaican Creole English".

result = ""

if language.startswith(("zh", "th", "ja")):
return word.lower()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the point was in these cases to not bother calling lower() as an optimization.

if language == 'tr' or language == 'az':
if letter == 'I':
lower_letter = 'ı'
elif language.startswith(('gd', 'gv', 'ga')):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above; this won't work.

is_2nd_letter = idx == 1
is_exception_letter = letter in [
'A', 'E', 'I', 'O', 'U', 'Á', 'É', 'Í', 'Ó', 'Ú', "Ó"]
is_letter_o_latin = ord(letter) in [211]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic numbers, here and 771 below! Unreadable.

word, language, actual = test.split("\t")
predicted = to_lowercase(word, language)
if predicted != actual:
print(f"COuldn't convert {word} in {language}!")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo.

is_letter_o_latin = ord(letter) in [211]
is_beginning_exception = word[0] in ['n', 't']
is_not_last = len(word)-idx > 1
if is_2nd_letter and (is_exception_letter or is_letter_o_latin) and is_beginning_exception and (is_not_last and ord(word[idx+1]) != 771):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Unicode business needs a bit more work; will discuss in class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants