Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR: classes and subclasses #4

Draft
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

ElmoGeeraerts
Copy link

Hi Mariana,

I started on the assignment. The idea is to write a script that helps Translation Project Managers add vendors to a local database and keep a clear overview of preferred/back-up/potential classes. I have created a class and two subclasses

Best regards,
Elmo

vendor_data.py Outdated
VendorMail should be a string, if not a TypeError is raised.
TargLang should be a string, if not a TypeError is raised.
WordRate should be a float, if not a TypeError is raised.
WordRate should be > 0.15, if not a ValueError is raised.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WordRate should NOT be > 0.15, right?

vendor_data.py Outdated
TargLang should be a string, if not a TypeError is raised.
WordRate should be a float, if not a TypeError is raised.
WordRate should be > 0.15, if not a ValueError is raised.
CatTool should be a string, if not a TypeError is raised.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one can probably have a limited range of options, right?

Copy link
Owner

@montesmariana montesmariana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibility of using input():

done = "no"
name = ""
feeling = "super sad"
while done.lower() != "yes":
    new_name = input(f"What's your name? (Previous value: {name}) ")
    name = new_name if new_name else name
    feeling = input(f"What are you feeling? (Previous value: {feeling}) ")
    done = input("Are you done? ")
print(f"{name} says: '{feeling}'")

@ElmoGeeraerts ElmoGeeraerts marked this pull request as draft May 21, 2023 16:39
vendor_data.py Outdated
self.VendorMail = VendorMail
self.CatTool = CatTool

regex_mail = "^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$" #regex used to validate email
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested this? It might need an r before the string to make it a raw string.

vendor_data.py Outdated
raise TypeError("TargLang should be a string!")
if type(WordRate) != float:
raise TypeError("WordRate should be a float!")
if WordRate:
Copy link
Owner

@montesmariana montesmariana May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you don't have WordRate, line 64 is already going to fail. In any case, WordRate isn't optional, so there will always be WordRate...

vendor_data.py Outdated
raise TypeError("WordRate should be a float!")
if WordRate:
if WordRate > 0.15:
raise ValueError("This vendor is too expensive, consider another one.")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an error or a warning?
An error stops execution and the vendor cannot be added. A warning lets you add the vendor but just warns the user that something may be wrong. I ask this because the phrasing of the message "consider another one" is more warning-like than error-like.

"WordRate" (float): Vendor's word rate for new words in Euro
"CatTool" (str): Vendor's preferred CAT-tool
"""
self.VendorName = VendorName
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assigning the arguments to the attributes should be done AFTER validation.

Copy link
Owner

@montesmariana montesmariana May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Or during, which is what you are doing at the validation.)

vendor_data.py Outdated
if type(VendorMail) != str:
raise TypeError ("VendorMail should be a string!")
if VendorMail:
if(re.search(regex_mail,VendorMail)):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just: if type(VendorMail) should be inside the if VendorMail part, because it fails if there is no VendorMail)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, now that I see a bit below, lines 78-81 can be replaced with self.set_mail(VendorMail), and line 45 can be deleted.

vendor_data.py Outdated
if CatTool in VendorData.CatTools:
self.CatTool = CatTool
else:
raise ValueError("This CAT tool is not valid. Run 'VendorData.CatTools' to check options.")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

vendor_data.py Outdated
df.to_excel(f"{filename}.xlsx", index=False, sheet_name="VendorData")
else:
with pd.ExcelWriter(f"{filename}.xlsx", mode="a", engine="openpyxl", if_sheet_exists="overlay") as writer:
df.to_excel(writer, sheet_name = "VendorData", startrow=writer.sheets["VendorData"].max_row, header = None, index=False)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
I would split lines 109 and 110 in different lines like you did with the dictionary right above.
And I would approach this mergin in a different, safer way. (To talk in person)

parser.add_argument("--WordRate", type=float, help="The vendor's word rate in Euro")
parser.add_argument("--VendorMail", type=str, help="The vendor's email address")
parser.add_argument("--CatTool", type=str, help="The CAT-tool the vendor wants to use")
args = parser.parse_args()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can create a flag argument with the action attribute: https://docs.python.org/3/library/argparse.html#action
For example:

parser.add_argument('-a', '--add', action='store_true')

Then args.add is True if the user calls -a or --add and False otherwise.

vendor_data.py Outdated
raise ValueError("Word rate cannot be 0.00.")
raise ValueError("This vendor is too expensive, pick another one.")
if WordRate == 0.00:
raise ValueError("Word rate cannot be 0.00.")
else:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This else is aligned with if WordRate, meaning that self.WordRate = WordRate only runs if you don't have WordRate.

You should actually remove the else and leave self.WordRate = WordRate aligned with the ifs in lines 77-81.
That way, the logic goes like so:

  • if there is a value for WordRate:
    • if it is not a float, throw an error
    • (otherwise) if it is larger than 0.15, throw an error
    • (otherwise) if it is equal to 0, throw an error
    • (otherwise) assign its value to self.WordRate
      Because the if statements throw an error, you don't need elif or else, you can just assume that if there was no error it will move forward.

README.md Outdated
The repository contains:
- The `README.md` file which describes the repository and illustrates how to use the code as a module and by running the script;
- The `tutorial.md` and `tutorial.ipynb` files which show how to actually use the code and the different functionalities;
- Two `xlsx` files which were generated while creating the `tutorial.ipynb` file. Be aware that if you run the code in the tutorial notebook, some data will be duplicated in the excel if you do not change anything. To avoid this you can delete the two excel files from the directory or you can change the data in the tutorial code to fit your needs.;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also add a line of code yourself to make sure the file is deleted when it should be:

import os
os.remove('path/to/the/file.xlsx')

README.md Outdated
```

Or in Jupyter notebook with:

```python
%run vendor_data.py <example.json>
%run vendor_data.py <"-a/--add or -m/--modify">
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In both cases, because you have two mutually-exclusive arguments, I would make to chunks to show. That way, a user can copy your code. So: one line showing how to call your script to add an entry, and another line showing how to call your script to modify an entry.

In both cases `example.json` stands for the `filename` argument that the script needs. You can use [the file in this repository](example.json) or a similar file of yours. Find more information on how this script works with:
Running vendor_data.py without either the argument -a (to add a new vendor) or -m (to modify an existing vendor) will not do anything. The argument -a prompts the user to add a new vendor. The data is provided by answering some questions regarding the vendor and the project. The argument -m prompts the user to modify an existing vendor. Again the data is provided by answering some questions regarding the vendor and the project.

For more information you can run the command below, or check in the notebook tuturial.ipynb in this repository under "Running `vendor_data.py`".
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good!

1. Easily create an excel file with the project name and the source language as filename;
2. Write the following data to the excel file: the target language and the translator's name, e-mail, word rate, preferred CAT tool and the status of the translator;
3. Modify the following data for the vendors already in the excel file: e-mail, word rate, preferred CAT tool and status.
The main advantage of this script is that it limits the posibilities of CAT tools and statuses and that it validates e-mail and word rate (betw. 0.01 and 0.15). This prevents the excel file from becoming messy when different people are working on the project.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXCELLENT

3. Modify the following data for the vendors already in the excel file: e-mail, word rate, preferred CAT tool and status.
The main advantage of this script is that it limits the posibilities of CAT tools and statuses and that it validates e-mail and word rate (betw. 0.01 and 0.15). This prevents the excel file from becoming messy when different people are working on the project.

## What this script CANNOT do
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good caveat

vendor_data.py Outdated
import os #os package to check if files exists
import pyinputplus as pyip #pyinputplus package to facilitate providing arguments

#4 methods to make validation easier
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outside of a class: function
inside of a class: method

raise TypeError("WordRate should be a float!")
if WordRate > 0.15:
raise ValueError("This vendor is too expensive, pick another one.")
if WordRate == 0.00:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it's -0.04?

vendor_data.py Outdated
if type(CatTool) != str:
raise TypeError("CatTool should be a string!")
if not CatTool in ["XTM", "Trados Studio", "MemoQ", "Memsource"]:
raise ValueError("This CAT tool is not valid. Run 'VendorData.CatTools' to check options.")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix indentation

vendor_data.py Outdated
"""
if type(CatTool) != str:
raise TypeError("CatTool should be a string!")
if not CatTool in ["XTM", "Trados Studio", "MemoQ", "Memsource"]:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid repetition (if you wanted to add/remove a CatTool, now you have to do it in two places) either provide the list of CatTools as an argument of this function or call it as VendorData.CatTools.

self.WordRate = WordRate
self.VendorMail = VendorMail
self.CatTool = CatTool
self.Preferred = Preferred
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole __init__() looks much better and cleaner now :)

vendor_data.py Outdated
raise ValueError("Invalid index, run 'self.ReadExcel()' to see options")
if Key == "E-mail": #validate e-mail address if a value for the key E-mail is modified
CheckVendorMail(NewValue)
if Key == "Word Rate": #validate word rate if a value for the key Word Rate is modified
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all the Key ==... checks, I would chain them with elif. Otherwise, it will check if Key == "Word Rate" even if it already say that it is "E-mail"

Vndr=VendorData(ProjectName, SourceLang, TargLang, VendorName, VendorMail, WordRate, CatTool, Preferred)
Excel = pyip.inputStr("Do you want to add this vendor to the excel file? ") #user gets the chance to write data to excel
if Excel.lower() == "yes":
Vndr.ToExcel() #data written to excel
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add something like:

    print("Vendor was added.")
else:
    print("Action cancelled.")

So the user has confirmation of what happened

vendor_data.py Outdated
else:
raise ValueError(f"There is no file for {self.ProjectName} in {self.SourceLang}")

def ModExcel(self, Key, Index, NewValue):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job with this method.
BUT. This means that if a user has changed two or three values in an entry, you will open and close the Excel file two or three times, one for each value that you want to change.
If you think that users will not change more than one thing at a time, it's ok, but it might make more sense to collect all the new values and update them in one opening-closing of the file.

tutorial.md Outdated

```python
import sys
sys.path.append ('vendor_data.py') #this is the relative path to the script. If the script is not directly next to the notebook, it could be better to copy the full path of the notebook.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, you don't add the path TO the script, but to the folder where the script lives

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It seems to work anyways, but normally you append the directory)

tutorial.md Outdated

### Docstrings
The script is completely documented with docstrings in order to ensure that the script is correclty used.
You can read these docstrings by calling one of the following functions:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not leave this output, it is too much automatic text for a tutorial.
You can show the code without running it by adding it inside the Markdown cell, with ```python right before and ``` afterwards.
For the tutorial, think of what you would like to find as a user that doesn't know about the module and is just starting. If anything, this kind of "here you have more information" could go at the end, but starting a tutorial with this can be a bit offputting for a user.

tutorial.md Outdated



Some arguments had default arguments. When we check the e-mail for VendorNL, for which we didn't provide one, we'll see that it does not return anything:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some arguments have default values.

I wouldn't say an empty string is a default value from the user's perspective. It's rather allowed to be empty.




The value for the argument Preferred can be True, False or None. This has an influence of the vendor's status.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good!

tutorial.md Outdated



### Functions
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functions inside a class are called "methods".

tutorial.md Outdated



We can now write the data for the vendors to an excel file. Since every vendor has the value for the argument `ProjectName` and `SourceLang`, they will be written to the same excel file. We should execute this function for every vendor. Be careful because if you run this function multiple times for the same vendor, it will not overwrite but append and you will have duplicates in the excel file.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since every vendor has the same value...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that you can generate duplicates is a weakness in the code.
You're doing a great job, so if you feel that you cannot fix this yet (you could, by checking for the existence of an entry before adding it), at least add it to the list of things that could/should be improved in the future.
You could also make this very explicit in the markdown:

<div class="alert alert-danger">
If you run `.toExcel()` multiple times on the same vendor, it will generate multiple entries!
</div>

Which generates the following:
image



```python
vd.VendorData.Keys
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good!



```python
VendorES.CatTool
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm I find this counterintuitive. I would have the code update the class itself as well.

Copy link
Owner

@montesmariana montesmariana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants