Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Has api_update_gloss changed? #1360

Open
rem0g opened this issue Oct 29, 2024 · 24 comments
Open

Has api_update_gloss changed? #1360

rem0g opened this issue Oct 29, 2024 · 24 comments

Comments

@rem0g
Copy link
Collaborator

rem0g commented Oct 29, 2024

I am noticing a lot of errors regarding this gloss:

While submitting request to update phonology I am getting error Database locked at HUREN-C.

WHen submitting empty phonology feature such as:

"Handshape Change": ""

The handshape change is still not modified to empty value.

@susanodd
Copy link
Collaborator

It hasn't been changed. It's still the same.

(There are some other issues about the database getting locked.)

You want the empty value to erase the value?
There might be something to do with that, I will check.
You might need to use "-" as the value. Not an empty string.

@rem0g
Copy link
Collaborator Author

rem0g commented Oct 29, 2024

Ok, using "-" works! Thank you very much.

About gloss HUREN-C;

TransactionManagementError at /dictionary/api_update_gloss/5/49249/
An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.

This is what i have sent:

{"Handedness": "N/A", "Strong Hand": "-", "Weak Hand": "B", "Handshape Change": "-", "Relation Between Articulators": "-", "Location": "-", "Contact Type": "-", "Movement Shape": "-", "Movement Direction": "-", "Repeated Movement": "", "Alternating Movement": "", "Relative Orientation: Movement": "-", "Relative Orientation: Location": "-", "Orientation Change": "-", "Virtual Object": "-", "Phonology Other": "-", "Mouth Gesture": "-", "Mouthing": "-", "Phonetic Variation": "-", "Senses": "{"en": [["hire"], ["rent, lease"]], "nl": [["huren"], ["huren"]]}"}

@susanodd
Copy link
Collaborator

susanodd commented Oct 29, 2024

Repeated Movement and Alternating Movement are supposed to be Booleans.

from the code:

if new_value not in ['true', 'True', 'TRUE', 'false', 'False', 'FALSE', 'Neutral', 'None']:

The atomic block error would be caused by other errors. It can't do a save because probably some other things are saved.
(It usually gives that kind of error if it can't do a last save because other objects have not been saved.)
Try modifying those fields to "False" to see if it goes away.

(I'll have to try to simulate your error with extra print statements to see where it is blocking.)
It could be causing that error for example if the error wasn't caught properly (bug) and then it didn't save the gloss, but then tries to update the timestamp. Probably a try-except is missing on a save. It's supposed to roll-back if an exception occurs. But that is also a problem because there are so many objects. That the Boolean field value is wrong should cause the method to report errors and not update at all.

@rem0g
Copy link
Collaborator Author

rem0g commented Oct 29, 2024

Phonology Other fields dont seem to be updated too.

Modifing the values to the ones you mentioned didnt help, it is causing only by this gloss but it could happen at other glosses too.

@susanodd
Copy link
Collaborator

susanodd commented Oct 30, 2024

Phonology Other fields dont seem to be updated too.

Modifing the values to the ones you mentioned didnt help, it is causing only by this gloss but it could happen at other glosses too.

Does your error message mention a specific line of code?

It's true that if it finds an error in the input fields, it will not do the update. (It's implemented as all or nothing, otherwise the transaction would be inconsistent.)

Do you want Handedness to be N/A ? There is the Weak Hand set, so is that Handedness 1 or is it something else?
(Some of the query code excludes Handedness N/A from queries. Like for minimal pairs.)
(I don't know if there's anything for multiple handedness fields. That's not supported yet. Is that needed?)

@susanodd
Copy link
Collaborator

These fields:

"Virtual Object": "-", "Phonology Other": "-", "Mouth Gesture": "-", "Mouthing": "-", "Phonetic Variation": "-",

are strings, so you can just put empty strings (the ones that are choices need to be "-")

If you don't want to set any value, you can exclude them from the json dictionary.
(It is ambiguous as to whether an empty value is a clear or to just ignore the field. So if you want to ignore the field and leave it as it is, don't include it in the update.)
I will check this again based on the logs.

@susanodd
Copy link
Collaborator

susanodd commented Oct 30, 2024

I looked up the gloss on Signbank.
It looks like the gloss doesn't have any Phonology yet.
Can you remove the fields from the update that stay "empty" from the update json fields?
So it only includes fields that need to be set.
Then make Handedness be 1 and set the Strong Hand not the Weak Hand.
As far as I know, if Handedness is 1, then the Strong Hand is set. (@uklomp can confirm this.)
You should also remove the Senses fields since you are not changing those.

@susanodd
Copy link
Collaborator

susanodd commented Oct 30, 2024

I can't find any errors related to the HUREN gloss. So the code thinks it's working correctly.
I can merely see that you tried the update url numerous times. But it is not raising any errors in the log.
Farther back, I find this one:

Internal Server Error: /dictionary/api_update_gloss/5/49249/
django.db.transaction.TransactionManagementError: An error occurred in the current transaction. You can't execute queries until the end of the 'at[pid: 283|app: 0|req: 5442/11576] 10.208.155.251 () {62 vars in 1118 bytes} [Wed Jul 24 16:56:02 2024] POST /dictionary/api_update_gloss/5/49249/ => generated 282396 bytes in 517 msecs (HTTP/1.1 500) 5 headers in 178 bytes (1 switches on core 3)

(That's all it reports, nothing about what specifically. I'll add that exception to the code in order to catch it explicitly.)

In the logs, there is an error for ANTONI GAUDI

  File "/usr/lib/python3.12/urllib/request.py", line 1344, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.12/http/client.py", line 1331, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.12/http/client.py", line 1342, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/lib/python3.12/http/client.py", line 1179, in putrequest
    self._output(self._encode_request(request))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/http/client.py", line 1260, in _encode_request
    return request.encode('ascii')
           ^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'ascii' codec can't encode character '\xcd' in position 15: ordinal not in range(128)

The only place in the code where it explicitly uses "ascii" is in the __str__ method of GlossVideoHistory.
The code saves the API video operations to the history. So that is what is causing that error. Probably that ought to be utf-8.

For the zip files, you don't need to put the gloss annotation in the zipped filename. It can be something else, like videos_archive.zip
It gets the annotation from inside the unzipped folder.

@rem0g
Copy link
Collaborator Author

rem0g commented Oct 31, 2024

There is definitely something wrong with HUREN-C, as i am trying to send this:

{"Handedness": "1", "Strong Hand": "-", "Weak Hand": "-", "Handshape Change": "-", "Relation Between Articulators": "-", "Location": "-", "Contact Type": "-", "Movement Shape": "-", "Movement Direction": "-", "Repeated Movement": "False", "Alternating Movement": "False", "Relative Orientation: Movement": "-", "Relative Orientation: Location": "-", "Orientation Change": "-", "Virtual Object": "-", "Phonology Other": "-", "Mouth Gesture": "-", "Mouthing": "-", "Phonetic Variation": "-", "Senses": "{"en": [["hire"], ["rent, lease"]], "nl": [["huren"], ["huren"]]}"}

Interestingly, the update goes through when i send like this:

{"Handedness": "1", "Strong Hand": "-", "Weak Hand": "-", "Handshape Change": "-", "Relation Between Articulators": "-", "Location": "-", "Contact Type": "-", "Movement Shape": "-", "Movement Direction": "-", "Repeated Movement": "False", "Alternating Movement": "False", "Relative Orientation: Movement": "-", "Relative Orientation: Location": "-", "Orientation Change": "-", "Virtual Object": "-", "Phonology Other": "-", "Mouth Gesture": "-", "Mouthing": "-", "Phonetic Variation": "-"}

So that's JSON payload without Senses.

Something is going wrong with Senses field?

@susanodd
Copy link
Collaborator

susanodd commented Oct 31, 2024

Okay, that helps us to know!

@susanodd
Copy link
Collaborator

susanodd commented Nov 1, 2024

@rem0g I found the problem. This has to do with the use of two different formats for representing the senses internally.

The json package considers them language-related fields and they are described as "Senses: Dutch"
but other code is using that "nl" dictionary format and storing them together as the right hand of Senses.

I wrote a new function to test whether the senses field in the update differs from that in the database.

You need to omit the quotes around the Senses right-hand side. It's already in json. (It's not a string.)

"Senses": {"en": [["hire"], ["rent, lease"]], "nl": [["huren"], ["huren"]]}

This has been deployed.

susanodd added a commit that referenced this issue Nov 1, 2024
#1360: Fixed check on whether senses are being updated by API.
@rem0g
Copy link
Collaborator Author

rem0g commented Nov 1, 2024

It works again! Thank you

@Jetske
Copy link
Collaborator

Jetske commented Nov 4, 2024

@rem0g One more thing relevant to this issue:
We noticed that you send all the fields in the request to update the gloss. However this gives a lot of non-informative changes in the revision history (see https://signbank.cls.ru.nl/dictionary/gloss/49249/history)

We clearly did not document this well ;)
However the intention for this api was to only include the fields that are changed so it only updates those and leaves the others as is and they are not added to the revision history. Just so you know! You can close this issue again when read.

@rem0g
Copy link
Collaborator Author

rem0g commented Jan 7, 2025

Senses update doesnt work anymore again:

Payload:



{
  "Senses": {
    "en": [
      [
        "string"
      ]
    ],
    "nl": [
      [
        "string"
      ]
    ]
  }
}

AttributeError at /dictionary/api_update_gloss/5/47662/

'dict' object has no attribute 'strip'

@susanodd
Copy link
Collaborator

susanodd commented Jan 8, 2025

The format is wrong that you show above.

@rem0g can you please stop putting "empty" "not changed" things in the "update" request?

The gloss you show has hundreds of entries in the revision history.

https://signbank.cls.ru.nl/dictionary/gloss/47662/history

That causes extreme payload and causes the database to get locked. Hundreds of comparisons are being made to check if things are changed and hundreds of objects are being created for all the revision histories.

It is likely that something else is causing the problem or the operation is failing because the database is getting locked.

Think "thrashing" (too many operations are competing for resources)

Every API request is a transaction. So the more "empty" things you put that need to be processed the more computations need to be done inside that transaction.

@rem0g
Copy link
Collaborator Author

rem0g commented Jan 8, 2025

Can you please explain why is the format wrong? It's the format i got from the API documentation:

https://signbank.github.io/Global-signbank/#/Updating%20Signbank%20data/post_dictionary_api_update_gloss__datasetid___glossid__

With omitted strings as you mentioned before.

About the empty requests, yes I will be soon working on that soon. :)

@susanodd
Copy link
Collaborator

susanodd commented Jan 8, 2025

I looks like you have return characters in it?

The processing needs to parse it, so it should be without pretty printing. (It's a Python library that parses it.)

Normally, "strip" is done to make sure there are no spaces in the input.

Can you post your actual input value for the field? I will find out if there is a parsing error.

@rem0g
Copy link
Collaborator Author

rem0g commented Jan 11, 2025

This is what was sent:

{'Senses': {'en': [['hire'], ['rent, lease']], 'nl': [['huren'], ['huren']]}}

I have tried different variations with slashes around en/nl and arrays, but nothing is working so far.

@Woseseltops I would be glad if you could fix this problem (OR explain in a clear way what formatting is expected with Senses) with alternative approach because this method is clearly not working for API purposes. :)

@Woseseltops
Copy link
Collaborator

Sure. I'll be working for the RU again this Thursday and Friday, let me know if you need it sooner

@susanodd
Copy link
Collaborator

We're missing @Jetske now.

@susanodd
Copy link
Collaborator

@Woseseltops you deleted a large part of the Wiki. There used to be examples of the Senses.
Two different formats are used for them. That is because it did not work to parse them with one.
The "update" actually deletes all the existing senses are replaces them.
There could also be something wrong at the transaction level, since if other things fail, then objects may have already been deleted.

@susanodd
Copy link
Collaborator

susanodd commented Jan 14, 2025

This is what was sent:

{'Senses': {'en': [['hire'], ['rent, lease']], 'nl': [['huren'], ['huren']]}}

I have tried different variations with slashes around en/nl and arrays, but nothing is working so far.

@Woseseltops I would be glad if you could fix this problem (OR explain in a clear way what formatting is expected with Senses) with alternative approach because this method is clearly not working for API purposes. :)

It looks like ['rent, lease'] has a comma inside of a quoted item?
I don't recall whether this is allowed? Did you try it as ['rent', 'lease']

The two "language" lists need to be the same length (that is, one item per sense) but the keywords per sense can be a different length.

To take the elements apart, the ',' comma is used.
In the Signbank Edit interface, a newline is used to separate keywords in the popup.

@Jetske
Copy link
Collaborator

Jetske commented Jan 14, 2025

{"Senses": "{'en': [['hire'], ['rent, lease']], 'nl': [['huren'], ['huren']]}"}

It's probably a dict but given as string, such as above. At the time of implementing it I didn't know how else to make it work. Anyway, just follow the code backwards to check what the input should look like. The documentation was written by different people and can still be wrong sometimes.

@rem0g
Copy link
Collaborator Author

rem0g commented Jan 14, 2025

@Woseseltops can you remove .strip() from

def get_gloss_update_human_readable_value_dict(request):
    post_data = json.loads(request.body)

    value_dict = dict()
    for field in post_data.keys():
        value = post_data.get(field, '')
        value_dict[field] = value.strip()
    return value_dict

Strip() is not required and can cause instability. Were there test units created for this function? I can't find that in the repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants