-
-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect and report incorrectly delimited strings #161
Open
rtobar
wants to merge
1
commit into
izimobil:master
Choose a base branch
from
rtobar:delimited-strings
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Strings in pofiles are always delimited on both ends with double-quotes. The POFile parser in polib didn't check for this, and therefore happily accepted invalid msgstr/msgid/etc, potentially loosing some of the contents of the file. In such cases, the first or last character of the string would be lost, as *they* would be considered the string delimiters. This commit adds a check to the POFile parser to ensure strings are always delimited by double quotes on both ends. After adding it, I spotted a couple of offending po file contents in the tests, which have also been fixed. Additionally, a new test had been added to ensure these cases are caught. The new test indeed fails if the new check is removed. This issue was found while investigating an error produced by the "powrap" tool while running it over the po files for the Spanish translation of the CPython documentation. The tool failed check one of our files because gettext's `msgcat` utility failed to parse the file. Upon closer inspection I realised the error in our pofile, which was caught by gettext but not polib. Signed-off-by: Rodrigo Tobar <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #161 +/- ##
==========================================
+ Coverage 95.71% 95.73% +0.01%
==========================================
Files 1 1
Lines 864 868 +4
==========================================
+ Hits 827 831 +4
Misses 37 37 ☔ View full report in Codecov by Sentry. |
rtobar
added a commit
to rtobar/python-docs-es
that referenced
this pull request
Nov 13, 2024
A msgstr was missing its opening double quotes, which gettext's msgcat complained about (but powrap and polib didn't). See https://git.afpy.org/AFPy/powrap/pulls/4 and izimobil/polib#161 for PRs to fix both of those. Signed-off-by: Rodrigo Tobar <[email protected]>
This was referenced Nov 13, 2024
rtobar
added a commit
to python/python-docs-es
that referenced
this pull request
Nov 18, 2024
En `library/re.po` había una entrada que no estaba delineada correctamente con comillas dobles (si ven el diff entero es la última entrada en el diff, o pueden ver simplemente el primer commit de este PR). Esto hacía que `powrap --check` se saltara el archivo y no lo validara. Esto, a su vez, ocurría porque la utilidad `msgcat` de `gettext` identificaba el error de sintaxis, y fallaba al ser ejecutada. `powrap` no consideraba esos errores al momento de calcular el exit code del proceso, y por lo tanto el archivo no sólo seguía siendo inválido, sino que tampoco era verificado. De igual forma, el archivo no podía ser wrapeado correctamente usando `powrap library/re.po`. Ya abrí un PR contra `powrap` para cambiar este comportamiento en https://git.afpy.org/AFPy/powrap/pulls/4 (actualización: el PR ya fue mergeado, y una nuevs versión de powrap fue publicada, pornlo que también actualicé en este PR nuestra dependencia de powrap, además del pre-commit hook de powrap). Por otro lado, el resto de nuestras herramientas *no* consideraban este archivo como inválido, Esto es porque `polib` no hacía la validación correspondiente, e incorrectamente parseaba la entrada. También abrí un PR contra polib para esto en izimobil/polib#161. Actualización: en el intertanto también me di cuenta de que el paquete `babel` sufre del mismo problema, yo incorrectamente había asumido que babel dependía de polib; PR creada contra babel: python-babel/babel#1151. Después de corregir el error de sintaxis, ejecuté powrap de tal manera que ahora `library/re.po` está bien formateado. --------- Signed-off-by: Rodrigo Tobar <[email protected]>
Gentle ping |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Strings in pofiles are always delimited on both ends with double-quotes. The POFile parser in polib didn't check for this, and therefore happily accepted invalid msgstr/msgid/etc, potentially loosing some of the contents of the file. In such cases, the first or last character of the string would be lost, as they would be considered the string delimiters.
This commit adds a check to the POFile parser to ensure strings are always delimited by double quotes on both ends. After adding it, I spotted a couple of offending po file contents in the tests, which have also been fixed. Additionally, a new test had been added to ensure these cases are caught. The new test indeed fails if the new check is removed.
This issue was found while investigating an error produced by the "powrap" tool while running it over the po files for the Spanish translation of the CPython documentation. The tool failed check one of our files because gettext's
msgcat
utility failed to parse the file. Upon closer inspection I realised the error in our pofile, which was caught by gettext but not polib.Update: I also realised the babel package suffers from the same issue, so I've created a PR addressing the same issue there: python-babel/babel#1151