mimetypes read from the registry should not overwrite standard mime mappings #54760
Hi, I am the primary developer of calibre (http://calibre-ebook.com) and yesterday I released an upgrade of calibre based on python 2.7. Here is a small sampling of all the diverse errors that my users experienced, related to reading mimetypes from the registry:
1. Permission denied if running from non privileged account
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 84, in run_entry_point
File "site-packages\calibre\__init__.py", line 31, in <module>
File "mimetypes.py", line 344, in add_type
File "mimetypes.py", line 355, in init
File "mimetypes.py", line 261, in read_windows_registry
WindowsError: [Error 5] Acceso denegado (Access not allowed)
The fix for this is to trap WindowsError and ignore it in mimetypes.py
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 84, in run_entry_point
File "site-packages\calibre\__init__.py", line 31, in <module>
File "mimetypes.py", line 344, in add_type
File "mimetypes.py", line 355, in init
File "mimetypes.py", line 260, in read_windows_registry
File "mimetypes.py", line 250, in enum_types
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 0: invalid continuation byte
The fix for this is to change except UnicodeEncodeError to except ValueError
Where the output should have been ('image/jpeg', None). The fix for this is to load the registry entries before the default entries defined in mimetypes.py. Of course, IMHO, the best possible fix is to simply remove the reading of mimetypes from the registry. But that is up to whoever maintains this module. Duplicate (less comprehensive) tickets on this issue in your tracker already are: 9291, 10490, 104314. If the maintainer of this module is unable to fix these issues, let me know and I will submit a patch, either removing _winreg or fixing the issues individually. |
The first issue you note appears to be a duplicate of bpo-10162, a fix for which should be available in the 2.7.1 maintenance release. The second issue appears to be a duplicate of bpo-9291. Since that issue is still open, I suggest any further discussion be pursued there. You may want to add yourself to the nosy list of that issue. |
And what about the third issue? Allow me to elaborate: mimetypes are a relatively standard set of mappings from well known file extensions to MIME descriptors. Reading mimetype mappings from the registry, a location that is writable by random programs the user may have installed on his machine, let alone malware, is a BAD idea. It leads to situations like asking for the mimetype of file.jpg and getting image/pjpeg back. Or asking for the mimetype of file.png and getting image/x-png back. If you still consider it good to read mimetypes from the registry, at the very least, they should be read before the standard mimetype mappings defined in mimetypes.py are applied. That way, at least for that set of mappings, users of python can be assured of sane query results. As it stands now, mimetypes.py is useless, and to work around the problem I essentially had to define the mimetype mappings for all the mimetypes my program knows about by hand. |
(Sorry, I skipped over the third: this is one reason why one should not include multiple problems in one tracker issue.) As to your third point, a quick search of "mimetypes" in the bugtracker shows that looking in the Windows registry for mimetypes was a new feature in 2.7 and the upcoming 3.2 added by bpo-4969. Adding the Windows maintainers and the Nosy List from that issue. |
I apologize for the multiple issue in the ticket. To my mind they were all basically one issue, stemming from the decision to read mimetypes from the registry. Since there are other tickets for the first two issues, I'll change the summary for this issue to reflect only the third. |
Kovid: so essentially what you are saying is that the windows platform is broken with respect to MIME types and with respect to its security model. Why am I not surprised? :) You would have the same problem if software installation altered the /etc/mimetypes file on a unix box and created weird entries. Perhaps unix programmers are just better disciplined? Reading the registry first and having the built in settings override would IMO defeat the purpose of reading the values from the registry: those are (theoretically!!) the settings the user chose to change. However, working around it in your program should be simple: just call mimetypes.init with an empty file list. The windows registry is only read if the files parameter is None. This will also give you consistent behavior on windows and unix: only the default mime types in the mimetypes module will be used. If, on the other hand, you want to retain the Unix behavior, you can pass init mimetypes.knownfiles instead of the empty list. (By the way, thanks very much for calibre, I have used the CLI tools to great benefit, and love the fact that the CLI is the basis of the program.) |
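The workaround suggested in the comment above can be sketched roughly as follows. This is a minimal illustration that assumes the behaviour described in that comment (the registry is consulted only when `files` is None); which sources are read on the first call to `init` can differ between Python versions, so it is worth verifying on the interpreter actually in use.

```python
import mimetypes

# Initialise the module-level database with an explicit, empty file list,
# as suggested in the comment above. Passing mimetypes.knownfiles instead
# of [] would keep the Unix-style lookup of system mime.types files.
mimetypes.init(files=[])

# Query the type of a well-known extension after initialisation.
guessed_type, encoding = mimetypes.guess_type("image.jpg")
print(guessed_type, encoding)
```

Calling `init` once, early in program start-up and before any other use of the module, keeps the lookup behaviour in one place.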
It is, of course, your decision, but IMO, since the mimetypes database in windows appears to be always broken, the default behavior of the mimetypes module in python 2.7 on windows is broken for most (all?) windows installs. For me personally, it doesn't matter anymore, as I have already fixed calibre, but it would be surprising/unexpected behavior for someone new to using mimetypes.py on windows. Certainly, my expectation (perhaps naively) was that guess_type('image.jpg') would always return 'image/jpeg'. Users on windows rarely (ever?) modify the registry to change mimetypes. The only thing that does change mimetypes is installed software, without the users' knowledge/consent. So treating the registry as a reliable store of mime information is not a good idea. On unix, the knownfiles are system files. I don't know about OS X, but on linux, since most software is installed by package managers, the package managers usually have policies that prevent application installs from clobbering system files. And of course, running userland applications don't have the necessary privileges to modify the files. Out of curiosity, what is the upside of reading mimetypes from the registry, given that its information cannot be trusted? And you're most welcome, for calibre :) |
I would expect that it would not be people new to mimetypes that would have the issues, but people like you for whom the behavior on Windows has changed. And this is indeed a concern. The people involved in making the windows mimetypes enhancement are nosy on this ticket, perhaps they will have thoughts on the issue of the (in)validity of the windows mime data. |
I actually had in mind people that (like me) develop primarily on unix and assume that mimetypes works the same way on both windows and unix. Of course, the changed behavior is also a concern. At the very least, I would encourage the addition of a warning to the documentation of the mimetypes module. |
This is definitely a real issue, and makes mimetypes.guess_type() useless out of the box on Windows. However, I believe the reason it's broken is that the fix for bpo-4969 doesn't actually work, and I'm not sure this is possible with the Windows registry. You see, "MIME\Database\Content Type" in the Windows registry is a mime type -> file extension mapping, *not the other way around*. But read_windows_registry() tries to use it as a file extension -> mime type mapping, and bad things happen, because there are multiple mime types for certain file extensions. As far as I can tell, there's nothing in the Windows registry that says which is the "canonical" mime type for a given extension. Again, this is because Microsoft intends it (and uses it) as a mime type -> extension mapping. See more here: http://msdn.microsoft.com/en-us/library/ms775148(v=vs.85).aspx For example, in my "MIME\Database\Content Type" we have: image/jpeg -> .jpg and image/pjpeg -> .jpg. And read_windows_registry() picks the last one for .jpg, which in this case is image/pjpeg -- NOT what users expect. In short, I think the fix for bpo-4969 is broken as is, and that you can't actually use the mime types database in the Windows registry in this way. I suggest reverting the fix for bpo-4969. Or, we could get clever and only use the Windows registry value if there's a single mime type -> extension mapping for a given extension, and if there's more than one (meaning it'd be ambiguous), use the mimetypes default from types_map / common_types. |
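The "get clever" idea in the comment above could look something like the sketch below. It is not the code in mimetypes.py: the function name `invert_unambiguous`, the sample registry entries, and the `defaults` dictionary are all hypothetical stand-ins, and the registry is modelled as a plain list of (mime type, extension) pairs rather than read through the Windows registry API.

```python
from collections import defaultdict


def invert_unambiguous(type_to_ext, defaults):
    """Invert (mime type, extension) pairs into an extension -> type map.

    An extension claimed by exactly one MIME type is taken from the
    registry-style data; an ambiguous extension falls back to `defaults`.
    """
    candidates = defaultdict(list)
    for mime_type, ext in type_to_ext:
        candidates[ext.lower()].append(mime_type)

    ext_to_type = {}
    for ext, mime_types in candidates.items():
        if len(mime_types) == 1:
            # Only one type maps to this extension, so there is no ambiguity.
            ext_to_type[ext] = mime_types[0]
        elif ext in defaults:
            # Several types claim the extension: prefer the static default.
            ext_to_type[ext] = defaults[ext]
        # Ambiguous and no default known: leave the extension unmapped.
    return ext_to_type


# Hypothetical stand-in for entries under "MIME\Database\Content Type".
sample_registry = [
    ("image/jpeg", ".jpg"),
    ("image/pjpeg", ".jpg"),
    ("application/x-example", ".example"),
]

# Hypothetical stand-in for the static defaults in mimetypes.py.
defaults = {".jpg": "image/jpeg"}

result = invert_unambiguous(sample_registry, defaults)
print(result)
```

With this approach the ordering of registry entries no longer decides which type wins for `.jpg`, which is the failure mode described in the comment.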
Mark, are you referring to part 3 of this issue, the image/pjpeg type of problem? This was fixed in Python 2.7.6 -- see changeset http://hg.python.org/cpython/rev/e8cead08c556 and http://bugs.python.org/issue15207 |
From the discussion I conclude that all three issues reported here have been resolved. If nobody objects I will close this issue. |
This issue says "mimetypes read from the registry should not overwrite standard mime mappings". Was this change ever made? The following issue claims that the "HKEY_CLASSES_ROOT\.js\Content Type" registry value can still override the mapping for ".js" files: https://bugs.python.org/issue43975? |
I've run into this issue with Python 3.11.3 and Django on Windows (see django-commons/django-debug-toolbar#2046), so I don't believe this change was ever made (for .js files at least). |
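For applications affected by this, one way to avoid depending on system configuration is to pin the mappings they rely on, in the spirit of the "define the mappings by hand" workaround mentioned earlier in the thread. The sketch below uses `mimetypes.add_type`, which replaces an existing mapping for an extension; the `PINNED_TYPES` table is hypothetical, and the type chosen for `.js` is an application decision, shown here only as an example.

```python
import mimetypes

# Hypothetical table of mappings the application wants to guarantee.
PINNED_TYPES = {
    ".js": "text/javascript",
    ".jpg": "image/jpeg",
    ".png": "image/png",
}

for ext, mime_type in PINNED_TYPES.items():
    # add_type initialises the module if needed, then overrides whatever
    # mapping the extension had, including one read from system sources.
    mimetypes.add_type(mime_type, ext)

js_type, js_encoding = mimetypes.guess_type("app.js")
print(js_type, js_encoding)
```

This needs to run before the first lookup that matters, for example at application start-up.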
There are too many issues described here, none of which (apart from the latest Django one) apply to the current implementation. Please open a new issue. FWIW, we now use the correct source for MIME associations (unlike what some earlier comments suggest), and as a general rule we prefer to use system configuration (whether it's "correct" or correct), so there is precisely zero chance we will prefer the static defaults over user preferences. |