
On redirects, middle URL with ø char gets parsed wrongly - leading to a 404 #10047

Open
1 task done
Alekky09 opened this issue Nov 26, 2024 · 1 comment

Alekky09 commented Nov 26, 2024

Describe the bug

Hello,

When I fetch https://cornelius-k.dk/synsproeve/ with aiohttp, the server redirects several times, and the last hop of the chain requests https://cornelius-k.dk/synspr\udcf8ve, which returns a 404.

It looks like the Location header of the last redirect is decoded incorrectly. The raw value, which I found in Response._raw_headers, is b'https://cornelius-k.dk/synspr\xf8ve', i.e. the ø is sent as the single byte 0xF8 rather than as UTF-8.
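The \udcf8 seen in the decoded headers is exactly what Python's surrogateescape error handler produces for the byte 0xF8 when decoding as UTF-8, since 0xF8 is not valid UTF-8. The minimal standard-library sketch below reproduces this without aiohttp; it assumes (not confirmed here) that the header parser decodes values as UTF-8 with surrogateescape.

```python
# Raw Location value as it appears in Response._raw_headers for the last hop.
raw_location = b"https://cornelius-k.dk/synspr\xf8ve"

# Assumption: header values are decoded as UTF-8 with 'surrogateescape'.
# 0xF8 is not valid UTF-8, so it is mapped to the lone surrogate U+DCF8.
as_utf8 = raw_location.decode("utf-8", "surrogateescape")

# Reading the same bytes as Latin-1 gives the intended character 'ø'.
as_latin1 = raw_location.decode("latin-1")

# repr() is used because printing a lone surrogate directly would raise.
print(repr(as_utf8))
print(repr(as_latin1))
```

The surrogateescape step is lossless, so the original bytes can still be recovered with as_utf8.encode("utf-8", "surrogateescape").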

To Reproduce

Code block:

import asyncio

import aiohttp

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'
}

async def fetch_url(url):
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url) as response:
            # Print every hop of the redirect chain with its decoded and raw headers
            for hop in response.history:
                print(hop.url)
                print(hop.headers)
                print(hop.raw_headers)
            # Return the final status and URL, matching the last line of the output below
            return response.status, response.url

print(asyncio.run(fetch_url("https://cornelius-k.dk/synsproeve/")))

The final URL in the redirect chain is https://cornelius-k.dk/synspr�ve instead of https://cornelius-k.dk/synsprøve, and the request returns a 404.

Expected behavior

aiohttp should parse the Location URLs in the redirect chain correctly and fetch the correct final URL, https://cornelius-k.dk/synsprøve.
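For comparison, the previous hop sent the same character percent-encoded as UTF-8 (synspr%C3%B8ve). One possible client-side normalization, assuming the server's non-UTF-8 bytes are Latin-1, is to recover the original header bytes, decode them as Latin-1 when they are not valid UTF-8, and percent-encode the result as UTF-8. normalize_location is a hypothetical helper for illustration, not part of aiohttp or yarl.

```python
from urllib.parse import quote


def normalize_location(value: str) -> str:
    """Hypothetical helper: reinterpret a surrogateescape-decoded Location
    value and percent-encode its non-ASCII characters as UTF-8."""
    # Recover the exact bytes that were received on the wire.
    raw = value.encode("utf-8", "surrogateescape")
    try:
        # Well-formed UTF-8 (including plain ASCII) is used as-is.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: any non-UTF-8 bytes in this header are Latin-1.
        text = raw.decode("latin-1")
    # Keep URL delimiters and existing %-escapes; encode everything else as UTF-8.
    return quote(text, safe=":/?#[]@!$&'()*+,;=%")


print(normalize_location("https://cornelius-k.dk/synspr\udcf8ve"))
```

This only shows the byte-level conversion (the call prints https://cornelius-k.dk/synspr%C3%B8ve); whether the server actually serves the page at that URL has not been verified.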

Logs/tracebacks

Output of the code block:

https://cornelius-k.dk/synsproeve/
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Tue, 26 Nov 2024 16:02:17 GMT', 'Content-Type': 'text/html', 'Content-Length': '162', 'd-cache': 'from-cache', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'x-content-type-options': 'nosniff', 'strict-transport-security': 'max-age=31536000; preload', 'x-frame-options': 'SAMEORIGIN', 'content-security-policy': "frame-ancestors 'self'", 'Location': 'https://cornelius-k.dk/synsproeve', 'd-geo': 'US')>
((b'server', b'nginx'), (b'date', b'Tue, 26 Nov 2024 16:02:17 GMT'), (b'content-type', b'text/html'), (b'content-length', b'162'), (b'd-cache', b'from-cache'), (b'cache-control', b'no-cache, no-store, must-revalidate'), (b'expires', b'Thu, 01 Jan 1970 00:00:00 GMT'), (b'x-content-type-options', b'nosniff'), (b'strict-transport-security', b'max-age=31536000; preload'), (b'x-frame-options', b'SAMEORIGIN'), (b'content-security-policy', b"frame-ancestors 'self'"), (b'location', b'https://cornelius-k.dk/synsproeve'), (b'd-geo', b'US'))
https://cornelius-k.dk/synsproeve
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Tue, 26 Nov 2024 16:02:18 GMT', 'Content-Type': 'text/html', 'Content-Length': '162', 'Location': 'http://cornelius-k.dk/synspr%C3%B8ve', 'd-cache': 'from-cache', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'x-content-type-options': 'nosniff', 'strict-transport-security': 'max-age=31536000; preload', 'x-frame-options': 'SAMEORIGIN', 'content-security-policy': "frame-ancestors 'self'", 'd-geo': 'US')>
((b'server', b'nginx'), (b'date', b'Tue, 26 Nov 2024 16:02:18 GMT'), (b'content-type', b'text/html'), (b'content-length', b'162'), (b'location', b'http://cornelius-k.dk/synspr%C3%B8ve'), (b'd-cache', b'from-cache'), (b'cache-control', b'no-cache, no-store, must-revalidate'), (b'expires', b'Thu, 01 Jan 1970 00:00:00 GMT'), (b'x-content-type-options', b'nosniff'), (b'strict-transport-security', b'max-age=31536000; preload'), (b'x-frame-options', b'SAMEORIGIN'), (b'content-security-policy', b"frame-ancestors 'self'"), (b'd-geo', b'US'))
http://cornelius-k.dk/synspr%C3%B8ve
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Tue, 26 Nov 2024 16:02:18 GMT', 'Content-Length': '0', 'Connection': 'keep-alive', 'Cache-Control': 'no-cache, no-store, must-revalidate', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Security-Policy': "frame-ancestors 'self'", 'Location': 'https://cornelius-k.dk/synspr\udcf8ve', 'D-Geo': 'US')>
((b'Server', b'nginx'), (b'Date', b'Tue, 26 Nov 2024 16:02:18 GMT'), (b'Content-Length', b'0'), (b'Connection', b'keep-alive'), (b'Cache-Control', b'no-cache, no-store, must-revalidate'), (b'Expires', b'Thu, 01 Jan 1970 00:00:00 GMT'), (b'X-Content-Type-Options', b'nosniff'), (b'X-Frame-Options', b'SAMEORIGIN'), (b'Content-Security-Policy', b"frame-ancestors 'self'"), (b'Location', b'https://cornelius-k.dk/synspr\xf8ve'), (b'D-Geo', b'US'))
(404, URL('https://cornelius-k.dk/synspr�ve'))

Python Version

3.9.20

aiohttp Version

3.11.7

multidict Version

6.1.0

propcache Version

0.2.0

yarl Version

1.17.1

OS

macOS

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
Alekky09 added the bug label Nov 26, 2024
bdraco (Member) commented Nov 27, 2024

Which setting are you using for requoting redirect URLs: ClientSession(requote_redirect_url=True) or ClientSession(requote_redirect_url=False)?
