-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for new fsspec-style caching #18
Comments
Huff, OK. --- a/fsspec/implementations/http.py
+++ b/fsspec/implementations/http.py
@@ -12,8 +12,8 @@ from fsspec.asyn import sync_wrapper, sync, AsyncFileSystem, maybe_sync
from ..caching import AllBytes
# https://stackoverflow.com/a/15926317/3821154
-ex = re.compile(r"""<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1""")
-ex2 = re.compile(r"""(http[s]?://[-a-zA-Z0-9@:%_+.~#?&/=]+)""")
+ex = re.compile(b"""<a\\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1""")
+ex2 = re.compile(b"""(http[s]?://[-a-zA-Z0-9@:%_+.~#?&/=]+)""")
async def get_client():
@@ -98,7 +98,7 @@ class HTTPFileSystem(AsyncFileSystem):
kw.update(kwargs)
async with self.session.get(url, **self.kwargs) as r:
r.raise_for_status()
- text = await r.text()
+ text = await r.read()
if self.simple_links:
links = ex2.findall(text) + ex.findall(text)
else:
@@ -108,6 +108,7 @@ class HTTPFileSystem(AsyncFileSystem):
for l in links:
if isinstance(l, tuple):
l = l[1]
+ l = l.decode()
if l.startswith("/") and len(l) > 1:
# absolute URL on this server
l = parts.scheme + "://" + parts.netloc + l |
(PS: the parquet driver should not really be calling isdir, but isfile - and in HTTP, and sometimes s3/gcs, the same path can be both) |
Lest anybody else be confused by this issue, |
As far as I can tell, this driver doesn't support fsspec-style caching at the moment:
Unsure if this is more appropriate to be fixed at the dask layer or here. Is there a roadmap for rolling out
fsspec
caching to various drivers?The text was updated successfully, but these errors were encountered: