Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for httpfs #27

Merged
merged 4 commits into from
Dec 8, 2020
Merged

add support for httpfs #27

merged 4 commits into from
Dec 8, 2020

Conversation

cdoron
Copy link
Collaborator

@cdoron cdoron commented Dec 2, 2020

Signed-off-by: [email protected] [email protected]

Hi Roee,
I accidentally found a way to support httpfs.
Please review and check whether we want this support in the-mesh-for-data-flight-module.
Thanks!

@cdoron cdoron requested a review from roee88 December 2, 2020 13:21
afm/asset.py Outdated
import os

from afm.config import Config
from afm.pep import registry, consolidate_actions
from afm.s3 import s3filesystem_from_config

from pyarrow.fs import LocalFileSystem
from pyarrow.fs import LocalFileSystem, PyFileSystem, FSSpecHandler
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move out of asset.py. We can create a filesystem package if needed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

@roee88 roee88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

httpfs seems like a 1 person project with zero adoption. Not saying that we can't use it but we need a requirement here and I'm not sure that we have one.

Also, technically, can you explain how this works? I don't see implementation of any of the FSSpecHandler methods in HttpFs or fs.base.FS

@roee88
Copy link
Collaborator

roee88 commented Dec 6, 2020

httpfs seems like a 1 person project with zero adoption. Not saying that we can't use it but we need a requirement here and I'm not sure that we have one.

Also, technically, can you explain how this works? I don't see implementation of any of the FSSpecHandler methods in HttpFs or fs.base.FS

Looking around it seems like pyarrow is compatible with fsspec. See http://arrow.apache.org/docs/python/filesystems.html#using-fsspec-compatible-filesystems.

fsspec has an HTTPFileSystem implementation so you can use it instead (note that it also requires the aiohttp package):

from fsspec.implementations.http import HTTPFileSystem

@cdoron
Copy link
Collaborator Author

cdoron commented Dec 6, 2020

I tried working with the fsspec HTTPFileSystem implementation.
I got the following error:

Traceback (most recent call last):
  File "sample/sample.py", line 39, in <module>
    main(args.port, args.repeat)
  File "sample/sample.py", line 22, in main
    info = client.get_flight_info(
  File "pyarrow/_flight.pyx", line 1237, in pyarrow._flight.FlightClient.get_flight_info
  File "pyarrow/_flight.pyx", line 80, in pyarrow._flight.check_flight_status
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: 'utf-8' codec can't decode byte 0x86 in position 38: invalid start byte. Detail: Python exception: UnicodeDecodeError

It appears that fsspec HTTPFileSystem is not a friend of the parquet format. I could only get it to work when I applied the following patch to fsspec/implementations/http.py

intake/intake-parquet#18 (comment)

Pipfile Outdated
Comment on lines 19 to 21
requests = "*"
fsspec = "*"
aiohttp = "*"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we set these to fixed versions?

from fsspec.implementations.http import HTTPFileSystem
from pyarrow.fs import PyFileSystem, FSSpecHandler

def httpfs_from_config(httpfs_config):
Copy link
Collaborator

@roee88 roee88 Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we don't use config. Is it a placeholder for authentication or something? If so please add a comment

@roee88 roee88 merged commit 90b8f8a into master Dec 8, 2020
@roee88 roee88 deleted the httpfs branch December 8, 2020 07:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants