
Discussion on mapping between amrex, numpy.ndarray, and torch.tensor data types #9

JBlaschke opened this issue Feb 13, 2021 · 8 comments


@JBlaschke (Contributor)

Hey, this is not so much an issue as a place to solicit public feedback.

I think we should implement type conversion from the amrex FArrayBox (or, more precisely, the Array4) data type to numpy.ndarray and torch.tensor, as well as suitable Python CUDA variants.

I also think that this type conversion should have a copying and a referencing variant.
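To make this concrete, here is a rough sketch of what such an API could look like. The helper names (`array4_to_numpy`, `array4_to_torch`), the `copy` keyword, and the assumption that the Array4 binding exposes the Python buffer protocol are all illustrative, not existing pyAMReX code:

```python
import numpy as np
import torch

def array4_to_numpy(arr4, copy: bool = False) -> np.ndarray:
    """Hypothetical helper: wrap (copy=False) or copy (copy=True) an Array4.

    `arr4` is assumed to expose the Python buffer protocol so numpy can
    view the underlying memory without copying.
    """
    view = np.asarray(arr4)              # zero-copy view via the buffer protocol
    return view.copy() if copy else view

def array4_to_torch(arr4, copy: bool = False) -> torch.Tensor:
    """Same idea for torch; torch.from_numpy shares memory with the numpy view."""
    view = torch.from_numpy(np.asarray(arr4))
    return view.clone() if copy else view
```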

This shouldn't be hard to implement (NO! This won't support Python 2... I have a life, you know), and I volunteer my time. But first I want to run this past all y'all to see if anyone is already working on it and what you think.

Tagging @ax3l @maxpkatz @drummerdoc

@JBlaschke (Contributor, Author) commented Feb 13, 2021

I think this would be a good basis for more complex amrex types. Since torch and Python don't have a standardized framework for expressing AMR, this is (in my opinion) the lowest common denominator.

We should also keep in mind how we deal with boxes whose indices don't start at 0. @ax3l's box type already has what we need, I think. So we might need to implement a thin wrapper around numpy and torch that maps amrex-style indexing to Python indices.
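Such a shim could look roughly like the following; the class name and the idea of storing the box's small end as an offset are illustrative assumptions, not existing code:

```python
import numpy as np

class OffsetArray:
    """Illustrative wrapper mapping amrex-style (possibly non-zero-based)
    indices onto a zero-based numpy array."""

    def __init__(self, data: np.ndarray, small_end):
        self.data = data                   # zero-based storage
        self.small_end = tuple(small_end)  # amrex box lower corner, e.g. (-2, -2)

    def _shift(self, idx):
        return tuple(i - lo for i, lo in zip(idx, self.small_end))

    def __getitem__(self, idx):
        return self.data[self._shift(idx)]

    def __setitem__(self, idx, value):
        self.data[self._shift(idx)] = value

# Usage: a 4x4 box whose indices run from -2..1 in each direction
box = OffsetArray(np.zeros((4, 4)), small_end=(-2, -2))
box[-2, -2] = 1.0   # touches the first element of the underlying array
```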

Also tagging @sayerhs

@ax3l (Member) commented Feb 13, 2021

Thanks for starting a sticky thread so we can collect the approaches. Let me start with what I am using so far:

General arrays (incl. numpy):

Device memory:

  • with @namehta4 we passed around device memory recently by using the buffer protocol and passing a non-owned device pointer with metadata
  • I recently opened a cupy issue to ask how to do it right: Creating a cupy device array from GPU Pointer (cupy/cupy#4644); they also recommend standardizing on __cuda_array_interface__, or going directly to the emerging DLPack APIs (a minimal producer sketch follows after this list)
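On the producer side, `__cuda_array_interface__` is just a dict-valued attribute. The sketch below is purely illustrative: the class, the default typestr, and the assumption that someone else owns the raw device pointer are mine, not pyAMReX code.

```python
class DeviceArrayView:
    """Minimal, illustrative producer of __cuda_array_interface__ (v3).

    `ptr` is assumed to be a raw CUDA device pointer owned elsewhere
    (e.g. by an amrex FArrayBox); this view does not manage its lifetime.
    """

    def __init__(self, ptr: int, shape, typestr: str = "<f8", readonly: bool = False):
        self._ptr = ptr
        self._shape = tuple(shape)
        self._typestr = typestr
        self._readonly = readonly

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": self._shape,
            "typestr": self._typestr,        # e.g. little-endian float64
            "data": (self._ptr, self._readonly),
            "strides": None,                 # None => C-contiguous
            "version": 3,
        }

# Consumers such as cupy.asarray(view) or numba.cuda.as_cuda_array(view)
# can then wrap the device memory without a copy.
```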

Compatibility:

[screenshot of a compatibility overview, dated 2021-02-13]

@JBlaschke (Contributor, Author)

Thanks @ax3l, that list is a good starting point. I would vote for the Python buffer protocol strategy to begin with. This seems to work well with PyCUDA also. We could then also implement some of the alternatives, depending on how much demand there is from applications, what benefits each one offers, and how much bandwidth we all have.

I'll do some reading to see if there is a benefit that would entice me to change my vote. (thanks for the references)
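As a reminder of how little the consumer side needs, here is a small buffer-protocol round trip with plain numpy; the `fab` object standing in for a CPU-side amrex array is just a bytearray, purely for illustration:

```python
import numpy as np

# Stand-in for a CPU-side amrex array that exposes the buffer protocol.
fab = bytearray(8 * 16)                      # raw storage for 16 doubles

view = np.frombuffer(fab, dtype=np.float64)  # zero-copy view over the buffer
view[:] = 1.0                                # writes go straight into `fab`

copied = np.frombuffer(fab, dtype=np.float64).copy()  # explicit copying variant
```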

@ax3l (Member) commented Feb 13, 2021

Agreed, I think after going through all the material again:

  • buffer (array) protocol for CPU memory (ND, strides)
  • __cuda_array_interface__ v3 (C-example) for transporting device-side memory w/o host-device copies

to start with. This will give us exposure to exactly the libraries and communities we want to interface with.
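To illustrate what that buys us on the consumer side, the sketch below passes one cupy allocation to numba and pytorch without host-device copies. It assumes a CUDA-capable GPU plus installed cupy, numba, and a recent pytorch, and is only meant to show the interop we are targeting:

```python
import cupy as cp
import numba.cuda
import torch

a = cp.arange(16, dtype=cp.float64)    # one device allocation, made with cupy

# Both consumers read a.__cuda_array_interface__ and wrap the same memory.
b = numba.cuda.as_cuda_array(a)        # numba device array view, no copy
c = torch.as_tensor(a, device="cuda")  # recent pytorch versions avoid a copy here

c.fill_(2.0)                           # if no copy was made, the change is visible via `a`
print(a[:4])
```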

ax3l mentioned this issue Mar 26, 2022
@ax3l (Member) commented Mar 26, 2022

FArrayBox for CPU via the array interface is now implemented via #19.

Next is either the __cuda_array_interface__ or DLPack. Should not be too hard to add both.
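For a feel of the DLPack route, the exchange already works today between numpy and pytorch (assuming NumPy >= 1.23 and a reasonably recent PyTorch), along these lines:

```python
import numpy as np
import torch

x = np.arange(6, dtype=np.float64)

# numpy -> torch: torch reads x.__dlpack__() and wraps the same memory, no copy
t = torch.from_dlpack(x)
t[0] = 42.0
print(x[0])          # 42.0 -- tensor and array alias the same buffer

# torch -> numpy (CPU tensor), again without a copy
y = np.from_dlpack(t)
```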

ax3l mentioned this issue Mar 30, 2022
@ax3l (Member) commented Oct 17, 2022

CUDA bindings for MultiFabs, including cupy, numba, and pytorch support, are coming in via #30.

@ax3l (Member) commented Jul 20, 2023

Did some more DLPack deep diving with @scothalverson.

What we want to implement here is primarily the producer, __dlpack__. It creates a PyCapsule, essentially a transport mechanism for a void*. The data behind this pointer is laid out according to the DLPack spec (C/Python).

Relatively easy-to-read implementations are:

More involved or less documented are:

The DLManagedTensor is essentially:

  • a tensor description (similar to [cuda] array interface)
  • a context: device type & id
  • a deleter that can clean up the two other structs

This object is what the capsule we produce refers to (sketched roughly below).
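In Python terms, the producer mainly needs __dlpack__ and __dlpack_device__. The sketch below is only illustrative: it delegates capsule creation to an internal numpy array for the CPU case, whereas the real binding would build the DLManagedTensor and the PyCapsule in C++.

```python
import numpy as np

class FabLike:
    """Illustrative DLPack producer that delegates to a numpy view of its data.

    A real pyAMReX binding would allocate a DLManagedTensor (tensor description,
    device type/id, deleter) in C++ and hand it out inside a PyCapsule; here we
    simply reuse numpy's own __dlpack__ machinery for the CPU case.
    """

    def __init__(self, data: np.ndarray):
        self._data = data

    def __dlpack__(self, stream=None):
        # Returns a PyCapsule wrapping a DLManagedTensor that views self._data.
        # The stream argument is ignored for host memory.
        return self._data.__dlpack__()

    def __dlpack_device__(self):
        # (device_type, device_id); numpy reports kDLCPU for host memory.
        return self._data.__dlpack_device__()

# Any DLPack consumer can now import the data without a copy, e.g.:
#   torch.from_dlpack(FabLike(np.zeros(8)))
```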
