[QST] Calling cudf.dataframe.apply from c++ or porting to libcudf #14628

Open
sapcode opened this issue Dec 14, 2023 · 2 comments
Labels
question Further information is requested

Comments


sapcode commented Dec 14, 2023

Dear RAPIDS team,

The cuDF Python API documentation lists several methods that have no counterpart in libcudf for C++:
cudf.DataFrame.apply
cudf.DataFrame.applymap
cudf.DataFrame.apply_rows
cudf.DataFrame.apply_chunks

  1. Is there any chance that these functions will be made available in libcudf for C++?
  2. Is there a way to call the cuDF Python functions from a libcudf C++ context, or from a general C++ context, using pybind11 or the Python C API?
  3. Could you add an example to the examples section that shows how to call Python cuDF from C++?

Best regards
Developer

sapcode added the Needs Triage and question labels Dec 14, 2023
Contributor

bdice commented Dec 14, 2023

The apply functionality of the cudf Python package is implemented using Numba. You can read more about UDF (user-defined function) support here: https://docs.rapids.ai/api/cudf/stable/user_guide/guide-to-udfs/

Because this requires JIT compilation with Numba, we don't have a way to expose this in libcudf C++ code. There are two features that are pretty close, however.

cudf::transform does something similar to DataFrame.apply. I haven't used this feature, so I can't say much about its limitations, but the tests here demonstrate passing a device function as a string or as precompiled PTX: https://github.com/rapidsai/cudf/blob/branch-24.02/cpp/tests/transform/integration/unary_transform_test.cpp
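As a rough CPU-only analogy of that idea (this is not libcudf code and uses no GPU), the sketch below accepts a user-defined function as source text, compiles it at runtime, and applies it to every element of a column. The names `compile_udf`, `transform_column`, and `square_plus_one` are hypothetical and exist only for illustration; the real cudf::transform takes CUDA or PTX source and runs on the device.

```python
from typing import Callable, List


def compile_udf(source: str, name: str) -> Callable[[float], float]:
    """Compile UDF source text at runtime and return the named function."""
    namespace: dict = {}
    exec(source, namespace)  # runtime compilation, standing in for the JIT step
    return namespace[name]


def transform_column(column: List[float], source: str, name: str) -> List[float]:
    """Apply the compiled UDF to each element (hypothetical helper, CPU only)."""
    udf = compile_udf(source, name)
    return [udf(value) for value in column]


# The UDF arrives as a string, just as the device function does in the tests above.
UDF_SOURCE = """
def square_plus_one(x):
    return x * x + 1
"""

result = transform_column([1, 2, 3], UDF_SOURCE, "square_plus_one")
print(result)  # [2, 5, 10]
```

The point of the analogy is only the shape of the call: a column, a UDF given as text, and an element-wise result column.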

There is also cudf::compute_column which takes an AST expression. The general idea is that you can take column references within a table (or literal inputs like "3") and create expressions, like col_0 + col_1 * col_2 + 3. Then you can execute that AST expression over a table to make a new column. You can see examples in the tests here: https://github.com/rapidsai/cudf/blob/branch-24.02/cpp/tests/ast/transform_tests.cpp
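To make the expression-tree idea concrete, here is a minimal pure-Python model of it. It is a conceptual sketch only: `ColumnRef`, `Literal`, `Operation`, and `compute_column_sketch` are hypothetical names, not the libcudf classes, and the evaluation runs row by row on the CPU rather than on the GPU. It builds the tree for col_0 + col_1 * col_2 + 3 and evaluates it over a small table.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Union


# Hypothetical node types for illustration; these are NOT the libcudf classes.
@dataclass(frozen=True)
class ColumnRef:
    index: int  # position of the column within the table


@dataclass(frozen=True)
class Literal:
    value: float


@dataclass(frozen=True)
class Operation:
    op: Callable[[float, float], float]
    left: "Expr"
    right: "Expr"


Expr = Union[ColumnRef, Literal, Operation]


def evaluate_row(expr: Expr, table: List[List[float]], row: int) -> float:
    """Recursively evaluate the expression tree for a single row."""
    if isinstance(expr, ColumnRef):
        return table[expr.index][row]
    if isinstance(expr, Literal):
        return expr.value
    return expr.op(evaluate_row(expr.left, table, row),
                   evaluate_row(expr.right, table, row))


def compute_column_sketch(table: List[List[float]], expr: Expr) -> List[float]:
    """Evaluate the expression over every row and return one new column."""
    num_rows = len(table[0]) if table else 0
    return [evaluate_row(expr, table, row) for row in range(num_rows)]


# col_0 + col_1 * col_2 + 3
expr = Operation(
    operator.add,
    Operation(operator.add,
              ColumnRef(0),
              Operation(operator.mul, ColumnRef(1), ColumnRef(2))),
    Literal(3),
)

table = [[1, 2], [10, 20], [2, 3]]  # three columns, two rows
result = compute_column_sketch(table, expr)
print(result)  # [24, 65]
```

Because the expression is plain data rather than arbitrary code, it can be executed without a JIT step, which is what distinguishes this approach from a Numba-style UDF.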

If you can give us more information about the kinds of user-defined functions you want to execute, that would be very helpful for future library design.

Contributor

wence- commented Dec 14, 2023

To add a bit more to @bdice's comment:

Can you explain your use case in a bit more detail? Would you like to use cudf.DataFrame.apply on a libcudf table_view because you aren't sure how to replicate the behaviour using only libcudf operations? Or do you have some other reason to want to do this? In almost all cases, the high-level cudf API calls (like DataFrame.apply) translate into a call, or a sequence of calls, to libcudf primitives. If you're already in C++ you would, generally speaking, be better off calling those primitives directly, for example cudf::transform.

We have not put effort into making cudf interoperate bidirectionally with libcudf at the level of API calls, only at the level of data structures. So to date there is no way to turn a table_view into a DataFrame from C++. Indeed, the DataFrame contains a significant amount of extra metadata that you would need to construct (for example, libcudf doesn't have the concept of row and column indexes).
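The metadata gap can be pictured with a small pure-Python model. This is an illustrative sketch under simplifying assumptions, not how cudf or libcudf store data: `Table`, `Frame`, `to_table`, and `to_frame` are hypothetical names, with `Table` standing in for a bare libcudf table and `Frame` for a DataFrame that also carries column names and a row index.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Table:
    """Stand-in for a bare table: columns only, no names, no row index."""
    columns: List[List[float]]


@dataclass
class Frame:
    """Stand-in for a DataFrame: the same columns plus extra metadata."""
    columns: List[List[float]]
    column_names: List[str]
    row_index: List[str]


def to_table(frame: Frame) -> Table:
    """Frame -> Table is lossy: names and row index are dropped."""
    return Table(columns=frame.columns)


def to_frame(table: Table, column_names: List[str], row_index: List[str]) -> Frame:
    """Table -> Frame needs the metadata supplied again by the caller."""
    if len(column_names) != len(table.columns):
        raise ValueError("need exactly one name per column")
    return Frame(table.columns, list(column_names), list(row_index))


frame = Frame(columns=[[1.0, 2.0], [3.0, 4.0]],
              column_names=["a", "b"],
              row_index=["r0", "r1"])

table = to_table(frame)                              # metadata is gone here
rebuilt = to_frame(table, ["a", "b"], ["r0", "r1"])  # caller must re-supply it
print(rebuilt == frame)  # True
```

The round trip only succeeds because the caller kept the names and index on the side, which is exactly the information a C++ caller holding only a table_view would have to construct.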

So it might be possible to go back and forth between libcudf and cudf, but there are many caveats because the translation from cudf to libcudf objects is lossy at the metadata level. We are currently working towards a closer mapping between libcudf types/algorithms and Cython-wrapped types/functions in the pylibcudf wrapping functions (you can see progress in #13921).

bdice removed the Needs Triage label Mar 4, 2024