Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add type stubs for pylibcudf #17258

Merged
merged 23 commits into from
Nov 12, 2024

Conversation

wence-
Copy link
Contributor

@wence- wence- commented Nov 6, 2024

Description

Having looked at a bunch of the automation options, I just did it by hand.

A followup will add some automation to add docstrings (so we can see those via LSP integration in editors) and do some simple validation.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@wence- wence- requested review from a team as code owners November 6, 2024 18:42
@github-actions github-actions bot added Python Affects Python cuDF API. cudf.polars Issues specific to cudf.polars pylibcudf Issues specific to the pylibcudf package labels Nov 6, 2024
class gpumemoryview:
def __init__(self, data: Any) -> None: ...
@property
def __cuda_array_interface__(self) -> Mapping[str, Any]: ...
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use TypedDict here if we really care about advertising the key names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am agnostic to this and will defer to you on whether you think the extra work is worthwhile.

Comment on lines 13 to 15
class ColumnMetadata:
name: str
children_meta: list[ColumnMetadata]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: add __init__

Comment on lines 20 to 29
class OrcColumnStatistics:
@property
def number_of_values(self) -> int | None: ...
@property
def has_null(self) -> bool | None: ...
def __getitem__(self, item: str) -> Any: ...
def __contains__(self, item: str) -> bool: ...
def get[T](self, item: str, default: None | T = None) -> T | None: ...

class ParsedOrcStatistics:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't construct these directly, so no __init__ annotation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you added these later anyway? Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for my automated checking, so I don't need too many carveouts


class SourceInfo:
def __init__(
self, sources: list[str] | list[os.PathLike] | list[Datasource]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lie, it's actually list[str | os.PathLike], but list is invariant, so the type checkers complain. If we had Sequence[str | os.PathLike] that would be fine, because Sequence is contravariant, but these functions really do want lists, and will complain for some Sequence that is not a list.

@wence- wence- force-pushed the wence/fea/pylibcudf-typing branch 2 times, most recently from ae4421a to 111ff09 Compare November 6, 2024 19:01
@github-actions github-actions bot added the CMake CMake build issue label Nov 6, 2024
@wence- wence- force-pushed the wence/fea/pylibcudf-typing branch from 111ff09 to 1801379 Compare November 7, 2024 17:33
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great. I left some inline comments while I skimmed the whole thing under the assumption that I would probably spot some issues in some files while missing them in others. The big points that might apply to many places are:

  1. Can we remove default parameter values from stub files?
  2. Can we get rid of enum values in stub files (I expect no, but it would be nice)?
  3. We should try and make absolute vs relative import styles consistent, but I'm OK with doing that in a follow-up to avoid cluttering this PR further.

python/pylibcudf/pylibcudf/__init__.pyi Outdated Show resolved Hide resolved
python/pylibcudf/pylibcudf/__init__.pyi Outdated Show resolved Hide resolved
def product() -> Aggregation: ...
def min() -> Aggregation: ...
def max() -> Aggregation: ...
def count(null_handling: NullPolicy = NullPolicy.INCLUDE) -> Aggregation: ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we elide default parameters? They don't add anything for typing, and that is mypy's recommendation:

Stub files are written in normal Python syntax, but generally leaving out runtime logic like variable initializers, function bodies, and default arguments.

It might also simplify the requirements for the automation scripts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can elide default values by doing def count(null_handling: NullPolicy = ...).

However, type stubs are not just for type checking, but also LSP. In the latter case, having the default value encoded is useful since I can immediately see from the signature what behaviour I get if I don't provide the argument.

I will see if I can automatically put the right value in here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this round, I am not going to put defaults in automatically.

python/pylibcudf/pylibcudf/binaryop.pyi Outdated Show resolved Hide resolved
class gpumemoryview:
def __init__(self, data: Any) -> None: ...
@property
def __cuda_array_interface__(self) -> Mapping[str, Any]: ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am agnostic to this and will defer to you on whether you think the extra work is worthwhile.

python/pylibcudf/pylibcudf/null_mask.pyi Show resolved Hide resolved
from pylibcudf.scalar import Scalar

class BPEMergePairs:
def __init__(self, merge_pairs: Column) -> None: ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to default the return type annotation for __init__ (and perhaps other special methods if they make sense)? It's redundant otherwise.

Copy link
Contributor Author

@wence- wence- Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think so: pyright complains in the untyped circumstance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool let's do that then.

python/pylibcudf/pyproject.toml Show resolved Hide resolved
python/cudf_polars/cudf_polars/dsl/ir.py Show resolved Hide resolved
@vyasr vyasr added feature request New feature or request non-breaking Non-breaking change labels Nov 8, 2024
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the open threads and resolved everything that looked done. There is still some outstanding work but I'm happy with the current state so feel free to resolve everything as you see fit and then merge.

cudf::aggregation::kind and cudf::type_id are not handled correctly,
but most of the rest seem to be.
@wence-
Copy link
Contributor Author

wence- commented Nov 11, 2024

I hacked up some stuff so that the renamed enums are now documented and enumerate their members in the docs

@wence-
Copy link
Contributor Author

wence- commented Nov 11, 2024

/merge

Copy link
Contributor

@Matt711 Matt711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks fixing lists too! I think we need to update the tests

diff --git a/python/pylibcudf/pylibcudf/tests/test_lists.py b/python/pylibcudf/pylibcudf/tests/test_lists.py
index f3ef555f11..33714707b4 100644
--- a/python/pylibcudf/pylibcudf/tests/test_lists.py
+++ b/python/pylibcudf/pylibcudf/tests/test_lists.py
@@ -62,14 +62,14 @@ def test_concatenate_rows(test_data):
     [
         (
             [[[1, 2], [3, 4], [5]], [[6], None, [7, 8, 9]]],
-            False,
-            [[1, 2, 3, 4, 5], None],
+            plc.lists.ConcatenateNullPolicy.IGNORE,
+            [[1, 2, 3, 4, 5], [6, 7, 8, 9]],
         ),
         (
-            [[[1, 2], [3, 4], [5, None]], [[6], [None], [7, 8, 9]]],
-            True,
-            [[1, 2, 3, 4, 5, None], [6, None, 7, 8, 9]],
-        ),
+            [[[1, 2], [3, 4], [5]], [[6], None, [7, 8, 9]]],
+            plc.lists.ConcatenateNullPolicy.NULLIFY_OUTPUT_ROW,
+            [[1, 2, 3, 4, 5], None],
+        )
     ],
 )

etc. etc.

@wence-
Copy link
Contributor Author

wence- commented Nov 12, 2024

Thanks fixing lists too! I think we need to update the tests

diff --git a/python/pylibcudf/pylibcudf/tests/test_lists.py b/python/pylibcudf/pylibcudf/tests/test_lists.py
index f3ef555f11..33714707b4 100644
--- a/python/pylibcudf/pylibcudf/tests/test_lists.py
+++ b/python/pylibcudf/pylibcudf/tests/test_lists.py
@@ -62,14 +62,14 @@ def test_concatenate_rows(test_data):
     [
         (
             [[[1, 2], [3, 4], [5]], [[6], None, [7, 8, 9]]],
-            False,
-            [[1, 2, 3, 4, 5], None],
+            plc.lists.ConcatenateNullPolicy.IGNORE,
+            [[1, 2, 3, 4, 5], [6, 7, 8, 9]],
         ),
         (
-            [[[1, 2], [3, 4], [5, None]], [[6], [None], [7, 8, 9]]],
-            True,
-            [[1, 2, 3, 4, 5, None], [6, None, 7, 8, 9]],
-        ),
+            [[[1, 2], [3, 4], [5]], [[6], None, [7, 8, 9]]],
+            plc.lists.ConcatenateNullPolicy.NULLIFY_OUTPUT_ROW,
+            [[1, 2, 3, 4, 5], None],
+        )
     ],
 )

etc. etc.

Ah, good spot, thanks.

@rapids-bot rapids-bot bot merged commit 7682edb into rapidsai:branch-24.12 Nov 12, 2024
102 checks passed
@wence- wence- deleted the wence/fea/pylibcudf-typing branch November 12, 2024 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue cudf.polars Issues specific to cudf.polars feature request New feature or request non-breaking Non-breaking change pylibcudf Issues specific to the pylibcudf package Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA] Provide type stubs for pylibcudf package
3 participants