Fix memcheck error in ReplaceTest.NormalizeNansAndZerosMutable gtest #17610

Merged (2 commits), Dec 17, 2024

Conversation

davidwendt
Contributor

@davidwendt davidwendt commented Dec 17, 2024

Description

Fixes a memcheck error found by the nightly build checks in STREAM_REPLACE_TEST's ReplaceTest.NormalizeNansAndZerosMutable gtest. The mutable view passed to the cudf::normalize_nans_and_zeros API pointed to invalidated data.

The following line created the invalid view:

cudf::mutable_column_view mutable_view = cudf::column(input, cudf::test::get_default_stream());

The temporary cudf::column is destroyed at the end of that statement, as soon as mutable_view has been initialized, so the view is left pointing to a freed column. The view must be created from a non-temporary column and also must be non-temporary itself so that it is not implicitly converted to a column_view.
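The lifetime problem described above can be sketched without libcudf. The `column` and `mutable_view` types below are hypothetical stand-ins, not the real cudf classes; the sketch only assumes that the owning type converts implicitly to a non-owning view. A counter of live owners shows that the temporary is already gone once the view exists, and the second function shows the pattern the fix relies on (named owner, named view).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical non-owning view; NOT cudf::mutable_column_view.
struct mutable_view {
  int* data;
  std::size_t size;
};

// Hypothetical owning column; NOT cudf::column.
struct column {
  static inline int live = 0;  // number of column objects currently alive
  std::vector<int> storage;

  explicit column(std::vector<int> values) : storage(std::move(values)) { ++live; }
  column(column const&)            = delete;
  column& operator=(column const&) = delete;
  ~column() { --live; }

  // Implicit conversion to a view of the owned storage (assumed for illustration).
  operator mutable_view() { return {storage.data(), storage.size()}; }
};

// Broken pattern: the view is initialized from a temporary column.
// Returns how many columns are still alive once the view exists.
int live_columns_after_temporary()
{
  mutable_view view = column(std::vector<int>{1, 2, 3});
  (void)view;  // view.data is dangling here and must not be dereferenced
  return column::live;
}

// Fixed pattern: a named owner outlives the named view that refers to it.
int sum_through_named_view()
{
  column owner(std::vector<int>{1, 2, 3});
  mutable_view view = owner;
  int total         = 0;
  for (std::size_t i = 0; i < view.size; ++i) { total += view.data[i]; }
  return total;
}
```

In this sketch `live_columns_after_temporary()` returns 0, meaning the owner was destroyed before the view could be used, while `sum_through_named_view()` reads valid data because the owner is still in scope.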

Error introduced by #17436

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added the bug, 3 - Ready for Review, libcudf, and non-breaking labels Dec 17, 2024
@davidwendt davidwendt self-assigned this Dec 17, 2024
@davidwendt davidwendt requested a review from a team as a code owner December 17, 2024 15:34
Contributor

@vyasr vyasr left a comment

The change looks fine. Not wanting a temporary column makes sense. I don't completely follow this part though:

> also must be non-temporary itself so that it is not implicitly converted to a column_view.

Would this operator be invoked via contextual conversion if the mutable_column_view is a temporary and is passed to a function that has overloads for both column_view and mutable_column_view? That doesn't sound quite right to me but I don't know how else to interpret that statement.

@davidwendt
Contributor Author

davidwendt commented Dec 17, 2024

> The change looks fine. Not wanting a temporary column makes sense. I don't completely follow this part though:
>
> > also must be non-temporary itself so that it is not implicitly converted to a column_view.
>
> Would this operator be invoked via contextual conversion if the mutable_column_view is a temporary and is passed to a function that has overloads for both column_view and mutable_column_view? That doesn't sound quite right to me but I don't know how else to interpret that statement.

That is what appears to be happening, and a fix was attempted in #17436. When passed a temporary mutable_column_view, the compiler instead selected the column_view const& API, since the conversion operator makes that overload callable. This appears to be triggered by passing a temporary where a non-const reference is expected.

https://godbolt.org/z/hW7cnKPxW
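The overload selection described here can be reproduced with a small standalone sketch. The `view`, `mutable_view`, `make_mutable_view`, and `normalize` names below are hypothetical and only mimic the shape of the two cudf overloads (one taking a const reference to the read-only view, one taking a non-const reference to the mutable view); this is not the linked Compiler Explorer example. Each overload returns a label so the selected one can be observed.

```cpp
#include <string>

// Hypothetical read-only view type.
struct view {};

// Hypothetical mutable view with an implicit conversion to the read-only view.
struct mutable_view {
  operator view() const { return {}; }
};

// Returns a mutable view by value, so the call expression is a temporary.
mutable_view make_mutable_view() { return {}; }

// Two overloads shaped like the ones under discussion.
std::string normalize(view const&) { return "view const&"; }
std::string normalize(mutable_view&) { return "mutable_view&"; }

// Passing a temporary: the mutable_view& overload is not viable, so the
// conversion operator is used and the view const& overload is selected.
std::string call_with_temporary() { return normalize(make_mutable_view()); }

// Passing a named variable: the mutable_view& overload is an exact match.
std::string call_with_named()
{
  mutable_view named = make_mutable_view();
  return normalize(named);
}
```

With standard-conforming overload resolution, `call_with_temporary()` yields `"view const&"` and `call_with_named()` yields `"mutable_view&"`, which matches the behavior reported in this thread.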

@davidwendt
Contributor Author

This is a better example: https://godbolt.org/z/hW7cnKPxW

@vyasr
Contributor

vyasr commented Dec 17, 2024

Isn't this more representative though? It's not just about const qualifiers, there's an overload of normalize_nans_and_zeros that accepts a mutable_column_view, so why would it ever be cast to a column_view regardless of const qualifiers (since the function doesn't require a const ref so it can take a const or non-const mutable_column_view as input)? I'm trying to understand why the original line 111 in this PR

cudf::normalize_nans_and_zeros(mutable_view, cudf::test::get_default_stream());

would lead to mutable_view being converted to a column_view. Maybe that's not what you're saying happens? Maybe it was the code before #17436 that was causing the column_view code path to be triggered?

Comment on lines +110 to +111
auto view = input->mutable_view();
cudf::normalize_nans_and_zeros(view, cudf::test::get_default_stream());
Contributor

would not mind this option, but nothing wrong with current code either

Suggested change
- auto view = input->mutable_view();
- cudf::normalize_nans_and_zeros(view, cudf::test::get_default_stream());
+ cudf::normalize_nans_and_zeros(input->mutable_view(), cudf::test::get_default_stream());

Contributor

@vyasr vyasr Dec 17, 2024

@davidwendt this is what I was getting at above. My interpretation of

> The view must be created from a non-temporary column and also must be non-temporary itself so that it is not implicitly converted to a column_view.

was that this change would somehow break things because the mutable_column_view would be a temporary and that was not permissible here. Perhaps I was misunderstanding though.

Contributor Author

That is exactly what happens. The temporary created here

cudf::normalize_nans_and_zeros(input->mutable_view(), cudf::test::get_default_stream());

causes the compiler to call the column_view const& API instead of the mutable_column_view& API.
The current code ensures the appropriate API is called.

Contributor Author

The original change made in #17436 was an attempt to correct the API call by creating a mutable_column_view variable, but it inadvertently created a view of a destroyed temporary column.

Contributor

@vyasr vyasr Dec 17, 2024

How does that happen? I wouldn't have expected that overload to ever be selected in this way unless the overload for the same type was actually impossible to call, but input->mutable_view() returns a (non-const) mutable_column_view that should be totally fine for this function signature.

Contributor Author

@davidwendt davidwendt Dec 17, 2024

Again, I'm guessing it is because the argument is a temporary: passing a temporary by non-const reference usually makes no sense, since any modifications made to the object inside the function are just thrown away. I suppose the compiler is trying hard to help here by finding a better API candidate to call.
I feel like this https://godbolt.org/z/hW7cnKPxW illustrates that as well.

Perhaps we should not have an implicit conversion operator from mutable_column_view to column_view.
I would not expect that conversion to be common, and making it explicit should not be a big deal in our code base.
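A rough sketch of what an explicit conversion would change, using hypothetical `view`, `mutable_view`, and `normalize` stand-ins rather than the real cudf types. It assumes the same two overloads as in the discussion and uses a C++17 detection trait to check, at compile time, which argument kinds are accepted. With `explicit`, passing a temporary mutable view no longer compiles instead of silently selecting the read-only overload.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical read-only view type.
struct view {};

// Hypothetical mutable view whose conversion to view is explicit.
struct mutable_view {
  explicit operator view() const { return {}; }
};

// Overloads shaped like the ones under discussion; the return value
// identifies the selected overload (0: read-only, 1: mutable).
int normalize(view const&) { return 0; }
int normalize(mutable_view&) { return 1; }

// Detects whether normalize(std::declval<Arg>()) is a well-formed call.
template <typename Arg, typename = void>
struct can_normalize : std::false_type {};

template <typename Arg>
struct can_normalize<Arg, std::void_t<decltype(normalize(std::declval<Arg>()))>>
  : std::true_type {};

// A named (lvalue) mutable view still matches the mutable overload.
static_assert(can_normalize<mutable_view&>::value);
// A temporary mutable view matches neither overload once the conversion is explicit.
static_assert(!can_normalize<mutable_view>::value);
// A read-only view is unaffected.
static_assert(can_normalize<view>::value);

// The read-only overload remains reachable through an explicit cast.
bool cast_selects_read_only_overload()
{
  mutable_view mv;
  return normalize(static_cast<view>(mv)) == 0;
}
```

The trade-off in this sketch is that every intentional mutable-to-read-only conversion needs a `static_cast`, in exchange for a compile error where the temporary would otherwise pick the unintended overload.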

Contributor

@vyasr vyasr Dec 17, 2024

> I feel like this https://godbolt.org/z/hW7cnKPxW illustrates that as well.

OK I put together a slightly modified version of your example that helped me. I found your example a bit different since there is no overload of the function that actually accepts an instance of hello itself. I would have thought that would always be preferred. Your explanation of why it wouldn't be makes sense.

Contributor

Before the changes in #17436, we were passing a mutable_column_view rvalue to cudf::normalize_nans_and_zeros. Since we cannot bind an rvalue to a non-const lvalue reference, the cudf::normalize_nans_and_zeros(mutable_column_view &) overload could not be called, and the compiler instead converted the mutable_column_view to a column_view so that the overload with the const reference parameter (cudf::normalize_nans_and_zeros(column_view const &)) could be invoked.
However, while trying to create a mutable_column_view lvalue, I accidentally created a view of an rvalue column, which does not make sense. Thank you for the fix, @davidwendt!
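The reference-binding rule cited in this comment can be checked at compile time. The sketch below uses plain `int` parameters and two hypothetical function-pointer aliases instead of cudf types; `std::is_invocable_v` reports whether a call with an lvalue (`int&`) or an rvalue (`int`) argument would be well-formed.

```cpp
#include <type_traits>

// Hypothetical signatures standing in for the two parameter kinds.
using takes_mutable_ref = void (*)(int&);
using takes_const_ref   = void (*)(int const&);

// An lvalue argument binds to both reference kinds.
constexpr bool lvalue_to_mutable_ref = std::is_invocable_v<takes_mutable_ref, int&>;
constexpr bool lvalue_to_const_ref   = std::is_invocable_v<takes_const_ref, int&>;

// An rvalue argument binds to a const lvalue reference only.
constexpr bool rvalue_to_mutable_ref = std::is_invocable_v<takes_mutable_ref, int>;
constexpr bool rvalue_to_const_ref   = std::is_invocable_v<takes_const_ref, int>;

static_assert(lvalue_to_mutable_ref);
static_assert(lvalue_to_const_ref);
static_assert(!rvalue_to_mutable_ref);  // rvalue cannot bind to int&
static_assert(rvalue_to_const_ref);
```

This is why, for a temporary argument, only the overload with the const reference parameter stays viable.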

Contributor

> Since we cannot bind an rvalue to a non-const lvalue reference

I forgot that this was a rule, thanks for stating it explicitly. I guess the compiler prevents this because there is no sensible reason to allow it, and the rule protects against user error: modifying a parameter in a way that would have no effect.

@davidwendt
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 267c7f2 into rapidsai:branch-25.02 Dec 17, 2024
105 checks passed
@davidwendt davidwendt deleted the memcheck-norm-nans-test branch December 17, 2024 22:49