-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix async synchronization issues in json_column.cu #15497
Conversation
@@ -371,7 +371,7 @@ std::vector<std::string> copy_strings_to_host(device_span<SymbolT const> input, | |||
auto to_host = [stream](auto const& col) { | |||
if (col.is_empty()) return std::vector<std::string>{}; | |||
auto const scv = cudf::strings_column_view(col); | |||
auto const h_chars = cudf::detail::make_std_vector_sync<char>( | |||
auto const h_chars = cudf::detail::make_std_vector_async<char>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of async here then sync, IMO the cleaner way is to use async
for both, then explicitly call stream.synchronize()
with comment why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I made similar decision in other comment too. I will updated it.
@@ -530,13 +530,15 @@ void make_device_json_column(device_span<SymbolT const> input, | |||
auto max_row_offsets = cudf::detail::make_std_vector_async(d_max_row_offsets, stream); | |||
std::vector<std::string> column_names = copy_strings_to_host( | |||
input, d_column_tree.node_range_begin, d_column_tree.node_range_end, stream); | |||
stream.synchronize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? copy_strings_to_host
sync inside it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, we should rename copy_strings_to_host
into copy_strings_to_host_sync
to explicitly tell that we have a sync there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. I was thinking of adding _sync
too.
We need sync because it's copied to host and it will be used later as column names in host.
// array of arrays column names | ||
if (is_array_of_arrays) { | ||
TreeDepthT const row_array_children_level = is_enabled_lines ? 1 : 2; | ||
auto values_column_indices = | ||
get_values_column_indices(row_array_children_level, tree, col_ids, num_columns, stream); | ||
auto h_values_column_indices = | ||
cudf::detail::make_std_vector_async(values_column_indices, stream); | ||
stream.synchronize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then make_std_vector_sync
above this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was bit sceptical about this because, if any optimization in future is done inside make_std_vector_sync
to return empty vector without sync, or if that make_std_vector_sync
code line is moved, this will cause async issues. So, to be safe, it added stream.synchronize();
outside. Anyway, the sync code does the same.
/merge |
Description
Fixes #15390
This change fixes async synchronization issues in json_column.cu.
Related file json_tree.cu does not have async synchronization issues.
Summary of changes:
changed debug print async to sync,
added synchronize after multiple async calls
changed h_chars to async since subsequent call is sync (it will also help because chars array is usually large).
changed is_str_column_all_nulls to sync.
Checklist