Add an explicit sort function #247

Open
dougbrn opened this issue Sep 28, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@dougbrn
Collaborator

dougbrn commented Sep 28, 2023

Certain user workflows require data sorted in a particular order. The most obvious case is sorting by time within each lightcurve (a first-order sort by object_id and a second-order sort by timestamp). We should provide a sort function that lets users do this. There is some chance that the refactor resolves this automatically, but the need to keep lightcurves grouped by object_id through each sort suggests this may need to be a function that goes beyond a plain dd.sort_values call.
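For illustration, here is a minimal sketch of the requested ordering on a small in-memory pandas table. The column names (`object_id`, `timestamp`, `flux`) and the values are made up for the example, and this is plain pandas rather than the project's own API:

```python
import pandas as pd

# Hypothetical toy source table: two lightcurves with interleaved rows.
source = pd.DataFrame(
    {
        "object_id": [2, 1, 2, 1, 1],
        "timestamp": [59002.5, 59001.0, 59000.1, 59003.2, 59000.7],
        "flux": [10.1, 9.8, 10.4, 9.9, 9.7],
    }
)

# First-order sort by object_id, second-order sort by timestamp.
# Rows of the same lightcurve end up contiguous and in time order.
sorted_source = source.sort_values(["object_id", "timestamp"]).reset_index(drop=True)

print(sorted_source)
```

On a distributed table the same ordering also has to respect partition boundaries, which is the part a plain multi-column sort does not obviously guarantee.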

@dougbrn dougbrn added the enhancement New feature or request label Sep 28, 2023
@dougbrn
Collaborator Author

dougbrn commented Oct 3, 2023

https://docs.dask.org/en/latest/generated/dask.dataframe.DataFrame.sort_values.html

sort_values can sort by multiple columns. I'm not 100% sure whether it accepts the index as one of the sort keys, which will be our use case most of the time. But there's a strong possibility that inheriting sort_values via the refactor (#191) will resolve this automatically.

@dougbrn
Collaborator Author

dougbrn commented Dec 22, 2023

We have sort_values now, but there may still be an argument for a sort function that always sorts on the index first (to preserve divisions) and then sorts by a second-order key within each index group. This is possible with sort_values, but relying on users to call it correctly for these use cases risks them accidentally invalidating the divisions. Perhaps overriding sort_values for the source table is the right move here.
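One way such a function might work is to sort each partition independently, so that no row ever moves between partitions. The sketch below is a hypothetical helper, not an existing function in this project: `sort_within_index` and the sample data are assumptions, and it operates on a single pandas DataFrame standing in for one partition of the source table.

```python
import pandas as pd


def sort_within_index(partition: pd.DataFrame, by: str) -> pd.DataFrame:
    """Sort one partition by its index first, then by `by` within each index value.

    Hypothetical helper for illustration; assumes the index has a name
    (e.g. "object_id") that does not clash with a column name.
    """
    index_name = partition.index.name
    if index_name is None:
        raise ValueError("the partition index must be named")
    return (
        partition.reset_index()
        .sort_values([index_name, by])  # index is the primary key, `by` the secondary
        .set_index(index_name)
    )


# Toy partition indexed by object_id, with rows out of order.
partition = pd.DataFrame(
    {"timestamp": [5.0, 3.0, 1.0, 2.0], "flux": [0.5, 0.3, 0.1, 0.2]},
    index=pd.Index([2, 1, 2, 1], name="object_id"),
)

result = sort_within_index(partition, by="timestamp")
print(result)
```

Because the helper only reorders rows inside the partition it is given, the set of index values in that partition is unchanged. Applied per partition (for example through Dask's `map_partitions`), that should leave the existing divisions valid, though this has not been checked against the source table here.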
