Dropping duplicates without using Pandas #1353

Answered by jpivarski
blairium asked this question in Q&A
Dropping duplicates is the sort of thing that Pandas does better than NumPy or Awkward Array. (The expressions/cuts interface just applies NumPy or Awkward Array cuts; it doesn't do anything special internally.) There are a variety of ways to cobble together a "drop duplicates" operation in NumPy (see this StackOverflow question). In Awkward Array, ak.run_lengths helps if you're looking for duplicates within each variable-length list, but I assume that "trackID", "eventID", and "flagParticle" are not within nested lists.
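As a rough illustration of one NumPy-only approach for flat (non-nested) columns, the sketch below stacks the key columns into rows and uses np.unique with axis=0. The column names come from the question; the values are invented for the example, and this is only one of several possible ways, not necessarily the one in the linked StackOverflow question.

```python
import numpy as np

# Hypothetical flat columns: names from the question, values made up.
trackID = np.array([1, 1, 2, 2, 3])
eventID = np.array([10, 10, 10, 11, 12])
flagParticle = np.array([0, 0, 1, 1, 0])

# Stack the key columns so that each row is one (trackID, eventID, flagParticle) key.
keys = np.column_stack([trackID, eventID, flagParticle])

# return_index=True gives the index of the first occurrence of each unique row.
_, first = np.unique(keys, axis=0, return_index=True)

# np.unique sorts its output, so sort the indices to restore the original order.
keep = np.sort(first)

trackID_unique = trackID[keep]
eventID_unique = eventID[keep]
flagParticle_unique = flagParticle[keep]
```

Note that np.column_stack makes a copy of the key columns, so this is not free in memory either; it just avoids a Pandas dependency.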

Probably the best way to go about it would be to leverage Pandas's functionality—use its strengths—while minimizing the memory it holds. Nobody says …
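One possible reading of "minimizing the memory it holds" is sketched below: hand Pandas only the key columns, ask it for a boolean mask, and apply that mask to the original arrays. This is an assumption about the approach, not a reconstruction of the rest of the answer; the "energy" column and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical flat columns: names from the question, values made up.
trackID = np.array([1, 1, 2, 2, 3])
eventID = np.array([10, 10, 10, 11, 12])
flagParticle = np.array([0, 0, 1, 1, 0])
energy = np.array([0.5, 0.5, 1.2, 1.3, 2.0])  # hypothetical payload column

# Only the key columns go into the DataFrame; payload columns stay outside Pandas.
df = pd.DataFrame(
    {"trackID": trackID, "eventID": eventID, "flagParticle": flagParticle}
)

# duplicated() marks every repeat of an earlier row; invert it to keep first occurrences.
mask = ~df.duplicated().to_numpy()

# The DataFrame is no longer needed once the mask exists.
del df

# Apply the same mask to any array of the same length.
trackID_unique = trackID[mask]
energy_unique = energy[mask]
```

The mask is one boolean per row, so it is small compared with the data, and it can be applied to NumPy or Awkward arrays alike.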

Answer selected by blairium