
Output results as a dataframe + return short names, hctsa names and values as standard. #31

Open: wants to merge 8 commits into base: main


joshuabmoore
Collaborator

Building on the changes originally proposed by @anniegbryant in PR #21, this PR changes the catch22 output to a pandas DataFrame and makes the "short" feature names the default instead of the long (HCTSA) names, e.g., mode_5 instead of DN_HistogramMode_5.

Breaking Changes:

Because these changes break compatibility for existing users, this PR will be released as a new major version (catch22 v1.0.0), with the docs and README updated to reflect the new output format. To avoid confusion, users should be told about the new output through clear documentation and a migration guide in the changelog.

Major changes

  • Removed the optional short_names parameter from catch22_all(). Three columns are now always returned: feature, hctsa_name and value. As a result, catch22_all() now accepts only two arguments:
catch22_all(data, catch24=False)
  • What catch22 ≤ v0.4.5 called features_short are now called features (the feature column in the output DataFrame).
  • What catch22 ≤ v0.4.5 called features are now called features_hctsa (the hctsa_name column in the output DataFrame).
  • catch22 results are now returned as a pandas DataFrame instead of a dict for improved readability:
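from pycatch22 import catch22_all

data = [1, 2, 4, 3, 2, 5, 6, 4, 3, 1]  # any univariate time series, as a list
df = catch22_all(data, catch24=False)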

# print the first feature name
print(df.feature[0])

# print the first feature value
print(df.value[0])

# print the first feature HCTSA (long) name
print(df.hctsa_name[0])
  • Added pandas and numpy dependencies.
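For the migration guide, the renaming can be made concrete by converting a ≤ v0.4.5 result dict into the new three-column layout. This is a minimal sketch, not code from this PR: it assumes the legacy dict has 'names' (HCTSA names), 'values' and, when short_names=True was passed, 'short_names' keys. `legacy_to_dataframe` is a hypothetical helper, and the feature values are placeholders rather than real catch22 outputs.

```python
import pandas as pd


def legacy_to_dataframe(legacy: dict) -> pd.DataFrame:
    """Convert a <= v0.4.5 result dict into the (feature, hctsa_name, value) layout.

    Hypothetical helper. Assumes the legacy dict has 'names' (HCTSA names),
    'values', and 'short_names' (only present when short_names=True was used).
    """
    if "short_names" not in legacy:
        raise KeyError("legacy result has no 'short_names'; it was created with short_names=False")
    return pd.DataFrame(
        {
            "feature": list(legacy["short_names"]),  # old short_names -> feature
            "hctsa_name": list(legacy["names"]),     # old names -> hctsa_name
            "value": list(legacy["values"]),         # old values -> value
        }
    )


# Placeholder values for illustration only (not real catch22 outputs).
legacy = {
    "names": ["DN_HistogramMode_5", "DN_HistogramMode_10"],
    "short_names": ["mode_5", "mode_10"],
    "values": [0.1, 0.2],
}
df = legacy_to_dataframe(legacy)

# Look up a value by its short name or by its HCTSA name.
print(df.set_index("feature").loc["mode_5", "value"])
print(df.set_index("hctsa_name").loc["DN_HistogramMode_10", "value"])
```

Indexing by either `feature` or `hctsa_name` with `set_index` lets code written against either naming convention retrieve the same value.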

Minor changes

  • Added a security policy, SECURITY.md.
  • Added a code of conduct, CODE_OF_CONDUCT.md.
  • Added a dark-mode logo to the README.
  • Added Python unit-test status and supported-Python-version badges to the README.
  • Updated the usage guide in the README to describe the new DataFrame output.
  • Added Python 3.12 to the unit-test runners.
  • Updated unit tests to support new DataFrame output.

joshuabmoore requested a review from benfulcher on June 4, 2024, 09:48
@joshuabmoore
Collaborator Author

Also, the changelog will describe the breaking changes for users in more detail, including the new naming convention, under which the old short_names and names essentially swap places.
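Because the old short_names and names essentially swap places, code that still expects the old dict could be given a backward-compatible view of the new output. This sketch is hypothetical (`to_legacy_dict` and `LEGACY_KEY_TO_COLUMN` are not part of catch22) and assumes the ≤ v0.4.5 keys were 'names', 'short_names' and 'values'. A plain dict of lists stands in for the DataFrame so the example runs without pandas; with a DataFrame, `result["hctsa_name"]` returns a Series and `list()` converts it in the same way.

```python
# Hypothetical mapping from the <= v0.4.5 dict keys to the new DataFrame columns.
LEGACY_KEY_TO_COLUMN = {
    "names": "hctsa_name",     # long HCTSA names, previously the default
    "short_names": "feature",  # short names, now the default
    "values": "value",
}


def to_legacy_dict(result) -> dict:
    """Rebuild a <= v0.4.5 style dict from the new three-column output.

    Accepts a pandas DataFrame or any mapping from column name to a sequence.
    """
    return {old: list(result[new]) for old, new in LEGACY_KEY_TO_COLUMN.items()}


# Stand-in for the new DataFrame output (placeholder values, not real features).
new_style = {
    "feature": ["mode_5", "mode_10"],
    "hctsa_name": ["DN_HistogramMode_5", "DN_HistogramMode_10"],
    "value": [0.1, 0.2],
}
legacy = to_legacy_dict(new_style)
print(legacy["names"])        # ['DN_HistogramMode_5', 'DN_HistogramMode_10']
print(legacy["short_names"])  # ['mode_5', 'mode_10']
```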

benfulcher requested a review from anniegbryant on June 7, 2024, 01:56
@benfulcher
Contributor

@anniegbryant can you do a quick test?

benfulcher requested a review from KieranOwens on June 20, 2024, 00:58
@KieranOwens

I tried the new catch22_all function in my workflow. The rename from the 'values' key (old dict output) to the 'value' column (new DataFrame output) breaks my code. Otherwise, it works fine.
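One way to keep a workflow like this running across both versions is a small accessor that checks for the old key before the new column. This is a hypothetical helper, not part of catch22; it assumes the old output is a dict with a 'values' key and the new output has a 'value' column. The `in` operator checks dict keys and DataFrame column labels alike, so the same test works for both.

```python
def get_feature_values(result) -> list:
    """Return catch22 feature values from either output format (hypothetical helper).

    <= v0.4.5: dict with a 'values' key.
    v1.0.0 (this PR): DataFrame with a 'value' column.
    """
    if "values" in result:  # legacy dict output
        return list(result["values"])
    if "value" in result:   # new DataFrame output (column membership)
        return list(result["value"])
    raise KeyError("expected a 'values' key or a 'value' column")


# Placeholder outputs in the old and new shapes (a dict stands in for the DataFrame).
old = {"names": ["DN_HistogramMode_5"], "values": [0.1]}
new = {"feature": ["mode_5"], "hctsa_name": ["DN_HistogramMode_5"], "value": [0.1]}
print(get_feature_values(old), get_feature_values(new))
```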

3 participants