BUG: evaluate-busco brakes when input mags > ~1,300 #140

Closed
Sann5 opened this issue Feb 21, 2024 · 8 comments · Fixed by #142 or #148

Contributor

Sann5 commented Feb 21, 2024

The actual problem is vega, the plotting engine. See this post.

Error message:

altair.utils.data.MaxRowsError: The number of rows in your dataset is greater than the maximum allowed (5000).
Try enabling the VegaFusion data transformer which raises this limit by pre-evaluating data
transformations in Python.
    >> import altair as alt
    >> alt.data_transformers.enable("vegafusion")
Or, see https://altair-viz.github.io/user_guide/large_datasets.html for additional information
on how to plot large datasets.
Plugin error from moshpit:
  The number of rows in your dataset is greater than the maximum allowed (5000).
  Try enabling the VegaFusion data transformer which raises this limit by pre-evaluating data
  transformations in Python.
      >> import altair as alt
      >> alt.data_transformers.enable("vegafusion")
  Or, see https://altair-viz.github.io/user_guide/large_datasets.html for additional information
  on how to plot large datasets.
Sann5 added the "bug" label (Something isn't working) Feb 21, 2024
Sann5 self-assigned this Feb 21, 2024
Contributor Author

Sann5 commented Feb 22, 2024

  • The plotting breaks not at 5,000 MAGs but at around 1,360.
  • Setting alt.data_transformers.disable_max_rows() raises the limit to around 1,580 MAGs.
  • Above that threshold the output is still broken: the plot comes out blank.
  • I say "around" because the exact number may vary with the number of MAGs × samples.

Possible solution?

  • Make a new plot for every ~1,000 MAGs and put them in separate tabs in the HTML? In any case, 1,000 MAGs is already too much data to look at and draw insight from.

Contributor

misialq commented Feb 22, 2024

Hey @Sann5, thanks for investigating! I like your solution of splitting those into tabs. Maybe we can even go a bit lower? 500? 100? (could be configurable as well, with a max set to, say, 500?)

Contributor Author

Sann5 commented Feb 22, 2024

Just for reference, what Pau is trying to plot is 5,260 MAGs distributed among 150 samples, with a median of 30 MAGs per sample (min 1, max 127).

Sann5 changed the title from "BUG: evaluate-busco brakes when input mags > 5000" to "BUG: evaluate-busco brakes when input mags > ~1,300" Feb 22, 2024
Contributor Author

Sann5 commented Feb 22, 2024

@misialq in Pau's use case this would create around 11 tabs. Would that be OK?

Contributor

misialq commented Feb 22, 2024

Hmmm, I'm thinking now that actually it's more complicated than saying how many MAGs you want to display - they are grouped per sample, right? We don't want to split MAGs from the same sample between two tabs - would this be possible?

I don't think the number of tabs is a problem per se... It's either that or a thousand rows in fewer tabs... That's why I thought of making it configurable, but maybe the default could still be something higher (like 1k) if you think that's better. I have not seen a visualization with so many MAGs, so it's a bit tough to imagine what this would look like...

Contributor Author

Sann5 commented Feb 22, 2024

@misialq

> We don't want to split MAGs from the same sample between two tabs - would this be possible?

I would make sure that doesn't happen, by adding whole samples to a tab iteratively until the threshold is reached.
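
To make the idea concrete, here is a minimal sketch of that iterative selection. It is only an illustration: the function name `group_samples`, the `max_mags` parameter and the sample counts are made up for this example and are not taken from the plugin code.

```python
from typing import Dict, List


def group_samples(mags_per_sample: Dict[str, int], max_mags: int) -> List[List[str]]:
    """Greedily pack whole samples into groups of at most `max_mags` MAGs.

    Hypothetical helper: samples are taken in the order given and a sample
    is never split between two groups.
    """
    if max_mags < 1:
        raise ValueError("max_mags must be a positive integer")

    groups: List[List[str]] = []
    current: List[str] = []
    current_total = 0

    for sample, n_mags in mags_per_sample.items():
        # Close the current group if this sample would push it over the limit.
        if current and current_total + n_mags > max_mags:
            groups.append(current)
            current, current_total = [], 0
        # A sample that is larger than max_mags on its own still gets
        # added here, so it ends up alone in an oversized group.
        current.append(sample)
        current_total += n_mags

    if current:
        groups.append(current)
    return groups


# Made-up counts, only to show the behaviour.
counts = {"s1": 30, "s2": 127, "s3": 1, "s4": 60, "s5": 90}
groups = group_samples(counts, max_mags=160)
print(groups)
```

Each inner list would then feed one tab (or one dropdown entry). Because samples are kept whole, a group can hold fewer MAGs than the threshold, so the number of groups may be higher than a plain division of total MAGs by the threshold suggests.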

> I have not seen a visualization with so many MAGs so it's a bit tough to imagine what this would look like...

I can send you the one that is on the limit (~1,300) haha. I'll do it over Slack.

> I don't think the number of tabs is a problem per se...

The other option that I am looking into is a dropdown menu from which users can select which range of samples to look at (like in the second tab of the CheckM viz), e.g. the first 500, the second 500, and so on. As you said, the number of MAGs per range can be left configurable.
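
As a rough sketch of how the dropdown entries could be derived, the snippet below cuts a total count into fixed-size ranges. The helper name `index_ranges` and the label format are assumptions made for illustration; the 5,260 and 500 figures are the ones mentioned in this thread.

```python
from typing import List, Tuple


def index_ranges(n_items: int, per_range: int) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (first, last) ranges covering n_items.

    Hypothetical helper: the last range is shortened if n_items is not
    a multiple of per_range.
    """
    if per_range < 1:
        raise ValueError("per_range must be a positive integer")
    return [
        (start, min(start + per_range - 1, n_items))
        for start in range(1, n_items + 1, per_range)
    ]


ranges = index_ranges(5260, 500)
# One label per dropdown entry, e.g. "1-500", "501-1000", ...
labels = [f"{first}-{last}" for first, last in ranges]
print(len(labels), labels[0], labels[-1])
```

With 500 items per range, 5,260 items give 11 entries, the last one covering 5001-5260.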

Contributor

misialq commented Feb 22, 2024

Sounds good - I like the solution with a drop-down or something similar - let me know when you have something to look at 😎

Contributor

misialq commented Mar 13, 2024

The solution included the vegafusion transformer, which caused trouble during conda installation (also, technically it is not available for all platforms yet), so I'm reopening this - we'll need to find a different one.
