Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

77 sparseoptimization pattern discrepancy #125

Draft
wants to merge 1,601 commits into
base: master
Choose a base branch
from

Conversation

dimalvovs
Copy link
Contributor

  • add comments to chi-squared calc
  • add 1 to denominator and numerator to avoid NaN in chi-squared calc

@dimalvovs dimalvovs linked an issue Nov 15, 2024 that may be closed by this pull request
Copy link

@drbergman drbergman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't comment on how the 1+'s will affect calculations and their accuracy for a chisq calculation. I'll only ask: isn't the purpose of the SparseIterator that it will skip all 0 values? So dsq = get<1>(it) * get<1>(it) must be > 0?

Also, though it's clearly outside the scope of this PR, it feels like there's the possibility for computational savings by storing the results of the dot products inside the for loop to be re-used in the while loop. Similarly, I think the dsq values are constant throughout the CoGAPS run (they're just the element-wise square of the genes x sample matrix, right?). Could a static variable be used to store these?

@dimalvovs
Copy link
Contributor Author

I can't comment on how the 1+'s will affect calculations and their accuracy for a chisq calculation. I'll only ask: isn't the purpose of the SparseIterator that it will skip all 0 values? So dsq = get<1>(it) * get<1>(it) must be > 0?

that is exactly right, I think the root cause is the implementation of the sparseIterator, so maybe this draft PR should be more of a proof that that the NaN comes from the denominator and demonstrate that it is curable by +1ing.

@dimalvovs
Copy link
Contributor Author

dimalvovs commented Nov 22, 2024

additional to-dos from a call with @favorov:

  • change comments for clarity: mDMatrix is data (D); mMatrix is A (left mult matrix) if D is samples x genes; and P is D is genes x samples; *mOtherMatrix contans const pointer to vis-a-vis of mMatrix;

Matrix mDMatrix; // samples by genes for A, genes by samples for P

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SparseOptimization pattern discrepancy
6 participants