In the example script, rna.obsm['X_pca'] has shape (4382, 50), while atac_genes.obsm['X_lsi'] has shape (3166, 49). atac_genes.obsm['X_lsi'] is the output of MultiMAP.TFIDF_LSI() in init.py, which in turn calls tfidf() in matrix.py.
I then checked matrix.py, and I think the 49 comes from discarding the first column of the sklearn.decomposition.TruncatedSVD() output:
import numpy as np
import sklearn.decomposition

# n_components passed in here is 50
def tfidf(X, n_components, binarize=True, random_state=0):
    from sklearn.feature_extraction.text import TfidfTransformer

    sc_count = np.copy(X)
    if binarize:
        # Cap every value >= 1 at 1; values below 1 are left unchanged
        sc_count = np.where(sc_count < 1, sc_count, 1)

    tfidf = TfidfTransformer(norm='l2', sublinear_tf=True)
    normed_count = tfidf.fit_transform(sc_count)

    lsi = sklearn.decomposition.TruncatedSVD(n_components=n_components, random_state=random_state)
    lsi_r = lsi.fit_transform(normed_count)

    # Here: the first component is dropped, leaving n_components - 1 columns
    X_lsi = lsi_r[:, 1:]
    return X_lsi
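To double-check that the slice alone explains the 49, here is a minimal sketch on random data (a synthetic stand-in, not the tutorial data and not the MultiMAP code itself). It only shows that asking TruncatedSVD for 50 components and then dropping column 0 leaves 49 columns.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Synthetic stand-in matrix (assumption): 200 "cells" x 120 "features"
rng = np.random.default_rng(0)
X = rng.random((200, 120))

# Same call pattern as in tfidf(): request 50 components ...
lsi_r = TruncatedSVD(n_components=50, random_state=0).fit_transform(X)

# ... then drop the first one, as in `lsi_r[:, 1:]`
X_lsi = lsi_r[:, 1:]

print(lsi_r.shape, X_lsi.shape)  # (200, 50) (200, 49)
```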
Is column #0 discarded in order to remove the first component, which is usually strongly correlated with sequencing depth?
If so, the two inputs of MultiMAP.Integration() have 50 and 49 dimensions respectively. The function still runs normally and returns a result with shape (7548, 2), but is it okay to do this? From reading the preprint, my impression was that the two datasets to be integrated should have the same number of dimensions after reduction, because the inter-dataset point distances need to be calculated. Please correct me if my understanding is wrong.
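In case it helps the discussion, below is a rough sketch of what I have in mind, run on simulated counts. tfidf_lsi_matched() is a hypothetical helper of my own, not part of MultiMAP: it requests n_components + 1 components so that n_components remain after the first is dropped, and it also returns the dropped column so its correlation with per-cell depth can be checked. The simulated data (Poisson counts with a per-cell rate) is purely an assumption for illustration, and I am not suggesting this is how the package should be changed.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfTransformer


def tfidf_lsi_matched(X, n_components, random_state=0):
    """Hypothetical variant of tfidf(): keeps n_components columns after dropping the first.

    Returns (dropped_first_column, X_lsi).
    """
    sc_count = np.where(X < 1, X, 1)  # binarize the same way as tfidf()
    normed = TfidfTransformer(norm='l2', sublinear_tf=True).fit_transform(sc_count)
    # Request one extra component to compensate for the column dropped below
    svd = TruncatedSVD(n_components=n_components + 1, random_state=random_state)
    lsi_r = svd.fit_transform(normed)
    return lsi_r[:, 0], lsi_r[:, 1:]


# Simulated data (assumption): 300 cells x 200 features, each cell with its own depth
rng = np.random.default_rng(0)
rate = rng.uniform(0.05, 0.5, size=(300, 1))
counts = rng.poisson(rate, size=(300, 200)).astype(float)

first, X_lsi = tfidf_lsi_matched(counts, n_components=50)

depth = counts.sum(axis=1)           # total counts per cell
r = np.corrcoef(first, depth)[0, 1]  # sign of an SVD component is arbitrary, so compare |r|

print(X_lsi.shape)  # (300, 50)
print(abs(r))
```

On real data, the same correlation could be computed between the dropped column and the per-cell total counts to see whether the first component really tracks depth.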
Hi there,
I have read the preprint and found it very nice.
I am trying to run the example script in the project, and I noticed the following about the inputs of MultiMAP.Integration():
adata = MultiMAP.Integration([rna, atac_genes], ['X_pca', 'X_lsi'])