KeyError: 'significant_means' #177

Open
dhairya02 opened this issue Mar 24, 2024 · 2 comments

Comments

@dhairya02

Hi, I am running cpdb_statistical_analysis_method from CellPhoneDB. My AnnData object has shape (6890, 2000), and I call it with the following parameters:

import os

import pandas as pd
from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

subsample = 'Control4003'  # sample name; matches the paths in the log below

cpdb_file_path = 'Resources/cellphonedb.zip'
meta_file_path = f'Data/{subsample}/CPDB_data/metadata.txt'
counts_file_path = f'Data/{subsample}/CPDB_data/counts.h5ad'
out_path = f'Data/{subsample}/CPDB_results/'

os.makedirs(out_path, exist_ok=True)
metadata = pd.read_csv(meta_file_path, sep='\t')

cpdb_results = cpdb_statistical_analysis_method.call(
    cpdb_file_path=cpdb_file_path,        # mandatory: CellPhoneDB database zip file.
    meta_file_path=meta_file_path,        # mandatory: TSV file mapping cell barcodes to cell labels.
    counts_file_path=counts_file_path,    # mandatory: normalized count matrix.
    counts_data='hgnc_symbol',            # gene annotation used in the counts matrix.
    iterations=1000,                      # number of shufflings performed in the analysis.
    threshold=0.1,                        # minimum fraction of cells (0.1 = 10%) expressing a gene for it to be used.
    threads=40,                           # number of threads to use in the analysis.
    debug_seed=42,                        # debug random seed; used when >= 0, a negative value (e.g. -1) disables it.
    result_precision=3,                   # rounding for the mean values in significant_means.
    pvalue=0.05,                          # p-value threshold for significance.
    subsampling=False,                    # enable subsampling of the data (geometric sketching).
    subsampling_log=False,                # (mandatory) enable log1p during subsampling for non-log-transformed inputs.
    subsampling_num_pc=100,               # number of components used for geometric sketching (default: 100).
    subsampling_num_cells=1000,           # number of cells to subsample (integer) (default: 1/3 of the dataset).
    separator='|',                        # string separating cell types in the result dataframes, e.g. "cellA|CellB".
    debug=False,                          # save all intermediate tables used during the analysis in pkl format.
    output_path=out_path,                 # path to save results.
    output_suffix=subsample,              # replaces the timestamp in output file names with a user-defined string (default: None).
)

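The threshold comment above describes a per-gene expression filter. As a rough illustration only, the sketch below applies such a filter to a toy counts table in plain Python. It assumes, as we understand CellPhoneDB to do, that the fraction of expressing cells is computed within each cell type; the use of ">=" and the helper name expressed_genes_per_cell_type are assumptions, not part of CellPhoneDB.

```python
from collections import defaultdict


def expressed_genes_per_cell_type(counts, cell_types, threshold=0.1):
    """Hypothetical helper: genes passing `threshold` in each cell type.

    counts: {gene: [expression value per cell]}, cells in the same order
            as `cell_types`.
    cell_types: one cell-type label per cell (as in the metadata file).
    threshold: minimum fraction of a cell type's cells with non-zero
               expression (0.1 = 10%); using ">=" here is an assumption.
    """
    # Group cell indices by cell-type label.
    cells_by_type = defaultdict(list)
    for idx, label in enumerate(cell_types):
        cells_by_type[label].append(idx)

    expressed = {}
    for label, idxs in cells_by_type.items():
        kept = set()
        for gene, values in counts.items():
            # Fraction of this cell type's cells in which the gene is detected.
            frac = sum(1 for i in idxs if values[i] > 0) / len(idxs)
            if frac >= threshold:
                kept.add(gene)
        expressed[label] = kept
    return expressed


counts = {"CXCL12": [0, 1.2, 0, 0], "CXCR4": [0, 0, 0.8, 0.5]}
print(expressed_genes_per_cell_type(counts, ["A", "A", "B", "B"]))
# {'A': {'CXCL12'}, 'B': {'CXCR4'}}
```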
I am getting the following error:
Reading user files...
The following user files were loaded successfully:
Data/Control4003/CPDB_data/counts.h5ad
Data/Control4003/CPDB_data/metadata.txt
[ ][CORE][23/03/24-20:24:37][INFO] [Cluster Statistical Analysis] Threshold:0.1 Iterations:1000 Debug-seed:42 Threads:40 Precision:3
[ ][CORE][23/03/24-20:24:37][WARNING] Debug random seed enabled. Set to 42
[ ][CORE][23/03/24-20:24:37][INFO] No CellphoneDB interactions found in this input.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[6], line 11
      8 os.makedirs(out_path, exist_ok=True)
      9 metadata = pd.read_csv(meta_file_path, sep = '\t')
---> 11 cpdb_results = cpdb_statistical_analysis_method.call(
     12     cpdb_file_path = cpdb_file_path,                 # mandatory: CellPhoneDB database zip file.
     13     meta_file_path = meta_file_path,                 # mandatory: tsv file defining barcodes to cell label.
     14     counts_file_path = counts_file_path,             # mandatory: normalized count matrix.
     15     counts_data = 'hgnc_symbol',                     # defines the gene annotation in counts matrix.
     16     iterations = 1000,                               # denotes the number of shufflings performed in the analysis.
     17     threshold = 0.1,                                 # defines the min % of cells expressing a gene for this to be employed in the analysis.
     18     threads = 40,                                    # number of threads to use in the analysis.
     19     debug_seed = 42,                                 # debug randome seed. To disable >=0.
     20     result_precision = 3,                            # Sets the rounding for the mean values in significan_means.
     21     pvalue = 0.05,                                   # P-value threshold to employ for significance.
     22     subsampling = False,                             # To enable subsampling the data (geometri sketching).
     23     subsampling_log = False,                         # (mandatory) enable subsampling log1p for non log-transformed data inputs.
     24     subsampling_num_pc = 100,                        # Number of componets to subsample via geometric skectching (dafault: 100).
     25     subsampling_num_cells = 1000,                    # Number of cells to subsample (integer) (default: 1/3 of the dataset).
     26     separator = '|',                                 # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
     27     debug = False,                                   # Saves all intermediate tables employed during the analysis in pkl format.
     28     output_path = out_path,                          # Path to save results.
     29     output_suffix = subsample                        # Replaces the timestamp in the output files by a user defined string in the  (default: None).
     30 )

File /gpfs/share/apps/anaconda3/gpu/5.2.0/envs/conda_tsirigoslab_transloc_env/lib/python3.8/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_method.py:148, in call(cpdb_file_path, meta_file_path, counts_file_path, counts_data, output_path, microenvs_file_path, active_tfs_file_path, iterations, threshold, threads, debug_seed, result_precision, pvalue, subsampling, subsampling_log, subsampling_num_pc, subsampling_num_cells, separator, debug, output_suffix, score_interactions)
    124     counts = ss.subsample(counts)
    126 analysis_result = cpdb_statistical_analysis_complex_method.call(meta.copy(),
    127                                                                 counts,
    128                                                                 counts_relations,
   (...)
    145                                                                 output_path
    146                                                                 )
--> 148 significant_means = analysis_result['significant_means']
    149 max_rank = significant_means['rank'].max()
    150 significant_means['rank'] = significant_means['rank'].apply(lambda rank: rank if rank != 0 else (1 + max_rank))

KeyError: 'significant_means'
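The traceback shows the KeyError is raised inside CellPhoneDB, at significant_means = analysis_result['significant_means'], right after the log line "No CellphoneDB interactions found in this input.". This suggests (we have not checked the library source) that the analysis returned a result without that key because nothing passed its filters. Since the exception surfaces from the library call itself, a caller can only intercept it. A minimal sketch, using a hypothetical wrapper call_with_clear_error that is not part of CellPhoneDB; the hint text reflects the diagnosis given later in this thread and is not a CellPhoneDB message:

```python
def call_with_clear_error(analysis_fn, **kwargs):
    """Hypothetical wrapper: run analysis_fn(**kwargs) and replace a
    KeyError('significant_means') with a more descriptive RuntimeError."""
    try:
        return analysis_fn(**kwargs)
    except KeyError as err:
        if err.args and err.args[0] == "significant_means":
            raise RuntimeError(
                "No 'significant_means' table was produced. Check the log for "
                "'No CellphoneDB interactions found in this input.' and verify "
                "that the gene identifiers match the database (e.g. human HGNC symbols)."
            ) from err
        raise  # any other KeyError is re-raised unchanged


# Usage (not run here), with the arguments from the call above:
# cpdb_results = call_with_clear_error(
#     cpdb_statistical_analysis_method.call,
#     cpdb_file_path=cpdb_file_path,
#     meta_file_path=meta_file_path,
#     ...
# )
```

Chaining with "from err" keeps the original KeyError and its traceback attached as __cause__, so no debugging information is lost.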

Can you please help me understand what this error means?

@cakirb
Collaborator

cakirb commented Mar 25, 2024

Hi @dhairya02,

To be able to debug the issue, could you send the input files you are using to [email protected]? If the files are too big to share via email, you can also send us the link to access them.

Best,
Batu

@cakirb
Collaborator

cakirb commented May 21, 2024

Hi @dhairya02,

Sorry we couldn't help you, as we haven't received your input files. However, as discussed in #186, which reports the same error, it's possible that your analysis finds no CellPhoneDB interactions at all; your log does show "No CellphoneDB interactions found in this input." just before the traceback. This can happen if you are using genes from an organism other than human. If so, you should convert the genes to their corresponding human orthologues. You can find details in our documentation: https://cellphonedb.readthedocs.io/en/latest/RESULTS-DOCUMENTATION.html#counts-file
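If the genes are indeed not human, the conversion described above could look roughly like the sketch below. It is a minimal, plain-Python illustration under stated assumptions: the orthologue table (source symbol to human symbol) must be obtained separately, for example exported from Ensembl BioMart; the helper names are hypothetical and not part of CellPhoneDB; and the lower-case check is only a quick heuristic for spotting non-human (e.g. mouse-style) symbols.

```python
def fraction_lowercase_symbols(genes):
    """Heuristic: share of gene symbols containing lower-case letters.

    Human HGNC symbols are mostly upper case (e.g. 'ACTB'), whereas mouse
    symbols usually are not (e.g. 'Actb'). Some human symbols (e.g.
    chromosome open reading frame names containing 'orf') also contain
    lower-case letters, so treat this as a hint only.
    """
    if not genes:
        return 0.0
    return sum(any(c.islower() for c in g) for g in genes) / len(genes)


def to_human_orthologues(genes, orthologue_map):
    """Map source-species symbols to human symbols with a user-supplied table.

    orthologue_map: {source_symbol: human_symbol}, assumed to come from an
    orthology resource (not provided here). Returns (mapped, dropped):
    mapped is {source_symbol: human_symbol}; dropped lists genes with no
    orthologue, or whose human symbol was already taken by an earlier gene.
    """
    mapped, dropped, used = {}, [], set()
    for gene in genes:
        human = orthologue_map.get(gene)
        if human is None or human in used:
            dropped.append(gene)
        else:
            mapped[gene] = human
            used.add(human)
    return mapped, dropped


genes = ["Actb", "Cxcl12", "Cxcr4", "Gm12345"]
table = {"Actb": "ACTB", "Cxcl12": "CXCL12", "Cxcr4": "CXCR4"}  # toy table
print(fraction_lowercase_symbols(genes))  # 1.0
print(to_human_orthologues(genes, table))
# ({'Actb': 'ACTB', 'Cxcl12': 'CXCL12', 'Cxcr4': 'CXCR4'}, ['Gm12345'])
```

Keeping only the first gene per human symbol is one simple choice; summing or averaging duplicates are alternatives. Simply upper-casing mouse symbols is not a reliable substitute for an orthologue table, since many genes lack a one-to-one human counterpart with the same name. The resulting mapping could then be used to subset and rename the genes in counts.h5ad before rerunning the analysis.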

Hope this helps!

Best,
Batu
