You can find the code on GitHub here.
tSVD-based imputation method:
- Perform dimensionality reduction on the data with truncated SVD (tSVD).
- Transform the reduced data back to the original space.
- For entries that are 0 in the original data, copy in the corresponding values from the back-transformed data.
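The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it uses NumPy's SVD directly, and the function name `tsvd_impute`, the toy matrix, and the choice of `n_components` are hypothetical. scikit-learn's `TruncatedSVD` (`fit_transform` followed by `inverse_transform`) would be the library route to the same reconstruction.

```python
import numpy as np

def tsvd_impute(x, n_components):
    """Fill zero entries of x with values from a truncated-SVD reconstruction."""
    # Thin SVD; keeping the top n_components singular triplets gives the tSVD.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    k = n_components
    reduced = u[:, :k] * s[:k]        # data in the reduced space
    reconstructed = reduced @ vt[:k]  # transformed back to the original space
    # Keep observed (non-zero) values; fill only the zeros from the reconstruction.
    return np.where(x == 0, reconstructed, x)

# Toy low-rank matrix (cells x genes) with roughly 20% of entries zeroed out.
rng = np.random.default_rng(0)
full = rng.random((20, 3)) @ rng.random((3, 8))
mask = rng.random(full.shape) < 0.2
x = np.where(mask, 0.0, full)

imputed = tsvd_impute(x, n_components=3)
```

Only the zero entries change; every non-zero value of the input is passed through untouched.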
In the inference phase, the model outputs the average of the five predictions of the target data.
When selecting important genes for CITE-seq, the correlation coefficient is calculated for each batch, and only genes with high correlation in many batches are selected.
Genes were also selected from those related to the target proteins and pathways.
I used Reactome as the pathway database.
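A per-batch correlation filter of this kind might look like the sketch below. Several details are assumptions, since the text does not specify them: the correlation is taken between each gene and a single target protein, it is Pearson correlation compared by absolute value, and the names `select_genes`, `corr_threshold`, and `min_batches` are hypothetical.

```python
import numpy as np

def select_genes(x, y, batches, corr_threshold=0.3, min_batches=2):
    """Indices of genes with |Pearson r| > corr_threshold in at least min_batches batches."""
    counts = np.zeros(x.shape[1], dtype=int)
    for b in np.unique(batches):
        xb = x[batches == b]
        yb = y[batches == b]
        xc = xb - xb.mean(axis=0)
        yc = yb - yb.mean()
        denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
        with np.errstate(divide="ignore", invalid="ignore"):
            r = (xc * yc[:, None]).sum(axis=0) / denom
        r = np.nan_to_num(r)  # constant genes give 0/0; treat them as uncorrelated
        counts += np.abs(r) > corr_threshold
    return np.flatnonzero(counts >= min_batches)

# Toy data: 3 batches of 100 cells, 5 genes; the target depends on gene 0 only.
rng = np.random.default_rng(0)
n = 300
batches = np.repeat(np.array(["b0", "b1", "b2"]), n // 3)
x = rng.normal(size=(n, 5))
y = 2.0 * x[:, 0] + 0.1 * rng.normal(size=n)

selected = select_genes(x, y, batches, corr_threshold=0.5, min_batches=3)
```

Counting batches rather than pooling all cells keeps a gene from being selected on the strength of a single batch.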
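One way to read "related to the target proteins and pathways" is: keep every gene that shares a Reactome pathway with a gene encoding a target protein. The sketch below assumes that reading, which the text does not confirm. It parses the GMT layout (pathway name, identifier, then member genes, tab-separated); the pathway names, identifiers, and the helper names `read_gmt` and `genes_sharing_pathway` are hypothetical.

```python
import io

def read_gmt(handle):
    """Parse GMT lines into {pathway name: set of member genes}."""
    pathways = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pathways[fields[0]] = set(fields[2:])
    return pathways

def genes_sharing_pathway(pathways, target_genes):
    """Union of all pathways that contain at least one target gene."""
    targets = set(target_genes)
    selected = set()
    for members in pathways.values():
        if members & targets:
            selected |= members
    return selected

# Toy GMT content; in practice this would be the downloaded ReactomePathways.gmt.
toy_gmt = io.StringIO(
    "Pathway A\tR-HSA-0000001\tCD4\tLCK\tZAP70\n"
    "Pathway B\tR-HSA-0000002\tGAPDH\tPGK1\n"
)
pathways = read_gmt(toy_gmt)
selected = genes_sharing_pathway(pathways, ["CD4"])
```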
I used two evaluation schemes.
- Evaluation with cross validation:
  - 5-fold cross validation grouped by donor and day
- Evaluation for hyperparameter optimization with Optuna:
  - The training data set is split into a training set (80%) and a validation set (20%).
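The two schemes can be sketched as follows. This is an illustration only: the grouping key is assumed to be the donor/day pair, groups are assigned to folds round-robin, and the 80/20 split is assumed to be a plain random split, none of which the text spells out. The donor labels and the helper name `grouped_kfold` are hypothetical; scikit-learn's `GroupKFold` is a ready-made alternative for the first part.

```python
import numpy as np

def grouped_kfold(groups, n_splits=5):
    """Yield (train_idx, valid_idx) pairs with no group split across the two sides."""
    unique = np.unique(groups)
    # Round-robin assignment of whole groups to folds.
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique)}
    fold = np.array([fold_of_group[g] for g in groups])
    for k in range(n_splits):
        yield np.flatnonzero(fold != k), np.flatnonzero(fold == k)

# Toy metadata: 3 donors x 3 days x 4 cells = 36 cells in 9 donor/day groups.
donor = np.repeat(["d1", "d2", "d3"], 12)
day = np.tile(np.repeat([2, 3, 4], 4), 3)
groups = np.array([f"{d}_{t}" for d, t in zip(donor, day)])

# Scheme 1: 5-fold cross validation grouped by donor and day.
folds = list(grouped_kfold(groups, n_splits=5))

# Scheme 2: single 80/20 hold-out split for hyperparameter search.
rng = np.random.default_rng(0)
perm = rng.permutation(len(groups))
n_valid = int(round(0.2 * len(groups)))
valid_idx, train_idx = perm[:n_valid], perm[n_valid:]
```

Grouping keeps all cells from one donor/day combination on the same side of each split, so the validation score reflects performance on unseen batches.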
I used a weighted average of the predictions of the following models.
- Models trained with different seeds
- Models fine-tuned on only some batches
  - Batch combination pattern examples: males only, females only, days 4 and 7 only, etc.
  - A model trained on the full training data set is used as the pre-trained model
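Blending the predictions reduces to a weighted mean over the model axis. The sketch below assumes every model produces an array of the same shape; the function name `weighted_average` and the example weights are hypothetical, and how the actual weights were chosen is not stated here.

```python
import numpy as np

def weighted_average(predictions, weights):
    """Blend same-shaped prediction arrays using weights normalised to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    stacked = np.stack(predictions)  # shape: (n_models, n_cells, n_targets)
    return np.tensordot(w, stacked, axes=1)

# Two toy "models" predicting constant values for 2 cells x 3 targets.
preds = [np.full((2, 3), 1.0), np.full((2, 3), 3.0)]
blended = weighted_average(preds, weights=[1, 3])
```

With weights 1 and 3, every blended entry is 0.25 * 1.0 + 0.75 * 3.0 = 2.5. Setting all weights equal gives the plain average.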
Download resources
res_dir=src/shuji_suzuki/resources
mkdir -p "$res_dir"
wget https://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/tsv/hgnc_complete_set.txt -O "$res_dir/hgnc_complete_set.txt"
wget https://reactome.org/download/current/ReactomePathways.gmt.zip -O "$res_dir/ReactomePathways.gmt.zip" &&
unzip "$res_dir/ReactomePathways.gmt.zip" -d "$res_dir" &&
rm "$res_dir/ReactomePathways.gmt.zip"
Clone repo
echo shu65_openproblems > src/shuji_suzuki/.gitignore
git clone https://github.com/shu65/open-problems-multimodal.git src/shuji_suzuki/shu65_openproblems
Run method
viash run src/shuji_suzuki/config.vsh.yaml -- \
--input sample_data \
--output output \
---memory 100GB \
---cpus 30