Barry edited this page Jun 30, 2024
Raw data were taken from Brimacombe, C. (2023, March 30). Shortcomings of using freely available open species interaction networks produced by different publications. https://doi.org/10.17605/OSF.IO/MY9TV
- link-predict (root folder)
  - data
    - processed
      - features
        - features_py.csv: features generated by the Python script
        - features_R.csv: features generated by the R script
      - networks
        - subsamples_edge_lists.csv: sub-sampled networks (including the original networks)
        - subsamples_metadata.csv: metadata of the sub-sampled networks
Fields in features_py.csv and features_R.csv:
- link_ID - Auto-generated ID of a link (existing and non-existing)

All other fields are the features themselves. They differ between the two files because each file is produced by a different script (one Python, one R) that computes a different set of features.
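Since both feature files carry a `link_ID` column, one way to work with the full feature set is to join them on that key. The following is a minimal sketch using only the Python standard library. The feature names `feature_a` and `feature_b`, the helper `merge_on_link_id`, and the toy rows are hypothetical, and it assumes the two files use the same `link_ID` values for the same links.

```python
import csv
import io

# Toy stand-ins for features_py.csv and features_R.csv.
# The feature column names and values are invented for illustration.
PY_CSV = """link_ID,feature_a
1,0.5
2,0.1
"""
R_CSV = """link_ID,feature_b
1,3
2,7
"""


def merge_on_link_id(py_rows, r_rows):
    """Inner-join two lists of row dicts on their shared link_ID column."""
    r_by_id = {row["link_ID"]: row for row in r_rows}
    merged = []
    for row in py_rows:
        match = r_by_id.get(row["link_ID"])
        if match is None:
            continue  # keep only links present in both files
        extra = {key: value for key, value in match.items() if key != "link_ID"}
        merged.append({**row, **extra})
    return merged


py_rows = list(csv.DictReader(io.StringIO(PY_CSV)))
r_rows = list(csv.DictReader(io.StringIO(R_CSV)))
merged = merge_on_link_id(py_rows, r_rows)
```

With real data, the `io.StringIO` objects would be replaced by open file handles for the two CSV files.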
Fields in subsamples_metadata.csv:
- subsample_ID - Auto-generated ID of a sampled network
- name - Name of the network
- community - Ecological community (Plant-Pollinator, Plant-Seed Dispersers, etc.)
- fraction - Proportion of observed links retained after sub-sampling. Currently only 0.8 (80% of links observed) and 1.0 (original network) are used.
- type - Deprecated
- layer - Deprecated
- repetition - Deprecated
Fields in subsamples_edge_lists.csv:
- link_ID - Auto-generated ID of a link (existing and non-existing)
- subsample_ID - Auto-generated ID of a sampled network
- higher_level - Name of the species in the higher trophic level
- lower_level - Name of the species in the lower trophic level
- weight - Weight of the link. Weights are currently not used, so every link is converted to binary (1.0).
- class - Link (1), non-link (0), or subsampled link (-1). Subsampled links are converted to 1 or 0 depending on the step (for example, 0 for feature extraction and 1 in the test set).
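The step-dependent handling of subsampled links (class -1) can be sketched as follows. This is an illustration of the rule described above, not the pipeline's actual code: the helper `resolve_class`, the step names `"feature_extraction"` and `"test"`, and the toy rows are all hypothetical.

```python
import csv
import io

# Toy edge-list rows invented for illustration (species names are made up).
TOY_EDGES = """link_ID,subsample_ID,higher_level,lower_level,class
1,10,bee_A,plant_X,1
2,10,bee_A,plant_Y,-1
3,10,bee_B,plant_X,0
"""


def resolve_class(rows, step):
    """Return copies of the rows with class -1 replaced according to the step.

    Subsampled links count as non-links (0) during feature extraction and
    as links (1) when building the test set.
    """
    replacement = {"feature_extraction": 0, "test": 1}[step]
    resolved = []
    for row in rows:
        label = int(row["class"])
        resolved.append({**row, "class": replacement if label == -1 else label})
    return resolved


rows = list(csv.DictReader(io.StringIO(TOY_EDGES)))
feature_view = resolve_class(rows, "feature_extraction")
test_view = resolve_class(rows, "test")
```

Returning copies rather than editing the rows in place keeps the original -1 labels available, so the same edge list can serve both steps.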
The /results/ directory is described by the following file tree. Folders and code files are listed in order of execution.
- results
  - results_preprocess.Rmd: loads and processes the results data so that each figure has its own prepared dataset
  - results_figs.Rmd: loads the output of results_preprocess.Rmd and generates the figures
  - raw: the "raw" results, mainly the output of the ML pipeline
    - results_domains.csv: results from an ML model trained and tested on varying combinations of groups (network domains/communities) to assess cross-group generalization
    - results_models.csv: results from different ML models
    - results_other_models.csv: results from different predictive models
    - feature_importance.csv: feature importances of all ML models
  - intermediate: intermediate processed files, mainly the output of results_preprocess.Rmd
    - df_pred_heatmap.csv: result for a specific network in the test set, intended for a demonstration figure
    - metrics_df_long.csv: evaluation metrics of each network, long format
    - metrics_multi_df_long.csv: evaluation metrics of each network with multiple models, long format
    - metrics_type_df_long.csv: evaluation metrics of each network with varying group, long format
    - compare_other_models_metrics_df.csv: results of different predictive models
    - network_lvl_features.csv: features (network level only) for EDA
    - pr_df.csv: results of the precision-recall curve
    - roc_df.csv: results of the ROC curve
    - auc_df.csv: AUC values of the ROC and PR curves
    - test_data.csv: test set (link IDs in the test set) with metadata
    - bounds_summary_df.csv: results of the theoretical bounds of each metric
    - pca_df.csv: PCA components of network-level features
  - final: the final figures and tables, mainly the output of results_figs
    - communities.pdf: distributions of performance measures, by community
    - eval_all.pdf: distributions of performance measures
    - features.csv: information about each feature
    - importance_pres.pdf: feature importance for the tested ML model (RandomForest)
    - kruskal_wallis.csv: results of the Kruskal-Wallis test, comparing metrics of different communities
    - mann_whitney.csv: results of Mann-Whitney U tests comparing the distributions of some metrics for various training and test combinations
    - networks_table.csv: information (source) about each network
    - networks_summary_properties.csv: summary of network properties
    - predictions.pdf: link prediction example for a host-parasite network
    - ROC.pdf: ROC curve and PR curve
    - split_set.pdf: link prediction within and between community types
    - SI_community.pdf: distribution of link probabilities across different ecological communities
    - SI_complete: comparing learning from complete vs. subsampled networks
    - SI_features_hist.pdf: histogram of selected network properties
    - SI_importance.pdf: feature importance for all tested ML models
    - SI_models.pdf: ML model performance comparison, multiple evaluation metrics
    - SI_probabilities.pdf: distribution of link probabilities obtained from the model
    - SI_sensitivity: comparing performance for different fractions of removed links
    - SI_sensitivity_com: comparing performance for different fractions of removed links, for each community
    - SI_tradeoff.pdf: the precision-recall tradeoff as a function of classification threshold
Common fields in the CSV files:
- link_ID - Auto-generated ID of a link (existing and non-existing)
- community - Ecological community (Plant-Pollinator, Plant-Seed Dispersers, etc.)
- name - Name of the network
- fold - Number of the cross-validation fold the instance comes from (usually 1-5 or 1-3)
- model - Name of the ML model used
- y_proba - Link probability of the instance, given by the model
- metric - Name of the evaluation metric used
- feature - Name of the feature
- importance - Importance value of the feature
- SBM_Prob - Link probability of the instance, given by the SBM model
- C_Prob - Link probability of the instance, given by the connectance model
- type_train - Communities whose links form the training data
- type_test - Communities whose links form the test data
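As an illustration of how the `y_proba` field relates to threshold-based metrics such as precision and recall, here is a minimal sketch using only the Python standard library. It assumes the true label is available in a `class` column alongside `y_proba`, which may not hold for every results file. The helper `precision_recall` and the toy rows are hypothetical and do not reproduce the repository's evaluation code.

```python
import csv
import io

# Toy rows invented for illustration; using `class` as the true-label
# column here is an assumption.
TOY_RESULTS = """link_ID,model,y_proba,class
1,RandomForest,0.9,1
2,RandomForest,0.8,0
3,RandomForest,0.4,1
4,RandomForest,0.2,0
"""


def precision_recall(rows, threshold):
    """Compute precision and recall when predicting a link if y_proba >= threshold."""
    tp = fp = fn = 0
    for row in rows:
        predicted = float(row["y_proba"]) >= threshold
        actual = int(row["class"]) == 1
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


rows = list(csv.DictReader(io.StringIO(TOY_RESULTS)))
precision, recall = precision_recall(rows, threshold=0.5)
```

Lowering the threshold predicts more links, which typically raises recall at the cost of precision.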