Data
Barry edited this page Jul 16, 2024 · 14 revisions
Raw data were taken from Brimacombe, C. (2023, March 30). Shortcomings of using freely available open species interaction networks produced by different publications. https://doi.org/10.17605/OSF.IO/MY9TV
The files go in data/raw/networks.
Data files that exceeded GitHub's size limits were compressed into zip format.
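A minimal sketch of how the compressed files could be unpacked before running the pipeline. The helper name `extract_zipped_data` is hypothetical, and the sketch assumes each archive should be extracted into the folder it sits in; the actual archive names and locations are not listed on this page.

```python
import zipfile
from pathlib import Path


def extract_zipped_data(data_dir):
    """Extract every .zip archive found under data_dir into the archive's own folder.

    Returns the paths of the extracted members.
    Assumption: archives should be unpacked in place, next to the .zip file.
    """
    extracted = []
    for archive in sorted(Path(data_dir).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
            extracted.extend(archive.parent / name for name in zf.namelist())
    return extracted


# Example call (the path is an assumption based on the folder layout on this page):
# extract_zipped_data("data")
```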
- link-predict/ (root folder)
  - data/
    - processed/
      - features/
        - features_py.csv: features generated by the Python script
        - features_R.csv: features generated by the R script
      - networks/
        - subsamples_edge_lists.csv: sub-sampled networks (including the original networks)
        - subsamples_metadata.csv: sub-sampled networks metadata
Fields in features_py.csv and features_R.csv:

Field | Description |
---|---|
link_ID | Auto generated ID of link (existing and non-existing) |
All other fields are the features themselves. They differ between the two files because each file is produced by a different script (one in Python, one in R) that computes a different set of features.
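To work with both feature sets at once, the two feature files can be joined on link_ID. The sketch below uses only the standard library and tiny in-memory stand-ins for the files. It assumes that the same link_ID refers to the same link in both files; the helper name `merge_feature_rows` and the feature column names `feat_a` and `feat_b` are made up for illustration.

```python
import csv
import io


def merge_feature_rows(py_rows, r_rows, key="link_ID"):
    """Inner-join two lists of row dicts on `key`, keeping all feature columns."""
    r_by_id = {row[key]: row for row in r_rows}
    merged = []
    for row in py_rows:
        match = r_by_id.get(row[key])
        if match is None:
            continue  # link is missing from the R feature rows
        combined = dict(row)
        # Add the R-only columns; on a name clash the Python value is kept.
        combined.update({k: v for k, v in match.items() if k not in row})
        merged.append(combined)
    return merged


# In-memory stand-ins for features_py.csv and features_R.csv (hypothetical columns).
py_csv = "link_ID,feat_a\n1,0.5\n2,0.1\n"
r_csv = "link_ID,feat_b\n2,7\n3,9\n"

py_rows = list(csv.DictReader(io.StringIO(py_csv)))
r_rows = list(csv.DictReader(io.StringIO(r_csv)))
merged = merge_feature_rows(py_rows, r_rows)
```

Only links present in both inputs survive the join, so a drop in row count after merging is a quick sign that the two scripts did not cover the same links.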
Fields in subsamples_metadata.csv:

Field | Description |
---|---|
subsample_ID | Auto generated ID of a sampled network |
name | Name of the network |
community | Ecological community (e.g., Plant-Pollinator, Plant-Seed Dispersers, etc.) |
fraction | Proportion of observed links retained after sub-sampling. Currently only 0.8 (80% of links observed) and 1.0 (original network) are used |
Deprecated | |
Deprecated | |
Deprecated | |
Fields in subsamples_edge_lists.csv:

Field | Description |
---|---|
link_ID | Auto generated ID of link (existing and non-existing) |
subsample_ID | Auto generated ID of a sampled network |
higher_level | Name of the species at the higher trophic level |
lower_level | Name of the species at the lower trophic level |
weight | Weight of the link. Currently unused, so it is converted to binary (1.0) |
class | Link class: 1 for links, 0 for non-links, and -1 for sub-sampled links. Sub-sampled links are converted to 0 or 1 depending on the step: 0 for feature extraction, 1 in the test set |
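The step-dependent handling of the class field can be sketched as a small mapping function. This is an illustration of the rule described in the table, not the project's actual code; the function name `resolve_class` and the step labels `"features"` and `"test"` are assumptions.

```python
def resolve_class(raw_class, step):
    """Map the `class` column to a binary label for a given pipeline step.

    raw_class: 1 (link), 0 (non-link) or -1 (sub-sampled link).
    step: "features" for feature extraction, "test" for test-set labels
          (both step names are hypothetical).
    """
    if raw_class in (0, 1):
        return raw_class  # links and non-links keep their label in every step
    if raw_class == -1:
        if step == "features":
            return 0  # treated as a non-link while features are extracted
        if step == "test":
            return 1  # treated as a link when building the test set
        raise ValueError(f"unknown step: {step!r}")
    raise ValueError(f"unexpected class value: {raw_class!r}")
```

For example, `resolve_class(-1, "features")` gives 0 and `resolve_class(-1, "test")` gives 1, while 0 and 1 pass through unchanged in both steps.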
The /results/ directory is described by the following file tree. Folders and code files are listed in order of execution.
- results/
  - results_preprocess.Rmd: Loads and processes the results data so that each figure has its own prepared dataset
  - results_figs.Rmd: Loads the output of results_preprocess.Rmd and generates the figures
  - raw/: Contains the "raw" results, which are mainly the output of the ML pipeline
    - results_domains.csv: Results from an ML model trained and tested on varying combinations of groups (network domains/communities) to assess cross-group generalization
    - results_models.csv: Results from different ML models
    - results_other_models.csv: Results from different predictive models
    - feature_importance.csv: Feature importances of all ML models
    - params_models.csv: Parameter space and best parameters selected for each model
    - results_ML_by_single_networks.csv: Results for the ML transductive model
  - intermediate/: Contains intermediate processed files, mainly the output of results_preprocess.Rmd
    - df_pred_heatmap.csv: Result of a specific network in the test set, intended for a demonstration figure
    - metrics_df_long.csv: Evaluation metrics of each network, long format
    - metrics_multi_df_long.csv: Evaluation metrics of each network with multiple models, long format
    - metrics_type_df_long.csv: Evaluation metrics of each network with varying group, long format
    - compare_other_models_metrics_df.csv: Results of different predictive models
    - network_lvl_features.csv: Features (network level only) for EDA
    - pr_df.csv: Results of the precision-recall curve
    - roc_df.csv: Results of the ROC curve
    - auc_df.csv: AUC values of the ROC and PR curves
    - test_data.csv: Test set (link IDs in the test set) with metadata
    - bounds_summary_df.csv: Results of theoretical bounds of each metric
    - bounds_summary_df_transductive.csv: Results of theoretical bounds of each metric (ML transductive model)
    - pca_df.csv: PCA components of network-level features
  - final/: Contains the final figures and tables, mainly the output of results_figs.Rmd
    - ILP_vs_TLP.pdf: Comparing inductive and transductive models, multiple evaluation metrics
    - roc_curve.pdf: ROC curve
    - pr_curve.pdf: Precision-recall curve
    - communities.pdf: Distributions of performance measures, by community
    - cross_community_prediction.pdf: Heatmap of prediction within and between community types
    - model_bounds_ILP_TLP.pdf: Bounds of model predictions, comparing inductive and transductive models
    - SI_networks_PCA.pdf: PCA of networks, separated by network-level topological features
    - ILP_vs_TLP_community.pdf: Comparing inductive and transductive models, multiple evaluation metrics, per community
    - SI_networks_summary_properties.csv: Summary of network properties
    - SI_KW_communities.csv: Results of the Kruskal-Wallis test, comparing metrics of different communities
    - SI_KW_communities_Dunn.csv: Dunn post-hoc tests for SI_KW_communities.csv
    - SI_models.pdf: ML models performance comparison, multiple evaluation metrics
    - SI_predictions.pdf: Link prediction example for a host-parasite network
    - feature_importance.pdf: Feature importance for the tested ML model (RandomForest)
    - SI_importance.pdf: Feature importance for all tested ML models
    - SI_probabilities.pdf: Distribution of link probabilities obtained from the model
    - SI_PR_tradeoff.pdf: The precision-recall tradeoff as a function of the classification threshold
    - SI_probabilities_community.pdf: Density plot of link probabilities, for each community, by class
    - networks_table.csv: Information (source) about each network
    - eval_all.pdf: Distributions of performance measures
    - features.csv: Information about each feature
    - model_bounds.pdf: Bounds of model predictions
    - SI_KW_cross.csv: Results of the Kruskal-Wallis test, comparing metrics of cross communities
    - SI_KW_cross_Dunn.csv: Dunn post-hoc tests for SI_KW_cross.csv
    - SI_cross_community.pdf: Link prediction within and between community types
    - SI_community.pdf: Distribution of link probabilities across different ecological communities
    - SI_features_hist.pdf: Histogram of selected network properties
    - SI_features_hist_all_nets.pdf: Histogram of selected network properties, across networks
Common fields in the CSV files:
Field | Description |
---|---|
link_ID | Auto generated ID of link (existing and non-existing) |
community | Ecological community (e.g., Plant-Pollinator, Plant-Seed Dispersers, etc.) |
name | Name of the network |
fold | Number of the CV fold the instance belongs to (usually 1-5 or 1-3) |
model | Name of the ML model used |
y_proba | Probability of link of the instance, given by the model |
y_true | True class of the instance |
metric | Name of the evaluation metric used |
feature | Name of the feature |
importance | Importance value of the feature |
SBM_Prob | Probability of link of the instance, given by the SBM model |
C_Prob | Probability of link of the instance, given by the connectance model |
type_train | Communities whose links form the training data |
type_test | Communities whose links form the test data |
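As an illustration of how the fields above fit together, the following sketch computes precision and recall per network from rows that carry name, y_true and y_proba. It is not the project's evaluation code: the function name `precision_recall_by_network`, the 0.5 threshold, the 0/1 encoding of y_true, and the example rows are all assumptions.

```python
from collections import defaultdict


def precision_recall_by_network(rows, threshold=0.5):
    """Compute precision and recall per network from y_true / y_proba.

    rows: iterable of dicts with keys "name", "y_true" and "y_proba".
    A metric is None when its denominator is zero.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for row in rows:
        predicted = float(row["y_proba"]) >= threshold
        actual = int(float(row["y_true"])) == 1
        c = counts[row["name"]]
        if predicted and actual:
            c["tp"] += 1
        elif predicted:
            c["fp"] += 1
        elif actual:
            c["fn"] += 1
    result = {}
    for name, c in counts.items():
        predicted_pos = c["tp"] + c["fp"]
        actual_pos = c["tp"] + c["fn"]
        result[name] = {
            "precision": c["tp"] / predicted_pos if predicted_pos else None,
            "recall": c["tp"] / actual_pos if actual_pos else None,
        }
    return result


# Made-up rows for illustration only (values are strings, as read by csv.DictReader).
rows = [
    {"name": "net_A", "y_true": "1", "y_proba": "0.9"},
    {"name": "net_A", "y_true": "0", "y_proba": "0.7"},
    {"name": "net_A", "y_true": "1", "y_proba": "0.2"},
    {"name": "net_A", "y_true": "0", "y_proba": "0.1"},
    {"name": "net_B", "y_true": "0", "y_proba": "0.3"},
    {"name": "net_B", "y_true": "1", "y_proba": "0.4"},
]
metrics = precision_recall_by_network(rows)
```

The same grouping idea extends to the fold or model fields by using a tuple such as `(row["name"], row["model"])` as the dictionary key.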