Data
Barry edited this page Jun 30, 2024 · 14 revisions
Raw data were taken from Brimacombe, C. (2023, March 30). Shortcomings of using freely available open species interaction networks produced by different publications. https://doi.org/10.17605/OSF.IO/MY9TV
- link-predict: root folder
  - data
    - processed
      - features
        - features_py.csv: features generated by the Python script
        - features_R.csv: features generated by the R script
      - networks
        - subsamples_edge_lists.csv: sub-sampled networks (including the original networks)
        - subsamples_metadata.csv: sub-sampled networks metadata
Fields of features_py.csv and features_R.csv:
- link_ID - Auto-generated ID of a link (existing and non-existing)
All other fields are the features themselves. They differ between the two files because each file is produced by a different script (one Python, one R) that computes a different set of features.
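Since both feature files share link_ID, they can be combined into a single table per link. The following is a minimal sketch using only the standard library, not the project's own code. The helper `merge_feature_rows`, the `_R` suffix for clashing names, and the feature columns `feat_a` and `feat_b` are all hypothetical, and small in-memory strings stand in for the real CSV files.

```python
import csv
import io


def merge_feature_rows(py_rows, r_rows, key="link_ID"):
    """Inner-join two lists of row dicts on `key`.

    If both files contain a column with the same name, the value from the
    R file is kept under `<name>_R` (an assumption made for this sketch).
    """
    r_by_id = {row[key]: row for row in r_rows}
    merged = []
    for row in py_rows:
        other = r_by_id.get(row[key])
        if other is None:
            continue  # link is absent from the R-generated file
        combined = dict(row)
        for name, value in other.items():
            if name == key:
                continue
            target = name if name not in combined else f"{name}_R"
            combined[target] = value
        merged.append(combined)
    return merged


# Hypothetical stand-ins for features_py.csv and features_R.csv.
py_csv = "link_ID,feat_a\n1,0.5\n2,0.1\n"
r_csv = "link_ID,feat_b\n1,3\n2,7\n3,9\n"

py_rows = list(csv.DictReader(io.StringIO(py_csv)))
r_rows = list(csv.DictReader(io.StringIO(r_csv)))
merged = merge_feature_rows(py_rows, r_rows)
```

Values stay as strings here because `csv.DictReader` does not infer types; convert them with `float` before any numeric use.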
Fields of subsamples_metadata.csv:
- subsample_ID - Auto-generated ID of a sampled network
- name - Name of the network
- community - Ecological community (Plant-Pollinator, Plant-Seed Dispersers, etc.)
- fraction - Proportion of observed links retained after sub-sampling. Currently only 0.8 (80% of links observed) and 1.0 (original network) are used.
- type - Deprecated
- layer - Deprecated
- repetition - Deprecated
Fields of subsamples_edge_lists.csv:
- link_ID - Auto-generated ID of a link (existing and non-existing)
- subsample_ID - Auto-generated ID of a sampled network
- higher_level - Name of the species at the higher trophic level
- lower_level - Name of the species at the lower trophic level
- weight - Weight of the link. Weights are currently not used, so every value is converted to binary (1.0).
- class - Link (1), non-link (0), or subsampled link (-1). Subsampled links are converted to 0 or 1 depending on the step (0 for feature extraction, 1 in the test set).
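The step-dependent handling of the class field can be sketched as a small function. This is an illustration of the rule stated above, not the pipeline's actual implementation. The function name `resolve_class` and the step labels `"features"` and `"test"` are hypothetical.

```python
def resolve_class(label, step):
    """Map the three-valued `class` field to a binary label.

    label: 1 (link), 0 (non-link) or -1 (subsampled link); strings such as
           "-1" or "1.0" are accepted.
    step:  "features" for feature extraction, "test" for the test set
           (hypothetical step names chosen for this sketch).
    """
    value = int(float(label))
    if value not in (-1, 0, 1):
        raise ValueError(f"unexpected class value: {label!r}")
    if value != -1:
        return value  # links and non-links keep their label
    if step == "features":
        return 0  # removed links are treated as unobserved when computing features
    if step == "test":
        return 1  # removed links are true links when evaluating predictions
    raise ValueError(f"unknown step: {step!r}")
```

For example, `resolve_class(-1, "features")` gives 0 and `resolve_class(-1, "test")` gives 1, while labels 0 and 1 pass through unchanged in both steps.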
The /results/ directory is laid out as shown in the following file tree. Folders and code files are listed in execution order.
- results
  - results_preprocess.Rmd: Loading and processing the results data, so each figure will have its own prepared dataset.
  - results_figs.Rmd: Loading the output of results_preprocess.Rmd and generating figures
  - raw: Contains the "raw" results, which are mainly the output of the ML pipeline
    - results_domains.csv: Contains results from a ML model trained and tested on varying groups (network domains/communities) combinations to assess cross-group generalization.
    - results_models.csv: Contains results from different ML models
    - results_other_models.csv: Contains results from different predictive models
    - feature_importance.csv: Feature importances of all ML models
  - intermediate: Contains intermediate processed files, mainly the output of results_preprocess.Rmd
    - df_pred_heatmap.csv: Result of a specific network in the test set, intended for demonstration figure.
    - metrics_df_long.csv: Evaluation metrics of each network, long format
    - metrics_multi_df_long.csv: Evaluation metrics of each network with multiple models, long format
    - metrics_type_df_long.csv: Evaluation metrics of each network with varying group, long format
    - compare_other_models_metrics_df.csv: Results of different predictive models
    - network_lvl_features.csv: Features (network level only) for EDA
    - pr_df.csv: Results of precision-recall curve
    - roc_df.csv: Results of ROC curve
    - auc_df.csv: AUC values of ROC and PR curves
    - test_data.csv: Test set (link IDs in test set) with metadata
    - bounds_summary_df.csv: Results of theoretical bounds of each metric
    - pca_df.csv: PCA components of network-level features
  - final: Contains the final figures and tables, mainly the output of results_figs
    - communities.pdf: Distributions of performance measures, by community
    - eval_all.pdf: Distributions of performance measures
    - features.csv: Information about each feature
    - importance_pres.pdf: Feature importance for tested ML model (RandomForest)
    - kruskal_wallis.csv: Results of Kruskal-Wallis test, comparing metrics of different communities
    - mann_whitney.csv: Results of Mann-Whitney U tests comparing the distributions of some metrics for various training and test combinations
    - networks_table.csv: Information (source) about each network
    - networks_summary_properties.csv: Summary of network properties
    - predictions.pdf: Link prediction example for a host-parasite network
    - ROC.pdf: ROC curve + PR curve
    - split_set.pdf: Link prediction within and between community types
    - SI_community.pdf: Distribution of link probabilities across different ecological communities
    - SI_complete: Comparing learning from complete vs subsampled networks
    - SI_features_hist.pdf: Histogram of selected network properties
    - SI_importance.pdf: Feature importance for all tested ML models
    - SI_models.pdf: ML models performance comparison, multiple evaluation metrics
    - SI_probabilities.pdf: Distribution of link probabilities obtained from the model
    - SI_sensitivity: Comparing performance for different fraction of removed links
    - SI_sensitivity_com: Comparing performance for different fraction of removed links, for each community
    - SI_tradeoff.pdf: The precision-recall tradeoff as a function of classification threshold
Common fields in the CSV files:
- link_ID - Auto-generated ID of a link (existing and non-existing)
- community - Ecological community (Plant-Pollinator, Plant-Seed Dispersers, etc.)
- name - Name of the network
- fold - Number of the cross-validation fold the instance comes from (usually 1-5 or 1-3)
- model - Name of the ML model used
- y_proba - Link probability of the instance, as given by the model
- metric - Name of the evaluation metric used
- feature - Name of the feature
- importance - Importance value of the feature
- SBM_Prob - Link probability of the instance, as given by the SBM model
- C_Prob - Link probability of the instance, as given by the connectance model
- type_train - Communities whose links form the training data
- type_test - Communities whose links form the test data
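As an illustration of how the model, feature and importance fields fit together, the sketch below averages importance values per model and feature. It assumes a long-format table with one row per model, fold and feature. Whether feature_importance.csv has exactly this layout is an assumption, and the helper `mean_importance` and the feature names `degree` and `connectance` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def mean_importance(rows):
    """Average the `importance` column for each (model, feature) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for row in rows:
        key = (row["model"], row["feature"])
        totals[key][0] += float(row["importance"])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical sample rows; the real file may contain other columns.
sample = (
    "model,fold,feature,importance\n"
    "RandomForest,1,degree,0.4\n"
    "RandomForest,2,degree,0.6\n"
    "RandomForest,1,connectance,0.2\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))
means = mean_importance(rows)
```

The result is a dictionary keyed by `(model, feature)`, which can be sorted by value to rank features within a model.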