Network inference performance complexity: a consequence of topological, experimental, and algorithmic determinants
LAST UPDATED: 2018-05-18
_banjo : BANJO v2.2.0 (accessed February 28, 2017)
_genie3 : GENIE3 (accessed October 3, 2017)
_mider : MIDER v2_JAN2015 (accessed October 3, 2017)
_tigress : TIGRESS v2.1 (accessed October 3, 2017)
scripts : .pbs scripts for submitting jobs on QUEST
lib : library functions
pipeline : main code for simulation, inference, and analysis
Some minor edits were made to the downloaded algorithms. In all cases, extraneous files bundled with each algorithm (e.g. sample data, documentation) were removed for clarity and simplicity. Any associated license files are included.
GENIE3
- commented out fprintf calls in GENIE3.m
- commented out tic and toc in GENIE3.m
MIDER
- commented out fprintf calls in mider.m
- output of mider.m changed from [Output] to con_array to fit pipeline
- value of pb in mider.m changed from 5 to 2
TIGRESS
- commented out fprintf/disp calls in score_edges.m
- commented out tic and toc in tigress.m
- added multiple variable selection check in lars.m
- added empty add variable check in lars.m
- added zero padding for early exit in stability_selection.m
The pipeline was written specifically to run on QUEST, Northwestern's high-performance computing cluster. QUEST nodes currently have 20-28 cores each, so the scripts request no more than 20 cores at a time.
The working directory on QUEST contains the following:
~/Matlab/
    _genie3
    _mider
    _tigress
    _banjo
    logs/ -- standard out and error logs
    results/ -- results stored here
    *.m (all Matlab files)
    *.pbs (all submission scripts)
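Before submitting jobs it can help to confirm that the working directory matches the layout above. The following is a minimal sketch rather than part of the pipeline: the function name check_layout and its messages are hypothetical, and only the directory and file patterns come from the listing.

```python
from pathlib import Path

# Entries taken from the layout listing above.
EXPECTED_DIRS = ["_genie3", "_mider", "_tigress", "_banjo", "logs", "results"]
EXPECTED_GLOBS = ["*.m", "*.pbs"]


def check_layout(root):
    """Return a list of human-readable problems found under root.

    Hypothetical helper: it only checks that the expected directories exist
    and that at least one file matches each expected pattern.
    """
    root = Path(root)
    problems = []
    for name in EXPECTED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}/")
    for pattern in EXPECTED_GLOBS:
        if not any(root.glob(pattern)):
            problems.append(f"no files matching: {pattern}")
    return problems
```

Calling check_layout(Path.home() / "Matlab") returns an empty list when everything listed above is present.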
Simulation is very fast and can be run on the login node. Load Matlab using:
module load matlab/r2016a
matlab -nosplash -nodisplay -singleCompThread
Run the simulations using:
for i = 1:36
    LM_CONTROLLER(1, i); % generate in silico data
end
LM_CONTROLLER(2); % compiles results into single array
LM_CONTROLLER(3); % generates null models
Run network inference with each of the selected inference methods. There are 108 different motif/logic gate/stimulus combinations:
msub -t corr[1-108] run_corr.pbs
msub -t genie3[1-108] run_genie3.pbs
msub -t mider[1-108] run_mider.pbs
msub -t tigress[1-108] run_tigress.pbs
msub -t banjo[1-108] run_banjo.pbs
Run inference algorithms on the null networks. There are 255 different stimulus/noise/parameter A combinations:
msub -t genie3[1-255] run_genie3_nulls.pbs
msub -t corr[1-255] run_corr_nulls.pbs
msub -t mider[1-255] run_mider_nulls.pbs
msub -t tigress[1-255] run_tigress_nulls.pbs
msub -t banjo[1-255] run_banjo_nulls.pbs
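The ten submission lines above all follow one pattern, so they can be generated rather than typed. This is a small sketch under that assumption: the helper names are hypothetical, while the algorithm labels, script names, and job counts are copied from the commands above.

```python
# Algorithm labels and job counts are taken from the commands above.
ALGORITHMS = ["corr", "genie3", "mider", "tigress", "banjo"]
N_COMBINATIONS = 108  # motif/logic gate/stimulus combinations
N_NULLS = 255         # stimulus/noise/parameter A combinations (null networks)


def msub_command(algorithm, n_jobs, nulls=False):
    """Build one job-array submission line in the form used above."""
    script = f"run_{algorithm}_nulls.pbs" if nulls else f"run_{algorithm}.pbs"
    return f"msub -t {algorithm}[1-{n_jobs}] {script}"


def all_commands():
    """Return every submission line: inference runs first, then nulls."""
    commands = [msub_command(a, N_COMBINATIONS) for a in ALGORITHMS]
    commands += [msub_command(a, N_NULLS, nulls=True) for a in ALGORITHMS]
    return commands


def total_jobs():
    """Total number of array tasks across all algorithms."""
    return len(ALGORITHMS) * (N_COMBINATIONS + N_NULLS)


if __name__ == "__main__":
    for command in all_commands():
        print(command)
```

With five algorithms this comes to 5 x (108 + 255) = 1815 array tasks, which is useful to keep in mind when checking the queue.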
Concatenate results from nulls into a single matrix.
for i = 1:15
    LM_CONTROLLER(6, i, ALGORITHM);
end
The script LM_summarize.py can be used to check which network inference runs are missing. Run using:
ls -LRh results/ > summary.txt
python3 LM_summarize.py
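The idea behind this kind of check can be sketched as follows. This is not the code in LM_summarize.py: the file naming pattern (for example GENIE3_12.mat) is an assumption made purely for illustration, and the function names are hypothetical. The sketch scans the lines of a directory listing, collects the run indices that are present, and reports the ones that are absent.

```python
import re

# ASSUMPTION: result files are named like "GENIE3_12.mat". The real naming
# scheme is not documented here, so adjust the pattern to match your files.
RESULT_PATTERN = re.compile(r"^(?P<algorithm>[A-Za-z0-9]+)_(?P<index>\d+)\.mat$")


def find_missing(listing_lines, algorithm, n_expected):
    """Return sorted indices in 1..n_expected with no matching result file."""
    found = set()
    for line in listing_lines:
        match = RESULT_PATTERN.match(line.strip())
        if match and match.group("algorithm").upper() == algorithm.upper():
            found.add(int(match.group("index")))
    return sorted(set(range(1, n_expected + 1)) - found)


def to_ranges(indices):
    """Group sorted indices into inclusive (start, end) runs."""
    ranges = []
    for i in indices:
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], i)  # extend the current run
        else:
            ranges.append((i, i))  # start a new run
    return ranges
```

Passing the lines of summary.txt to find_missing, together with an algorithm name and the expected count (108 or 255), gives the indices to resubmit. Grouping them with to_ranges makes it easy to resubmit each contiguous run using the same range form as the msub commands above.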
Calculation is relatively fast and can be run using:
for i = 1:108
    LM_CONTROLLER(7, i, ALGORITHM);
end
where ALGORITHM is a string denoting which algorithm to process (GENIE3, CORR, TIGRESS, MIDER, or BANJO).
Final results are compiled into .csv files for the data browser.
LM_CONTROLLER(8); % save simulation data
LM_CONTROLLER(9, 0, ALGORITHM); % save inference data
LM_CONTROLLER(10, 0, ALGORITHM); % save summary data