Metadata-Version: 2.1
Name: SIMBA3D
Version: 1.3.0
Summary: Structure Inference from Multiscale BAyes in 3 Dimensions (SIMBA3D)
Home-page: UNKNOWN
Author: Michael M. Rosenthal
Author-email: [email protected]
License: MIT
Description: # About
Structure Inference from Multiscale BAyes in 3 Dimensions (SIMBA3D)
This tool was primarily developed to study the 3D structure of chromatin. It
includes analytically derived gradients for a variety of penalty functions to
estimate 3D architecture from pairwise interaction matrices.
# Dependencies
## Required Packages
* numpy - needed for scientific programming
* scipy - needed for the optimization tools and other scientific programming
## Recommended Packages
* matplotlib - needed if you want to plot the results in python
* simplejson - recommended over standard json packaged with python
# Installation
After installing the python dependencies, one can install simba3d using
distutils with the following command
> python setup.py install
If pip is installed, then it can be installed from this directory using
> pip install .
Pip results in a cleaner install, and it is easier to remove if desired
> pip uninstall simba3d
simba3d can either be used as a python module or as a command line interface.
Using it as a python module has the advantage that a long and tedious
list of tasks can be generated using python. The command line interface has
the appeal of being easy to run from different directories, so that the user
can create a JSON tasklist in any directory and run it with the `<simba3d>`
command.
simba3d can also be run locally, but this can limit the file structure
organization of the project. To do this, copy the 'simba3d' directory (which
contains the '__init__.py') into your project directory. This directory
should be located inside the directory you plan to run python from.
Next create a python script to create the task list, and a python script to
run the simba3d mp_handler.
Examples have been included in the examples directory.
# Command Line Interface
After installing simba3d, the simba3d command should be available.
For help type:
> simba3d -h
If you are using a local install you can also use <'python run_simba3d.py'> just
like the <'simba3d'> command. For example, to get help you can type:
> python run_simba3d.py -h
The help documentation will tell the user that there are two ways to pass in
file parameters. One way is to use the -r option to read in a json file
containing a tasklist. If a single task is to be run, then one can pass in input
and output instructions and the parameter settings using the options specified.
This feature does not currently have all the available parameters and will most
likely be phased out of future versions. It is recommended to use either
individual files with different tasklists or one big file with a large list of
tasks.
## JSON Tasks for simba3d
simba3d requires some information to know what to run. This information can be
efficiently passed in as a list of tasks stored in a formatted json file.
There are a lot of options available which can make the process of specifying
the task look complicated at first. Many of the keys have defaults, which are
shown in square brackets below.
The individual tasks are dictionaries with the following keys:
* 'taskname' (optional) name of the task [uuid]
* 'uuid' (recommended) unique identifier specific to the task [randomly generated]
* 'store_task' (optional) store the task in the output [True]
* 'randomize_initialization' (optional) should we randomize the initialized curve? [False]
* 'seed' (optional) integer between 0 and 4294967295 used to fix the seed for randomizing the initialization
* 'check_jacobian' (optional) check the analytical gradient against a numerical gradient [False]
* 'input_file_names' manage file names
* 'inputdir' parent directory for the input files ['.']
* "pairwise_contact_matrix" the pairwise interaction matrix file (path relative to inputdir )
* "initialized_curve" (optional) file to import initialized curve (path relative to inputdir )
* "population_contact_matrix" (optional) the pairwise interaction matrix file from population data (path relative to inputdir )
* "prior_shape_model" (optional) shape prior curve file (path relative to inputdir )
* 'outputdir' parent directory for the output files ['.']
* 'output_filename' (optional) the file name to output the result to (.mat or .npz) [uuid.npz]
* 'parameters'
* 'a' (optional) the a parameter [-3.0]
* 'b' (optional) not identifiable with unconstrained scale [1.0]
* 'term_weights' dictionary storing the weights for the penalty terms
* 'data' (optional) weight for the data term [1.0]
* 'uniform spacing' (optional) lambda_1 penalty weight [0.0]
* 'smoothing' (optional) lambda_2 penalty weight [0.0]
* 'population prior' (optional) weight for the population matrix prior [0.0]
* 'options'
* 'gtol' (optional) tolerance for the norm of the gradient (stopping criteria) [e-5]
* 'maxitr' (optional) set maximum number of iterations [100000]
* 'display' (optional) display function values at each iteration? [True]
* 'store' (optional) store the iterative curves? [False]
* 'method' (optional) optimizer option 'BFGS'[default],'AGM','Nelder-Mead','CG','Newton-CG'
* 'index_parameters'
* 'missing_rowsum_threshold' (optional) specifies a threshold matrix row sum to treat an entry as missing data (missing nodes are ignored in the optimization)
* 'index_missing' (optional) specifies which entries are treated as missing (missing nodes are ignored in the optimization)
* 'off_diagonal' (optional) offset off of the diagonal entries which is treated as missing (and ignored)
* 'pairwise_combinations' (optional) optionally specify specific pairwise combination to use
You can either write a script to create a list of tasks with the desired
settings and then pass them into the optimizer, or you can create a JSON file
with a list of tasks that can be read in. JSON is used because it can be
read and edited manually by a human being and it is flexible to specify many
settings for a run.
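As a rough illustration of the scripted route, the sketch below builds a small tasklist in python and writes it to a JSON file. The key names are taken from the list above, but the nesting shown here (for example, placing 'term_weights' inside 'parameters') is an assumption, as are the file names and the helper function names, none of which come from simba3d itself. Only a subset of the keys is used. The examples directory remains the reference for the layout simba3d expects.

```python
import json


def make_task(uuid, matrix_file, smoothing):
    """Build one task dictionary.

    Key names follow the README list; the nesting is an assumption.
    """
    return {
        "uuid": uuid,
        "input_file_names": {
            "inputdir": ".",
            "pairwise_contact_matrix": matrix_file,  # hypothetical file name
        },
        "parameters": {
            "a": -3.0,
            "b": 1.0,
            "term_weights": {
                "data": 1.0,
                "uniform spacing": 0.0,
                "smoothing": smoothing,
            },
        },
        "options": {"maxitr": 100000, "display": True, "method": "BFGS"},
    }


def make_tasklist(matrix_file, smoothing_weights):
    """Create one task per smoothing weight, each with its own identifier."""
    return [
        make_task("smoothing_%d" % index, matrix_file, weight)
        for index, weight in enumerate(smoothing_weights)
    ]


def write_tasklist(tasks, path):
    """Write the tasklist as human-readable JSON."""
    with open(path, "w") as handle:
        json.dump(tasks, handle, indent=2)
```

Calling `write_tasklist(make_tasklist("contact_matrix.npy", [0.0, 0.1, 1.0]), "simba3d_task.json")` would produce a file that can then be inspected and edited by hand before it is run.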
There is also a "data" key which can be used to pass the data matrix and
initialized curve as an array (without loading it from a file). This is useful
for running simba3d within a python script, particularly for more advanced
users who wish to edit the data matrix (or generate simulated data) without
reading and writing the data to file.
To read a json file with these parameters set, use the following command
> simba3d -r path/to/simba3d_task.json
# How to tell simba3d to run sequentially with warmstarts
A warmstart is a technique where the final result from the previous setting is
used to initialize the next run with different settings. This is useful for
parameter tuning because it can lower the total computation time. It can also
limit the space of curves you can converge to, which may not always be ideal
since the previous setting may trap the curve in a local solution.
In any case, the usewarmstarts key must be set to true within a sequentially
ordered task. To create a sequence of tasks which are to be done in a specific
order, simply nest them inside an array. In each task inside the array, set
the usewarmstarts to true so that simba3d will know that it should initialize
the next task with the solution computed from the previous task.
Several examples can be seen in the examples directory.
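The nesting described above can be sketched in python as follows. This is only an illustration: the text does not say where inside a task the usewarmstarts key belongs, so placing it at the top level of each task is an assumption, and the helper name and the base task contents are hypothetical.

```python
import copy


def warmstart_sequence(base_task, smoothing_weights):
    """Return an ordered list of tasks that differ only in smoothing weight.

    Each task is flagged with usewarmstarts (top-level placement assumed).
    """
    sequence = []
    for weight in smoothing_weights:
        task = copy.deepcopy(base_task)  # leave the base task untouched
        task["usewarmstarts"] = True
        weights = task.setdefault("parameters", {}).setdefault("term_weights", {})
        weights["smoothing"] = weight
        sequence.append(task)
    return sequence


# Hypothetical base task; a real one would also name its input files.
base = {"taskname": "tuning", "parameters": {"term_weights": {"data": 1.0}}}

# The inner array is the ordered sequence; the outer array is the tasklist.
tasklist = [warmstart_sequence(base, [1.0, 0.1, 0.01])]
```

Ordering the weights from strongest to weakest penalty is just one possible tuning schedule, chosen here for illustration.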
# Output File Formats
Currently, simba3d can create reports in the following output formats:
* '.npz' zipped NumPy array file
* '.mat' MATLAB MAT-file
* '.txt' text file
* '.json' text file
The default format is '.npz' and it is recommended to use this output because
it is easier to load into python and convert to some other format if desired.
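As a minimal sketch of loading such a file, the helper below uses `numpy.load` to list the arrays stored in an '.npz' archive together with their shapes. The array names inside a simba3d result are not documented here, so the example writes a stand-in file with a hypothetical array called `curve` instead of assuming any real key.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array name: shape} for every array stored in an .npz file."""
    # If the archive stores Python objects, np.load needs allow_pickle=True;
    # only enable that for files you trust.
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


def demo():
    """Write a stand-in archive (hypothetical contents) and summarize it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "result.npz")
        np.savez(path, curve=np.zeros((3, 10)))
        return summarize_npz(path)
```

Running `summarize_npz` on an actual result file is a quick way to see which keys are available before converting it to another format.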
A conversion tool has been packaged with simba3d to assist in converting
from .npz to some other desirable format. This conversion tool can be called
from the <'simba3d-convertion'> command.
For more information on using the conversion tool type
> simba3d-convertion -h
This can also be called on a local install using the command
> python run_simba3d_convertion.py -h
# Viewing Results
simba3d has several tools to aid in summarizing the results collected.
The <'please-print'> command can be used to print out specific key data
values stored in the results file.
If the matplotlib module is installed, then simba3d also has a graphical
summary command which can be called by <'simba3d-disp'>
This command will display the final curve and the energy of selected results.
Keywords: chromatin
Platform: UNKNOWN
Provides-Extra: ploting_tools