Archived. This functionality was moved to bedboss in 2023. Please see https://github.com/databio/bedbase for more information.
bedbuncher is a pipeline designed to create bedsets (sets of BED files) that can be retrieved from bedbase.
Example bedsets:
- BED files from the AML database.
- BED files from the EWS database.
- BED files from ChIP-seq experiments.
Required: a PEP with an attribute specifying the path to a JSON file that contains the query used to create the bedset (see the tests directory for reference).
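As a rough illustration of this requirement, the sketch below reads a PEP sample table (CSV) and loads the JSON query referenced by each sample. It relies on `sample_name`, the PEP convention for sample identifiers, and assumes the path column is called `query_path`. That column name is hypothetical; check the example PEP in the tests directory for the attribute bedbuncher actually reads. The query structure itself is left to the JSON files in the tests directory.

```python
import csv
import json
from pathlib import Path


def load_bedset_queries(sample_table: Path, path_column: str = "query_path") -> dict:
    """Return {sample_name: parsed JSON query} for every row of a PEP sample table.

    Hypothetical: "query_path" is an illustrative column name, not
    necessarily the attribute bedbuncher expects.
    """
    queries = {}
    with open(sample_table, newline="") as handle:
        for row in csv.DictReader(handle):
            query_file = Path(row[path_column])
            # Resolve relative paths against the sample table's directory.
            if not query_file.is_absolute():
                query_file = sample_table.parent / query_file
            with open(query_file) as qf:
                queries[row["sample_name"]] = json.load(qf)
    return queries
```

Resolving relative paths against the sample table's location keeps the project portable when it is cloned to a different directory.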
- Clone the repository.
- Install the required Python packages via
  pip install -r requirements/requirements.txt --user
- Submit the pipeline with looper:
  looper run project/cfg.yaml
bedbuncher generates the following files, along with links in the Elasticsearch bedset index from which they can be retrieved:
- Tarball containing the BED files that match the query criteria.
- Dataframe in which rows represent individual BED files and columns show statistics generated by GenomicDistributions (to simplify downstream calculations by users).
- iGD database created from the bedset.
- Bedset statistics (currently means and standard deviations).
- PEP for the specific bedset created by the pipeline (currently under development).
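Two of these outputs can be sketched with the Python standard library: bundling the matching BED files into a tarball, and summarizing the per-file statistics into bedset-level means and standard deviations. This is a minimal sketch, not bedbuncher's code. The function names are hypothetical, gzip compression is an assumption, and the sample standard deviation (n - 1 denominator) is used because the source does not say which variant the pipeline computes.

```python
import statistics
import tarfile
from pathlib import Path


def bundle_bed_files(bed_paths: list[Path], tar_path: Path) -> None:
    """Write the given BED files into a gzip-compressed tarball (compression is an assumption)."""
    with tarfile.open(tar_path, "w:gz") as tar:
        for bed in bed_paths:
            # Store each file under its base name so the archive has a flat layout.
            tar.add(bed, arcname=Path(bed).name)


def bedset_statistics(rows: list[dict[str, float]]) -> dict[str, dict[str, float]]:
    """Return {statistic: {"mean": ..., "sd": ...}} computed across all BED files.

    `rows` mirrors the per-file dataframe: one dict per BED file and one key
    per statistic column (column names are up to the caller).
    """
    if len(rows) < 2:
        raise ValueError("at least two BED files are needed for a standard deviation")
    summary = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        summary[column] = {
            "mean": statistics.mean(values),
            # Sample standard deviation (n - 1 denominator).
            "sd": statistics.stdev(values),
        }
    return summary
```

Passing one dict per BED file to `bedset_statistics` yields a nested dict keyed by statistic name. In the real pipeline, the column names would come from GenomicDistributions.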
iGD builds a database that integrates genomic sets from one or more data sources and minimizes the search space for a specific query. Visit the iGD repository for more information and installation details.
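As a toy sketch of how an index can shrink the search space, the following is a simplified illustration of genome binning. Intervals are assigned to fixed-width bins, so an overlap query inspects only the bins its region touches rather than every interval in every BED file. This is not iGD's implementation or file format. The class name and the default bin width are hypothetical.

```python
from collections import defaultdict

DEFAULT_BIN_SIZE = 16_384  # illustrative bin width in base pairs (assumption)


class BinnedIntervalIndex:
    """Toy binned index: each interval is stored in every fixed-size bin it touches."""

    def __init__(self, bin_size: int = DEFAULT_BIN_SIZE):
        self.bin_size = bin_size
        # (chrom, bin number) -> list of (start, end, source) tuples
        self.bins = defaultdict(list)

    def _bin_range(self, start: int, end: int) -> range:
        # Half-open [start, end) coordinates, as in BED files.
        return range(start // self.bin_size, (end - 1) // self.bin_size + 1)

    def add(self, chrom: str, start: int, end: int, source: str) -> None:
        for b in self._bin_range(start, end):
            self.bins[(chrom, b)].append((start, end, source))

    def query(self, chrom: str, start: int, end: int) -> set[tuple[str, int, int, str]]:
        """Return (chrom, start, end, source) for stored intervals overlapping [start, end)."""
        hits = set()
        for b in self._bin_range(start, end):
            # .get avoids creating empty bins in the defaultdict.
            for s, e, source in self.bins.get((chrom, b), []):
                if s < end and start < e:
                    # A set removes duplicates from intervals spanning several bins.
                    hits.add((chrom, s, e, source))
        return hits
```

For example, with `BinnedIntervalIndex(bin_size=100)` and `idx.add("chr1", 50, 250, "a.bed")`, a query for `chr1` positions 200 to 310 scans only the two bins covering that range.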