This repository has been archived by the owner on Dec 19, 2023. It is now read-only.

databio/bedbuncher


Archived. This functionality was moved to bedboss in 2023. Please see https://github.com/databio/bedbase for more information.

bedbuncher

Pipeline designed to create bedsets (sets of BED files) that will be retrieved from bedbase.

Example bedsets:

  • BED files from the AML database.
  • BED files from the EWS database.
  • BED files from ChIP-seq experiments.

Before running the pipeline

Required: A PEP with an attribute specifying the path to a JSON file that contains the query used to create the bedset (see the tests directory for reference).

To run the pipeline

  1. Clone the repository
  2. Install the required Python packages via
pip install -r requirements/requirements.txt --user
  3. Submit the pipeline with looper
looper run project/cfg.yaml

Pipeline outputs

bedbuncher generates the following files, along with links in the Elasticsearch bedset index from which they can be retrieved:

  • TAR ball containing the BED files that match the query criteria.
  • Dataframe in which each row represents an individual BED file and each column holds a statistic generated by GenomicDistributions, so that users can run their own calculations easily.
  • iGD database created from the bedset.
  • Bedset statistics (currently means and standard deviations).
  • PEP for a specific bedset created using the pipeline (currently under development).
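
To make the relationship between the per-file dataframe and the bedset statistics concrete, here is a minimal sketch of how means and standard deviations could be derived from per-file values. It is not bedbuncher's actual code: the column names (`regions`, `mean_width`), the sample values, and the `summarize` helper are hypothetical, and the use of the sample standard deviation is an assumption, since the README does not say which variant the pipeline reports.

```python
from statistics import mean, stdev

# Hypothetical per-file statistics: one dict per BED file in the bedset.
# Column names and values are illustrative, not real pipeline output.
bed_stats = [
    {"file": "a.bed", "regions": 100, "mean_width": 250.0},
    {"file": "b.bed", "regions": 200, "mean_width": 350.0},
    {"file": "c.bed", "regions": 300, "mean_width": 450.0},
]


def summarize(rows, columns):
    """Return the mean and sample standard deviation of each column."""
    summary = {}
    for col in columns:
        values = [row[col] for row in rows]
        summary[col] = {
            "mean": mean(values),
            # stdev() needs at least two values; fall back to 0.0 otherwise.
            "sd": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


bedset_summary = summarize(bed_stats, ["regions", "mean_width"])
for column, stats in bedset_summary.items():
    print(column, stats["mean"], stats["sd"])
```

Each row corresponds to one BED file, so the summary collapses the whole bedset into one mean and one standard deviation per statistic.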

Additional CML dependencies

iGD builds a database that integrates genomic sets from one or more data sources and minimizes the search space for a specific query. Visit the iGD repository for more information and installation details.