BCDC Cloud Computing Environment Charter #1

Open: wants to merge 7 commits into base: master
42 changes: 42 additions & 0 deletions charters/bcdc-cloud-computing-environment.md
@@ -0,0 +1,42 @@
# BCDC Cloud Computing Environment (Terra)
## Description
We connect, develop, add value to, and train our community around the BCDC Cloud Computing Environment. This cloud-native environment provides capabilities including cloud storage, scalable data processing pipelines, and horizontally scalable data-science environments.

# Roles
[Principal Investigator] Timothy Tickle ([email protected])
[Pipelines Development Product Manager] Kylee Degatano ([email protected])
[Customer Success and Education] Salin Thomas ([email protected])
[Project Manager] Cara Mason ([email protected])

## Definitions
**pipeline:** A collection of one or more functional tasks that operate on input data, either transforming it or deriving features that are often used to interpret it. In high-throughput settings, these tasks are often automated and run in batches.

## Objectives
- To create and maintain a cloud-native, horizontally scalable environment for use by the BICCN Community.
Contributor:
Is this limited to processing of molecular *omics data, or could they cover other modalities in the future? (I think the current members know this, but a new member might not)

Contributor Author:
Will state and clarify potential other areas.

- To work with community members to bring data processing pipelines that the community uses, or that are of value to the community, into the cloud computing environment so they can be operated at scale.
- To use data-driven feedback from the consortium to update data processing pipelines, focusing on improving both scientific and engineering performance. These updates will be versioned and documented.

## In-scope
Collaborator:
Should maintaining and upgrading pipelines in response to updates to genomes, etc. be considered in scope? (Not sure if this happens regularly enough to impact us for this grant.)

Contributor Author:
Yes, absolutely

Contributor Author:
Added.

For consideration:

To support the enrichment of the integrated dataset within the BCDC Cell Registry, facilitating linkage between datasets, modalities, and updates. This is done by integrating secondary analysis and feature extraction pipelines into the computing environment, supporting access to data in the BCDC Cell Registry and data archives, and publishing results back to the BCDC Cell Registry and data archives.

Contributor Author:
Will add.

- Connect data flow from BICCN Ingest services and archives to the BCDC Cloud Computing Environment to enable the processing of sequence-based data.
Contributor:
I think this is related to my question on the first objective.

- Work with community members to translate community pipelines and other high-value pipelines into cloud-native, horizontally scalable data processing pipelines.
- Work with the BICCN community to improve pipelines currently found in the BCDC Cloud Computing Environment.
- Provide training on how to use the BCDC Cloud Computing Environment to BICCN community members, those wanting to process consortium data, and those who want to use a consortium-derived pipeline (on private or public data).

## Out-of-scope
- Data operations or processing data on behalf of the BICCN community.
- Running or maintaining pipelines not compatible with the BCDC Cloud Computing Environment.

# Communication
## Slack Channels
[BICCN Joint Analysis/pipelines](https://biccn-joint-analysis.slack.com/messages/pipelines)
To speak directly with the implementation team and community members about pipelines in general.

[BICCN Joint Analysis/methylation-pipelines](https://biccn-joint-analysis.slack.com/messages/methylation-pipelines)
To speak directly with the implementation team and community members about methylation pipelines.

## GitHub repositories
[CEMBA](https://github.com/BICCN/CEMBA)
Contains the implementation for the snmC-seq pipeline.

[Snap-ATAC](https://github.com/HumanCellAtlas/skylab/tree/master/pipelines/snap-atac)
Contains the implementation for the Snap-ATAC scATAC-seq pipeline.