Skip to content

Commit

Permalink
feat: news 2024 04 15 (#3020)
Browse files Browse the repository at this point in the history
  • Loading branch information
EM authored and EM committed Jun 18, 2024
1 parent ab25e74 commit 5a189f1
Showing 1 changed file with 137 additions and 0 deletions.
137 changes: 137 additions & 0 deletions next/docs/news/2024/04/15/comparing-the-nhgri-anvil-platform.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
---
author: "AnVIL"
date: "2024-04-15"
description: "AnVIL, a leading cloud-native platform for biomedical data storage and analysis, offers features that use infrastructure on both Google Cloud Platform (GCP) and Microsoft Azure. AnVIL on Azure is in active development, and we expect to share updates as additional functionality and optimizations are released."
featured: true
hidden: true
title: "Comparing the NHGRI AnVIL Platform Offerings on Google Cloud Platform (GCP) and Microsoft Azure"
---

<NewsHero {...frontmatter} />

## Introduction

AnVIL, a leading cloud-native platform for biomedical data storage and analysis, offers features that use infrastructure
on both Google Cloud Platform (GCP) and Microsoft Azure. AnVIL on Azure is in active development, and we expect to share
updates as additional functionality and optimizations are released. Most of the features of AnVIL will be very similar
or identical between GCP and Azure, especially the ability to run interactive analyses or large-scale workflows in a
secure collaborative environment. However, there are a few important differences in the features and services the two
cloud environments provide. This comparison aims to explore some differences users can expect when choosing a
cloud-service provider when using the AnVIL platform.

For new users to AnVIL, you can learn more about the platform and its offerings at [anvilproject.org/learn](/learn). If
you have any questions, feel free to engage with us at [help.anvilproject.org](https://help.anvilproject.org/).

**TABLE REMOVED CC FRAN**

X Full functionality is currently available.

`` A reduced subset of AnVIL’s data corpus will remain available on GCP after August 2024.

‡ Plans for AnVIL users to access seqr on Azure are under consideration. Please see below for more
Information.

\# New data submissions will occur on Azure only.

## Offerings

### What is a FedRAMP-certified security perimeter?

The AnVIL analysis environment and stored data are secured in accordance with the industry best practices, the NIST
800-53 Moderate security controls following the FedRAMP standard. Please be aware that users of AnVIL on Azure will need
to cover the additional infrastructure costs associated with logging as per the FedRAMP standard, which may result in
notable additional expenses in certain instances.

Learn more on how
to [host FISMA data on FedRAMP moderate Terra Azure](https://support.terra.bio/hc/en-us/articles/21329019108635-Host-FISMA-data-on-FedRAMP-moderate-Terra-Azure).

AnVIL also supports sharing of clinical research data and HIPAA compliance, when required. As needed, relevant parties
will execute Business Associate Agreements and must follow Good Clinical Practice guidelines to protect patient privacy.

Learn more about [AnVIL’s Data Security, Management, and Access Procedures](/faq/data-security).

### What is AnVIL’s data corpus?

AnVIL is the primary NIH-designated repository for NHGRI-funded data, metadata, and associated documentation.
NHGRI-funded investigators may submit to AnVIL to fulfill the expectations of the NIH Genomic Data Sharing
Policy ([NOT-OD-14-124](https://grants.nih.gov/grants/guide/notice-files/NOT-OD-14-124.html)) and the Final NIH Policy
for Data Management and Sharing ([NOT-OD-21-013](https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html)).
A number of high-value open-access datasets remain available on GCP, which are still being identified. NHGRI
controlled-access data will reside in the Microsoft Azure platform.

- Read
about [AnVIL as an NIH-designated repository (NOT-HG-24-020)](https://grants.nih.gov/grants/guide/notice-files/NOT-HG-24-020.html).
- Discover datasets stored in AnVIL with the [AnVIL Data Explorer](https://explore.anvilproject.org/datasets).
- Learn [how to apply for access to NHGRI controlled-access datasets](/learn/accessing-data/requesting-data-access).

### What are Terra-owned shared central services compared to user-owned central services?

With the varying architectures of cloud platforms, the Terra infrastructure will be managed differently on GCP and on
Azure. For [Terra on GCP](https://support.terra.bio/hc/en-us/sections/23504885621787), users can leverage Terra-owned
shared central services - meaning the cost of Terra infrastructure is sponsored by the Data Sciences Platform at the
Broad Institute. For [Terra on Azure](https://support.terra.bio/hc/en-us/sections/10090806360475), Terra instances are
spun up for each user behind the interface, which shifts the infrastructure run costs to the users. This enables full
control of the regionality of a Terra environment and data residence. This enables AnVIL users who wish to use data that
is subject to data residency laws.

### What are the costs for using the AnVIL platform?

The AnVIL platform operates in the cloud, where users incur costs for cloud storage, data egress, and computing
services. While AnVIL covers the expenses related to storing its data corpus, users are responsible for the costs
associated with storing their own data as well as any user-derived data from AnVIL’s data corpus.

Whether using AnVIL on Azure or GCP, users follow a 'user-pays, pass-through' model, where AnVIL manages cloud compute
resources on their behalf. It's important to clarify that while AnVIL itself does not impose any fees, users are
responsible for the charges levied by cloud service providers. It’s also worth mentioning that both Google Cloud and
Microsoft Azure require separate billing setups in order to access and utilize their services. Additionally, please note
that AnVIL’s Azure users are accountable for cloud infrastructure expenses.

A number of resources are available to users to help understand potential costs for working on the cloud.

- [Preparing a Cloud Cost Budget Justification](/learn/investigators/budget-templates)

For more information on costs and billing on GCP, read more here:

- [Overview: Costs and billing (GCP) from Terra Support](https://support.terra.bio/hc/en-us/articles/6123082826651-Overview-Costs-and-billing-in-Terra)
- [AnVIL on GCP Data Storage & Egress Cost Estimate Calculator](https://docs.google.com/spreadsheets/d/15jvXVymmjWp6m0FhlVXGQlOjDMNAcY1XSUC4pa4kuNM/edit#gid=883296657)
- [Create, edit, or delete budgets and budget alerts (Google Support)](https://cloud.google.com/billing/docs/how-to/budgets)

For more information on costs and billing on Azure, read more here:

- [Overview: Costs and billing (Azure) from Terra Support](https://support.terra.bio/hc/en-us/articles/12029087819291-Overview-Costs-and-billing-Azure)
- [AnVIL on Azure Data Storage & Egress Cost Estimate Calculator](https://docs.google.com/spreadsheets/d/15jvXVymmjWp6m0FhlVXGQlOjDMNAcY1XSUC4pa4kuNM/edit#gid=1519041783)
- [Managing Cloud Costs on Azure from Terra Support](https://support.terra.bio/hc/en-us/sections/10090961589403-Managing-Cloud-Costs)

Creating accounts and connecting funding resources to either Google Cloud Platform or Microsoft Azure requires working
with a cloud reseller. Researchers may also consider connecting with NIH STRIDES to leverage discounts on cloud costs.

### What does AnVIL offer for batch computing with workflows and interactive analysis?

AnVIL enables researchers to have access to various analysis tools. Batch computing options support running automated
tools and workflows over many datasets, while interactive analyses allow step-wise and exploratory analyses in an
interface that supports data exploration and visualization.

Learn more about [AnVIL’s platform components](/overview#platform-components).

### What VM custom images can I use with AnVIL?

Software engineers and genomic analysts rely on docker images to simplify app deployment, make sharing easy, and ensure
consistent behavior across environments. Today, AnVIL users can specify custom docker images in workflow scripts in both
AnVIL on Azure and AnVIL on GCP. Although custom images can be specified in Jupyter Notebook environments in AnVIL on
GCP, the equivalent functionality in AnVIL on Azure is currently deferred.

### Where can I develop workflows, or from where can I bring in workflows?

AnVIL supports running workflows written in the Workflow Description Language (WDL) via the Cromwell workflow engine and
Galaxy workflows run in Galaxy on AnVIL. AnVIL on both GCP and Azure supports integration with Dockstore, a repository
for published and shared workflows in WDL and Galaxy workflow formats.

AnVIL supports access to workflows in public GitHub repositories in both GCP and Azure cloud.

### Where can I run the seqr tool?

AnVIL provides access to a software platform called [seqr](https://seqr.broadinstitute.org/), which was designed for
family-based analysis of rare disease exome and genome data. Although seqr has been deployed by Microsoft on Azure in a
test environment, the primary instance of seqr is currently only available for use on AnVIL on GCP. If a user has data
on Azure, the seqr team recommends that users move the single joint called VCF into a GCP bucket for seqr analysis (the
data should be on the order of GBs). Doing so requires billing accounts on both GCP and Azure.

0 comments on commit 5a189f1

Please sign in to comment.