Skip to content

Latest commit

 

History

History
87 lines (62 loc) · 4.05 KB

cdm-to-synapse.md

File metadata and controls

87 lines (62 loc) · 4.05 KB

Moving data from CDM folder to Azure Synapse

Automatic Method: Using Scripts

We provide the scripts to automate the process of building a pipeline from the CDM folder to a Synapse workspace.

Prerequisite

Have the data exported in a CDM format and stored in ADLS gen2 storage.

Create pipeline

  1. Edit the configuration file by putting in custom values. A config file might look like this:
{
  "ResourceGroup": "",
  "TemplateFilePath": "../Templates/cdmToSynapse.json",
  "TemplateParameters": {
    "DataFactoryName": "",
    "SynapseWorkspace": "",
    "DedicatedSqlPool": "",
    "AdlsAccountForCdm": "",
    "CdmRootLocation": "cdm",
    "StagingContainer": "adfstaging",
    "Entities": ["LocalPatient", "LocalPatientAddress"]
  }
}
  1. Run the scripts with the configuration file:
.\DeployCdmToSynapsePipeline.ps1 -Config: config.json
  1. Add data factory MI as SQL user into SQL DB. Here is a sample SQL script to create a user and an assign role:
CREATE USER [datafactory-name] FROM EXTERNAL PROVIDER
GO
EXEC sp_addrolemember db_owner, [datafactory-name]
GO
  1. Ensure that your database master key is created on the Synapse SQL pool. For more details, read about how to create a database master key.

Additionally, if you want to create a trigger with the pipeline dependency, here is a guideline from Azure Data Factory: Create tumbling window trigger dependencies - Azure Data Factory | Microsoft Docs

Manual Method: Using Azure Portal and Synapse Studio

Prerequisite

  1. A Synapse workspace. Create one if you need.
  2. Managed identity enabled on the Synapse workspace
  3. The managed identity has a Synapse SQL Administrator role. Navigate to Synapse Studio >> Manage >> Access Control to verify. If needed, provide access.
  4. Database master key is created on the Synapse SQL pool. Refer the documentation.
  5. Pipelines are allowed to access SQL Pools. This is the default setting in Synapse. Pipelines allowed to access SQL pools

Create pipeline

  1. Open Synapse Studio. Go to Integrate => + => Pipeline. Provide details.

Synapse new pipeline

  1. Create source in Data Flow Activity. Choose "Common Data Model" for source type. Click New to create a new Linked service.

Synapse new source

  1. Select the storage that contains the CDM folders with FHIR data. Use Managed Identity authentication method. Provide Storage Blob Data Reader access for the storage to the managed identity name. Test the connection.

Synapse new linked service

  1. Under Source options configure the Root location to the CDM container name, and Entity to the table name under CDM folder that you want to copy data from. If you enable debugging you can use Browse button to see the available entities.

Synapse source options

  1. Click on Import Schema under Projection tab. It will import the table schema from the CDM folder.

Synapse import schema

  1. Create sink.

Synapse create new sink

  1. Click on New against the dataset to create a new dataset of type Azure Synapse Analytics. Set properties, and provide name for a new Table.

Synapse create dataset

  1. Go to the dataflow activity and fill-in the Staging linked service under Settings tab.

Synapse staging linked service

  1. Publish and trigger the pipeline. Once the pipeline has run successfully, you can see the data in the table by connecting SQL Server Management Studio.