We provide the scripts to automate the process of building a pipeline from the CDM folder to a Synapse workspace.
Have the data exported in a CDM format and stored in ADLS gen2 storage.
- Edit the configuration file by putting in custom values. A config file might look like this:
{
"ResourceGroup": "",
"TemplateFilePath": "../Templates/cdmToSynapse.json",
"TemplateParameters": {
"DataFactoryName": "",
"SynapseWorkspace": "",
"DedicatedSqlPool": "",
"AdlsAccountForCdm": "",
"CdmRootLocation": "cdm",
"StagingContainer": "adfstaging",
"Entities": ["LocalPatient", "LocalPatientAddress"]
}
}
- Run the scripts with the configuration file:
.\DeployCdmToSynapsePipeline.ps1 -Config: config.json
- Add data factory MI as SQL user into SQL DB. Here is a sample SQL script to create a user and an assign role:
CREATE USER [datafactory-name] FROM EXTERNAL PROVIDER
GO
EXEC sp_addrolemember db_owner, [datafactory-name]
GO
- Ensure that your database master key is created on the Synapse SQL pool. For more details, read about how to create a database master key.
Additionally, if you want to create a trigger with the pipeline dependency, here is a guideline from Azure Data Factory: Create tumbling window trigger dependencies - Azure Data Factory | Microsoft Docs
- A Synapse workspace. Create one if you need.
- Managed identity enabled on the Synapse workspace
- The managed identity has a Synapse SQL Administrator role. Navigate to Synapse Studio >> Manage >> Access Control to verify. If needed, provide access.
- Database master key is created on the Synapse SQL pool. Refer the documentation.
- Pipelines are allowed to access SQL Pools. This is the default setting in Synapse.
- Open Synapse Studio. Go to Integrate => + => Pipeline. Provide details.
- Create source in Data Flow Activity. Choose "Common Data Model" for source type. Click New to create a new Linked service.
- Select the storage that contains the CDM folders with FHIR data. Use Managed Identity authentication method. Provide Storage Blob Data Reader access for the storage to the managed identity name. Test the connection.
- Under Source options configure the Root location to the CDM container name, and Entity to the table name under CDM folder that you want to copy data from. If you enable debugging you can use Browse button to see the available entities.
- Click on Import Schema under Projection tab. It will import the table schema from the CDM folder.
- Create sink.
- Click on New against the dataset to create a new dataset of type Azure Synapse Analytics. Set properties, and provide name for a new Table.
- Go to the dataflow activity and fill-in the Staging linked service under Settings tab.
- Publish and trigger the pipeline. Once the pipeline has run successfully, you can see the data in the table by connecting SQL Server Management Studio.