Upgrade Data Factory CI/CD with YAML Pipelines and Approvals

Bob Blackburn
Aug 12, 2021

Last time, we discussed ADF CI/CD pipelines in the classic (GUI) interface. Going forward, YAML pipelines are the recommended approach, and I have been getting a few requests on how to make the move. We will have to do a few more steps than some of our App Dev counterparts, because ADF deployment does not get the love that web apps and other DevOps pipelines get.

Disclaimer: All views/opinions are my own. Code samples are for educational purposes.

If you need a few reasons to move to YAML pipelines, here are some:

  1. History in source control
  2. View differences between changes
  3. Attach work items
  4. Roll back to a previous version
  5. Templating and sharing

Steps

  1. Set up NPM
  2. Install the ADF task from the marketplace
  3. Initialize the YAML pipeline
  4. Validate and publish the build
  5. Deploy the build to the development environment
  6. Approve deployment to a higher environment
  7. Deploy to the higher environment if approved

Set up NPM (Node Package Manager)

In the root folder of your repository, create a package.json file with the following code.

Code:

{
  "scripts": {
    "build": "node node_modules/@microsoft/azure-data-factory-utilities/lib/index"
  },
  "dependencies": {
    "@microsoft/azure-data-factory-utilities": "^0.1.3"
  }
}

We will use NPM to build and validate the ADF code and to generate the ARM template that gets published.
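If your ADF resources do not live at the repository root, a common variation (also used in Microsoft's ADF CI/CD documentation) is to keep package.json in a subfolder and point the Npm@1 tasks at it with the workingDir input. The sketch below assumes a hypothetical layout with package.json in a build folder and the ADF Git root folder in an adf folder; adjust both paths to your repository.

```yaml
# Hypothetical layout: package.json in /build, ADF resources (the ADF Git root folder) in /adf.
- task: Npm@1
  inputs:
    command: 'install'
    workingDir: '$(Build.Repository.LocalPath)/build'  # folder that contains package.json
    verbose: true
  displayName: 'Install npm package'

- task: Npm@1
  inputs:
    command: 'custom'
    workingDir: '$(Build.Repository.LocalPath)/build'
    # The first path argument is the ADF root folder, not necessarily the repository root.
    customCommand: 'run build validate $(Build.Repository.LocalPath)/adf /subscriptions/{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}/resourceGroups/{ResourceGroup}/providers/Microsoft.DataFactory/factories/{ADF-Name}'
  displayName: 'Validate'
```

With package.json at the root, as in this post, workingDir can be omitted because the tasks default to the repository folder.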

Install ADF task from the marketplace

We are going to install an Azure DevOps task from the marketplace. It allows easy customization of parameters and helps streamline the process. One caution: the task will automatically create a data factory if it does not exist. Keep this in mind if you misspell the factory name and then cannot find your deployment. See the task's documentation for the full list of options.

Create a release pipeline so you can search the marketplace for tasks. You can also do this step later, after the artifact is published. Search for adf, select Deploy Azure Data Factory by SQLPlayer, and complete the installation.

Initialize YAML Pipeline

Now we are ready to get to the YAML code.

In your project, go to Pipelines and click New pipeline.

Select where your code is stored (for example, Azure Repos Git), select your repository, and then choose Starter pipeline.

It will create a file called azure-pipelines.yml containing a starter code snippet.
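For reference, the starter file is a minimal placeholder along these lines. The exact comments, default branch name, and VM image Azure DevOps generates can vary, so treat this as an approximation rather than the literal file.

```yaml
# Approximate starter pipeline: a single job with placeholder script steps.
trigger:
- main

pool:
  vmImage: ubuntu-latest

steps:
- script: echo Hello, world!
  displayName: 'Run a one-line script'

- script: |
    echo Add other tasks to build, test, and deploy your project.
    echo See https://aka.ms/yaml
  displayName: 'Run a multi-line script'
```

Everything under steps gets replaced with the build, deploy, and approval stages, and the pool switches to windows-latest.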

Now we can edit it for our process.

When branches are merged into master/main, we will kick off CI/CD to deploy to the development ADF. We will then require an approval to promote to higher environments. Additionally, we will set a timeout on the approval so the pipelines do not sit in a waiting state for long periods. You can later go back to the run you wish to promote, rerun the approval stage, and complete the promotion of the version of your choice.

Make the following replacements in the code below:

{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx} -> Subscription ID
{ResourceGroup} -> Resource group name
{ADF-Name} -> Data factory name for the development environment
{QA-ADF-Name} -> Data factory name for the QA environment

In the deploy tasks, azureSubscription is the name of your Azure Resource Manager service connection, so replace it with your own.

trigger:
- master

pool:
  vmImage: windows-latest

variables:
  BuildConfiguration: 'Release'

stages:
- stage: 'Build'
  jobs:
  - job: 'Build'
    steps:
    - task: NodeTool@0
      inputs:
        versionSpec: '10.x'
      displayName: 'Install Node.js'

    - task: Npm@1
      inputs:
        command: 'install'
        verbose: true
      displayName: 'Install npm package'

    # Validate the ADF resources in the repository.
    - task: Npm@1
      inputs:
        command: 'custom'
        customCommand: 'run build validate $(Build.Repository.LocalPath) /subscriptions/{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}/resourceGroups/{ResourceGroup}/providers/Microsoft.DataFactory/factories/{ADF-Name}'
      displayName: 'Validate'

    # Validate and then generate the ARM template into the destination folder.
    - task: Npm@1
      inputs:
        command: 'custom'
        customCommand: 'run build export $(Build.Repository.LocalPath) /subscriptions/{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}/resourceGroups/{ResourceGroup}/providers/Microsoft.DataFactory/factories/{ADF-Name} "ArmTemplate"'
      displayName: 'Validate and Generate ARM template'

    # Publish the artifact.
    - task: PublishPipelineArtifact@1
      inputs:
        targetPath: '$(Build.Repository.LocalPath)'
        artifact: 'ArmTemplates'
        publishLocation: 'pipeline'

The deployment code below was generated from the task we installed earlier. You can customize the parameters by creating a release in the GUI and then clicking View YAML. See the previous blog post for creating a release in the GUI. Make sure the Location matches the region of the target environment.

- stage: 'Dev_Deploy'
  jobs:
  - job: 'Dev_Deploy'
    steps:
    - task: SQLPlayer.DataFactoryTools.PublishADF.PublishADFTask@1
      displayName: 'Publish ADF {ADF-Name} from JSON files'
      inputs:
        azureSubscription: 'Visual Studio Enterprise Subscription – MPN ({xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx})'
        ResourceGroupName: '{ResourceGroup}'
        DataFactoryName: '{ADF-Name}'
        DataFactoryCodePath: '$(Build.Repository.LocalPath)'
        Location: 'East US'
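The Dev_Deploy stage points DataFactoryCodePath at $(Build.Repository.LocalPath), which works because each regular YAML job checks out the repository again by default. If you would rather deploy exactly the files captured in the ArmTemplates artifact published by the Build stage, one option is to skip the checkout and download the artifact instead. This is a sketch of a variant Dev_Deploy stage, not the configuration used in this post; with download: current, the artifact lands in $(Pipeline.Workspace)/ArmTemplates.

```yaml
- stage: 'Dev_Deploy'
  jobs:
  - job: 'Dev_Deploy'
    steps:
    # Skip the repository checkout; deploy from the artifact published by the Build stage.
    - checkout: none
    # Downloads the 'ArmTemplates' artifact from the current run to $(Pipeline.Workspace)/ArmTemplates.
    - download: current
      artifact: 'ArmTemplates'
    - task: SQLPlayer.DataFactoryTools.PublishADF.PublishADFTask@1
      displayName: 'Publish ADF {ADF-Name} from the build artifact'
      inputs:
        azureSubscription: 'Visual Studio Enterprise Subscription – MPN ({xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx})'
        ResourceGroupName: '{ResourceGroup}'
        DataFactoryName: '{ADF-Name}'
        DataFactoryCodePath: '$(Pipeline.Workspace)/ArmTemplates'
        Location: 'East US'
```

Because the artifact was published from the repository root, it contains the same ADF JSON folders the task reads from the checkout.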

Approvals

In your project, go to Pipelines > Environments and click New environment. Since I already have a QA environment, we will use UAT as an example. After it is created, click the ellipsis (...) to set its properties.

Click Approvals, and the settings blade will open.

Enter your approvers and choose whether they are allowed to approve their own runs. Set the timeout. This will terminate the release if you do not manually approve or reject it. You can always go back into the pipeline and select the run you want to deploy to a higher environment, as we will see later. If you always release to higher environments manually and just want to save the artifact with each pipeline run, you can set the timeout to 1 minute.

Since the publish task runs in a regular job, while the environment setting (which triggers the approval) requires a deployment job in YAML, we will split the approval and the deployment into two stages and make the deploy stage depend on the approval stage.

- stage: 'QA_Approval'
  jobs:
  - deployment: 'QA_Approval'
    environment: 'QA'
    # Set the approval timeout in Pipelines > Environments.

- stage: 'QA_Deploy'
  dependsOn: QA_Approval
  jobs:
  - job: 'QA_Deploy'
    steps:
    - task: SQLPlayer.DataFactoryTools.PublishADF.PublishADFTask@1
      inputs:
        azureSubscription: 'Visual Studio Enterprise Subscription – MPN ({xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx})'
        ResourceGroupName: '{ResourceGroup}'
        DataFactoryName: '{QA-ADF-Name}'
        DataFactoryCodePath: '$(Build.Repository.LocalPath)'
        Location: 'East US'
        #StageCode: QA

If you reviewed the code carefully, you will have noticed the commented-out last line, #StageCode: QA. It is used to update configuration settings for a higher environment (e.g., Key Vault). You can uncomment this line and set up the config file. Again, refer to the previous post to set that up.
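Because the Dev and QA deploy stages differ only in a few values, they are a natural fit for the templating mentioned at the start. A minimal sketch, assuming a hypothetical template file named templates/deploy-adf.yml in the same repository; the parameter names stageName, dependsOn, and dataFactoryName are illustrative and are not inputs of the SQLPlayer task.

```yaml
# File: templates/deploy-adf.yml (hypothetical name)
parameters:
- name: stageName        # e.g. 'Dev_Deploy' or 'QA_Deploy'
  type: string
- name: dependsOn        # stage that must finish first, e.g. 'Build' or 'QA_Approval'
  type: string
- name: dataFactoryName  # target data factory for this environment
  type: string

stages:
- stage: ${{ parameters.stageName }}
  dependsOn: ${{ parameters.dependsOn }}
  jobs:
  - job: ${{ parameters.stageName }}
    steps:
    - task: SQLPlayer.DataFactoryTools.PublishADF.PublishADFTask@1
      displayName: 'Publish ADF ${{ parameters.dataFactoryName }} from JSON files'
      inputs:
        azureSubscription: 'Visual Studio Enterprise Subscription – MPN ({xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx})'
        ResourceGroupName: '{ResourceGroup}'
        DataFactoryName: '${{ parameters.dataFactoryName }}'
        DataFactoryCodePath: '$(Build.Repository.LocalPath)'
        Location: 'East US'
```

In azure-pipelines.yml, each deploy stage then becomes a short template reference under stages, for example:

```yaml
- template: templates/deploy-adf.yml
  parameters:
    stageName: 'QA_Deploy'
    dependsOn: 'QA_Approval'
    dataFactoryName: '{QA-ADF-Name}'
```

Passing dependsOn explicitly matters: for the Dev stage, use 'Build' so it still waits for the artifact instead of running in parallel.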

Review the Pipeline Run

Let’s look at the pipeline runs:

The most recent one timed out on the approval stage. The previous one originally timed out but was then manually rerun and approved.

Click the most recent run to see the details.

Retry the approval stage and click Review.

Approve the release.

Now you can see the QA deploy stage is complete.

We can always see which run is deployed to QA in the Environments tab.

Conclusion

Now you can get the benefits of using YAML for CI/CD and control releases to QA and Production through approvals within one YAML pipeline.


Bob Blackburn

Principal Azure Data Platform Engineer, Certified Azure Data Engineer, volunteer firefighter/EMT