Introduction: This blog explains how to export CDS (Microsoft Dataverse) data to Azure Data Lake Storage Gen2.
Azure Data Lake can store data of any size, shape, and speed, and supports all types of processing and analytics across platforms and languages. It handles unstructured, semi-structured, and structured data.
Features:
- It continuously exports data from CDS (Microsoft Dataverse) to Azure Data Lake Storage Gen2, where it can be consumed by Power BI, Azure Data Factory, Azure Databricks, and Azure Machine Learning.
- It performs an initial write, followed by incremental writes, for both data and metadata.
- It replicates (copies) both standard and custom entities (tables).
- It also replicates the CRUD operations performed on an entity (table).
- Changes to data or metadata are pushed to the data lake automatically, without any manual refresh.
Steps to be followed:
Create Storage Account in Azure
- Log in to Azure (https://portal.azure.com/)
- Go to Storage accounts

- Click on “+Add”

- Enter the details:
- Select the Subscription, then select or create a Resource group
- Give your storage account a name
- Select the Location
- Account kind: StorageV2 (general purpose v2)
- Replication: Read-access geo-redundant storage (RA-GRS)

- Go to the Advanced section and enable the Hierarchical namespace feature.

- Now click on “Review+Create”
- Click on “Create”.
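
The settings chosen above can also be written down as data, which makes them easier to review or reuse. The following is a minimal sketch, not a deployment script: it only builds a dictionary shaped like an Azure Resource Manager request body for a storage account and never calls Azure. The function name and the region value are hypothetical; the property names kind, sku.name, and isHnsEnabled are assumed to match the storage account resource definition.

```python
import json


def build_storage_account_body(location: str) -> dict:
    """Build an ARM-style body mirroring the portal choices in this post."""
    return {
        "location": location,
        # Account kind: StorageV2 (general purpose v2)
        "kind": "StorageV2",
        # Replication: Read-access geo-redundant storage (RA-GRS)
        "sku": {"name": "Standard_RAGRS"},
        # Advanced section: Hierarchical namespace enabled (required for Gen2)
        "properties": {"isHnsEnabled": True},
    }


# "westeurope" is a placeholder; use the region of your PowerApps environment.
body = build_storage_account_body("westeurope")
print(json.dumps(body, indent=2))
```

Keeping the hierarchical namespace flag next to the account kind and replication setting makes it harder to forget, which matters because that flag is what turns a general purpose v2 account into a Data Lake Storage Gen2 account.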

NOTE:
- The storage account must be created in the same Azure AD tenant as your PowerApps tenant.
- The storage account should be created in the same region as the PowerApps environment you plan to use it in.
Steps to Configure in PowerApps studio
- Log in to PowerApps (https://make.powerapps.com/)
- Go to Data –> Export to data lake and click on “+New link to data lake”

- Select Subscription, Resource group, and Storage account
- Click Next

- Select all the entities (tables) you want to export to the data lake.

NOTE: If you cannot see your entity (table) in the list, make sure that “Change Tracking” is enabled for that entity (table).
To enable change tracking, follow the steps below:
- Go to Data –> Tables
- Select the entity (table)
- Go to Settings

- Go to “Create and update settings” and enable the Change tracking option.
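
When many tables are involved, it can help to check the change tracking flag in bulk before opening the export wizard. The sketch below assumes the table metadata has already been retrieved (for example from the Dataverse Web API EntityDefinitions set) as a list of dictionaries carrying the LogicalName and ChangeTrackingEnabled properties. The helper name and the sample records, including the custom table new_project, are hypothetical.

```python
def tables_missing_change_tracking(entity_definitions):
    """Return the logical names of tables that have change tracking disabled.

    A table with no ChangeTrackingEnabled value is treated as disabled,
    since it would not show up in the export list either way.
    """
    return sorted(
        entity["LogicalName"]
        for entity in entity_definitions
        if not entity.get("ChangeTrackingEnabled", False)
    )


# Hypothetical metadata records, shaped like the assumed Web API response.
sample_definitions = [
    {"LogicalName": "account", "ChangeTrackingEnabled": True},
    {"LogicalName": "new_project", "ChangeTrackingEnabled": False},
    {"LogicalName": "contact", "ChangeTrackingEnabled": True},
]

print(tables_missing_change_tracking(sample_definitions))
```

Any table named in the result needs the Change tracking option switched on before it can be selected for export.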

- To see the status of the data lake synchronization:
- Select the data lake
- Click on more commands (…) –> Tables

- You can see the status (initial sync status, count of records replicated, and last synchronized timestamp) for each of the entities.

Viewing your data in Azure Data Lake
- Log in to Azure
- Select the storage account and then, in the leftmost navigation pane, select Storage Explorer
- Expand CONTAINERS and select the container named “commondataservice-environmentName-org-Id”.
- You can see a folder for each of the entities you chose to replicate to the data lake, along with the model.json file
- model.json contains the schema of all your data lake entities
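
Because model.json is plain JSON, the schema can be inspected with a few lines of code once the file has been downloaded. The sketch below assumes the file follows the Common Data Model model.json layout, with a top-level entities array whose items carry a name and an attributes list. The helper name and the trimmed-down sample document are hypothetical, and a real file contains many more fields.

```python
import json


def columns_by_entity(model: dict) -> dict:
    """Map each entity name in a model.json document to its column names."""
    return {
        entity["name"]: [attr["name"] for attr in entity.get("attributes", [])]
        for entity in model.get("entities", [])
    }


# Hypothetical, heavily shortened stand-in for a downloaded model.json file.
sample_model = json.loads("""
{
  "name": "cdm",
  "version": "1.0",
  "entities": [
    {"$type": "LocalEntity", "name": "account",
     "attributes": [{"name": "Id", "dataType": "guid"},
                    {"name": "name", "dataType": "string"}]},
    {"$type": "LocalEntity", "name": "contact",
     "attributes": [{"name": "Id", "dataType": "guid"}]}
  ]
}
""")

schema = columns_by_entity(sample_model)
for entity_name, columns in schema.items():
    print(entity_name, columns)
```

For a real export, replace the inline sample with `json.load()` on the downloaded file.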

If we click on one of the folders, say account, we will see a CSV file that contains the data from that entity, as well as a Snapshot folder.
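
A downloaded entity CSV can be read with the standard csv module. The sketch below rests on two assumptions that should be verified against your own export: that the CSV has no header row, and that its columns appear in the same order as the entity's attributes in model.json. The helper name, column names, and sample rows are hypothetical.

```python
import csv
import io


def read_entity_rows(csv_text, column_names):
    """Parse header-less CSV text into dictionaries keyed by column name."""
    reader = csv.reader(io.StringIO(csv_text))
    return [dict(zip(column_names, row)) for row in reader]


# Hypothetical content of an account CSV file; note the quoted comma.
sample_csv = '0001,"Contoso, Ltd.",Seattle\n0002,Fabrikam,London\n'

# Assumed column order, as it would be taken from model.json.
rows = read_entity_rows(sample_csv, ["Id", "name", "address1_city"])
for row in rows:
    print(row)
```

Using csv.reader rather than splitting on commas keeps quoted values such as company names with embedded commas intact.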

The Snapshot folder holds snapshots of the data that are taken at regular intervals.
Each snapshot is a view of the data as it was at that particular time, before any later create, update, or delete operations in Common Data Service were applied. (A snapshot copy is created only after the hourly cycle is completed.)
