This 1-click deployment sets up an environment with Azure Synapse Link, which lets you connect directly to your Azure Cosmos DB containers from Azure Synapse Analytics and access the analytical store with no separate connectors. It provides a complete end-to-end scenario: ingest data into Cosmos DB containers, set up Spark tables, and join and aggregate operational data across Cosmos DB containers. It also includes two pre-loaded PySpark notebooks that perform sales forecasting on a retail dataset and anomaly detection on a streaming dataset using Azure Synapse Link, Azure Automated Machine Learning, and Azure Cognitive Services on Synapse Spark (MMLSpark).
You need the Owner role (or the Contributor role) on the Azure subscription in which the template is deployed. This is required to create a separate resource group and to delegate the roles necessary for this proof of concept. Refer to the official documentation for RBAC role assignments.
Fork this GitHub repository into your GitHub account.
If you don't fork the repository:
- The notebooks will not be deployed
- You will get a GitHub publishing error
While in your forked repository, click the 'Deploy To Azure' button below to deploy all the resources.
Provide the values for:
- Resource group (create new)
- Region
- Company Tla
- Option (true or false) for Allow All Connections
- Option (true or false) for Spark Deployment
- Spark Node Size (Small, Medium, Large) if Spark Deployment is set to true
- Sql Administrator Login
- Sql Administrator Login Password
- Sku
- Option (true or false) for Metadata Sync
- Frequency
- Time Zone
- Resume Time
- Pause Time
- CosmosDB Account Name
- CosmosDB Throughput Policy
- CosmosDB Manual Provisioned Throughput
- CosmosDB Autoscale Max Throughput
- CosmosDB Analytical Store TTL (Default -1)
- GitHub Username (the username of the account into which this GitHub repository was forked)
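
If you want to keep a record of the values you enter in the portal, one option is to write them to an ARM parameters file. The following is a minimal sketch, not part of this template: the parameter keys (`companyTla`, `sparkDeployment`, `sparkNodeSize`, and so on) are hypothetical camelCase guesses based on the portal labels above, so check them against the template before relying on them.

```python
import json

# Standard schema URL for ARM deployment parameter files.
ARM_PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2019-04-01/"
    "deploymentParameters.json#"
)

# Node sizes listed in the portal form above.
VALID_NODE_SIZES = {"Small", "Medium", "Large"}


def build_parameters(values: dict) -> dict:
    """Wrap plain values in the ARM parameters-file structure.

    The keys in `values` are hypothetical; the real template may name
    its parameters differently.
    """
    if values.get("sparkDeployment") and values.get("sparkNodeSize") not in VALID_NODE_SIZES:
        raise ValueError("sparkNodeSize must be Small, Medium, or Large")
    return {
        "$schema": ARM_PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


example = build_parameters(
    {
        "companyTla": "abc",                  # hypothetical key and value
        "allowAllConnections": True,          # hypothetical key
        "sparkDeployment": True,              # hypothetical key
        "sparkNodeSize": "Medium",            # hypothetical key
        "cosmosDBAnalyticalStoreTTL": -1,     # default TTL from the list above
        "githubUsername": "<your-github-username>",
    }
)
print(json.dumps(example, indent=2))
```

The small validation step only mirrors the rule stated in the list: a Spark node size is needed when Spark Deployment is set to true.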
Click 'Review + Create'.
On successful validation, click 'Create'.
This template deploys the resources needed to support Azure Synapse Link for CosmosDB, listed below, along with some RBAC role assignments. Running this package costs approximately $500-$700 per month (Synapse SQL pool at 100 DWU).
- An Azure Synapse Workspace
- An Azure Synapse SQL Pool
- An optional Apache Spark Pool
- Azure Data Lake Storage Gen2 account
- A new File System inside the Storage Account to be used by Azure Synapse
- A Logic App to Pause the SQL Pool at a defined schedule
- A Logic App to Resume the SQL Pool at a defined schedule
- A key vault to store the secrets
- CosmosDB Database (CosmosDemoDB)
- CosmosDB Containers with Analytical Store Enabled
  - Products
  - StoreDemoGraphics
  - RetailSales
  - IoTDeviceInfo
  - IoTSignals
- AML workspace
- Azure Cognitive Service
- PySpark notebook that ingests batch data into CosmosDB containers, fetches data from CosmosDB, joins the datasets, and performs sales forecasting using Azure Synapse Link and Azure Automated Machine Learning on Synapse Spark
- PySpark notebook that ingests streaming and batch data into CosmosDB containers, fetches data from CosmosDB, joins the datasets, and performs anomaly detection using Azure Synapse Link and Azure Cognitive Services on Synapse Spark (MMLSpark)
- The current Azure user needs the "Storage Blob Data Contributor" role on the newly created Azure Data Lake Storage Gen2 account to avoid 403 permission errors.
- After the deployment is complete, click 'Go to resource group'.
- You'll see all the resources deployed in the resource group.
- Click on the newly deployed Synapse workspace.
- Click on the link 'Open' inside the box labeled as 'Open Synapse Studio'.
- Click on 'Log into Github' after the workspace is opened. Provide the credentials for the GitHub account that holds the forked repository.
- After logging in to your GitHub account, click on the 'Notebook' icon in the left panel. A blade will appear from the right side of the screen.
- Make sure that the 'main' branch is selected as 'Working branch' and click 'Save'.
- In Synapse Studio, click on the 'Manage' icon in the left panel and navigate to the 'Linked Services' menu option.
- Click on the 'CosmosDBLink' linked service to open its configuration settings.
- Select 'Connection String' under 'Authentication Method'.
- Select 'From Azure subscription' under 'Account Selection Method'.
- Select the subscription under which the package is deployed.
- Select the CosmosDB account.
- Select the CosmosDB database name 'CosmosDemoDB'.
- Click 'Apply' to save the changes.
- To verify the CosmosDB analytical store and containers:
  - In the Cosmos DB account, navigate to the 'Features' tab.
  - Synapse Link will be listed as enabled.
  - To check the pre-created database and containers with analytical store, navigate to the 'Data Explorer' tab.
- In Synapse Studio you can see the same CosmosDB containers configured:
  - Navigate to the 'Data' section in the left panel and then to the 'Linked' menu option.
  - Expand 'Azure CosmosDB'. Five containers are listed with 'Analytical Store' enabled.
- In Synapse Studio, click on the 'Develop' icon in the left panel and navigate to the 'Notebooks' section.
- Two notebooks are available: '1-SalesForecastingWithAML' and '2-AnomalyDetectionWithMML'.
- Follow the instructions in each notebook cell to:
  - Ingest data into CosmosDB containers using Synapse Link for CosmosDB
  - Create Spark tables from CosmosDB using Synapse Link for CosmosDB
  - Join Spark tables using Synapse Spark
  - Run a machine learning model on this dataset using Azure ML
- When a notebook completes, the CosmosDB containers are populated with a sample dataset.
- Navigate to the CosmosDB account. Under the 'Data Explorer' tab you can see the containers populated with data.
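
The first three notebook steps above (ingest, create Spark tables, join) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code shipped in the notebooks: it uses the `cosmos.oltp` and `cosmos.olap` formats of the Synapse Spark connector with the 'CosmosDBLink' linked service, and the join column `productCode` is a hypothetical name chosen only for the example.

```python
def cosmos_options(linked_service: str, container: str) -> dict:
    """Connector options that identify a Cosmos DB container via a linked service."""
    return {
        "spark.synapse.linkedService": linked_service,
        "spark.cosmos.container": container,
    }


def write_to_cosmos(df, linked_service: str, container: str) -> None:
    """Append a Spark DataFrame to the container's transactional store."""
    (
        df.write.format("cosmos.oltp")
        .options(**cosmos_options(linked_service, container))
        .mode("append")
        .save()
    )


def create_table_sql(table: str, linked_service: str, container: str) -> str:
    """Spark SQL that defines a table over the container's analytical store."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} USING cosmos.olap OPTIONS ("
        f"spark.synapse.linkedService '{linked_service}', "
        f"spark.cosmos.container '{container}')"
    )


def join_sql(left: str, right: str, key: str) -> str:
    """Spark SQL that joins two Spark tables on a shared column."""
    return f"SELECT * FROM {left} AS l JOIN {right} AS r ON l.{key} = r.{key}"


# In a Synapse notebook, where `spark` is predefined, usage might look like:
# write_to_cosmos(sales_df, "CosmosDBLink", "RetailSales")
# spark.sql(create_table_sql("RetailSales", "CosmosDBLink", "RetailSales"))
# spark.sql(create_table_sql("Products", "CosmosDBLink", "Products"))
# joined = spark.sql(join_sql("RetailSales", "Products", "productCode"))  # hypothetical key
```

Writes go to the transactional store (`cosmos.oltp`), while the Spark tables read from the analytical store (`cosmos.olap`), so the joins do not consume the request units provisioned for the transactional workload.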
- In Synapse Studio, navigate to the 'Data' tab in the left menu, then click on the 'Linked' tab.
- Expand the 'Azure CosmosDB' option and right-click the container you want to load into a DataFrame.
- In the right-click menu, select 'New Notebook', then 'Load to DataFrame'.
- Attach the already created Spark pool to the notebook and run it.
- On successful completion, you should see the result set in Synapse Studio.
- Once published, all the resources are available in live mode.
- To switch to live mode from Git mode, click the drop-down at the top-left corner and select 'Switch to live mode'.
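
For reference, the 'Load to DataFrame' action described earlier produces a notebook cell that reads a container's analytical store. A minimal sketch of an equivalent read is shown below; the helper name `load_analytical_store` is hypothetical, and the code assumes the linked service is named 'CosmosDBLink' as configured in the steps above.

```python
def load_analytical_store(spark, container: str, linked_service: str = "CosmosDBLink"):
    """Read a Cosmos DB container's analytical store into a Spark DataFrame.

    `spark` is the SparkSession that Synapse notebooks provide by default.
    """
    return (
        spark.read.format("cosmos.olap")
        .option("spark.synapse.linkedService", linked_service)
        .option("spark.cosmos.container", container)
        .load()
    )


# In a Synapse notebook attached to the Spark pool:
# df = load_analytical_store(spark, "RetailSales")
# display(df.limit(10))
```

Because the read targets the analytical store, it can be rerun freely while exploring the data without affecting the transactional workload.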