Connect Snowflake to Azure Storage
The goal of this tutorial is to load JSON and CSV data from Azure Storage using the COPY INTO SQL command, and then to use Snowpipe to automate the ingestion process.
Requirements
- Snowflake account; a free trial works. We also assume no complex security requirements.
- Azure account; you can set up a free account to get started.
Video
Download
- Sample data (Link)
Manual Loading
Let's start by setting up a Snowflake connection to Azure Storage and loading JSON data. After that, we'll use Snowpipe to automate the ingestion of CSV files.
Azure
Sign in to your Azure account.
Create Storage account
Create the storage account you intend to use.
In our case we'll name the storage account danielwilczak and give it the same region as our Snowflake account. It's okay if the provider or region don't match; matching is just the quickest option. We will use the defaults for everything else. If you're creating a resource group, just enter a name and accept the defaults.
We will then navigate into the storage account.
Create Container
In our case we will call it data.
Navigate into the data container.
Upload sample data
Upload the sample data (json/csv) provided in the data folder to your Azure storage container.
Get tenant ID
Using the search bar, look up tenant properties.
Copy your Tenant ID. We will use this in two places.
Snowflake
Let's transition to Snowflake by creating a worksheet and adding the code below with your information:
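A minimal sketch of that code, assuming an integration named azure_integration and the danielwilczak storage account and data container from earlier; substitute your own tenant ID and storage location:

```sql
-- Connects Snowflake to the storage account. The integration name and
-- storage URL are assumptions; replace them with your own values.
CREATE OR REPLACE STORAGE INTEGRATION azure_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = '<your tenant id>'  -- first of the two places the Tenant ID is used
  STORAGE_ALLOWED_LOCATIONS = ('azure://danielwilczak.blob.core.windows.net/data/');

-- Returns the two properties shown in the table below.
DESC STORAGE INTEGRATION azure_integration;
```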
property | property_value |
---|---|
AZURE_CONSENT_URL | https://login.microsoftonline.com/9a2d... |
AZURE_MULTI_TENANT_APP_NAME | c9pnugsnowflakepacint_1700096201187 |
Please Ctrl+click the AZURE_CONSENT_URL or copy and paste it into your browser, then accept the integration.
Grant Access in Azure
Let's navigate to IAM so that we can give Snowflake access to our storage account.
Search for Storage Blob Data Contributor, select the role and click next.
Let's add the Snowflake member by clicking "Select members" and searching for the AZURE_MULTI_TENANT_APP_NAME value we got earlier from Snowflake.
Once selected, click Review and Assign.
Load the data
If you get an error when creating the stage, just wait 30 seconds to a minute and try again. Sometimes it takes Azure a bit to update security.
Let's set up the stage, file format, and warehouse, and finally load some JSON data.
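A minimal sketch of those statements, assuming the azure_integration from earlier; the stage, file format, warehouse, and table names are hypothetical:

```sql
-- File format for the JSON sample data.
CREATE OR REPLACE FILE FORMAT json_format
  TYPE = 'JSON';

-- External stage pointing at the data container, secured by the integration.
CREATE OR REPLACE STAGE azure_stage
  STORAGE_INTEGRATION = azure_integration
  URL = 'azure://danielwilczak.blob.core.windows.net/data/';

-- A small warehouse to run the load.
CREATE WAREHOUSE IF NOT EXISTS ingest_wh WAREHOUSE_SIZE = 'XSMALL';

-- Land the raw JSON in a single VARIANT column.
CREATE OR REPLACE TABLE json_data (record VARIANT);

COPY INTO json_data
  FROM @azure_stage/json/
  FILE_FORMAT = (FORMAT_NAME = 'json_format');
```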
file | status |
---|---|
azure://danielwilczak.blob.core.windows.net/data/json/sample.json | LOADED |
Look at the data you just loaded.
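For example, assuming the json_data table from the sketch above:

```sql
-- Each row holds one JSON record in the VARIANT column.
SELECT record FROM json_data LIMIT 10;
```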
Automatic Loading
Warning
If you have not yet manually loaded data from Azure Storage, please go back and complete that section first.
Azure
Let's start with Azure by creating a queue. All you need is a name. I've named mine danielwilczak-snowflake-queue.
Store the URL. It will be used later.
Next, let's create an event subscription.
If you get an error with the topic, follow this process to enable it:
Search "resource provider" and select it.
Search "EventGrid", select it, and the select register. This will enable it in your account.
Enter the highlighted fields. I've named mine danielwilczak-snowflake-event. Once entered, click "configure endpoint".
Enter the storage account and select the queue that we set up earlier.
To finish the event, click "create".
Snowflake
Open a worksheet and enter your queue URL and tenant_id.
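A minimal sketch of that worksheet code, assuming a notification integration named azure_queue_integration; the queue URL shown is a guess based on the names used above, so substitute your own:

```sql
-- Lets Snowflake listen to the Azure storage queue for new-file events.
CREATE OR REPLACE NOTIFICATION INTEGRATION azure_queue_integration
  ENABLED = TRUE
  TYPE = QUEUE
  NOTIFICATION_PROVIDER = AZURE_STORAGE_QUEUE
  AZURE_STORAGE_QUEUE_PRIMARY_URI = 'https://danielwilczak.queue.core.windows.net/danielwilczak-snowflake-queue'
  AZURE_TENANT_ID = '<your tenant id>';  -- second place the Tenant ID is used

-- Returns the consent URL and app name shown in the table below.
DESC NOTIFICATION INTEGRATION azure_queue_integration;
```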
property | property_value |
---|---|
AZURE_CONSENT_URL | https://login.microsoftonline.com/9a2d7... |
AZURE_MULTI_TENANT_APP_NAME | tzpbnzsnowflakepacint_1700112906001 |
Please Ctrl+click the URL or enter it into your browser; you will have to accept the integration in Azure.
Grant Access in Azure
Using the left-side search, look up "IAM", select Access Control, and click "add role assignment".
Search for "storage queue data contributor" and select it and click next.
Click "Select members" and add your AZURE_MULTI_TENANT_APP_NAME
that we recieved from Snowflake in the prior snowflake step.
For the final Azure step, select review and assign, and you're done with Azure.
Load the data
Note
Sometimes it may take 1-2 minutes before you see data in the table. This depends on how Azure is feeling today.
If you get an error when creating the pipe, just wait 30 seconds to a minute and try again. Sometimes it takes Azure a bit to update security.
We'll load a CSV file by automating the creation of the table and inferring the column names from the CSV in the pipe.
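A sketch of that pipeline, assuming the azure_stage and azure_queue_integration created earlier; csv_format, csv_data, and csv_pipe are hypothetical names:

```sql
-- CSV format that reads column names from the header row.
CREATE OR REPLACE FILE FORMAT csv_format
  TYPE = 'CSV'
  PARSE_HEADER = TRUE;

-- Create the table automatically from the inferred CSV schema.
CREATE OR REPLACE TABLE csv_data
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION => '@azure_stage/csv/',
        FILE_FORMAT => 'csv_format'
      )
    )
  );

-- The pipe ingests new files whenever the queue notifies Snowflake.
CREATE OR REPLACE PIPE csv_pipe
  AUTO_INGEST = TRUE
  INTEGRATION = 'AZURE_QUEUE_INTEGRATION'
  AS
  COPY INTO csv_data
  FROM @azure_stage/csv/
  FILE_FORMAT = (FORMAT_NAME = 'csv_format')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- Queue up the files already sitting in the folder; produces the table below.
ALTER PIPE csv_pipe REFRESH;
```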
File | Status |
---|---|
/sample_1.csv | SENT |
Result
Let's add more sample data to the Azure csv folder and see it appear in Snowflake about 30 seconds later. We can verify this by running a count on our table: it returns 20 records, whereas the original CSV only had 10.
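For example, assuming the csv_data table from the sketch above:

```sql
-- Should return 20 once the second batch of files has been ingested.
SELECT COUNT(*) FROM csv_data;
```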