
Connect Snowflake to Azure Storage

The goal of this tutorial is to load JSON and CSV data from Azure Storage using the COPY INTO SQL command, and then use Snowpipe to automate the ingestion process.

Requirements

  • Snowflake account; you can use a free trial. We also assume no complex security needs.
  • Azure account; you can set up a free account to get started.

Video

Download

  • Sample data (Link)

Manual Loading

Let's start by setting up a Snowflake connection to Azure Storage and loading JSON data. After that we'll use Snowpipe to automate the ingestion of CSV files.

Azure

Sign in to your Azure account.

Create Storage account

Create the storage account you intend to use. Azure create container

In our case we'll call the storage account danielwilczak and make its region the same as our Snowflake account's region. It's okay if it's not in the same provider or region; it's just the quickest option. We will also use the defaults for everything else. If creating a resource group, just enter a name and take the defaults. Azure create container options

We will then navigate into the storage account. Navigate into storage account

Create Container

Create a storage container. Click create container

In our case we will call it data. Container name

Navigate into the data container. Navigate into container

Upload sample data

Let's upload some sample data. Navigate into container

Upload the sample data (json/csv) provided in the data folder to your Azure storage container.

Upload example data

Get tenant ID

Using the search bar, look up tenant properties. Tenant Properties

Copy your Tenant ID. We will use this in two places. Tenant Properties

Snowflake

Let's transition to Snowflake by creating a worksheet and adding the code below with your information:

/*
    We switch to "sysadmin" to create an object
    because it will be owned by that role.
*/
use role sysadmin;

-- Create a database to store our schemas.
create database if not exists raw;

-- Create the schema. The schema stores all objects.
create schema if not exists raw.azure;

/*
    Warehouses are synonymous with the idea of
    compute resources in other systems.
*/
create warehouse if not exists development 
    warehouse_size = xsmall
    auto_suspend = 30
    initially_suspended = true;

/*
    Integrations are one of those important features that account admins
    should handle because they allow outside connections to your Snowflake data.
*/
use role accountadmin;

create or replace storage integration azure_integration
    type = external_stage
    storage_provider = 'azure'
    enabled = true 
    azure_tenant_id = '<TENANT_ID>' /* (1)! */
    storage_allowed_locations = (
        'azure://<STORAGE ACCOUNT NAME>.blob.core.windows.net/<CONTAINER NAME>' /* (2)! */ /* (3)! */
    );

-- Give the sysadmin access to use the integration later.
grant usage on integration azure_integration to role sysadmin;

-- Get the URL to authenticate with azure and the app name to use later.
describe storage integration azure_integration;
select "property", 
case  when "property" = 'AZURE_MULTI_TENANT_APP_NAME' then split_part("property_value", '_', 1) else "property_value"end as "property_value"
from table(result_scan(last_query_id()))
where "property" in ('AZURE_CONSENT_URL', 'AZURE_MULTI_TENANT_APP_NAME');

  1. Using the search bar, look up tenant properties. Tenant Properties

    Copy your Tenant ID. Tenant Properties

  2. Navigate into storage account

  3. Navigate into container

/*
    We switch to "sysadmin" to create an object
    because it will be owned by that role.
*/
use role sysadmin;

-- Create a database to store our schemas.
create database if not exists raw;

-- Create the schema. The schema stores all objects.
create schema if not exists raw.azure;

/*
    Warehouses are synonymous with the idea of
    compute resources in other systems.
*/
create or replace warehouse development 
    warehouse_size = xsmall
    auto_suspend = 30
    initially_suspended = true;

/*
    Integrations are one of those important features that account admins
    should handle because they allow outside connections to your Snowflake data.
*/
use role accountadmin;

create or replace storage integration azure_integration
    type = external_stage
    storage_provider = 'azure'
    enabled = true 
    azure_tenant_id = '9a2dkd8cb-73e9-40ee-a558-fcdnj5ef57a7'
    storage_allowed_locations = (
        'azure://danielwilczak.blob.core.windows.net/data'
    );

-- Give the sysadmin access to use the integration later.
grant usage on integration azure_integration to role sysadmin;

-- Get the URL to authenticate with azure and the app name to use later.
describe storage integration azure_integration;
select "property", 
case  when "property" = 'AZURE_MULTI_TENANT_APP_NAME' then split_part("property_value", '_', 1) else "property_value"end as "property_value"
from table(result_scan(last_query_id()))
where "property" in ('AZURE_CONSENT_URL', 'AZURE_MULTI_TENANT_APP_NAME');
property property_value
AZURE_CONSENT_URL https://login.microsoftonline.com/9a2d...
AZURE_MULTI_TENANT_APP_NAME c9pnugsnowflakepacint_1700096201187

Please Ctrl+Click the URL or copy and paste it into your browser. Azure Approval

Grant Access in Azure

Let's navigate to IAM so that we can give Snowflake access to our storage account. Navigate IAM

Search for Storage Blob Data Contributor, select the role and click next. Storage Blob Contributor

Let's add the Snowflake member by clicking "Select members" and searching for the AZURE_MULTI_TENANT_APP_NAME that we got earlier from Snowflake.

Add snowflake member

Once selected, click Review and Assign. Review and assign

Load the data

If you get an error when creating the stage.

Just wait 30 seconds to a minute and try again. Sometimes it takes Azure a bit to update security.

Let's set up the stage, the file format, and the warehouse, and finally load some JSON data.

use database raw;
use schema azure;
use role sysadmin;
use warehouse development;

/*
   Stages are synonymous with the idea of folders
   that can be either internal or external.
*/
create or replace stage azure
    storage_integration = azure_integration
    url = 'azure://<STORAGE NAME>.blob.core.windows.net/<CONTAINER NAME>' /* (1)! */ /* (2)! */
    directory = ( enable = true);

/* 
    Create a file format so the "copy into"
    command knows how to copy the data.
*/
create or replace file format raw.azure.json
    type = 'json';

-- Create the table to load into.
create or replace table json (
    file_name varchar,
    data variant
);

-- Load the json file from the json folder.
copy into json(file_name,data)
from (
    select 
        metadata$filename,
        $1
    from
        @azure/json
        (file_format => json)
);

  1. Navigate into storage account

  2. Navigate into container

use database raw;
use schema azure;
use role sysadmin;
use warehouse development;

/*
   Stages are synonymous with the idea of folders
   that can be either internal or external.
*/
create or replace stage azure
    storage_integration = azure_integration
    url = 'azure://danielwilczak.blob.core.windows.net/data'
    directory = ( enable = true);

/* 
    Create a file format so the "copy into"
    command knows how to copy the data.
*/
create or replace file format raw.azure.json
    type = 'json';

-- Create the table to load into.
create or replace table json (
    file_name varchar,
    data variant
);

-- Load the json file from the json folder.
copy into json(file_name,data)
from (
    select 
        metadata$filename,
        $1
    from
        @azure/json
        (file_format => json)
);
file status
azure://danielwilczak.blob.core.windows.net/data/json/sample.json LOADED
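If the copy command reports no files, a quick sanity check is to list what Snowflake can see through the stage. This uses Snowflake's standard LIST command and isn't part of the original script:

-- Optional check: confirm the stage can see the files in the json folder.
list @azure/json;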

Look at the data you just loaded.

select * from raw.azure.json; 

Result
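Because the JSON lands in a single variant column, you can pull individual fields out with Snowflake's colon path syntax. The key used below ("name") is only a placeholder; substitute a key that actually exists in the sample file:

-- Example only: replace "name" with a key from your JSON file.
select
    file_name,
    data:name::varchar as name
from raw.azure.json;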

Automatic Loading

Warning

If you have not manually loaded data from Azure Storage yet, please go back and complete that section first.

Azure

Let's start with Azure by creating a storage queue. All you need is a name; I've named mine danielwilczak-snowflake-queue. Create queue

Store the URL. It will be used later. Queue

Next, let's create an event subscription. Event Subscription

If you get an error with the topic, follow this process to enable it

Search for subscription. Subscription

Select your subscription. Subscription

Search "resource provider" and select it. Subscription

Search "EventGrid", select it, and the select register. This will enable it in your account. Subscription

Wait for it to say registered. Subscription

Fill in the highlighted fields. I've named mine danielwilczak-snowflake-event. Once entered, click "configure endpoint". Event Input

Enter the storage account and select the queue that we set up earlier. Event Input

To finish the event, click "create". Create

Result Create

Snowflake

Open a worksheet and enter your queue URL and tenant ID.

use role accountadmin;

-- Set up the storage notification to know when storage events happen.
create or replace notification integration azure_snowpipe_integration
    enabled = true
    type = queue
    notification_provider = azure_storage_queue
    azure_storage_queue_primary_uri = '<QUEUE URL>' /* (1)! */
    azure_tenant_id = '<TENANT ID>'; /* (2)! */

-- Give the sysadmin access to use the integration.
grant usage on integration azure_snowpipe_integration to role sysadmin;

describe notification integration azure_snowpipe_integration;
select "property", 
case  when "property" = 'AZURE_MULTI_TENANT_APP_NAME' then split_part("property_value", '_', 1) else "property_value"end as "property_value"
from table(result_scan(last_query_id()))
where "property" in ('AZURE_CONSENT_URL', 'AZURE_MULTI_TENANT_APP_NAME');

  1. Queue URL

  2. Using the search bar, look up tenant properties. Tenant Properties

    Copy your Tenant ID. Tenant Properties

use role accountadmin;

-- Set up the storage notification to know when storage events happen.
create or replace notification integration azure_snowpipe_integration
    enabled = true
    type = queue
    notification_provider = azure_storage_queue
    azure_storage_queue_primary_uri = 'https://danielwilczak.queue.core.windows.net/danielwilczak-snowflake-queue'
    azure_tenant_id = '9a2d78cb-73e9-40ee-a558-fc1adfff57a7';

-- Give the sysadmin access to use the integration.
grant usage on integration azure_snowpipe_integration to role sysadmin;

describe notification integration azure_snowpipe_integration;
select "property", 
case  when "property" = 'AZURE_MULTI_TENANT_APP_NAME' then split_part("property_value", '_', 1) else "property_value"end as "property_value"
from table(result_scan(last_query_id()))
where "property" in ('AZURE_CONSENT_URL', 'AZURE_MULTI_TENANT_APP_NAME');
property property_value
AZURE_CONSENT_URL https://login.microsoftonline.com/9a2d7...
AZURE_MULTI_TENANT_APP_NAME tzpbnzsnowflakepacint_1700112906001

Please Ctrl+Click the URL or enter it into your browser; you will have to accept the integration on Azure. Accept Integration

Grant Access in Azure

Using the left-side search, look up "IAM", select Access Control (IAM), and click "Add role assignment". Navigate to role

Search for "storage queue data contributor" and select it and click next. NRole assignment

Click "Select members" and add your AZURE_MULTI_TENANT_APP_NAME that we recieved from Snowflake in the prior snowflake step. NRole assignment

For the final Azure step, select Review and Assign, and you're done with Azure. Role assignment

Load the data

Note

Sometimes it may take 1-2 minutes before you see data in the table. This depends on how Azure is feeling today.

If you get an error when creating the pipe.

Just wait 30 seconds to a minute and try again. Sometimes it takes Azure a bit to update security.

We'll load a CSV file by automating the creation of the table and inferring the column names in the CSV pipe.

use role sysadmin;
use database raw;
use schema azure;
use warehouse development;

/*
    Copy CSV data using a pipe without having
    to write out the column names.
*/
create or replace file format infer
    type = csv
    parse_header = true
    skip_blank_lines = true
    field_optionally_enclosed_by ='"'
    trim_space = true
    error_on_column_count_mismatch = false;

/*
    Create the table with the column names
    generated for us.
*/
create or replace table csv
    using template (
        select array_agg(object_construct(*))
            within group (order by order_id)
        from table(
            infer_schema(
                location => '@azure/csv',
                file_format => 'infer'
            )
        )
    );

/*
    Load the data and assign the pipe notification
    to know when a file is added.
*/
create or replace pipe csv
    auto_ingest = true
    integration = 'AZURE_SNOWPIPE_INTEGRATION'
    as
    copy into csv
    from @azure/csv
    file_format = (format_name = 'infer')
    match_by_column_name = case_insensitive;

/* 
    Refresh the state of the pipe to make
    sure it's updated with all files.
*/
alter pipe csv refresh;
File Status
/sample_1.csv SENT

Result

Note

Sometimes it may take 1-2 minutes before you see data in the table. This depends on how Azure is feeling today.
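If you would rather not just wait, Snowflake's standard SYSTEM$PIPE_STATUS function shows whether the pipe is running and how many notifications are pending:

-- Optional: check the state of the pipe while it processes notifications.
select system$pipe_status('raw.azure.csv');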

Let's add more sample data to the Azure csv folder and see it appear in Snowflake roughly 30 seconds later. We can verify this by doing a count on our table and seeing 20 records, where the original CSV only has 10 records.
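A minimal way to confirm the new rows landed (assuming the 10-row sample file) is a simple count before and after the upload:

-- Expect 10 rows before the second upload and 20 shortly after.
select count(*) from raw.azure.csv;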