External Tables - Google Cloud Storage

The goal of this tutorial is to set up a Snowflake external table on files stored in an external Google Cloud Storage bucket.

Video

Video in development.

Requirements

  • Snowflake account; a free trial works. We also assume no complex security requirements.
  • Google Cloud account; you can set up a free account to get started.

Setup

Warning

This tutorial assumes you have already set up a stage with read/write privileges. If you have not, please follow this tutorial.

In this section we will upload the new sample files to a GCP folder and then set up Snowflake.

Add to bucket

External Table

Let's now create our table and look at the options available for it.

Simple

Let's create the table on top of the sample files stored in the bucket / stage.

create or replace file format customers_csv_format
    type = 'csv'
    field_optionally_enclosed_by = '"'
    skip_header = 1;

create or replace external table ext_customers (
    customer_id number as (value:c1::number),
    first_name string as (value:c2::string),
    last_name string as (value:c3::string),
    email string as (value:c4::string)
)
    location = @gcp/customers
    file_format = customers_csv_format
    auto_refresh = false;
status
Table EXT_CUSTOMERS successfully created.

Now you'll be able to query the data in the table as if it were a native Snowflake table. The data will not be refreshed automatically in this setup; it has to be triggered manually via:

alter external table ext_customers refresh;
file                                     | status         | description
-----------------------------------------|----------------|------------------------------
customers/2025-04-21T17-20-44-970197.csv | REGISTERED_NEW | File registered successfully.
customers/2025-04-21T17-20-44-932488.csv | REGISTERED_NEW | File registered successfully.
customers/2025-04-21T17-20-44-919172.csv | REGISTERED_NEW | File registered successfully.
customers/2025-04-21T17-20-44-944873.csv | REGISTERED_NEW | File registered successfully.
customers/2025-04-21T17-20-44-957653.csv | REGISTERED_NEW | File registered successfully.
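
Once the files are registered, the table can be queried like any native table. A minimal sketch (column values depend on your sample files; metadata$filename is a pseudocolumn Snowflake exposes on external tables):

```sql
-- Query the external table and include the source file for each row.
select
    customer_id,
    first_name,
    last_name,
    email,
    metadata$filename as source_file
from ext_customers
order by customer_id
limit 10;
```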

Partition by

After setting up our file format and querying larger, more complex datasets, we can improve performance by partitioning the external table. Partitioning by a column like file_created_at helps Snowflake prune files during queries, reducing scan time.

create or replace external table external_customers (
    customer_id number as (value:c1::number),
    first_name string as (value:c2::string),
    last_name string as (value:c3::string),
    email string as (value:c4::string),
    row_created_at timestamp_ntz as (value:c5::timestamp),
    file_created_at date as cast(split_part(split_part(metadata$filename, '/', -1), 'T', 1) as date)
)
    partition by (file_created_at)
    location = @gcp/customers
    file_format = customers_csv_format
    auto_refresh = false;
status
Table EXTERNAL_CUSTOMERS successfully created.
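
With the partition column in place, filtering on file_created_at lets Snowflake skip files whose names fall outside the requested range. A sketch (the date range is illustrative, matching the sample file names above):

```sql
-- Snowflake prunes files using the partition column derived from the filename,
-- so only files created in April 2025 are scanned.
select count(*) as customers_loaded
from external_customers
where file_created_at between '2025-04-01' and '2025-04-30';
```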

Auto-Refreshing

Warning

This section assumes you have already set up an event notification system via Pub/Sub on the stage/bucket. If you have not, please follow this tutorial.

Now that we have the data partitioned correctly, we probably want new data to appear as it lands in the folder. To do so we will change auto_refresh to true. This assumes we already have event notifications set up, similar to Snowpipe.

create or replace external table external_customers (
    customer_id number as (value:c1::number),
    first_name string as (value:c2::string),
    last_name string as (value:c3::string),
    email string as (value:c4::string),
    row_created_at timestamp_ntz as (value:c5::timestamp),
    file_created_at date as cast(split_part(split_part(metadata$filename, '/', -1), 'T', 1) as date)
)
    partition by (file_created_at)
    location = @gcp/customers
    file_format = customers_csv_format
    auto_refresh = true;
status
Table EXTERNAL_CUSTOMERS successfully created.
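
To verify that notifications are actually reaching the table, you can inspect its refresh state with Snowflake's built-in functions. A sketch, assuming the table name from this tutorial:

```sql
-- Returns a JSON status for the table's notification channel,
-- including any pending file count and last-received timestamps.
select system$external_table_pipe_status('external_customers');

-- Review which files were registered recently and whether any failed.
select *
from table(information_schema.external_table_file_registration_history(
    table_name => 'external_customers'
));
```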