Openflow - Google Sheets

The goal of this tutorial is to load data from a Google Sheet into Snowflake via Openflow.

Video

The video is still in development.

Requirements

  • You can NOT be on a trial account. (Link)
  • Your Snowflake account has to be in an AWS region. (Link)
  • A Google Cloud account; you can set up a free account to get started.

Snowflake

Let's start the Snowflake setup by going into a worksheet (1) and creating the necessary objects for Openflow and the connector.

  1. Worksheet
Run the following if you don't have a database, schema, or warehouse yet.
use role sysadmin;

-- Create a database to store our schemas.
create database if not exists raw;

-- Create the schema. The schema stores all objects.
create schema if not exists raw.google;
create schema if not exists raw.network;

/*
    Warehouses are synonymous with the idea of compute
    resources in other systems. We will use this
    warehouse to query our integration and to load data.
*/
create warehouse if not exists openflow 
    warehouse_size = xsmall
    auto_suspend = 30
    initially_suspended = true;

Only required if you're hosting Openflow in Snowflake (SPCS)

Let's create the network rule and external access integration that will allow Openflow/Snowflake to talk with Google Sheets.

-- create network rule for google apis
create or replace network rule google_api_network_rule
    mode = egress
    type = host_port
    value_list = (
        'admin.googleapis.com',
        'oauth2.googleapis.com',
        'www.googleapis.com',
        'sheets.googleapis.com'
    );

-- Create one external access integration with all network rules.
create or replace external access integration openflow_external_access
    allowed_network_rules = (google_api_network_rule)
    enabled = true;

Statement executed successfully.

Openflow

In this tutorial we assume you already have a deployment. If you don't, it only takes a few minutes; please follow one of these three options.

  • SPCS, also known as Snowflake hosted. (Link)
  • AWS with Snowflake managed VPC (Link)
  • AWS self hosted VPC (link)

Runtime

Inside Openflow, click "Runtimes".

Click "+ Create runtime".

Select your deployment, give your runtime a name, and select accountadmin as the role. Scroll down to external access.

Only required if you're hosting Openflow in Snowflake (SPCS)

Select the external access integration we created in our worksheet and finally click "Create".

Now your runtime will start being created. Let's head to the connectors.

Install Connector

On the connectors list, install the Google Sheets connector.

Select your runtime to install into; you may have to wait for the runtime to finalize.

Once selected, click "Add". The connector will be added and you'll be required to log in to get into the underlying container.

If you get the error 'Invalid consent request' or 'TOTP Invalid'

You will have to change your default role to a role that is not an admin role. For example, a default of public would work.

Click "Allow".

Now you should see the Openflow canvas with the Google Sheets connector block. We will switch over to Google Sheets to get some information needed to set up our connector.

Google Sheets

Note

To set up Google, make sure you have Super Admin permissions; we will need these to create a service account.

Ensure that you are working in the project associated with your organization, not a different project in your organization.

Organization Policy

Search "Organization Policies", then click in the filter box and search "Disable service account key creation". Select the managed constraint.

Click "Manage policy".

Click "Override parent policy", edit the rule, and select "Off" for enforcement. Then click "Done" and set the policy.

Set the policy.

Service account

Search "Service account" and once on the page click "+ Create service account".

Give your service account a name and description, then click "Done".

Now copy the service account email; we will use this later. Once copied, click the service account.

Go to "Keys", then "Add key", and create a new key.

We'll want the JSON key type. This key will be added to Openflow later. Click "Create".

It will download a JSON file.
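Before pasting this key into Openflow later, it can help to sanity-check the file. Below is a small sketch (not part of the connector) that validates a parsed key; the field names `type`, `client_email`, and `private_key` are standard in Google service account key files, and `key.json` is just a placeholder path.

```python
import json

# Fields a Sheets integration typically needs from the key file.
REQUIRED = ("type", "client_email", "private_key")

def check_service_account_key(key: dict) -> str:
    """Validate a parsed service account key and return its client_email."""
    for field in REQUIRED:
        if field not in key:
            raise ValueError(f"key file is missing '{field}'")
    if key["type"] != "service_account":
        raise ValueError(f"expected a service_account key, got '{key['type']}'")
    return key["client_email"]

# Example usage with the downloaded file ("key.json" is a placeholder name):
# with open("key.json") as f:
#     print(check_service_account_key(json.load(f)))
```

The `client_email` value it returns is the address we'll share the sheet with in a later step.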

Enable Sheets API

Next we'll enable the Google Sheets API so that Openflow can talk with it. Search "google sheets api".

Once on the API page, click "Enable".

Once enabled, you will be able to see usage metrics.

Share with service account

Next we'll share the Google Sheet with the service account email (1) we copied earlier. Click the "Share" button.

  1. service account email

Insert the email and click "Send".

Click "Share anyway".

Connector Configuration

Let's head back to Openflow, right click the Google Sheets connector, and then click "Parameters".

Here we will see three sections where we will have to enter our configuration parameters.

Destination Parameters

Let's click the three dots on the right side of the destination parameters.

If you're using an SPCS deployment

As an example, we'll click the three dots and click "Edit". We'll put in the database, schema, role, and warehouse.

One special parameter is "Snowflake Authentication Strategy": with container services you can set it to "SNOWFLAKE_SESSION_TOKEN" and it will not require a key pair.

This is an example input if you used the configurations given at the beginning of the tutorial.

If you're using a BYOC deployment

These are the parameters you'll need to fill out. We will see how to get them below.

To get the Snowflake account identifier, go to the bottom left of the homepage and click "Account details".

Copy your account identifier and paste it into Openflow.

Next, to get your key, we'll have to generate a public/private key pair and apply it to our user. To generate the keys, run this bash script.

openssl genrsa 2048 > rsa_key_pkcs1.pem
openssl pkcs8 -topk8 -inform PEM -in rsa_key_pkcs1.pem -out rsa_key.p8 -nocrypt
openssl rsa -in rsa_key_pkcs1.pem -pubout -out rsa_key.pub

This will generate three files: rsa_key_pkcs1.pem, rsa_key.p8, and rsa_key.pub. We will apply the content of the .pub file (without the BEGIN/END header lines) to our Snowflake user using the alter user command.

-- Template
alter user danielwilczak set rsa_public_key='PUBLIC KEY';
-- Example
alter user danielwilczak set rsa_public_key='MIIBIjANBgkqhki...6VIhVnTviwDXXcm558uMrJQIDAQAB';

Statement executed successfully.
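To confirm the key landed correctly, Snowflake reports a fingerprint in the RSA_PUBLIC_KEY_FP property of `describe user <name>`: the base64-encoded SHA256 of the DER-encoded public key. As a sketch (assuming a standard PEM layout for rsa_key.pub), you can compute the same fingerprint locally with the standard library and compare:

```python
import base64
import hashlib

def key_fingerprint(pem_text: str) -> str:
    """Compute the SHA256:... fingerprint of a public key PEM.

    Strips the BEGIN/END armor lines, base64-decodes the body back
    to DER, then hashes and re-encodes it the way Snowflake does.
    """
    body = "".join(
        line for line in pem_text.splitlines() if "-----" not in line
    )
    der = base64.b64decode(body)
    digest = hashlib.sha256(der).digest()
    return "SHA256:" + base64.b64encode(digest).decode()

# Example usage:
# print(key_fingerprint(open("rsa_key.pub").read()))
```

If the value printed here matches RSA_PUBLIC_KEY_FP from `describe user`, the alter user step worked.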

Now that the public key is on our user, we'll move on to uploading the .pem file to Openflow.

Click the upload button.

Click the upload button again and select your .pem file.

Once uploaded, select the key and click "OK".

Ingestion Parameters

Next, for the "Ingestion Parameters", we'll need the sheet name and the sheet range. Select the range you want, then copy it along with the sheet name.

Next we'll copy the Spreadsheet ID from the URL.
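The Spreadsheet ID is the long token that sits between /d/ and /edit in the sheet's URL. If you'd rather pull it out programmatically, here's a small sketch; the regex assumes the standard docs.google.com URL shape:

```python
import re

def spreadsheet_id(url: str) -> str:
    """Extract the spreadsheet ID from a Google Sheets URL.

    IDs are the path segment after /spreadsheets/d/ and consist of
    letters, digits, underscores, and hyphens.
    """
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", url)
    if not match:
        raise ValueError(f"no spreadsheet ID found in: {url}")
    return match.group(1)

# Example usage:
# spreadsheet_id("https://docs.google.com/spreadsheets/d/1AbC_def-123/edit#gid=0")
```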

Now we'll enter what we want the table name to be, the sheet name and range (1) in the format "Sheet_name!range", and finally the spreadsheet ID (2) we got from the URL. Click "Apply".

  1. Range and sheet name

  2. Spreadsheet ID
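The sheet name and range combine into a single A1-notation string. As an illustration, here's a helper that builds and sanity-checks that "Sheet_name!range" value; the format is inferred from the example above, and sheet names containing spaces or special characters may need quoting in full A1 notation:

```python
import re

# e.g. "Sheet1!A1:C10" -- a sheet name, '!', then a cell range.
A1_RANGE = re.compile(r"^[^!]+![A-Za-z]+[0-9]+:[A-Za-z]+[0-9]+$")

def sheet_range(sheet_name: str, cell_range: str) -> str:
    """Combine a sheet name and cell range into "Sheet_name!range" form."""
    value = f"{sheet_name}!{cell_range}"
    if not A1_RANGE.match(value):
        raise ValueError(f"'{value}' does not look like Sheet_name!A1:B2 notation")
    return value

# Example usage:
# sheet_range("Sheet1", "A1:C10")
```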

Source Parameters

Next we'll want to copy the contents of our service account JSON file. I opened my file by dragging it into the browser.

We'll enter that into the service account JSON field. Once entered, it will show as "Sensitive value set".

Now we're done with the parameters. We can go back to the process group to get it started.

Run the connector

Let's right click the connector, enable "All controller services", and then start the connector. Since the data is small, it will load into Snowflake in seconds.

And we're done. Once loaded, you will be able to see the data in your database/schema under the table name we set earlier.

Scenarios

Here we will cover some what-if scenarios with the data.

Multiple Spreadsheets / Sheets

Most users will want to ingest multiple spreadsheets, potentially with multiple sheets per spreadsheet. To accomplish this, go back and add another Google Sheets connector to the runtime.

Right click your new connector and click "Configure".

This is where you can rename both connectors to give them unique names that differentiate them.

After we've named them, we'll want to go into our new connector and open its parameters.

We'll select "Inheritance" in the top menu and select the destination and Google source parameters, because these will be the same for all our Google Sheets connectors.

Next we'll go back to "Parameters" and add our next Google spreadsheet. It has two sheets, so we'll add both names with both ranges, separating them with a comma.
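Since the multi-sheet value is just the single-sheet "Sheet!range" format repeated with commas, building and splitting it is trivial. A tiny sketch of that convention (an illustration of the comma-separated format described above, not connector code):

```python
def join_ranges(ranges: list) -> str:
    """Combine several "Sheet!A1:B2" values into one comma-separated parameter."""
    return ",".join(r.strip() for r in ranges)

def split_ranges(value: str) -> list:
    """Split the comma-separated parameter back into individual ranges."""
    return [part.strip() for part in value.split(",") if part.strip()]

# Example usage:
# join_ranges(["Sheet1!A1:C10", "Sheet2!B2:D20"])
```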

Finally, we'll start the second connector by right clicking it and selecting "Start". Then we'll be able to see the result in our Snowflake table.

Empty Cells

In this scenario, the question is what happens when cells in your range are empty.

If the cells you include are blank, they will not show up in the table.

Now we'll add values to the cells and run the pipeline again.

You'll see the pipeline refreshes the whole table and now includes the newly added values.
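This full-refresh behavior lines up with how the Sheets API reports values: trailing empty cells are omitted from each row, so ragged rows have to be padded before they can land as NULLs in a fixed-width table. A sketch of that padding step (an illustration of the behavior, not the connector's actual implementation):

```python
def pad_rows(header: list, rows: list) -> list:
    """Pad short rows with None so every row matches the header width,
    mirroring empty sheet cells becoming NULLs in the destination table."""
    width = len(header)
    return [list(row) + [None] * (width - len(row)) for row in rows]

# Example usage:
# pad_rows(["A", "B", "C"], [["1", "2"], ["4", "5", "6"]])
```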

Schema Change

In this scenario, we'll see what happens when the schema changes.

When we change the cell range to include the empty C column, it will not be included in the table.

Here we fill in the C column but keep the bottom row empty.

We see the column get added, and the empty cell is set to null.