Openflow - Google Sheets

The goal of this tutorial is to load data from a Google Sheet into Snowflake via Openflow.

Video

The video is still in development.

Requirements

  • You can NOT be on a trial account. (Link)
  • Your Snowflake account has to be in an AWS region. (Link)
  • A Google Cloud account; you can set up a free account to get started.

Snowflake

Let's start the Snowflake setup by going into a worksheet (1) and creating the necessary objects for Openflow and the connector.

  1. Worksheet
Run the following if you don't have a database, schema, or warehouse yet.
use role sysadmin;

-- Create a database to store our schemas.
create database if not exists raw;

-- Create the schema. The schema stores all objects.
create schema if not exists raw.google;
create schema if not exists raw.network;

/*
    Warehouses are synonymous with the idea of compute
    resources in other systems. We will use this
    warehouse to query our integration and to load data.
*/
create warehouse if not exists openflow 
    warehouse_size = xsmall
    auto_suspend = 30
    initially_suspended = true;

Only required if you're hosting Openflow in Snowflake (SPCS)

Let's create the network rule and external access integration that will allow Openflow/Snowflake to talk with Google Sheets.

-- create network rule for google apis
create or replace network rule google_api_network_rule
    mode = egress
    type = host_port
    value_list = (
        'admin.googleapis.com',
        'oauth2.googleapis.com',
        'www.googleapis.com',
        'sheets.googleapis.com'
    );

-- Create one external access integration with all network rules.
create or replace external access integration openflow_external_access
    allowed_network_rules = (google_api_network_rule)
    enabled = true;

Statement executed successfully.

Openflow

In this tutorial we assume you already have a deployment. If you don't, it only takes a few minutes; please follow one of these three options.

  • SPCS, also known as Snowflake hosted. (Link)
  • AWS with Snowflake managed VPC (Link)
  • AWS self hosted VPC (link)

Runtime

Inside Openflow, click "Runtimes".

Click "+ Create runtime".

Select your deployment, give your runtime a name, and select accountadmin as the role. Scroll down to external access.

Only required if you're hosting Openflow in Snowflake (SPCS)

Select the external access integration we created in our worksheet and finally click "Create".

Now your runtime will start being created. Let's head to the connectors.

Install Connector

On the connectors list, install the Google Sheets connector.

Select your runtime to install into; you may have to wait for the runtime to finalize.

Once selected, click "Add". The connector will be added and you'll be required to log in to get into the underlying container.

If you get the error 'Invalid consent request' or 'TOTP Invalid'

You will have to change your default role to a role that is not an admin role. For example, a default of public would work.

Click "Allow".

Now you should see the Openflow canvas with the Google Sheets connector block. We will switch over to Google Sheets to get some information needed to set up our connector.

Google Sheets

Note

To set up Google, make sure you have Super Admin permissions; we will need these to create a service account.

Ensure that you are working in the project associated with your organization, not a different project in your organization.

Organization Policy

Search "Organization Policies", then click in the filter box and search "Disable service account key creation". Select the managed constraint.

Click "Manage policy".

Click "Override parent policy", edit the rule, and select "Off" for enforcement. Then click "Done" and set the policy.

Set the policy.

Service account

Search "Service account" and once on the page click "+ Create service account".

Give your service account a name and description, then click "Done".

Now copy the service account email; we will use this later. Once copied, click the service account.

Go to "Keys", then "Add key", and create a new key.

We'll want the JSON key type. This key will be added to Openflow later. Click "Create".

It will download a JSON file.
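Before pasting this key into Openflow later, it can help to sanity-check the file. Below is a small sketch (not part of the connector) that validates a parsed key; the field names `type`, `client_email`, and `private_key` are standard in Google service account key files, and `key.json` is just a placeholder path.

```python
import json

# Fields a Sheets integration typically needs from the key file.
REQUIRED = ("type", "client_email", "private_key")

def check_service_account_key(key: dict) -> str:
    """Validate a parsed service account key and return its client_email."""
    for field in REQUIRED:
        if field not in key:
            raise ValueError(f"key file is missing '{field}'")
    if key["type"] != "service_account":
        raise ValueError(f"expected a service_account key, got '{key['type']}'")
    return key["client_email"]

# Example usage with the downloaded file ("key.json" is a placeholder name):
# with open("key.json") as f:
#     print(check_service_account_key(json.load(f)))
```

The `client_email` value it returns is the address we'll share the sheet with in a later step.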

Enable Sheets API

Next we'll enable the Google Sheets API so that Openflow can talk with it. Search "google sheets api".

Once on the API page, click "Enable".

Once enabled, you will be able to see usage metrics.

Share with service account

Next we'll share the Google Sheet with the service account email (1) we copied earlier. Click the "Share" button.

  1. service account email

Insert the email and click "Send".

Click "Share anyway".

Connector Configuration

Let's head back to Openflow, right click the Google Sheets connector, and then click "Parameters".

Here we will see three sections where we will have to enter our configuration parameters.

Destination Parameters

Let's click the three dots on the right side of the destination parameters.

If you're using an SPCS deployment

As an example, we'll click the three dots and click "Edit". We'll put in the database, schema, role, and warehouse.

One special parameter is "Snowflake Authentication Strategy": with container services you can set it to "SNOWFLAKE_SESSION_TOKEN" and it will not require a key pair.

This is an example input if you used the configurations given at the beginning of the tutorial.

If you're using a BYOC deployment

These are the parameters you'll need to fill out. We will see how to get them below.

To get the Snowflake account identifier, go to the bottom left of the homepage and click "Account details".

Copy your account identifier and paste it into Openflow.

Next, to get your key, we'll have to generate a public/private key pair and apply it to our user. To generate the keys, run this bash script.

openssl genrsa 2048 > rsa_key_pkcs1.pem
openssl pkcs8 -topk8 -inform PEM -in rsa_key_pkcs1.pem -out rsa_key.p8 -nocrypt
openssl rsa -in rsa_key_pkcs1.pem -pubout -out rsa_key.pub

This will generate three files: rsa_key_pkcs1.pem, rsa_key.p8, and rsa_key.pub. We will apply the content of the .pub file (without the BEGIN/END header lines) to our Snowflake user using the alter user command.

-- Template
alter user danielwilczak set rsa_public_key='PUBLIC KEY';
-- Example
alter user danielwilczak set rsa_public_key='MIIBIjANBgkqhki...6VIhVnTviwDXXcm558uMrJQIDAQAB';

Statement executed successfully.
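To confirm the key landed correctly, Snowflake reports a fingerprint in the RSA_PUBLIC_KEY_FP property of `describe user <name>`: the base64-encoded SHA256 of the DER-encoded public key. As a sketch (assuming a standard PEM layout for rsa_key.pub), you can compute the same fingerprint locally with the standard library and compare:

```python
import base64
import hashlib

def key_fingerprint(pem_text: str) -> str:
    """Compute the SHA256:... fingerprint of a public key PEM.

    Strips the BEGIN/END armor lines, base64-decodes the body back
    to DER, then hashes and re-encodes it the way Snowflake does.
    """
    body = "".join(
        line for line in pem_text.splitlines() if "-----" not in line
    )
    der = base64.b64decode(body)
    digest = hashlib.sha256(der).digest()
    return "SHA256:" + base64.b64encode(digest).decode()

# Example usage:
# print(key_fingerprint(open("rsa_key.pub").read()))
```

If the value printed here matches RSA_PUBLIC_KEY_FP from `describe user`, the alter user step worked.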

Now that the public key is on our user, we'll move on to uploading the .pem file to Openflow.

Click the upload button.

Click the upload button again and select your .pem file.

Once uploaded, select the key and click "OK".

Ingestion Parameters

Next, for the "Ingestion Parameters", we'll need the sheet name and the sheet range. Select the range you want, then copy it along with the sheet name.

Next we'll copy the Spreadsheet ID from the URL.
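The Spreadsheet ID is the long token that sits between /d/ and /edit in the sheet's URL. If you'd rather pull it out programmatically, here's a small sketch; the regex assumes the standard docs.google.com URL shape:

```python
import re

def spreadsheet_id(url: str) -> str:
    """Extract the spreadsheet ID from a Google Sheets URL.

    IDs are the path segment after /spreadsheets/d/ and consist of
    letters, digits, underscores, and hyphens.
    """
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", url)
    if not match:
        raise ValueError(f"no spreadsheet ID found in: {url}")
    return match.group(1)

# Example usage:
# spreadsheet_id("https://docs.google.com/spreadsheets/d/1AbC_def-123/edit#gid=0")
```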

Now we'll enter what we want the table name to be, the sheet name and range (1) in the format "Sheet_name!range", and finally the spreadsheet ID (2) we got from the URL. Click "Apply".

  1. Range and sheet name

  2. Spreadsheet ID
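The sheet name and range combine into a single A1-notation string. As an illustration, here's a helper that builds and sanity-checks that "Sheet_name!range" value; the format is inferred from the example above, and sheet names containing spaces or special characters may need quoting in full A1 notation:

```python
import re

# e.g. "Sheet1!A1:C10" -- a sheet name, '!', then a cell range.
A1_RANGE = re.compile(r"^[^!]+![A-Za-z]+[0-9]+:[A-Za-z]+[0-9]+$")

def sheet_range(sheet_name: str, cell_range: str) -> str:
    """Combine a sheet name and cell range into "Sheet_name!range" form."""
    value = f"{sheet_name}!{cell_range}"
    if not A1_RANGE.match(value):
        raise ValueError(f"'{value}' does not look like Sheet_name!A1:B2 notation")
    return value

# Example usage:
# sheet_range("Sheet1", "A1:C10")
```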

Source Parameters

Next we'll want to copy the contents of our service account JSON file. I opened my file by dragging it into the browser.

We'll enter that into the service account JSON field. Once entered, it will show as "Sensitive value set".

Now we're done with the parameters. We can go back to the process group to get it started.

Run the connector

Let's right click the connector, enable "All controller services", and then start the connector. Since the data is small, it will load into Snowflake in seconds.

And we're done. Once loaded, you will be able to see the data in your database/schema under the table name we set earlier.

Scenarios

Here we will cover some what-if scenarios with the data.

Multiple Spreadsheets / Sheets

Most users will want to ingest multiple spreadsheets, potentially with multiple sheets per spreadsheet. To accomplish this, go back and add another Google Sheets connector to the runtime.

Right click your new connector and click "Configure".

This is where you can rename both connectors to give them unique names that differentiate them.

After we've named them, we'll want to go into our new connector and open its parameters.

We'll select "Inheritance" in the top menu and select the destination and Google source parameters, because these will be the same for all our Google Sheets connectors.

Next we'll go back to "Parameters" and add our next Google spreadsheet. It has two sheets, so we'll add both names with both ranges, separating them with a comma.
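Since the multi-sheet value is just the single-sheet "Sheet!range" format repeated with commas, building and splitting it is trivial. A tiny sketch of that convention (an illustration of the comma-separated format described above, not connector code):

```python
def join_ranges(ranges: list) -> str:
    """Combine several "Sheet!A1:B2" values into one comma-separated parameter."""
    return ",".join(r.strip() for r in ranges)

def split_ranges(value: str) -> list:
    """Split the comma-separated parameter back into individual ranges."""
    return [part.strip() for part in value.split(",") if part.strip()]

# Example usage:
# join_ranges(["Sheet1!A1:C10", "Sheet2!B2:D20"])
```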

Finally, we'll start the second connector by right clicking it and selecting "Start". Then we'll be able to see the result in our Snowflake table.

Empty Cells

In this scenario, the question is what happens when cells in your range are empty.

If the cells you include are blank, they will not show up in the table.

Now we'll add values to the cells and run the pipeline again.

You'll see the pipeline refreshes the whole table and now includes the newly added values.
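This full-refresh behavior lines up with how the Sheets API reports values: trailing empty cells are omitted from each row, so ragged rows have to be padded before they can land as NULLs in a fixed-width table. A sketch of that padding step (an illustration of the behavior, not the connector's actual implementation):

```python
def pad_rows(header: list, rows: list) -> list:
    """Pad short rows with None so every row matches the header width,
    mirroring empty sheet cells becoming NULLs in the destination table."""
    width = len(header)
    return [list(row) + [None] * (width - len(row)) for row in rows]

# Example usage:
# pad_rows(["A", "B", "C"], [["1", "2"], ["4", "5", "6"]])
```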

Schema Change

In this scenario, we'll see what happens when the schema changes.

When we change the cell range to include the empty C column, it will not be included in the table.

Here we fill in the C column but keep the bottom row empty.

We see the column get added, and the empty cell is set to null.