Iceberg - Google Cloud Storage
Goal of this tutorial is to setup the an external volume in Snowflake to be able to create and manage iceberg table.
Video
Video still in development.
Requirements
- Snowflake account, you can use a free trial. We also assume no complex security needs.
- Google cloud account, you can setup a free account to get started.
Setup
Warning
Your GCP bucket and Snowflake acount have to be in the same region to be able to create iceberg tables.
Lets start by setting up a Snowflake connection to Google Cloud Storage. After that we'll create and load data into some Iceberg tables.
Google Cloud
Sign into your google account.
Create project
If you don't have a project, start by selecting/creating a project.
In our case we'll call the project danielwilczak
and select the default no orginization
for the locaition.
Create cloud storage
We will select our new project and click cloud storage
to create a storage bucket.
Click create
or create bucket
.
I'm going to name the bucket danielwilczak
as well. Copy this name, we will use it later.
Snowflake
Let's setup snowflake by creating a worksheet in snowflake and add the code below with your bucket name from earlier and hit run.
If you don't have a database, schema or warehouse yet.
property | URL |
---|---|
STORAGE_LOCATION_1 | jtongh...k@prod3-f617.iam.gserviceaccount.com |
Grant Access in Google Cloud
Lets navigate to IAM so that we can give snowflake access to our storage account.
Fill in the role information. We will call it snowflake
. After that click Add Permissions
.
The permissions to select can be found on Snowflake's documentation. In this tutorial I have choosen Data loading and unloading
. I have also provided a gif to show how to select the permissions because the user interface is terrible.
Navigate back to our bucket. Click permissions
, followed by add principle
.
In the new principles section, add your STORAGE_GCP_SERVICE_ACCOUNT given by Snowflake earlier.
Now add your role by clicking select role
-> custom
-> snowflake
. The last one will be your role name.
If you get a 'Domain restricted sharing' error when you click 'Save'.
If you run into this error it's because google cloud has updated thier policy as of March 2024. We'll have to update them. First select your orginization (not your project), then go to IAM in the search, followed by clicking "grant access".
Next we'll add our user email into the new principals area. We'll search and click on "Organization Policy Administrator".
Next we'll want to update the policy. By searching IAM, selecting orgianization policies, searching domain and clicking on "Domain restricted sharing".
Note
"Allow All" is the simple approach but feel free to use more fine grain approach via Snowflake documentation.
We'll want to overide the parent policy with a new rule. Select replace the policy and then select "Allow All". Click done and "Set Polcy." and your good to go.
The policy has been updated and you can retry adding the role to the new principal.
Click Save
and your finished with Google Cloud for manual loading.
Iceberg Table
Lets create a table and add data to it.
Once the table is created we can query data living in the table that really lives in our GCP bucekt.
X | Y |
---|---|
1 | 2 |
3 | 4 |
5 | 6 |
We can also look in our bucket under the folder "iceberg" to see our metadata and data stored in parquet files.