SFTP to Snowflake Stage
In this tutorial we will show how you can takes files from an sftp and load them into a Snowflake stage to later be loaded into a table.
Video
Requirement
You must have a Snowflake account that is not on a trial. This is a requirement of container services, not the tutorial.
Download needed files:
- Notebook (Link)
Setup
Lets start the network setup prcoess in Snowflake.
If you don't have a database, schema or warehouse yet.
use role sysadmin;
-- Create a database to store our schemas.
create database if not exists raw;
-- Create the schema. The schema stores all our objectss.
create schema if not exists raw.sftp;
/*
Warehouses are synonymous with the idea of compute
resources in other systems. We will use this
warehouse to call our user defined function.
*/
create warehouse if not exists development
warehouse_size = xsmall
initially_suspended = true;
use database raw;
use schema sftp;
use warehouse development;
First lets start by setting up the network rules, stage and compute pool in a worksheet to allow our Snowflake Notebook to talk with our external source.
Notebook Creation
Next lets import the example notebook in our database / schema.
Next lets assign it to our database/schema
Enable external access
To enable our notebook to talk outside of Snowflake we'll have to enable the notebook to use that external access we created earlier. We can do this by going to Notebooks settings.
Next clicking "external access" and then checking our external access.
Check your SFTP external access and pypi external access to give the notebook access to talk with the SFTP endpoint and PyPI to download the sftp python package.
Secret Management
Once the example is working I suggest to keep the password stored as a secret instead of plane text via the secret object.
Update SFTP parameters
We'll want to update our sftp python function input parameters with the hostname,username and password. After that put in your file you want to load into a stage. It can be either a single file or a ZIP which will be unzipped and loaded into a folder in the stage.
Finally we'll click "run all" and see our notebook start, it may take up to 5 minutes to start the compute pool, we can see the status in the bottom right.
Result
Once finished we will see that the file we selected is loaded into the stage via the ls
command to the stage
We can also see that the file is in the stage via the UI. The idea here is that we can schedule the notebook to load the files from the stftp and then use a child task to load the file via a copy into coammand on a schedule.