diff --git a/.github/actions/spelling/allow.txt b/.github/actions/spelling/allow.txt
index 944150f..50ba613 100644
--- a/.github/actions/spelling/allow.txt
+++ b/.github/actions/spelling/allow.txt
@@ -783,5 +783,7 @@
 AVRO
 Clob
 INOUT
 WSS
-
+executability
+NTZ
+snowflakecomputing
diff --git a/docs/connectors/supported-data-sources.md b/docs/connectors/supported-data-sources.md
index dc148bd..2ffa1af 100644
--- a/docs/connectors/supported-data-sources.md
+++ b/docs/connectors/supported-data-sources.md
@@ -412,7 +412,16 @@ The beta version of the data sources is in public preview and has passed the bas
 ➖
 2.0.13 and above
-
+
+ Snowflake
+ ✅
+ ➖
+ ➖
+ ✅
+ ➖
+ N/A
+
+
 StarRocks
 ➖
 ➖
diff --git a/docs/connectors/warehouses-and-lake/snowflake.md b/docs/connectors/warehouses-and-lake/snowflake.md
new file mode 100644
index 0000000..66cdf07
--- /dev/null
+++ b/docs/connectors/warehouses-and-lake/snowflake.md
@@ -0,0 +1,137 @@
+# Snowflake
+
+[Snowflake](https://www.snowflake.com/) is a fully managed cloud-native data warehouse that provides elastic, scalable compute and storage capabilities. Tapdata provides a Snowflake connector that supports Snowflake as a **source or target** database. This helps you quickly centralize data from multiple sources to the cloud, providing real-time data flow support for building enterprise cloud data warehouses, sharing data, and enabling agile data analytics.
+
+
+```mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+```
+
+## Supported Data Types
+
+| Category | Data Types |
+| ----- | ---------------------------- |
+| Numeric | NUMBER, FLOAT |
+| String | TEXT |
+| Binary | BINARY, FILE |
+| Boolean | BOOLEAN |
+| Date/Time | DATE, TIME, TIMESTAMP_NTZ, TIMESTAMP_TZ |
+| Complex Types | OBJECT, ARRAY |
+
+## SQL Operations for Sync
+
+INSERT, UPDATE, DELETE
+
+:::tip
+
+- When used as a source database, incremental data synchronization is implemented through field polling, and DDL operations are not captured. For more details, see [Change Data Capture (CDC)](../../introduction/change-data-capture-mechanism.md).
+- When used as a target database, you can also configure DML write strategies through the advanced settings of the task node, such as whether to convert insert conflicts to updates.
+
+:::
+
+## Preparations
+
+1. Ensure that the server where Tapdata is deployed can access the Snowflake service, specifically the domain: `snowflakecomputing.com`.
+
+2. Log in to the Snowflake database and execute the following commands to create an account and role for data synchronization.
+
+   ```sql
+   -- Please replace role_name, username, password, warehouse_name, database_name, schema_name with actual values
+   CREATE ROLE IF NOT EXISTS ;
+
+   CREATE USER
+     PASSWORD = ''
+     DEFAULT_ROLE =
+     DEFAULT_WAREHOUSE =
+     DEFAULT_NAMESPACE = .
+     MUST_CHANGE_PASSWORD = FALSE;
+
+   GRANT ROLE TO USER ;
+   ```
+
+3. Grant permissions to the account you just created. You can also apply more granular permission controls based on business needs.
+
+   ```mdx-code-block
+
+
+   ```
+
+   ```sql
+   -- Please replace warehouse_name, database_name, schema_name, role_name according to the tips below
+
+   -- Grant access to the compute resource, database, and schema
+   GRANT USAGE ON WAREHOUSE TO ROLE ;
+   GRANT USAGE ON DATABASE TO ROLE ;
+   GRANT USAGE ON SCHEMA . TO ROLE ;
+
+   -- Grant query permissions on existing and future tables in the schema
+   GRANT SELECT ON ALL TABLES IN SCHEMA . TO ROLE ;
+   GRANT SELECT ON FUTURE TABLES IN SCHEMA . TO ROLE ;
+   ```
+
+
+
+
+   ```sql
+   -- Please replace warehouse_name, database_name, schema_name, role_name according to the tips below
+   -- Grant access to the compute resource, database, and schema
+   GRANT USAGE ON WAREHOUSE TO ROLE ;
+   GRANT USAGE ON DATABASE TO ROLE ;
+   GRANT USAGE ON SCHEMA . TO ROLE ;
+
+   -- Grant permission to create tables in the schema (used for automatic table creation during sync)
+   GRANT CREATE TABLE ON SCHEMA . TO ROLE ;
+
+   -- Grant DML permissions on existing tables in the schema (TRUNCATE is used for full refresh scenarios)
+   GRANT SELECT, INSERT, UPDATE, DELETE, TRUNCATE
+     ON ALL TABLES IN SCHEMA .
+     TO ROLE ;
+
+   -- Grant DML permissions on future tables to ensure new tables can be written without re-authorization
+   GRANT SELECT, INSERT, UPDATE, DELETE, TRUNCATE
+     ON FUTURE TABLES IN SCHEMA .
+     TO ROLE ;
+   ```
+
+
+
+## Connect to Snowflake
+
+1. Log in to the TapData platform.
+
+2. In the left navigation bar, click **Connections**.
+
+3. On the right side of the page, click **Create**.
+
+4. In the pop-up dialog, search for and select **Snowflake**.
+
+5. On the page that opens, fill in the Snowflake connection information as described below.
+
+   ![Connect to Snowflake](../../images/connect_snowflake.png)
+
+   - **Basic Settings**
+     - **Name**: Enter a meaningful and unique name.
+     - **Type**: Supports using Snowflake as a source or target database.
+     - **Account**: The Snowflake account identifier. For how to obtain it, see the [Snowflake documentation](https://docs.snowflake.com/en/user-guide/admin-account-identifier).
+     - **User**: The Snowflake username with connection privileges.
+     - **Password**: The password for the user.
+     - **Warehouse**: The name of the compute warehouse to use for the connection.
+     - **Database**: The name of the database to connect to.
+     - **Schema**: The schema name in the database. Defaults to **PUBLIC**. Manually modify it if you need to use another schema.
+     - **Role**: Optional. If left empty, the default role configured for the user in Snowflake will be used.
+     - **Timezone**: The default timezone is UTC+0. Changing it affects how fields without timezone information are synchronized.
+
+   - **Advanced Settings**
+     - **Include Tables**: By default, all tables are included. You can choose to customize and specify the tables to include, separated by commas.
+     - **Exclude Tables**: When enabled, you can specify tables to exclude, separated by commas.
+     - **Agent Settings**: The default is automatic assignment by the platform. You can also manually specify an Agent.
+     - **Model Load Time**: If the data source contains fewer than 10,000 models, their schema is refreshed every hour. If the number of models exceeds 10,000, the refresh takes place daily at the time you specify.
+
+6. Click **Test** at the bottom of the page. After passing the test, click **Save**.
+
+   :::tip
+
+   If the connection test fails, please follow the prompts on the page to resolve the issue.
+
+   :::
\ No newline at end of file
diff --git a/docs/images/connect_snowflake.png b/docs/images/connect_snowflake.png
new file mode 100644
index 0000000..9d9899b
Binary files /dev/null and b/docs/images/connect_snowflake.png differ
diff --git a/sidebars.js b/sidebars.js
index 5893cd0..d5d7c58 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -75,6 +75,7 @@ const sidebars = {
       'connectors/warehouses-and-lake/hudi',
       'connectors/warehouses-and-lake/paimon',
       'connectors/warehouses-and-lake/selectdb',
+      'connectors/warehouses-and-lake/snowflake',
       'connectors/warehouses-and-lake/starrocks',
       'connectors/warehouses-and-lake/tablestore',
       'connectors/warehouses-and-lake/yashandb',