Add dbt-cloud integration command to dp cli #99

rdziadosz wants to merge 10 commits into getindata:develop
Conversation
We will also need to update the documentation in the docs dir in this repo.

I don't see any documentation. I have no idea what dbtcloud.yml should look like.
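For context, a minimal sketch of what dbtcloud.yml could contain, inferred from the keys this PR reads (project_name, schedule_interval, and a list of environments); the exact schema is an assumption here, not the documented format:

```python
# Hypothetical dbtcloud.yml contents, loaded with PyYAML purely for illustration.
# Key names follow the ones read in this PR; values are made up.
import yaml

EXAMPLE_DBTCLOUD_YML = """
project_name: my-dbt-project
schedule_interval: "0 6 * * *"
environments:
  - name: dev
    type: development
    dataset: analytics_dev
    dbt_version: "1.3.1"
    bq_config_dir: dev
  - name: prod
    type: deployment
    dataset: analytics
    dbt_version: "1.3.1"
    bq_config_dir: prod
"""

dbtcloud_config = yaml.safe_load(EXAMPLE_DBTCLOUD_YML)
print(dbtcloud_config["environments"][1]["type"])  # deployment
```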
| credentials_id = client.create_credentials(environment["dataset"], project_id) | ||
| else: | ||
| credentials_id = None | ||
| environment_id = client.create_environment(project_id, environment["type"], environment["name"], |
I would extract the if block and the environment creation into a separate create_environment method.
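A rough sketch of the suggested extraction; the client method names follow the diff, while the credentials condition and the exact arguments of client.create_environment are assumptions:

```python
def create_environment(client, environment, project_id):
    """Create a single dbt Cloud environment (sketch; signatures assumed).

    Deployment environments get credentials created first; development
    environments are assumed not to need them.
    """
    if environment["type"] == "deployment":
        credentials_id = client.create_credentials(environment["dataset"], project_id)
    else:
        credentials_id = None
    return client.create_environment(
        project_id,
        environment["type"],
        environment["name"],
        credentials_id,
    )
```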
docs/configuration.rst
Outdated
| * - schedule_interval | ||
| - string | ||
| - The cron expression with which the example job will be run | ||
| * - default_gcp_project |
I think there shouldn't be a default one; it should always be taken from bigquery.yml.
| * - dataset | ||
| - string | ||
| - Target dataset for this environment | ||
| * - dbt_version |
Why is it per environment? Can we make it global?
In dbt Cloud this is set for each environment: https://docs.getdbt.com/docs/collaborate/environments/dbt-cloud-environments#common-environment-settings. We could make the setting the same for all environments, but then we would be limited to one version. I assume that someone might want to test the code on a separate environment, e.g. before upgrading dbt for the whole project. Are you sure I should make such a change?
docs/configuration.rst
Outdated
| * - dbt_version | ||
| - string | ||
| - The dbt version used in this environment | ||
| * - bq_config_dir |
I would either remove the "bq_" prefix or use the env name, e.g. dev/prod.
docs/configuration.rst
Outdated
| * - name | ||
| - string | ||
| - Name of the environment that will be created in dbt Cloud | ||
| * - dataset |
The dataset should also be taken from bigquery.yml.
| - Array | ||
| - Details of the environments to be created in dbt Cloud | ||
|
|
||
| Configuration of the environments: |
| bq_config = read_bigquery_config(environment["bq_config_dir"]) | ||
| environments_projects[environment["name"]] = bq_config["project"] | ||
|
|
||
| client.create_environment_variable(project_id, dbtcloud_config["default_gcp_project"], |
You don't need default_gcp_project. This default is in the "base" config dir and you already read it in the read_bigquery_config method.
Okay, I changed it to be taken from the "base" config. I assume that the only use of this value would be if someone added another environment and did not update this environment variable.
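A sketch of what that change could look like; read_bigquery_config and create_environment_variable are the helpers visible in the diff, but their exact signatures and return values are assumptions:

```python
def set_gcp_project_env_var(client, project_id, environments_projects):
    """Use the project from the "base" bigquery.yml as the default value of the
    GCP project environment variable (sketch; helper signatures assumed)."""
    base_bq_config = read_bigquery_config("base")  # assumed to return a dict with a "project" key
    client.create_environment_variable(
        project_id,
        base_bq_config["project"],   # default taken from the base config instead of dbtcloud.yml
        environments_projects,       # per-environment project overrides (assumed argument)
    )
```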
| dbtcloud_config = read_dbtcloud_config() | ||
| file = open(keyfile) | ||
| keyfile_data = json.load(file) | ||
| project_id = client.create_project(dbtcloud_config["project_name"]) |
I would change the name of the variable to dbtcloud_project_id.
| for environment in dbtcloud_config["environments"]: | ||
| environment_id = create_environment(client, environment, project_id) | ||
| if environment["type"] == "deployment": | ||
| client.create_job(project_id, environment_id, dbtcloud_config["schedule_interval"], |
The schedule interval could be per environment, I think.
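If the interval becomes per-environment, the job-creation loop could fall back to the project-wide value; a sketch under the assumption that an optional schedule_interval key is allowed on each environment entry:

```python
for environment in dbtcloud_config["environments"]:
    environment_id = create_environment(client, environment, project_id)
    if environment["type"] == "deployment":
        # Prefer a per-environment interval, fall back to the project-wide one.
        schedule_interval = environment.get(
            "schedule_interval", dbtcloud_config["schedule_interval"]
        )
        client.create_job(
            project_id,
            environment_id,
            schedule_interval,
            "Job - " + environment["name"],
        )
```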
| new_env = { | ||
| "env_var": env_var | ||
| } | ||
| print(new_env) |
Please remove the print or replace it with logging.
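A minimal example of the logging replacement:

```python
import logging

logger = logging.getLogger(__name__)

def build_env_var_payload(env_var):
    new_env = {"env_var": env_var}
    # Instead of print(new_env):
    logger.debug("Environment variable payload: %s", new_env)
    return new_env
```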
| client.create_job(project_id, environment_id, dbtcloud_config["schedule_interval"], | ||
| "Job - " + environment["name"]) | ||
| bq_config = read_bigquery_config(environment["bq_config_dir"]) | ||
| environments_projects[environment["name"]] = bq_config["project"] |
Does it resolve the project properly? In bigquery.yml we currently have project: "{{ env_var('GCP_PROJECT') }}". It should be taken from the environment during deployment, shouldn't it?
I added resolving env vars using the dbt show command.
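For illustration only, one way such a placeholder can be resolved outside of dbt is to substitute env_var() references at read time; this is not necessarily what the PR ended up doing (a later commit removes the Jinja/env-var resolving):

```python
import os
import re

def resolve_env_vars(value: str) -> str:
    """Substitute {{ env_var('NAME') }} placeholders with values from the
    process environment. A simplified stand-in for dbt's own rendering."""
    pattern = re.compile(r"\{\{\s*env_var\('([^']+)'\)\s*\}\}")
    return pattern.sub(lambda match: os.environ.get(match.group(1), ""), value)

os.environ["GCP_PROJECT"] = "my-gcp-project"
print(resolve_env_vars("{{ env_var('GCP_PROJECT') }}"))  # my-gcp-project
```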
# Environment variable
# Code formatting
# Documentation
# Remove resolving Jinja / env vars
# Code formatting
Adds a configure-cloud command to the dp CLI that creates a dbt Cloud project with a configured connection to BigQuery.
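For context, a hypothetical sketch of how such a command could be exposed with click; this is illustration only, not the PR's actual wiring, and the --keyfile option is an assumption based on the keyfile argument visible in the diff:

```python
import click

@click.command(name="configure-cloud")
@click.option("--keyfile", type=click.Path(exists=True), required=True,
              help="Path to the BigQuery service account keyfile (assumed option).")
def configure_cloud_command(keyfile: str) -> None:
    """Create a dbt Cloud project with a configured BigQuery connection."""
    click.echo(f"Configuring dbt Cloud using keyfile: {keyfile}")
    # ...create the project, connection, environments and jobs here...

if __name__ == "__main__":
    configure_cloud_command()
```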