Add dbt clone operator #1326

Open
wants to merge 3 commits into
base: main
@pankajastro pankajastro commented Nov 15, 2024

Description

This PR introduces the DbtCloneOperator. For more details, refer to the dbt documentation: https://docs.getdbt.com/reference/commands/clone.

Testing

Airflow DAG

from datetime import datetime

from airflow import DAG
from cosmos import DbtSeedLocalOperator, DbtRunLocalOperator, DbtCloneLocalOperator, ProfileConfig

DBT_PROJ_DIR="/usr/local/airflow/dbt/jaffle_shop"

profile_config1 = ProfileConfig(
    profile_name="bigquery_dev",
    target_name="dev",
    profiles_yml_filepath="/usr/local/airflow/dbt/jaffle_shop/profiles.yml",
)

profile_config2 = ProfileConfig(
    profile_name="bigquery_clone",
    target_name="dev",
    profiles_yml_filepath="/usr/local/airflow/dbt/jaffle_shop/profiles.yml",
)


with DAG("test-id-1", start_date=datetime(2024, 1, 1), catchup=False) as dag:
    seed_operator = DbtSeedLocalOperator(
        profile_config=profile_config1,
        project_dir=DBT_PROJ_DIR,
        task_id="seed",
        dbt_cmd_flags=["--select", "raw_customers"],
        install_deps=True,
        append_env=True,
    )
    run_operator = DbtRunLocalOperator(
        profile_config=profile_config1,
        project_dir=DBT_PROJ_DIR,
        task_id="run",
        dbt_cmd_flags=["--models", "stg_customers"],
        install_deps=True,
        append_env=True,
    )

    clone_operator = DbtCloneLocalOperator(
        profile_config=profile_config2,
        project_dir=DBT_PROJ_DIR,
        task_id="clone",
        dbt_cmd_flags=["--models", "stg_customers", "--state", "/usr/local/airflow/dbt/jaffle_shop/target"],
        install_deps=True,
        append_env=True,
    )

    seed_operator >> run_operator >> clone_operator
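
For readers less familiar with dbt clone: the `--state` flag in the clone task points at a directory of artifacts from an earlier dbt invocation, and dbt expects to find a `manifest.json` there. The sketch below is only an illustration of that dependency, not how Cosmos builds its command. The helper names (`build_clone_command`, `state_dir_from_flags`, `state_has_manifest`) are hypothetical, and the assumption that the flags are appended directly after `dbt clone` is a simplification.

```python
from pathlib import Path
from typing import List, Optional


def build_clone_command(dbt_cmd_flags: List[str]) -> List[str]:
    """Hypothetical helper: the dbt CLI call the clone task roughly corresponds to."""
    return ["dbt", "clone", *dbt_cmd_flags]


def state_dir_from_flags(dbt_cmd_flags: List[str]) -> Optional[str]:
    """Return the value that follows ``--state``, or None if the flag is absent."""
    if "--state" not in dbt_cmd_flags:
        return None
    position = dbt_cmd_flags.index("--state")
    # Guard against a trailing ``--state`` with no value after it.
    if position + 1 >= len(dbt_cmd_flags):
        return None
    return dbt_cmd_flags[position + 1]


def state_has_manifest(state_dir: str) -> bool:
    """Check that the state directory holds the manifest dbt compares against."""
    return (Path(state_dir) / "manifest.json").is_file()


if __name__ == "__main__":
    flags = ["--models", "stg_customers", "--state", "/usr/local/airflow/dbt/jaffle_shop/target"]
    print(build_clone_command(flags))
    print(state_dir_from_flags(flags))
```

In the DAG above, the state directory is the project's `target` folder, so the clone task depends on an earlier task having written a manifest there.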

DBT Profile

bigquery_dev:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: astronomer-dag-authoring
      dataset: bq_dev
      threads: 4 # Must be a value of 1 or greater
      keyfile: /usr/local/airflow/include/key.json
      location: US


bigquery_clone:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: astronomer-dag-authoring
      dataset: bq_clone
      threads: 4 # Must be a value of 1 or greater
      keyfile: /usr/local/airflow/include/key.json
      location: US

Airflow DAG Run

Screenshot 2024-11-15 at 6 06 50 PM

BigQuery data warehouse

Screenshot 2024-11-15 at 6 04 29 PM

Related Issue(s)

closes: #1268
closes: #878

Breaking Change?

No

Limitation

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

cloudflare-workers-and-pages bot commented Nov 15, 2024

Deploying astronomer-cosmos with Cloudflare Pages

Latest commit: 6db1c27
Status: ✅  Deploy successful!
Preview URL: https://c7235df6.astronomer-cosmos.pages.dev
Branch Preview URL: https://dbt-clone.astronomer-cosmos.pages.dev

codecov bot commented Nov 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.88%. Comparing base (8ec46d2) to head (2ecf1e1).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1326      +/-   ##
==========================================
+ Coverage   95.85%   95.88%   +0.02%     
==========================================
  Files          67       67              
  Lines        3983     4009      +26     
==========================================
+ Hits         3818     3844      +26     
  Misses        165      165              


@pankajastro pankajastro marked this pull request as ready for review November 15, 2024 13:36
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 15, 2024
@dosubot dosubot bot added area:profile Related to ProfileConfig, like Athena, BigQuery, Clickhouse, Spark, Trino, etc execution:local Related to Local execution environment profile:bigquery Related to BigQuery ProfileConfig labels Nov 15, 2024
Comment on lines +166 to +172
class DbtCloneAwsEksOperator(DbtAwsEksBaseOperator, DbtCloneKubernetesOperator):
    """
    Executes a dbt core clone command.
    """

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)

What value is the constructor giving us? Would it make sense to do something like this:

Suggested change
-class DbtCloneAwsEksOperator(DbtAwsEksBaseOperator, DbtCloneKubernetesOperator):
-    """
-    Executes a dbt core clone command.
-    """
-    def __init__(self, *args: Any, **kwargs: Any) -> None:
-        super().__init__(*args, **kwargs)
+class DbtCloneAwsEksOperator(DbtAwsEksBaseOperator, DbtCloneKubernetesOperator):
+    """
+    Executes a dbt core clone command.
+    """
+    pass
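
To illustrate the point behind this suggestion, here is a minimal, self-contained sketch showing that a subclass whose `__init__` only forwards to `super().__init__` behaves the same as one with no `__init__` at all. The classes below (`KubernetesBase`, `EksBase`, `CloneKubernetes`, and the two `CloneEks*` variants) are hypothetical stand-ins and not the real Cosmos operators; the `cluster_name` parameter is likewise an assumption made for the example.

```python
from typing import Any


class KubernetesBase:
    """Hypothetical stand-in for a shared Kubernetes base operator."""

    def __init__(self, **kwargs: Any) -> None:
        self.init_kwargs = kwargs


class EksBase(KubernetesBase):
    """Hypothetical stand-in for an EKS base operator with one extra argument."""

    def __init__(self, cluster_name: str = "default", **kwargs: Any) -> None:
        self.cluster_name = cluster_name
        super().__init__(**kwargs)


class CloneKubernetes(KubernetesBase):
    """Hypothetical stand-in for a clone operator on Kubernetes."""

    base_cmd = ["clone"]


class CloneEksWithInit(EksBase, CloneKubernetes):
    """Variant with a pass-through constructor, as in the original diff."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)


class CloneEksWithoutInit(EksBase, CloneKubernetes):
    """Variant with no constructor, as in the suggested change."""


if __name__ == "__main__":
    with_init = CloneEksWithInit(cluster_name="demo", task_id="clone")
    without_init = CloneEksWithoutInit(cluster_name="demo", task_id="clone")
    # Both variants end up with the same attributes after construction.
    print(vars(with_init) == vars(without_init))
    # The variant without a constructor resolves __init__ through the MRO.
    print(CloneEksWithoutInit.__init__ is EksBase.__init__)
```

A pass-through constructor can still be worth keeping when it narrows type hints or documents parameters, but neither applies to the form shown in the diff.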

    Executes a dbt core clone command.
    """

    def __init__(self, *args: Any, **kwargs: Any):

The same feedback given on https://github.com/astronomer/astronomer-cosmos/pull/1326/files#r1843904192 applies here and in the other DbtClone operators

@tatiana tatiana left a comment

@pankajastro it's great to have you back!

Would it make sense to add the example DAG you used to test as one of Cosmos' integration DAGs?

Could you please add documentation about this feature? I believe we may be lacking docs on how people can use the DbtBuild operators, so we could cover both in the same place.

Last but not least, it may be worth having a test for full refresh. Some Cosmos operators support full refresh and others don't. Clone seems to support it, so it would be great to make sure its interface is consistent with the operators that run models and seeds.
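
One way to picture the consistency test requested here is sketched below. It is not the Cosmos test suite and does not import Cosmos: `FullRefreshMixin`, `RunOp`, `SeedOp`, `CloneOp`, and `check_full_refresh_consistency` are hypothetical stand-ins, and the assumption is that each operator exposes a `full_refresh` argument that maps to dbt's `--full-refresh` flag.

```python
from typing import Any, Iterable, List, Type


class FullRefreshMixin:
    """Hypothetical stand-in for an operator mixin that exposes ``full_refresh``."""

    base_cmd: List[str] = []

    def __init__(self, full_refresh: bool = False, **kwargs: Any) -> None:
        self.full_refresh = full_refresh

    def add_cmd_flags(self) -> List[str]:
        # Emit the dbt flag only when full refresh was requested.
        return ["--full-refresh"] if self.full_refresh else []


class RunOp(FullRefreshMixin):
    base_cmd = ["run"]


class SeedOp(FullRefreshMixin):
    base_cmd = ["seed"]


class CloneOp(FullRefreshMixin):
    base_cmd = ["clone"]


def check_full_refresh_consistency(operator_classes: Iterable[Type[FullRefreshMixin]]) -> None:
    """Assert that every operator maps ``full_refresh`` to flags the same way."""
    for operator_class in operator_classes:
        assert operator_class(full_refresh=True).add_cmd_flags() == ["--full-refresh"]
        assert operator_class().add_cmd_flags() == []


if __name__ == "__main__":
    check_full_refresh_consistency([RunOp, SeedOp, CloneOp])
    print("full_refresh handled consistently")
```

In a real test the stand-in classes would be replaced by the actual run, seed, and clone operators, and the loop could become a parametrized test case.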

Development

Successfully merging this pull request may close these issues.

Support dbt clone [feature] support for dbt clone or dbt build ? DbtCloneOperator
2 participants