Releases: astronomer/astro-sdk
1.1.0b2
Features
- Add native autodetect schema feature (#780)
- Allow users to disable auto addition of inlets/outlets via airflow.cfg (#858)
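A minimal `airflow.cfg` sketch of the new toggle. The section and option names below are assumptions for illustration only; check the SDK configuration reference for the exact names:

```ini
# Hypothetical example -- the option name is an assumption, not confirmed by these notes
[astro_sdk]
auto_add_inlets_outlets = False
```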
Improvements
- Avoid loading whole file into memory with load_operator for schema detection (#805)
- Directly pass the file to native library when native support is enabled (#802)
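The schema-detection improvement above can be pictured with a small stdlib sketch that samples only the first rows of a file instead of reading it whole. `sample_rows` is a hypothetical helper for illustration, not the SDK's implementation:

```python
import csv
import io

def sample_rows(fileobj, n=100):
    """Read the header and at most n data rows for schema inference,
    without loading the whole file into memory."""
    reader = csv.reader(fileobj)
    header = next(reader)
    rows = [row for _, row in zip(range(n), reader)]
    return header, rows

data = io.StringIO("title,rating\nUp,8.3\nCoco,8.4\n")
header, rows = sample_rows(data, n=1)
```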
Bug fixes
- Add a compat module for typing the `execute` `context` in operators (#770)
- Fix SQL injection issues (#807)
- Stop generating Datasets for temp tables (#862)(#871)
Docs
1.1.0b1
Features
- Support for Datasets introduced in Airflow 2.4 (#786, #808)
- `inlets` and `outlets` will be automatically set for all the operators.
- Users can now schedule DAGs on `File` and `Table` objects. Example:

```python
input_file = File(path="https://raw.githubusercontent.com/astronomer/astro-sdk/main/tests/data/imdb_v2.csv")
imdb_movies_table = Table(name="imdb_movies", conn_id="sqlite_default")
top_animations_table = Table(name="top_animation", conn_id="sqlite_default")
START_DATE = datetime(2022, 9, 1)


@aql.transform()
def get_top_five_animations(input_table: Table):
    return """
        SELECT title, rating
        FROM {{input_table}}
        WHERE genre1='Animation'
        ORDER BY rating desc
        LIMIT 5;
    """


with DAG(
    dag_id="example_dataset_producer",
    schedule=None,
    start_date=START_DATE,
    catchup=False,
) as load_dag:
    imdb_movies = aql.load_file(
        input_file=input_file,
        task_id="load_csv",
        output_table=imdb_movies_table,
    )

with DAG(
    dag_id="example_dataset_consumer",
    schedule=[imdb_movies_table],
    start_date=START_DATE,
    catchup=False,
) as transform_dag:
    top_five_animations = get_top_five_animations(
        input_table=imdb_movies_table,
        output_table=top_animations_table,
    )
```

- Dynamic Task Templates: Tasks that can be used with Dynamic Task Mapping (Airflow 2.3+)
- Create `upstream_tasks` parameter for dependencies independent of data transfers (#585)
Bug fixes
- Add `response_size` to `run_raw_sql` and warn about DB thrashing (#815)
Docs
1.0.2
1.0.1
Bug fixes
- Added a check to create the table only when `if_exists` is `replace` in `aql.load_file` for Snowflake. #729
- Fix the file type for NDJSON files in the data transfer job from AWS S3 to Google BigQuery. #724
- Create a new version of imdb.csv with lowercase column names and update the examples to use it, so this change is backwards-compatible. #721, #727
- Skip folders while processing paths in the `load_file` operator when a file pattern is passed. #733
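The folder-skipping fix above can be illustrated with a small stdlib sketch; `matching_files` is a hypothetical helper (folders are marked by a trailing slash here), not the SDK's code:

```python
import fnmatch
from pathlib import PurePosixPath

def matching_files(paths, pattern):
    """Keep only file paths whose basename matches the pattern,
    skipping folder entries (marked by a trailing slash)."""
    return [
        p for p in paths
        if not p.endswith("/") and fnmatch.fnmatch(PurePosixPath(p).name, pattern)
    ]

paths = ["data/", "data/a.csv", "data/b.json", "data/sub/"]
selected = matching_files(paths, "*.csv")
```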
Docs
- Updated the benchmark docs for the GCS to Snowflake and S3 to Snowflake transfers of `aql.load_file`. #712, #707
- Restructured the documentation in `project.toml`, the quickstart, readthedocs and README.md. #698, #704, #706
- Make astro-sdk-python compatible with the major version of Google Providers. #703
Misc
- Consolidate the documentation requirements for sphinx. #699
- Add CI/CD triggers on release branches with dependency on tests. #672
cc: @kaxil @tatiana @dimberman @utkarsharma2 @sunank200 @pankajastro @pankajkoti @vikramkoka
1.0.0
Summary
Features
- Improved the performance of `aql.load_file` by supporting database-specific (native) load methods. This is now the default behaviour. Previously, the Astro SDK Python would always use Pandas to load files to SQL databases, which passed the data through the worker node and slowed performance. (#557, #481)

  Introduced new arguments to `aql.load_file`:
  - `use_native_support`: use native data transfer if available on the destination (defaults to `use_native_support=True`)
  - `native_support_kwargs`: keyword arguments to be used by the method involved in the native support flow.
  - `enable_native_fallback`: can be used to fall back to the default transfer (defaults to `enable_native_fallback=True`).

  There are now three modes:
  - Native: the default; uses a BigQuery Load Job in the case of BigQuery, and Snowflake `COPY INTO` using an external stage in the case of Snowflake.
  - Pandas: how datasets were previously loaded. To enable this mode, use the argument `use_native_support=False` in `aql.load_file`.
  - Hybrid: attempts to use the native strategy to load a file to the database and, if the native strategy fails, falls back to Pandas with relevant log warnings. #557
- Allow users to specify the table schema (column types) into which a file is being loaded by using `table.columns`. If this table attribute is not set, the Astro SDK still tries to infer the schema by using Pandas (the previous behaviour). #532
- Add an example DAG for Dynamic Task Mapping with the Astro SDK. #377, airflow-2.3.0
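The three load modes and the fallback behaviour can be sketched in plain Python. `native_load` and `pandas_load` below are hypothetical stand-ins, not SDK functions:

```python
import logging

def native_load(path):
    # Stand-in for a database-specific load (e.g. a BigQuery Load Job).
    raise RuntimeError("native path unavailable in this sketch")

def pandas_load(path):
    # Stand-in for the previous Pandas-based load.
    return f"loaded {path} via pandas"

def load_file(path, use_native_support=True, enable_native_fallback=True):
    if not use_native_support:
        return pandas_load(path)          # "Pandas" mode
    try:
        return native_load(path)          # "Native" mode
    except Exception:
        if not enable_native_fallback:
            raise
        logging.warning("native load failed, falling back to Pandas")
        return pandas_load(path)          # "Hybrid" fallback

result = load_file("s3://bucket/data.csv")
```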
Breaking Change
- The `aql.dataframe` argument `identifiers_as_lower` (which was `boolean`, with the default set to `False`) was replaced by the argument `columns_names_capitalization` (`string`, with possible values `["upper", "lower", "original"]`; the default is `"lower"`). #564
- `aql.load_file` previously changed the capitalization of all column titles to uppercase by default; it now makes them lowercase by default. The old behaviour can be achieved by using the argument `columns_names_capitalization="upper"`. #564
- `aql.load_file` attempts to load files to BigQuery and Snowflake by using native methods, which may have prerequisites to work. To disable this mode, use the argument `use_native_support=False` in `aql.load_file`. #557, #481
- `aql.dataframe` will raise an exception if the default Airflow XCom backend is being used. To solve this, either use an external XCom backend, such as S3 or GCS, or set the configuration `AIRFLOW__ASTRO_SDK__DATAFRAME_ALLOW_UNSAFE_STORAGE=True`. #444
- Changed the declaration for the default Astro SDK temporary schema from `AIRFLOW__ASTRO__SQL_SCHEMA` to `AIRFLOW__ASTRO_SDK__SQL_SCHEMA`. #503
- Renamed `aql.truncate` to `aql.drop_table`. #554
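The `columns_names_capitalization` behaviour can be pictured with a small pure-Python sketch; `apply_capitalization` is a hypothetical helper, not the SDK's implementation:

```python
def apply_capitalization(columns, mode="lower"):
    """Mimic the new default: lowercase column names unless told otherwise."""
    if mode == "lower":
        return [c.lower() for c in columns]
    if mode == "upper":
        return [c.upper() for c in columns]
    return list(columns)  # "original": leave names untouched

cols = ["Title", "Rating"]
```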
Bug fixes
- Add missing Airflow task terminal states to `CleanupOperator`. #525
- Allow chaining `aql.drop_table` (previously `truncate`) tasks using the Task Flow API syntax. #554, #515
Enhancements
- Improved the performance of `aql.load_file` for the file transfers detailed in the 1.0.0b1 notes below.
- Get configurations via the Airflow Configuration manager. #503
- Changed catching `ValueError` and `AttributeError` to `DatabaseCustomError`. #595
- Unpin the pandas upper-bound dependency. #620
- Remove markupsafe from dependencies. #623
- Added `extend_existing` to the SQLAlchemy `Table` object. #626
- Move the config to store dataframes in XCom to the settings file. #537
- Make the operator names consistent. #634
- Use `exc_info` for exception logging. #643
- Update the query for getting the BigQuery table schema. #661
- Use lazily evaluated type annotations from PEP 563. #650
- Provide the Google Cloud credentials env var for BigQuery. #679
- Handle breaking changes for Snowflake provider versions 3.2.0 and 3.1.0. #686
What's Changed (Full Changelog)
- Get configs via Airflow Configuration manager by @utkarsharma2 in #505
- Load files from GCS to Bigquery using BigqueryHook by @utkarsharma2 in #489
- Fix benchmark permissions by @tatiana in #513
- Fix benchmark to work without table metadata by @tatiana in #514
- Add performance result to load_file to snowflake using Python SDK 0.11.0 by @sunank200 in #480
- Fix docs sidebar by @dimberman in #517
- Benchmark postgres by @dimberman in #510
- Rename the `BaseSQLOperator` class by @dimberman in #518
- Read default config if airflow's isn't defined by @utkarsharma2 in #520
- Add Snowflake stage methods by @tatiana in #523
- Fix broken link to tutorial in README by @jlaneve in #526
- Simplify debugging issues when building docs by @tatiana in #527
- create a possible solution to users passing large dataframes between … by @dimberman in #522
- Use native path for S3 to Bigquery in load_file operator by @utkarsharma2 in #519
- Update links in Contribution Guidelines section in README by @josh-fell in #536
- Add CI job to check for dead links by @kaxil in #528
- Adjust storage integration so it is consistent for AWS and GCP by @tatiana in #539
- improve docs to @kaxil and @tatiana's comments by @dimberman in #521
- Refactor table creation in load_file by @tatiana in #538
- Add missing task terminal states to CleanUp Operator by @utkarsharma2 in #540
- Optimize postgres performance by @dimberman in #531
- Benchmark reporting to expose the information from GCS to Markdown by @sunank200 in #547
- Refactor db.load_file_to_table, make json config optional by @tatiana in #549
- Allow running tests on PRs from forks + label by @kaxil in #546
- Add native path from local to bigquery by @utkarsharma2 in #535
- Fix MyPY issue of 'path' and 'conn_id' property of Class File by @utkarsharma2 in #545
- Improve (Sphinx) gitignore by @tatiana in #548
- Handle nrows for export_to_dataframe() by @utkarsharma2 in #559
- Add benchmarking result from GCS to Bigquery after optimization by @sunank200 in #563
- Optimize Snowflake load_file using native COPY INTO by @tatiana in #544
- Fix DistutilsOptionError #570 by @tatiana in #571
- Add benchmarking results for S3 to Bigquery transfer by @utkarsharma2 in #568
- ...
0.11.1
1.0.0b1
Feature:
- Improved the performance of `aql.load_file` by supporting database-specific (native) load methods. This is now the default behaviour. Previously, the Astro SDK Python would always use Pandas to load files to SQL databases, which passed the data through the worker node and slowed performance. #557, #481

  Introduced new arguments to `aql.load_file`:
  - `use_native_support`: use native data transfer if available on the destination (defaults to `use_native_support=True`)
  - `native_support_kwargs`: keyword arguments to be used by the method involved in the native support flow.
  - `enable_native_fallback`: can be used to fall back to the default transfer (defaults to `enable_native_fallback=True`).

  There are now three modes:
  - Native: the default; uses a BigQuery Load Job in the case of BigQuery, and Snowflake `COPY INTO` using an external stage in the case of Snowflake.
  - Pandas: how datasets were previously loaded. To enable this mode, use the argument `use_native_support=False` in `aql.load_file`.
  - Hybrid: attempts to use the native strategy to load a file to the database and, if the native strategy fails, falls back to Pandas with relevant log warnings.
- Allow users to specify the table schema (column types) into which a file is being loaded by using `table.columns`. If this table attribute is not set, the Astro SDK still tries to infer the schema by using Pandas (the previous behaviour). #532
- Implement a fallback mechanism in case native support fails, defaulting to the standard transfer with a log warning about the native-support problem. #557
- Add an example DAG for Dynamic Task Mapping with the Astro SDK. #377, airflow-2.3.0
Community:
- Allow running tests on PRs from forks + label #179
Breaking Change:
- The `aql.dataframe` argument `identifiers_as_lower` (which was `boolean`, with the default set to `False`) was replaced by the argument `columns_names_capitalization` (`string`, with possible values `["upper", "lower", "original"]`; the default is `"lower"`). #564
- `aql.load_file` previously changed the capitalization of all column titles to uppercase by default; it now makes them lowercase by default. The old behaviour can be achieved by using the argument `columns_names_capitalization="upper"`. #564
- `aql.load_file` attempts to load files to BigQuery and Snowflake by using native methods, which may have prerequisites to work. To disable this mode, use the argument `use_native_support=False` in `aql.load_file`. #557, #481
- `aql.dataframe` will raise an exception if the default Airflow XCom backend is being used. To solve this, either use an external XCom backend, such as S3 or GCS, or set the configuration `AIRFLOW__ASTRO_SDK__DATAFRAME_ALLOW_UNSAFE_STORAGE=True`. #444
- Changed the declaration for the default Astro SDK temporary schema from `AIRFLOW__ASTRO__SQL_SCHEMA` to `AIRFLOW__ASTRO_SDK__SQL_SCHEMA`. #503
- Renamed `aql.truncate` to `aql.drop_table`. #554
Bug fix:
- Add missing Airflow task terminal states to `CleanupOperator`. #525
- Allow chaining `aql.drop_table` (previously `truncate`) tasks using the Task Flow API syntax. #554, #515
Enhancement:
- Improved the performance of `aql.load_file` for files from AWS S3 to Google BigQuery by up to 94%. #429, #568
- Improved the performance of `aql.load_file` for files from Google Cloud Storage to Google BigQuery by up to 93%. #429, #562
- Improved the performance of `aql.load_file` for files from AWS S3/Google Cloud Storage to Snowflake by up to 76%. #430, #544
- Improved the performance of `aql.load_file` for files from GCS to Postgres in K8s by up to 93%. #428, #531
- Fix the Sphinx docs sidebar. #472
- Get configurations via the Airflow Configuration manager. #503
- Add a CI job to check for dead links. #526
@tatiana @kaxil @dimberman @utkarsharma2 @sunank200 @pankajastro @jlaneve @guohui-gao @mikeshwe @vikramkoka
0.11.0
Feature:
Internals:
Enhancement:
0.10.0
Feature:
Breaking Change:
- The `aql.merge` interface changed: the argument `merge_table` changed to `target_table`; `target_columns` and `merge_column` were combined into the `columns` argument; `merge_keys` changed to `target_conflict_columns`; and `conflict_strategy` changed to `if_conflicts`. More details can be found at #422, #466.
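A before/after sketch of the renamed arguments. The argument names come from the note above; the table name and values are illustrative only:

```diff
 aql.merge(
-    merge_table=target_table,
-    target_columns=["id"],
-    merge_column=["id"],
-    merge_keys=["id"],
-    conflict_strategy="ignore",
+    target_table=target_table,
+    columns=["id"],
+    target_conflict_columns=["id"],
+    if_conflicts="ignore",
 )
```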
Enhancement:
- Document (new) load_file benchmark datasets #449
- Made improvements to the benchmark scripts and configurations. #458, #434, #461, #460, #437, #462
- Performance evaluation for loading datasets with Astro Python SDK 0.9.2 into BigQuery #437
@tatiana @kaxil @utkarsharma2 @dimberman @sunank200 @mikeshwe @vikramkoka