This repository has been archived by the owner on Apr 11, 2024. It is now read-only.

Support for API dataset #14

Open
9 tasks
sunank200 opened this issue Feb 17, 2023 · 1 comment

Comments

@sunank200
Collaborator

sunank200 commented Feb 17, 2023

Please describe the feature you'd like to see

  • Update interfaces

    • Use Airflow 2.4 Dataset concept to build more types of Datasets:
      • API
    • API DataProviders
      • Add interface for API DataProvider.
      • Add interface for APIProviders.
      • Add read and write methods to APIDataProviders, exposed through a context manager.
  • Non-native transfers

    • Add a transfer workflow for API to S3/GCS using a non-native approach.
    • Add a transfer workflow for API to pandas DataFrame using a non-native approach.
    • Add a transfer workflow for API to Database (Snowflake/SQLite) using a non-native approach.
    • Add an example DAG for each of the above.
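
One possible shape for the read and write methods on an API DataProvider, as a minimal sketch only. The class names `APIDataProvider` and `InMemoryAPIDataProvider`, and the hooks `_open` and `_upload`, are hypothetical names chosen for illustration and are not taken from the project's code. The in-memory subclass stands in for a real HTTP-backed provider so the example runs without network access.

```python
import io
from abc import ABC, abstractmethod
from contextlib import contextmanager
from typing import BinaryIO, Iterator


class APIDataProvider(ABC):
    """Hypothetical interface: read/write an API endpoint via context managers."""

    def __init__(self, url: str) -> None:
        self.url = url

    @abstractmethod
    def _open(self) -> BinaryIO:
        """Return a readable binary stream over the endpoint's response body."""

    @abstractmethod
    def _upload(self, body: bytes) -> None:
        """Send the given bytes to the endpoint."""

    @contextmanager
    def read(self) -> Iterator[BinaryIO]:
        stream = self._open()
        try:
            yield stream
        finally:
            # Always release the stream, even if the caller's block raises.
            stream.close()

    @contextmanager
    def write(self) -> Iterator[BinaryIO]:
        buffer = io.BytesIO()
        try:
            yield buffer
            # Reached only if the caller's block finished without an exception.
            self._upload(buffer.getvalue())
        finally:
            buffer.close()


class InMemoryAPIDataProvider(APIDataProvider):
    """Stand-in provider (assumption) that keeps the payload in memory."""

    def __init__(self, url: str) -> None:
        super().__init__(url)
        self.payload = b""

    def _open(self) -> BinaryIO:
        return io.BytesIO(self.payload)

    def _upload(self, body: bytes) -> None:
        self.payload = body
```

A caller would then write `with provider.read() as stream: data = stream.read()`. Buffering the write and uploading on exit means a failure inside the `with` block sends nothing, which is one design choice among several.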

Acceptance Criteria

  • All checks and tests in the CI should pass
  • Unit tests (90% code coverage or more, once available)
  • Integration tests (if the feature relates to a new database or external service)
  • Example DAG
  • Docstrings in reStructuredText for each method, class, function, and module-level attribute (including an example DAG showing how it should be used)
  • Exception handling in case of errors
  • Logging (are we exposing useful information to the user? e.g. source and destination)
  • Improve the documentation (README, Sphinx, and any other relevant)
  • How-to guide for the feature (with an example)
@utkarsharma2
Collaborator

We are not handling pagination or auth in this iteration. We will make a single GET call to the API and save the data to GCS, S3, or the local filesystem.
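
For the local-filesystem case, the scope described above could look roughly like the sketch below. It uses only the Python standard library; the function name `fetch_to_local` and its parameters are assumptions made for illustration, not the project's API. It does no pagination and sends no credentials, matching the stated scope, and writing to GCS or S3 would need the respective client libraries instead of a local path.

```python
import logging
import shutil
import urllib.request
from pathlib import Path

logger = logging.getLogger(__name__)


def fetch_to_local(url: str, destination: Path, timeout: float = 30.0) -> Path:
    """Issue one GET request and stream the response body to a local file.

    Hypothetical helper: no pagination, no authentication.
    """
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Log source and destination so the user can see what was transferred.
    logger.info("Transferring %s -> %s", url, destination)
    # urlopen sends a GET when no request body is supplied and raises
    # urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with destination.open("wb") as out:
            # Copy in chunks so large responses are not held fully in memory.
            shutil.copyfileobj(response, out)
    return destination
```

Usage would be along the lines of `fetch_to_local("https://example.invalid/items", Path("/tmp/items.json"))`, where the URL is a placeholder.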
