✨ Source Azure Blob Storage: migrate to File-based CDK (#31336)
Co-authored-by: davydov-d <[email protected]>
davydov-d and davydov-d authored Oct 13, 2023
1 parent e757fb8 commit 31e8170
Showing 57 changed files with 1,391 additions and 1,125 deletions.
airbyte-integrations/connectors/source-azure-blob-storage/.dockerignore
@@ -0,0 +1,6 @@
*
!Dockerfile
!main.py
!source_azure_blob_storage
!setup.py
!secrets
airbyte-integrations/connectors/source-azure-blob-storage/Dockerfile
@@ -0,0 +1,30 @@
FROM python:3.9-slim as base

# build and load all requirements
FROM base as builder
RUN apt-get update
WORKDIR /airbyte/integration_code

COPY setup.py ./
# install necessary packages to a temporary folder
RUN pip install --prefix=/install .

# build a clean environment
FROM base
WORKDIR /airbyte/integration_code

# copy all loaded and built libraries to a pure basic image
COPY --from=builder /install /usr/local
# add default timezone settings
COPY --from=builder /usr/share/zoneinfo/Etc/UTC /etc/localtime
RUN echo "Etc/UTC" > /etc/timezone

# copy payload code only
COPY main.py ./
COPY source_azure_blob_storage ./source_azure_blob_storage

ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

LABEL io.airbyte.version=0.2.0
LABEL io.airbyte.name=airbyte/source-azure-blob-storage
92 changes: 76 additions & 16 deletions airbyte-integrations/connectors/source-azure-blob-storage/README.md
@@ -1,26 +1,66 @@
# Azure Blob Storage Source

This is the repository for the Azure Blob Storage source connector, written in Python.
For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.com/integrations/sources/azure-blob-storage).

## Local development

### Prerequisites
**To iterate on this connector, make sure to complete this prerequisites section.**

#### Minimum Python version required `= 3.9.0`

#### Build & Activate Virtual Environment and install dependencies
From this connector directory, create a virtual environment:
```
python -m venv .venv
```

This will generate a virtualenv for this module in `.venv/`. Make sure this venv is active in your
development environment of choice. To activate it from the terminal, run:
```
source .venv/bin/activate
pip install -r requirements.txt
```
If you are in an IDE, follow your IDE's instructions to activate the virtualenv.

Note that while we are installing dependencies from `requirements.txt`, you should only edit `setup.py` for your dependencies. `requirements.txt` is
used for editable installs (`pip install -e`) to pull in Python dependencies from the monorepo, and it will call `setup.py`.
If this sounds confusing, don't worry about it: just put your dependencies in `setup.py` and install with `pip install -r requirements.txt`, and everything
should work as you expect.

#### Building via Gradle
From the Airbyte repository root, run:
```
./gradlew :airbyte-integrations:connectors:source-azure-blob-storage:build
```

#### Create credentials
**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.com/integrations/sources/azure-blob-storage)
to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_azure_blob_storage/spec.yaml` file.
Note that the `secrets` directory is gitignored by default, so there is no danger of accidentally checking in sensitive information.
See `integration_tests/sample_config.json` for a sample config file.
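For illustration only, a `secrets/config.json` for a single CSV stream might look roughly like this (the field names are assumptions based on the file-based CDK spec, not a guaranteed match for this connector's `spec.yaml` — always generate yours from the spec):
```
{
  "azure_blob_storage_account_name": "mystorageaccount",
  "azure_blob_storage_account_key": "<account-key>",
  "azure_blob_storage_container_name": "airbyte-test",
  "streams": [
    {
      "name": "my_csv_stream",
      "globs": ["**/*.csv"],
      "format": { "filetype": "csv" }
    }
  ]
}
```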

**If you are an Airbyte core member**, follow the [instructions](https://docs.airbyte.io/connector-development#using-credentials-in-ci) to set up the credentials.

### Locally running the connector
```
python main.py spec
python main.py check --config secrets/config.json
python main.py discover --config secrets/config.json
python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json
```

### Locally running the connector docker image

#### Build
First, make sure you build the latest Docker image:
```
docker build . -t airbyte/source-azure-blob-storage:dev
```

You can also build the connector image via Gradle:
```
./gradlew :airbyte-integrations:connectors:source-azure-blob-storage:airbyteDocker
```
@@ -35,30 +75,50 @@ docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-azure-blob-storage:dev
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-azure-blob-storage:dev discover --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-azure-blob-storage:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
```

## Testing
Make sure to familiarize yourself with [pytest test discovery](https://docs.pytest.org/en/latest/goodpractices.html#test-discovery) to know how your test files and methods should be named.
First install test dependencies into your virtual environment:
```
pip install .[tests]
```
### Unit Tests
To run unit tests locally, from the connector directory run:
```
python -m pytest unit_tests
```

### Integration Tests
There are two types of integration tests: Acceptance Tests (Airbyte's test suite for all source connectors) and custom integration tests (which are specific to this connector).
#### Custom Integration tests
Place custom tests inside the `integration_tests/` folder, then, from the connector root, run
```
python -m pytest integration_tests
```
#### Acceptance Tests
Airbyte has a standard test suite that all source connectors must pass.
Customize the `acceptance-test-config.yml` file to configure the tests. See [Connector Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference) for more information.
If your connector requires creating or destroying resources during acceptance tests, create fixtures for them and place them inside `integration_tests/acceptance.py`.
To run your integration tests with acceptance tests, from the connector root, run
```
python -m pytest integration_tests -p integration_tests.acceptance
```

### Using gradle to run tests
All commands should be run from airbyte project root.
To run unit tests:
```
./gradlew :airbyte-integrations:connectors:source-azure-blob-storage:check
```
To run acceptance and custom integration tests:
```
./gradlew :airbyte-integrations:connectors:source-azure-blob-storage:integrationTest
```

## Dependency Management
All of your dependencies should go in `setup.py`, NOT `requirements.txt`. The requirements file is only used to connect internal Airbyte dependencies in the monorepo for local development.
We split dependencies between two groups:
* dependencies required for your connector to work go in the `MAIN_REQUIREMENTS` list.
* dependencies required for testing go in the `TEST_REQUIREMENTS` list.
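As a sketch of how those two groups are wired up (package and dependency names here are illustrative, not copied from this connector's actual `setup.py`):
```
from setuptools import find_packages

# Runtime dependencies: needed for the connector itself (illustrative)
MAIN_REQUIREMENTS = ["airbyte-cdk[file-based]"]

# Test-only dependencies: pulled in by `pip install .[tests]` (illustrative)
TEST_REQUIREMENTS = ["pytest", "pytest-mock", "requests-mock"]

# setup.py passes these to setup(): install_requires gets MAIN_REQUIREMENTS,
# and extras_require={"tests": TEST_REQUIREMENTS} defines the [tests] extra.
setup_kwargs = {
    "name": "source_azure_blob_storage",
    "packages": find_packages(exclude=("unit_tests", "integration_tests")),
    "install_requires": MAIN_REQUIREMENTS,
    "extras_require": {"tests": TEST_REQUIREMENTS},
}
```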

### Publishing a new version of the connector
You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what?
airbyte-integrations/connectors/source-azure-blob-storage/acceptance-test-config.yml
@@ -1,7 +1,137 @@
# See [Source Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/source-acceptance-tests-reference)
# for more information about how to configure these tests
connector_image: airbyte/source-azure-blob-storage:dev
acceptance_tests:
  basic_read:
    tests:
      - config_path: secrets/config.json
        expect_records:
          path: integration_tests/expected_records/csv.jsonl
          exact_order: true
      - config_path: secrets/csv_custom_encoding_config.json
        expect_records:
          path: integration_tests/expected_records/csv_custom_encoding.jsonl
          exact_order: true
      - config_path: secrets/csv_custom_format_config.json
        expect_records:
          path: integration_tests/expected_records/csv_custom_format.jsonl
          exact_order: true
      - config_path: secrets/csv_user_schema_config.json
        expect_records:
          path: integration_tests/expected_records/csv_user_schema.jsonl
          exact_order: true
      - config_path: secrets/csv_no_header_config.json
        expect_records:
          path: integration_tests/expected_records/csv_no_header.jsonl
          exact_order: true
      - config_path: secrets/csv_skip_rows_config.json
        expect_records:
          path: integration_tests/expected_records/csv_skip_rows.jsonl
          exact_order: true
      - config_path: secrets/csv_skip_rows_no_header_config.json
        expect_records:
          path: integration_tests/expected_records/csv_skip_rows_no_header.jsonl
          exact_order: true
      - config_path: secrets/csv_with_nulls_config.json
        expect_records:
          path: integration_tests/expected_records/csv_with_nulls.jsonl
          exact_order: true
      - config_path: secrets/csv_with_null_bools_config.json
        expect_records:
          path: integration_tests/expected_records/csv_with_null_bools.jsonl
          exact_order: true
      - config_path: secrets/parquet_config.json
        expect_records:
          path: integration_tests/expected_records/parquet.jsonl
          exact_order: true
      - config_path: secrets/avro_config.json
        expect_records:
          path: integration_tests/expected_records/avro.jsonl
          exact_order: true
      - config_path: secrets/jsonl_config.json
        expect_records:
          path: integration_tests/expected_records/jsonl.jsonl
          exact_order: true
      - config_path: secrets/jsonl_newlines_config.json
        expect_records:
          path: integration_tests/expected_records/jsonl_newlines.jsonl
          exact_order: true
  connection:
    tests:
      - config_path: secrets/config.json
        status: succeed
      - config_path: secrets/csv_custom_encoding_config.json
        status: succeed
      - config_path: secrets/csv_custom_format_config.json
        status: succeed
      - config_path: secrets/csv_user_schema_config.json
        status: succeed
      - config_path: secrets/csv_no_header_config.json
        status: succeed
      - config_path: secrets/csv_skip_rows_config.json
        status: succeed
      - config_path: secrets/csv_skip_rows_no_header_config.json
        status: succeed
      - config_path: secrets/csv_with_nulls_config.json
        status: succeed
      - config_path: secrets/csv_with_null_bools_config.json
        status: succeed
      - config_path: secrets/parquet_config.json
        status: succeed
      - config_path: secrets/avro_config.json
        status: succeed
      - config_path: secrets/jsonl_config.json
        status: succeed
      - config_path: secrets/jsonl_newlines_config.json
        status: succeed
  discovery:
    tests:
      - config_path: secrets/config.json
      - config_path: secrets/csv_custom_encoding_config.json
      - config_path: secrets/csv_custom_format_config.json
      - config_path: secrets/csv_user_schema_config.json
      - config_path: secrets/csv_no_header_config.json
      - config_path: secrets/csv_skip_rows_config.json
      - config_path: secrets/csv_skip_rows_no_header_config.json
      - config_path: secrets/csv_with_nulls_config.json
      - config_path: secrets/csv_with_null_bools_config.json
      - config_path: secrets/parquet_config.json
      - config_path: secrets/avro_config.json
      - config_path: secrets/jsonl_config.json
      - config_path: secrets/jsonl_newlines_config.json
  full_refresh:
    tests:
      - config_path: secrets/config.json
        configured_catalog_path: integration_tests/configured_catalogs/csv.json
      - config_path: secrets/parquet_config.json
        configured_catalog_path: integration_tests/configured_catalogs/parquet.json
      - config_path: secrets/avro_config.json
        configured_catalog_path: integration_tests/configured_catalogs/avro.json
      - config_path: secrets/jsonl_config.json
        configured_catalog_path: integration_tests/configured_catalogs/jsonl.json
      - config_path: secrets/jsonl_newlines_config.json
        configured_catalog_path: integration_tests/configured_catalogs/jsonl.json
  incremental:
    tests:
      - config_path: secrets/config.json
        configured_catalog_path: integration_tests/configured_catalogs/csv.json
        future_state:
          future_state_path: integration_tests/abnormal_states/csv.json
      - config_path: secrets/parquet_config.json
        configured_catalog_path: integration_tests/configured_catalogs/parquet.json
        future_state:
          future_state_path: integration_tests/abnormal_states/parquet.json
      - config_path: secrets/avro_config.json
        configured_catalog_path: integration_tests/configured_catalogs/avro.json
        future_state:
          future_state_path: integration_tests/abnormal_states/avro.json
      - config_path: secrets/jsonl_config.json
        configured_catalog_path: integration_tests/configured_catalogs/jsonl.json
        future_state:
          future_state_path: integration_tests/abnormal_states/jsonl.json
      - config_path: secrets/jsonl_newlines_config.json
        configured_catalog_path: integration_tests/configured_catalogs/jsonl.json
        future_state:
          future_state_path: integration_tests/abnormal_states/jsonl_newlines.json
  spec:
    tests:
      - spec_path: integration_tests/spec.json
test_strictness_level: low
airbyte-integrations/connectors/source-azure-blob-storage/acceptance-test-docker.sh
@@ -1,2 +1,3 @@
#!/usr/bin/env sh

source "$(git rev-parse --show-toplevel)/airbyte-integrations/bases/connector-acceptance-test/acceptance-test-docker.sh"

This file was deleted.

@@ -0,0 +1,3 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#
airbyte-integrations/connectors/source-azure-blob-storage/integration_tests/abnormal_states/avro.json
@@ -0,0 +1,12 @@
[
{
"type": "STREAM",
"stream": {
"stream_state": {
"_ab_source_file_last_modified": "2999-01-01T00:00:00.000000Z_test_sample.avro",
"history": { "test_sample.avro": "2999-01-01T00:00:00.000000Z" }
},
"stream_descriptor": { "name": "airbyte-source-azure-blob-storage-test" }
}
}
]
airbyte-integrations/connectors/source-azure-blob-storage/integration_tests/abnormal_states/csv.json
@@ -0,0 +1,12 @@
[
{
"type": "STREAM",
"stream": {
"stream_state": {
"_ab_source_file_last_modified": "2999-01-01T00:00:00.000000Z_simple_test.csv",
"history": { "simple_test.csv": "2999-01-01T00:00:00.000000Z" }
},
"stream_descriptor": { "name": "airbyte-source-azure-blob-storage-test" }
}
}
]
airbyte-integrations/connectors/source-azure-blob-storage/integration_tests/abnormal_states/jsonl.json
@@ -0,0 +1,12 @@
[
{
"type": "STREAM",
"stream": {
"stream_state": {
"_ab_source_file_last_modified": "2999-01-01T00:00:00.000000Z_simple_test.jsonl",
"history": { "simple_test.jsonl": "2999-01-01T00:00:00.000000Z" }
},
"stream_descriptor": { "name": "airbyte-source-azure-blob-storage-test" }
}
}
]
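The 2999-dated timestamps in these abnormal-state fixtures are what make them useful: the file-based CDK's cursor records each file's last-modified time, so a far-future state makes every existing blob look already synced and the connector should emit no records. A simplified model of that decision (an illustration, not the CDK's actual implementation):
```
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def should_sync(filename: str, last_modified: str, stream_state: dict) -> bool:
    """Sync a file if it is absent from the state's history, or if it was
    modified after the timestamp recorded for it."""
    seen = stream_state.get("history", {}).get(filename)
    if seen is None:
        return True
    return datetime.strptime(last_modified, TS_FORMAT) > datetime.strptime(seen, TS_FORMAT)


abnormal_state = {"history": {"test_sample.avro": "2999-01-01T00:00:00.000000Z"}}
# A blob last modified in 2023 is treated as already synced under this state:
print(should_sync("test_sample.avro", "2023-10-13T00:00:00.000000Z", abnormal_state))  # False
print(should_sync("new_file.avro", "2023-10-13T00:00:00.000000Z", abnormal_state))     # True
```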
