- Pull the mage-ai repository and cd into it.
- Run Mage in a development Docker container using ./scripts/dev.sh [PROJECT NAME]. This starts Mage and allows us to make changes in real time. See this page for more details.
- Open another terminal and run:
docker exec -it mage-ai-server-1 bash
This will open a shell in the Docker container and allow us to interact with the integrations.
- Uninstall the existing mage-integrations package using pip:
pip3 uninstall -y mage-integrations
cd into mage_integrations/:
cd mage_integrations/
Run the following commands:
touch ./mage_integrations/TEST_CATALOG.json &&
touch ./mage_integrations/TEST_CONFIG_S.json &&
touch ./mage_integrations/TEST_CONFIG_D.json &&
touch ./mage_integrations/TEST_STATE.json &&
touch ./mage_integrations/TEST_OUTPUT &&
echo "{}" >> ./mage_integrations/TEST_STATE.json
This creates the following files:
TEST_CATALOG.json
TEST_CONFIG_S.json
TEST_CONFIG_D.json
TEST_STATE.json
TEST_OUTPUT
We'll be using these files to test the integration.
Populate TEST_CONFIG_S.json with a sample configuration. A template for each source can be found at:
mage_integrations/mage_integrations/sources/[INTEGRATION]/templates/config.json
For the GitHub integration, this is:
{
"access_token": "abcdefghijklmnopqrstuvwxyz1234567890ABCD",
"repository": "mage-ai/mage-ai",
"start_date": "2021-01-01T00:00:00Z",
"request_timeout": 300,
"base_url": "https://api.github.com"
}
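Before running discovery, it can help to confirm that the config file parses as JSON and contains the keys you expect. The following is a minimal sketch, not part of Mage: `load_config` and its `required_keys` parameter are hypothetical names introduced here, and which keys are actually required depends on the integration.

```python
import json
from pathlib import Path


def load_config(path, required_keys=()):
    """Load a JSON config file and fail early if an expected key is missing.

    `required_keys` is an assumption supplied by the caller; this helper does
    not know which keys a given integration really needs.
    """
    config = json.loads(Path(path).read_text())
    if not isinstance(config, dict):
        raise ValueError(f"{path} must contain a JSON object")
    missing = [key for key in required_keys if key not in config]
    if missing:
        raise KeyError(f"{path} is missing keys: {missing}")
    return config
```

For the GitHub sample above, this could be called as `load_config("mage_integrations/TEST_CONFIG_S.json", ["access_token", "repository"])`.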
Run the following command to discover streams for your source:
python3 mage_integrations/sources/[INTEGRATION]/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--discover \
--discover_streams
Now, you should see a list of streams that you can sync from the source. Grab a few of interest!
The output from GitHub looks like this:
[
{
"stream": "commits",
"tap_stream_id": "commits"
},
{
"stream": "comments",
"tap_stream_id": "comments"
},
...
]
Next, fetch the schemas for a few of the streams above and write the output to our catalog file. The stream names should be passed to --selected_streams, in string format, as a list of strings, e.g. '["commits", "comments"]'.
python3 mage_integrations/sources/[INTEGRATION]/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--discover \
--selected_streams SCHEMAS > mage_integrations/TEST_CATALOG.json
For example, for the GitHub source:
python3 mage_integrations/sources/github/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--discover \
--selected_streams '["commits"]' > mage_integrations/TEST_CATALOG.json
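The value for --selected_streams can also be built from the discovery output instead of being typed by hand. The sketch below assumes the discovery output is a valid JSON list of objects with a `tap_stream_id` key, as in the GitHub sample; `selected_streams_arg` is a hypothetical helper name.

```python
import json
import shlex


def selected_streams_arg(discover_output, wanted=None):
    """Turn discovery output into a shell-quoted --selected_streams value.

    Assumes `discover_output` is a JSON list of {"stream", "tap_stream_id"}
    objects. `wanted` optionally restricts the result to specific stream IDs.
    """
    streams = json.loads(discover_output)
    ids = [stream["tap_stream_id"] for stream in streams]
    if wanted is not None:
        ids = [stream_id for stream_id in ids if stream_id in wanted]
    # shlex.quote wraps the JSON in single quotes so the shell passes it intact.
    return shlex.quote(json.dumps(ids))
```

The returned string can be pasted directly after --selected_streams in the command above.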
Your catalog will now contain the schemas for the streams you selected. We now need to enable each stream's schema. This is usually handled by the Mage UI, but we can do it ourselves.
For each stream in TEST_CATALOG.json, find the nested metadata key and add "selected": true:
...
"stream": "commits",
"metadata": [
{
"breadcrumb": [],
"metadata": {
"table-key-properties": [
"sha"
],
"forced-replication-method": "INCREMENTAL",
"valid-replication-keys": "updated_at",
"inclusion": "available",
"selected": true
}
},
...
Additionally, for SQL sources, add "selected": true to at least one column in each stream.
{
"stream": "commits",
"tap_stream_id": "commits",
"selected": true
}
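Editing the catalog by hand gets tedious with many streams, so the stream-level edit can be scripted. This is a minimal sketch under two assumptions not confirmed by this page: that the catalog file is a JSON object with a top-level "streams" list, and that the stream-level metadata entry is the one with an empty breadcrumb. `select_streams` and `select_columns` are hypothetical names.

```python
import json


def select_streams(catalog, select_columns=False):
    """Set "selected": true in each stream's metadata entries.

    Assumes `catalog` looks like {"streams": [{"metadata": [...]}, ...]}.
    By default only the stream-level entry (empty breadcrumb) is marked;
    with select_columns=True every metadata entry, including per-column
    entries, is marked as well.
    """
    for stream in catalog.get("streams", []):
        for entry in stream.get("metadata", []):
            is_stream_level = entry.get("breadcrumb", []) == []
            if is_stream_level or select_columns:
                entry.setdefault("metadata", {})["selected"] = True
    return catalog
```

To apply it, load TEST_CATALOG.json with `json.load`, pass the result through `select_streams`, and write it back with `json.dump`.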
Finally! It's time to test our stream execution. Run the following command to execute the stream and save the output to a file:
python3 mage_integrations/sources/[INTEGRATION]/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--catalog mage_integrations/TEST_CATALOG.json \
--state mage_integrations/TEST_STATE.json > mage_integrations/TEST_OUTPUT
Check TEST_OUTPUT to see real-time results!
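TEST_OUTPUT can get long, so a quick summary is often easier to read than the raw file. The sketch below assumes each line of the output is a JSON message with a "type" field (for example SCHEMA, RECORD, or STATE, as in Singer-style taps); lines that are not JSON, such as log lines, are counted separately. `summarize_output` is a hypothetical helper name.

```python
import json
from collections import Counter


def summarize_output(lines):
    """Count output messages by their "type" field.

    Assumes one JSON message per line. Blank lines are skipped, lines that
    fail to parse are counted as NON_JSON, and parsed values without a
    "type" key are counted as UNKNOWN.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            counts["NON_JSON"] += 1
            continue
        if isinstance(message, dict):
            counts[message.get("type", "UNKNOWN")] += 1
        else:
            counts["UNKNOWN"] += 1
    return counts
```

It accepts any iterable of lines, so an open file handle for mage_integrations/TEST_OUTPUT can be passed in directly.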
Now, let's test writing our output to a destination. Populate the destination config file with a sample configuration in a similar manner to the source config, then run:
python3 mage_integrations/destinations/postgresql/__init__.py \
--config mage_integrations/TEST_CONFIG_D.json \
--state mage_integrations/TEST_STATE.json \
--input_file_path mage_integrations/TEST_OUTPUT \
--debug
This writes TEST_OUTPUT to your destination. Note: you will need a sample destination to write to.
The following command tests the full flow, pulling from the source and writing to the destination:
python3 mage_integrations/sources/[SOURCE_INTEGRATION]/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--catalog mage_integrations/TEST_CATALOG.json \
--state mage_integrations/TEST_STATE.json | python3 mage_integrations/destinations/[TARGET_INTEGRATION]/__init__.py \
--config mage_integrations/TEST_CONFIG_D.json \
--state mage_integrations/TEST_STATE.json \
--debug
For example, an end-to-end GitHub to Postgres data integration:
python3 mage_integrations/sources/github/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--catalog mage_integrations/TEST_CATALOG.json \
--state mage_integrations/TEST_STATE.json | python3 mage_integrations/destinations/postgresql/__init__.py \
--config mage_integrations/TEST_CONFIG_D.json \
--state mage_integrations/TEST_STATE.json \
--debug
Once you've tested your tap in the terminal, it's time to test it in Mage.
First, return to your terminal and run pip install -U mage_integrations/ in your mage_integrations directory. That will build our new mage-integrations package and make your changes available to the UI.
Open up Mage (localhost:3000 in dev) and create a new data integration pipeline. Then select your source from the list.
Now, perform the following in the Mage UI to verify a working source:
- Test the connection
- View and select streams
- Sync one stream to a destination
- If you're adding a tap in a PR, be sure to add logs of the source and show data in the destination table to the PR description.
- If incremental sync is supported, please also test it: check if the state is updated and fetched correctly.
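One way to check that state is updated during an incremental sync is to copy TEST_STATE.json before a run and compare it with the file afterwards. The sketch below assumes the state is a JSON object with a top-level "bookmarks" mapping keyed by stream name, which is a common Singer-style layout but is not confirmed by this page; `changed_bookmarks` is a hypothetical helper name.

```python
def changed_bookmarks(before, after):
    """Return {stream: (old, new)} for every stream whose bookmark changed.

    Assumes both arguments are state dicts shaped like
    {"bookmarks": {"<stream>": {...}}}; a missing "bookmarks" key is
    treated as an empty mapping.
    """
    old = before.get("bookmarks", {})
    new = after.get("bookmarks", {})
    return {
        stream: (old.get(stream), new.get(stream))
        for stream in sorted(set(old) | set(new))
        if old.get(stream) != new.get(stream)
    }
```

An empty result after a sync that returned new records suggests the state is not being written back as expected.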
You can count the number of records in your stream with the following command:
python3 mage_integrations/sources/[INTEGRATION]/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--catalog mage_integrations/TEST_CATALOG.json \
--state mage_integrations/TEST_STATE.json \
--count_records \
--selected_streams '["your_stream"]'
Use this template to perform a sample query of your data:
python3 mage_integrations/sources/freshdesk/__init__.py \
--config mage_integrations/TEST_CONFIG_S.json \
--catalog mage_integrations/TEST_CATALOG.json \
--query_json '{"_end_date": null, "_execution_date": "2022-11-17T21:05:53.341319", "_execution_partition": "444/20221117T210443", "_start_date": null, "_limit": 1000, "_offset": 0}' \
--state mage_integrations/TEST_STATE.json
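The --query_json value is easy to mistype, so it can be generated instead. This sketch reuses only the keys shown in the template above; `build_query_json` is a hypothetical helper name, and the execution partition is taken as an opaque string because this page does not describe how it is derived.

```python
import json
import shlex
from datetime import datetime


def build_query_json(execution_date, execution_partition, limit=1000,
                     offset=0, start_date=None, end_date=None):
    """Build a shell-quoted JSON string for the --query_json flag.

    Uses only the keys from the sample command. `execution_partition` is
    passed through unchanged.
    """
    query = {
        "_end_date": end_date,
        "_execution_date": execution_date.isoformat(),
        "_execution_partition": execution_partition,
        "_start_date": start_date,
        "_limit": limit,
        "_offset": offset,
    }
    # shlex.quote wraps the JSON in single quotes for safe use in a shell.
    return shlex.quote(json.dumps(query))
```

For example, `build_query_json(datetime(2022, 11, 17, 21, 5, 53, 341319), "444/20221117T210443")` produces a value equivalent to the one in the sample command.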