129 Adds verified source: airtable #218
Open questions
How should users refer to Airtable tables? I see two options:
There are two use cases for names:
Additional Write Dispositions? I'm not sure how to support additional write dispositions; I'd have to study the sql_database code more deeply. If you have any pointers, I'd be happy to hear them. |
I'd also appreciate help on how to fix the following linting error:
The function returns a call to |
Currently, this draft PR has at least two implementation problems which I'm stuck with because they seem to be related to what's going on deep in the core:
I've been testing with this Airtable table provided by dlt hub as well as this quickstart template provided by Airtable. In the latter, each table name starts with an emoji. Question: Should the source connector handle emojis in resource names or is this something for the dlt framework?
1. Warnings about incomplete columns
After deleting my local duckdb database and running the pipeline using
2. Running the pipeline repeatedly creates runtime errors, due to emojis in column names
Then, running the pipeline again produces:
|
btw. if the code stays as simple as it is right now almost anyone can hack it and use ids for names.
Everything is on a good track. you already add primary keys, which is an essential step. What I'd suggest would be
|
return dlt.resource(
pyairtable.Table(access_token, base_id, table.get("id")).iterate(),
name=table.get("name"), # TODO: clarify if stable id or user-chosen name is preferred
primary_key=[primary_key_field["name"]],
write_disposition="replace",
) here
|
thanks for spotting this! this is very interesting. I'll dig deeper, try to reproduce it and file a bug for it. most probably the code expects that schema content is an ascii string - and it should be! emojis should be normalized to |
@rudolfix
Thank you! Yes, at the first load they get indeed normalized to Here is a screenshot after the first run: I think I can make a minimal example reproducing this and file a bug. |
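For readers following the emoji discussion, here is a minimal sketch of what reducing an emoji-prefixed table name to an ASCII identifier could look like. The function name `to_ascii_identifier` and the `"unnamed"` fallback are hypothetical and chosen only for illustration; this is not dlt's actual naming convention, whose exact output is not shown in this thread.

```python
import re
import unicodedata


def to_ascii_identifier(name: str) -> str:
    """Hypothetical normalizer: reduce a display name to an ASCII snake_case identifier."""
    # Decompose accented letters so their ASCII base letter survives the encode step.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop everything that has no ASCII representation, including emojis.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single underscore.
    snake = re.sub(r"[^0-9a-zA-Z]+", "_", ascii_only).strip("_").lower()
    # Fall back to a placeholder when nothing is left (e.g. an emoji-only name).
    return snake or "unnamed"


table_name = to_ascii_identifier("💰 Sales Deals")
```

A normalizer like this must be applied the same way on every run; if the first load stores the normalized name and a later load compares against the raw name, the two no longer match, which is the kind of mismatch reported above.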
@willi-mueller this just came to my mind: should we add some metadata to each row coming from Airtable? do they have any internal unique id? is the row number important? |
yes, every row (record) has an internal unique id. It seems to exist in the destination. They start with "rec". See first column: Or did you have something else in mind with "add some metadata"? |
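To make the "metadata per row" question concrete, the sketch below flattens records while keeping the record id next to the user-defined fields. It assumes each record is a dict with `id`, `createdTime` and `fields` keys; the column names `_airtable_id` and `_airtable_created_time` and the sample values are hypothetical and are not taken from this PR.

```python
from typing import Any, Dict, Iterable, Iterator


def flatten_records(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per record, keeping the record id as metadata."""
    for record in records:
        # Copy the user-defined fields so the input record is not mutated.
        row = dict(record.get("fields", {}))
        # Hypothetical column names, prefixed to avoid clashing with user fields.
        row["_airtable_id"] = record["id"]
        row["_airtable_created_time"] = record.get("createdTime")
        yield row


# Made-up sample data in the assumed record shape.
sample = [
    {"id": "recAAA", "createdTime": "2023-08-01T00:00:00.000Z", "fields": {"Name": "Ada"}},
    {"id": "recBBB", "createdTime": "2023-08-02T00:00:00.000Z", "fields": {"Name": "Grace"}},
]
rows = list(flatten_records(sample))
```

Because the id is unique per record, a column like this could also serve as a primary key candidate, independent of which user field is marked as primary.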
regarding the warning with the binding: for every table, dlt complains about the first column. That's a pattern. |
@willi-mueller I've managed to reproduce encoding error and the fix is coming dlt-hub/dlt#509 |
@rudolfix I'd appreciate your help in getting the CI checks to pass. It seems that something is wrong with the way I import the third-party dependency
Still, the mypy checks fail. I'd like to hear your thoughts on whether: |
@willi-mueller you have to add this dependency to the dev dependency group with poetry as described here https://github.com/dlt-hub/verified-sources/blob/master/CONTRIBUTING.md#source-specific-dependencies-requirements-file And you are right in thinking that the requirements in the pipeline should be enough. This is actually a really good idea (we started with the groups before there were any requirements for pipelines). |
Oh, fantastic! Thank you for the pointer to the docs. I'm sorry I haven't seen that before. Still, the linter & test runner errors are unchanged.|
Test runner:
I'd appreciate advice on how we can move forward. Thank you! |
@willi-mueller you are doing everything right. it is poetry not adding your dependency to pyproject.toml: |
I read the comments on the issue you linked. I can test it, make the PR, and only reach out if the issue is more complex.
You must already have tons of things on your mind.
Aug 14, 2023 03:35:34 rudolfix wrote:
@willi-mueller you are doing everything right. it is poetry not adding your dependency to pyproject.toml: python-poetry/poetry#7230
in our case it was the *black* linter config added not in the right place. I fixed our pyproject.toml and will push to github, then you can do *poetry add pyairtable --group=airtable*. I will ping you when done
thx for finding another problem :)
|
OK! here's the PR fixing black but it may take a while to merge it https://github.com/dlt-hub/verified-sources/pull/243/files#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711 |
@willi-mueller problem in |
🥳 Yes, it fixes most of the CI runs! Now, in the dlthub repo the tests are red due to the missing airtable secrets. I'll reach out on slack. I'm not sure if I understood your intention in #243: I added |
dlt complains that it does not see your access token... you should have something like this:
access_token="...."
or
[sources.airtable]
access_token="...."
please make sure that you did one of the above. toml has its quirks, i.e. if you provide the access token as in the first example, it must be at the very top of the file, above any table (square brackets). btw. maybe you've found another bug, let's see :) |
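The TOML quirk mentioned above can be checked with a small sketch using Python's standard `tomllib` parser (Python 3.11+, with the third-party `tomli` package as an assumed fallback on older versions). The table name `sources.other`, the key `some_option` and the token value are placeholders invented for this illustration; the sketch only shows how TOML parses the two layouts, not how dlt resolves secrets.

```python
try:
    import tomllib  # standard library since Python 3.11
except ImportError:  # assumed fallback for older interpreters
    import tomli as tomllib

# The bare key comes first, so it belongs to the top level of the document.
TOP_LEVEL_FIRST = """
access_token = "placeholder"

[sources.other]
some_option = "x"
"""

# The same key written after a table header is parsed as part of that table.
KEY_AFTER_TABLE = """
[sources.other]
some_option = "x"

access_token = "placeholder"
"""

first = tomllib.loads(TOP_LEVEL_FIRST)
second = tomllib.loads(KEY_AFTER_TABLE)

print("top-level keys, key first:", sorted(first))
print("top-level keys, key after table:", sorted(second))
print("keys inside [sources.other]:", sorted(second["sources"]["other"]))
```

In the second layout there is no top-level `access_token` at all, so a lookup for it at the top level would come back empty even though the line is present in the file.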
Thank you for your quick reply. Here is how I set it up after having examined the contribution docs. My github actions secret for my fork: This is my Can you spot a mistake? Would I be able to reproduce this behavior in my fork by doing:
|
@willi-mueller absolutely. poke the facebook ads maybe and I'll give you a refresh token on slack :). I cannot really find any problem with your code. if our experiment with facebook does not work I'll try it myself from my priv account |
@willi-mueller this looks good! I have a few small suggestions to try. see review
- creates source for a base which returns resources for airtables
- supports whitelisting airtables by names or ids
- implements test suite calling live API
- README for Airtable source
improves typing
…ence warnings and improve documentation
refactoring according to latest recommendations of pyairtable
simplify airtable filters by following the paradigm of the pyairtable API which treats table names and table IDs equally
updates pyairtable dependency
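As an illustration of a filter that treats table names and table IDs equally, here is a minimal sketch. The helper `select_tables` and the sample metadata are hypothetical; it assumes each table is described by a dict with `id` and `name` keys, as in the snippet quoted earlier in this thread, and it is not the code merged in this PR.

```python
from typing import Any, Dict, Iterable, List, Optional


def select_tables(
    tables: Iterable[Dict[str, Any]],
    wanted: Optional[Iterable[str]] = None,
) -> List[Dict[str, Any]]:
    """Return the tables whose id or name appears in `wanted` (all tables if None)."""
    all_tables = list(tables)
    if wanted is None:
        return all_tables
    wanted_set = set(wanted)
    # A table matches if either its stable id or its user-chosen name was requested.
    return [
        table
        for table in all_tables
        if table.get("id") in wanted_set or table.get("name") in wanted_set
    ]


# Made-up table metadata for the example.
tables = [
    {"id": "tblAAA", "name": "💰 Sales"},
    {"id": "tblBBB", "name": "Contacts"},
    {"id": "tblCCC", "name": "Tasks"},
]
selected = select_tables(tables, ["tblAAA", "Tasks"])
```

Accepting both forms in a single list keeps the user-facing argument simple: callers can start with readable names and switch to ids later if a table gets renamed.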
this is excellent! thanks!
Tell us what you do here
Relevant issue
#129
More PR info
TODO Before Merge
Step 1
Step 2
Step 3:
Share the table as read-only with me or with the public so that I can point the references in our test suite and example pipeline to it.
Thank you!
Future work
implement an API similar to the facebook_ads connector which allows users to specify column names and datatypes of their Airtables.