Feature/airtable include fields #1631
… - this helps reduce the amount of data fetched by the DocumentLoader when there are massive numbers of fields in an Airtable table.
… to add logging to figure out why this happens
…ble URL to use the /listRecords endpoint. I didn't RTFM clearly. This is currently documented here: https://support.airtable.com/docs/enforcement-of-url-length-limit-for-web-api-requests
…ad more than 100 records but not all of them, they can. Currently, there's a bug where the document loader doesn't work on loading more than 100 records because of internal Airtable API limitations. The Airtable API can only fetch up to 100 records per call - anything more than that requires pagination.
…ords you want across multiple API calls. If the maxRecords is greater than 100, then it will provide pagination hints accordingly.
…e the intention is to load *all* of the records. Also, marking both the Return All and Limit params as optional, so as to not confuse the user. Making them both required adds needless confusion. Ideally, the user either specifies Return All OR specifies the Limit value but not BOTH. It seems there's no way to define "conditional requirements" in Flowise params, so it's better to make both params optional.
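The pagination and Return All / Limit behavior described in the commits above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `resolveMaxRecords`, `loadRecords`, and the injected `fetchPage` callback are hypothetical names, and treating "neither param set" as "load everything" is an assumption. Only `records`, `offset`, and `pageSize` mirror fields of Airtable's list-records API.

```typescript
// Hypothetical shapes for illustration; not Flowise's actual types.
interface AirtableRecord { id: string; fields: Record<string, unknown> }
interface Page { records: AirtableRecord[]; offset?: string }
type FetchPage = (params: { pageSize: number; offset?: string }) => Promise<Page>

// Airtable returns at most 100 records per API call.
const AIRTABLE_PAGE_SIZE = 100

// Returns undefined when every record should be loaded.
// Assumption: if neither Return All nor Limit is given, load everything.
function resolveMaxRecords(returnAll?: boolean, limit?: number): number | undefined {
    if (returnAll || limit === undefined) return undefined // Limit is ignored when Return All is true
    return Math.max(0, Math.floor(limit))
}

// Keeps requesting pages until the limit is reached or Airtable stops returning an offset.
async function loadRecords(fetchPage: FetchPage, returnAll?: boolean, limit?: number): Promise<AirtableRecord[]> {
    const max = resolveMaxRecords(returnAll, limit)
    const out: AirtableRecord[] = []
    let offset: string | undefined
    do {
        const remaining = max === undefined ? Infinity : max - out.length
        if (remaining <= 0) break
        const page = await fetchPage({ pageSize: Math.min(AIRTABLE_PAGE_SIZE, remaining), offset })
        out.push(...page.records.slice(0, remaining))
        offset = page.offset // absent on the last page
    } while (offset !== undefined)
    return out
}
```

Passing the page fetcher in as a callback keeps the loop independent of the HTTP client, so it can be exercised against a fake table without calling Airtable.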
wow this is awesome!! thanks a lot for the improvement!
Hello,
Hi @doumlegare , the DocumentLoader node doesn't support syncing Airtable record updates into any vector DB at this time. This is a known limitation within Flowise. In short, any documents you load through this automation will be added into the vector DB, always. If you had already loaded an earlier version of the document (from Airtable) into the vector DB and you re-run it again, you will end up with 2 documents in the vector DB (duplicates). For now, you have to implement a mechanism outside of Flowise to keep track of which documents were already added to the vector DB and which documents need to be updated in the vector DB. Long term fix is discussed here:
Thanks for your fast response @dkindlund
@doumlegare , I'm basically using it similar to how this video describes using Airtable: And yeah, I'm having to implement my own tracking mechanism outside of Flowise to make sure I'm not upserting duplicate documents into the vector store DB.
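For anyone wondering what such an external tracking mechanism might look like, here is one possible sketch. It is not part of Flowise or of this PR: `fingerprint`, `classifyRecords`, and the in-memory `Map` are all hypothetical, and a real setup would persist the fingerprints somewhere durable (a file, a database, or an Airtable field). The idea is to remember a fingerprint per Airtable record ID and only insert or update documents whose fingerprint is new or changed.

```typescript
// Hypothetical record shape for illustration.
interface TrackedRecord { id: string; fields: Record<string, unknown> }

// Serializes top-level fields in sorted key order so that field ordering
// does not change the fingerprint. Nested objects are not normalized here.
function fingerprint(fields: Record<string, unknown>): string {
    const sorted = Object.keys(fields).sort().map((key) => [key, fields[key]])
    return JSON.stringify(sorted)
}

interface Classification {
    toInsert: TrackedRecord[]  // never seen before
    toUpdate: TrackedRecord[]  // seen, but content changed
    unchanged: TrackedRecord[] // seen with identical content; safe to skip
}

// `seen` maps record ID -> fingerprint recorded at the last successful load.
function classifyRecords(records: TrackedRecord[], seen: Map<string, string>): Classification {
    const result: Classification = { toInsert: [], toUpdate: [], unchanged: [] }
    for (const record of records) {
        const previous = seen.get(record.id)
        const current = fingerprint(record.fields)
        if (previous === undefined) result.toInsert.push(record)
        else if (previous !== current) result.toUpdate.push(record)
        else result.unchanged.push(record)
    }
    return result
}

// Call after the vector DB write succeeds, so failed writes are retried next run.
function markLoaded(records: TrackedRecord[], seen: Map<string, string>): void {
    for (const record of records) seen.set(record.id, fingerprint(record.fields))
}
```

Records in `toUpdate` still need their old vectors deleted from the vector DB before re-inserting. How to do that depends on the vector store and is outside this sketch.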
Signed-off-by: Carson Yang <[email protected]>
Hey @HenryHengZJ , this PR covers a bunch of quality of life improvements to the Airtable DocumentLoader node, including:

Support for Including Only Specific Fields from the Airtable table:
- `Include Only Fields` can be a comma-separated list of field names or field IDs. If a field name itself contains commas, then it's recommended to use the field ID instead of the name.
- Switched from a `GET` to a `POST` for fetching from Airtable. This is because if you want to include more than about 50 fields, you would hit the maximum number of characters that can be put into the URL of a `GET` request before Airtable throws errors. Their recommended workaround is to switch from `GET` to `POST`, so that you can specify an unlimited number of include fields. Reference: https://support.airtable.com/docs/enforcement-of-url-length-limit-for-web-api-requests

Quality of Life Improvements are:
- Made the `Return All` and `Limit` fields optional instead of both required. Before, this was confusing to the user because it wasn't clear WHY a user needs to specify any sort of `Limit` at all, if `Return All` is true.
- If `Return All` is set to `true`, then the `Limit` field is ignored completely (previously, there was a bug where even if `Return All` was `true` it would still only process up to the number of records specified by `Limit`, which was really confusing to the user).
- With a `Limit` of greater than 100 records (like 264), this code will actually load all 264 records successfully (previously, there was a bug in the `loadLimit()` function because there was no pagination logic implemented, so it would only load up to 100 records from Airtable, which is the maximum number of records that can be returned from a single Airtable API call).

Hopefully, this code makes sense. I've been using it for a couple of days now and it seems to be working well for me. Let me know if you have any questions. Thanks!
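To illustrate the include-fields handling and the `GET` to `POST` switch described above, here is a rough sketch of how the request could be assembled. It is not the code from this PR: `parseIncludeFields` and `buildListRecordsRequest` are hypothetical helper names, and authentication headers are left out. The `/listRecords` endpoint and the `fields`, `pageSize`, and `offset` body keys follow Airtable's list-records API.

```typescript
// Splits the comma-separated "Include Only Fields" value into field names or IDs.
// Field names that contain commas cannot be expressed this way; use field IDs for those.
function parseIncludeFields(raw?: string): string[] {
    if (!raw) return []
    return raw
        .split(",")
        .map((part) => part.trim())
        .filter((part) => part.length > 0)
}

interface ListRecordsBody {
    fields?: string[]
    pageSize?: number
    offset?: string
}

interface ListRecordsRequest {
    url: string
    method: "POST"
    body: ListRecordsBody
}

// Builds a POST request description. Because the field list travels in the JSON
// body instead of the query string, it is not subject to the URL length limit.
function buildListRecordsRequest(
    baseId: string,
    tableId: string,
    includeFields: string[],
    pageSize?: number,
    offset?: string
): ListRecordsRequest {
    const body: ListRecordsBody = {}
    if (includeFields.length > 0) body.fields = includeFields
    if (pageSize !== undefined) body.pageSize = pageSize
    if (offset !== undefined) body.offset = offset
    return {
        url: `https://api.airtable.com/v0/${encodeURIComponent(baseId)}/${encodeURIComponent(tableId)}/listRecords`,
        method: "POST",
        body
    }
}
```

The returned object would then be handed to whatever HTTP client is in use, with the body serialized as JSON and an `Authorization` header added.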