Feature/airtable include fields #1631

dkindlund · 2024-01-29T05:15:02Z

Hey @HenryHengZJ, this PR covers a bunch of quality-of-life improvements to the Airtable DocumentLoader node, including:

Support for including only specific fields from the Airtable table:
- This is huge if your source Airtable table has 100+ fields but you only want about 5 of them stored in the vector DB.
- Without this feature, Airtable users with 100+ fields first have to replicate the 5 fields they want into a separate, intermediate Airtable base before loading them into a vector DB (which gets expensive quickly).
- Include Only Fields can be a comma-separated list of field names or field IDs. If a field name itself contains commas, use the field ID instead of the name.
- Switched the Airtable fetch from GET to POST. With more than about 50 included fields, a GET request hits the maximum URL length and Airtable throws errors. Airtable's recommended workaround is to switch from GET to POST, which lets you specify as many include fields as you need. Reference: https://support.airtable.com/docs/enforcement-of-url-length-limit-for-web-api-requests
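
To illustrate the include-fields behavior described above, here is a minimal sketch of how a comma-separated Include Only Fields value could be turned into a POST request for Airtable's `/listRecords` endpoint. The helper names (`parseIncludeFields`, `buildListRecordsRequest`) and the example IDs are hypothetical and are not taken from this PR's diff; only the endpoint path and the `fields` body key come from Airtable's API.

```typescript
// Hypothetical sketch: helper names are illustrative, not the PR's actual code.
interface ListRecordsRequest {
    url: string
    method: 'POST'
    body: { fields?: string[] }
}

// Split a comma-separated list of field names or field IDs, trimming
// whitespace and dropping empty entries.
function parseIncludeFields(raw?: string): string[] {
    if (!raw) return []
    return raw
        .split(',')
        .map((field) => field.trim())
        .filter((field) => field.length > 0)
}

// Build a POST request for /listRecords. The field list travels in the JSON
// body, so it is not subject to the URL length limit that affects GET.
function buildListRecordsRequest(baseId: string, tableId: string, includeFields?: string): ListRecordsRequest {
    if (!baseId || !tableId) {
        throw new Error('Base ID and Table ID must be specified')
    }
    const fields = parseIncludeFields(includeFields)
    const body: { fields?: string[] } = {}
    if (fields.length > 0) body.fields = fields
    return {
        url: `https://api.airtable.com/v0/${baseId}/${tableId}/listRecords`,
        method: 'POST',
        body
    }
}
```

Because the split is on commas, a field whose name contains a comma would be broken into two entries, which is why using the field ID is the safer choice in that case.
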
Quality of life improvements:
- Checking that the Base ID and Table ID are specified (at a minimum).
- Made the Return All and Limit fields optional instead of both required. Previously, this was confusing because it wasn't clear why a user would need to specify a Limit at all when Return All is true.
- Added explicit logic so that when Return All is true, the Limit field is ignored completely. Previously, a bug meant that even with Return All set to true, the loader would only process up to the number of records specified by Limit, which was really confusing to the user.
- Lastly, if you specify a Limit greater than 100 records (like 264), this code will actually load all 264 records. Previously, the loadLimit() function had no pagination logic, so it would only load up to 100 records, which is the maximum number of records a single Airtable API call can return.
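
The Return All / Limit / pagination behavior described in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `loadRecords` and the injected `fetchPage` function are hypothetical names, and treating "neither Return All nor Limit set" as "load everything" is my assumption. The `pageSize`, `maxRecords`, and `offset` body keys are Airtable API parameters.

```typescript
// Hypothetical sketch: loadRecords and FetchPage are illustrative names.
interface AirtableRecord {
    id: string
    fields: Record<string, unknown>
}
interface ListRecordsBody {
    pageSize: number
    maxRecords?: number
    offset?: string
}
interface ListRecordsPage {
    records: AirtableRecord[]
    offset?: string // present only when more pages remain
}
type FetchPage = (body: ListRecordsBody) => Promise<ListRecordsPage>

// A single Airtable API call returns at most 100 records.
const AIRTABLE_PAGE_SIZE = 100

async function loadRecords(fetchPage: FetchPage, returnAll?: boolean, limit?: number): Promise<AirtableRecord[]> {
    // When Return All is true, Limit is ignored completely.
    // Assumption: with neither option set, all records are loaded.
    const maxRecords = returnAll ? undefined : limit
    const loaded: AirtableRecord[] = []
    let offset: string | undefined
    do {
        const body: ListRecordsBody = { pageSize: AIRTABLE_PAGE_SIZE }
        if (maxRecords !== undefined) body.maxRecords = maxRecords
        if (offset !== undefined) body.offset = offset
        const page = await fetchPage(body)
        loaded.push(...page.records)
        offset = page.offset
    } while (offset !== undefined && (maxRecords === undefined || loaded.length < maxRecords))
    // Client-side guard in case the server returned more than requested.
    return maxRecords === undefined ? loaded : loaded.slice(0, maxRecords)
}
```

In real use, `fetchPage` would POST the body to the `/listRecords` endpoint with the Airtable access token; injecting it here keeps the paging logic testable without network access.
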

Hopefully, this code makes sense. I've been using it for a couple of days now and it seems to be working well for me. Let me know if you have any questions. Thanks!

…ad more than 100 records but not all of them, they can. Currently, there's a bug where the document loader doesn't work on loading more than 100 records because of internal Airtable API limitations. The Airtable API can only fetch up to 100 records per call - anything more than that requires pagination.

…e the intention is the load *all* of the records. Also, marking both the Return All and Limit params as optional, so as to not confuse the user. Making them both required adds a lot of confusion that doesn't make sense. Ideally, the user either specifies Return All OR specifies the Limit value but not BOTH. It seems there's no way to define "conditional requirements" in Flowise params, so it's better to make both params optional.

dkindlund · 2024-01-29T05:17:40Z

Screenshot:

HenryHengZJ · 2024-01-29T10:10:19Z

wow this is awesome!! thanks a lot for the improvement!

doumlegare · 2024-01-31T14:42:40Z

Hello,
When you use Airtable in Flowise, do you upload the data into a vector DB like Pinecone the first time, and then create an automation that updates the data in the DB automatically whenever an update is made in Airtable?
Thanks for sharing your experience.

dkindlund · 2024-01-31T17:00:18Z

Hi @doumlegare, the DocumentLoader node doesn't support syncing Airtable record updates into any vector DB at this time. This is a known limitation within Flowise. In short, any documents you load through this automation will always be added to the vector DB. If you had already loaded an earlier version of a document (from Airtable) into the vector DB and you run the load again, you will end up with 2 documents in the vector DB (duplicates).

For now, you have to implement a mechanism outside of Flowise to keep track of which documents were already added to the vector DB and which documents need to be updated in the vector DB.
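
As a rough illustration of what such an external tracking mechanism might look like (purely a sketch; `UpsertTracker` is a hypothetical name, and a real version would persist its state somewhere durable rather than in memory), you can remember a fingerprint of each Airtable record by record ID and use it to decide whether to insert, update, or skip:

```typescript
// Hypothetical sketch of dedupe tracking kept outside Flowise.
type SyncAction = 'insert' | 'update' | 'skip'

class UpsertTracker {
    // Maps an Airtable record ID to a fingerprint of the last-synced fields.
    private seen = new Map<string, string>()

    // Serialize fields with top-level keys sorted, so that key order alone
    // does not make a record look changed.
    private fingerprint(fields: Record<string, unknown>): string {
        const entries = Object.entries(fields).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
        return JSON.stringify(entries)
    }

    decide(recordId: string, fields: Record<string, unknown>): SyncAction {
        const current = this.fingerprint(fields)
        const previous = this.seen.get(recordId)
        if (previous === current) return 'skip'
        this.seen.set(recordId, current)
        return previous === undefined ? 'insert' : 'update'
    }
}
```

Only records that come back as `insert` or `update` would then be sent on to the vector DB; for `update`, the earlier document would also need to be deleted or overwritten there to avoid duplicates.
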

Long term fix is discussed here:
#1638

doumlegare · 2024-01-31T17:28:14Z

Thanks for your fast response, @dkindlund.
So how do you use Airtable as a database right now?
Do you have an example flow?

dkindlund · 2024-01-31T17:37:32Z

@doumlegare, I'm basically using it similarly to how this video describes using Airtable:
https://www.youtube.com/watch?v=5U22PUI7jP0

And yeah, I'm having to implement my own tracking mechanism outside of Flowise to make sure I'm not upserting duplicate documents into the vector store DB.

Darien Kindlund and others added 30 commits December 30, 2023 14:13

Added support to exclude specific Airtable Field Ids

3fb8001

Updated Airtable field exclusion support to use field names instead o…

6006157

…f field ids

Merge branch 'FlowiseAI:main' into main

beefcf1

Added support for gpt-4 and gpt-4-32k models

e88859f

Fixing linting issues using 'yarn lint-fix'

66701ce

Merge branch 'FlowiseAI:main' into main

28bfd41

Made streaming support a configurable option within the AzureChatOpen…

e8c8503

…AI node

Merge remote-tracking branch 'upstream/main' into chore/Update-AzureO…

51a9808

…penAI-LLM-Types

Bumping version

e398247

Removed streaming feature since it broke chatflows

3d2b407

Merge branch 'chore/Update-AzureOpenAI-LLM-Types'

e07f27c

Merge branch 'FlowiseAI:main' into main

27f14ce

Merge branch 'FlowiseAI:main' into main

4d92989

Merge branch 'FlowiseAI:main' into main

c15489c

Merge branch 'FlowiseAI:main' into main

029d5a9

Merge branch 'FlowiseAI:main' into main

b76c3b2

Merge branch 'FlowiseAI:main' into main

dd32a31

Merge branch 'FlowiseAI:main' into main

8ca8e0e

Merge branch 'FlowiseAI:main' into main

6f7b740

Switched to specifying Airtable fields to include rather than exclude…

71f456a

… - this helps reduce the amount of data fetched by the DocumentLoader when there are massive numbers of fields in an Airtable table.

Fixing a bunch of build errors

ae64854

Fixing more build errors

1a7cb5a

Added more error checking and also fixed yet more build errors

72ec787

Clarifying the description for the optional fields param

8ae8481

For some reason, Airtable doesn't like the POST operations, so I need…

456dfab

… to add logging to figure out why this happens

When you switch from GET to POST, you're supposed to adjust the Airta…

3b788e4

…ble URL to use the /listRecords endpoint. I didn't RTFM clearly. This is currently documented here: https://support.airtable.com/docs/enforcement-of-url-length-limit-for-web-api-requests

Fix worked, removing debug logging, and bumped node version.

2237b1a

So Airtable API expects a maxRecords value to be the total set of rec…

dc39d7e

…ords you want across multiple API calls. If the maxRecords is greater than 100, then it will provide pagination hints accordingly.

Darien Kindlund added 3 commits January 28, 2024 20:42

Forgot to make maxRecords optional now

66eef84

Clarifying that the Limit value is ignored when Return All is set to …

b960f06

…true.

Reverting version bump

905c9fc

HenryHengZJ approved these changes Jan 29, 2024

HenryHengZJ merged commit 985e454 into FlowiseAI:main Jan 29, 2024
2 checks passed

JohnBQuinn pushed a commit to JohnBQuinn/Flowise that referenced this pull request Jun 7, 2024

Docs: fix zh-cn sitemap (FlowiseAI#1631)

5c8f2f9

Signed-off-by: Carson Yang <[email protected]>
