Feature/airtable include fields #1631

Merged: 33 commits merged on Jan 29, 2024


dkindlund
Contributor

Hey @HenryHengZJ, this PR covers a number of quality-of-life improvements to the Airtable DocumentLoader node, including:

  • Support for Including Only Specific Fields from the Airtable table:

    • This is huge if your source Airtable table has 100+ fields but you only want, say, 5 of them stored in the Vector DB.
    • Without this feature, Airtable users with 100+ fields who only want 5 of them in a Vector DB are forced to first replicate those 5 fields into a separate, intermediate Airtable base (which gets expensive quickly).
    • Include Only Fields can be a comma-separated list of field names or field IDs. If a field name itself contains commas, use the field ID instead of the name.
    • Switched the GET to a POST for fetching from Airtable: if you include more than about 50 fields, the GET request URL exceeds Airtable's URL length limit and Airtable returns errors. Airtable's recommended workaround is to switch from GET to POST, so that you can specify an unlimited number of include fields. Reference: https://support.airtable.com/docs/enforcement-of-url-length-limit-for-web-api-requests
  • Other quality-of-life improvements:

    • Checking to make sure the Base ID and Table ID are specified (at a minimum).
    • Made the Return All and Limit fields optional instead of both required. Before, this was confusing because it wasn't clear WHY a user needs to specify any Limit at all if Return All is true.
    • Now, when Return All is set to true, the Limit field is ignored completely. Previously, there was a bug where, even with Return All set to true, only up to Limit records would be processed, which was really confusing.
    • Lastly, if you specify a Limit greater than 100 records (like 264), this code will actually load all 264 records. Previously, the loadLimit() function had no pagination logic, so it would only load up to 100 records, the maximum number of records a single Airtable API call can return.
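A minimal TypeScript sketch of the two ideas in the first bullet: splitting the comma-separated Include Only Fields value, and sending the field list in a POST body instead of the URL. It assumes Airtable's POST list-records endpoint (`/v0/{baseId}/{tableIdOrName}/listRecords`) and a personal access token sent as a Bearer token; `parseIncludeFields`, `buildListRecordsRequest`, and the parameter names are hypothetical, not the code merged in this PR.

```typescript
// Hypothetical helpers, not the actual Flowise implementation.

// Splits the "Include Only Fields" input into individual field names or IDs.
// A field name containing commas cannot survive this split, which is why
// field IDs are the safer choice for such fields.
function parseIncludeFields(input: string | undefined): string[] {
  if (!input) return []
  return input
    .split(",")
    .map((field) => field.trim())
    .filter((field) => field.length > 0)
}

interface ListRecordsParams {
  baseId: string
  tableId: string
  accessToken: string
  includeFields: string[]
  maxRecords?: number // omit to let Airtable page through the whole table
  offset?: string // pagination token from the previous response, if any
}

interface PostRequest {
  url: string
  method: "POST"
  headers: Record<string, string>
  body: string
}

// Builds a POST request for the list-records endpoint. The field list travels
// in the JSON body, so it does not count against the URL length limit.
function buildListRecordsRequest(p: ListRecordsParams): PostRequest {
  const body: Record<string, unknown> = { pageSize: 100 } // 100 is the per-page maximum
  if (p.includeFields.length > 0) body.fields = p.includeFields
  if (p.maxRecords !== undefined) body.maxRecords = p.maxRecords
  if (p.offset !== undefined) body.offset = p.offset
  return {
    url: `https://api.airtable.com/v0/${encodeURIComponent(p.baseId)}/${encodeURIComponent(p.tableId)}/listRecords`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${p.accessToken}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify(body)
  }
}
```

With `fetch`, such a request would be sent as `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Keeping the builder free of I/O makes it easy to check what is actually sent.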

Hopefully, this code makes sense -- I've been using it for a couple of days now and it seems to be working for me well. Let me know if you have any questions. Thanks!
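The Limit > 100 fix relies on Airtable's offset-based pagination: each response carries an `offset` token while more records remain. A minimal TypeScript sketch of that loop, assuming a hypothetical `fetchPage` callback that wraps a single list-records call; it is not the merged `loadLimit()` code.

```typescript
// One page from the list-records endpoint, reduced to the parts used here.
interface AirtablePage<T> {
  records: T[]
  offset?: string // present only while more records remain
}

// Hypothetical callback that performs one list-records call for the given
// offset token (undefined for the first page).
type FetchPage<T> = (offset: string | undefined) => Promise<AirtablePage<T>>

// Follows offset tokens until Airtable stops returning one or `limit`
// records have been collected. `limit === undefined` means "load everything".
async function loadRecords<T>(fetchPage: FetchPage<T>, limit?: number): Promise<T[]> {
  const collected: T[] = []
  let offset: string | undefined = undefined
  do {
    // Explicit annotation avoids circular type inference through `offset`.
    const page: AirtablePage<T> = await fetchPage(offset)
    collected.push(...page.records)
    offset = page.offset
  } while (offset !== undefined && (limit === undefined || collected.length < limit))
  // The last page may overshoot the limit, so trim to exactly `limit` records.
  return limit === undefined ? collected : collected.slice(0, limit)
}
```

Passing `undefined` as `limit` loads every record, which is how a Return All setting would map onto this helper. When a limit is set, also sending it as `maxRecords` in the request body lets Airtable stop paging on its own; the client-side `slice` is only a safeguard.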

Darien Kindlund and others added 30 commits December 30, 2023 14:13
… - this helps reduce the amount of data fetched by the DocumentLoader when there are massive numbers of fields in an Airtable table.
… to add logging to figure out why this happens
…ad more than 100 records but not all of them, they can. Currently, there's a bug where the document loader doesn't work on loading more than 100 records because of internal Airtable API limitations. The Airtable API can only fetch up to 100 records per call - anything more than that requires pagination.
…ords you want across multiple API calls. If the maxRecords is greater than 100, then it will provide pagination hints accordingly.
…e the intention is to load *all* of the records. Also, marking both the Return All and Limit params as optional, so as not to confuse the user. Making them both required is confusing. Ideally, the user either specifies Return All OR specifies the Limit value, but not BOTH. It seems there's no way to define "conditional requirements" in Flowise params, so it's better to make both params optional.
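The Return All / Limit rules above, plus the minimum Base ID / Table ID check, fit in a small resolver. This TypeScript sketch uses hypothetical input names and assumes a fallback `DEFAULT_LIMIT` of 100 when neither option is set; the PR does not say what the node does in that case.

```typescript
// Hypothetical input shape mirroring the node's Base ID, Table ID,
// Return All and Limit parameters (all optional at the UI level).
interface AirtableLoaderInputs {
  baseId?: string
  tableId?: string
  returnAll?: boolean
  limit?: number
}

interface ResolvedOptions {
  baseId: string
  tableId: string
  limit?: number // undefined means "load every record"
}

// ASSUMPTION: fallback when neither Return All nor Limit is given.
const DEFAULT_LIMIT = 100

function resolveOptions(inputs: AirtableLoaderInputs): ResolvedOptions {
  const baseId = inputs.baseId?.trim()
  const tableId = inputs.tableId?.trim()
  // Base ID and Table ID are the minimum needed to address a table.
  if (!baseId) throw new Error("Base ID must be specified")
  if (!tableId) throw new Error("Table ID must be specified")

  // Return All takes precedence: Limit is ignored entirely.
  if (inputs.returnAll === true) return { baseId, tableId }

  const limit = inputs.limit ?? DEFAULT_LIMIT
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error("Limit must be a positive integer")
  }
  return { baseId, tableId, limit }
}
```

Resolving the options once, up front, keeps the "Return All wins" rule in a single place instead of scattering `returnAll` checks through the loading code.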
@dkindlund
Contributor Author

[Screenshot]

@HenryHengZJ
Contributor

Wow, this is awesome!! Thanks a lot for the improvement!

HenryHengZJ merged commit 985e454 into FlowiseAI:main on Jan 29, 2024
2 checks passed
@doumlegare

Hello,
When you use Airtable with Flowise, do you upload the data into a vector DB like Pinecone the first time, and then create an automation that updates the DB automatically whenever a record changes in Airtable?
Thanks for sharing your experience.

@dkindlund
Contributor Author

Hi @doumlegare, the DocumentLoader node doesn't currently support syncing Airtable record updates into a vector DB. This is a known limitation within Flowise. In short, every document you load this way is always added to the vector DB. If you already loaded an earlier version of a document from Airtable into the vector DB and you run the flow again, you will end up with two copies of that document in the vector DB (duplicates).

For now, you have to implement a mechanism outside of Flowise to keep track of which documents were already added to the vector DB and which documents need to be updated in the vector DB.
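One way to build such an external tracking mechanism, sketched in TypeScript under stated assumptions: persist, per Airtable record ID, a hash of the content you last upserted, and compare against it on each run. `contentHash`, `planSync`, and the `seen` map are hypothetical; how an "update" is applied (for example, deleting the old vectors before re-adding) depends on your vector store and is not shown.

```typescript
import { createHash } from "node:crypto"

interface SourceRecord {
  id: string // Airtable record ID
  fields: Record<string, unknown>
}

type SyncAction = "add" | "update" | "skip"

// Hash of a record's fields. Top-level keys are sorted so that field order
// does not change the hash; nested objects are hashed as serialized.
function contentHash(fields: Record<string, unknown>): string {
  const entries = Object.keys(fields).sort().map((key) => [key, fields[key]])
  return createHash("sha256").update(JSON.stringify(entries)).digest("hex")
}

// `seen` maps record ID -> hash stored after the last successful upsert.
// Returns what should happen to each incoming record; it does not modify `seen`.
function planSync(records: SourceRecord[], seen: Map<string, string>): Map<string, SyncAction> {
  const plan = new Map<string, SyncAction>()
  for (const record of records) {
    const previous = seen.get(record.id)
    if (previous === undefined) plan.set(record.id, "add")
    else if (previous !== contentHash(record.fields)) plan.set(record.id, "update")
    else plan.set(record.id, "skip")
  }
  return plan
}
```

After each successful upsert, store the new hash in `seen` (wherever you persist it). Records deleted in Airtable never appear in `records`, so detecting them would need a separate pass over the keys of `seen`.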

The long-term fix is discussed in #1638.

@doumlegare

Thanks for your fast response, @dkindlund!
So how are you using Airtable as a database right now?
Do you have an example flow?

@dkindlund
Contributor Author

@doumlegare, I'm basically using it the way this video describes:
https://www.youtube.com/watch?v=5U22PUI7jP0

And yeah, I'm having to implement my own tracking mechanism outside of Flowise to make sure I'm not upserting duplicate documents into the vector store DB.

JohnBQuinn pushed a commit to JohnBQuinn/Flowise that referenced this pull request Jun 7, 2024