-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SFR-2306: Integrating with Airtable to get backlist records #449
Conversation
This recent commit only adds the offset to the conditional for when a complete process is run for right now. My next commit will be focused on using the filterByFormula parameter in the Airtbale API to retrieve records based on the Last Modified field I recently added to the UofMichigan view. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Dmitri, thanks for opening this PR. I'm really glad to see work on the Airtable side of things.
In general, I'm wondering if we can refine get_records
and get_records_json
to be a little more concise and readable. In get_records_json
I see a lot of control flow blocks that have similar code content, but slightly different configurations of variables. Is it possible to generalize some of this functionality, and to pull out and separately declare repeated variable declarations (e.g. is it possible to name the formulas and not repeat them)?
In get_records
I see a type hint, which is great (I think it would be good to add type hints to get_records_json
too. I think the updated variable name in line 30 is helpful, but it's still difficult for me to follow the logic in this function. And although the type hint is there, I don't see anything returned.
At a high level, I think it's hard for me to understand the data structure manipulation happening here when I don't have an example of the Airtable json response to hand. I think giving things like meta_dict
some more specific names that describe not just the data structure, but what's contained in those structures, could help add some clarity for someone who is just glancing at the file.
|
||
logger = create_log(__name__) | ||
|
||
BASE_URL = "https://api.airtable.com/v0/appBoLf4lMofecGPU/Publisher%20Backlists%20%26%20Collections%20%F0%9F%93%96?view=UofMichigan%20Press" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on our conversation on Wednesday - do we want to update this to look at the all lists view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with changing the view to All lists view since we clarified which view we will be using for the process in the future.
if not start_timestamp: | ||
start_timestamp = datetime.now(timezone.utc).replace(tzinfo=None) - timedelta(hours=24) | ||
|
||
start_date_time_str = start_timestamp.strftime("%Y-%m-%d %H:%M:%S.%f") | ||
start_date_time_encoded = urllib.parse.quote(start_date_time_str) | ||
is_same_date_time_filter = f"IS_SAME(%7BLast%20Modified%7D,%20%22{start_date_time_encoded}%22,%20%22second%22" | ||
is_after_date_time_filter = f"%20IS_AFTER(%7BLast%20Modified%7D,%20%22{start_date_time_encoded}%22" | ||
|
||
filter_by_formula = f"OR({is_same_date_time_filter}),{is_after_date_time_filter}))" | ||
return filter_by_formula |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice - this looks good. Do the Airtable last modified dates have seconds to them as well? Could we simplify this to do is after without the seconds or even minutes?
Nit: recommend adding a newline between the body of this function and the return statement for readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Airtable last modified dates do have seconds in their response like this: 2024-06-20T17:22:51.000Z
I just tested out the IS_SAME filter in Insomnia and unit of time like seconds aren't required in the filter parameter which isn't stated in the Airtable formulas sheet on their website unlike other filters that detail when a parameter is optional. The unit of time details how we should match the dates up to a certain unit like hours, minutes, and seconds. Therefore, I can get rid of the unit of time parameter in that filter to simplify the filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
if filter_by_formula: | ||
url_format = f'{BASE_URL}&filterByFormula={filter_by_formula}&pageSize={limit}' | ||
else: | ||
url_format = f'{BASE_URL}&pageSize={limit}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like you can simplify this because we are adding the pageSize in both cases.
url = f'{BASE_URL}&pageSize={limit}'
if filter_by_formula:
url += f'&filterByFormula{filter_by_formula}'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appreciate the suggestion!
array_json_records = self.get_records_json(full_import, start_timestamp, offset, limit) | ||
|
||
|
||
for json_dict in array_json_records: | ||
for records_value in json_dict['records']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend that get_records_array does this logic to return the records. that way this function is only getting back a list of json records and has to simply map it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I need to clarify what array_json_records consists of. It seems from what you're proposing that array_json_records looks like this:
[{"records": [values from all records from airtable API]}]
but in actuality it looks like this:
[{"records": [value from first 100 records from airtable]}, {"records":[value from next 100 records from airtable]}]
That's why I can't just return the values from the records key one time since there are several records due to their being multiple dictionaries in the array.
Description
Testing
make integration
pytest -k test_pub_backlist_process