Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SFR-2306: Integrating with Airtable to get backlist records #449

Merged
merged 6 commits into from
Nov 22, 2024

Conversation

kylevillegas93
Copy link
Contributor

@kylevillegas93 kylevillegas93 commented Nov 15, 2024

Description

  • Integrates with Airtable to get backlist records

Testing

make integration
pytest -k test_pub_backlist_process

@mitri-slory
Copy link
Contributor

This recent commit only adds the offset to the conditional for when a complete process is run for right now. My next commit will be focused on using the filterByFormula parameter in the Airtbale API to retrieve records based on the Last Modified field I recently added to the UofMichigan view.

services/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@Apophenia Apophenia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Dmitri, thanks for opening this PR. I'm really glad to see work on the Airtable side of things.

In general, I'm wondering if we can refine get_records and get_records_json to be a little more concise and readable. In get_records_json I see a lot of control flow blocks that have similar code content, but slightly different configurations of variables. Is it possible to generalize some of this functionality, and to pull out and separately declare repeated variable declarations (e.g. is it possible to name the formulas and not repeat them)?

In get_records I see a type hint, which is great (I think it would be good to add type hints to get_records_json too. I think the updated variable name in line 30 is helpful, but it's still difficult for me to follow the logic in this function. And although the type hint is there, I don't see anything returned.

At a high level, I think it's hard for me to understand the data structure manipulation happening here when I don't have an example of the Airtable json response to hand. I think giving things like meta_dict some more specific names that describe not just the data structure, but what's contained in those structures, could help add some clarity for someone who is just glancing at the file.

services/sources/publisher_backlist_service.py Outdated Show resolved Hide resolved
services/sources/publisher_backlist_service.py Outdated Show resolved Hide resolved

logger = create_log(__name__)

BASE_URL = "https://api.airtable.com/v0/appBoLf4lMofecGPU/Publisher%20Backlists%20%26%20Collections%20%F0%9F%93%96?view=UofMichigan%20Press"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on our conversation on Wednesday - do we want to update this to look at the all lists view?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with changing the view to All lists view since we clarified which view we will be using for the process in the future.

Comment on lines 81 to 90
if not start_timestamp:
start_timestamp = datetime.now(timezone.utc).replace(tzinfo=None) - timedelta(hours=24)

start_date_time_str = start_timestamp.strftime("%Y-%m-%d %H:%M:%S.%f")
start_date_time_encoded = urllib.parse.quote(start_date_time_str)
is_same_date_time_filter = f"IS_SAME(%7BLast%20Modified%7D,%20%22{start_date_time_encoded}%22,%20%22second%22"
is_after_date_time_filter = f"%20IS_AFTER(%7BLast%20Modified%7D,%20%22{start_date_time_encoded}%22"

filter_by_formula = f"OR({is_same_date_time_filter}),{is_after_date_time_filter}))"
return filter_by_formula
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - this looks good. Do the Airtable last modified dates have seconds to them as well? Could we simplify this to do is after without the seconds or even minutes?

Nit: recommend adding a newline between the body of this function and the return statement for readability.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Airtable last modified dates do have seconds in their response like this: 2024-06-20T17:22:51.000Z I just tested out the IS_SAME filter in Insomnia and unit of time like seconds aren't required in the filter parameter which isn't stated in the Airtable formulas sheet on their website unlike other filters that detail when a parameter is optional. The unit of time details how we should match the dates up to a certain unit like hours, minutes, and seconds. Therefore, I can get rid of the unit of time parameter in that filter to simplify the filter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

Comment on lines 98 to 101
if filter_by_formula:
url_format = f'{BASE_URL}&filterByFormula={filter_by_formula}&pageSize={limit}'
else:
url_format = f'{BASE_URL}&pageSize={limit}'
Copy link
Contributor Author

@kylevillegas93 kylevillegas93 Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like you can simplify this because we are adding the pageSize in both cases.

url = f'{BASE_URL}&pageSize={limit}'

if filter_by_formula:
  url += f'&filterByFormula{filter_by_formula}'
  

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the suggestion!

Comment on lines 29 to 33
array_json_records = self.get_records_json(full_import, start_timestamp, offset, limit)


for json_dict in array_json_records:
for records_value in json_dict['records']:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend that get_records_array does this logic to return the records. that way this function is only getting back a list of json records and has to simply map it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need to clarify what array_json_records consists of. It seems from what you're proposing that array_json_records looks like this:

[{"records": [values from all records from airtable API]}]

but in actuality it looks like this:

[{"records": [value from first 100 records from airtable]}, {"records":[value from next 100 records from airtable]}]

That's why I can't just return the values from the records key one time since there are several records due to their being multiple dictionaries in the array.

@mitri-slory mitri-slory merged commit 681ce34 into main Nov 22, 2024
1 check passed
@mitri-slory mitri-slory deleted the SFR-2306 branch November 22, 2024 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants