[BEAM-34076] Implement TTL-based caching for BigQuery table definitions #34135
fcba87c to cfff9ef
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
b31af48 to cfff9ef
cfff9ef to 6cc2629
f80bd52 to 890670a
- Fix field_type.upper() redundant call in beam_row_from_dict
- Fix temp dataset handling logic in BigQueryWrapper.__init__
- Add comprehensive test coverage for table definition caching
- Add proper thread safety with RLock
- Add documentation and comments

Fixes apache#34076
890670a to 8c5d460
R: @stankiewicz Thanks for the review. I've addressed your feedback:
Could you please review the updated code in Please let me know if you'd like me to explain anything further. |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
- Reorder methods for better readability (get_query_location before get_table)
- Use _get_temp_dataset() instead of inlining logic
- Add comprehensive test coverage for temp dataset handling
- Fix method organization as per review feedback

Part of apache#34076
c496455 to 7128d38
7128d38 to ac1c788
Hi @Strikerrx01, friendly ping to look at finishing this PR. Thanks!
…for improved code quality
@Strikerrx01 any updates?
@chamikaramj I apologize for the lack of updates on this PR over the past week. I've been dealing with a medical emergency that prevented me from working on this. I'm now back and have just pushed an update with the formatting changes as requested. Thank you for your patience, and I'm ready to address any remaining feedback to get this PR merged.
There are formatting issues. Please see https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-LintandFormattingChecks and run yapf, thanks!
Description
This change implements a thread-safe caching mechanism for BigQuery table definitions in the BigQueryWrapper class to address issue BEAM-34076. The implementation uses a TTL (Time-To-Live) caching strategy to reduce BigQuery API calls while maintaining data freshness.
Motivation
Previously, get_table() was called independently by each worker, which could lead to BigQuery quota issues when multiple workers accessed the same table definitions. This implementation reduces API calls by reusing a cached table definition until its TTL expires.
Modifications
cachetools.TTLCache with configurable cache parameters
threading.RLock
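For reviewers who want a picture of the pattern, here is a minimal sketch of a TTL cache guarded by an RLock. It is not the code in this PR: it uses a small standard-library stand-in rather than cachetools.TTLCache so that it runs without extra dependencies, and the names TableDefinitionCache, fetch_fn, ttl_seconds, max_size, and clock are hypothetical.

```python
import threading
import time


class TableDefinitionCache:
    """Illustrative TTL cache for table definitions (hypothetical, not Beam code)."""

    def __init__(self, fetch_fn, ttl_seconds=3600.0, max_size=1024,
                 clock=time.monotonic):
        self._fetch_fn = fetch_fn      # assumed callable that performs the real API call
        self._ttl = ttl_seconds        # how long an entry stays fresh
        self._max_size = max_size      # upper bound on the number of cached entries
        self._clock = clock            # injectable so tests can control time
        self._lock = threading.RLock()
        self._entries = {}             # key -> (expires_at, table)

    def get_table(self, project_id, dataset_id, table_id):
        key = (project_id, dataset_id, table_id)
        with self._lock:
            now = self._clock()
            entry = self._entries.get(key)
            if entry is not None and entry[0] > now:
                return entry[1]        # cache hit: no API call
            # Cache miss or expired entry: fetch while holding the lock, so
            # concurrent callers for the same key trigger only one fetch.
            table = self._fetch_fn(project_id, dataset_id, table_id)
            if (key not in self._entries and self._entries
                    and len(self._entries) >= self._max_size):
                # At capacity: evict the entry that expires soonest.
                soonest = min(self._entries, key=lambda k: self._entries[k][0])
                del self._entries[soonest]
            self._entries[key] = (now + self._ttl, table)
            return table


# Usage with a fake fetch function standing in for the BigQuery call.
def fake_fetch(project_id, dataset_id, table_id):
    return {"table": f"{project_id}.{dataset_id}.{table_id}"}


cache = TableDefinitionCache(fake_fetch, ttl_seconds=60.0)
first = cache.get_table("my-project", "my_dataset", "my_table")
second = cache.get_table("my-project", "my_dataset", "my_table")  # served from cache
```

Holding the lock during the fetch keeps the sketch simple but serializes all lookups. A per-key lock would allow fetches for different tables to proceed in parallel, at the cost of more bookkeeping.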
Tests
Added comprehensive unit tests that verify:
Documentation
Added method documentation for all new public methods and internal implementation
details to ensure maintainability and clarity.
Fixes #34076
Please Review
@apache/beam-maintainers