-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(low-code): Add API Budget #314
Conversation
📝 WalkthroughWalkthroughThis pull request introduces enhancements for managing API rate limiting and request matching. New schema definitions and model classes—such as APIBudget, HTTPAPIBudget, FixedWindowCallRatePolicy, MovingWindowCallRatePolicy, UnlimitedCallRatePolicy, Rate, and HttpRequestRegexMatcher—are added to the declarative component schema. The changes also update the Changes
Sequence Diagram(s)sequenceDiagram
participant Dev as Developer
participant Factory as ModelToComponentFactory
participant APIB as APIBudget
participant Req as HttpRequester
participant Client as HttpClient
Dev->>Factory: Invoke creation methods (create_http_api_budget, etc.)
Factory->>APIB: Instantiate APIBudget with rate policies
Factory->>Req: Inject APIBudget into HttpRequester
Req->>Client: Initialize HttpClient with api_budget parameter
Would you like to see additional details or adjustments to the sequence diagram, wdyt? Possibly related PRs
Suggested labels
Suggested reviewers
If you have any preferences for additional reviewers or changes, just let me know! ✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (11)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4)
6-6
: Would you like to confirm the necessity of this import?
It's great thatdatetime
andtimedelta
are explicitly imported for the new rate-limiting features. However, if they're only used sparingly, perhaps consider consolidating related imports or confirming there's no overlap with existing standard library imports to keep the code tidy. wdyt?
660-674
: Consider stronger typing for method.
Right now,method
is a free-form string. Is it worth enforcing, say, anHttpMethod
enum to reduce potential typos or mismatches when matching requests? wdyt?
1654-1664
: Unlimited policy name alignment.
For an “unlimited” policy, do we still track calls or simply skip rate limiting entirely? Maybe clarify or log usage to avoid confusion. wdyt?
1797-1836
: Consider lowering the maximum attempts.
Havingmaximum_attempts_to_acquire
set to 100000 might risk excessive retries if the API limit is far lower. Do you think a smaller default (e.g., 100 or 1000) would be safer? wdyt?airbyte_cdk/sources/declarative/requesters/http_requester.py (2)
25-25
: Validate the need for this import.
TheAPIBudget
import is a neat addition. If the scenario for rate-limit budgeting is common, great! Otherwise, consider lazy imports or clarifying usage. wdyt?
59-59
: Add docstrings for clarity?
Introducingapi_budget
is helpful, but it might benefit from an inline docstring or comment explaining how it's typically configured or used. wdyt?airbyte_cdk/sources/declarative/declarative_component_schema.yaml (5)
1368-1410
: APIBudget Component Addition: Is This the Final Form?
The newAPIBudget
component looks comprehensive—it clearly defines the available properties such as policies, header names, status codes, and even a high default formaximum_attempts_to_acquire
. I'm wondering, is the default value (100000) formaximum_attempts_to_acquire
intentional? Also, would you consider providing an example or additional constraints for thepolicies
array to help guide users? wdyt?
1446-1470
: Moving Window Call Rate Policy: Flexibility is Key!
This component neatly captures the idea of a moving window rate limit through itsrates
andmatchers
. Would you consider imposing any constraints or suggesting a default for therates
array to help prevent misconfiguration, or is keeping it fully flexible the intent here? wdyt?
1471-1487
: Unlimited Call Rate Policy: Keeping It Open!
TheUnlimitedCallRatePolicy
is straightforward and allows for unrestricted calls for specific requests. Do you feel that we should restrict additional properties for tighter validation, or is the open-ended design intentional to provide extra flexibility? wdyt?
1506-1529
: HTTP Request Matcher: Versatile Matcher!
TheHttpRequestMatcher
is well-structured and flexible, covering method, URL, parameters, and headers. One thought—would it be a good idea to enforce a URI format on theurl
property (or add a note about what format is expected) to help catch misconfigurations early? wdyt?
1802-1805
: HttpRequester API Budget Integration: Nice Touch!
Integrating the newapi_budget
property into theHttpRequester
component by referencing theAPIBudget
definition looks like a smooth enhancement. Might it be helpful to add an example or extra documentation to illustrate howapi_budget
interacts with the HTTP requester and other related components? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(2 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(5 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(15 hunks)airbyte_cdk/sources/declarative/requesters/http_requester.py
(3 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
[error] 847-847: Argument 2 to 'LegacyToPerPartitionStateMigration' has incompatible type 'CustomIncrementalSync | DatetimeBasedCursor | None'; expected 'CustomIncrementalSync | DatetimeBasedCursor'
[error] 1146-1146: Argument 2 to 'ClampingEndProvider' has incompatible type 'Callable[[], datetime]'; expected 'Callable[[], CursorValueType]'
[error] 1164-1164: Argument 2 to 'ClampingEndProvider' has incompatible type 'Callable[[], datetime]'; expected 'Callable[[], CursorValueType]'
[error] 1172-1172: Argument 2 to 'ClampingEndProvider' has incompatible type 'Callable[[], datetime]'; expected 'Callable[[], CursorValueType]'
[error] 1190-1190: Argument 'start' to 'ConcurrentCursor' has incompatible type 'datetime'; expected 'CursorValueType | None'
[error] 1192-1192: Argument 'end_provider' to 'ConcurrentCursor' has incompatible type 'Callable[[], datetime]'; expected 'Callable[[], CursorValueType]'
[error] 2977-2977: Item 'None' of 'list[FixedWindowCallRatePolicy | MovingWindowCallRatePolicy | UnlimitedCallRatePolicy] | None' has no attribute 'iter' (not iterable)
[error] 2982-2982: Argument 'ratelimit_reset_header' to 'HttpAPIBudget' has incompatible type 'str | None'; expected 'str'
[error] 2983-2983: Argument 'ratelimit_remaining_header' to 'HttpAPIBudget' has incompatible type 'str | None'; expected 'str'
[error] 2984-2984: Argument 'status_codes_for_ratelimit_hit' to 'HttpAPIBudget' has incompatible type 'list[int] | None'; expected 'tuple[int]'
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (13)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4)
646-658
: Should we add validation for non-positive values?
Currently, there's no check to ensurelimit
is a positive integer orinterval
is non-zero. Would adding a simple constraint or Pydantic validator help prevent misconfiguration issues? wdyt?
1612-1635
: Handling timezone and negative values.
next_reset_ts
andperiod
could introduce edge cases if they use local times or negative intervals. Maybe add a note or validator to ensure these are positive and timezone-aware. wdyt?
1637-1652
: Validate intervals in rates.
If anyRate
has an invalid or zero interval, the moving window might fail. Would you like to add checks ensuring intervals are valid durations? wdyt?
2107-2111
: Verify synergy with existing rate limit handling.
Addingapi_budget
is excellent for budget-based rate limiting. Are we certain it doesn't overlap with existing error-handling or backoff logic? A quick check for consistency or consolidated handling might help. wdyt?airbyte_cdk/sources/declarative/requesters/http_requester.py (1)
96-96
: Double-check integrated error handling.
We’re now passingapi_budget
intoHttpClient
. Are we sureHttpClient
coordinates gracefully with the existing error handler to handle rate limits cohesively? Maybe cross-check or comment on interactions. wdyt?airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (6)
115-117
: LGTM! Clean import organization.The new imports are well-organized and follow the existing pattern of grouping related models and components together.
Also applies to: 232-234, 250-252, 307-309, 328-330, 374-376, 490-497
636-641
: LGTM! Factory mapping is properly updated.The new components are correctly mapped to their respective factory methods in the
PYDANTIC_MODEL_TO_CONSTRUCTOR
dictionary.
1952-1956
: LGTM! Clean HttpRequester integration.The API budget integration into HttpRequester is clean and maintains optional behavior.
Also applies to: 1978-1978
2984-3024
: LGTM! Clean implementation of rate policy factories.The implementations for
create_fixed_window_call_rate_policy
,create_moving_window_call_rate_policy
, andcreate_unlimited_call_rate_policy
are well-structured and follow the factory pattern consistently.🧰 Tools
🪛 GitHub Actions: Linters
[error] 2984-2984: Argument 'status_codes_for_ratelimit_hit' to 'HttpAPIBudget' has incompatible type 'list[int] | None'; expected 'tuple[int]'
3025-3030
: LGTM! Simple Rate component factory.The
create_rate
implementation is clean and straightforward.
3031-3039
: LGTM! Clean HttpRequestMatcher factory.The
create_http_request_matcher
implementation is clean and properly handles all the matcher fields.airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
1410-1445
: Fixed Window Call Rate Policy: Looks Solid!
TheFixedWindowCallRatePolicy
is clearly defined with the required properties (next_reset_ts
,period
,call_limit
, andmatchers
). Do you think it might be beneficial to enforce or validate the duration format for theperiod
property for consistency across the schema? wdyt?
1488-1505
: Rate Component: Clear and Concise!
This component succinctly defines a rate limit with a numericlimit
and aninterval
(formatted as a duration). Would it be useful to add extra validation to ensure that theinterval
value adheres to the expected ISO8601 duration format, or is that handled elsewhere in the system? wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (18)
airbyte_cdk/sources/streams/call_rate.py (4)
103-110
: Expanded docstring for HttpRequestMatcher
This descriptive docstring is quite helpful. Would you consider adding an example use-case in the docstring to make adoption even simpler? wdyt?
121-126
: Docstring clarification for constructor parameters
Great job documenting param usage. Might it be clearer to mention that headers are case-insensitively matched in the param doc directly? wdyt?
164-184
: Regex match and query/header dict checks
Everything seems consistent. The dictionary subset check with_match_dict
is a neat approach. Consider logging a debug line if matches fail for easy debugging. wdyt?
435-443
: New APIBudget dataclass usage
This refactor to a dataclass clarifies initialization. It might be beneficial to specify defaults for “policies” if empty. wdyt?airbyte_cdk/sources/declarative/manifest_declarative_source.py (1)
140-143
: New logic for setting api_budget
Adding the api_budget integration is a valuable enhancement. Could be good to add a small debug log when the budget is set to help track usage. wdyt?airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
1806-1828
: New APIBudget class
This is a solid approach to grouping multiple policies. Maybe a minimal usage example in docstring (like how to define it in the manifest) could help. wdyt?
1900-1904
: api_budget field in DeclarativeSource
This top-level field is a nice injection point for rate-limiting. Ensuring no conflicts if multiple budgets are provided might be worth documenting. Any plan for that? wdyt?Also applies to: 1931-1935
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
553-553
: Consider using a more specific type annotation for_api_budget
.The current type annotation uses
Optional[Union[APIBudget, HttpAPIBudget]]
. Since both types share a common interface for rate limiting, would it make sense to use a protocol or abstract base class instead? This could make the code more maintainable and extensible. wdyt?- self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None + self._api_budget: Optional[RateLimitingBudget] = None # where RateLimitingBudget is a Protocol or ABC
2977-2999
: Consider extracting default values to constants.The
create_http_api_budget
method has several hardcoded default values. Would it be cleaner to extract these to class-level constants? This would make it easier to maintain and modify these values in the future. wdyt?+ DEFAULT_RATELIMIT_RESET_HEADER = "ratelimit-reset" + DEFAULT_RATELIMIT_REMAINING_HEADER = "ratelimit-remaining" + DEFAULT_RATELIMIT_STATUS_CODES = (429,) + DEFAULT_MAX_ATTEMPTS = 100000 def create_http_api_budget( self, model: HTTPAPIBudgetModel, config: Config, **kwargs: Any ) -> HttpAPIBudget: policies = [ self._create_component_from_model(model=policy, config=config) for policy in model.policies ] return HttpAPIBudget( policies=policies, maximum_attempts_to_acquire=model.maximum_attempts_to_acquire if model.maximum_attempts_to_acquire - else 100000, + else self.DEFAULT_MAX_ATTEMPTS, ratelimit_reset_header=model.ratelimit_reset_header if model.ratelimit_reset_header - else "ratelimit-reset", + else self.DEFAULT_RATELIMIT_RESET_HEADER, ratelimit_remaining_header=model.ratelimit_remaining_header if model.ratelimit_remaining_header - else "ratelimit-remaining", + else self.DEFAULT_RATELIMIT_REMAINING_HEADER, status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit if model.status_codes_for_ratelimit_hit - else (429,), + else self.DEFAULT_RATELIMIT_STATUS_CODES, )
3059-3073
: Consider using pattern matching for API budget type selection.The
set_api_budget
method uses if-else statements for type selection. Since Python 3.10+ supports pattern matching, would it be cleaner to use a match statement here? wdyt?def set_api_budget(self, component_definition: ComponentDefinition, config: Config) -> None: model_str = component_definition.get("type") - if model_str == "APIBudget": - # Annotate model_type as a type that is a subclass of BaseModel - model_type: Union[Type[APIBudgetModel], Type[HTTPAPIBudgetModel]] = APIBudgetModel - elif model_str == "HTTPAPIBudget": - model_type = HTTPAPIBudgetModel - else: - raise ValueError(f"Unknown API Budget type: {model_str}") + match model_str: + case "APIBudget": + model_type: Union[Type[APIBudgetModel], Type[HTTPAPIBudgetModel]] = APIBudgetModel + case "HTTPAPIBudget": + model_type = HTTPAPIBudgetModel + case _: + raise ValueError(f"Unknown API Budget type: {model_str}") self._api_budget = self.create_component( model_type=model_type, component_definition=component_definition, config=config )airbyte_cdk/sources/declarative/declarative_component_schema.yaml (8)
43-48
: API Budget Property Addition in the Main Schema
The new "api_budget" property is a clear and useful addition that allows the connector to accept either a generic APIBudget or an HTTP-specific APIBudget configuration. Its placement alongside other core properties looks logical. Would you consider adding an example or further constraints (such as enforcing non-null values) to tighten its usage? wdyt?
1375-1403
: New APIBudget Component Definition
The APIBudget definition is comprehensive, requiring both "type" and "policies" and including a field for "maximum_attempts_to_acquire" with a sensible default. The documentation text is clear about its purpose. Do you think adding an explicit example (or reference to one in our docs) would help users better understand the expected format for policies? wdyt?
1404-1448
: New HTTPAPIBudget Component Definition
This HTTP-specific variant of APIBudget extends the generic APIBudget neatly by adding headers for ratelimit reset and remaining counters, along with status code indicators for a rate limit hit. The defaults (like "ratelimit-reset" and "ratelimit-remaining") provide clarity. Would you consider making these header names configurable via examples in the schema for clarity? wdyt?
1449-1483
: FixedWindowCallRatePolicy Definition Addition
The FixedWindowCallRatePolicy component looks well defined, ensuring that a fixed number of calls can occur within a specified time window. I appreciate the use of ISO8601 date-time format for "next_reset_ts" and the duration format for "period". Would you consider including an example that illustrates a typical value for "period" (e.g. "P1D") to further guide users? wdyt?
1484-1508
: MovingWindowCallRatePolicy Component Addition
The MovingWindowCallRatePolicy is defined to allow a fixed number of calls within a moving time window and leverages an array of "Rate" objects along with matchers. This usage is consistent with our design. Do you think providing a brief inline example for the "rates" array might enhance clarity for users new to this concept? wdyt?
1509-1525
: UnlimitedCallRatePolicy Component Definition
The UnlimitedCallRatePolicy is straightforward, requiring only the "type" and the "matchers" array. This clearly communicates that the policy has no call limits. Would you be open to adding a note in the description about typical use cases for this policy? wdyt?
1527-1544
: Rate Component Definition Addition
Defining the "Rate" component to encapsulate a limit and an interval adds sharp granularity to our rate limiting description. The documentation mentions that "interval" uses the duration format, which is great. Would it be beneficial to include additional example values (or constraints) in the schema to further illustrate acceptable formats? wdyt?
1545-1575
: HTTP Request Matcher Component Update
The new HttpRequestMatcher definition appears robust and detailed, allowing matching on HTTP method, base URL, path (with regex support), parameters, and headers. This flexible design should serve varied use cases well. Do you feel any additional constraints or examples are needed to clarify how regex patterns should be constructed? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(3 hunks)airbyte_cdk/sources/declarative/manifest_declarative_source.py
(1 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(6 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(12 hunks)airbyte_cdk/sources/streams/call_rate.py
(7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (11)
airbyte_cdk/sources/streams/call_rate.py (5)
9-14
: Added imports for "re" and updated dataclass/typing references
Were these imports introduced for regex support and type hints? Everything looks good in principle. Just confirming that no overshadowing of existing imports happens.
115-117
: Constructor parameters for HttpRequestMatcher
The new parameters 'url_base' and 'url_path_pattern' increase flexibility. Looks good, but do we need to handle trailing slashes for patterns as well? wdyt?
[approve]
127-139
: Initialization logic for method, url_base, headers, and params
The transformations (like rstrip on url_base, lowercasing header keys, stringifying query params) look correct. No immediate issues spotted.
158-161
: Method equality check
Ensuring we compare uppercase variants is crucial for consistency. This looks correct. No concerns here.
445-466
: acquire_call() method referencing policies
Your iteration to find the first matching policy is neat. The fallback logging when no policy matches is also helpful. Implementation looks correct.airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
6-6
: Importing datetime & timedelta
Looks like these are required for new rate-limiting classes. This addition aligns with usage below.
646-658
: Rate class definition
Using a pydantic model for “limit” and “interval” is neat. One small thought: do we need to ensure the interval is > 0? wdyt?airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)
115-117
: LGTM! Clean import organization.The new imports are well-organized and follow the existing pattern of grouping related models and components together.
Also applies to: 232-234, 244-246, 253-255, 310-312, 331-333, 377-379, 493-501
1952-1953
: LGTM! Clean integration with HttpRequester.The API budget is cleanly integrated into the HttpRequester class through the
api_budget
parameter.Also applies to: 1974-1974
2964-2976
: Consider adding validation for empty policies list.The
create_api_budget
method doesn't validate if the policies list is empty. An empty policies list might lead to unexpected behavior. wdyt about adding a validation check?def create_api_budget(self, model: APIBudgetModel, config: Config, **kwargs: Any) -> APIBudget: policies = [ self._create_component_from_model(model=policy, config=config) for policy in model.policies ] + if not policies: + raise ValueError("At least one policy must be specified for APIBudget") return APIBudget( policies=policies, maximum_attempts_to_acquire=model.maximum_attempts_to_acquire if model.maximum_attempts_to_acquire else 100000, )✅ Verification successful
Proposed Enhancement: Validate Non-Empty Policies List in create_api_budget
Hi there, upon reviewing the APIBudget creation code, I noticed that the policies list is built from model.policies without any checks for emptiness. If model.policies is empty, APIBudget ends up being initialized with an empty policies list, which might lead to unexpected behavior later on. Would you be open to adding a validation step (for example, raising a ValueError when policies is empty) to fail fast in case of misconfiguration? wdyt?
- Confirm that model.policies should indeed contain at least one policy.
- Adding an early validation helps prevent downstream errors by ensuring APIBudget is always constructed with valid data.
3001-3014
: Consider validating period format.The
create_fixed_window_call_rate_policy
method usesparse_duration
without validating the period format. Would it be safer to add error handling for invalid duration formats? wdyt?def create_fixed_window_call_rate_policy( self, model: FixedWindowCallRatePolicyModel, config: Config, **kwargs: Any ) -> FixedWindowCallRatePolicy: matchers = [ self._create_component_from_model(model=matcher, config=config) for matcher in model.matchers ] + try: + period = parse_duration(model.period) + except ValueError as e: + raise ValueError(f"Invalid period format: {model.period}. Expected ISO 8601 duration format.") from e return FixedWindowCallRatePolicy( next_reset_ts=model.next_reset_ts, - period=parse_duration(model.period), + period=period, call_limit=model.call_limit, matchers=matchers, )✅ Verification successful
🏁 Script executed:
#!/bin/bash # Search for the definition of parse_duration in the repository to understand its error handling rg "def parse_duration" -n .Length of output: 30
🏁 Script executed:
#!/bin/bash # Let's search for the definition of parse_duration in all Python files fd -e py -x grep -H "def parse_duration" {} \;Length of output: 51
🏁 Script executed:
#!/bin/bash # Let's search for any import lines that mention parse_duration to determine its source. rg -n "import .*parse_duration" .Length of output: 571
Based on our investigation, it looks like parse_duration is directly imported from the external isodate package without any internal safeguards in our code. Since an invalid ISO 8601 duration string would likely cause parse_duration to throw an exception, adding a try/except block to catch and emit a friendlier error makes sense. This would improve robustness when an unexpected format is provided. Would you consider incorporating the try/except as proposed in your patch snippet (wdyt)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
3049-3063
: Enhance error handling in set_api_budget.The error message could be more descriptive and include the list of supported types. Also, consider using a more type-safe approach. wdyt?
def set_api_budget(self, component_definition: ComponentDefinition, config: Config) -> None: model_str = component_definition.get("type") + supported_types = {"APIBudget", "HTTPAPIBudget"} + + if not model_str: + raise ValueError("API Budget type not specified") + + if model_str not in supported_types: + raise ValueError( + f"Unsupported API Budget type: {model_str}. " + f"Supported types are: {', '.join(supported_types)}" + ) - if model_str == "APIBudget": + model_type: Type[Union[APIBudgetModel, HTTPAPIBudgetModel]] = ( + APIBudgetModel if model_str == "APIBudget" else HTTPAPIBudgetModel + ) - # Annotate model_type as a type that is a subclass of BaseModel - model_type: Union[Type[APIBudgetModel], Type[HTTPAPIBudgetModel]] = APIBudgetModel - elif model_str == "HTTPAPIBudget": - model_type = HTTPAPIBudgetModel - else: - raise ValueError(f"Unknown API Budget type: {model_str}") self._api_budget = self.create_component( model_type=model_type, component_definition=component_definition, config=config )
🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
553-553
: Consider explicitly initializing the _api_budget variable.The variable is implicitly initialized to None through the Optional type. For better readability and maintainability, would you consider making it explicit? wdyt?
- self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None + self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None # Initialized in set_api_budget
2964-2974
: Consider using a constant for the default maximum attempts.The default value of 100000 for
maximum_attempts_to_acquire
could be defined as a class constant for better maintainability. wdyt?class ModelToComponentFactory: + DEFAULT_MAXIMUM_ATTEMPTS = 100000 ... def create_api_budget(self, model: APIBudgetModel, config: Config, **kwargs: Any) -> APIBudget: policies = [ self._create_component_from_model(model=policy, config=config) for policy in model.policies ] return APIBudget( policies=policies, - maximum_attempts_to_acquire=model.maximum_attempts_to_acquire or 100000, + maximum_attempts_to_acquire=model.maximum_attempts_to_acquire or self.DEFAULT_MAXIMUM_ATTEMPTS, )
2975-2990
: Consider using constants for default header names and status codes.The default values for headers and status codes could be defined as class constants for better maintainability. wdyt?
class ModelToComponentFactory: + DEFAULT_RATELIMIT_RESET_HEADER = "ratelimit-reset" + DEFAULT_RATELIMIT_REMAINING_HEADER = "ratelimit-remaining" + DEFAULT_RATELIMIT_STATUS_CODES = (429,) ... def create_http_api_budget(self, model: HTTPAPIBudgetModel, config: Config, **kwargs: Any) -> HttpAPIBudget: policies = [ self._create_component_from_model(model=policy, config=config) for policy in model.policies ] return HttpAPIBudget( policies=policies, maximum_attempts_to_acquire=model.maximum_attempts_to_acquire or self.DEFAULT_MAXIMUM_ATTEMPTS, - ratelimit_reset_header=model.ratelimit_reset_header or "ratelimit-reset", - ratelimit_remaining_header=model.ratelimit_remaining_header or "ratelimit-remaining", - status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or (429,), + ratelimit_reset_header=model.ratelimit_reset_header or self.DEFAULT_RATELIMIT_RESET_HEADER, + ratelimit_remaining_header=model.ratelimit_remaining_header or self.DEFAULT_RATELIMIT_REMAINING_HEADER, + status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or self.DEFAULT_RATELIMIT_STATUS_CODES, )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(12 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
115-117
: LGTM! Well-organized imports.The new imports are properly organized alphabetically and follow the existing pattern of grouping related imports together.
Also applies to: 232-234, 244-246, 253-255, 310-312, 331-333, 377-379, 493-501
2991-3048
: LGTM! Well-implemented factory methods.The implementations of
create_fixed_window_call_rate_policy
,create_moving_window_call_rate_policy
,create_unlimited_call_rate_policy
,create_rate
, andcreate_http_request_matcher
are clean and follow the established patterns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (12)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)
553-553
: Consider using a more specific type annotation for_api_budget
.The current type annotation allows both
APIBudget
andHttpAPIBudget
. Since these have different interfaces, it might be better to use a common base class or interface. wdyt?- self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None + self._api_budget: Optional[BaseBudget] = None # Create a base class/interface that both budget types implement
2979-2994
: Consider customizing default header names through configuration.The HTTP API budget uses hardcoded default header names. Would it be better to make these configurable through the connector config? This would allow easier customization for different APIs. wdyt?
return HttpAPIBudget( policies=policies, maximum_attempts_to_acquire=model.maximum_attempts_to_acquire or 100000, - ratelimit_reset_header=model.ratelimit_reset_header or "ratelimit-reset", - ratelimit_remaining_header=model.ratelimit_remaining_header or "ratelimit-remaining", + ratelimit_reset_header=model.ratelimit_reset_header or config.get("ratelimit_reset_header", "ratelimit-reset"), + ratelimit_remaining_header=model.ratelimit_remaining_header or config.get("ratelimit_remaining_header", "ratelimit-remaining"), status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or (429,), )
3042-3052
: Consider adding pattern validation for HttpRequestMatcher.The URL path pattern and other regex patterns are not validated. Should we validate these at creation time to catch invalid patterns early? wdyt?
def create_http_request_matcher( self, model: HttpRequestRegexMatcherModel, config: Config, **kwargs: Any ) -> HttpRequestRegexMatcher: + # Validate regex patterns + try: + if model.url_path_pattern: + re.compile(model.url_path_pattern) + for pattern in model.params.values(): + re.compile(pattern) + for pattern in model.headers.values(): + re.compile(pattern) + except re.error as e: + raise ValueError(f"Invalid regex pattern: {str(e)}") + return HttpRequestRegexMatcher( method=model.method, url_base=model.url_base, url_path_pattern=model.url_path_pattern, params=model.params, headers=model.headers, )
3053-3067
: Consider using a more descriptive error message in set_api_budget.The error message for unknown API Budget type could be more helpful by listing the supported types. wdyt?
- raise ValueError(f"Unknown API Budget type: {model_str}") + raise ValueError(f"Unknown API Budget type: {model_str}. Supported types are: APIBudget, HTTPAPIBudget")airbyte_cdk/sources/declarative/declarative_component_schema.yaml (8)
43-48
: New "api_budget" Property in DeclarativeSource
This addition introduces the new api_budget field that accepts either a generic APIBudget or an HTTPAPIBudget. It is a flexible way to configure API request limits for all streams. Would you be open to adding an example or sample value in the schema description for clarity?
1375-1403
: APIBudget Component Definition
The APIBudget component is well-defined with required fields for type and policies, and includes a default of 100000 for maximum_attempts_to_acquire. Do you think adding more context about the rationale for this high default, or perhaps offering an example usage inline, might be helpful for future maintainers?
1404-1448
: HTTPAPIBudget Component Definition
This extension of APIBudget for HTTP-specific settings looks great—it clearly defines the headers required to update rate limiting information based on HTTP responses. Would you consider including usage examples (for instance, how ratelimit_reset_header and ratelimit_remaining_header are used) or even constraints on acceptable header names?
1449-1483
: FixedWindowCallRatePolicy
The FixedWindowCallRatePolicy is cleanly implemented, requiring next_reset_ts, period (formatted as a duration), call_limit, and matchers. Have you considered adding a regex or format check for the period field to ensure it conforms to the expected ISO8601 duration format? Wdyt?
1484-1507
: MovingWindowCallRatePolicy Component
This new policy introduces a rates array and associated matchers to define call limits over a moving window. Would it be beneficial to include more detailed examples or validations for the structure of individual rate entries for enhanced clarity?
1509-1525
: UnlimitedCallRatePolicy Component
The UnlimitedCallRatePolicy is straightforward and correctly captures a scenario where unlimited calls are permitted. Do you think it might be useful to include a note or an example in the description to remind users when this policy should be applied?
1527-1544
: Rate Definition
The Rate component, with its limit and interval fields, is an essential building block for rate limiting. Would providing a couple of inline examples of typical values for 'interval' (using ISO8601 duration) add even more clarity for consumers of this schema?
1545-1574
: HttpRequestRegexMatcher Component
This matcher component covers method, url_base, url_path_pattern, parameters, and headers, offering robust flexibility for matching outgoing HTTP requests. Would imposing some recommendations regarding which properties are most commonly required (or perhaps marking some as mandatory for typical use cases) help reduce misconfiguration, or do you prefer to keep it fully optional for maximum flexibility?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(3 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(6 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(12 hunks)airbyte_cdk/sources/declarative/requesters/http_requester.py
(3 hunks)airbyte_cdk/sources/streams/call_rate.py
(7 hunks)unit_tests/sources/streams/test_call_rate.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/requesters/http_requester.py
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Publish CDK version to PyPI
- GitHub Check: Publish SDM to DockerHub
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (8)
unit_tests/sources/streams/test_call_rate.py (1)
363-447
: Great test coverage for the new regex-based request matching!The test class is well-structured with comprehensive test cases covering all key features. Each test method is focused and includes both positive and negative scenarios.
airbyte_cdk/sources/streams/call_rate.py (2)
165-257
: LGTM! Well-structured implementation of regex-based request matching.The implementation is clean with good input normalization and case-insensitive handling. The docstrings are clear and comprehensive.
498-588
: Nice refactoring using dataclasses!The dataclass refactoring makes the code more concise and maintainable. The default values for optional fields are sensible.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
646-1871
: Great schema definitions for the new rate limiting components!The schema is well-documented with clear field descriptions, helpful examples, and proper type hints. The integration with the declarative source schema is clean.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)
493-501
: LGTM! Clean import organization for API budget components.The imports are well-organized and grouped logically with other stream-related imports.
1943-1944
: LGTM! Clean integration of API budget into HttpRequester.The API budget is correctly passed through to the HttpRequester constructor.
Also applies to: 1965-1965
2995-3008
: Consider validating period format in FixedWindowCallRatePolicy.The period is parsed using
parse_duration
but there's no error handling if the format is invalid. Should we add validation and provide a clear error message? wdyt?
2968-2978
: Consider adding validation for empty policies list.The create_api_budget method accepts an empty policies list. Should we validate that at least one policy is provided? wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (6)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)
553-553
: Consider adding a more specific type annotation for_api_budget
.The current annotation
Optional[Union[APIBudget, HttpAPIBudget]]
could be more specific sinceHttpAPIBudget
is a subclass ofAPIBudget
. wdyt about using justOptional[APIBudget]
?
2989-2992
: Consider extracting header names and status codes as constants.These values are used as defaults and could be reused elsewhere. wdyt about defining them as class constants?
+ DEFAULT_RATELIMIT_RESET_HEADER = "ratelimit-reset" + DEFAULT_RATELIMIT_REMAINING_HEADER = "ratelimit-remaining" + DEFAULT_RATELIMIT_STATUS_CODES = (429,) def create_http_api_budget( self, model: HTTPAPIBudgetModel, config: Config, **kwargs: Any ) -> HttpAPIBudget: # ... return HttpAPIBudget( policies=policies, maximum_attempts_to_acquire=model.maximum_attempts_to_acquire or 100000, - ratelimit_reset_header=model.ratelimit_reset_header or "ratelimit-reset", - ratelimit_remaining_header=model.ratelimit_remaining_header or "ratelimit-remaining", - status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or (429,), + ratelimit_reset_header=model.ratelimit_reset_header or self.DEFAULT_RATELIMIT_RESET_HEADER, + ratelimit_remaining_header=model.ratelimit_remaining_header or self.DEFAULT_RATELIMIT_REMAINING_HEADER, + status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or self.DEFAULT_RATELIMIT_STATUS_CODES, )
3053-3066
: Consider improving error handling and documentation in set_api_budget.A few suggestions to enhance this method:
- Add docstring explaining the purpose and parameters
- Consider more specific error messages for different failure cases
- Add validation for required fields in component_definition
wdyt about these improvements?
def set_api_budget(self, component_definition: ComponentDefinition, config: Config) -> None: + """Configure the API budget for the factory. + + Args: + component_definition: The component definition containing API budget configuration + config: The connector configuration + + Raises: + ValueError: If the API budget type is unknown or required fields are missing + """ + if not isinstance(component_definition, dict): + raise ValueError("Component definition must be a dictionary") + model_str = component_definition.get("type") + if not model_str: + raise ValueError("Component definition must specify a 'type'") + if model_str == "APIBudget": model_type: Union[Type[APIBudgetModel], Type[HTTPAPIBudgetModel]] = APIBudgetModel elif model_str == "HTTPAPIBudget": model_type = HTTPAPIBudgetModel else: - raise ValueError(f"Unknown API Budget type: {model_str}") + raise ValueError(f"Unsupported API Budget type: {model_str}. Must be one of: APIBudget, HTTPAPIBudget")
1943-1944
: Consider adding a comment explaining the api_budget assignment.The assignment of
self._api_budget
to a local variable could benefit from a brief comment explaining its purpose. wdyt?- api_budget = self._api_budget + # Use the configured API budget if one was set via set_api_budget + api_budget = self._api_budgetunit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)
3570-3647
: Great test coverage for the API budget functionality!The test effectively validates the integration of HTTPAPIBudget with HttpRequester, including rate limiting headers, status codes, and moving window policy configuration. Would you consider adding a few edge cases to make it even more robust? For example:
- Multiple rate limits with different intervals
- Invalid/missing header scenarios
- Different HTTP methods in matchers
wdyt?
3649-3725
: Well-structured test for fixed window rate limiting!The test thoroughly validates the FixedWindowCallRatePolicy configuration. Since this involves time-based behavior, would you consider adding tests with frozen time at different points to verify the policy's behavior around reset boundaries? For example:
- Just before reset time
- At reset time
- Just after reset time
This could help catch any timing-related edge cases. wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(12 hunks)unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
145-145
: LGTM! The import for MovingWindowCallRatePolicy is well-placed and used in the new test functions.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good to me but I have some questions to make sure I have all the context. I'll re-review the code once I have this context
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (14)
airbyte_cdk/sources/streams/call_rate.py (4)
171-269
: Consider validating URL path patterns and normalizing URL bases, wdyt?A few suggestions to make the matcher more robust:
- The URL path pattern could be validated during initialization to catch invalid regex patterns early.
- The URL base normalization could handle both leading and trailing slashes consistently.
def __init__( self, method: Optional[str] = None, url_base: Optional[str] = None, url_path_pattern: Optional[str] = None, params: Optional[Mapping[str, Any]] = None, headers: Optional[Mapping[str, Any]] = None, ): self._method = method.upper() if method else None - self._url_base = url_base.rstrip("/") if url_base else None + # Normalize URL base by stripping both leading and trailing slashes + self._url_base = url_base.strip("/") if url_base else None # Compile the URL path pattern if provided. + if url_path_pattern: + try: + self._url_path_pattern = re.compile(url_path_pattern) + except re.error as e: + raise ValueError(f"Invalid URL path pattern '{url_path_pattern}': {e}") + else: + self._url_path_pattern = None - self._url_path_pattern = re.compile(url_path_pattern) if url_path_pattern else None
164-168
: Consider standardizing the string representation format across classes, wdyt?The string representations could follow a more consistent pattern. For example:
HttpRequestMatcher
: Uses parentheses and comma-separated key-value pairsHttpRequestRegexMatcher
: Uses similar format but includes regex patternFixedWindowCallRatePolicy
: Uses different format with line breaksMovingWindowCallRatePolicy
: Uses yet another formatWould it be clearer to standardize these? Here's a suggestion for
FixedWindowCallRatePolicy
:- def __str__(self) -> str: - matcher_str = ", ".join(f"{matcher}" for matcher in self._matchers) - return ( - f"FixedWindowCallRatePolicy(call_limit={self._call_limit}, period={self._offset}, " - f"calls_used={self._calls_num}, next_reset={self._next_reset_ts}, " - f"matchers=[{matcher_str}])" - ) + def __str__(self) -> str: + return ( + f"FixedWindowCallRatePolicy(" + f"call_limit={self._call_limit}, " + f"period={self._offset}, " + f"calls_used={self._calls_num}, " + f"next_reset={self._next_reset_ts}, " + f"matchers=[{', '.join(str(m) for m in self._matchers)}])" + )Also applies to: 264-269, 369-375, 483-494
548-564
: Consider enhancing endpoint extraction with URL parsing and specific error handling, wdyt?The current implementation could be more robust:
- Use
urllib.parse
to handle URL components consistently- Add specific error handling for different failure cases
def _extract_endpoint(self, request: Any) -> str: """Extract the endpoint URL from the request if available.""" - endpoint = None try: # If the request is already a PreparedRequest, it should have a URL. if isinstance(request, requests.PreparedRequest): - endpoint = request.url + url = request.url # If it's a requests.Request, we call prepare() to extract the URL. elif isinstance(request, requests.Request): prepared = request.prepare() - endpoint = prepared.url - except Exception as e: - logger.debug(f"Error extracting endpoint: {e}") - if endpoint: - return endpoint + url = prepared.url + else: + return "unknown endpoint" + + # Parse the URL to extract just the path + parsed = parse.urlparse(url) + return parsed.path or "/" + except AttributeError as e: + logger.debug(f"Request object missing URL attribute: {e}") + except ValueError as e: + logger.debug(f"Invalid URL format: {e}") + except Exception as e: + logger.debug(f"Unexpected error extracting endpoint: {e}") return "unknown endpoint"
627-629
: Consider adding a comment explaining the negative duration check, wdyt?- # Ensure we never sleep for a negative duration. - time_to_wait = max(timedelta(0), time_to_wait) + # Ensure we never sleep for a negative duration, which could happen if the + # system clock changes or if there's clock skew between the client and server. + time_to_wait = max(timedelta(0), time_to_wait)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)
646-661
: Consider adding validation and more examples for the Rate model, wdyt?The model could be more robust with:
- Validation for positive rate limits
- ISO duration format validation
- More descriptive examples
class Rate(BaseModel): class Config: extra = Extra.allow limit: int = Field( ..., description="The maximum number of calls allowed within the interval.", + gt=0, # Ensure positive rate limit title="Limit", + examples=[100, 1000, 5000], ) interval: str = Field( ..., description="The time interval for the rate limit.", - examples=["PT1H", "P1D"], + examples=["PT1H", "P1D", "PT30M", "PT15S"], + regex="^P(?!$)(\d+Y)?(\d+M)?(\d+W)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+S)?)?$", title="Interval", )
1630-1677
: Consider enhancing the policy models' documentation with examples and use cases, wdyt?The policy models could benefit from more detailed documentation:
- Example configurations for common scenarios
- Use case recommendations
- Comparison between different policies
class FixedWindowCallRatePolicy(BaseModel): class Config: extra = Extra.allow type: Literal["FixedWindowCallRatePolicy"] period: timedelta = Field( ..., - description="The time interval for the rate limit window.", + description="The time interval for the rate limit window. Best suited for APIs with fixed reset periods (e.g., hourly or daily limits).", title="Period", + examples=["PT1H", "P1D"], ) class MovingWindowCallRatePolicy(BaseModel): class Config: extra = Extra.allow type: Literal["MovingWindowCallRatePolicy"] rates: List[Rate] = Field( ..., - description="List of rates that define the call limits for different time intervals.", + description="List of rates that define the call limits for different time intervals. Ideal for APIs with rolling windows or multiple rate limits (e.g., 100 calls per minute AND 1000 calls per hour).", title="Rates", + examples=[ + [{"limit": 100, "interval": "PT1M"}, {"limit": 1000, "interval": "PT1H"}], + ], )
1810-1846
: Consider adding validation for the HTTPAPIBudget model, wdyt?The model could be more robust with:
- Header name validation
- Status code range validation
- Default values alignment
class HTTPAPIBudget(BaseModel): class Config: extra = Extra.allow type: Literal["HTTPAPIBudget"] policies: List[ Union[ FixedWindowCallRatePolicy, MovingWindowCallRatePolicy, UnlimitedCallRatePolicy, ] ] = Field( ..., description="List of call rate policies that define how many calls are allowed.", title="Policies", + min_items=1, # Ensure at least one policy is defined ) ratelimit_reset_header: Optional[str] = Field( "ratelimit-reset", description="The HTTP response header name that indicates when the rate limit resets.", title="Rate Limit Reset Header", + regex="^[A-Za-z0-9-]+$", # Ensure valid header name format ) status_codes_for_ratelimit_hit: Optional[List[int]] = Field( [429], description="List of HTTP status codes that indicate a rate limit has been hit.", title="Status Codes for Rate Limit Hit", + items={"ge": 100, "lt": 600}, # Ensure valid HTTP status code range ) maximum_attempts_to_acquire: Optional[int] = Field( 100000, description="The maximum number of attempts to acquire a call before giving up.", title="Maximum Attempts to Acquire", + gt=0, # Ensure positive number of attempts )airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)
550-550
: Consider adding type annotation for_api_budget
.The instance variable could benefit from an explicit type annotation. wdyt about:
- self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None + self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None
2964-2978
: Consider adding validation for empty policies list.The
create_http_api_budget
method might want to validate that the policies list is not empty, as an API budget without policies may not be useful. wdyt about adding a check?def create_http_api_budget( self, model: HTTPAPIBudgetModel, config: Config, **kwargs: Any ) -> HttpAPIBudget: policies = [ self._create_component_from_model(model=policy, config=config) for policy in model.policies ] + if not policies: + raise ValueError("At least one policy must be provided for HTTPAPIBudget") return HttpAPIBudget( policies=policies, ratelimit_reset_header=model.ratelimit_reset_header or "ratelimit-reset", ratelimit_remaining_header=model.ratelimit_remaining_header or "ratelimit-remaining", status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or (429,), )
2979-2995
: Consider documenting the 10-day initial reset timestamp choice.In
create_fixed_window_call_rate_policy
, the initial reset timestamp is set to 10 days from now. It would be helpful to document why this specific duration was chosen. wdyt about adding a more detailed comment?- # Set the initial reset timestamp to 10 days from now. - # This value will be updated by the first request. + # Set the initial reset timestamp to 10 days from now to ensure the first request + # doesn't immediately hit rate limits. This is a safe default as the actual reset + # timestamp will be updated based on the API response headers after the first request.
3023-3028
: Consider validating the interval format.The
create_rate
method usesparse_duration
but doesn't handle potential parsing errors. wdyt about adding error handling?def create_rate(self, model: RateModel, config: Config, **kwargs: Any) -> Rate: + try: + interval = parse_duration(model.interval) + except ValueError as e: + raise ValueError(f"Invalid interval format in Rate: {e}") + return Rate( limit=model.limit, - interval=parse_duration(model.interval), + interval=interval, )
3040-3044
: Consider adding error handling for invalid component definitions.The
set_api_budget
method could benefit from explicit error handling for invalid component definitions. wdyt about:def set_api_budget(self, component_definition: ComponentDefinition, config: Config) -> None: + if not component_definition or "type" not in component_definition: + raise ValueError("Invalid API budget component definition") + self._api_budget = self.create_component( model_type=HTTPAPIBudgetModel, component_definition=component_definition, config=config )airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
1412-1416
: Consider reducing the default maximum attempts.The default value of 100000 for
maximum_attempts_to_acquire
seems quite high and could lead to very long wait times. Would you consider reducing this to a more reasonable default like 1000 or even 100? Users can always override it if they need more attempts. wdyt?
1432-1435
: Align duration format with other duration fields.The
period
field uses a customformat: duration
while other duration fields in the schema use ISO 8601 duration format with examples like "PT1H" or "P1D". Should we align this for consistency? wdyt?- format: duration + examples: + - "PT1H" + - "P1D"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(3 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(6 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(11 hunks)airbyte_cdk/sources/streams/call_rate.py
(9 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (8)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)
229-231
: LGTM! The new imports are well organized and follow the existing pattern.The imports are logically grouped and maintain consistency with the existing codebase structure.
Also applies to: 241-243, 250-252, 307-309, 328-330, 374-376, 490-498
1939-1940
: LGTM! The API budget integration increate_http_requester
is clean and non-intrusive.The changes maintain backward compatibility by using the instance variable without modifying the existing logic.
Also applies to: 1961-1961
2996-3010
: LGTM! The moving window policy implementation is clean and well-structured.The method correctly creates rates and matchers from the model components.
3011-3022
: LGTM! The unlimited policy implementation is straightforward.The implementation is simple and focused, which is appropriate for an unlimited policy.
3029-3039
: LGTM! The request matcher implementation is well-structured.The method handles all necessary matching criteria (method, URL, params, headers) in a clean way.
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)
1513-1515
: Great documentation for URL components!The description clearly explains how to use
url_base
andurl_path_pattern
together, which helps prevent confusion about URL construction.
1490-1509
: Well-structured Rate definition!The Rate component is clean, focused, and includes good examples for the interval field. The required fields are appropriate, and the additionalProperties flag allows for future extensibility.
1373-1417
: Verify header names are case-insensitive.The default header names (
ratelimit-reset
andratelimit-remaining
) might need to handle different cases as HTTP headers are case-insensitive. Should we add a note about case-insensitive matching? wdyt?✅ Verification successful
🏁 Script executed:
#!/bin/bash # Search for existing rate limit header handling in the codebase rg -i "ratelimit.*header" -A 5Length of output: 7934
Confirm HTTP header case-insensitivity behavior
The code leverages the HTTP client's built‐in case‐insensitive header matching (e.g., Requests'
CaseInsensitiveDict
), so even if header names are defined in a particular case (like"ratelimit-reset"
), lookups will succeed regardless of the case in the actual HTTP response. We do see in unit tests that alternative casings (e.g.,"X-RateLimit-Reset"
) are correctly handled.Would you consider adding a clarifying note in the YAML description (and/or docs) to explicitly mention that header lookups are case-insensitive? wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
1430-1430
: 🛠️ Refactor suggestionRemove the format field as discussed.
Based on the past review comments, the
format: duration
field should be removed to align with other fields of the same type.- format: duration
🧹 Nitpick comments (8)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4)
646-661
: Consider enhancing the Rate class documentation.The class looks good, but we could make it even better by adding more details to the documentation. What do you think about:
- Adding examples of interval formats in the field description?
- Clarifying if negative limits are allowed?
class Rate(BaseModel): class Config: extra = Extra.allow limit: int = Field( ..., description="The maximum number of calls allowed within the interval.", + examples=[100, 1000], title="Limit", ) interval: str = Field( ..., - description="The time interval for the rate limit.", + description="The time interval for the rate limit in ISO 8601 duration format (e.g., PT1H for 1 hour, P1D for 1 day).", examples=["PT1H", "P1D"], title="Interval", )
663-686
: Consider adding validation for URL patterns.The HttpRequestRegexMatcher looks good, but we could make it more robust. What do you think about adding some validation to ensure the URL patterns are valid regex expressions?
class HttpRequestRegexMatcher(BaseModel): class Config: extra = Extra.allow + @validator('url_path_pattern') + def validate_regex_pattern(cls, v): + if v is not None: + try: + re.compile(v) + except re.error as e: + raise ValueError(f"Invalid regex pattern: {e}") + return v method: Optional[str] = Field( None, description="The HTTP method to match (e.g., GET, POST).", title="Method" )
1630-1648
: Consider adding validation for period duration.The FixedWindowCallRatePolicy looks good, but we could add validation to ensure the period is a positive duration. What do you think?
class FixedWindowCallRatePolicy(BaseModel): class Config: extra = Extra.allow + @validator('period') + def validate_period(cls, v): + if v.total_seconds() <= 0: + raise ValueError("Period must be positive") + return v type: Literal["FixedWindowCallRatePolicy"] period: timedelta = Field( ..., description="The time interval for the rate limit window.", title="Period" )
1810-1842
: Consider adding default values for optional fields.The HTTPAPIBudget class looks good, but we could make it more user-friendly by providing sensible defaults for the optional fields. What do you think?
class HTTPAPIBudget(BaseModel): class Config: extra = Extra.allow type: Literal["HTTPAPIBudget"] policies: List[ Union[ FixedWindowCallRatePolicy, MovingWindowCallRatePolicy, UnlimitedCallRatePolicy, ] ] = Field( ..., description="List of call rate policies that define how many calls are allowed.", title="Policies", ) ratelimit_reset_header: Optional[str] = Field( - "ratelimit-reset", + "x-ratelimit-reset", # More commonly used header name description="The HTTP response header name that indicates when the rate limit resets.", title="Rate Limit Reset Header", ) ratelimit_remaining_header: Optional[str] = Field( - "ratelimit-remaining", + "x-ratelimit-remaining", # More commonly used header name description="The HTTP response header name that indicates the number of remaining allowed calls.", title="Rate Limit Remaining Header", ) status_codes_for_ratelimit_hit: Optional[List[int]] = Field( - [429], + [429, 403], # Some APIs use 403 for rate limiting description="List of HTTP status codes that indicate a rate limit has been hit.", title="Status Codes for Rate Limit Hit", )airbyte_cdk/sources/declarative/declarative_component_schema.yaml (4)
1373-1412
: Consider adding examples for header names, wdyt?The HTTPAPIBudget component looks good with clear description and required fields. However, it might be helpful to add examples for
ratelimit_reset_header
andratelimit_remaining_header
to show common header names used by different APIs.ratelimit_reset_header: title: Rate Limit Reset Header description: The HTTP response header name that indicates when the rate limit resets. type: string default: "ratelimit-reset" + examples: + - "x-ratelimit-reset" + - "retry-after" + - "x-rate-limit-reset" ratelimit_remaining_header: title: Rate Limit Remaining Header description: The HTTP response header name that indicates the number of remaining allowed calls. type: string default: "ratelimit-remaining" + examples: + - "x-ratelimit-remaining" + - "x-rate-limit-remaining"
1413-1442
: Consider adding examples for the period field, wdyt?The FixedWindowCallRatePolicy component is well-structured, but it would be helpful to add examples for the
period
field to demonstrate valid duration formats.period: title: Period description: The time interval for the rate limit window. type: string format: duration + examples: + - "PT1H" # 1 hour + - "P1D" # 1 day + - "PT30M" # 30 minutes
1485-1504
: Consider adding validation for positive integers, wdyt?The Rate component looks good, but consider adding validation to ensure the limit is a positive integer.
limit: title: Limit description: The maximum number of calls allowed within the interval. type: integer + minimum: 1
1505-1535
: Consider adding examples for URL patterns, wdyt?The HttpRequestRegexMatcher component is well-defined, but it would be helpful to add examples for the
url_path_pattern
field to demonstrate common regex patterns.url_path_pattern: title: URL Path Pattern description: A regular expression pattern to match the URL path. type: string + examples: + - "^/v1/users/\\d+$" # Matches paths like /v1/users/123 + - "^/v2/orders/[a-zA-Z0-9-]+$" # Matches paths like /v2/orders/order-123
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(3 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
1876-1880
: LGTM! The api_budget field is well documented.The addition of the api_budget field to DeclarativeSource1 and DeclarativeSource2 is clear and consistent.
Also applies to: 1907-1911
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
43-48
: LGTM! The newapi_budget
property is well-defined.The property is properly integrated into the root schema with clear description and appropriate references to the new components. The description effectively communicates that this applies across all streams.
@artem1205 I’ve added debug logs with request details and policies, as well as documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)
3570-3648
: LGTM! Well-structured test for MovingWindowCallRatePolicy.The test thoroughly validates the API budget configuration with MovingWindowCallRatePolicy. I particularly like how you test both the high-level budget configuration and the specific policy details.
Would you consider adding a test case for multiple rate limits in the same policy (e.g., different limits for different time intervals)? This could help validate more complex rate limiting scenarios, wdyt?
3649-3719
: LGTM! Comprehensive test for FixedWindowCallRatePolicy.The test effectively validates the fixed window policy configuration and its integration with HttpRequester. Good job on testing the period conversion from ISO 8601 duration to seconds.
Consider adding assertions for error cases, like invalid period formats or negative call limits? This would help ensure robust error handling, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (14)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)
646-661
: Consider adding interval validation and documentation.The
interval
field accepts ISO 8601 duration strings but lacks validation. Would you consider:
- Adding a validator to ensure the string is a valid ISO 8601 duration?
- Including example durations in the field description?
interval: str = Field( ..., description="The time interval for the rate limit in ISO 8601 duration format (e.g., PT1H for 1 hour, P1D for 1 day).", examples=["PT1H", "P1D", "PT30M"], title="Interval", )
1667-1677
: Consider enhancing documentation for unlimited policy.The
UnlimitedCallRatePolicy
could benefit from clearer documentation about its use cases. Would you consider adding a note explaining when to use this policy (e.g., for endpoints that don't have rate limits)?type: Literal["UnlimitedCallRatePolicy"] matchers: List[HttpRequestRegexMatcher] = Field( ..., description="List of matchers that define which requests this policy applies to. Use this policy for endpoints that don't have rate limits or when you want to explicitly mark endpoints as unlimited.", title="Matchers", )
1810-1841
: Consider expanding rate limit headers and status codes.The implementation looks solid, but would you consider:
- Adding more common header variations (e.g.,
X-RateLimit-Reset
,X-RateLimit-Remaining
)?- Including additional common rate limit status codes (e.g., 403, 503)?
ratelimit_reset_header: Optional[str] = Field( "ratelimit-reset", description="The HTTP response header name that indicates when the rate limit resets. Common variations: ratelimit-reset, x-ratelimit-reset", title="Rate Limit Reset Header", ) status_codes_for_ratelimit_hit: Optional[List[int]] = Field( [429, 403, 503], description="List of HTTP status codes that indicate a rate limit has been hit. Common codes: 429 (Too Many Requests), 403 (Forbidden), 503 (Service Unavailable)", title="Status Codes for Rate Limit Hit", )airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)
550-550
: Consider adding type annotation for the instance variable.wdyt about adding a type annotation to make the code more maintainable?
- self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None + self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None # type: ignore[assignment]
2981-2994
: Consider adding validation for required fields.The code provides default values for optional fields, but wdyt about adding validation to ensure that required fields like
policies
are not None?def create_http_api_budget( self, model: HTTPAPIBudgetModel, config: Config, **kwargs: Any ) -> HttpAPIBudget: + if not model.policies: + raise ValueError("policies is required for HTTPAPIBudget") + policies = [ self._create_component_from_model(model=policy, config=config) for policy in model.policies ] return HttpAPIBudget( policies=policies, ratelimit_reset_header=model.ratelimit_reset_header or "ratelimit-reset", ratelimit_remaining_header=model.ratelimit_remaining_header or "ratelimit-remaining", status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or (429,), )
2996-3011
: Consider documenting the initial reset timestamp logic.The code sets an initial reset timestamp 10 days in the future. wdyt about adding a comment explaining why this specific value was chosen and that it will be updated by the first request?
def create_fixed_window_call_rate_policy( self, model: FixedWindowCallRatePolicyModel, config: Config, **kwargs: Any ) -> FixedWindowCallRatePolicy: matchers = [ self._create_component_from_model(model=matcher, config=config) for matcher in model.matchers ] - # Set the initial reset timestamp to 10 days from now. - # This value will be updated by the first request. + # Set a far-future initial reset timestamp (10 days) to prevent immediate rate limiting. + # This is a temporary value that will be updated with the actual server-provided + # reset timestamp after the first request. return FixedWindowCallRatePolicy( next_reset_ts=datetime.datetime.now() + datetime.timedelta(days=10), period=model.period, call_limit=model.call_limit, matchers=matchers, )
3040-3044
: Consider validating duration format.wdyt about adding validation to ensure the interval string is in a valid duration format before parsing?
def create_rate(self, model: RateModel, config: Config, **kwargs: Any) -> Rate: + try: + interval = parse_duration(model.interval) + except ValueError as e: + raise ValueError(f"Invalid interval format: {model.interval}. Expected ISO 8601 duration format.") from e + return Rate( limit=model.limit, - interval=parse_duration(model.interval), + interval=interval, )
3046-3055
: Consider adding regex pattern validation.wdyt about validating the regex patterns in
url_path_pattern
to ensure they are valid before creating the matcher?def create_http_request_matcher( self, model: HttpRequestRegexMatcherModel, config: Config, **kwargs: Any ) -> HttpRequestRegexMatcher: + try: + if model.url_path_pattern: + re.compile(model.url_path_pattern) + except re.error as e: + raise ValueError(f"Invalid URL path pattern: {model.url_path_pattern}. Error: {str(e)}") + return HttpRequestRegexMatcher( method=model.method, url_base=model.url_base, url_path_pattern=model.url_path_pattern, params=model.params, headers=model.headers, )unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)
3689-3768
: LGTM! Well-structured test for MovingWindowCallRatePolicy.The test case provides good coverage of the basic functionality. What do you think about adding a few edge cases to make it more robust? For example:
- Invalid rate limit headers
- Multiple rate policies
- Invalid status codes
3769-3840
: LGTM! Comprehensive test for FixedWindowCallRatePolicy.The test case thoroughly validates the fixed window policy configuration. Consider adding tests for:
- Invalid period format
- Zero or negative call limits
- Invalid regex patterns in matchers
wdyt?airbyte_cdk/sources/declarative/declarative_component_schema.yaml (4)
1370-1411
: Great implementation of the HTTPAPIBudget component!The component is well-structured with clear scope limitations and flexible configuration options. The description clearly states that only requests using
HttpRequester
are rate-limited.One small suggestion: Would you consider adding an example configuration in the description to help users get started quickly? wdyt?
1429-1429
: Consider removing the format field or updating its documentation.The
format: duration
field seems to be a remnant from a previous version, as it's not documented in the description. Should we remove it or add documentation about the expected duration format? wdyt?
1484-1503
: Well-structured Rate component with room for more documentation.The component is clean and focused. Would you consider enhancing the interval description to explain the expected ISO8601 duration format (like the examples show)? This would help users understand valid values for the interval field. wdyt?
1504-1534
: Great implementation of the request matcher!The matcher provides comprehensive capabilities for matching HTTP requests. To make it even more user-friendly, would you consider adding example regex patterns for the
url_path_pattern
field? This could help users understand how to write effective patterns. wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(3 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(6 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(11 hunks)unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (11)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4)
663-686
: LGTM! Comprehensive request matching capabilities.The
HttpRequestRegexMatcher
class provides flexible matching through method, URL, parameters, and headers. The implementation looks solid and well-documented.
1630-1648
: LGTM! Well-structured fixed window rate limiting.The
FixedWindowCallRatePolicy
implementation is clean and provides a straightforward way to implement fixed window rate limiting.
1650-1665
: LGTM! Flexible moving window rate limiting.The
MovingWindowCallRatePolicy
allows for multiple rates with different intervals, providing great flexibility for complex rate limiting scenarios.
1876-1876
: LGTM! Clean integration of API budget.The addition of the optional
api_budget
field toDeclarativeSource
classes is clean and maintains backward compatibility.Also applies to: 1903-1903
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)
229-231
: LGTM! Clean import organization.The new imports are well-organized and follow the existing pattern of grouping related imports together.
Also applies to: 241-243, 250-252, 307-309, 328-330, 374-376, 490-498
1956-1957
: LGTM! Clean integration with HttpRequester.The API budget is correctly passed to the HttpRequester constructor.
Also applies to: 1978-1978
3013-3026
: LGTM! Clean implementation of MovingWindowCallRatePolicy.The implementation correctly creates rates and matchers from the model.
3028-3038
: LGTM! Clean implementation of UnlimitedCallRatePolicy.The implementation correctly creates matchers from the model.
3057-3060
: LGTM! Clean implementation of set_api_budget.The method correctly creates and sets the API budget component.
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
43-44
: LGTM! Good addition of the API budget property.The new property is well-integrated into the schema structure and correctly references the HTTPAPIBudget definition.
1412-1483
: Excellent design of the rate limiting policies!The three policy types (Fixed Window, Moving Window, and Unlimited) provide a comprehensive solution for different rate limiting scenarios. The consistent use of matchers across all policies makes the implementation flexible and maintainable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small things for maintenance reasons but the behavior makes sense to me
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Outdated
Show resolved
Hide resolved
6df08c2
to
09ad1a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (9)
airbyte_cdk/sources/streams/call_rate.py (2)
547-563
: Consider enhancing the endpoint extraction method.The method handles different request types well, but could be improved. What do you think about:
- Adding type hints for better code maintainability?
- Using more specific exception handling instead of catching all exceptions?
- def _extract_endpoint(self, request: Any) -> str: + def _extract_endpoint(self, request: Union[requests.Request, requests.PreparedRequest]) -> str: """Extract the endpoint URL from the request if available.""" endpoint = None try: - # If the request is already a PreparedRequest, it should have a URL. if isinstance(request, requests.PreparedRequest): endpoint = request.url - # If it's a requests.Request, we call prepare() to extract the URL. elif isinstance(request, requests.Request): prepared = request.prepare() endpoint = prepared.url - except Exception as e: + except (AttributeError, TypeError) as e: logger.debug(f"Error extracting endpoint: {e}") if endpoint: return endpoint return "unknown endpoint"
170-262
: Consider enhancing the HttpRequestRegexMatcher implementation.The implementation is solid, but could be even better. What do you think about:
- Adding input validation for the regex pattern to catch invalid patterns early?
- Caching the compiled regex pattern results for better performance when the same pattern is used multiple times?
def __init__( self, method: Optional[str] = None, url_base: Optional[str] = None, url_path_pattern: Optional[str] = None, params: Optional[Mapping[str, Any]] = None, headers: Optional[Mapping[str, Any]] = None, ): + # Validate regex pattern + if url_path_pattern: + try: + re.compile(url_path_pattern) + except re.error as e: + raise ValueError(f"Invalid regex pattern: {e}") self._method = method.upper() if method else None self._url_base = url_base.rstrip("/") if url_base else None - self._url_path_pattern = re.compile(url_path_pattern) if url_path_pattern else None + # Cache compiled pattern + self._url_path_pattern_str = url_path_pattern + self._url_path_pattern = re.compile(url_path_pattern) if url_path_pattern else None self._params = {str(k): str(v) for k, v in (params or {}).items()} self._headers = {str(k).lower(): str(v) for k, v in (headers or {}).items()}airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)
550-550
: Consider explicitly initializing_api_budget
to None.For better readability and to follow Python best practices, wdyt about explicitly initializing the
_api_budget
variable to None?- self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None + self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None
2964-2977
: Consider handling None values more gracefully.The
create_http_api_budget
method could handle None values more gracefully. wdyt about using theor
operator for default values?- policies=policies, - ratelimit_reset_header=model.ratelimit_reset_header or "ratelimit-reset", - ratelimit_remaining_header=model.ratelimit_remaining_header or "ratelimit-remaining", - status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or [429], + policies=policies or [], + ratelimit_reset_header=model.ratelimit_reset_header or "ratelimit-reset", + ratelimit_remaining_header=model.ratelimit_remaining_header or "ratelimit-remaining", + status_codes_for_ratelimit_hit=tuple(model.status_codes_for_ratelimit_hit or [429]),
2987-2994
: Consider making the initial reset timestamp configurable.The hardcoded 10-day initial reset timestamp might not be suitable for all use cases. wdyt about making it configurable through the model?
+ initial_reset_period = parse_duration(model.initial_reset_period) if model.initial_reset_period else datetime.timedelta(days=10) return FixedWindowCallRatePolicy( - next_reset_ts=datetime.datetime.now() + datetime.timedelta(days=10), + next_reset_ts=datetime.datetime.now() + initial_reset_period, period=parse_duration(model.period), call_limit=model.call_limit, matchers=matchers, )
1939-1940
: Consider adding a docstring explaining API Budget usage.For better maintainability, wdyt about adding a comment explaining how the API Budget is used in the HttpRequester?
+ # Use the configured API Budget for rate limiting if available api_budget = self._api_budget
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)
1370-1411
: LGTM! Well-structured API budget component with clear documentation.The HTTPAPIBudget component is well-designed with configurable headers and status codes. The documentation clearly states the scope limitation to HttpRequester.
One suggestion: Would you consider adding examples in the description to show how the rate limit headers might look in actual API responses? This could help users map their API's response headers correctly, wdyt?
1412-1502
: The rate limiting policies are well-structured but could use example configurations.The three policy types (Fixed Window, Moving Window, and Unlimited) provide good flexibility for different rate limiting scenarios. The Rate component nicely encapsulates the limit and interval concepts.
Would you consider adding example configurations for each policy type to show:
- How to set up a "100 calls per hour" fixed window limit
- How to configure multiple rates for moving window (e.g., "1000 per day AND 100 per hour")
- A typical use case for unlimited policy (e.g., health check endpoints)
This would help users choose and configure the right policy for their needs, wdyt?
1503-1533
: The HTTP request matcher provides flexible matching but could use pattern examples.The HttpRequestRegexMatcher component allows comprehensive request matching. Would you consider enhancing the documentation with:
- Example regex patterns for common URL path matching scenarios
- Best practices for writing efficient regex patterns
- Examples showing how to combine multiple matching criteria
This would help users write correct and maintainable matchers, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(3 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(5 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(11 hunks)airbyte_cdk/sources/streams/call_rate.py
(9 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (6)
airbyte_cdk/sources/streams/call_rate.py (2)
584-586
: Great addition of debug logs!The debug logs provide valuable context about endpoints and policies, which will help with troubleshooting rate limit issues.
Also applies to: 628-631, 634-637, 641-644
163-167
: Well-structured string representations!The
__str__
methods provide comprehensive information about the objects, which is very helpful for debugging and logging. The consistent format across different classes is a nice touch.Also applies to: 263-268, 368-374, 482-493
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
645-660
: Well-structured schema models for rate limiting!The new models for rate limiting and request matching are well-designed:
- Clear field descriptions and examples
- Consistent use of Field annotations
- Good type hints and validation
- Follows the existing patterns in the file
Also applies to: 662-685, 1629-1647, 1649-1664, 1666-1676, 1809-1840
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
229-231
: LGTM! Well-organized imports.The new imports for API Budget related classes are properly organized alphabetically and follow the existing pattern.
Also applies to: 241-243, 250-252, 307-309, 328-330, 374-376, 490-498
1961-1961
: LGTM! Clean integration.The API Budget is cleanly integrated into the HttpRequester creation.
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
43-44
: LGTM! Good addition of the API budget configuration at the root level.The api_budget property is well-integrated into the schema structure, allowing for global API budget configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
airbyte_cdk/sources/streams/call_rate.py (3)
171-269
: Consider adding input validation for regex pattern.The implementation looks solid! However, what do you think about adding validation for the url_path_pattern to catch invalid regex patterns early? This could prevent runtime errors.
def __init__( self, method: Optional[str] = None, url_base: Optional[str] = None, url_path_pattern: Optional[str] = None, params: Optional[Mapping[str, Any]] = None, headers: Optional[Mapping[str, Any]] = None, ): + if url_path_pattern: + try: + re.compile(url_path_pattern) + except re.error as e: + raise ValueError(f"Invalid regex pattern '{url_path_pattern}': {e}")
548-564
: Consider enhancing endpoint extraction robustness.The endpoint extraction looks good but could be more robust. What do you think about:
- Handling URL parsing errors explicitly?
- Extracting just the path for cleaner logs?
def _extract_endpoint(self, request: Any) -> str: """Extract the endpoint URL from the request if available.""" endpoint = None try: if isinstance(request, requests.PreparedRequest): - endpoint = request.url + parsed = parse.urlparse(request.url) + endpoint = parsed.path elif isinstance(request, requests.Request): prepared = request.prepare() - endpoint = prepared.url + parsed = parse.urlparse(prepared.url) + endpoint = parsed.path - except Exception as e: + except (AttributeError, ValueError) as e: logger.debug(f"Error extracting endpoint: {e}") if endpoint: return endpoint return "unknown endpoint"
656-656
: Consider validating status codes.What do you think about validating that the status codes are within the valid HTTP range (100-599)? This could prevent configuration errors.
def __init__( self, ratelimit_reset_header: str = "ratelimit-reset", ratelimit_remaining_header: str = "ratelimit-remaining", status_codes_for_ratelimit_hit: list[int] = [429], **kwargs: Any, ): + if not all(100 <= code <= 599 for code in status_codes_for_ratelimit_hit): + raise ValueError("HTTP status codes must be between 100 and 599") self._ratelimit_reset_header = ratelimit_reset_header self._ratelimit_remaining_header = ratelimit_remaining_header self._status_codes_for_ratelimit_hit = status_codes_for_ratelimit_hit
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/streams/call_rate.py
(10 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
airbyte_cdk/sources/streams/call_rate.py (2)
9-9
: LGTM! Good call on configuring the pyrate_limiter logger.Setting the pyrate_limiter logger to WARNING level helps reduce noise in the logs.
Also applies to: 29-29
164-168
: Great job on the consistent string representations!The str methods provide clear, consistent, and informative output that will be valuable for debugging and logging.
Also applies to: 369-375, 483-494
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
airbyte_cdk/sources/streams/call_rate.py (5)
119-135
: Consider caching the regex pattern.Since
re.escape(url_path)
is called during initialization and the pattern is immutable, we could cache the compiled pattern. What do you think about this optimization?if url: parsed_url = parse.urlsplit(url) url_base = f"{parsed_url.scheme}://{parsed_url.netloc}" - url_path = parsed_url.path if parsed_url.path != "/" else None + url_path_pattern = re.escape(parsed_url.path) if parsed_url.path != "/" else None else: url_base = None - url_path = None + url_path_pattern = None # Use HttpRequestRegexMatcher under the hood self._regex_matcher = HttpRequestRegexMatcher( method=method, url_base=url_base, - url_path_pattern=re.escape(url_path) if url_path else None, + url_path_pattern=url_path_pattern, params=params, headers=headers, )
201-208
: Consider adding type hints for the request parameter.The error handling is good, but type hints would make it clearer what types are supported. wdyt?
- def __call__(self, request: Any) -> bool: + def __call__(self, request: Union[requests.Request, requests.PreparedRequest]) -> bool:
464-475
: Consider using a list comprehension for matchers.The string representation could be more concise. What do you think about this optimization?
- matcher_str = ", ".join(f"{matcher}" for matcher in self._matchers) + matcher_str = ", ".join(map(str, self._matchers))
529-545
: Consider using a more specific exception type.The broad Exception catch could mask important errors. Would it make sense to catch specific exceptions like AttributeError or ValueError?
- except Exception as e: + except (AttributeError, ValueError) as e:
637-638
: Consider validating status codes.HTTP status codes should be between 100 and 599. What do you think about adding validation for the input list?
def __init__( self, ratelimit_reset_header: str = "ratelimit-reset", ratelimit_remaining_header: str = "ratelimit-remaining", status_codes_for_ratelimit_hit: list[int] = [429], **kwargs: Any, ): + if not all(100 <= code <= 599 for code in status_codes_for_ratelimit_hit): + raise ValueError("All status codes must be valid HTTP status codes (100-599)") self._ratelimit_reset_header = ratelimit_reset_header
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/streams/call_rate.py
(11 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/streams/call_rate.py (1)
9-9
: LGTM! Good logging setup.Setting pyrate_limiter's log level to WARNING helps reduce noise in the logs while still capturing important information.
Also applies to: 29-29
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit of a speedrun review but I'm good with what I saw for the refactor of the HTTP matcher. Thanks for the refactor! I'm sure it'll be easier to manage.
Summary by CodeRabbit