Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(low-code): Add API Budget #314

Merged
merged 16 commits into from
Feb 12, 2025
Merged

feat(low-code): Add API Budget #314

merged 16 commits into from
Feb 12, 2025

Conversation

tolik0
Copy link
Contributor

@tolik0 tolik0 commented Feb 5, 2025

Summary by CodeRabbit

  • New Features
    • Introduced enhanced API request management that lets you configure precise call limits and usage controls.
    • Added several rate limiting strategies, including options for fixed time windows, moving windows, and unrestricted requests.
    • Enabled customizable matching criteria to apply rate controls based on request details, ensuring better adherence to API usage policies.
    • Integrated API budget management directly into the request handling process.
    • Enhanced functionality for managing API budgets through new components and policies.
    • Added support for HTTP-specific rate limiting features, enhancing flexibility in managing API requests.
    • Introduced a new class for complex HTTP request matching using regex patterns.
    • Added unit tests to validate the new API budget management features and HTTP request matching functionality.
    • Validated API budget integration with HTTP requester components through new test cases.

@github-actions github-actions bot added the enhancement New feature or request label Feb 5, 2025
Copy link
Contributor

coderabbitai bot commented Feb 5, 2025

📝 Walkthrough

Walkthrough

This pull request introduces enhancements for managing API rate limiting and request matching. New schema definitions and model classes—such as APIBudget, HTTPAPIBudget, FixedWindowCallRatePolicy, MovingWindowCallRatePolicy, UnlimitedCallRatePolicy, Rate, and HttpRequestRegexMatcher—are added to the declarative component schema. The changes also update the HttpRequester class to include an optional api_budget attribute, which is integrated into the HTTP client initialization. Additionally, new methods have been added to the component factory to support constructing these new components.

Changes

File(s) Change Summary
airbyte_cdk/.../declarative_component_schema.yaml Added new schema definitions for APIBudget, HTTPAPIBudget, FixedWindowCallRatePolicy, MovingWindowCallRatePolicy, UnlimitedCallRatePolicy, Rate, and HttpRequestRegexMatcher; added api_budget property to DeclarativeSource.
airbyte_cdk/.../declarative_component_schema.py Added new classes: Rate, HttpRequestRegexMatcher, APIBudget, FixedWindowCallRatePolicy, MovingWindowCallRatePolicy, UnlimitedCallRatePolicy; extended HttpRequester with an optional api_budget field.
airbyte_cdk/.../model_to_component_factory.py Introduced creation methods: create_http_api_budget, create_fixed_window_call_rate_policy, create_moving_window_call_rate_policy, create_unlimited_call_rate_policy, create_rate, and create_http_request_matcher.
airbyte_cdk/.../http_requester.py Updated HttpRequester: added optional api_budget attribute and passed this attribute to the HttpClient during initialization.
airbyte_cdk/.../manifest_declarative_source.py Enhanced streams method to retrieve and set api_budget from _source_config.
airbyte_cdk/.../call_rate.py Added HttpRequestRegexMatcher class; updated APIBudget class with new logging methods and enhanced constructors.
unit_tests/.../test_call_rate.py Added TestHttpRequestRegexMatcher class with unit tests for HTTP request matching functionality.
unit_tests/.../test_model_to_component_factory.py Added test_api_budget() and test_api_budget_fixed_window_policy() to validate API budget integration with HTTP requester.
unit_tests/.../test_http_requester.py Updated http_requester_factory to include api_budget; added test_http_requester_with_mock_apibudget to test budget management logic.

Sequence Diagram(s)

sequenceDiagram
    participant Dev as Developer
    participant Factory as ModelToComponentFactory
    participant APIB as APIBudget
    participant Req as HttpRequester
    participant Client as HttpClient

    Dev->>Factory: Invoke creation methods (create_http_api_budget, etc.)
    Factory->>APIB: Instantiate APIBudget with rate policies
    Factory->>Req: Inject APIBudget into HttpRequester
    Req->>Client: Initialize HttpClient with api_budget parameter
Loading

Would you like to see additional details or adjustments to the sequence diagram, wdyt?

Possibly related PRs

Suggested labels

oauth

Suggested reviewers

  • artem1205
  • maxi297
  • bnchrch

If you have any preferences for additional reviewers or changes, just let me know!

✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (11)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4)

6-6: Would you like to confirm the necessity of this import?
It's great that datetime and timedelta are explicitly imported for the new rate-limiting features. However, if they're only used sparingly, perhaps consider consolidating related imports or confirming there's no overlap with existing standard library imports to keep the code tidy. wdyt?


660-674: Consider stronger typing for method.
Right now, method is a free-form string. Is it worth enforcing, say, an HttpMethod enum to reduce potential typos or mismatches when matching requests? wdyt?


1654-1664: Unlimited policy name alignment.
For an “unlimited” policy, do we still track calls or simply skip rate limiting entirely? Maybe clarify or log usage to avoid confusion. wdyt?


1797-1836: Consider lowering the maximum attempts.
Having maximum_attempts_to_acquire set to 100000 might risk excessive retries if the API limit is far lower. Do you think a smaller default (e.g., 100 or 1000) would be safer? wdyt?

airbyte_cdk/sources/declarative/requesters/http_requester.py (2)

25-25: Validate the need for this import.
The APIBudget import is a neat addition. If the scenario for rate-limit budgeting is common, great! Otherwise, consider lazy imports or clarifying usage. wdyt?


59-59: Add docstrings for clarity?
Introducing api_budget is helpful, but it might benefit from an inline docstring or comment explaining how it's typically configured or used. wdyt?

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (5)

1368-1410: APIBudget Component Addition: Is This the Final Form?
The new APIBudget component looks comprehensive—it clearly defines the available properties such as policies, header names, status codes, and even a high default for maximum_attempts_to_acquire. I'm wondering, is the default value (100000) for maximum_attempts_to_acquire intentional? Also, would you consider providing an example or additional constraints for the policies array to help guide users? wdyt?


1446-1470: Moving Window Call Rate Policy: Flexibility is Key!
This component neatly captures the idea of a moving window rate limit through its rates and matchers. Would you consider imposing any constraints or suggesting a default for the rates array to help prevent misconfiguration, or is keeping it fully flexible the intent here? wdyt?


1471-1487: Unlimited Call Rate Policy: Keeping It Open!
The UnlimitedCallRatePolicy is straightforward and allows for unrestricted calls for specific requests. Do you feel that we should restrict additional properties for tighter validation, or is the open-ended design intentional to provide extra flexibility? wdyt?


1506-1529: HTTP Request Matcher: Versatile Matcher!
The HttpRequestMatcher is well-structured and flexible, covering method, URL, parameters, and headers. One thought—would it be a good idea to enforce a URI format on the url property (or add a note about what format is expected) to help catch misconfigurations early? wdyt?


1802-1805: HttpRequester API Budget Integration: Nice Touch!
Integrating the new api_budget property into the HttpRequester component by referencing the APIBudget definition looks like a smooth enhancement. Might it be helpful to add an example or extra documentation to illustrate how api_budget interacts with the HTTP requester and other related components? wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca68c5c and 707a6c6.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (5 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (15 hunks)
  • airbyte_cdk/sources/declarative/requesters/http_requester.py (3 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py

[error] 847-847: Argument 2 to 'LegacyToPerPartitionStateMigration' has incompatible type 'CustomIncrementalSync | DatetimeBasedCursor | None'; expected 'CustomIncrementalSync | DatetimeBasedCursor'


[error] 1146-1146: Argument 2 to 'ClampingEndProvider' has incompatible type 'Callable[[], datetime]'; expected 'Callable[[], CursorValueType]'


[error] 1164-1164: Argument 2 to 'ClampingEndProvider' has incompatible type 'Callable[[], datetime]'; expected 'Callable[[], CursorValueType]'


[error] 1172-1172: Argument 2 to 'ClampingEndProvider' has incompatible type 'Callable[[], datetime]'; expected 'Callable[[], CursorValueType]'


[error] 1190-1190: Argument 'start' to 'ConcurrentCursor' has incompatible type 'datetime'; expected 'CursorValueType | None'


[error] 1192-1192: Argument 'end_provider' to 'ConcurrentCursor' has incompatible type 'Callable[[], datetime]'; expected 'Callable[[], CursorValueType]'


[error] 2977-2977: Item 'None' of 'list[FixedWindowCallRatePolicy | MovingWindowCallRatePolicy | UnlimitedCallRatePolicy] | None' has no attribute 'iter' (not iterable)


[error] 2982-2982: Argument 'ratelimit_reset_header' to 'HttpAPIBudget' has incompatible type 'str | None'; expected 'str'


[error] 2983-2983: Argument 'ratelimit_remaining_header' to 'HttpAPIBudget' has incompatible type 'str | None'; expected 'str'


[error] 2984-2984: Argument 'status_codes_for_ratelimit_hit' to 'HttpAPIBudget' has incompatible type 'list[int] | None'; expected 'tuple[int]'

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
🔇 Additional comments (13)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4)

646-658: Should we add validation for non-positive values?
Currently, there's no check to ensure limit is a positive integer or interval is non-zero. Would adding a simple constraint or Pydantic validator help prevent misconfiguration issues? wdyt?


1612-1635: Handling timezone and negative values.
next_reset_ts and period could introduce edge cases if they use local times or negative intervals. Maybe add a note or validator to ensure these are positive and timezone-aware. wdyt?


1637-1652: Validate intervals in rates.
If any Rate has an invalid or zero interval, the moving window might fail. Would you like to add checks ensuring intervals are valid durations? wdyt?


2107-2111: Verify synergy with existing rate limit handling.
Adding api_budget is excellent for budget-based rate limiting. Are we certain it doesn't overlap with existing error-handling or backoff logic? A quick check for consistency or consolidated handling might help. wdyt?

airbyte_cdk/sources/declarative/requesters/http_requester.py (1)

96-96: Double-check integrated error handling.
We’re now passing api_budget into HttpClient. Are we sure HttpClient coordinates gracefully with the existing error handler to handle rate limits cohesively? Maybe cross-check or comment on interactions. wdyt?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (6)

115-117: LGTM! Clean import organization.

The new imports are well-organized and follow the existing pattern of grouping related models and components together.

Also applies to: 232-234, 250-252, 307-309, 328-330, 374-376, 490-497


636-641: LGTM! Factory mapping is properly updated.

The new components are correctly mapped to their respective factory methods in the PYDANTIC_MODEL_TO_CONSTRUCTOR dictionary.


1952-1956: LGTM! Clean HttpRequester integration.

The API budget integration into HttpRequester is clean and maintains optional behavior.

Also applies to: 1978-1978


2984-3024: LGTM! Clean implementation of rate policy factories.

The implementations for create_fixed_window_call_rate_policy, create_moving_window_call_rate_policy, and create_unlimited_call_rate_policy are well-structured and follow the factory pattern consistently.

🧰 Tools
🪛 GitHub Actions: Linters

[error] 2984-2984: Argument 'status_codes_for_ratelimit_hit' to 'HttpAPIBudget' has incompatible type 'list[int] | None'; expected 'tuple[int]'


3025-3030: LGTM! Simple Rate component factory.

The create_rate implementation is clean and straightforward.


3031-3039: LGTM! Clean HttpRequestMatcher factory.

The create_http_request_matcher implementation is clean and properly handles all the matcher fields.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

1410-1445: Fixed Window Call Rate Policy: Looks Solid!
The FixedWindowCallRatePolicy is clearly defined with the required properties (next_reset_ts, period, call_limit, and matchers). Do you think it might be beneficial to enforce or validate the duration format for the period property for consistency across the schema? wdyt?


1488-1505: Rate Component: Clear and Concise!
This component succinctly defines a rate limit with a numeric limit and an interval (formatted as a duration). Would it be useful to add extra validation to ensure that the interval value adheres to the expected ISO8601 duration format, or is that handled elsewhere in the system? wdyt?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (18)
airbyte_cdk/sources/streams/call_rate.py (4)

103-110: Expanded docstring for HttpRequestMatcher
This descriptive docstring is quite helpful. Would you consider adding an example use-case in the docstring to make adoption even simpler? wdyt?


121-126: Docstring clarification for constructor parameters
Great job documenting param usage. Might it be clearer to mention that headers are case-insensitively matched in the param doc directly? wdyt?


164-184: Regex match and query/header dict checks
Everything seems consistent. The dictionary subset check with _match_dict is a neat approach. Consider logging a debug line if matches fail for easy debugging. wdyt?


435-443: New APIBudget dataclass usage
This refactor to a dataclass clarifies initialization. It might be beneficial to specify defaults for “policies” if empty. wdyt?

airbyte_cdk/sources/declarative/manifest_declarative_source.py (1)

140-143: New logic for setting api_budget
Adding the api_budget integration is a valuable enhancement. Could be good to add a small debug log when the budget is set to help track usage. wdyt?

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

1806-1828: New APIBudget class
This is a solid approach to grouping multiple policies. Maybe a minimal usage example in docstring (like how to define it in the manifest) could help. wdyt?


1900-1904: api_budget field in DeclarativeSource
This top-level field is a nice injection point for rate-limiting. Ensuring no conflicts if multiple budgets are provided might be worth documenting. Any plan for that? wdyt?

Also applies to: 1931-1935

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

553-553: Consider using a more specific type annotation for _api_budget.

The current type annotation uses Optional[Union[APIBudget, HttpAPIBudget]]. Since both types share a common interface for rate limiting, would it make sense to use a protocol or abstract base class instead? This could make the code more maintainable and extensible. wdyt?

-    self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None
+    self._api_budget: Optional[RateLimitingBudget] = None  # where RateLimitingBudget is a Protocol or ABC

2977-2999: Consider extracting default values to constants.

The create_http_api_budget method has several hardcoded default values. Would it be cleaner to extract these to class-level constants? This would make it easier to maintain and modify these values in the future. wdyt?

+    DEFAULT_RATELIMIT_RESET_HEADER = "ratelimit-reset"
+    DEFAULT_RATELIMIT_REMAINING_HEADER = "ratelimit-remaining"
+    DEFAULT_RATELIMIT_STATUS_CODES = (429,)
+    DEFAULT_MAX_ATTEMPTS = 100000

     def create_http_api_budget(
         self, model: HTTPAPIBudgetModel, config: Config, **kwargs: Any
     ) -> HttpAPIBudget:
         policies = [
             self._create_component_from_model(model=policy, config=config)
             for policy in model.policies
         ]

         return HttpAPIBudget(
             policies=policies,
             maximum_attempts_to_acquire=model.maximum_attempts_to_acquire
             if model.maximum_attempts_to_acquire
-            else 100000,
+            else self.DEFAULT_MAX_ATTEMPTS,
             ratelimit_reset_header=model.ratelimit_reset_header
             if model.ratelimit_reset_header
-            else "ratelimit-reset",
+            else self.DEFAULT_RATELIMIT_RESET_HEADER,
             ratelimit_remaining_header=model.ratelimit_remaining_header
             if model.ratelimit_remaining_header
-            else "ratelimit-remaining",
+            else self.DEFAULT_RATELIMIT_REMAINING_HEADER,
             status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit
             if model.status_codes_for_ratelimit_hit
-            else (429,),
+            else self.DEFAULT_RATELIMIT_STATUS_CODES,
         )

3059-3073: Consider using pattern matching for API budget type selection.

The set_api_budget method uses if-else statements for type selection. Since Python 3.10+ supports pattern matching, would it be cleaner to use a match statement here? wdyt?

     def set_api_budget(self, component_definition: ComponentDefinition, config: Config) -> None:
         model_str = component_definition.get("type")
-        if model_str == "APIBudget":
-            # Annotate model_type as a type that is a subclass of BaseModel
-            model_type: Union[Type[APIBudgetModel], Type[HTTPAPIBudgetModel]] = APIBudgetModel
-        elif model_str == "HTTPAPIBudget":
-            model_type = HTTPAPIBudgetModel
-        else:
-            raise ValueError(f"Unknown API Budget type: {model_str}")
+        match model_str:
+            case "APIBudget":
+                model_type: Union[Type[APIBudgetModel], Type[HTTPAPIBudgetModel]] = APIBudgetModel
+            case "HTTPAPIBudget":
+                model_type = HTTPAPIBudgetModel
+            case _:
+                raise ValueError(f"Unknown API Budget type: {model_str}")

         self._api_budget = self.create_component(
             model_type=model_type, component_definition=component_definition, config=config
         )
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (8)

43-48: API Budget Property Addition in the Main Schema
The new "api_budget" property is a clear and useful addition that allows the connector to accept either a generic APIBudget or an HTTP-specific APIBudget configuration. Its placement alongside other core properties looks logical. Would you consider adding an example or further constraints (such as enforcing non-null values) to tighten its usage? wdyt?


1375-1403: New APIBudget Component Definition
The APIBudget definition is comprehensive, requiring both "type" and "policies" and including a field for "maximum_attempts_to_acquire" with a sensible default. The documentation text is clear about its purpose. Do you think adding an explicit example (or reference to one in our docs) would help users better understand the expected format for policies? wdyt?


1404-1448: New HTTPAPIBudget Component Definition
This HTTP-specific variant of APIBudget extends the generic APIBudget neatly by adding headers for ratelimit reset and remaining counters, along with status code indicators for a rate limit hit. The defaults (like "ratelimit-reset" and "ratelimit-remaining") provide clarity. Would you consider making these header names configurable via examples in the schema for clarity? wdyt?


1449-1483: FixedWindowCallRatePolicy Definition Addition
The FixedWindowCallRatePolicy component looks well defined, ensuring that a fixed number of calls can occur within a specified time window. I appreciate the use of ISO8601 date-time format for "next_reset_ts" and the duration format for "period". Would you consider including an example that illustrates a typical value for "period" (e.g. "P1D") to further guide users? wdyt?


1484-1508: MovingWindowCallRatePolicy Component Addition
The MovingWindowCallRatePolicy is defined to allow a fixed number of calls within a moving time window and leverages an array of "Rate" objects along with matchers. This usage is consistent with our design. Do you think providing a brief inline example for the "rates" array might enhance clarity for users new to this concept? wdyt?


1509-1525: UnlimitedCallRatePolicy Component Definition
The UnlimitedCallRatePolicy is straightforward, requiring only the "type" and the "matchers" array. This clearly communicates that the policy has no call limits. Would you be open to adding a note in the description about typical use cases for this policy? wdyt?


1527-1544: Rate Component Definition Addition
Defining the "Rate" component to encapsulate a limit and an interval adds sharp granularity to our rate limiting description. The documentation mentions that "interval" uses the duration format, which is great. Would it be beneficial to include additional example values (or constraints) in the schema to further illustrate acceptable formats? wdyt?


1545-1575: HTTP Request Matcher Component Update
The new HttpRequestMatcher definition appears robust and detailed, allowing matching on HTTP method, base URL, path (with regex support), parameters, and headers. This flexible design should serve varied use cases well. Do you feel any additional constraints or examples are needed to clarify how regex patterns should be constructed? wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 707a6c6 and b6bcdd7.

📒 Files selected for processing (5)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py (1 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (6 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (12 hunks)
  • airbyte_cdk/sources/streams/call_rate.py (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (11)
airbyte_cdk/sources/streams/call_rate.py (5)

9-14: Added imports for "re" and updated dataclass/typing references
Were these imports introduced for regex support and type hints? Everything looks good in principle. Just confirming that no overshadowing of existing imports happens.


115-117: Constructor parameters for HttpRequestMatcher
The new parameters 'url_base' and 'url_path_pattern' increase flexibility. Looks good, but do we need to handle trailing slashes for patterns as well? wdyt?
[approve]


127-139: Initialization logic for method, url_base, headers, and params
The transformations (like rstrip on url_base, lowercasing header keys, stringifying query params) look correct. No immediate issues spotted.


158-161: Method equality check
Ensuring we compare uppercase variants is crucial for consistency. This looks correct. No concerns here.


445-466: acquire_call() method referencing policies
Your iteration to find the first matching policy is neat. The fallback logging when no policy matches is also helpful. Implementation looks correct.

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

6-6: Importing datetime & timedelta
Looks like these are required for new rate-limiting classes. This addition aligns with usage below.


646-658: Rate class definition
Using a pydantic model for “limit” and “interval” is neat. One small thought: do we need to ensure the interval is > 0? wdyt?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)

115-117: LGTM! Clean import organization.

The new imports are well-organized and follow the existing pattern of grouping related models and components together.

Also applies to: 232-234, 244-246, 253-255, 310-312, 331-333, 377-379, 493-501


1952-1953: LGTM! Clean integration with HttpRequester.

The API budget is cleanly integrated into the HttpRequester class through the api_budget parameter.

Also applies to: 1974-1974


2964-2976: Consider adding validation for empty policies list.

The create_api_budget method doesn't validate if the policies list is empty. An empty policies list might lead to unexpected behavior. wdyt about adding a validation check?

     def create_api_budget(self, model: APIBudgetModel, config: Config, **kwargs: Any) -> APIBudget:
         policies = [
             self._create_component_from_model(model=policy, config=config)
             for policy in model.policies
         ]
+        if not policies:
+            raise ValueError("At least one policy must be specified for APIBudget")

         return APIBudget(
             policies=policies,
             maximum_attempts_to_acquire=model.maximum_attempts_to_acquire
             if model.maximum_attempts_to_acquire
             else 100000,
         )
✅ Verification successful

Proposed Enhancement: Validate Non-Empty Policies List in create_api_budget

Hi there, upon reviewing the APIBudget creation code, I noticed that the policies list is built from model.policies without any checks for emptiness. If model.policies is empty, APIBudget ends up being initialized with an empty policies list, which might lead to unexpected behavior later on. Would you be open to adding a validation step (for example, raising a ValueError when policies is empty) to fail fast in case of misconfiguration? wdyt?

  • Confirm that model.policies should indeed contain at least one policy.
  • Adding an early validation helps prevent downstream errors by ensuring APIBudget is always constructed with valid data.

3001-3014: Consider validating period format.

The create_fixed_window_call_rate_policy method uses parse_duration without validating the period format. Would it be safer to add error handling for invalid duration formats? wdyt?

     def create_fixed_window_call_rate_policy(
         self, model: FixedWindowCallRatePolicyModel, config: Config, **kwargs: Any
     ) -> FixedWindowCallRatePolicy:
         matchers = [
             self._create_component_from_model(model=matcher, config=config)
             for matcher in model.matchers
         ]
+        try:
+            period = parse_duration(model.period)
+        except ValueError as e:
+            raise ValueError(f"Invalid period format: {model.period}. Expected ISO 8601 duration format.") from e
         return FixedWindowCallRatePolicy(
             next_reset_ts=model.next_reset_ts,
-            period=parse_duration(model.period),
+            period=period,
             call_limit=model.call_limit,
             matchers=matchers,
         )
✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Search for the definition of parse_duration in the repository to understand its error handling
rg "def parse_duration" -n .

Length of output: 30


🏁 Script executed:

#!/bin/bash
# Let's search for the definition of parse_duration in all Python files
fd -e py -x grep -H "def parse_duration" {} \;

Length of output: 51


🏁 Script executed:

#!/bin/bash
# Let's search for any import lines that mention parse_duration to determine its source.
rg -n "import .*parse_duration" .

Length of output: 571


Based on our investigation, it looks like parse_duration is directly imported from the external isodate package without any internal safeguards in our code. Since an invalid ISO 8601 duration string would likely cause parse_duration to throw an exception, adding a try/except block to catch and emit a friendlier error makes sense. This would improve robustness when an unexpected format is provided. Would you consider incorporating the try/except as proposed in your patch snippet (wdyt)?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

3049-3063: Enhance error handling in set_api_budget.

The error message could be more descriptive and include the list of supported types. Also, consider using a more type-safe approach. wdyt?

    def set_api_budget(self, component_definition: ComponentDefinition, config: Config) -> None:
        model_str = component_definition.get("type")
+       supported_types = {"APIBudget", "HTTPAPIBudget"}
+       
+       if not model_str:
+           raise ValueError("API Budget type not specified")
+           
+       if model_str not in supported_types:
+           raise ValueError(
+               f"Unsupported API Budget type: {model_str}. "
+               f"Supported types are: {', '.join(supported_types)}"
+           )

-       if model_str == "APIBudget":
+       model_type: Type[Union[APIBudgetModel, HTTPAPIBudgetModel]] = (
+           APIBudgetModel if model_str == "APIBudget" else HTTPAPIBudgetModel
+       )

-           # Annotate model_type as a type that is a subclass of BaseModel
-           model_type: Union[Type[APIBudgetModel], Type[HTTPAPIBudgetModel]] = APIBudgetModel
-       elif model_str == "HTTPAPIBudget":
-           model_type = HTTPAPIBudgetModel
-       else:
-           raise ValueError(f"Unknown API Budget type: {model_str}")

        self._api_budget = self.create_component(
            model_type=model_type, component_definition=component_definition, config=config
        )
🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

553-553: Consider explicitly initializing the _api_budget variable.

The variable is implicitly initialized to None through the Optional type. For better readability and maintainability, would you consider making it explicit? wdyt?

-    self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None
+    self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None  # Initialized in set_api_budget

2964-2974: Consider using a constant for the default maximum attempts.

The default value of 100000 for maximum_attempts_to_acquire could be defined as a class constant for better maintainability. wdyt?

class ModelToComponentFactory:
+    DEFAULT_MAXIMUM_ATTEMPTS = 100000
    ...
    def create_api_budget(self, model: APIBudgetModel, config: Config, **kwargs: Any) -> APIBudget:
        policies = [
            self._create_component_from_model(model=policy, config=config)
            for policy in model.policies
        ]

        return APIBudget(
            policies=policies,
-           maximum_attempts_to_acquire=model.maximum_attempts_to_acquire or 100000,
+           maximum_attempts_to_acquire=model.maximum_attempts_to_acquire or self.DEFAULT_MAXIMUM_ATTEMPTS,
        )

2975-2990: Consider using constants for default header names and status codes.

The default values for headers and status codes could be defined as class constants for better maintainability. wdyt?

class ModelToComponentFactory:
+    DEFAULT_RATELIMIT_RESET_HEADER = "ratelimit-reset"
+    DEFAULT_RATELIMIT_REMAINING_HEADER = "ratelimit-remaining"
+    DEFAULT_RATELIMIT_STATUS_CODES = (429,)
    ...
    def create_http_api_budget(self, model: HTTPAPIBudgetModel, config: Config, **kwargs: Any) -> HttpAPIBudget:
        policies = [
            self._create_component_from_model(model=policy, config=config)
            for policy in model.policies
        ]

        return HttpAPIBudget(
            policies=policies,
            maximum_attempts_to_acquire=model.maximum_attempts_to_acquire or self.DEFAULT_MAXIMUM_ATTEMPTS,
-           ratelimit_reset_header=model.ratelimit_reset_header or "ratelimit-reset",
-           ratelimit_remaining_header=model.ratelimit_remaining_header or "ratelimit-remaining",
-           status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or (429,),
+           ratelimit_reset_header=model.ratelimit_reset_header or self.DEFAULT_RATELIMIT_RESET_HEADER,
+           ratelimit_remaining_header=model.ratelimit_remaining_header or self.DEFAULT_RATELIMIT_REMAINING_HEADER,
+           status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or self.DEFAULT_RATELIMIT_STATUS_CODES,
        )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6bcdd7 and 040ff9e.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (12 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

115-117: LGTM! Well-organized imports.

The new imports are properly organized alphabetically and follow the existing pattern of grouping related imports together.

Also applies to: 232-234, 244-246, 253-255, 310-312, 331-333, 377-379, 493-501


2991-3048: LGTM! Well-implemented factory methods.

The implementations of create_fixed_window_call_rate_policy, create_moving_window_call_rate_policy, create_unlimited_call_rate_policy, create_rate, and create_http_request_matcher are clean and follow the established patterns.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (12)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)

553-553: Consider using a more specific type annotation for _api_budget.

The current type annotation allows both APIBudget and HttpAPIBudget. Since these have different interfaces, it might be better to use a common base class or interface. wdyt?

-    self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None
+    self._api_budget: Optional[BaseBudget] = None  # Create a base class/interface that both budget types implement

2979-2994: Consider customizing default header names through configuration.

The HTTP API budget uses hardcoded default header names. Would it be better to make these configurable through the connector config? This would allow easier customization for different APIs. wdyt?

         return HttpAPIBudget(
             policies=policies,
             maximum_attempts_to_acquire=model.maximum_attempts_to_acquire or 100000,
-            ratelimit_reset_header=model.ratelimit_reset_header or "ratelimit-reset",
-            ratelimit_remaining_header=model.ratelimit_remaining_header or "ratelimit-remaining",
+            ratelimit_reset_header=model.ratelimit_reset_header or config.get("ratelimit_reset_header", "ratelimit-reset"),
+            ratelimit_remaining_header=model.ratelimit_remaining_header or config.get("ratelimit_remaining_header", "ratelimit-remaining"),
             status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or (429,),
         )

3042-3052: Consider adding pattern validation for HttpRequestMatcher.

The URL path pattern and other regex patterns are not validated. Should we validate these at creation time to catch invalid patterns early? wdyt?

     def create_http_request_matcher(
         self, model: HttpRequestRegexMatcherModel, config: Config, **kwargs: Any
     ) -> HttpRequestRegexMatcher:
+        # Validate regex patterns
+        try:
+            if model.url_path_pattern:
+                re.compile(model.url_path_pattern)
+            for pattern in model.params.values():
+                re.compile(pattern)
+            for pattern in model.headers.values():
+                re.compile(pattern)
+        except re.error as e:
+            raise ValueError(f"Invalid regex pattern: {str(e)}")
+
         return HttpRequestRegexMatcher(
             method=model.method,
             url_base=model.url_base,
             url_path_pattern=model.url_path_pattern,
             params=model.params,
             headers=model.headers,
         )

3053-3067: Consider using a more descriptive error message in set_api_budget.

The error message for unknown API Budget type could be more helpful by listing the supported types. wdyt?

-            raise ValueError(f"Unknown API Budget type: {model_str}")
+            raise ValueError(f"Unknown API Budget type: {model_str}. Supported types are: APIBudget, HTTPAPIBudget")
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (8)

43-48: New "api_budget" Property in DeclarativeSource
This addition introduces the new api_budget field that accepts either a generic APIBudget or an HTTPAPIBudget. It is a flexible way to configure API request limits for all streams. Would you be open to adding an example or sample value in the schema description for clarity?


1375-1403: APIBudget Component Definition
The APIBudget component is well-defined with required fields for type and policies, and includes a default of 100000 for maximum_attempts_to_acquire. Do you think adding more context about the rationale for this high default, or perhaps offering an example usage inline, might be helpful for future maintainers?


1404-1448: HTTPAPIBudget Component Definition
This extension of APIBudget for HTTP-specific settings looks great—it clearly defines the headers required to update rate limiting information based on HTTP responses. Would you consider including usage examples (for instance, how ratelimit_reset_header and ratelimit_remaining_header are used) or even constraints on acceptable header names?


1449-1483: FixedWindowCallRatePolicy
The FixedWindowCallRatePolicy is cleanly implemented, requiring next_reset_ts, period (formatted as a duration), call_limit, and matchers. Have you considered adding a regex or format check for the period field to ensure it conforms to the expected ISO8601 duration format? Wdyt?


1484-1507: MovingWindowCallRatePolicy Component
This new policy introduces a rates array and associated matchers to define call limits over a moving window. Would it be beneficial to include more detailed examples or validations for the structure of individual rate entries for enhanced clarity?


1509-1525: UnlimitedCallRatePolicy Component
The UnlimitedCallRatePolicy is straightforward and correctly captures a scenario where unlimited calls are permitted. Do you think it might be useful to include a note or an example in the description to remind users when this policy should be applied?


1527-1544: Rate Definition
The Rate component, with its limit and interval fields, is an essential building block for rate limiting. Would providing a couple of inline examples of typical values for 'interval' (using ISO8601 duration) add even more clarity for consumers of this schema?


1545-1574: HttpRequestRegexMatcher Component
This matcher component covers method, url_base, url_path_pattern, parameters, and headers, offering robust flexibility for matching outgoing HTTP requests. Would imposing some recommendations regarding which properties are most commonly required (or perhaps marking some as mandatory for typical use cases) help reduce misconfiguration, or do you prefer to keep it fully optional for maximum flexibility?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 040ff9e and 15f830c.

📒 Files selected for processing (6)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (6 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (12 hunks)
  • airbyte_cdk/sources/declarative/requesters/http_requester.py (3 hunks)
  • airbyte_cdk/sources/streams/call_rate.py (7 hunks)
  • unit_tests/sources/streams/test_call_rate.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/declarative/requesters/http_requester.py
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Publish CDK version to PyPI
  • GitHub Check: Publish SDM to DockerHub
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (8)
unit_tests/sources/streams/test_call_rate.py (1)

363-447: Great test coverage for the new regex-based request matching!

The test class is well-structured with comprehensive test cases covering all key features. Each test method is focused and includes both positive and negative scenarios.

airbyte_cdk/sources/streams/call_rate.py (2)

165-257: LGTM! Well-structured implementation of regex-based request matching.

The implementation is clean with good input normalization and case-insensitive handling. The docstrings are clear and comprehensive.


498-588: Nice refactoring using dataclasses!

The dataclass refactoring makes the code more concise and maintainable. The default values for optional fields are sensible.

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

646-1871: Great schema definitions for the new rate limiting components!

The schema is well-documented with clear field descriptions, helpful examples, and proper type hints. The integration with the declarative source schema is clean.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)

493-501: LGTM! Clean import organization for API budget components.

The imports are well-organized and grouped logically with other stream-related imports.


1943-1944: LGTM! Clean integration of API budget into HttpRequester.

The API budget is correctly passed through to the HttpRequester constructor.

Also applies to: 1965-1965


2995-3008: Consider validating period format in FixedWindowCallRatePolicy.

The period is parsed using parse_duration but there's no error handling if the format is invalid. Should we add validation and provide a clear error message? wdyt?


2968-2978: Consider adding validation for empty policies list.

The create_api_budget method accepts an empty policies list. Should we validate that at least one policy is provided? wdyt?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)

553-553: Consider adding a more specific type annotation for _api_budget.

The current annotation Optional[Union[APIBudget, HttpAPIBudget]] could be more specific since HttpAPIBudget is a subclass of APIBudget. wdyt about using just Optional[APIBudget]?


2989-2992: Consider extracting header names and status codes as constants.

These values are used as defaults and could be reused elsewhere. wdyt about defining them as class constants?

+    DEFAULT_RATELIMIT_RESET_HEADER = "ratelimit-reset"
+    DEFAULT_RATELIMIT_REMAINING_HEADER = "ratelimit-remaining"
+    DEFAULT_RATELIMIT_STATUS_CODES = (429,)

     def create_http_api_budget(
         self, model: HTTPAPIBudgetModel, config: Config, **kwargs: Any
     ) -> HttpAPIBudget:
         # ...
         return HttpAPIBudget(
             policies=policies,
             maximum_attempts_to_acquire=model.maximum_attempts_to_acquire or 100000,
-            ratelimit_reset_header=model.ratelimit_reset_header or "ratelimit-reset",
-            ratelimit_remaining_header=model.ratelimit_remaining_header or "ratelimit-remaining",
-            status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or (429,),
+            ratelimit_reset_header=model.ratelimit_reset_header or self.DEFAULT_RATELIMIT_RESET_HEADER,
+            ratelimit_remaining_header=model.ratelimit_remaining_header or self.DEFAULT_RATELIMIT_REMAINING_HEADER,
+            status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or self.DEFAULT_RATELIMIT_STATUS_CODES,
         )

3053-3066: Consider improving error handling and documentation in set_api_budget.

A few suggestions to enhance this method:

  1. Add docstring explaining the purpose and parameters
  2. Consider more specific error messages for different failure cases
  3. Add validation for required fields in component_definition

wdyt about these improvements?

     def set_api_budget(self, component_definition: ComponentDefinition, config: Config) -> None:
+        """Configure the API budget for the factory.
+        
+        Args:
+            component_definition: The component definition containing API budget configuration
+            config: The connector configuration
+            
+        Raises:
+            ValueError: If the API budget type is unknown or required fields are missing
+        """
+        if not isinstance(component_definition, dict):
+            raise ValueError("Component definition must be a dictionary")
+
         model_str = component_definition.get("type")
+        if not model_str:
+            raise ValueError("Component definition must specify a 'type'")
+
         if model_str == "APIBudget":
             model_type: Union[Type[APIBudgetModel], Type[HTTPAPIBudgetModel]] = APIBudgetModel
         elif model_str == "HTTPAPIBudget":
             model_type = HTTPAPIBudgetModel
         else:
-            raise ValueError(f"Unknown API Budget type: {model_str}")
+            raise ValueError(f"Unsupported API Budget type: {model_str}. Must be one of: APIBudget, HTTPAPIBudget")

1943-1944: Consider adding a comment explaining the api_budget assignment.

The assignment of self._api_budget to a local variable could benefit from a brief comment explaining its purpose. wdyt?

-        api_budget = self._api_budget
+        # Use the configured API budget if one was set via set_api_budget
+        api_budget = self._api_budget
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)

3570-3647: Great test coverage for the API budget functionality!

The test effectively validates the integration of HTTPAPIBudget with HttpRequester, including rate limiting headers, status codes, and moving window policy configuration. Would you consider adding a few edge cases to make it even more robust? For example:

  1. Multiple rate limits with different intervals
  2. Invalid/missing header scenarios
  3. Different HTTP methods in matchers
    wdyt?

3649-3725: Well-structured test for fixed window rate limiting!

The test thoroughly validates the FixedWindowCallRatePolicy configuration. Since this involves time-based behavior, would you consider adding tests with frozen time at different points to verify the policy's behavior around reset boundaries? For example:

  1. Just before reset time
  2. At reset time
  3. Just after reset time
    This could help catch any timing-related edge cases. wdyt?
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1285668 and 7be9842.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (12 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (1)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)

145-145: LGTM! The import for MovingWindowCallRatePolicy is well-placed and used in the new test functions.

@tolik0 tolik0 requested review from maxi297 and artem1205 February 10, 2025 09:27
Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me but I have some questions to make sure I have all the context. I'll re-review the code once I have this context

Copy link
Contributor

@artem1205 artem1205 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

nit: could you please:

  1. add more info to our log message so we could know which stream/endpoint/rule is applied/affected.
    image
  2. add docs/examples on how to use it?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (14)
airbyte_cdk/sources/streams/call_rate.py (4)

171-269: Consider validating URL path patterns and normalizing URL bases, wdyt?

A few suggestions to make the matcher more robust:

  1. The URL path pattern could be validated during initialization to catch invalid regex patterns early.
  2. The URL base normalization could handle both leading and trailing slashes consistently.
 def __init__(
     self,
     method: Optional[str] = None,
     url_base: Optional[str] = None,
     url_path_pattern: Optional[str] = None,
     params: Optional[Mapping[str, Any]] = None,
     headers: Optional[Mapping[str, Any]] = None,
 ):
     self._method = method.upper() if method else None
-    self._url_base = url_base.rstrip("/") if url_base else None
+    # Normalize URL base by stripping both leading and trailing slashes
+    self._url_base = url_base.strip("/") if url_base else None

     # Compile the URL path pattern if provided.
+    if url_path_pattern:
+        try:
+            self._url_path_pattern = re.compile(url_path_pattern)
+        except re.error as e:
+            raise ValueError(f"Invalid URL path pattern '{url_path_pattern}': {e}")
+    else:
+        self._url_path_pattern = None
-    self._url_path_pattern = re.compile(url_path_pattern) if url_path_pattern else None

164-168: Consider standardizing the string representation format across classes, wdyt?

The string representations could follow a more consistent pattern. For example:

  • HttpRequestMatcher: Uses parentheses and comma-separated key-value pairs
  • HttpRequestRegexMatcher: Uses similar format but includes regex pattern
  • FixedWindowCallRatePolicy: Uses different format with line breaks
  • MovingWindowCallRatePolicy: Uses yet another format

Would it be clearer to standardize these? Here's a suggestion for FixedWindowCallRatePolicy:

-    def __str__(self) -> str:
-        matcher_str = ", ".join(f"{matcher}" for matcher in self._matchers)
-        return (
-            f"FixedWindowCallRatePolicy(call_limit={self._call_limit}, period={self._offset}, "
-            f"calls_used={self._calls_num}, next_reset={self._next_reset_ts}, "
-            f"matchers=[{matcher_str}])"
-        )
+    def __str__(self) -> str:
+        return (
+            f"FixedWindowCallRatePolicy("
+            f"call_limit={self._call_limit}, "
+            f"period={self._offset}, "
+            f"calls_used={self._calls_num}, "
+            f"next_reset={self._next_reset_ts}, "
+            f"matchers=[{', '.join(str(m) for m in self._matchers)}])"
+        )

Also applies to: 264-269, 369-375, 483-494


548-564: Consider enhancing endpoint extraction with URL parsing and specific error handling, wdyt?

The current implementation could be more robust:

  1. Use urllib.parse to handle URL components consistently
  2. Add specific error handling for different failure cases
 def _extract_endpoint(self, request: Any) -> str:
     """Extract the endpoint URL from the request if available."""
-    endpoint = None
     try:
         # If the request is already a PreparedRequest, it should have a URL.
         if isinstance(request, requests.PreparedRequest):
-            endpoint = request.url
+            url = request.url
         # If it's a requests.Request, we call prepare() to extract the URL.
         elif isinstance(request, requests.Request):
             prepared = request.prepare()
-            endpoint = prepared.url
-    except Exception as e:
-        logger.debug(f"Error extracting endpoint: {e}")
-    if endpoint:
-        return endpoint
+            url = prepared.url
+        else:
+            return "unknown endpoint"
+
+        # Parse the URL to extract just the path
+        parsed = parse.urlparse(url)
+        return parsed.path or "/"
+    except AttributeError as e:
+        logger.debug(f"Request object missing URL attribute: {e}")
+    except ValueError as e:
+        logger.debug(f"Invalid URL format: {e}")
+    except Exception as e:
+        logger.debug(f"Unexpected error extracting endpoint: {e}")
     return "unknown endpoint"

627-629: Consider adding a comment explaining the negative duration check, wdyt?

-    # Ensure we never sleep for a negative duration.
-    time_to_wait = max(timedelta(0), time_to_wait)
+    # Ensure we never sleep for a negative duration, which could happen if the
+    # system clock changes or if there's clock skew between the client and server.
+    time_to_wait = max(timedelta(0), time_to_wait)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)

646-661: Consider adding validation and more examples for the Rate model, wdyt?

The model could be more robust with:

  1. Validation for positive rate limits
  2. ISO duration format validation
  3. More descriptive examples
 class Rate(BaseModel):
     class Config:
         extra = Extra.allow

     limit: int = Field(
         ...,
         description="The maximum number of calls allowed within the interval.",
+        gt=0,  # Ensure positive rate limit
         title="Limit",
+        examples=[100, 1000, 5000],
     )
     interval: str = Field(
         ...,
         description="The time interval for the rate limit.",
-        examples=["PT1H", "P1D"],
+        examples=["PT1H", "P1D", "PT30M", "PT15S"],
+        regex="^P(?!$)(\d+Y)?(\d+M)?(\d+W)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+S)?)?$",
         title="Interval",
     )

1630-1677: Consider enhancing the policy models' documentation with examples and use cases, wdyt?

The policy models could benefit from more detailed documentation:

  1. Example configurations for common scenarios
  2. Use case recommendations
  3. Comparison between different policies
 class FixedWindowCallRatePolicy(BaseModel):
     class Config:
         extra = Extra.allow

     type: Literal["FixedWindowCallRatePolicy"]
     period: timedelta = Field(
         ...,
-        description="The time interval for the rate limit window.",
+        description="The time interval for the rate limit window. Best suited for APIs with fixed reset periods (e.g., hourly or daily limits).",
         title="Period",
+        examples=["PT1H", "P1D"],
     )

 class MovingWindowCallRatePolicy(BaseModel):
     class Config:
         extra = Extra.allow

     type: Literal["MovingWindowCallRatePolicy"]
     rates: List[Rate] = Field(
         ...,
-        description="List of rates that define the call limits for different time intervals.",
+        description="List of rates that define the call limits for different time intervals. Ideal for APIs with rolling windows or multiple rate limits (e.g., 100 calls per minute AND 1000 calls per hour).",
         title="Rates",
+        examples=[
+            [{"limit": 100, "interval": "PT1M"}, {"limit": 1000, "interval": "PT1H"}],
+        ],
     )

1810-1846: Consider adding validation for the HTTPAPIBudget model, wdyt?

The model could be more robust with:

  1. Header name validation
  2. Status code range validation
  3. Default values alignment
 class HTTPAPIBudget(BaseModel):
     class Config:
         extra = Extra.allow

     type: Literal["HTTPAPIBudget"]
     policies: List[
         Union[
             FixedWindowCallRatePolicy,
             MovingWindowCallRatePolicy,
             UnlimitedCallRatePolicy,
         ]
     ] = Field(
         ...,
         description="List of call rate policies that define how many calls are allowed.",
         title="Policies",
+        min_items=1,  # Ensure at least one policy is defined
     )
     ratelimit_reset_header: Optional[str] = Field(
         "ratelimit-reset",
         description="The HTTP response header name that indicates when the rate limit resets.",
         title="Rate Limit Reset Header",
+        regex="^[A-Za-z0-9-]+$",  # Ensure valid header name format
     )
     status_codes_for_ratelimit_hit: Optional[List[int]] = Field(
         [429],
         description="List of HTTP status codes that indicate a rate limit has been hit.",
         title="Status Codes for Rate Limit Hit",
+        items={"ge": 100, "lt": 600},  # Ensure valid HTTP status code range
     )
     maximum_attempts_to_acquire: Optional[int] = Field(
         100000,
         description="The maximum number of attempts to acquire a call before giving up.",
         title="Maximum Attempts to Acquire",
+        gt=0,  # Ensure positive number of attempts
     )
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)

550-550: Consider adding type annotation for _api_budget.

The instance variable could benefit from an explicit type annotation. wdyt about:

-        self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None
+        self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None

2964-2978: Consider adding validation for empty policies list.

The create_http_api_budget method might want to validate that the policies list is not empty, as an API budget without policies may not be useful. wdyt about adding a check?

    def create_http_api_budget(
        self, model: HTTPAPIBudgetModel, config: Config, **kwargs: Any
    ) -> HttpAPIBudget:
        policies = [
            self._create_component_from_model(model=policy, config=config)
            for policy in model.policies
        ]
+       if not policies:
+           raise ValueError("At least one policy must be provided for HTTPAPIBudget")

        return HttpAPIBudget(
            policies=policies,
            ratelimit_reset_header=model.ratelimit_reset_header or "ratelimit-reset",
            ratelimit_remaining_header=model.ratelimit_remaining_header or "ratelimit-remaining",
            status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or (429,),
        )

2979-2995: Consider documenting the 10-day initial reset timestamp choice.

In create_fixed_window_call_rate_policy, the initial reset timestamp is set to 10 days from now. It would be helpful to document why this specific duration was chosen. wdyt about adding a more detailed comment?

-       # Set the initial reset timestamp to 10 days from now.
-       # This value will be updated by the first request.
+       # Set the initial reset timestamp to 10 days from now to ensure the first request
+       # doesn't immediately hit rate limits. This is a safe default as the actual reset
+       # timestamp will be updated based on the API response headers after the first request.

3023-3028: Consider validating the interval format.

The create_rate method uses parse_duration but doesn't handle potential parsing errors. wdyt about adding error handling?

    def create_rate(self, model: RateModel, config: Config, **kwargs: Any) -> Rate:
+       try:
+           interval = parse_duration(model.interval)
+       except ValueError as e:
+           raise ValueError(f"Invalid interval format in Rate: {e}")
+
        return Rate(
            limit=model.limit,
-           interval=parse_duration(model.interval),
+           interval=interval,
        )

3040-3044: Consider adding error handling for invalid component definitions.

The set_api_budget method could benefit from explicit error handling for invalid component definitions. wdyt about:

    def set_api_budget(self, component_definition: ComponentDefinition, config: Config) -> None:
+       if not component_definition or "type" not in component_definition:
+           raise ValueError("Invalid API budget component definition")
+
        self._api_budget = self.create_component(
            model_type=HTTPAPIBudgetModel, component_definition=component_definition, config=config
        )
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

1412-1416: Consider reducing the default maximum attempts.

The default value of 100000 for maximum_attempts_to_acquire seems quite high and could lead to very long wait times. Would you consider reducing this to a more reasonable default like 1000 or even 100? Users can always override it if they need more attempts. wdyt?


1432-1435: Align duration format with other duration fields.

The period field uses a custom format: duration while other duration fields in the schema use ISO 8601 duration format with examples like "PT1H" or "P1D". Should we align this for consistency? wdyt?

-      format: duration
+      examples:
+        - "PT1H"
+        - "P1D"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7be9842 and 675d829.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (6 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (11 hunks)
  • airbyte_cdk/sources/streams/call_rate.py (9 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (8)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)

229-231: LGTM! The new imports are well organized and follow the existing pattern.

The imports are logically grouped and maintain consistency with the existing codebase structure.

Also applies to: 241-243, 250-252, 307-309, 328-330, 374-376, 490-498


1939-1940: LGTM! The API budget integration in create_http_requester is clean and non-intrusive.

The changes maintain backward compatibility by using the instance variable without modifying the existing logic.

Also applies to: 1961-1961


2996-3010: LGTM! The moving window policy implementation is clean and well-structured.

The method correctly creates rates and matchers from the model components.


3011-3022: LGTM! The unlimited policy implementation is straightforward.

The implementation is simple and focused, which is appropriate for an unlimited policy.


3029-3039: LGTM! The request matcher implementation is well-structured.

The method handles all necessary matching criteria (method, URL, params, headers) in a clean way.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)

1513-1515: Great documentation for URL components!

The description clearly explains how to use url_base and url_path_pattern together, which helps prevent confusion about URL construction.


1490-1509: Well-structured Rate definition!

The Rate component is clean, focused, and includes good examples for the interval field. The required fields are appropriate, and the additionalProperties flag allows for future extensibility.


1373-1417: Verify header names are case-insensitive.

The default header names (ratelimit-reset and ratelimit-remaining) might need to handle different cases as HTTP headers are case-insensitive. Should we add a note about case-insensitive matching? wdyt?

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Search for existing rate limit header handling in the codebase
rg -i "ratelimit.*header" -A 5

Length of output: 7934


Confirm HTTP header case-insensitivity behavior

The code leverages the HTTP client's built‐in case‐insensitive header matching (e.g., Requests' CaseInsensitiveDict), so even if header names are defined in a particular case (like "ratelimit-reset"), lookups will succeed regardless of the case in the actual HTTP response. We do see in unit tests that alternative casings (e.g., "X-RateLimit-Reset") are correctly handled.

Would you consider adding a clarifying note in the YAML description (and/or docs) to explicitly mention that header lookups are case-insensitive? wdyt?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

1430-1430: 🛠️ Refactor suggestion

Remove the format field as discussed.

Based on the past review comments, the format: duration field should be removed to align with other fields of the same type.

- format: duration
🧹 Nitpick comments (8)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4)

646-661: Consider enhancing the Rate class documentation.

The class looks good, but we could make it even better by adding more details to the documentation. What do you think about:

  1. Adding examples of interval formats in the field description?
  2. Clarifying if negative limits are allowed?
 class Rate(BaseModel):
     class Config:
         extra = Extra.allow

     limit: int = Field(
         ...,
         description="The maximum number of calls allowed within the interval.",
+        examples=[100, 1000],
         title="Limit",
     )
     interval: str = Field(
         ...,
-        description="The time interval for the rate limit.",
+        description="The time interval for the rate limit in ISO 8601 duration format (e.g., PT1H for 1 hour, P1D for 1 day).",
         examples=["PT1H", "P1D"],
         title="Interval",
     )

663-686: Consider adding validation for URL patterns.

The HttpRequestRegexMatcher looks good, but we could make it more robust. What do you think about adding some validation to ensure the URL patterns are valid regex expressions?

 class HttpRequestRegexMatcher(BaseModel):
     class Config:
         extra = Extra.allow

+    @validator('url_path_pattern')
+    def validate_regex_pattern(cls, v):
+        if v is not None:
+            try:
+                re.compile(v)
+            except re.error as e:
+                raise ValueError(f"Invalid regex pattern: {e}")
+        return v

     method: Optional[str] = Field(
         None, description="The HTTP method to match (e.g., GET, POST).", title="Method"
     )

1630-1648: Consider adding validation for period duration.

The FixedWindowCallRatePolicy looks good, but we could add validation to ensure the period is a positive duration. What do you think?

 class FixedWindowCallRatePolicy(BaseModel):
     class Config:
         extra = Extra.allow

+    @validator('period')
+    def validate_period(cls, v):
+        if v.total_seconds() <= 0:
+            raise ValueError("Period must be positive")
+        return v

     type: Literal["FixedWindowCallRatePolicy"]
     period: timedelta = Field(
         ..., description="The time interval for the rate limit window.", title="Period"
     )

1810-1842: Consider adding default values for optional fields.

The HTTPAPIBudget class looks good, but we could make it more user-friendly by providing sensible defaults for the optional fields. What do you think?

 class HTTPAPIBudget(BaseModel):
     class Config:
         extra = Extra.allow

     type: Literal["HTTPAPIBudget"]
     policies: List[
         Union[
             FixedWindowCallRatePolicy,
             MovingWindowCallRatePolicy,
             UnlimitedCallRatePolicy,
         ]
     ] = Field(
         ...,
         description="List of call rate policies that define how many calls are allowed.",
         title="Policies",
     )
     ratelimit_reset_header: Optional[str] = Field(
-        "ratelimit-reset",
+        "x-ratelimit-reset",  # More commonly used header name
         description="The HTTP response header name that indicates when the rate limit resets.",
         title="Rate Limit Reset Header",
     )
     ratelimit_remaining_header: Optional[str] = Field(
-        "ratelimit-remaining",
+        "x-ratelimit-remaining",  # More commonly used header name
         description="The HTTP response header name that indicates the number of remaining allowed calls.",
         title="Rate Limit Remaining Header",
     )
     status_codes_for_ratelimit_hit: Optional[List[int]] = Field(
-        [429],
+        [429, 403],  # Some APIs use 403 for rate limiting
         description="List of HTTP status codes that indicate a rate limit has been hit.",
         title="Status Codes for Rate Limit Hit",
     )
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (4)

1373-1412: Consider adding examples for header names, wdyt?

The HTTPAPIBudget component looks good with clear description and required fields. However, it might be helpful to add examples for ratelimit_reset_header and ratelimit_remaining_header to show common header names used by different APIs.

  ratelimit_reset_header:
    title: Rate Limit Reset Header
    description: The HTTP response header name that indicates when the rate limit resets.
    type: string
    default: "ratelimit-reset"
+   examples:
+     - "x-ratelimit-reset"
+     - "retry-after"
+     - "x-rate-limit-reset"
  ratelimit_remaining_header:
    title: Rate Limit Remaining Header
    description: The HTTP response header name that indicates the number of remaining allowed calls.
    type: string
    default: "ratelimit-remaining"
+   examples:
+     - "x-ratelimit-remaining"
+     - "x-rate-limit-remaining"

1413-1442: Consider adding examples for the period field, wdyt?

The FixedWindowCallRatePolicy component is well-structured, but it would be helpful to add examples for the period field to demonstrate valid duration formats.

  period:
    title: Period
    description: The time interval for the rate limit window.
    type: string
    format: duration
+   examples:
+     - "PT1H"    # 1 hour
+     - "P1D"     # 1 day
+     - "PT30M"   # 30 minutes

1485-1504: Consider adding validation for positive integers, wdyt?

The Rate component looks good, but consider adding validation to ensure the limit is a positive integer.

  limit:
    title: Limit
    description: The maximum number of calls allowed within the interval.
    type: integer
+   minimum: 1

1505-1535: Consider adding examples for URL patterns, wdyt?

The HttpRequestRegexMatcher component is well-defined, but it would be helpful to add examples for the url_path_pattern field to demonstrate common regex patterns.

  url_path_pattern:
    title: URL Path Pattern
    description: A regular expression pattern to match the URL path.
    type: string
+   examples:
+     - "^/v1/users/\\d+$"           # Matches paths like /v1/users/123
+     - "^/v2/orders/[a-zA-Z0-9-]+$" # Matches paths like /v2/orders/order-123
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 675d829 and f2529a4.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

1876-1880: LGTM! The api_budget field is well documented.

The addition of the api_budget field to DeclarativeSource1 and DeclarativeSource2 is clear and consistent.

Also applies to: 1907-1911

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

43-48: LGTM! The new api_budget property is well-defined.

The property is properly integrated into the root schema with clear description and appropriate references to the new components. The description effectively communicates that this applies across all streams.

@tolik0
Copy link
Contributor Author

tolik0 commented Feb 11, 2025

@artem1205 I’ve added debug logs with request details and policies, as well as documentation.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)

3570-3648: LGTM! Well-structured test for MovingWindowCallRatePolicy.

The test thoroughly validates the API budget configuration with MovingWindowCallRatePolicy. I particularly like how you test both the high-level budget configuration and the specific policy details.

Would you consider adding a test case for multiple rate limits in the same policy (e.g., different limits for different time intervals)? This could help validate more complex rate limiting scenarios, wdyt?


3649-3719: LGTM! Comprehensive test for FixedWindowCallRatePolicy.

The test effectively validates the fixed window policy configuration and its integration with HttpRequester. Good job on testing the period conversion from ISO 8601 duration to seconds.

Consider adding assertions for error cases, like invalid period formats or negative call limits? This would help ensure robust error handling, wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2529a4 and ba8a735.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (14)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)

646-661: Consider adding interval validation and documentation.

The interval field accepts ISO 8601 duration strings but lacks validation. Would you consider:

  1. Adding a validator to ensure the string is a valid ISO 8601 duration?
  2. Including example durations in the field description?
interval: str = Field(
    ...,
    description="The time interval for the rate limit in ISO 8601 duration format (e.g., PT1H for 1 hour, P1D for 1 day).",
    examples=["PT1H", "P1D", "PT30M"],
    title="Interval",
)

1667-1677: Consider enhancing documentation for unlimited policy.

The UnlimitedCallRatePolicy could benefit from clearer documentation about its use cases. Would you consider adding a note explaining when to use this policy (e.g., for endpoints that don't have rate limits)?

type: Literal["UnlimitedCallRatePolicy"]
matchers: List[HttpRequestRegexMatcher] = Field(
    ...,
    description="List of matchers that define which requests this policy applies to. Use this policy for endpoints that don't have rate limits or when you want to explicitly mark endpoints as unlimited.",
    title="Matchers",
)

1810-1841: Consider expanding rate limit headers and status codes.

The implementation looks solid, but would you consider:

  1. Adding more common header variations (e.g., X-RateLimit-Reset, X-RateLimit-Remaining)?
  2. Including additional common rate limit status codes (e.g., 403, 503)?
ratelimit_reset_header: Optional[str] = Field(
    "ratelimit-reset",
    description="The HTTP response header name that indicates when the rate limit resets. Common variations: ratelimit-reset, x-ratelimit-reset",
    title="Rate Limit Reset Header",
)
status_codes_for_ratelimit_hit: Optional[List[int]] = Field(
    [429, 403, 503],
    description="List of HTTP status codes that indicate a rate limit has been hit. Common codes: 429 (Too Many Requests), 403 (Forbidden), 503 (Service Unavailable)",
    title="Status Codes for Rate Limit Hit",
)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)

550-550: Consider adding type annotation for the instance variable.

wdyt about adding a type annotation to make the code more maintainable?

-        self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None
+        self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None  # type: ignore[assignment]

2981-2994: Consider adding validation for required fields.

The code provides default values for optional fields, but wdyt about adding validation to ensure that required fields like policies are not None?

    def create_http_api_budget(
        self, model: HTTPAPIBudgetModel, config: Config, **kwargs: Any
    ) -> HttpAPIBudget:
+        if not model.policies:
+            raise ValueError("policies is required for HTTPAPIBudget")
+
        policies = [
            self._create_component_from_model(model=policy, config=config)
            for policy in model.policies
        ]

        return HttpAPIBudget(
            policies=policies,
            ratelimit_reset_header=model.ratelimit_reset_header or "ratelimit-reset",
            ratelimit_remaining_header=model.ratelimit_remaining_header or "ratelimit-remaining",
            status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or (429,),
        )

2996-3011: Consider documenting the initial reset timestamp logic.

The code sets an initial reset timestamp 10 days in the future. wdyt about adding a comment explaining why this specific value was chosen and that it will be updated by the first request?

    def create_fixed_window_call_rate_policy(
        self, model: FixedWindowCallRatePolicyModel, config: Config, **kwargs: Any
    ) -> FixedWindowCallRatePolicy:
        matchers = [
            self._create_component_from_model(model=matcher, config=config)
            for matcher in model.matchers
        ]

-        # Set the initial reset timestamp to 10 days from now.
-        # This value will be updated by the first request.
+        # Set a far-future initial reset timestamp (10 days) to prevent immediate rate limiting.
+        # This is a temporary value that will be updated with the actual server-provided
+        # reset timestamp after the first request.
        return FixedWindowCallRatePolicy(
            next_reset_ts=datetime.datetime.now() + datetime.timedelta(days=10),
            period=model.period,
            call_limit=model.call_limit,
            matchers=matchers,
        )

3040-3044: Consider validating duration format.

wdyt about adding validation to ensure the interval string is in a valid duration format before parsing?

    def create_rate(self, model: RateModel, config: Config, **kwargs: Any) -> Rate:
+        try:
+            interval = parse_duration(model.interval)
+        except ValueError as e:
+            raise ValueError(f"Invalid interval format: {model.interval}. Expected ISO 8601 duration format.") from e
+
        return Rate(
            limit=model.limit,
-            interval=parse_duration(model.interval),
+            interval=interval,
        )

3046-3055: Consider adding regex pattern validation.

wdyt about validating the regex patterns in url_path_pattern to ensure they are valid before creating the matcher?

    def create_http_request_matcher(
        self, model: HttpRequestRegexMatcherModel, config: Config, **kwargs: Any
    ) -> HttpRequestRegexMatcher:
+        try:
+            if model.url_path_pattern:
+                re.compile(model.url_path_pattern)
+        except re.error as e:
+            raise ValueError(f"Invalid URL path pattern: {model.url_path_pattern}. Error: {str(e)}")
+
        return HttpRequestRegexMatcher(
            method=model.method,
            url_base=model.url_base,
            url_path_pattern=model.url_path_pattern,
            params=model.params,
            headers=model.headers,
        )
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)

3689-3768: LGTM! Well-structured test for MovingWindowCallRatePolicy.

The test case provides good coverage of the basic functionality. What do you think about adding a few edge cases to make it more robust? For example:

  • Invalid rate limit headers
  • Multiple rate policies
  • Invalid status codes

3769-3840: LGTM! Comprehensive test for FixedWindowCallRatePolicy.

The test case thoroughly validates the fixed window policy configuration. Consider adding tests for:

  • Invalid period format
  • Zero or negative call limits
  • Invalid regex patterns in matchers
    wdyt?
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (4)

1370-1411: Great implementation of the HTTPAPIBudget component!

The component is well-structured with clear scope limitations and flexible configuration options. The description clearly states that only requests using HttpRequester are rate-limited.

One small suggestion: Would you consider adding an example configuration in the description to help users get started quickly? wdyt?


1429-1429: Consider removing the format field or updating its documentation.

The format: duration field seems to be a remnant from a previous version, as it's not documented in the description. Should we remove it or add documentation about the expected duration format? wdyt?


1484-1503: Well-structured Rate component with room for more documentation.

The component is clean and focused. Would you consider enhancing the interval description to explain the expected ISO8601 duration format (like the examples show)? This would help users understand valid values for the interval field. wdyt?


1504-1534: Great implementation of the request matcher!

The matcher provides comprehensive capabilities for matching HTTP requests. To make it even more user-friendly, would you consider adding example regex patterns for the url_path_pattern field? This could help users understand how to write effective patterns. wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba8a735 and 6df08c2.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (6 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (11 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (11)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4)

663-686: LGTM! Comprehensive request matching capabilities.

The HttpRequestRegexMatcher class provides flexible matching through method, URL, parameters, and headers. The implementation looks solid and well-documented.


1630-1648: LGTM! Well-structured fixed window rate limiting.

The FixedWindowCallRatePolicy implementation is clean and provides a straightforward way to implement fixed window rate limiting.


1650-1665: LGTM! Flexible moving window rate limiting.

The MovingWindowCallRatePolicy allows for multiple rates with different intervals, providing great flexibility for complex rate limiting scenarios.


1876-1876: LGTM! Clean integration of API budget.

The addition of the optional api_budget field to DeclarativeSource classes is clean and maintains backward compatibility.

Also applies to: 1903-1903

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)

229-231: LGTM! Clean import organization.

The new imports are well-organized and follow the existing pattern of grouping related imports together.

Also applies to: 241-243, 250-252, 307-309, 328-330, 374-376, 490-498


1956-1957: LGTM! Clean integration with HttpRequester.

The API budget is correctly passed to the HttpRequester constructor.

Also applies to: 1978-1978


3013-3026: LGTM! Clean implementation of MovingWindowCallRatePolicy.

The implementation correctly creates rates and matchers from the model.


3028-3038: LGTM! Clean implementation of UnlimitedCallRatePolicy.

The implementation correctly creates matchers from the model.


3057-3060: LGTM! Clean implementation of set_api_budget.

The method correctly creates and sets the API budget component.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

43-44: LGTM! Good addition of the API budget property.

The new property is well-integrated into the schema structure and correctly references the HTTPAPIBudget definition.


1412-1483: Excellent design of the rate limiting policies!

The three policy types (Fixed Window, Moving Window, and Unlimited) provide a comprehensive solution for different rate limiting scenarios. The consistent use of matchers across all policies makes the implementation flexible and maintainable.

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small things for maintenance reasons but the behavior makes sense to me

airbyte_cdk/sources/streams/call_rate.py Show resolved Hide resolved
airbyte_cdk/sources/streams/call_rate.py Outdated Show resolved Hide resolved
airbyte_cdk/sources/streams/call_rate.py Show resolved Hide resolved
@tolik0 tolik0 force-pushed the tolik0/add-api-budget branch from 6df08c2 to 09ad1a1 Compare February 11, 2025 21:48
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (9)
airbyte_cdk/sources/streams/call_rate.py (2)

547-563: Consider enhancing the endpoint extraction method.

The method handles different request types well, but could be improved. What do you think about:

  1. Adding type hints for better code maintainability?
  2. Using more specific exception handling instead of catching all exceptions?
-    def _extract_endpoint(self, request: Any) -> str:
+    def _extract_endpoint(self, request: Union[requests.Request, requests.PreparedRequest]) -> str:
         """Extract the endpoint URL from the request if available."""
         endpoint = None
         try:
-            # If the request is already a PreparedRequest, it should have a URL.
             if isinstance(request, requests.PreparedRequest):
                 endpoint = request.url
-            # If it's a requests.Request, we call prepare() to extract the URL.
             elif isinstance(request, requests.Request):
                 prepared = request.prepare()
                 endpoint = prepared.url
-        except Exception as e:
+        except (AttributeError, TypeError) as e:
             logger.debug(f"Error extracting endpoint: {e}")
         if endpoint:
             return endpoint
         return "unknown endpoint"

170-262: Consider enhancing the HttpRequestRegexMatcher implementation.

The implementation is solid, but could be even better. What do you think about:

  1. Adding input validation for the regex pattern to catch invalid patterns early?
  2. Caching the compiled regex pattern results for better performance when the same pattern is used multiple times?
     def __init__(
         self,
         method: Optional[str] = None,
         url_base: Optional[str] = None,
         url_path_pattern: Optional[str] = None,
         params: Optional[Mapping[str, Any]] = None,
         headers: Optional[Mapping[str, Any]] = None,
     ):
+        # Validate regex pattern
+        if url_path_pattern:
+            try:
+                re.compile(url_path_pattern)
+            except re.error as e:
+                raise ValueError(f"Invalid regex pattern: {e}")

         self._method = method.upper() if method else None
         self._url_base = url_base.rstrip("/") if url_base else None
-        self._url_path_pattern = re.compile(url_path_pattern) if url_path_pattern else None
+        # Cache compiled pattern
+        self._url_path_pattern_str = url_path_pattern
+        self._url_path_pattern = re.compile(url_path_pattern) if url_path_pattern else None
         self._params = {str(k): str(v) for k, v in (params or {}).items()}
         self._headers = {str(k).lower(): str(v) for k, v in (headers or {}).items()}
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)

550-550: Consider explicitly initializing _api_budget to None.

For better readability and to follow Python best practices, wdyt about explicitly initializing the _api_budget variable to None?

-        self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None
+        self._api_budget: Optional[Union[APIBudget, HttpAPIBudget]] = None

2964-2977: Consider handling None values more gracefully.

The create_http_api_budget method could handle None values more gracefully. wdyt about using the or operator for default values?

-            policies=policies,
-            ratelimit_reset_header=model.ratelimit_reset_header or "ratelimit-reset",
-            ratelimit_remaining_header=model.ratelimit_remaining_header or "ratelimit-remaining",
-            status_codes_for_ratelimit_hit=model.status_codes_for_ratelimit_hit or [429],
+            policies=policies or [],
+            ratelimit_reset_header=model.ratelimit_reset_header or "ratelimit-reset",
+            ratelimit_remaining_header=model.ratelimit_remaining_header or "ratelimit-remaining",
+            status_codes_for_ratelimit_hit=tuple(model.status_codes_for_ratelimit_hit or [429]),

2987-2994: Consider making the initial reset timestamp configurable.

The hardcoded 10-day initial reset timestamp might not be suitable for all use cases. wdyt about making it configurable through the model?

+        initial_reset_period = parse_duration(model.initial_reset_period) if model.initial_reset_period else datetime.timedelta(days=10)
         return FixedWindowCallRatePolicy(
-            next_reset_ts=datetime.datetime.now() + datetime.timedelta(days=10),
+            next_reset_ts=datetime.datetime.now() + initial_reset_period,
             period=parse_duration(model.period),
             call_limit=model.call_limit,
             matchers=matchers,
         )

1939-1940: Consider adding a docstring explaining API Budget usage.

For better maintainability, wdyt about adding a comment explaining how the API Budget is used in the HttpRequester?

+        # Use the configured API Budget for rate limiting if available
         api_budget = self._api_budget
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)

1370-1411: LGTM! Well-structured API budget component with clear documentation.

The HTTPAPIBudget component is well-designed with configurable headers and status codes. The documentation clearly states the scope limitation to HttpRequester.

One suggestion: Would you consider adding examples in the description to show how the rate limit headers might look in actual API responses? This could help users map their API's response headers correctly, wdyt?


1412-1502: The rate limiting policies are well-structured but could use example configurations.

The three policy types (Fixed Window, Moving Window, and Unlimited) provide good flexibility for different rate limiting scenarios. The Rate component nicely encapsulates the limit and interval concepts.

Would you consider adding example configurations for each policy type to show:

  1. How to set up a "100 calls per hour" fixed window limit
  2. How to configure multiple rates for moving window (e.g., "1000 per day AND 100 per hour")
  3. A typical use case for unlimited policy (e.g., health check endpoints)

This would help users choose and configure the right policy for their needs, wdyt?


1503-1533: The HTTP request matcher provides flexible matching but could use pattern examples.

The HttpRequestRegexMatcher component allows comprehensive request matching. Would you consider enhancing the documentation with:

  1. Example regex patterns for common URL path matching scenarios
  2. Best practices for writing efficient regex patterns
  3. Examples showing how to combine multiple matching criteria

This would help users write correct and maintainable matchers, wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6df08c2 and 09ad1a1.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (5 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (11 hunks)
  • airbyte_cdk/sources/streams/call_rate.py (9 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
🔇 Additional comments (6)
airbyte_cdk/sources/streams/call_rate.py (2)

584-586: Great addition of debug logs!

The debug logs provide valuable context about endpoints and policies, which will help with troubleshooting rate limit issues.

Also applies to: 628-631, 634-637, 641-644


163-167: Well-structured string representations!

The __str__ methods provide comprehensive information about the objects, which is very helpful for debugging and logging. The consistent format across different classes is a nice touch.

Also applies to: 263-268, 368-374, 482-493

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

645-660: Well-structured schema models for rate limiting!

The new models for rate limiting and request matching are well-designed:

  • Clear field descriptions and examples
  • Consistent use of Field annotations
  • Good type hints and validation
  • Follows the existing patterns in the file

Also applies to: 662-685, 1629-1647, 1649-1664, 1666-1676, 1809-1840

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

229-231: LGTM! Well-organized imports.

The new imports for API Budget related classes are properly organized alphabetically and follow the existing pattern.

Also applies to: 241-243, 250-252, 307-309, 328-330, 374-376, 490-498


1961-1961: LGTM! Clean integration.

The API Budget is cleanly integrated into the HttpRequester creation.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

43-44: LGTM! Good addition of the API budget configuration at the root level.

The api_budget property is well-integrated into the schema structure, allowing for global API budget configuration.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
airbyte_cdk/sources/streams/call_rate.py (3)

171-269: Consider adding input validation for regex pattern.

The implementation looks solid! However, what do you think about adding validation for the url_path_pattern to catch invalid regex patterns early? This could prevent runtime errors.

 def __init__(
     self,
     method: Optional[str] = None,
     url_base: Optional[str] = None,
     url_path_pattern: Optional[str] = None,
     params: Optional[Mapping[str, Any]] = None,
     headers: Optional[Mapping[str, Any]] = None,
 ):
+    if url_path_pattern:
+        try:
+            re.compile(url_path_pattern)
+        except re.error as e:
+            raise ValueError(f"Invalid regex pattern '{url_path_pattern}': {e}")

548-564: Consider enhancing endpoint extraction robustness.

The endpoint extraction looks good but could be more robust. What do you think about:

  1. Handling URL parsing errors explicitly?
  2. Extracting just the path for cleaner logs?
 def _extract_endpoint(self, request: Any) -> str:
     """Extract the endpoint URL from the request if available."""
     endpoint = None
     try:
         if isinstance(request, requests.PreparedRequest):
-            endpoint = request.url
+            parsed = parse.urlparse(request.url)
+            endpoint = parsed.path
         elif isinstance(request, requests.Request):
             prepared = request.prepare()
-            endpoint = prepared.url
+            parsed = parse.urlparse(prepared.url)
+            endpoint = parsed.path
-    except Exception as e:
+    except (AttributeError, ValueError) as e:
         logger.debug(f"Error extracting endpoint: {e}")
     if endpoint:
         return endpoint
     return "unknown endpoint"

656-656: Consider validating status codes.

What do you think about validating that the status codes are within the valid HTTP range (100-599)? This could prevent configuration errors.

 def __init__(
     self,
     ratelimit_reset_header: str = "ratelimit-reset",
     ratelimit_remaining_header: str = "ratelimit-remaining",
     status_codes_for_ratelimit_hit: list[int] = [429],
     **kwargs: Any,
 ):
+    if not all(100 <= code <= 599 for code in status_codes_for_ratelimit_hit):
+        raise ValueError("HTTP status codes must be between 100 and 599")
     self._ratelimit_reset_header = ratelimit_reset_header
     self._ratelimit_remaining_header = ratelimit_remaining_header
     self._status_codes_for_ratelimit_hit = status_codes_for_ratelimit_hit
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09ad1a1 and bd9ac57.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/streams/call_rate.py (10 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
airbyte_cdk/sources/streams/call_rate.py (2)

9-9: LGTM! Good call on configuring the pyrate_limiter logger.

Setting the pyrate_limiter logger to WARNING level helps reduce noise in the logs.

Also applies to: 29-29


164-168: Great job on the consistent string representations!

The str methods provide clear, consistent, and informative output that will be valuable for debugging and logging.

Also applies to: 369-375, 483-494

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
airbyte_cdk/sources/streams/call_rate.py (5)

119-135: Consider caching the regex pattern.

Since re.escape(url_path) is called during initialization and the pattern is immutable, we could cache the compiled pattern. What do you think about this optimization?

     if url:
         parsed_url = parse.urlsplit(url)
         url_base = f"{parsed_url.scheme}://{parsed_url.netloc}"
-        url_path = parsed_url.path if parsed_url.path != "/" else None
+        url_path_pattern = re.escape(parsed_url.path) if parsed_url.path != "/" else None
     else:
         url_base = None
-        url_path = None
+        url_path_pattern = None

     # Use HttpRequestRegexMatcher under the hood
     self._regex_matcher = HttpRequestRegexMatcher(
         method=method,
         url_base=url_base,
-        url_path_pattern=re.escape(url_path) if url_path else None,
+        url_path_pattern=url_path_pattern,
         params=params,
         headers=headers,
     )

201-208: Consider adding type hints for the request parameter.

The error handling is good, but type hints would make it clearer what types are supported. wdyt?

-    def __call__(self, request: Any) -> bool:
+    def __call__(self, request: Union[requests.Request, requests.PreparedRequest]) -> bool:

464-475: Consider using a list comprehension for matchers.

The string representation could be more concise. What do you think about this optimization?

-        matcher_str = ", ".join(f"{matcher}" for matcher in self._matchers)
+        matcher_str = ", ".join(map(str, self._matchers))

529-545: Consider using a more specific exception type.

The broad Exception catch could mask important errors. Would it make sense to catch specific exceptions like AttributeError or ValueError?

-        except Exception as e:
+        except (AttributeError, ValueError) as e:

637-638: Consider validating status codes.

HTTP status codes should be between 100 and 599. What do you think about adding validation for the input list?

     def __init__(
         self,
         ratelimit_reset_header: str = "ratelimit-reset",
         ratelimit_remaining_header: str = "ratelimit-remaining",
         status_codes_for_ratelimit_hit: list[int] = [429],
         **kwargs: Any,
     ):
+        if not all(100 <= code <= 599 for code in status_codes_for_ratelimit_hit):
+            raise ValueError("All status codes must be valid HTTP status codes (100-599)")
         self._ratelimit_reset_header = ratelimit_reset_header
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd9ac57 and ea0c51b.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/streams/call_rate.py (11 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-the-guardian-api' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/streams/call_rate.py (1)

9-9: LGTM! Good logging setup.

Setting pyrate_limiter's log level to WARNING helps reduce noise in the logs while still capturing important information.

Also applies to: 29-29

airbyte_cdk/sources/streams/call_rate.py Show resolved Hide resolved
Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit of a speedrun review but I'm good with what I saw for the refactor of the HTTP matcher. Thanks for the refactor! I'm sure it'll be easier to manage.

@tolik0 tolik0 merged commit fdcded3 into main Feb 12, 2025
23 checks passed
@tolik0 tolik0 deleted the tolik0/add-api-budget branch February 12, 2025 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants