Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: document that parallelized=True resources with add_limit(x) usually yield x-1 #2142

Closed

Conversation

joscha
Copy link
Contributor

@joscha joscha commented Dec 12, 2024

Most of the time (80%ish or so) parallelized resources that are limited yield one item less.

Copy link

netlify bot commented Dec 12, 2024

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit b458594
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/675ac14e103de0000861f505
😎 Deploy Preview https://deploy-preview-2142--dlt-hub-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.


limit = 5
result = list(sync_resource1().add_limit(limit))
allowed_result_range = range(limit - int(parallelized), limit + 1)
Copy link
Contributor Author

@joscha joscha Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not ideal.
Ideal would be to make the parallel yields exact. I tried this for a few hours to no avail. Next best would be to run this test x times when parallelized=True and then ensure that the threshold of yields lies below 4.5.

@joscha joscha changed the title docs: document that parallelized resources usually produce one less than the limit docs: document that parallelized=True resources with add_limit(x) usually yield x-1 Dec 12, 2024
@sh-rp
Copy link
Collaborator

sh-rp commented Dec 13, 2024

@joscha thanks for this PR. If you look in the PR list, you can see that I have changed the add_limit implementation. You could check out that branch and see wether you are still seeing those problem there or maybe even add your test there. I think it might actually be resolved.

@joscha
Copy link
Contributor Author

joscha commented Dec 13, 2024

Will give your branch a try!

@joscha
Copy link
Contributor Author

joscha commented Dec 13, 2024

@joscha thanks for this PR. If you look in the PR list, you can see that I have changed the add_limit implementation. You could check out that branch and see wether you are still seeing those problem there or maybe even add your test there. I think it might actually be resolved.

yes, current code of #2131 (3738c29) reliably produces exacltly limit results in both parallelized and non-parallelized resources.

@sh-rp
Copy link
Collaborator

sh-rp commented Dec 13, 2024

@joscha amazing, your helping out on this is very much appreciated :)

@joscha
Copy link
Contributor Author

joscha commented Dec 13, 2024

@joscha amazing, your helping out on this is very much appreciated :)

I am glad it now produces exactly limit results to be honest. I was quite stumped for a bit, thought the REST api I was using had an issue. I might make sense to merge the test in this pull request after merging #2131, so we pick up regressions. As far as I can tell this is currently not asserted anywhere. With your changes in 3738c29 the code in this PR becomes:

@pytest.mark.parametrize("parallelized", [False, True])
def test_limit_sync_resource(parallelized: bool) -> None:
    @dlt.resource(parallelized=parallelized)
    def sync_resource1():
        for i in range(1, 10):
            yield i

    limit = 5
    result = list(sync_resource1().add_limit(limit))
    assert len(result) == limit

which is precise enough to keep.

@sh-rp
Copy link
Collaborator

sh-rp commented Dec 16, 2024

@joscha, I think this case is already tested with the async resource test (as a parallized resource is just that under the hood) but I added this case to my PR. I'll close this one now :)

@sh-rp sh-rp closed this Dec 16, 2024
@joscha joscha deleted the joscha/parallelized-sync-sources branch December 16, 2024 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants