current concurrency is always 25 #768

Closed
vikyw89 opened this issue Nov 30, 2024 · 4 comments
Labels
t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@vikyw89

vikyw89 commented Nov 30, 2024

Setting desired concurrency or minimum concurrency doesn't do anything; current concurrency is always 25.

@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Nov 30, 2024
@janbuchar
Collaborator

Hello @vikyw89 and thanks for the bug report. Could you please supply a reproduction script and explain the expected results?

@vdusek
Collaborator

vdusek commented Dec 2, 2024

Relates #759

@Pijukatel
Contributor

@vikyw89 we will soon merge #780. I was not able to tell whether it solves the issue you reported. Could you please provide a more detailed description and the steps to reproduce the issue?

I see that under normal circumstances desired concurrency, minimum concurrency, and current concurrency can all change, so I need to understand under what circumstances the issue you experienced occurred.
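As a rough illustration of how these three values relate, here is a minimal toy model. It is not crawlee's actual implementation: the class `ToyPool`, its methods `autoscale_step` and `start_tasks`, and the step size of 1 are all assumptions made for this sketch. It only shows the general idea that desired concurrency is kept between the minimum and maximum, while current concurrency follows desired concurrency as tasks start.

```python
from dataclasses import dataclass


@dataclass
class ToyPool:
    """Hypothetical stand-in for an autoscaled pool; not crawlee's API."""

    min_concurrency: int
    desired_concurrency: int
    max_concurrency: int
    current_concurrency: int = 0  # number of tasks running right now

    def autoscale_step(self, system_idle: bool) -> None:
        # Assumed policy: scale up by 1 when the system is idle and the pool
        # is saturated, scale down by 1 when the system is overloaded.
        if system_idle and self.current_concurrency >= self.desired_concurrency:
            self.desired_concurrency += 1
        elif not system_idle:
            self.desired_concurrency -= 1
        # Clamp desired concurrency into [min_concurrency, max_concurrency].
        self.desired_concurrency = max(
            self.min_concurrency,
            min(self.desired_concurrency, self.max_concurrency),
        )

    def start_tasks(self, ready_tasks: int) -> None:
        # Start new tasks only while current concurrency is below desired.
        free_slots = self.desired_concurrency - self.current_concurrency
        self.current_concurrency += max(0, min(free_slots, ready_tasks))
```

In this model, a current concurrency that never moves away from one value would mean that either the autoscale step never runs or the configured settings never reach the pool, which is the distinction a reproduction script would help establish.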

@reproduce-bot

The following script was generated by an AI agent to help reproduce the issue:

# crawlee-python/reproduce.py
import asyncio
from datetime import timedelta
from unittest.mock import Mock

# Import the real classes. Replacing `crawlee` in `sys.modules` with mocks
# would swap out the very code this script is meant to exercise.
from crawlee._autoscaling import AutoscaledPool, SystemStatus
from crawlee._types import ConcurrencySettings


async def test_fixed_concurrency_issue() -> None:
    system_status = Mock(spec=SystemStatus)
    tasks_left = 50  # Arbitrary finite workload so that pool.run() can finish.

    async def run() -> None:
        nonlocal tasks_left
        tasks_left -= 1
        await asyncio.sleep(0.1)

    # The pool awaits these callbacks, so they must be coroutine functions
    # that return a bool.
    async def is_task_ready() -> bool:
        return tasks_left > 0

    async def is_finished() -> bool:
        return tasks_left <= 0

    concurrency_settings = ConcurrencySettings(
        min_concurrency=5,
        desired_concurrency=10,
        max_concurrency=20,
    )

    print(f"Initial desired concurrency: {concurrency_settings.desired_concurrency}")
    pool = AutoscaledPool(
        system_status=system_status,
        run_task_function=run,
        is_task_ready_function=is_task_ready,
        is_finished_function=is_finished,
        concurrency_settings=concurrency_settings,
        autoscale_interval=timedelta(seconds=0.1),
    )

    assert pool.desired_concurrency == 10, "Initial desired concurrency should be set to 10"

    pool._desired_concurrency = 25  # Simulating the fixed concurrency issue

    try:
        await pool.run()
    except RuntimeError as e:
        # Surface pool failures as a test failure, keeping the original traceback.
        raise AssertionError(e) from e

    assert pool.desired_concurrency != 25, "Desired concurrency should not remain fixed at 25"


if __name__ == "__main__":
    # Let any AssertionError propagate so that a failure exits non-zero.
    asyncio.run(test_fixed_concurrency_issue())

How to run:

python3 crawlee-python/reproduce.py

Thank you for your valuable contribution to this project and we appreciate your feedback! Please respond with an emoji if you find this script helpful. Feel free to comment below if any improvements are needed.

Best regards from an AI Agent!


5 participants