Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PPLT_3672: [Smart concurrency] Scaling up/down asset discovery queue #1849

Merged
merged 27 commits into from
Jan 29, 2025

Conversation

this-is-shivamsingh
Copy link
Contributor

@this-is-shivamsingh this-is-shivamsingh commented Jan 23, 2025

Context:

  1. We are seen issue related to missing-resource, network-idle timeout and page load failed to less disk space/CPU ( CI resources ).
  2. Generally first suggestion given by Dev's /CE's is to lower the concurrency to validate if this helps
  3. Therefore from now, we are going to monitor system metrics from machine/container and ourselves control the concurrency of that.

Changes:

  1. start() -> when percy cli is started. start system metrics
  2. When a job is pushed in discovery queue, before that check for cpu/mem info and update concurrency if needed
  3. starts monitoring if monitoring was stopped due to inactivity on cli
  4. stop monitoring after 5 mins of no-activity, means ( no jobs are pushed in discovery queue )
  5. We are checking for 2 major metrics here ( 1. CPU usage %, and 2. memory usage % )
  6. stop() -> stop monitoring if not already stopped

Client info collection details

  1. cpuInfo:
    • totalcores
    • arch
    • model
  2. memoryInfo:
    • totalDiskSpace
    • swapMemory
  3. OS

New package used

  1. systeminformation: This was done as in macOSX, we had observed at osx, allocated large chunk of memory as buffer cache and not use it, but while getting used memory it also gets included in that. This package helps and handles are those cases for different cpu arch and os

Certain assumptions

  1. Interval for monitoring is every 5 secs
  2. stop monitoring for no-activity after 5 mins

What is left ?

  • Add an env to disable this
  • Add user OS and CPU arch and ENV details to client info [ This will help to reduce to and fro from user ]
  • Add a logging for pod level envirnment
  • Writing specs for the same

Dev Testing:

  • Tested on Linux[debian/ubuntu[/Mac OS
    image
  • Scaling is applied
    image
  • Run sanity with snapshot command, and from sdk flow both

QA Testing:

  • Run sanity on linux OS
  • Run sanity on win OS

@this-is-shivamsingh this-is-shivamsingh requested a review from a team as a code owner January 23, 2025 05:39
@this-is-shivamsingh this-is-shivamsingh marked this pull request as draft January 23, 2025 05:39
fix: minor code bugs && will continue with that
@this-is-shivamsingh this-is-shivamsingh changed the title feat: added functions to all the methods for cpuLoad and memoryUsage PPLT_3672: [Smart concurrency] Scaling up/down asset discovery queue Jan 23, 2025
@this-is-shivamsingh this-is-shivamsingh marked this pull request as ready for review January 27, 2025 06:01
Copy link
Contributor

@prklm10 prklm10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. 🍍 Please share the QA plan.

@this-is-shivamsingh this-is-shivamsingh merged commit e7748bb into master Jan 29, 2025
38 checks passed
@this-is-shivamsingh this-is-shivamsingh deleted the PPLT_3672 branch January 29, 2025 10:24
@this-is-shivamsingh this-is-shivamsingh added the ✨ enhancement New feature or request label Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
✨ enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants