Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add diegorusso-aarch64-bigmem worker #546

Merged
merged 4 commits into from
Oct 29, 2024

Conversation

diegorusso
Copy link
Contributor

@diegorusso diegorusso commented Oct 24, 2024

This worker avoids builds to be started between 10pm and 2am. They will be scheduled at 2am.

This addresses issue #473

@diegorusso
Copy link
Contributor Author

Folks, I'm not an expert on buildbot. With this worker we want to avoid builds between 10pm and 2am because at midnight we run pyperformance to push data to speed.python.org.
The worker is set at UTC but I'm not sure what's the timezone of the buildbot master. Can anyone shed some lights? if the timezone is not UTC, we need to tweak the time interval.

@diegorusso
Copy link
Contributor Author

diegorusso commented Oct 24, 2024

Also this worker has:

  • Cores: 80 but the latest 16 CPUs (ID 64-79) are for running pyperformance and we want to make sure that pyperformance runs with affinity on one of these CPUs. The machine is configured with isolcpus=64-79 rcu_nocbs=64-79 at kernel level. Hence usable cores for building CPython are 0-63.
  • RAM: 250GB

@diegorusso diegorusso requested a review from ambv October 24, 2024 16:56
master/custom/workers.py Show resolved Hide resolved
master/master.cfg Outdated Show resolved Hide resolved
master/master.cfg Outdated Show resolved Hide resolved
@@ -125,6 +126,8 @@
("aarch64 RHEL8 LTO", "cstratak-RHEL8-aarch64", LTONonDebugUnixBuild),
("aarch64 RHEL8 LTO + PGO", "cstratak-RHEL8-aarch64", LTOPGONonDebugBuild),

("aarch64 Ubuntu 22.04 BigMem", "diegorusso-aarch64-bigmem", UnixBigmemBuild),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to start in the unstable set, around line 245. Once we've proven that things work as expected, we can move back here to the stable set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved, thanks!

class UnixBigmemBuild(UnixBuild):
buildersuffix = ".bigmem"
testFlags = ["-M60g", "-j4", "-uall,extralargefile"]
factory_tags = ["aarch64", "bigmem"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aarch64 tag doesn't really belong here; there's nothing specific to aarch64 in this builder factory (it would work perfectly well with a similarly beefy x86_64 machine). We do need to fix how tags are assigned to include tags from the worker as well, though.

Copy link
Contributor Author

@diegorusso diegorusso Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, removed.

master/master.cfg Show resolved Hide resolved
This worker avoids builds to be started between 10pm and 2am. They will
be scheduled at 2am.
@diegorusso diegorusso force-pushed the diegorusso-aarch64-bigmem branch from 8c33f7f to 5c4e84b Compare October 25, 2024 11:03
@diegorusso diegorusso force-pushed the diegorusso-aarch64-bigmem branch from 29b8998 to 8b9ee92 Compare October 25, 2024 11:12
@diegorusso diegorusso requested review from zware and picnixz October 25, 2024 19:16
@zware
Copy link
Member

zware commented Oct 26, 2024

Please note, GitHub's review interface is useless with force-pushes. This repository does squash merges just like cpython, so it's better to just push new commits with review changes.

name="diegorusso-aarch64-bigmem",
tags=['linux', 'unix', 'ubuntu', 'arm', 'arm64', 'aarch64', 'bigmem'],
not_branches=['3.9', '3.10', '3.11', '3.12', '3.13'],
parallel_tests=60,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticing this value, it seems a little extreme for a bigmem builder. Theoretically, this could call on 3600GB of RAM (60 tests x 60GB each from the -M60g option). In practice there aren't enough bigmem tests to actually request that, but this is a big enough number of tests that it becomes likely that multiple (possibly all) bigmem tests will be running in parallel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my bad. Decreased to 4, so we should be good (60*4=240)

def no_builds_between(start, end):
def f(builder, requests):
now = datetime.now()
if start <= now.hour < end:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition will never be true with start=22 and end=2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I wrote the prototype but then left as is. I'll fix it tomorrow.

Copy link
Contributor Author

@diegorusso diegorusso Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully now the code is better to read and understand. Now it works when the time range spans over midnight.

@diegorusso
Copy link
Contributor Author

diegorusso commented Oct 28, 2024

Please note, GitHub's review interface is useless with force-pushes. This repository does squash merges just like cpython, so it's better to just push new commits with review changes.

I know, the force push were to fix a rebase I did on top of main. I really don't like the merge commit that Github does when "updating" the branch when outdated. IIRC the other force push was to fix a typo in the commit message.
Anyway, I'll be more careful with those as I'm aware that force pushes are not great (in general and) in Github.

Copy link
Member

@zware zware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

One question, though: is there any way to do the scheduling trickery at the worker level rather than the builder level? I feel like that might be a cleaner long term solution, but as I'm not sure it's possible I'm not going to push for it :). If we do determine later that it's doable (or we have issues with what's here) we can always change it later.

@zware zware merged commit d5e7bc5 into python:main Oct 29, 2024
1 check passed
@zware
Copy link
Member

zware commented Oct 29, 2024

@diegorusso
Copy link
Contributor Author

LGTM.

One question, though: is there any way to do the scheduling trickery at the worker level rather than the builder level? I feel like that might be a cleaner long term solution, but as I'm not sure it's possible I'm not going to push for it :). If we do determine later that it's doable (or we have issues with what's here) we can always change it later.

Thanks for merging the PR. The worker is now online! https://buildbot.python.org/#/workers/114

Regarding your comment, I agree with you we should do this at the worker level but I'm no expert in buildbot and this is a way I found to do it. If anyone out there could point me to the right direction, I'll be more than happy to do it.

@diegorusso diegorusso deleted the diegorusso-aarch64-bigmem branch October 30, 2024 11:29
# Schedule the build later
builder.master.reactor.callLater(
int(delay),
builder.buildset_manager.submitBuildSet,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're getting an exception here:

    Traceback (most recent call last):
      File "/srv/buildbot/venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 980, in _startRunCallbacks
        self._runCallbacks()
      File "/srv/buildbot/venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1074, in _runCallbacks
        current.result = callback(  # type: ignore[misc]
      File "/srv/buildbot/venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1960, in _gotResultInlineCallbacks
        _inlineCallbacks(r, gen, status, context)
      File "/srv/buildbot/venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 2014, in _inlineCallbacks
        result = context.run(gen.send, result)
    --- <exception caught here> ---
      File "/srv/buildbot/venv/lib/python3.9/site-packages/buildbot/process/buildrequestdistributor.py", line 262, in _getNextUnclaimedBuildRequest
        nextBreq = yield self.nextBuild(self.bldr, breqs)
      File "/srv/buildbot/master/master.cfg", line 216, in f
        builder.buildset_manager.submitBuildSet,
    builtins.AttributeError: 'Builder' object has no attribute 'buildset_manager'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading the nextBuild and Twisted docs correctly, this should return a Deferred like this:

from twisted.internet import defer
...
            deferred = defer.Deferred()
            builder.master.reactor.callLater(
                int(delay),
                deferred.callback,
                requests[0],
            )
            return deferred

But, it would be nice to get confirmation from someone who's used Twisted before :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #553

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants