
RHELPLAN-48018 - CI - use github library in test-harness #89

Closed
wants to merge 3 commits into from

Conversation


@nhosoi nhosoi commented Jul 15, 2020

Using PyGitHub to take advantage of pagination.
Replacing the original Session handle with the pyGitHub handle as gh.

Note: it's still in the wip stage.

Using PyGitHub to take advantage of pagination.

Two issues arose in the effort.
1) To avoid a race among the multiple instances of the script, handle_task
sets the status to pending with a timestamp. The status is then read back
just after that, and task handling goes forward only if the timestamp
matches the one this instance set.
I could not make the logic work, since PyGitHub does not reread the
status once it has already been read. That makes the two timestamps not
match, so the following task is never executed.
Also, to update the status, I tried Commit.create_statuses, but the status
was not retrieved by the following Commit.get_status either...
2) In the current usage, dry-run mode does not require a token, but PyGitHub
always requires authentication.

test/run-tests Outdated
random.shuffle(pulls)
pyghrepo = pygh.get_repo(f"{owner}/{repo}")
pulls = pyghrepo.get_pulls(state='open')
# random.shuffle(pulls) --- cannot shuffle PaginatedList
Contributor

Can we convert this into a regular python list, then do the shuffle? e.g. pulls = list(pyghrepo.get_pulls(state='open')) ?

Contributor Author

Your proposed code works!
I'm wondering whether list() still works when the results span more than one page (30 items by default). What do you think? Or is it hard to imagine more than 30 open pull requests in a repo?

Contributor

Your proposed code works!
I'm wondering whether list() still works when the results span more than one page (30 items by default).

I don't know - you'll have to look at the source code to see if it is implemented as a generator or something like that.

What do you think? Or is it hard to imagine more than 30 open pull requests in a repo?

Easier to imagine that github decides to change the page size to 10 or something like that :P

That is - we should make sure it works, regardless of the page size.

In the run-tests script, there are other objects we retrieve from github that may require paging - like statuses and comments - do all of these use the same page size? It should be pretty easy to create more than 30 comments in a PR and see how paging works with comments.

Contributor Author

Sorry, my question was not accurate. It was just about list(pyghrepo.get_pulls(state='open')). The other PaginatedLists are handled in loops, so there should be no issue regardless of the page size or page count.
But it might be a good idea to add a converter function from PaginatedList to list...
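Materializing the PaginatedList with list() before shuffling, as suggested above, could be wrapped like this (a minimal sketch; repo stands for a PyGithub Repository object, and the helper name is hypothetical):

```python
import random

def shuffled_pulls(repo, state="open"):
    """Fetch the open PRs across every page and shuffle them.
    PyGithub's PaginatedList cannot be shuffled in place, so it is
    materialized into a plain list first; iterating it (which list()
    does) transparently walks all pages regardless of page size."""
    pulls = list(repo.get_pulls(state=state))
    random.shuffle(pulls)
    return pulls
```

Iteration is what triggers the page fetches, so list() should work however GitHub sets the page size.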

Contributor

Worst case, you have to do an explicit loop e.g.
pulls = [pr for pr in pyghrepo.get_pulls(state='open')]
that will presumably page through all of the pages and return you a list which contains all of the prs from all of the pages.


richm commented Jul 15, 2020

  1. Is pygh not sending a request to github? Does it have some sort of tracing or debug logging that shows the actual http requests being sent and received? Does it have some sort of cache? If it has a cache, is there some way to clear it or tell it "do not cache this request"?
    Alternately - maybe the status string is not the best way to coordinate multiple workers? Maybe they have some other "locking" mechanism?

  2. I think that's ok for now - we can go back later and see if there is a way to make it work without a token - we may have to hack pygh to do so

@nhosoi nhosoi force-pushed the pyGitHub branch 5 times, most recently from 6a11fd4 to dfe88f9 Compare July 16, 2020 23:18

nhosoi commented Jul 16, 2020

1. Is pygh not sending a request to github?  Does it have some sort of tracing or debug logging that shows the actual http requests being sent and received?  Does it have some sort of cache?  If it has a cache, is there some way to clear it or tell it "do not cache this request"?
   Alternately - maybe the status string is not the best way to coordinate multiple workers?  Maybe they have some other "locking" mechanism?

Issue 1 in the commit afe2957 was not an issue.
When retrieving statuses with pyGitHub's get_statuses, it returns the
statuses in order starting from the oldest, which made the timestamp
mismatch the one set just before. By iterating in reverse and reading
statuses from the newest, the expected status is retrieved first, and the
timestamp mismatch problem is solved.
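The reversed-order fix could be sketched as follows (hypothetical helper name; assumes, as observed above, that get_statuses() yields statuses oldest-first):

```python
def latest_status_by_context(commit):
    """Map each status context to its most recent status.
    Commit.get_statuses() was observed to yield statuses oldest-first,
    so walk the list in reverse and keep only the first (i.e. newest)
    status seen for each context."""
    latest = {}
    for status in reversed(list(commit.get_statuses())):
        if status.context not in latest:
            latest[status.context] = status
    return latest
```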

@nhosoi nhosoi changed the title [WIP] - RHELPLAN-48018 - CI - use github library in test-harness RHELPLAN-48018 - CI - use github library in test-harness Jul 16, 2020
Using PyGitHub to take advantage of pagination.
Replacing the original Session handle with the pyGitHub handle as gh.

nhosoi commented Jul 17, 2020

@richm, I revisited the token issue. It seems I made a mistake at the beginning... It turns out there is no problem using pyGitHub with no auth.

>>> from github import Github
>>> g = Github()
>>> repo = g.get_repo("linux-system-roles/logging")
>>> print(repo.get_pull(153).head)
PullRequestPart(sha="dbf43d522cb9f4b497f8f529a5c8a14d23320278")

Sorry for the noise. I think there is no concern with using pyGitHub. Could you please review this PR?

test/run-tests Outdated


def get_statuses(gh, owner, repo, sha, max_num_statuses):
def get_statuses(commits):
"""
Fetches all statuses of the given commit in a repository and returns a dict
mapping context to its most recent status.
Will return at most max_num_statuses. By default, github will return 30, but
some of our PRs have more than that.
Contributor

This comment should be changed.

test/run-tests Outdated
if comment_update > after:
new_comments.append(comment)
comments = new_comments
statues = []
Contributor

statuses

test/run-tests Outdated
new_comments.append(comment)
comments = new_comments
statues = []
for commit in commits:
Contributor

This is for multiple commits? Not just head?

Contributor Author

Ah, you are right... Somehow I was thinking I should look at all the commits, but I should not have. Let me fix it.

@@ -601,21 +573,21 @@ def check_commit_needs_testing(status, commands):
return True

# or the status is pending without a hostname in the description
if status["state"] == "pending" and not status.get("description"):
if status.state == "pending" and not status.description:
Contributor

the gh api will return None if you use status.description and the status has no description field?

Contributor Author

It returns an empty string, not None. Is it ok?

>>> for status in statuses:
...   if status.description == '':
...     print(status.state, 'empty')
...   else:
...     print(status.state, status.description)
<<snip>>
pending empty
failure linux-system-roles-staging-123-9blzw@2020-07-16 04:59:59.853483

Contributor

yes - so

     if status.state == "pending" and not status.description:

is ok

test/run-tests Outdated
logging.debug(
"PR will be tested because it is in state 'pending' and has no description"
)
return True

# or a generic re-check was requested:
if status["updated_at"] < commands[COMMENT_CMD_TEST_ALL]:
if status.updated_at.strftime("%Y%m%d-%H%M%S") < commands[COMMENT_CMD_TEST_ALL]:
@richm richm Jul 17, 2020

The gh api returns the status.updated_at as a DateTime object?

Contributor

I suggest converting this to a string once e.g.

updated_at = status.updated_at.strftime("%Y%m%d-%H%M%S")
if updated_at < commands[COMMENT_CMD_TEST_ALL]:
...

Contributor Author

The gh api returns the status.updated_at as a DateTime object?

Yes.

 |  updated_at
 |      :type: datetime.datetime

{"context": status_context, "state": "pending", "description": description},
commit = gh.get_repo(f"{task.owner}/{task.repo}").get_commit(task.head)
commit.create_status(
state="pending", context=status_context, description=description
Contributor

Note that this now takes 2 requests instead of 1 - I'm assuming it is doing a request to get the commit, then another request to create the status

Contributor

maybe 3 requests - get_repo then get_commit then create_status

Contributor Author

Can we stash objects such as commit in Task?

Contributor

yes
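Stashing the resolved objects on the task might look roughly like this (a hypothetical sketch; the real Task in run-tests is shaped differently):

```python
class Task:
    """Hypothetical sketch of caching PyGithub objects on the task so
    each worker resolves the repo and commit at most once, instead of
    paying GET /repos and GET /commits on every status update."""

    def __init__(self, gh, owner, repo, head):
        self._gh = gh
        self.owner = owner
        self.repo = repo
        self.head = head
        self._commit = None

    @property
    def commit(self):
        # Resolve lazily and cache: repeated accesses reuse the handle.
        if self._commit is None:
            self._commit = self._gh.get_repo(
                f"{self.owner}/{self.repo}"
            ).get_commit(self.head)
        return self._commit
```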


richm commented Jul 17, 2020

Not sure if there is a way to get it to dump the number of actual requests made - some of them might be cached - you could use your private github account token instead of the systemroller account, and use https://developer.github.com/v3/rate_limit/ to count how many requests are made for each Task.
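Counting requests by diffing the remaining quota could look like this (a sketch; assumes gh is a PyGithub Github instance, that get_rate_limit().core.remaining reflects the core-API quota, and that no other client shares the token while measuring):

```python
def count_requests(gh, func, *args, **kwargs):
    """Run func and report how many core-API requests it consumed by
    diffing the remaining rate-limit quota before and after.
    The /rate_limit endpoint itself does not count against the quota."""
    before = gh.get_rate_limit().core.remaining
    result = func(*args, **kwargs)
    after = gh.get_rate_limit().core.remaining
    return result, before - after
```

If another worker uses the same token concurrently, the diff will overcount, so this is only reliable on an otherwise idle token.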

test/run-tests Outdated
continue

statuses = get_statuses(gh, owner, repo, head, args.max_num_statuses)
commands = get_comment_commands(gh, owner, repo, number)
statuses = get_statuses(pull.get_commits())
Contributor

to get the list of statuses, you need to get the commit using e.g. commit = repo.get_commit(pull.head.sha), then use commit.get_combined_status().statuses to get the list of statuses.
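That suggestion, as a sketch (repo and pull being PyGithub Repository and PullRequest objects; helper name is hypothetical):

```python
def get_head_statuses(repo, pull):
    """Fetch statuses for the PR's head commit only, via the combined
    status endpoint, instead of walking every commit in the PR."""
    commit = repo.get_commit(pull.head.sha)
    return commit.get_combined_status().statuses
```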


nhosoi commented Jul 19, 2020

Not sure if there is a way to get it to dump the number of actual requests made - some of them might be cached - you could use your private github account token instead of the systemroller account, and use https://developer.github.com/v3/rate_limit/ to count how many requests are made for each Task.

I set the logging level to DEBUG and counted the GET and POST requests (assuming both Session and PyGitHub dump every GET and POST).
Using the same scenario, this PR makes four times as many GET calls (12 vs. 3)...

       master   pr89
POST        4      4
GET         3     12

More precisely, this is the GET list from the master:

https://api.github.com:443 "GET /repos/linux-system-roles/logging/pulls/154 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/commits/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f/status?per_page=60 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/commits/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f/status?per_page=60 HTTP/1.1" 200 None

vs. pr89:

https://api.github.com:443 "GET /repos/linux-system-roles/logging HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/pulls/154 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/commits/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/commits/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/commits/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None

I think I need to investigate some more to eliminate the redundant GET calls...
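One candidate for trimming the repeated GET /repos/... calls visible in the log above is caching the Repository handle. The lazy flag is an assumption about the installed PyGithub version (newer releases accept it on get_repo and skip the initial request until an attribute is actually needed):

```python
_repo_cache = {}

def get_repo_cached(gh, full_name):
    """Resolve a Repository handle at most once per process; each
    plain gh.get_repo() call was costing one 'GET /repos/...' request.
    lazy=True (if the installed PyGithub supports it) additionally
    defers even the first request until an attribute is accessed."""
    if full_name not in _repo_cache:
        _repo_cache[full_name] = gh.get_repo(full_name, lazy=True)
    return _repo_cache[full_name]
```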


richm commented Jul 19, 2020

It seems that the main engineering effort around CI is an optimization problem, to minimize the number of calls to github - the main limiting factor is the ratelimit. @i386x @pcahyna you have worked with this for a while - perhaps you already came to this conclusion with your previous efforts with CI.

It's possible that using pygh will make the rate limit problem worse... it seems the API is designed to get objects like this:
get repo -> get pr -> get commit -> get statuses
That is, it is object-oriented: you get the parent object first, then the child objects, and so on. This is more programmer-friendly, but it uses many more requests, so it is more rate-limit-unfriendly.

With the current REST-based design, you only get the data you need at the time you need it, so the calls to the REST API are minimized: not as programmer-friendly, but more rate-limit-friendly.

It may be that pygh is not designed to limit the number of requests, and it might be quite difficult to use it in a way that minimizes the number of requests.

My current approach for testing multiple versions of ansible is to have separate deployments (OpenShift deploymentConfig). However, each pod increases the number of concurrent github requests by a factor of 2, increasing the risk of hitting the ratelimit. If we are going to test for multiple versions of ansible in such a way that we minimize the number of requests to github in a given time period, we might need to take a radically different approach - have a single deployment that can test multiple different versions of ansible.

https://github.com/linux-system-roles/test-harness/blob/master/test/run-tests#L414 - here, instead of calling ansible-playbook directly, have a loop that calls multiple versions of ansible:

DEFAULT_ANSIBLE_PLAYBOOK_SCRIPTS = ["ansible-playbook-2.7", "ansible-playbook-2.8", "ansible-playbook-2.9"]
# ... allow the user to override these in the config ...
for playbook_script in config["ansible_playbook_scripts"]:
    ansible_log = f"{artifactsdir}/{playbook_script}.log"
    for playbook in sorted(playbooks):
        print(f"Testing {playbook}...", end="")
        with redirect_output(ansible_log, mode="a"):
            result = run(
                playbook_script,
                "-vv",
                f"--inventory={inventory}",


nhosoi commented Jul 20, 2020

I did some more research into whether there is any chance to reduce the GET calls in this PR. I could eliminate four of them, but it still requires eight. There may be more to trim, but I don't think we can get as low as our original implementation.

https://api.github.com:443 "GET /repos/linux-system-roles/logging HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/pulls/154 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/commits/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/commits/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None

And I should mention that the test duration time is almost doubled compared to the master.
master: Finished in 227 seconds task linux-system-roles/logging/154:rhel-8
pr89: Finished in 429 seconds task linux-system-roles/logging/154:rhel-8

Sad to say, but I'd guess pyGitHub does not fit our needs. It is intuitive and maybe good for interactive use cases, but not for tools/services that require performance?


i386x commented Jul 20, 2020

@richm, @pcahyna, @nhosoi What about using just one pod as a proxy between GitHub and the rest of the pods? The proxy pod would check GitHub and delegate the work to the other pods. It could also do some level of caching. This could decrease the number of requests dramatically, but on the other hand it could also force us to totally rework our CI.
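Purely as an illustration of the idea (entirely hypothetical; a real proxy would need invalidation on webhooks, auth handling, and error paths), the caching layer of such a proxy might behave like this:

```python
import time

class GitHubProxyCache:
    """Single-pod cache between GitHub and the worker pods: identical
    requests within `ttl` seconds are served from memory, so N workers
    polling the same PRs cost one upstream request instead of N."""

    def __init__(self, fetch, ttl=30):
        self._fetch = fetch      # callable that performs the real API request
        self._ttl = ttl
        self._cache = {}         # key -> (timestamp, payload)

    def get(self, key):
        now = time.time()
        hit = self._cache.get(key)
        if hit and now - hit[0] < self._ttl:
            return hit[1]        # fresh enough, no upstream request
        payload = self._fetch(key)
        self._cache[key] = (now, payload)
        return payload
```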


richm commented Jul 20, 2020

And I should mention that the test duration time is almost doubled compared to the master.
master: Finished in 227 seconds task linux-system-roles/logging/154:rhel-8
pr89: Finished in 429 seconds task linux-system-roles/logging/154:rhel-8

do you have the ansible logs from those tests?

Sad to say, but I'd guess pyGitHub does not fit our needs. It is intuitive and maybe good for interactive use cases, but not for tools/services that require performance?

I think the main problem with pygh is that it makes the ratelimit problem worse. afaict, it shouldn't make the performance much worse - so I'm curious why there is such a difference in the results above


richm commented Jul 20, 2020

@richm, @pcahyna, @nhosoi What about using just one pod as a proxy between GitHub and the rest of the pods? The proxy pod would check GitHub and delegate the work to the other pods. It could also do some level of caching. This could decrease the number of requests dramatically, but on the other hand it could also force us to totally rework our CI.

That is also a possibility - could you explain more about how that would work?


nhosoi commented Jul 20, 2020

And I should mention that the test duration time is almost doubled compared to the master.
master: Finished in 227 seconds task linux-system-roles/logging/154:rhel-8
pr89: Finished in 429 seconds task linux-system-roles/logging/154:rhel-8

do you have the ansible logs from those tests?

Sad to say, but I'd guess pyGitHub does not fit our needs. It is intuitive and maybe good for interactive use cases, but not for tools/services that require performance?

I think the main problem with pygh is that it makes the ratelimit problem worse. afaict, it shouldn't make the performance much worse - so I'm curious why there is such a difference in the results above

Sorry, there were some mistakes in my testing. I commented out the time.sleep calls in the scripts and reran the tests. With pyGitHub, it tends to be slower, but not doubled.

         rhel-8   rhel-x
master 222(sec) 364(sec) 
pr89   227(sec) 365(sec)


richm commented Jul 20, 2020

And I should mention that the test duration time is almost doubled compared to the master.
master: Finished in 227 seconds task linux-system-roles/logging/154:rhel-8
pr89: Finished in 429 seconds task linux-system-roles/logging/154:rhel-8

do you have the ansible logs from those tests?

Sad to say, but I'd guess pyGitHub does not fit our needs. It is intuitive and maybe good for interactive use cases, but not for tools/services that require performance?

I think the main problem with pygh is that it makes the ratelimit problem worse. afaict, it shouldn't make the performance much worse - so I'm curious why there is such a difference in the results above

Sorry, there were some mistakes in my testing. I commented out the time.sleep calls in the scripts and reran the tests. With pyGitHub, it tends to be slower, but not doubled.

         rhel-8   rhel-x
master 222(sec) 364(sec) 
pr89   227(sec) 365(sec)

ok - that's what I would expect - so the main issue is that it makes many more requests which risks ratelimiting


nhosoi commented Jul 20, 2020

Closing this PR since there is a better way to improve the CI test script.

Unless there is stronger demand for using the github library, this issue could be closed as well.
#64

@nhosoi nhosoi closed this Jul 20, 2020