
RHELPLAN-48018 - CI - use github library in test-harness #89

Closed
wants to merge 3 commits into from

Conversation


@nhosoi nhosoi commented Jul 15, 2020

Using PyGitHub to take advantage of pagination.
Replacing the original Session handle with the pyGitHub handle as gh.

Note: it's still in the wip stage.

Using PyGitHub to take advantage of pagination.

Two issues arose in the effort.
1) To avoid a race among the multiple instances of the script, handle_task
sets the status to pending with a timestamp. The status is then read back
just after that, and task handling goes forward only if the timestamp
matches the one this instance set.
I could not make the logic work, since PyGitHub does not reread the
status once it has already been read. That makes the two timestamps not
match, so the following task is never executed.
Also, to update the status, I tried Commit.create_statuses, but the status
was not retrieved by the following Commit.get_status either...
2) In the current usage, dry-run mode does not require a token, but PyGitHub
always requires authentication.

test/run-tests Outdated
random.shuffle(pulls)
pyghrepo = pygh.get_repo(f"{owner}/{repo}")
pulls = pyghrepo.get_pulls(state='open')
# random.shuffle(pulls) --- cannot shuffle PaginatedList
Contributor

Can we convert this into a regular python list, then do the shuffle? e.g. pulls = list(pyghrepo.get_pulls(state='open')) ?

Contributor Author

Your proposed code works!
I'm wondering whether list() still works when the results span more than one page (30 items by default). What do you think? Or is it hard to imagine more than 30 open pull requests in a repo?

Contributor

Your proposed code works!
I'm wondering whether list() still works when the results span more than one page (30 items by default).

I don't know - you'll have to look at the source code to see if it is implemented as a generator or something like that.

What do you think? Or is it hard to imagine more than 30 open pull requests in a repo?

Easier to imagine that github decides to change the page size to 10 or something like that :P

That is - we should make sure it works, regardless of the page size.

In the run-tests script, there are other objects we retrieve from github that may require paging - like statuses and comments - do all of these use the same page size? It should be pretty easy to create more than 30 comments in a PR and see how paging works with comments.

Contributor Author

Sorry, my question was not accurate. It was just about list(pyghrepo.get_pulls(state='open')). The other PaginatedLists are handled in loops, so there should be no issue regardless of the page size or page count.
But it might be a good idea to add a converter function from PaginatedList to list...
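Materializing the PaginatedList with list() before shuffling, as suggested above, could be wrapped like this (a minimal sketch; repo stands for a PyGithub Repository object, and the helper name is hypothetical):

```python
import random

def shuffled_pulls(repo, state="open"):
    """Fetch the open PRs across every page and shuffle them.
    PyGithub's PaginatedList cannot be shuffled in place, so it is
    materialized into a plain list first; iterating it (which list()
    does) transparently walks all pages regardless of page size."""
    pulls = list(repo.get_pulls(state=state))
    random.shuffle(pulls)
    return pulls
```

Iteration is what triggers the page fetches, so list() should work however GitHub sets the page size.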

Contributor

Worst case, you have to do an explicit loop e.g.
pulls = [pr for pr in pyghrepo.get_pulls(state='open')]
that will presumably page through all of the pages and return you a list which contains all of the prs from all of the pages.


richm commented Jul 15, 2020

  1. Is pygh not sending a request to github? Does it have some sort of tracing or debug logging that shows the actual http requests being sent and received? Does it have some sort of cache? If it has a cache, is there some way to clear it or tell it "do not cache this request"?
    Alternately - maybe the status string is not the best way to coordinate multiple workers? Maybe they have some other "locking" mechanism?

  2. I think that's ok for now - we can go back later and see if there is a way to make it work without a token - we may have to hack pygh to do so

@nhosoi nhosoi force-pushed the pyGitHub branch 5 times, most recently from 6a11fd4 to dfe88f9 Compare July 16, 2020 23:18

nhosoi commented Jul 16, 2020

1. Is pygh not sending a request to github?  Does it have some sort of tracing or debug logging that shows the actual http requests being sent and received?  Does it have some sort of cache?  If it has a cache, is there some way to clear it or tell it "do not cache this request"?
   Alternately - maybe the status string is not the best way to coordinate multiple workers?  Maybe they have some other "locking" mechanism?

Issue 1 in the commit afe2957 was not an issue.
When retrieving statuses with pyGitHub's get_statuses, it returns the
statuses in order starting from the oldest, which made the timestamp
mismatch the one set just before. By iterating in reverse and reading
statuses from the newest, the expected status is retrieved first, and the
timestamp mismatch problem is solved.
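The reversed-order fix could be sketched as follows (hypothetical helper name; assumes, as observed above, that get_statuses() yields statuses oldest-first):

```python
def latest_status_by_context(commit):
    """Map each status context to its most recent status.
    Commit.get_statuses() was observed to yield statuses oldest-first,
    so walk the list in reverse and keep only the first (i.e. newest)
    status seen for each context."""
    latest = {}
    for status in reversed(list(commit.get_statuses())):
        if status.context not in latest:
            latest[status.context] = status
    return latest
```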

@nhosoi nhosoi changed the title [WIP] - RHELPLAN-48018 - CI - use github library in test-harness RHELPLAN-48018 - CI - use github library in test-harness Jul 16, 2020
Using PyGitHub to take advantage of pagination.
Replacing the original Session handle with the pyGitHub handle as gh.

nhosoi commented Jul 17, 2020

@richm, I revisited the token issue. It seems I made a mistake at the beginning... It turns out there is no problem using pyGitHub with no auth.

>>> from github import Github
>>> g = Github()
>>> repo = g.get_repo("linux-system-roles/logging")
>>> print(repo.get_pull(153).head)
PullRequestPart(sha="dbf43d522cb9f4b497f8f529a5c8a14d23320278")

Sorry for the noise. I think there is no concern with using pyGitHub. Could you please review this PR?

test/run-tests Outdated


def get_statuses(gh, owner, repo, sha, max_num_statuses):
def get_statuses(commits):
"""
Fetches all statuses of the given commit in a repository and returns a dict
mapping context to its most recent status.
Will return at most max_num_statuses. By default, github will return 30, but
some of our PRs have more than that.
Contributor

This comment should be changed.

test/run-tests Outdated
if comment_update > after:
new_comments.append(comment)
comments = new_comments
statues = []
Contributor

statuses

test/run-tests Outdated
new_comments.append(comment)
comments = new_comments
statues = []
for commit in commits:
Contributor

This is for multiple commits? Not just head?

Contributor Author

Ah, you are right... Somehow I was thinking I should look at all the commits, but I should not have. Let me fix it.

@@ -601,21 +573,21 @@ def check_commit_needs_testing(status, commands):
return True

# or the status is pending without a hostname in the description
if status["state"] == "pending" and not status.get("description"):
if status.state == "pending" and not status.description:
Contributor

the gh api will return None if you use status.description and the status has no description field?

Contributor Author

It returns an empty string, not None. Is it ok?

>>> for status in statuses:
...   if status.description == '':
...     print(status.state, 'empty')
...   else:
...     print(status.state, status.description)
<<snip>>
pending empty
failure linux-system-roles-staging-123-9blzw@2020-07-16 04:59:59.853483

Contributor

yes - so

     if status.state == "pending" and not status.description:

is ok

test/run-tests Outdated
logging.debug(
"PR will be tested because it is in state 'pending' and has no description"
)
return True

# or a generic re-check was requested:
if status["updated_at"] < commands[COMMENT_CMD_TEST_ALL]:
if status.updated_at.strftime("%Y%m%d-%H%M%S") < commands[COMMENT_CMD_TEST_ALL]:
@richm richm Jul 17, 2020

The gh api returns the status.updated_at as a DateTime object?

Contributor

I suggest converting this to a string once e.g.

updated_at = status.updated_at.strftime("%Y%m%d-%H%M%S")
if updated_at < commands[COMMENT_CMD_TEST_ALL]:
...

Contributor Author

The gh api returns the status.updated_at as a DateTime object?

Yes.

 |  updated_at
 |      :type: datetime.datetime

{"context": status_context, "state": "pending", "description": description},
commit = gh.get_repo(f"{task.owner}/{task.repo}").get_commit(task.head)
commit.create_status(
state="pending", context=status_context, description=description
Contributor

Note that this now takes 2 requests instead of 1 - I'm assuming it is doing a request to get the commit, then another request to create the status

Contributor

maybe 3 requests - get_repo then get_commit then create_status

Contributor Author

Can we stash objects such as commit in Task?

Contributor

yes
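Stashing the resolved objects on the task might look roughly like this (a hypothetical sketch; the real Task in run-tests is shaped differently):

```python
class Task:
    """Hypothetical sketch of caching PyGithub objects on the task so
    each worker resolves the repo and commit at most once, instead of
    paying GET /repos and GET /commits on every status update."""

    def __init__(self, gh, owner, repo, head):
        self._gh = gh
        self.owner = owner
        self.repo = repo
        self.head = head
        self._commit = None

    @property
    def commit(self):
        # Resolve lazily and cache: repeated accesses reuse the handle.
        if self._commit is None:
            self._commit = self._gh.get_repo(
                f"{self.owner}/{self.repo}"
            ).get_commit(self.head)
        return self._commit
```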


richm commented Jul 17, 2020

Not sure if there is a way to get it to dump the number of actual requests made - some of them might be cached - you could use your private github account token instead of the systemroller account, and use https://developer.github.com/v3/rate_limit/ to count how many requests are made for each Task.
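Counting requests by diffing the remaining quota could look like this (a sketch; assumes gh is a PyGithub Github instance, that get_rate_limit().core.remaining reflects the core-API quota, and that no other client shares the token while measuring):

```python
def count_requests(gh, func, *args, **kwargs):
    """Run func and report how many core-API requests it consumed by
    diffing the remaining rate-limit quota before and after.
    The /rate_limit endpoint itself does not count against the quota."""
    before = gh.get_rate_limit().core.remaining
    result = func(*args, **kwargs)
    after = gh.get_rate_limit().core.remaining
    return result, before - after
```

If another worker uses the same token concurrently, the diff will overcount, so this is only reliable on an otherwise idle token.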

test/run-tests Outdated
continue

statuses = get_statuses(gh, owner, repo, head, args.max_num_statuses)
commands = get_comment_commands(gh, owner, repo, number)
statuses = get_statuses(pull.get_commits())
Contributor

to get the list of statuses, you need to get the commit using e.g. commit = repo.get_commit(pull.head.sha), then use commit.get_combined_status().statuses to get the list of statuses.
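That suggestion, as a sketch (repo and pull being PyGithub Repository and PullRequest objects; helper name is hypothetical):

```python
def get_head_statuses(repo, pull):
    """Fetch statuses for the PR's head commit only, via the combined
    status endpoint, instead of walking every commit in the PR."""
    commit = repo.get_commit(pull.head.sha)
    return commit.get_combined_status().statuses
```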


nhosoi commented Jul 19, 2020

Not sure if there is a way to get it to dump the number of actual requests made - some of them might be cached - you could use your private github account token instead of the systemroller account, and use https://developer.github.com/v3/rate_limit/ to count how many requests are made for each Task.

I set the logging level to DEBUG and counted the GET and POST requests (assuming both Session and PyGitHub dump every GET and POST).
Using the same scenario, this PR makes four times as many GET calls (12 vs. 3)...

       master   pr89
POST        4      4
GET         3     12

More precisely, this is the GET list from the master:

https://api.github.com:443 "GET /repos/linux-system-roles/logging/pulls/154 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/commits/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f/status?per_page=60 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/commits/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f/status?per_page=60 HTTP/1.1" 200 None

vs. pr89:

https://api.github.com:443 "GET /repos/linux-system-roles/logging HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/pulls/154 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/commits/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/commits/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/commits/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None

I think I need to investigate some more to eliminate the redundant GET calls...
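One candidate for trimming the repeated GET /repos/... calls visible in the log above is caching the Repository handle. The lazy flag is an assumption about the installed PyGithub version (newer releases accept it on get_repo and skip the initial request until an attribute is actually needed):

```python
_repo_cache = {}

def get_repo_cached(gh, full_name):
    """Resolve a Repository handle at most once per process; each
    plain gh.get_repo() call was costing one 'GET /repos/...' request.
    lazy=True (if the installed PyGithub supports it) additionally
    defers even the first request until an attribute is accessed."""
    if full_name not in _repo_cache:
        _repo_cache[full_name] = gh.get_repo(full_name, lazy=True)
    return _repo_cache[full_name]
```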


richm commented Jul 19, 2020

It seems that the main engineering effort around CI is an optimization problem, to minimize the number of calls to github - the main limiting factor is the ratelimit. @i386x @pcahyna you have worked with this for a while - perhaps you already came to this conclusion with your previous efforts with CI.

It's possible that using pygh will make the rate limit problem worse... it seems the API is designed to get objects like this:
get repo -> get pr -> get commit -> get statuses
That is, it is object-oriented: you get the parent object first, then the child objects, and so on. This is more programmer-friendly, but it uses many more requests, so it is more rate-limit-unfriendly.

With the current REST-based design, you only get the data you need at the time you need it, so the calls to the REST API are minimized: not as programmer-friendly, but more rate-limit-friendly.

It may be that pygh is not designed to limit the number of requests, and it might be quite difficult to use it in a way that minimizes the number of requests.

My current approach for testing multiple versions of ansible is to have separate deployments (OpenShift deploymentConfig). However, each pod increases the number of concurrent github requests by a factor of 2, increasing the risk of hitting the ratelimit. If we are going to test for multiple versions of ansible in such a way that we minimize the number of requests to github in a given time period, we might need to take a radically different approach - have a single deployment that can test multiple different versions of ansible.

https://github.com/linux-system-roles/test-harness/blob/master/test/run-tests#L414 - here, instead of calling ansible-playbook directly, have a loop that calls multiple versions of ansible:

DEFAULT_ANSIBLE_PLAYBOOK_SCRIPTS = ["ansible-playbook-2.7", "ansible-playbook-2.8", "ansible-playbook-2.9"]
# ... allow the user to override these in the config ...
for playbook_script in config["ansible_playbook_scripts"]:
    ansible_log = f"{artifactsdir}/{playbook_script}.log"
    for playbook in sorted(playbooks):
        print(f"Testing {playbook}...", end="")
        with redirect_output(ansible_log, mode="a"):
            result = run(
                playbook_script,
                "-vv",
                f"--inventory={inventory}",


nhosoi commented Jul 20, 2020

I did some more research into whether there is any chance to reduce the GET calls in this PR. I could eliminate four of them, but it still requires eight. There may be more to trim, but I don't think we can get as low as our original implementation.

https://api.github.com:443 "GET /repos/linux-system-roles/logging HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/pulls/154 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/commits/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/commits/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None
https://api.github.com:443 "GET /repos/linux-system-roles/logging/statuses/c907a05b6721e23ba1d65fb5bcaf70bf39e4d51f?per_page=60 HTTP/1.1" 200 None

And I should mention that the test duration time is almost doubled compared to the master.
master: Finished in 227 seconds task linux-system-roles/logging/154:rhel-8
pr89: Finished in 429 seconds task linux-system-roles/logging/154:rhel-8

Sad to say, but I'd guess pyGitHub does not fit our needs. It is intuitive and maybe good for interactive use cases, but not for tools/services that require performance?


i386x commented Jul 20, 2020

@richm, @pcahyna, @nhosoi What about using just one pod as a proxy between GitHub and the rest of the pods? The proxy pod would check GitHub and delegate the work to the other pods. It could also do some level of caching. This could decrease the number of requests dramatically, but on the other hand it could also force us to totally rework our CI.
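Purely as an illustration of the idea (entirely hypothetical; a real proxy would need invalidation on webhooks, auth handling, and error paths), the caching layer of such a proxy might behave like this:

```python
import time

class GitHubProxyCache:
    """Single-pod cache between GitHub and the worker pods: identical
    requests within `ttl` seconds are served from memory, so N workers
    polling the same PRs cost one upstream request instead of N."""

    def __init__(self, fetch, ttl=30):
        self._fetch = fetch      # callable that performs the real API request
        self._ttl = ttl
        self._cache = {}         # key -> (timestamp, payload)

    def get(self, key):
        now = time.time()
        hit = self._cache.get(key)
        if hit and now - hit[0] < self._ttl:
            return hit[1]        # fresh enough, no upstream request
        payload = self._fetch(key)
        self._cache[key] = (now, payload)
        return payload
```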


richm commented Jul 20, 2020

And I should mention that the test duration time is almost doubled compared to the master.
master: Finished in 227 seconds task linux-system-roles/logging/154:rhel-8
pr89: Finished in 429 seconds task linux-system-roles/logging/154:rhel-8

do you have the ansible logs from those tests?

Sad to say, but I'd guess pyGitHub does not fit our needs. It is intuitive and maybe good for interactive use cases, but not for tools/services that require performance?

I think the main problem with pygh is that it makes the ratelimit problem worse. afaict, it shouldn't make the performance much worse - so I'm curious why there is such a difference in the results above


richm commented Jul 20, 2020

@richm, @pcahyna, @nhosoi What about using just one pod as a proxy between GitHub and the rest of the pods? The proxy pod would check GitHub and delegate the work to the other pods. It could also do some level of caching. This could decrease the number of requests dramatically, but on the other hand it could also force us to totally rework our CI.

That is also a possibility - could you explain more about how that would work?


nhosoi commented Jul 20, 2020

And I should mention that the test duration time is almost doubled compared to the master.
master: Finished in 227 seconds task linux-system-roles/logging/154:rhel-8
pr89: Finished in 429 seconds task linux-system-roles/logging/154:rhel-8

do you have the ansible logs from those tests?

Sad to say, but I'd guess pyGitHub does not fit our needs. It is intuitive and maybe good for interactive use cases, but not for tools/services that require performance?

I think the main problem with pygh is that it makes the ratelimit problem worse. afaict, it shouldn't make the performance much worse - so I'm curious why there is such a difference in the results above

Sorry, there were some mistakes in my testing. I commented out the time.sleep calls in the scripts and reran the tests. With pyGitHub, it tends to be slower, but not doubled.

         rhel-8   rhel-x
master 222(sec) 364(sec) 
pr89   227(sec) 365(sec)


richm commented Jul 20, 2020

And I should mention that the test duration time is almost doubled compared to the master.
master: Finished in 227 seconds task linux-system-roles/logging/154:rhel-8
pr89: Finished in 429 seconds task linux-system-roles/logging/154:rhel-8

do you have the ansible logs from those tests?

Sad to say, but I'd guess pyGitHub does not fit our needs. It is intuitive and maybe good for interactive use cases, but not for tools/services that require performance?

I think the main problem with pygh is that it makes the ratelimit problem worse. afaict, it shouldn't make the performance much worse - so I'm curious why there is such a difference in the results above

Sorry, there were some mistakes in my testing. I commented out the time.sleep calls in the scripts and reran the tests. With pyGitHub, it tends to be slower, but not doubled.

         rhel-8   rhel-x
master 222(sec) 364(sec) 
pr89   227(sec) 365(sec)

ok - that's what I would expect - so the main issue is that it makes many more requests which risks ratelimiting


nhosoi commented Jul 20, 2020

Closing this PR since there is a better way to improve the CI test script.

Unless there is stronger demand for using the github library, this issue could be closed as well.
#64

@nhosoi nhosoi closed this Jul 20, 2020