Pull Precheck #45

Open
0x4007 opened this issue Sep 14, 2024 · 30 comments · May be fixed by ubiquity-os-marketplace/command-ask#11

Comments

@0x4007
Member

0x4007 commented Sep 14, 2024

We sometimes get waves of contributors opening pulls, which keeps our review team busy.

We can save valuable man hours by having the bot preemptively check if the pull achieves the specification.

Pull Flow

  1. Assuming that the contributor follows directions, the pull must be initially opened up as a draft.
  2. When it is ready for review, they should turn it into a finalized pull request.
  3. The bot should consume the issue specification and then the pull diff into its context.
  4. The bot should return actionable feedback on what is missing from the specification. It should leave the review state as "requested changes" and also convert the pull back into a draft. However, if the pull passes, the bot should leave the review state as just "commented" (not approved).
  5. If a collaborator (reviewer on the team) converts the pull back from a draft into a finalized pull, then the bot should back off and stop intervening. Basically the inspection should only occur on pull.created and when it's converted from draft to finalized by the pull author.
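The flow above can be sketched as a small decision function. This is only an illustration under assumptions: the `PullEvent` type, the action strings, and `plan_actions` are hypothetical names, and the event names are copied from this thread (GitHub's `pull_request` webhook reports creation as the `opened` action).

```python
from dataclasses import dataclass
from typing import List

# Event names as written in this thread (assumption: the real plugin may
# map them from GitHub's "opened" and "ready_for_review" actions).
REVIEW_EVENTS = {"pull.created", "pull.ready_for_review"}


@dataclass
class PullEvent:
    name: str    # which event fired
    actor: str   # login of whoever triggered the event
    author: str  # login of the pull author


def plan_actions(event: PullEvent, meets_spec: bool) -> List[str]:
    """Return the bot's actions for one event, following steps 3-5."""
    if event.name not in REVIEW_EVENTS:
        return []
    # Step 5: a collaborator (not the author) finalized the pull, so back off.
    if event.name == "pull.ready_for_review" and event.actor != event.author:
        return []
    # Step 4: a passing pull gets a plain "commented" review, never approval.
    if meets_spec:
        return ["review:commented"]
    # Step 4: a failing pull gets requested changes and goes back to draft.
    return ["review:changes_requested", "convert_to_draft"]
```

Keeping the decision separate from the API calls makes the back-off rule easy to test without touching GitHub.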

Bonus but maybe can be handled in other tasks:

  1. Ensure CI is passing. I didn't include this here because sometimes there are problems out of the control of the pull author.
  2. Limits: some novice contributors might abuse the code review feature and burn through credits by requesting a review every time they make a tiny change that doesn't achieve the specification. We have seen folks make hundreds of commits for tasks that required only a few lines of code. We can limit the reviews to one per day for ChatGPT. However, it isn't clear to me how we can automatically have it check again the next day if they request a review. Maybe that can be handled by some later task.
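A minimal sketch of the one-review-per-day limit, assuming the bot can recover the time of its last review for a pull. The function names and `REVIEW_INTERVAL` are hypothetical, and the sketch does not answer the open question of how the re-check gets scheduled; it only computes when one would be allowed.

```python
from datetime import datetime, timedelta
from typing import Optional

REVIEW_INTERVAL = timedelta(days=1)  # "one per day" from the item above


def next_allowed_review(last_review_at: Optional[datetime]) -> Optional[datetime]:
    """Earliest time a new review may run, or None if none has run yet."""
    if last_review_at is None:
        return None
    return last_review_at + REVIEW_INTERVAL


def may_review_now(last_review_at: Optional[datetime], now: datetime) -> bool:
    """True when no review has run yet or the interval has fully elapsed."""
    allowed_at = next_allowed_review(last_review_at)
    return allowed_at is None or now >= allowed_at
```

A later task could feed the value from `next_allowed_review` to whatever scheduler ends up handling the delayed re-check.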

We should use o1-mini.

@ubadineke

/start

ubiquity-os bot commented Sep 18, 2024

! Please set your wallet address with the /wallet command first and try again.

ubiquity-os bot commented Sep 18, 2024

! No wallet address found

@ubadineke

/wallet 0x3Ea855E4D6440D937117c776501e7653a770b759

ubiquity-os bot commented Sep 18, 2024

+ Successfully registered wallet address

@ubadineke

/start

ubiquity-os bot commented Sep 18, 2024

Deadline: Wed, Oct 2, 9:41 PM UTC
Beneficiary: 0x3Ea855E4D6440D937117c776501e7653a770b759
Tips:
  • Use /wallet 0x0000...0000 if you want to update your registered payment wallet address.
  • Be sure to open a draft pull request as soon as possible to communicate updates on your progress.
  • Be sure to provide timely updates to us when requested, or you will be automatically unassigned from the task.

@ubadineke

@Keyrxng should I implement this feature in a new file? and also what should happen if the pull is not initially opened as a draft?

@Keyrxng
Member

Keyrxng commented Sep 19, 2024

@0x4007 I had GPT rewrite this for brevity. It removed what I'd call context, but the result still makes sense.

Discussion:

Spec consideration > Embedding effectiveness > Bot intent & current model

https://chatgpt.com/share/66ec086f-7d70-8000-a914-991b77a819b0

Conclusion:
The current model (embedding-based) struggles to provide detailed, code-specific feedback. While embeddings are useful for high-level semantic comparisons, they lack the precision needed for effective PR reviews. You'll need additional tools to meet your goal.


Task Understanding:

  • Capture PR diffs (optimize by excluding files like yarn.lock)
  • Compare diff embedding with task spec embedding to identify issues.
  • If PR fails, convert it to draft and provide actionable feedback.
  • Stop intervention when a reviewer finalizes the draft.
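As a rough illustration of the first bullet, a unified diff can be split per file and the noisy files dropped before it reaches the model. This is a sketch under assumptions: `EXCLUDED_FILES`, `split_diff_by_file`, and `sanitize_diff` are hypothetical names, and only yarn.lock is actually named in this thread.

```python
import re
from typing import Iterable, List

# Assumed exclusion list; extend it as needed.
EXCLUDED_FILES = ("yarn.lock",)


def split_diff_by_file(diff: str) -> List[str]:
    """Split a unified git diff into one chunk per 'diff --git' header."""
    chunks = re.split(r"(?m)^(?=diff --git )", diff)
    return [chunk for chunk in chunks if chunk.strip()]


def sanitize_diff(diff: str, excluded: Iterable[str] = EXCLUDED_FILES) -> str:
    """Drop the chunks whose target path matches an excluded file name."""
    names = tuple(excluded)
    kept = []
    for chunk in split_diff_by_file(diff):
        header = chunk.splitlines()[0]  # e.g. diff --git a/path b/path
        path = header.rsplit(" b/", 1)[-1]
        if any(path == name or path.endswith("/" + name) for name in names):
            continue
        kept.append(chunk)
    return "".join(kept)
```

Filtering on the header path keeps the remaining chunks byte-for-byte intact, so the model still sees a valid diff.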

This is very similar to /review, and separately, /gpt currently has the ability to process full PR diffs, spec, and conversation per query across multiple issues/repos. However, embeddings aren't ideal for this. Raw text analysis (task spec + diff) is more effective since we need precise feedback, not high-level similarity or semantic meaning.

The plugin's effectiveness must be high to avoid becoming annoying or useless. Therefore, it makes sense to re-spec or close this idea since embeddings aren’t suitable.


To be clear, the intention of this spec was to automate the initial review process and is feasible but better done without embeddings in my opinion. If the /review approach is being replaced by this then I think the following workflow works best:

Suggested Workflow:

  • Plugin triggers on pull_request.created (it can run a review here or not, depending on whether the pull is opened as a draft or as ready_for_review).
  • Plugin triggers on pull_request.ready_for_review.
  • Use the task spec and sanitized PR diff to prompt GPT for code review.
  • Post review comment and convert PR to draft if GPT identifies issues; add commented status otherwise.
  • Define a consistent comment format for GPT reviews, since all GPT tasks I've handled had pretty specific input/output formats/prompt structures introduced.
  • If a collaborator (not the assignee) finalizes the PR, stop intervening.
  • Limit bot reviews to once every 24 hours by detecting if a previous AI review comment exists or a timeline review event does.
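The prompt, the consistent comment format, and the 24-hour check from the workflow above could fit together roughly as follows. Everything here is an assumption made for illustration: the hidden `REVIEW_MARKER`, the prompt wording, and the comment dictionaries (shaped loosely like GitHub comment objects with `body` and `created_at`).

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical hidden marker used to recognise the bot's own review comments.
REVIEW_MARKER = "<!-- pull-precheck-review -->"


def build_prompt(spec: str, diff: str) -> str:
    """Combine the task spec and sanitized diff into one plaintext prompt."""
    return (
        "Review this pull request against its task specification.\n"
        "List what is missing from the specification, or reply PASS.\n\n"
        f"## Specification\n{spec}\n\n## Diff\n{diff}\n"
    )


def format_review_comment(feedback: str) -> str:
    """Wrap model feedback in a fixed format that can be detected later."""
    return f"{REVIEW_MARKER}\n### Pull Precheck\n\n{feedback.strip()}\n"


def last_review_time(comments: List[Dict[str, str]]) -> Optional[datetime]:
    """Find the newest comment carrying the marker, if any."""
    times = [
        datetime.fromisoformat(c["created_at"].replace("Z", "+00:00"))
        for c in comments
        if REVIEW_MARKER in c["body"]
    ]
    return max(times) if times else None


def review_allowed(
    comments: List[Dict[str, str]],
    now: datetime,
    interval: timedelta = timedelta(hours=24),
) -> bool:
    """Allow a review only if the last marked comment is old enough."""
    last = last_review_time(comments)
    return last is None or now - last >= interval
```

Detecting the marker in existing comments means the limit needs no extra storage; a timeline review event could be checked the same way.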

Additionally, the task is way overpriced. ubiquity/ubiquibot#746 is the /review command I authored, which was priced at $150, and the /gpt command is priced at $200 lmaoo, and I spent days/weeks on these. $400 max: we are simply feeding two bodies of text to GPT with a custom prompt to have it perform a review, with a couple of conditionals afterwards to add a comment and update a review status.


> should I implement this feature in a new file? and also what should happen if the pull is not initially opened as a draft?

@ubadineke I'm unsure personally at this point as I did not write this task specification. I have left comments and once @0x4007 replies I'll have a much better idea of how this task should be implemented.

  1. Yes this feature will span multiple new files, try to keep things clean and tidy.
  2. Personally I'd account for this and just run a review on pull_request.created, which will either allow it through or kick it back into draft, just like 0x4007 said. Future reviews would be tied to pull_request.ready_for_review and spaced at least 24 hours apart, using the timeline or review comments to check.

@ubadineke

ubadineke commented Sep 19, 2024

@Keyrxng @0x4007 What if the PR is created and the related issue is not mentioned in the body? Do we just move it back to draft and tell the user to edit the PR and specify the issue?

The existing event payload style doesn't have a field for PR Number, maybe it was just designed for comments. Can modifications be made?
If the issue must be specified from the payload then the first paragraph in my comment is redundant.

@Keyrxng
Member

Keyrxng commented Sep 19, 2024

I guess that's a good idea, as we like to strongly enforce the linked issue, although it's not guaranteed that every PR has or needs one (quick fixes, for example).

I'd say don't do anything, our pull request template enforces this format and legit cases exist where it is not used.


> The existing event payload style doesn't have a field for PR Number, maybe it was just designed for comments. Can modifications be made?
> If the issue must be specified from the payload then the first paragraph in my comment is redundant.

I'm not sure I understand, but issues and PRs are both considered issues by the GitHub API; there is no PR number, only issue.number. You can get linked issues most reliably via the GraphQL API.
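One way to fetch linked issues through the GraphQL API is the `closingIssuesReferences` connection on a pull request. The sketch below only builds the request payload and reads a response; the helper names are hypothetical, the page size of 10 is an assumption, and sending the request to GitHub with authentication is left out.

```python
from typing import Any, Dict, List

# closingIssuesReferences lists the issues a pull request would close.
LINKED_ISSUES_QUERY = """
query($owner: String!, $repo: String!, $number: Int!) {
  repository(owner: $owner, name: $repo) {
    pullRequest(number: $number) {
      closingIssuesReferences(first: 10) {
        nodes { number title body }
      }
    }
  }
}
"""


def build_linked_issues_request(owner: str, repo: str, number: int) -> Dict[str, Any]:
    """Build the JSON payload for a POST to the GraphQL endpoint."""
    return {
        "query": LINKED_ISSUES_QUERY,
        "variables": {"owner": owner, "repo": repo, "number": number},
    }


def extract_linked_issues(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pull the linked issue nodes out of a decoded GraphQL response."""
    pull = response["data"]["repository"]["pullRequest"]
    return pull["closingIssuesReferences"]["nodes"]
```

The `body` of each returned node is the task specification that would be fed to the review prompt; an empty list means the pull has no linked spec.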

If a PR does not have an associated task specification, then without codebase embeddings to compare the diff against we don't have much context beyond what the diff shows was changed. GPT can still review that on its own for logic errors and optimizations; it'll just be a more general review without the guard rail of the spec.

@0x4007
Member Author

0x4007 commented Sep 20, 2024

I think it would be more appropriate for you to extend your /gpt command logic. You can handle it in a separate pull. Over on your old pull I said to merge and let's test in production. From there you can extend the logic as part of this task.

This task makes it more seamlessly integrated (but it looks like your previous work laid a lot of relevant groundwork).

This requires deeper integration into the GitHub review workflow, such as setting review states and switching between draft and non-draft.
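For reference, setting a review state and switching a pull to draft map onto two GitHub API calls: the REST "create a review" endpoint and the GraphQL `convertPullRequestToDraft` mutation. The sketch below only builds the request descriptions; the function names and the dictionary shape are hypothetical, and authentication and sending are omitted.

```python
from typing import Any, Dict

CONVERT_TO_DRAFT_MUTATION = """
mutation($id: ID!) {
  convertPullRequestToDraft(input: {pullRequestId: $id}) {
    pullRequest { isDraft }
  }
}
"""


def review_request(
    owner: str, repo: str, pull_number: int, feedback: str, passed: bool
) -> Dict[str, Any]:
    """Describe the REST call that posts the review with the right state."""
    # A passing pull is only commented on, never approved, per the spec.
    event = "COMMENT" if passed else "REQUEST_CHANGES"
    return {
        "method": "POST",
        "path": f"/repos/{owner}/{repo}/pulls/{pull_number}/reviews",
        "json": {"body": feedback, "event": event},
    }


def convert_to_draft_request(pull_node_id: str) -> Dict[str, Any]:
    """Describe the GraphQL call that turns the pull back into a draft."""
    return {"query": CONVERT_TO_DRAFT_MUTATION, "variables": {"id": pull_node_id}}
```

Note that the mutation takes the pull's GraphQL node ID rather than its number, so the plugin would need to read that ID from the event payload or a prior query.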

I have extra time because the prompt engineering alone I'm sure will take a couple days to get the results we want.

Yes let's keep this plaintext no embeddings needed.

@0x4007 0x4007 assigned Keyrxng and unassigned ubadineke Sep 20, 2024

ubiquity-os bot commented Sep 20, 2024

@Keyrxng the deadline is at Fri, Oct 4, 2:47 AM UTC

@0x4007 0x4007 transferred this issue from ubiquity-os-marketplace/text-vector-embeddings Sep 20, 2024
@ubadineke

/start

ubiquity-os bot commented Sep 20, 2024

! This issue is already assigned. Please choose another unassigned task.

@ubadineke

Hey @0x4007,

I noticed Keyrxng might be busy, so I went ahead and took on the task. I’ve put together an initial version of the implementation, which you can check out here.

Right now, it only triggers on pull_request.created, and I still need to add the logic for pull_request.ready_for_review. I also extended the GPT logic like you mentioned before.

I’d really like to see this through to the end! If you could create a repo for this plugin, I can make a PR and keep working on it and then we can make adjustments to the prompting.

Thanks!

ubiquity-os bot commented Sep 21, 2024

@ubadineke, @Keyrxng the deadline is at Sat, Sep 28, 8:00 AM UTC

@0x4007
Member Author

0x4007 commented Sep 21, 2024

@Keyrxng you should be able to make the repo I'm not on computer

@Keyrxng
Member

Keyrxng commented Sep 21, 2024

> I think it would be more appropriate for you to extend your /gpt command logic

If that's the case then no new repo is required.

> I noticed Keyrxng might be busy, so I went ahead and took on the task.

Not typically how things go around here, if you are unassigned from a task you shouldn't technically be allowed to work it again.

> I’ve put together an initial version of the implementation, which you can check out here.

Your implementation seems to be more or less a copy/paste of my super early /review command. If we are extending /gpt instead, that seems like the better idea, as it has much more reach with context, although it will need guidance on how to use that context for reviews.

I think it's best that I resolve this task solo today by refactoring/upgrading /gpt. Apologies @ubadineke, I appreciate your enthusiasm tho.


@0x4007 you said it should reply to @ubiquityos instead of using the /gpt or any kind of slash command invocation?

Does that mean that you would invoke a review manually, or should reviews be automated as discussed in this spec? My /gpt QA showed me asking it to perform a review, but this upgraded functionality should instead follow this spec: reviews built in and triggered on events.

Ideally we keep within /gpt but have it now respond to a direct mention. We keep it within because of the context handling that it has, we can easily customize it to allow only x-depth of referenced context through instead of the unlimited depth that /gpt was made to have. As opposed to rebuilding another custom context collection system etc.
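The x-depth idea can be illustrated with a breadth-first walk over referenced issues that stops at a configurable depth. This is a simplified sketch, not how /gpt collects context: `collect_context` is a hypothetical name, references are matched only as `#123` within one repository, and issue bodies are passed in as a dictionary instead of being fetched.

```python
import re
from collections import deque
from typing import Dict, List, Optional

ISSUE_REF = re.compile(r"#(\d+)")


def collect_context(
    start: int, bodies: Dict[int, str], max_depth: Optional[int]
) -> List[int]:
    """Return issue numbers reachable from `start` within `max_depth` hops.

    max_depth=None means unlimited depth; max_depth=0 returns only `start`.
    """
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        number, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not follow references beyond the depth limit
        for ref in ISSUE_REF.findall(bodies.get(number, "")):
            target = int(ref)
            if target not in seen and target in bodies:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order
```

The `seen` set also protects the unlimited-depth mode from looping when issues reference each other.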

@0x4007
Member Author

0x4007 commented Sep 22, 2024

I think unlimited depth is interesting to experiment with. I always side on providing more context to llms

Let's not use that slash command and instead replace it to respond to its @UbiquityOS tag

> reviews built-in and triggered on events.

Sounds good

ubiquity-os bot commented Sep 27, 2024

@Keyrxng, this task has been idle for a while. Please provide an update.

@Keyrxng
Member

Keyrxng commented Sep 27, 2024

waiting for ubiquity-os-marketplace/command-ask#1

@0x4007
Member Author

0x4007 commented Sep 28, 2024

What are you waiting for? Is it stable?

@Keyrxng
Member

Keyrxng commented Oct 23, 2024

I'm held back by reviews on my other PRs. To keep making progress while I'm free, and since I'm currently assigned to this task (the only task I'm assigned to without a PR awaiting review), I'm going to start work on this now.

@Keyrxng, this task has been idle for a while. Please provide an update.

@Keyrxng
Member

Keyrxng commented Nov 2, 2024

Just pushed code for 12 days straight, taking 2 days off (this is the 2nd)

@0x4007
Member Author

0x4007 commented Nov 2, 2024

As a heads up we updated the algorithm and it will be following up much faster due to the priority level of this task.

I still think we might need to slow down the follow ups by setting the follow up time to 7 days and disqualify to 14.

With a high priority task you divide them by 3 to get their clock speed.
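The arithmetic described here works out as follows. This is only a sketch: `follow_up_schedule` is a hypothetical name, and it assumes "high priority" is a simple yes/no flag that divides both windows by 3, which is all this comment states.

```python
from datetime import timedelta
from typing import Tuple

HIGH_PRIORITY_DIVISOR = 3  # "you divide them by 3" from the comment above


def follow_up_schedule(
    follow_up: timedelta, disqualify: timedelta, high_priority: bool
) -> Tuple[timedelta, timedelta]:
    """Return the effective (follow-up, disqualify) windows for a task."""
    divisor = HIGH_PRIORITY_DIVISOR if high_priority else 1
    return follow_up / divisor, disqualify / divisor
```

With the proposed 7 and 14 days, a high priority task would get a follow-up after 56 hours and be disqualified after 112 hours.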

This would be a perfect moment to connect my https://github.com/0x4007/sync-configs-agent to a plugin, and have the natural language router in the kernel to handle this.

@Keyrxng, this task has been idle for a while. Please provide an update.

Passed the deadline and no activity is detected, removing assignees: @Keyrxng.

3 participants