
chore: Improve prompt #179

Merged

Conversation

jaykayudo
Contributor

Resolves #147

@jaykayudo
Contributor Author

@gentlementlegen here is the new fix.

@Keyrxng
Member

Keyrxng commented Oct 31, 2024

In my opinion the system message should contain only the instruction, and the comments should be sent as the query.

Right now there is no query being sent as the user message to the model; it's just the system message, which is not really how you are supposed to generate responses.

The messages (or chatHistory, whatever we call it) need to be system > user > assistant > user > ...repeating. Right now it's just system; I'm not even sure how we are getting a response back.

Please check out command-ask, as it has multiple completions-endpoint methods, both with multi-step input chat history.

Groundtruths passes in the repo deps as a stringified JSON object as the query, with the system message being the instruction and example. We should probably do the same here with comments.

/ask passes in the user query and fills the system message with chat history and instruction (I've suggested we split this into multi-step too).
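For illustration, a minimal sketch of that message shape, assuming the OpenAI Node SDK and hypothetical `instruction` / `comments` strings (not the plugin's actual code):

```typescript
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

// Hypothetical inputs: `instruction` holds the scoring rules and example,
// `comments` holds the serialized comments to evaluate.
async function evaluateRelevance(instruction: string, comments: string) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o", // placeholder model name
    messages: [
      { role: "system", content: instruction }, // instruction only
      { role: "user", content: comments }, // comments sent as the user query
    ],
  });
  return response.choices[0].message.content;
}
```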


@jaykayudo, this task has been idle for a while. Please provide an update.

@jaykayudo
Contributor Author

Prompt

Evaluate the relevance of GitHub comments to an issue. Provide a JSON object with comment IDs and their relevance scores.

Issue: In the v1 of the Ubiquibot, when a result gets evaluated, a recap is posted to the related issue with a summary and details of the rewards as well as the link to the rewards themselves Example: ubiquity/cloudflare-deploy-action#9 (comment)

The same logic should be applied in the v2 version by creating a new Module responsible to post that comment. This module will receive a similar input than the one mentioned here

The module should be:

  • possible to enable / disable
  • eventually configurable (what data to show / hide)
  • coming with tests

All comments:
2030164289 - 0x4007: "@whilefoo rfc on how we can deal with comment outputs. Perhaps we can have a standard recognized property on the output interface? Then the kernel can decide whether to pass it around or something.

```ts
interface PluginOutput {
  comment: string; // html comment
  rewards: Rewards; // { "whilefoo": "500", "token": "0x0" } etc
}
```"
2033404518 - gentlementlegen: "This needs https://github.com/ubiquibot/conversation-rewards/pull/7 to be merged first. Also probably needs https://github.com/ubiquibot/permit-generation/pull/2 to be able to generate the permits properly."
2033488255 - 0x4007: "> This needs https://github.com/ubiquibot/conversation-rewards/pull/7 to be merged first. Also probably needs https://github.com/ubiquibot/permit-generation/pull/2 to be able to generate the permits properly.

I think you should fork from and overtake that second pull due to us being behind schedule "
2035427134 - whilefoo: "> @whilefoo rfc on how we can deal with comment outputs. Perhaps we can have a standard recognized property on the output interface? Then the kernel can decide whether to pass it around or something.
> 
> ```ts
> interface PluginOutput {
>   comment: string; // html comment
>   rewards: Rewards; // { "whilefoo": "500", "token": "0x0" } etc
> }
> ```

there are a couple of options:
1. we let the conversation-rewards plugin generate and post the comment
2. we put comment as output and then another module is responsible for posting it or let conversation-rewards generate rewards and permit-generation generate permits and a third module that uses output from previous plugins to make a comment and post it
3. we let the conversation-rewards plugin generate the comment and pass it as a standard property like you suggested.

In theory 2. option sounds good to separate concerns but it's another plugin which means another call to github actions thus more latency, so for the sake of speed it'd go with option 1 or 3, but going with these 2 options would mean there will 1 comment for rewards summary and 1 comment for permits.
I'm not sure if option 3 is any better than option 1 because the plugin already has a token that has permissions to post comments so passing it to the kernel doesn't make much difference.
"
2036174312 - gentlementlegen: "To me 1 is the most straightforward to do for few reasons:
- the comment reward plugin has all the needed data already
- it can import the https://github.com/ubiquibot/permit-generation to generate permits itself
- if it is done this way it can be used as a complete standalone without the kernel

3 might make more sense in terms of architecture however. In such case the kernel should pass down results. It is more of an architecture question. Although, if we ever have other plugins in the flow that have influence on the total incentives, it would make sense to go through the kernel to aggregate the total result."
2036355445 - 0x4007: "> 
> there are a couple of options:
> 
> 1. we let the conversation-rewards plugin generate and post the comment

I think the most pure architecture would be that plugins can NOT inherit (write) authentication (only read if possible) of the kernel. As a consequence, no plugin should be able to post directly any issue. Ideally it should only be the kernel with a direct interface to issues. Plugins should just output comment HTML and the kernel can post it all in a single comment at the end of the webhook event invocation chain.

> 2. we put comment as output and then another module is responsible for posting it or let conversation-rewards generate rewards and permit-generation generate permits and a third module that uses output from previous plugins to make a comment and post it

Could be interesting to have a dedicated plugin to handle commenting, but because this seems like such an essential capability, I would more carefully consider the pros/cons of including this within the kernel. 

- On one hand, I really like the idea of making the kernel as pure and lean as possible
- On the other hand there are practical considerations that if every plugin standard output interface includes `comment:` and then we don't have the `comment` capability available (the comment plugin is not included in the config) then why is this a standard output property required on every plugin?

> 3. we let the conversation-rewards plugin generate the comment and pass it as a standard property like you suggested.
> 
> In theory 2. option sounds good to separate concerns but it's another plugin which means another call to github actions thus more latency, so for the sake of speed it'd go with option 1 or 3, but going with these 2 options would mean there will 1 comment for rewards summary and 1 comment for permits. I'm not sure if option 3 is any better than option 1 because the plugin already has a token that has permissions to post comments so passing it to the kernel doesn't make much difference.

I'm not concerned about latency now. Besides we can port any plugin to Cloudflare Workers (or combine several inside a single Worker) when we are ready to fix latency issues. To be honest though, except for setting labels, I have no problem with latency (like with permit generation) so there's no need to overengineer yet. I also love the GitHub Actions logs being available for easy debugging, and the fact that they are super generous with the compute. 

Given the entire conversation and all the considerations, I definitely think we should do option 3. 

This separates concerns the best, which will allow our plugin ecosystem to grow the fastest. "
2036367126 - gentlementlegen: "I think each plugin should output JSON not html as it is not reliable to parse nor manipulate and requires `window` instance to be instantiated which is annoying on `node` based projects. 

Having a plugin handling commenting seems quite weird as commenting is done by calling Octokit and the REST api which is already a library by itself, so no need to encapsulate it within another one to do the same thing.

My view on this, is to finalize https://github.com/ubiquibot/permit-generation/issues/5 to import it withing the `conversation-reward` that will post the comment itself as well, otherwise the architecture will be quite convoluted doing ping pong with everything."
2036370459 - 0x4007: "> I think each plugin should output JSON not html as it is not reliable to parse nor manipulate

I know JSON makes things more complicated than it needs to be with serialization/transport. Then we have problems like this https://github.com/ubiquibot/conversation-rewards/issues/18. If the comment property is ONLY meant to be used for GitHub comments, then transporting HTML only is the most sensible. Besides, there's no other metadata thats necessary within the `comment` property, which is the main point of sending a JSON object (it has multiple properties.) 

As a somewhat side note: there should also not be any variables inside of the HTML. We could look it as like server-side-rendering (or in this case, plugin-side-rendering) handle producing the final HTML output and then output it as a single string of html entities. 

> requires window instance to be instantiated which is annoying on node based projects.

We are currently using `mdast` in `@ubiquibot/comment-incentives` to generate virtual DOMs. No `window` needed.

> My view on this, is to finalize https://github.com/ubiquibot/permit-generation/issues/5 to import it withing the conversation-reward that will post the comment itself as well, otherwise the architecture will be quite convoluted doing ping pong with everything.

If we agree that this is considered as technical debt, and that we are accruing this so that we can get back on schedule, ok."
2036385985 - gentlementlegen: "If you want to manipulate and convey data, HTML really is not made for this. If you want something formatted similarly but made for data we can use XML format.
The new comment reward actually does instantiate a DOM through [JSDOM](https://github.com/ubiquibot/conversation-rewards/blob/ba434761281446a23566cd02c68bd3b0e79d4eb1/src/parser/formatting-evaluator-module.ts#L80) to make things way simpler instead of using Regex everywhere which is highly unreliable. But there it makes sense because we are parsing comments from an HTML page content.

Biggest advantage from this is to have the comment reward fully standalone, while easy to integrate with the kernel.

If we do something that handles the comment it means each and every module has to send it there and that module should understand every different content / format we send which would be way easier if the module itself handled its own comments, formatting wise."
2036393020 - 0x4007: "> If you want to manipulate and convey data, HTML really is not made for this.

Going back to my "plugin-side-rendering" mention, the data manipulation happens inside of the logic of the plugin. Then when the plugin is finally done with all of its compute, it emits a single string in the `comment` output property. This string is the final, rendered HTML. 

The `comment` output is not intended to be consumed by other plugins. In most cases, the kernel will concatenate all the `comment` outputs from every plugin and post a single comment at the end. Plugins will consume the `metadata` property output which will include raw values to work with. I forget where I originally proposed this, but imagine something like:
```typescript

type HtmlString = string;
type RequestedPermits = { username: string; amount: string; token: string; }[]
 
interface PluginOutput {
   metadata: Record<string, any>;
   comment: HtmlString;
   rewards: RequestedPermits;
}
```"
2036411811 - gentlementlegen: "But then how do we consider the formatting of that output?

Practical case: we want to post a comment when a user queries a wallet. That comment is 'user name': 'wallet 0x0'
Kernel calls the comment plugin, saying that it wants a comment to be posted. Should the Kernel send the rendering it wants, should the comment plugin transform the data to HTML?

Then, comment-reward wants to post the results. Should ask the Kernel to call the comment plugin, but then formatting is different. Should the Kernel notify the comment plugin that it wants a different output formatting? Should the Kernel compute beforehand the HTML and send it to the comment plugin?

Might be something I don't grasp there. Because I do understand your use case but it seems to be very deterministic on what is the purpose of the plugins which kinda defeats the purpose of having plugins, looks more like a monolithic approach to me"
2036433646 - 0x4007: "> But then how do we consider the formatting of that output?

The proposed `comment` output is intended for ease of composability for MOST situations, basically that the output from one plugin would just be appended to the final rendered comment. For simple plugin configurations (plugins not modifying the output results of others) this seems like straightforward architecture. 

However, as we know, there will be situations where a subsequent plugin will consider the results of a previous plugin, which means it would need to change the comment that is rendered. 

In this situation the subsequent plugin should clobber the output of the previous plugin. It is now clear to be that this will be a new challenge to express to the kernel that it should ignore the comment output of a previous plugin. I suppose it would be straightforward in the metadata using the plugin ID i.e. `{ "silences": ["ubiquibot/conversation-rewards"] }`.

> Kernel calls the comment plugin,

I think that comments should be handled from within the kernel. There should not be a separate comment plugin. Read my explanation [here](https://github.com/ubiquibot/conversation-rewards/issues/5#issuecomment-2036355445). 


### Inputs

For completeness of my [previous comment](https://github.com/ubiquibot/conversation-rewards/issues/5#issuecomment-2036393020):

```typescript

type PluginId = string; // i.e. `ubiquibot/conversation-rewards` `ubiquibot/assistive-pricing`
 
interface PluginInput {
   metadata: Record<PluginId, any>;
   context: GitHubEventContext;
}
```

Notice that we should not pass in the HTML of other plugins. Instead, just the metadata ("computed") values from the previous plugins."
2036458775 - gentlementlegen: "This can work, but we skyrocket coupling and to me defeat purpose of plugins that should be unaware of each other. If any plugin has to understand the result of a previous plugin, it means these plugins have necessarily to co-exist so basically they become a single plugin with no purpose to split them."

Comments to evaluate:
2036516869: "> so basically they become a single plugin with no purpose to split them.

I understand your concern and I would need to put more thought into composability. Maybe the comment proposal is bad after all due to your point. It's only useful with simple plugins, but for more complex ones your point is valid.

I need help to think through how any partner in the future can create new plugins that modify

  1. permit reward amounts
  2. XP reward amounts

I don't like the idea of having a single monolithic rewards generation plugin that wraps all the ways possible to compute rewards (i.e. time estimation, comment rewards, review rewards etc)" -
2036535332: "@pavlovcik To mitigate that that's why inside the comment reward itself I also integrated that Module principle so code is not coupled tightly and easy to build on. There is as usual pros and cons to both approaches (splitting or not) but biggest pro is that comments get evaluated once in the same spot, so we save calls to OpenAPI and speed up the process. Also makes it only one configuration file in one location. We need to think about our best options there." -
2051094255: "I realized that to carry this task properly we need to handle flags for comment more delicately as they only indicate if the comment is ISSUE | REVIEW with the level MEMBER | CONTRIBUTOR etc. but doesn't specify if it is from a task, a specification and so on. Tags should be added to the config properly as well." -
2053332029: "> I realized that to carry this task properly we need to handle flags for comment more delicately as they only indicate if the comment is ISSUE | REVIEW with the level MEMBER | CONTRIBUTOR etc. but doesn't specify if it is from a task, a specification and so on. Tags should be added to the config properly as well.

I see, so you're suggesting that we must annotate each comment as well in order to properly handle the scoring at the end?

Off hand I think there's:

  1. Specification
  2. Issue comment
  3. Pull request comment (a normal comment on a pull request, not related to posting a "review"
  4. Pull request review comment (associated with a "review state" I.e. approved, left comments, requested changes)

I suppose we need a better naming convention for the pull related ones. They are considered as separate entities according to the GitHub api. They require different methods to obtain both types. " -
2054424028: "Agreed, I think currently there are 3 possible things to annotate on the comments:

  • SPECIFICATION for the issue itself, TASK for the related PR fixing it, or simply COMMENT
  • MEMBER or CONTRIBUTOR for the status of the member
  • REVIEW or ISSUE for where the comment was added

I think this shall cover all cases." -
2055783331: "Consider calling it "contributor" and "collaborator" as that is how it is presented on the GitHub APIs as I recall.

Also I think you forgot about the "review comments"" -

Instructions:

  1. Read all comments carefully, considering their context and content.
  2. Evaluate each comment in the "Comments to evaluate" section.
  3. Assign a relevance score from 0 to 1 for each comment:
    • 0: Not related (e.g., spam)
    • 1: Highly relevant (e.g., solutions, bug reports)
  4. Consider:
    • Relation to the issue description
    • Connection to other comments
    • Contribution to issue resolution
  5. Handle GitHub-flavored markdown:
    • Ignore text beginning with '>' as it references another comment
    • Distinguish between referenced text and the commenter's own words
    • Only evaluate the relevance of the commenter's original content
  6. Return only a JSON object: {ID: score}

Notes:

  • Even minor details may be significant.
  • Comments may reference earlier comments.
  • The number of entries in the JSON response must equal 6.

Response

(after 10 consecutive requests)

{
  "2036516869": 0.8,
  "2036535332": 0.9,
  "2051094255": 0.6,
  "2053332029": 0.8,
  "2054424028": 0.9,
  "2055783331": 0.7
}
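As an aside, a minimal sketch (not the plugin's actual code) of how such a `{ID: score}` response could be checked against the evaluated comment IDs, assuming a hypothetical `expectedIds` list:

```typescript
// Hypothetical validation of the model's {ID: score} output.
function parseRelevanceResponse(raw: string, expectedIds: string[]): Record<string, number> {
  const scores = JSON.parse(raw) as Record<string, number>;
  if (Object.keys(scores).length !== expectedIds.length) {
    throw new Error(`Expected ${expectedIds.length} entries, got ${Object.keys(scores).length}`);
  }
  for (const id of expectedIds) {
    const score = scores[id];
    if (typeof score !== "number" || score < 0 || score > 1) {
      throw new Error(`Missing or out-of-range relevance for comment ${id}`);
    }
  }
  return scores;
}
```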

@0x4007 0x4007 removed their request for review November 5, 2024 21:52
@0x4007
Member

0x4007 commented Nov 5, 2024

@sshivaditya2019 @gentlementlegen please decide. I think your testing module for this could be useful?

I think it mostly looks fine at this point. Would be great to get some others to agree because it's a lot to wrangle.

@gentlementlegen
Member

gentlementlegen commented Nov 6, 2024

@0x4007 I do not want to merge this without testing; it broke all the generation last time we did so. I want to see a real run first.

@jaykayudo Why are the comments not sent in a JSON format anymore? It avoided any ambiguity about the comment content.

@sshivaditya2019

@jaykayudo, could you share some results for less obvious examples where the relevance is expected to be high? For instance, this Issue

@jaykayudo
Contributor Author

@jaykayudo Why are the comments not sent in a JSON format anymore? It avoided any ambiguity about the comment content.

@0x4007 suggested this new format.

@jaykayudo
Contributor Author

@sshivaditya2019 Here is the result from the issue you provided. The last 7 comments were evaluated.

Prompt

Evaluate the relevance of GitHub comments to an issue. Provide a JSON object with comment IDs and their relevance scores.

Issue: The bot shows a warning message on tasks that were opened recently, as seen here. It should only display this warning above a certain threshold, which comes from the configuration.

A possible cause would be that value missing in the current configuration. If that is the case, the default threshold should be set, and definitely above 0 days.


Tasks to be carried out:

  • display a warning message when the task is above the configuration threshold
  • do not display the warning if it is under that threshold
  • change the configuration to accept a string representing a duration instead of a time stamp
  • related tests

All comments:
2243997569 - gentlementlegen: "@Keyrxng would this be the fix for it? ubiquity/work.ubq.fi#74 (comment)"
2244009998 - Keyrxng: "yeah @gentlementlegen ubiquity-os-marketplace/command-start-stop@aae3cba resolved here in #10"
2244011846 - gentlementlegen: "Ok cool thank you, I will close this one after we got your fix merged."
2244019369 - 0x4007: "So the font is wrong be sure to match the styles of the old assign message exactly. "
2244026707 - gentlementlegen: "I triple checked I cannot see any font difference.
image
image
image
image

Maybe there is a difference on mobile devices if you are using one?"
2244031438 - Keyrxng: "registered wallet seems to wrap only when the warning is visible for some reason, nothing springs to mind on how to fix that

ubq-testing/command-start-stop#5 (comment)

image
"
2244181664 - 0x4007: "Use <code> not <samp>

image

image"
2244396724 - gentlementlegen: "@0x4007 Thanks for the screenshot, it seems to be more specific to mobile for the font. Maybe a font fallback happening there."
2245455562 - Keyrxng: "Been reading through the GFM and CommonMark specs, GH docs etc but haven't been able to find any other workarounds other than <samp>. We'll just have to live with the whitespace if using <code> blocks so which of the below options is best, included <samp> for easy comparison. I can't find any other workarounds with these tags or others such as <kbd> etc.


1

Warning!This task was created over 10 days ago. Please confirm that this issue specification is accurate before starting.
DeadlineMon, Jul 22, 10:25 PM UTC
Registered WalletRegister your wallet address using the following slash command: `/wallet 0x0000...0000`

2

Warning!This task was created over 10 days ago. Please confirm that this issue specification is accurate before starting.
DeadlineMon, Jul 22, 10:25 PM UTC
Registered WalletRegister your wallet address using the following slash command: `/wallet 0x0000...0000`

3

Warning!This task was created over 10 days ago. Please confirm that this issue specification is accurate before starting.
DeadlineMon, Jul 22, 10:25 PM UTC
Registered WalletRegister your wallet address using the following slash command: `/wallet 0x0000...0000`

---"
2246822367 - 0x4007: "> Been reading through the GFM and CommonMark specs, GH docs etc but haven't been able to find any other workarounds other than <samp>. We'll just have to live with the whitespace if using <code> blocks so which of the below options is best, included <samp> for easy comparison. I can't find any other workarounds with these tags or others such as <kbd> etc.

If its only a problem on mobile then perhaps <samp> is the best decision!"
2246845672 - gentlementlegen: "Number 3 seems to be the best display in your propositions, if that works on mobile then let's go with it."
2247594793 - Keyrxng: "> If its only a problem on mobile then perhaps is the best decision!

Yeah it's only mobile that <samp> seems to have slightly different rendering

Number 3 seems to be the best display in your propositions, if that works on mobile then let's go with it.

I agree it looks best but no it is somewhat different although personally I think <samp> is the way to go because it'll most commonly be seen via desktop vs mobile and I'd rather no whitespace than to accommodate the lesser used platform.

Sounds like option 3 is preferred and that's how the code is right now so no changes needed in this regard"
2285336634 - gentlementlegen: "@Keyrxng You mentioned that the bug was fixed, but now the opposite might be happening, see message here: ubiquity/pay.ubq.fi#237 (comment)

I also suggest that we change the configuration format to take a string representing a duration and not a number which is hard to understand and error prone."
2285383795 - Keyrxng: "> @Keyrxng You mentioned that the bug was fixed, but now the opposite might be happening, see message here: ubiquity/pay.ubq.fi#237 (comment)

I also suggest that we change the configuration format to take a string representing a duration and not a number which is hard to understand and error prone.

@gentlementlegen you want to add it into the spec/title here and I'll handle both?"

Comments to evaluate:
2285387841: "@Keyrxng Done. Also, I guess the font issue should be carried out in a different issue most likely, if there is not once created already." -
2285389471: "> Also, I guess the font issue should be carried out in a different issue most likely, if there is not once created already.

I'm certain that we landed on sticking with <samp> and supporting the most used platform as it's only an issue on mobile so no need to do anything with that"
2308672581: "I think we need to tweak the qualitative analysis. Somehow I got 0 relevance on my comments which didn't seem to be the case before with gpt3.5 10x samples.

Also I should be getting img credit.

Seems like there's problems with quantitative analysis as well

@gentlementlegen " -
2308673448: "For the img it is currently set to 0 so you can just change the configuration. For the relevance I will read the logs more in detail.


After reading the logs, it is possible that the allocated max_tokens fell short with the dummy test leading to empty relevances here. I guess we could add a margin (say for example 10 extra tokens) to make sure there is a result." -
2308752680: "Margin seems imprecise, which leads me to believe that the approach is wrong. Perhaps the approach should be reevaluated so that the math is perfect. " -
2308755339: "@0x4007 It was supposed to be because the length of results is predictable, but when allocating only one token the result seems to be {} without the content (should be { "1234" : 0.5 }. I'll investigate." -

Instructions:

  1. Read all comments carefully, considering their context and content.
  2. Evaluate each comment in the "Comments to evaluate" section.
  3. Assign a relevance score from 0 to 1 for each comment:
    • 0: Not related (e.g., spam)
    • 1: Highly relevant (e.g., solutions, bug reports)
  4. Consider:
    • Relation to the issue description
    • Connection to other comments
    • Contribution to issue resolution
  5. Handle GitHub-flavored markdown:
    • Ignore text beginning with '>' as it references another comment
    • Distinguish between referenced text and the commenter's own words
    • Only evaluate the relevance of the commenter's original content
  6. Return only a JSON object: {ID: score}

Notes:

  • Even minor details may be significant.
  • Comments may reference earlier comments.
  • The number of entries in the JSON response must equal 7.

Response

{
  "2285387841": 0.5,
  "2285389471": 0.6,
  "2308672581": 0.1,
  "2308673448": 0.3,
  "2308752680": 0.2,
  "2308755339": 0.4,
  "2243997569": 0.7
}

@gentlementlegen
Member

@0x4007 Why remove the JSON format? It seems to make the prompt very confusing to tell where a comment begins and ends, at least to me.


@jaykayudo, this task has been idle for a while. Please provide an update.

@jaykayudo
Contributor Author

@jaykayudo, this task has been idle for a while. Please provide an update.

@gentlementlegen @0x4007 does this PR meet the requirements to resolve the issue, or is there more that needs to be done?

@gentlementlegen
Member

@jaykayudo please resolve the conflict. Also, I would like the reasoning for removing the JSON output, @0x4007.

@jaykayudo
Contributor Author

jaykayudo commented Nov 10, 2024

@gentlementlegen I have resolved the conflict, and from the previous PR, this was the reason for removing the JSON comments:

${allCommentsMap.join("\n")}

Comments to evaluate:
${commentsMap.join("\n")}
Member

My concern about the comments to evaluate is here. A line break seems like a weak separator between comments, which can lead to the result not having the same length as the provided comments. Have you never had such an issue?

Contributor Author

The line break basically has no effect in separating the comments; the LLM is intelligent enough to tell where a comment starts and ends. I once thought it might be an issue as well, but surprisingly, after testing a lot of prompts, there was no issue at all. You can look at the prompts posted above for better clarification. If you want it reverted to JSON notation, no problem.
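For illustration, a rough sketch of the two serializations being discussed, assuming a hypothetical `CommentEntry` shape (not the plugin's actual types):

```typescript
// Hypothetical comment shape; the real plugin's types may differ.
interface CommentEntry {
  id: number;
  author: string;
  body: string;
}

// Plaintext format currently used in the prompt: `id - author: "body"` joined by line breaks.
function toPlaintext(comments: CommentEntry[]): string {
  return comments.map((c) => `${c.id} - ${c.author}: "${c.body}"`).join("\n");
}

// JSON alternative: unambiguous comment boundaries at the cost of extra tokens.
function toJson(comments: CommentEntry[]): string {
  return JSON.stringify(comments, null, 2);
}
```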

@0x4007
Member

0x4007 commented Nov 11, 2024

JSON is extremely redundant compared to plaintext/markdown. For humans and LLMs JSON is not the most efficient way to express (most) ideas.

@gentlementlegen
Member

Then I suppose this can be merged and tested, and reverted like last time if anything goes wrong. @jaykayudo please merge the dist file in this branch.

@jaykayudo
Contributor Author

Then I suppose this can be merged and tested, and reverted like last time if anything goes wrong. @jaykayudo please merge the dist file in this branch.

Done

@gentlementlegen gentlementlegen merged commit 17b0cdf into ubiquity-os-marketplace:development Nov 11, 2024
3 checks passed
@ubiquity-os-beta ubiquity-os-beta bot mentioned this pull request Nov 11, 2024