
Refinements #97

Closed
0x4007 opened this issue Aug 25, 2024 · 61 comments · Fixed by #123

0x4007 (Member) commented Aug 25, 2024

Qualitative and quantitative analysis have unexpected results compared to how I implemented them in v1. Research and refine.

I think we need to tweak the qualitative analysis. Somehow I got 0 relevance on my comments, which didn't seem to be the case before with GPT-3.5 and 10x samples.

Also, I should be getting image credit.

Seems like there are problems with quantitative analysis as well.

Originally posted by @0x4007 in ubiquity-os-marketplace/command-start-stop#14 (comment)

0x4007 (Member, Author) commented Aug 25, 2024

Made it a week to guarantee a good job. This is a core feature that needs to work at least as well as before.

0x4007 (Member, Author) commented Aug 31, 2024

Seems like a test with known samples is a good next step here.

sshivaditya2019 commented:

/start

ubiquity-os bot (Contributor) commented Sep 6, 2024

Deadline: Fri, Sep 13, 11:02 AM UTC
Registered Wallet: 0xDAba6e01D15Db560b88C8F426b016801f79e1F69
Tips:
  • Use /wallet 0x0000...0000 if you want to update your registered payment wallet address.
  • Be sure to open a draft pull request as soon as possible to communicate updates on your progress.
  • Be sure to provide timely updates to us when requested, or you will be automatically unassigned from the task.

sshivaditya2019 commented Sep 6, 2024

@0x4007

  • Could you share some sample comments which previously had high relevance scores?
  • Also, could you point me to the section of the code that gives credit for images, if it exists?

0x4007 (Member, Author) commented Sep 8, 2024

@0x4007

  • Could you share some sample comments which previously had high relevance scores?

Unfortunately you or I would just have to manually check old completed tasks and see their rewards. None in particular come to mind, but I would pay attention to those posted by "ubiquibot" instead of "ubiquity-os", as those used an older version of conversation rewards that seemed more accurate.

  • Also, could you point me to the section of the code that gives credit for images, if it exists?

It is under the "formatting score" or "quantitative scoring" section. You might be able to search for these keywords in the codebase. I am on mobile, so pointing to code is not feasible. @gentlementlegen perhaps you can help with this point.

gentlementlegen (Member) commented:

We don't give credit for the image itself but apply a different value and multiplier based on the HTML elements, like <img/>.
The configuration object is located here, and the multiplier would be applied here.
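For illustration only, here is a hypothetical shape such a per-element configuration could take; this is not the actual conversation-rewards schema, and the element names and numbers are made up:

```typescript
// Hypothetical per-HTML-element scoring configuration (illustrative only, not
// the real conversation-rewards config). Each element gets a base score and a
// multiplier, so e.g. <img/> earns formatting credit even though the image
// content itself is never analyzed.
interface HtmlElementScore {
  score: number;      // base value added per occurrence
  multiplier: number; // weight applied on top of the base value
}

const htmlElementScores: Record<string, HtmlElementScore> = {
  img: { score: 5, multiplier: 1 },
  code: { score: 1, multiplier: 1 },
  a: { score: 1, multiplier: 1 },
  li: { score: 0.5, multiplier: 1 },
};

// Hypothetical application while walking a comment's rendered DOM.
function scoreElement(tagName: string): number {
  const entry = htmlElementScores[tagName.toLowerCase()];
  return entry ? entry.score * entry.multiplier : 0;
}
```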

ubiquity-os bot (Contributor) commented Sep 10, 2024

@sshivaditya2019, this task has been idle for a while. Please provide an update.

0x4007 (Member, Author) commented Sep 10, 2024

@sshivaditya2019, this task has been idle for a while. Please provide an update.

@gentlementlegen Really nice to see this finally working as expected. Except the revision hash in the metadata is undefined. This should be fixed!

sshivaditya2019 commented:

@0x4007

I need to evaluate the relevance of GitHub contributors' comments to a specific issue specification. Specifically, I'm interested in how much each comment helps to further define the issue specification or contributes new information or research relevant to the issue. Please provide a float between 0 and 1 to represent the degree of relevance. A score of 1 indicates that the comment is entirely relevant and adds significant value to the issue, whereas a score of 0 indicates no relevance or added value. A stringified JSON is given below that contains the specification and contributors' comments. Each comment in the JSON has a unique ID and comment content.

{
  "specification": "Do not show warning message on tasks that were not created a long time ago",
  "comments": [
    { "id": 1, "comment": "Ok cool thank you, I will close this one after we got your fix merged." },
    { "id": 2, "comment": "So the font is wrong be sure to match the styles of the old assign message exactly." },
    { "id": 3, "comment": "Updating the password recovery process could help in user management." }
  ]
}

To what degree are each of the comments in the conversation relevant and valuable to further defining the issue specification? Please reply with ONLY a JSON where each key is the comment ID given in JSON above, and the value is a float number between 0 and 1 corresponding to the comment. The float number should represent the degree of relevance and added value of the comment to the issue. The total number of properties in your JSON response should equal exactly 3.

I tried this prompt with GPT-4o, GPT-3.5-Turbo, and ChatGPT; almost all of the models give the same relevance values. I think the problem is that there isn't enough context: on its own, a comment might not seem relevant to the issue description and details.

I would suggest reducing the temperature and top_p values; perhaps a better way would be to evaluate all the comments together in a single block instead of testing them in isolation. Below are the results I got from GPT-4o, which match the values from GPT-3.5-Turbo:

{
  "1": 0.1,
  "2": 0.2,
  "3": 0.0
}

Explanation:

  • Comment 1: "Ok cool thank you, I will close this one after we got your fix merged."
    This comment is mostly administrative and does not directly contribute to defining the issue or adding new information. It is somewhat relevant because it acknowledges the fix, but it doesn't provide any new insights or suggestions. Relevance: 0.1

  • Comment 2: "So the font is wrong be sure to match the styles of the old assign message exactly."
    This comment addresses a style issue (font) but does not directly relate to the core issue of warning messages or task creation timing. However, it could be tangentially related if the warning message involves UI elements, so it has some minor relevance. Relevance: 0.2

  • Comment 3: "Updating the password recovery process could help in user management."
    This comment is entirely unrelated to the issue specification, which focuses on warning messages and task creation timing. It introduces a different topic (password recovery), so it has no relevance to the current issue. Relevance: 0.0

0x4007 (Member, Author) commented Sep 10, 2024

I would suggest reducing the temperature and top_p values;

Great idea, except if the temperature is set too low I know it repeats and crashes. I'm pretty sure I played with these settings in my original implementation (see the repo called comment-incentives).

perhaps a better way would be to evaluate all the comments together in a single block instead of testing them in isolation.

I'm pretty sure it's implemented this way. I know for a fact that in my original implementation I had them all evaluated in one shot.

sshivaditya2019 commented Sep 10, 2024

I would suggest reducing the temperature and top_p values;

Great idea, except if the temperature is set too low I know it repeats and crashes. I'm pretty sure I played with these settings in my original implementation (see the repo called comment-incentives).

perhaps a better way would be to evaluate all the comments together in a single block instead of testing them in isolation.

I'm pretty sure it's implemented this way. I know for a fact that in my original implementation I had them all evaluated in one shot.

I tested a few examples; a temperature value of 0.2 works fine for now with the GPT-4o model. I don't think the current implementation does that: the prompt expects a { specification: issue, comments: comments } object, with the comments being of type { id: number; comment: string }[]. I can probably rewrite that part, if that's fine.
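A minimal sketch of what that single-block evaluation could look like, assuming the openai Node SDK; the helper name, prompt wording, and payload wiring are illustrative, not the plugin's actual code:

```typescript
import OpenAI from "openai";

// Payload shape described above: the issue specification plus all comments in one block.
interface CommentEntry {
  id: number;
  comment: string;
}

interface RelevancePayload {
  specification: string;
  comments: CommentEntry[];
}

// Hypothetical helper: scores every comment in a single request at low temperature.
async function evaluateRelevance(payload: RelevancePayload): Promise<Record<string, number>> {
  const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    temperature: 0.2,
    top_p: 0.5,
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content: "Reply with ONLY a JSON object mapping each comment ID to a relevance float between 0 and 1.",
      },
      { role: "user", content: JSON.stringify(payload) },
    ],
  });
  return JSON.parse(response.choices[0].message.content ?? "{}");
}
```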

0x4007 (Member, Author) commented Sep 11, 2024

I would suggest reducing the temperature and top_p values;

Great idea, except if the temperature is set too low I know it repeats and crashes. I'm pretty sure I played with these settings in my original implementation (see the repo called comment-incentives).

perhaps a better way would be to evaluate all the comments together in a single block instead of testing them in isolation.

I'm pretty sure it's implemented this way. I know for a fact that in my original implementation I had them all evaluated in one shot.

@gentlementlegen

gentlementlegen (Member) commented:

Depends on what is meant by "all together". Right now it is all together, but per user, not all comments from the whole issue / PR in one block. The original implementation was the same, except that a batch of 10 attempts was averaged.

0x4007 (Member, Author) commented Sep 12, 2024

I see. Given the extended context windows of the latest models, perhaps we should do it all in one shot?

gentlementlegen (Member) commented:

If that enhances precision and gives more context for better results, that is nice; however, I wonder if we would easily burst through the max tokens doing so for long reviews.

0x4007 (Member, Author) commented Sep 12, 2024

Context windows are so long these days that I am pretty sure it will be fine.

gentlementlegen (Member) commented:

The problem is that this test runs on real data, which is subject to change. What could be done is to include this test but exclude it from the test run. However, there is this issue that should eventually get that covered; I didn't have time to look into it.

sshivaditya2019 commented:

The problem is that this test runs on real data, which is subject to change. What could be done is to include this test but exclude it from the test run. However, there is this issue that should eventually get that covered; I didn't have time to look into it.

Let me know if you're not working on it right now; I can take a look at it then.

sshivaditya2019 commented:

@0x4007. I think the issue spec for this one is a bit vague. I think the relevance issue should be fixed with the PR. The img credit is a configuration issue. But I am not sure about the outcomes for this ticket.

Is the goal of the ticket to rewrite the entire comment-evaluator module with embeddings and vector search? Or is it something else?

0x4007 (Member, Author) commented Sep 19, 2024

@0x4007. I think the issue spec for this one is a bit vague.

For vague specs, which happen occasionally, we are to share research and concerns here. We all get credited for it.

I think the relevance issue should be fixed with the PR. The img credit is a configuration issue. But I am not sure about the outcomes for this ticket.

Is the goal of the ticket to rewrite the entire comment-evaluator module with embeddings and vector search? Or is it something else?

Some recent observations:

  1. We are very likely going to entirely remove the word counter feature and instead generate embeddings and understand how much value a comment adds to solving the problem.
  2. We need to change the code to emphasize crediting per HTML tag and then the meaning of the comment with embeddings instead of word count.
  3. We still need to test how far we can get regarding vector embeddings and how well it solves our problem.

sshivaditya2019 commented Sep 20, 2024

@0x4007. I think the issue spec for this one is a bit vague.

For vague specs, which happen occasionally, we are to share research and concerns here. We all get credited for it.

I think the relevance issue should be fixed with the PR. The img credit is a configuration issue. But I am not sure about the outcomes for this ticket.
Is the goal of the ticket to rewrite the entire comment-evaluator module with embeddings and vector search? Or is it something else?

Some recent observations:

  1. We are very likely going to entirely remove the word counter feature and instead generate embeddings and understand how much value a comment adds to solving the problem.

@0x4007 Embeddings will not solve the problem; I think the present relevance scoring is the best technique. In my view, a better approach would be to use something like a Bag-of-Words model with hierarchical labeling, and assign scores according to the depth of the concept. Let me know; I can put together a small write-up on this.

  2. We need to change the code to emphasize crediting per HTML tag and then the meaning of the comment with embeddings instead of word count.

I think the present implementation focuses only on formatting. Effectively this would mean the entire formatting-evaluator-module.ts being rewritten.

  3. We still need to test how far we can get regarding vector embeddings and how well it solves our problem.

I am very skeptical about embeddings in this use case. As I mentioned before, embeddings provide local context and references, and on their own would not mean anything. I created my own script to visualize the embeddings and perform PCA to extract cluster centers.

Original Comments: [screenshot of the sample comments]

Embeddings Plot with Comments: [PCA scatter plot of the comment embeddings, showing three cluster centers]

Here, you can see three distinct cluster centers. In the embeddings plot with comments I added a new comment, "Something random blah blah"; as you can see, it sits near a cluster center and would get high similarity in a vector search. This should not happen. My suggestion would be to use an NLP method instead of embedding-based vector search. Let me know if you want to set this up on your end; I can help you with that.

Python Script
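To make the vector-search concern concrete, here is a minimal cosine-similarity sketch in TypeScript (separate from the Python script referenced above, and purely illustrative): a generic filler comment whose embedding happens to sit near a dense cluster can still receive a high similarity score.

```typescript
// Cosine similarity between two embedding vectors (assumed equal length).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In embedding-based vector search, each comment embedding is scored against the
// issue-specification embedding; the objection above is that this score alone
// cannot tell a substantive comment apart from filler that lands near a cluster.
function rankBySimilarity(specEmbedding: number[], commentEmbeddings: number[][]): number[] {
  return commentEmbeddings.map((embedding) => cosineSimilarity(specEmbedding, embedding));
}
```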

0x4007 (Member, Author) commented Sep 20, 2024

My peer suggested a search-engine-results-related algorithm. I'm asking him now to clarify which one. This should help us see how on-topic a comment is for the specification. We could consider adding this as one of several dimensions we evaluate comments by.

Starting to wonder if sub-plugins are realistic, or whether we should just make npm modules (or use something like git submodules).

ChatGPT is recommending me:

We are exploring methods to evaluate comment readability and conciseness using Flesch-Kincaid readability metrics. These formulas assess the complexity and clarity of text based on sentence length and word syllables:

  1. Flesch Reading Ease: Rates text on a scale from 0 (difficult) to 100 (easy). Scores can help determine how easily a comment can be understood.
  2. Flesch-Kincaid Grade Level: Converts readability into an estimated school grade level required to comprehend the text.

These metrics can help us identify verbose comments vs. concise, high-value inputs.

This seems a lot more interesting compared to word count, but we should test.
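A rough TypeScript sketch of the two Flesch formulas (illustrative only; a real implementation would need a proper syllable counter and sentence tokenizer):

```typescript
// Naive syllable heuristic: count vowel groups, ignoring a trailing silent "e".
function countSyllables(word: string): number {
  const cleaned = word.toLowerCase().replace(/[^a-z]/g, "");
  if (cleaned.length === 0) return 0;
  const groups = cleaned.replace(/e$/, "").match(/[aeiouy]+/g);
  return Math.max(1, groups ? groups.length : 1);
}

function fleschScores(text: string): { readingEase: number; gradeLevel: number } {
  const sentences = Math.max(1, (text.match(/[.!?]+/g) ?? []).length);
  const words = text.split(/\s+/).filter(Boolean);
  const syllables = words.reduce((sum, word) => sum + countSyllables(word), 0);
  const wordsPerSentence = words.length / sentences;
  const syllablesPerWord = syllables / Math.max(1, words.length);
  return {
    // Flesch Reading Ease: higher = easier to read (roughly 0-100 scale).
    readingEase: 206.835 - 1.015 * wordsPerSentence - 84.6 * syllablesPerWord,
    // Flesch-Kincaid Grade Level: estimated school grade needed to understand the text.
    gradeLevel: 0.39 * wordsPerSentence + 11.8 * syllablesPerWord - 15.59,
  };
}
```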

The idea is that we can develop a proprietary algorithm that combines several strategies. Ideally we should make a playground where we can plug in these different modules and run tests against live GitHub issues to tweak it.

Strategy ideas:

  1. HTML elements counter
  2. Word counter ❌
  3. Flesch
  4. Something search engine algo related
  5. LLM judge (we do this now for relevance scoring)
  6. Generate concise summary and calculate compression ratio

@gentlementlegen

gentlementlegen (Member) commented:

conversation-rewards already supports modules within itself that you can enable / disable to change the final output. You can do as many transforming modules as you want and enable / disable them through the configuration.

0x4007 (Member, Author) commented Sep 21, 2024

My peer got back to me regarding the search engine recommendation


TF-IDF (Term Frequency-Inverse Document Frequency) is a classic algorithm used in search and information retrieval to evaluate how important a word is to a document relative to a collection of documents (often referred to as a "corpus"). It helps identify which terms are most relevant to the context of a specific document.

In the Context of Your Goals: Evaluating GitHub Comments

Given your objective to measure the value of GitHub comments in relation to problem-solving, TF-IDF could be a useful tool to assess the relevance and informational density of individual comments with respect to the overall issue or conversation.

Here's how TF-IDF might be applied in your scenario:

1. How TF-IDF Works

  • Term Frequency (TF): Measures how frequently a term appears in a comment. Higher frequencies suggest that the term is more important within that comment.
$$\text{TF}(t, d) = \frac{\text{Number of times term } t \text{ appears in document } d}{\text{Total number of terms in document } d}$$
  • Inverse Document Frequency (IDF): Measures how unique a term is across all comments or documents. Terms that appear frequently across many comments are considered less important.
$$\text{IDF}(t) = \log \left(\frac{\text{Total number of documents}}{\text{Number of documents containing term } t}\right)$$
  • TF-IDF Score: Combines the two to assign a weight to each term in a comment:
$$\text{TF-IDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t)$$

2. Applying TF-IDF to Evaluate Comment Relevance:

  • Identifying Key Terms in Comments:

    • TF-IDF will highlight terms in each comment that are not just common but are also distinctive in the context of the overall issue. This helps identify comments with unique and relevant insights.
  • Assessing Relevance to the Issue Description:

    • By comparing the TF-IDF scores of words in a comment to those in the issue description, you can measure how closely a comment aligns with the core problem. Comments with terms that have high TF-IDF relevance scores relative to the issue description are more likely to be valuable.
  • Filtering Out Low-Value Contributions:

    • Comments that consist primarily of high-TF but low-IDF terms (e.g., generic phrases or filler words) can be identified as less valuable. This is particularly useful for identifying verbose comments from junior developers that lack unique insights.

3. Enhancing Your Continuum-Based Scoring System:

  • Weighted Relevance Scores: Use TF-IDF scores to assign a relevance weight to each comment, allowing you to rank comments on a continuum of importance rather than using binary relevance.

  • Combining with Other Metrics: Integrate TF-IDF scores with other continuous metrics (e.g., semantic similarity, readability) to create a comprehensive scoring system that reflects both the specificity and value of a comment.

Practical Steps for Implementation:

  1. Preprocess the Data: Tokenize the comments and issue descriptions, remove stop words, and normalize the text (e.g., lowercase conversion).
  2. Calculate TF-IDF: Apply TF-IDF to generate relevance scores for each comment.
  3. Score Aggregation: Aggregate the TF-IDF scores to quantify each comment’s overall contribution to solving the issue.

Benefits for Your Goals:

  • Objective Measurement of Relevance: TF-IDF provides a quantitative way to gauge how closely comments relate to the problem at hand.
  • Filtering Out Noise: Helps distinguish between high-value contributions and generic or off-topic comments.
  • Complementary to Other Techniques: Can be combined with PageRank, readability scores, or semantic similarity measures for a more holistic evaluation.

Using TF-IDF will give you an effective way to measure the informational value and relevance of comments, aligning well with your goal of continuum-based scoring. Let me know if you’d like to dive deeper into any specific aspect of this approach!
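A minimal TF-IDF sketch in TypeScript (illustrative only, not part of conversation-rewards), scoring each comment by the TF-IDF weight of the terms it shares with the issue specification:

```typescript
// Lowercase word tokens only; a real pipeline would also remove stop words.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Normalized term frequencies for one document.
function termFrequencies(tokens: string[]): Map<string, number> {
  const tf = new Map<string, number>();
  for (const token of tokens) tf.set(token, (tf.get(token) ?? 0) + 1 / tokens.length);
  return tf;
}

// Relevance of each comment = sum of TF-IDF weights of terms shared with the spec.
function tfIdfRelevance(specification: string, comments: string[]): number[] {
  const documents = [specification, ...comments].map(tokenize);

  // Document frequency, then IDF = log(N / df).
  const idf = new Map<string, number>();
  for (const doc of documents) {
    for (const term of new Set(doc)) idf.set(term, (idf.get(term) ?? 0) + 1);
  }
  for (const [term, df] of idf) idf.set(term, Math.log(documents.length / df));

  const specTf = termFrequencies(documents[0]);
  return documents.slice(1).map((doc) => {
    const tf = termFrequencies(doc);
    let score = 0;
    for (const [term, weight] of tf) {
      if (specTf.has(term)) score += weight * (idf.get(term) ?? 0);
    }
    return score;
  });
}
```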

sshivaditya2019 commented Sep 22, 2024

My peer got back to me regarding the search engine recommendation

TF-IDF (Term Frequency-Inverse Document Frequency) is a classic algorithm used in search and information retrieval to evaluate how important a word is to a document relative to a collection of documents (often referred to as a "corpus"). It helps identify which terms are most relevant to the context of a specific document.

In this case, I am not sure how this is relevant. Here, we are assigning scores within a comment-thread context, and the comments are mutually exclusive from other comment threads in terms of assigning relevance.

In the Context of Your Goals: Evaluating GitHub Comments

Given your objective to measure the value of GitHub comments in relation to problem-solving, TF-IDF could be a useful tool to assess the relevance and informational density of individual comments with respect to the overall issue or conversation.

Here's how TF-IDF might be applied in your scenario:

1. How TF-IDF Works

  • Term Frequency (TF): Measures how frequently a term appears in a comment. Higher frequencies suggest that the term is more important within that comment.

2. Applying TF-IDF to Evaluate Comment Relevance:

  • Identifying Key Terms in Comments:

    • TF-IDF will highlight terms in each comment that are not just common but are also distinctive in the context of the overall issue. This helps identify comments with unique and relevant insights.
  • Assessing Relevance to the Issue Description:

    • By comparing the TF-IDF scores of words in a comment to those in the issue description, you can measure how closely a comment aligns with the core problem. Comments with terms that have high TF-IDF relevance scores relative to the issue description are more likely to be valuable.

Just for context, TF-IDF is a transformation technique that gives out a real-valued vector, to which we then apply some distance metric like cosine similarity. This is very similar to embedding-based vector search.

  • Filtering Out Low-Value Contributions:

    • Comments that consist primarily of high-TF but low-IDF terms (e.g., generic phrases or filler words) can be identified as less valuable. This is particularly useful for identifying verbose comments from junior developers that lack unique insights.

These are not fixed. In the linked issue spec, comments were relevant to the topic but were flagged as irrelevant. This will not be an issue, as we would in any case implement stemming or lemmatization of the input phrases and tagging for POS (parts of speech).

3. Enhancing Your Continuum-Based Scoring System:

  • Weighted Relevance Scores: Use TF-IDF scores to assign a relevance weight to each comment, allowing you to rank comments on a continuum of importance rather than using binary relevance.
  • Combining with Other Metrics: Integrate TF-IDF scores with other continuous metrics (e.g., semantic similarity, readability) to create a comprehensive scoring system that reflects both the specificity and value of a comment.

Practical Steps for Implementation:

  1. Preprocess the Data: Tokenize the comments and issue descriptions, remove stop words, and normalize the text (e.g., lowercase conversion).
  2. Calculate TF-IDF: Apply TF-IDF to generate relevance scores for each comment.
  3. Score Aggregation: Aggregate the TF-IDF scores to quantify each comment’s overall contribution to solving the issue.

Benefits for Your Goals:

  • Objective Measurement of Relevance: TF-IDF provides a quantitative way to gauge how closely comments relate to the problem at hand.
  • Filtering Out Noise: Helps distinguish between high-value contributions and generic or off-topic comments.

I don't think this is possible. We would need some dictionary (e.g., WordNet) to assign values to words. This would not cater to specific words in comments, e.g. "bug" or "fix"; these words on their own will not have any value and may appear as off-topic.

  • Complementary to Other Techniques: Can be combined with PageRank, readability scores, or semantic similarity measures for a more holistic evaluation.

Using TF-IDF will give you an effective way to measure the informational value and relevance of comments, aligning well with your goal of continuum-based scoring. Let me know if you’d like to dive deeper into any specific aspect of this approach!

TF-IDF is a good starting point, but I don't believe it suits this problem well. We need to assign scores or relevances to comments, and since no two comment threads will have the same set of high TF-IDF words, this could penalize terms that are highly relevant to the context individually but not as a whole across multiple comment threads.

sshivaditya2019 commented Sep 22, 2024

I came up with a new approach to categorize comments into topic bins (topics would be assigned using an LLM/ML model). We can then perform a similarity search using the topic, issue_title, and issue_body to generate a Topic-Comment-Alignment score for each comment.

Next, we can assess user engagement for each comment based on signals such as reactions and replies. The weight assigned to different types of engagement can vary depending on the role (e.g., Author, Collaborator, etc.).

Additionally, we’ll incorporate a credibility score to evaluate whether a comment was made by a verified member of the organization, a regular collaborator, or an unknown user.

The overall score could be calculated using the following formula:

$$\text{Final Score} = \frac{TCA \times E + C}{W}$$

Where:

  • TCA = Topic-Comment-Alignment score
  • E = Engagement score (weighted by role)
  • C = Credibility score
  • W = Normalization factor to scale the score between [0, 1]

This will allow us to effectively evaluate the quality and relevance of comments.
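A small sketch of the proposed formula in TypeScript; the field names and the clamping are assumptions for illustration, not an agreed implementation:

```typescript
// Illustrative inputs for the proposed scoring formula.
interface CommentSignals {
  topicCommentAlignment: number; // TCA, e.g. a similarity score
  engagement: number;            // E, role-weighted reactions and replies
  credibility: number;           // C, based on the commenter's relationship to the org
}

// Final Score = (TCA * E + C) / W, with W chosen to keep the result in [0, 1].
function finalScore(signals: CommentSignals, normalization: number): number {
  const raw =
    (signals.topicCommentAlignment * signals.engagement + signals.credibility) / normalization;
  return Math.min(1, Math.max(0, raw));
}
```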

0x4007 (Member, Author) commented Sep 22, 2024

The credibility score we can adjust for the author no matter their position/relationship with the organization.

The spec author generally has the clearest vision for the task, so if what is commented aligns with them (i.e. they agree), then more credit should be offered (this of course is only in the context of funded tasks).

Reactions we usually have a very limited amount of, but I think reactions from the author and core team could be a positive indicator.

If we can attribute block quotes, that could be interesting. The problem there is that I generally comment from mobile and block quotes can be inconvenient, but sometimes I make sure to use them in order to enhance clarity. I would be curious to experiment with attributing block quote credit.

sshivaditya2019 commented Sep 22, 2024

The credibility score we can adjust for the author no matter their position/relationship with the organization.

The spec author generally has the clearest vision for the task, so if what is commented aligns with them (i.e. they agree), then more credit should be offered (this of course is only in the context of funded tasks).

Reactions we usually have a very limited amount of, but I think reactions from the author and core team could be a positive indicator.

If we can attribute block quotes, that could be interesting. The problem there is that I generally comment from mobile and block quotes can be inconvenient, but sometimes I make sure to use them in order to enhance clarity. I would be curious to experiment with attributing block quote credit.

Otherwise, are the method and scoring criteria fine? @Keyrxng rfc, I think this should be good enough.

ubiquity-os bot (Contributor) commented Sep 23, 2024

@sshivaditya2019, this task has been idle for a while. Please provide an update.

0x4007 (Member, Author) commented Sep 23, 2024

Otherwise, are the method and scoring criteria fine? @Keyrxng rfc, I think this should be good enough.

Sure, yes, let's try.

I have a feeling it might make sense to prototype strategies with a command line tool and then live test against real examples before we fully integrate it into our system.

Some strategies might prove to be bad and it would be unfortunate to invest in building and integrating them and then scrapping them right away.

sshivaditya2019 commented:

Is there an organization-wide execution time limit on plugins? I tried a version of this, but it's not very efficient since most processing is happening locally.

Is this plugin designed to run on GitHub Actions or on Cloudflare Workers?

0x4007 (Member, Author) commented Sep 26, 2024

The conversation-rewards plugin is designed to run on GitHub Actions. This is because we need to generate a virtual DOM for every comment, which took anywhere between 100 and 500 ms per comment in my testing a while back.

For long issues and linked pulls you can imagine that the rendering time can be quite substantial. With Cloudflare Workers we are limited to ~500 ms as I recall.

We essentially have no limits for GitHub Actions (6 hours per job, 20 concurrent jobs allowed, unlimited per day).
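For context, a sketch of the kind of per-comment DOM work being described, assuming jsdom (illustrative, not the plugin's actual code); building a virtual DOM for every comment is what makes the rendering cost add up:

```typescript
import { JSDOM } from "jsdom";

// Render a comment's HTML into a virtual DOM and tally its elements so that
// per-tag credit (e.g. <img/>, <code>, <a>) can be applied afterwards.
function countHtmlElements(commentHtml: string): Map<string, number> {
  const dom = new JSDOM(commentHtml);
  const counts = new Map<string, number>();
  for (const element of dom.window.document.body.querySelectorAll("*")) {
    const tag = element.tagName.toLowerCase();
    counts.set(tag, (counts.get(tag) ?? 0) + 1);
  }
  return counts;
}
```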

sshivaditya2019 commented Sep 26, 2024

I think timing would not be much of an issue then. Right now, a very crude version takes around 150 to 600 ms depending on thread length. I can get exact performance benchmarks, but roughly, for around 50 messages it takes around 200 ms (avg).

0x4007 (Member, Author) commented Sep 26, 2024

Scoring Criteria to Incorporate

  • Topic-Comment-Alignment
  • User Engagement
  • Flesch-Kincaid Clarity
  • LLM Judge [1]
  • HTML Elements Counter

Footnotes

  1. Prompt needs to be refined. Right now it's sort of doing topic comment alignment and clarity. I suppose this could be a good first step before properly implementing any strategy since it is easiest to adjust outputs with a few words. So as an extension of this logic we can look at this as "duct tape" that we should rely less on as we refine and correctly implement our strategies.

sshivaditya2019 commented Sep 26, 2024

Scoring Criteria to Incorporate

  • Topic-Comment-Alignment
  • User Engagement
  • Flesch-Kincaid Clarity
  • LLM Judge [1]
  • HTML Elements Counter

I think the updated prompt works fine as well. You could check that out in the linked PR.

Footnotes

  1. Prompt needs to be refined. Right now it's sort of doing topic comment alignment and clarity. I suppose this could be a good first step before properly implementing any strategy since it is easiest to adjust outputs with a few words. So as an extension of this logic we can look at this as "duct tape" that we should rely less on as we refine and correctly implement our strategies.

0x4007 (Member, Author) commented Sep 26, 2024

I think what might make the most sense for next steps:

  1. We consider this more of a research task where the primary rewards are for our comments here
  2. We break off the sub strategies into separate small priced tasks
  3. We reduce the time estimate on this to something like 1 day (not sure how long it took you to make the prompt and implement the changes) in order to reduce the code credit. This is because the original vision here was to batch everything in this pull but I realize that we ended up coming up with several seemingly viable strategies, so it makes sense to make multiple tasks.

0x4007 (Member, Author) commented Sep 26, 2024

I guess our plugins are down, but I anticipated a 300 USD price.

ubiquity-os bot (Contributor) commented Sep 30, 2024

Note

This output has been truncated due to the comment length limit.

[ 356.382 WXDAI ]

@sshivaditya2019
Contributions Overview
View    Contribution    Count   Reward
Issue   Task            1       300
Issue   Comment         22      56.382
Review  Comment         9       0

[ 142.738 WXDAI ]

@0x4007
Contributions Overview
View    Contribution      Count   Reward
Issue   Specification     1       5.85
Issue   Comment           21      131.142
Review  Comment           6       5.746

[ 29.612 WXDAI ]

@gentlementlegen
Contributions Overview
View    Contribution    Count   Reward
Issue   Comment         8       29.612

gentlementlegen (Member) commented:

@0x4007 I noticed the comment reward tends to be truncated a lot. Maybe we should look into optimizing the contents somehow.

0x4007 (Member, Author) commented Oct 5, 2024

Sure you can make a normal priority task for it
