Natural Language Configuration Modifications: E2E Tests. #889
Is there any sort of docs/whitepaper, or anything along those lines, that details specifics of how element scoring works, for example? Can additional elements be added to the comment elements at the partner's discretion? What are the parameters for what can and shouldn't be changed? Should the changes be pushed directly into the default_branch, or opened as a PR to avoid any mishaps? E2E should cover most cases, but committing directly to the working branch seems risky: if there is an issue, someone needs to get eyes on it anyway, so requiring an approving review seems like better UX considering that.
I asked and it added a whole bunch of new elements. Is it meant to be a lot more restrictive than that?
What is the preferred structure for tests? I'm assuming to just write them into … I see that you have your tests for issue in its own dir; should all handlers that need it be put into their own dir with their respective tests and mocks?
I'm struggling to conceptualize effective tests atm
My thoughts are:
"issueCreatorMultiplier": 3,
"maxPermitPrice": 1000
P.S. When it comes to NFT permits, are they using maxPermitPrice = 1 or do they have their own config object set up?

It typically enters the inferred key:value incorrectly, and then, after it reads the validation errors (it can chain up to 3 or 4 times depending on the prompt), it easily resolves them.

Logs:

COMMAND:
  "incentive_elements_scoring": "0-5",
  "reward_for_well_formatted_responses": "false",
  "reward_for_fully_featured_responses": "false"

  {
    "instancePath": "",
    "schemaPath": "#/additionalProperties",
    "keyword": "additionalProperties",
    "params": {
      "additionalProperty": "reward_for_fully_featured_responses"
    },
    "message": "must NOT have additional properties"
  }
I can share the philosophy behind this. The idea is that partners can credit comments that are crafted with care. The configuration technically makes it possible to process every comment with granular precision (down to the tag level, as you're aware), but it is up to the partner's discretion exactly how comments are processed and credited. I imagine that we will experiment within Ubiquity and recommend default settings to our partners based on our internal results.

I've noticed that comments written with lists are generally higher quality (i.e. more informative and expressive) than those without. Comments with links as context/evidence, and with images, are also generally significantly more informative/valuable than comments with little to no formatting. This is based on anecdotal evidence, and it is the inspiration behind this technology.

Regarding how it works: you set a price that is credited every time the HTML tag appears in the comment. You can also choose to ignore crediting of specific HTML tags (e.g. blockquotes: why would you get credited for somebody else's contribution?)
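For illustration, a rough sketch of what such a per-tag configuration could look like (the key names and shape below are assumptions, not the actual bot schema):

```ts
// Hypothetical shape: a price credited per occurrence of an HTML tag,
// plus a list of tags that are never credited.
interface CommentIncentives {
  elementPricing: Record<string, number>; // credit per tag occurrence
  ignoredElements: string[]; // tags that are never credited
}

const incentives: CommentIncentives = {
  elementPricing: {
    li: 0.5, // lists tend to indicate more informative comments
    a: 1, // links used as context/evidence
    img: 5, // images are generally the most valuable
  },
  ignoredElements: ["blockquote"], // quoting someone else's contribution is not credited
};
```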
Yes it is designed to be fully configurable with support for every HTML entity.
We can start simple with some of the major ones. I am unsure offhand, but it probably makes sense to focus on things that are likely to get changed frequently, or where sensible values are less ambiguous. I wouldn't know 100% without spending time on the code and experimenting.
If it is stable functionality (runtime tests can help determine this), then it should push to the default branch.
I think you overestimated the LLM's abilities without the context/anecdotal evidence I have of reviewing comments over the years on GitHub. You'll need to somehow provide the LLM with that context in order to produce good results for this type of query.
We should have the tests next to the code that is being tested. That's why, as I understand it, Jest etc. use globbing to find test files by filename pattern (e.g. a .test.ts suffix).
Using AJV to test the results is very valuable. We should try to define all constraints using AJV. It is a concise and unambiguous way to define the expected "correct" results, whether the configuration is changed manually or by ChatGPT.
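As a rough sketch of that approach (the schema below is illustrative and only covers the two keys discussed above, not the real configuration schema):

```ts
import Ajv from "ajv";

const ajv = new Ajv({ allErrors: true });

// Illustrative schema: only known keys are allowed, and unknown keys are
// rejected, mirroring the "must NOT have additional properties" error
// shown in the logs above.
const configSchema = {
  type: "object",
  properties: {
    issueCreatorMultiplier: { type: "number", minimum: 0 },
    maxPermitPrice: { type: "number", minimum: 0 },
  },
  additionalProperties: false,
};

const validateConfig = ajv.compile(configSchema);

// In a Jest test, the LLM-modified configuration can then be asserted directly:
//   expect(validateConfig(modifiedConfig)).toBe(true);
if (!validateConfig({ issueCreatorMultiplier: 3, maxPermitPrice: 1000 })) {
  console.error(validateConfig.errors);
}
```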
I think a more effective query would be the following: /config credit LI $1 each, all header tags (H1-H6) $1 each, and images $5. Everything else should be ignored.
Make appropriate e2e tests using Jest to ensure reliability.
etc
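A possible e2e sketch for that query (again, the helper and the elementPricing key are hypothetical, carried over from the sketch above):

```ts
import { applyNaturalLanguageChange } from "./test-helpers"; // hypothetical helper

it("credits LI and H1-H6 at $1 each, images at $5, and ignores everything else", async () => {
  const updated = await applyNaturalLanguageChange(
    "/config credit LI $1 each, all header tags (H1-H6) $1 each, and images $5. Everything else should be ignored."
  );

  const pricing = updated.elementPricing; // hypothetical key, see the sketch above
  expect(pricing.li).toBe(1);
  for (const header of ["h1", "h2", "h3", "h4", "h5", "h6"]) {
    expect(pricing[header]).toBe(1);
  }
  expect(pricing.img).toBe(5);

  // Every other tag should be absent (i.e. ignored rather than credited).
  expect(Object.keys(pricing).sort()).toEqual(["h1", "h2", "h3", "h4", "h5", "h6", "img", "li"]);
});
```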