Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add commit message linter #12

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

feat: add commit message linter #12

wants to merge 8 commits into from

Conversation

mtojek
Copy link
Member

@mtojek mtojek commented Dec 6, 2024

Fixes: #11

For reviewers: If you want to play with the tool you need to setup OpenAI key, and if you need one please let me know.

This PR introduces a commit message (or PR titles) linter according to repository style guide.

The biggest challenge is ensuring that the results are consistent. AI often draws incorrect conclusions and tends to struggle with maintaining precise line lengths. We might need to revise our COMMITS.md guidelines: instead of saying limit line length to 50 characters, we could try keep commit messages concise.

In a follow-up PR, we can introduce a custom output format, e.g. JSON struct.

make build
cd ../coder
../aicommit/bin/aicommit --lint "fix: make GetWorkspacesEligibleForTransition return even less false positives"

../aicommit/bin/aicommit --lint "fix: make GetWorkspacesEligibleForTransition return even less false positives"
❌ Limit the subject line to 50 characters.
✅ Use the imperative mood in the subject line.
✅ Capitalize the subject line such as "Fix Issue 886" and don't end it with a period.
✅ The subject line should summarize the main change concisely.
✅ Only include a body if absolutely necessary for complex changes.
✅ If a body is needed, separate it from the subject with a blank line.
✅ Wrap the body at 72 characters.
✅ In the body, explain the why, not the what (the diff shows the what).
✅ Use bullet points in the body only for truly distinct changes.
✅ Be extremely concise. Assume the reader can understand the diff.
✅ Never repeat information between the subject and body.
✅ Do not repeat commit messages from previous commits.
✅ Prioritize clarity and brevity over completeness.
✅ Adhere to the repository's commit style if it exists.

suggestion: fix: reduce false positives in GetWorkspacesEligibleForTransition

Troubleshooting:

export AICOMMIT_DEBUG=true # see prompt and tokens

@mtojek mtojek self-assigned this Dec 6, 2024
@mtojek mtojek requested review from ammario and mafredri December 6, 2024 12:37
Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, will be interesting to try this out. Few thoughts/suggestions but otherwise looks good to me.

cmd/aicommit/lint.go Outdated Show resolved Hide resolved
prompt.go Show resolved Hide resolved
Role: openai.ChatMessageRoleSystem,
Content: "Here is the commit message to lint:\n" + commitMessage,
})
return resp, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include past commit titles here as well, like in BuildPrompt, for reference of the style? Ideally a solid set of rules is probably better than what was written in the past, so this is just a thought.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. Theoretically, the style guide rules should be sufficient. I skipped them to significantly reduce the number of $ tokens in the prompt.

Do you think we should experiment with previous commit messages?

cmd/aicommit/lint.go Show resolved Hide resolved
cmd/aicommit/lint.go Show resolved Hide resolved
@mafredri
Copy link
Member

mafredri commented Dec 9, 2024

Looking at the example output you posted, I flagged few things:

  • ✅ Capitalize the subject line such as "Fix Issue 886" and don't end it with a period. - Why? This sounds incorrect, the subject wasn't capitalized
  • ✅ The subject line should summarize the main change concisely. - This is nonsensical for a title that includes no changes
  • ✅ Do not repeat commit messages from previous commits. - We didn't include any, fwiw
  • etc.

@mtojek
Copy link
Member Author

mtojek commented Dec 10, 2024

Looking at the example output you posted, I flagged few things:

What you're saying is accurate, but it seems to be a matter of visualization. For instance:

Capitalize the subject line such as "Fix Issue 886" and don't end it with a period. - Why? This sounds incorrect, the subject wasn't capitalized

The most accurate answer from Chat would be "N/A", but currently we default to ✅

  • This is nonsensical for a title that includes no changes
  • We didn't include any, FWIW

Ha! Do you suggest we should introduce PR_TITLES.MD? Probably not, but otherwise it will be hard to let Chat review the lining rules and leave the only ones that could apply to PR titles. Unless we tweak COMMITS.md to have a separate section for PR titles? Thoughts?

@mafredri
Copy link
Member

mafredri commented Dec 10, 2024

The most accurate answer from Chat would be "N/A", but currently we default to ✅

I agree that N/A is better, but would also like to point out that as per conventional commits:

Any casing may be used, but it’s best to be consistent.

It's not correct to say either capitalize or don't without having a reference to other commits. Not sure if or how we should address this though but may in fact be worth making it N/A as the output is just confusing.

Ha! Do you suggest we should introduce PR_TITLES.MD? Probably not, but otherwise it will be hard to let Chat review the lining rules and leave the only ones that could apply to PR titles. Unless we tweak COMMITS.md to have a separate section for PR titles? Thoughts?

Heh, no let's not. Perhaps we can just add to the prompt for linting:

These rules apply when generating commit tiles based on changes and repository history, but right now you are operating as a commit title linter and don't have access to that information. Don't make assumptions outside of what is explicitly stated in the rules.

@mtojek
Copy link
Member Author

mtojek commented Dec 12, 2024

@mafredri I rephrased prompts to verify if linting rules are applicable. This is the result running for coder/coder:

 ../aicommit/bin/aicommit --lint "fix: make GetWorkspacesEligibleForTransition return even less false positives" -m gpt-4o
❌ Limit the subject line to 50 characters.
✅ Use the imperative mood in the subject line.
✅ Capitalize the subject line such as "Fix Issue 886" and don't end it with a period.
✅ The subject line should summarize the main change concisely.
🤫 Only include a body if absolutely necessary for complex changes.
🤫 If a body is needed, separate it from the subject with a blank line.
🤫 Wrap the body at 72 characters.
🤫 In the body, explain the why, not the what (the diff shows the what).
🤫 Use bullet points in the body only for truly distinct changes.
🤫 Be extremely concise. Assume the reader can understand the diff.
🤫 Never repeat information between the subject and body.
🤫 Do not repeat commit messages from previous commits.
✅ Prioritize clarity and brevity over completeness.
✅ Adhere to the repository's commit style if it exists.

suggestion: fix: optimize GetWorkspacesEligibleForTransition accuracy

The biggest concern is still response stability, but I'm afraid this is something we have to deal with.

Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement 👍🏻. I second your worry about stable output, and I also worry about the amount of rules we're applying here. I think the only way this can work to a satisfactory degree is to either include commit context/history when linting or to explicitly mark which rules apply to linting and which to auto commits. Wdyt?

prompt.go Outdated
Content: strings.Join([]string{
"You are `aicommit`, a tool designed to lint commit messages and generate a detailed linting report.",
"You are operating in pull request (PR) title linting mode.",
"In this mode, linting rules for commit subjects, bodies, or bullet points are not applicable.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it makes too many assumptions about the actual style-guide. It's based on this repository style guide but others may have completely different rules. Maybe we need a different way to signify what rules can be used for title linting?

Also, this rule is still applicable but disabled by this:

If a body is needed, separate it from the subject with a blank line.

If people need to tweak rules based on mode, we could allow a syntax like:

* Every commit title must contain an emoji [commit, lint]
* Title must reference changes [commit]
* EVERYTHING MUST BE IN CAPS (this rule applies to all modes)
* everything must be in lower case (this rule also applies to all modes?) [all]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either include commit context/history when linting or to explicitly mark which rules apply to linting and which to auto commits.

Speaking of the first option (commit history), how will it affect the lining process? Should the tool use these commits to inform the linter that these messages are actually compliant with repo style guide?

Maybe we need a different way to signify what rules can be used for title linting?

I can try implementing this approach as well. Honestly, I can't predict what will be the outcome, it is a magical black box :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we're including Git history in the prompt now. It is hard to say whether it improved the overall linting process. Suggestions are usually fine, but linting is not super stable.

Can we prepare COMMITS.md for coder/websocket and validate aicommits against it? We will need the style guide anyway. I can arrange this but I'm curious what you think, @mafredri.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks for trying that out. The stability seems like the biggest issue, and it means the linter can't be used for enforcement, unfortunately.

I tried copying in the conventional commits specification into coder/coder COMMITS.md and linting the sample title, but it keeps modifying the rules between invocations and also enabling/disabling rules arbitrarily as non-applicable. :/

It's even failing the first rule occasionally even though it starts with fix: ... 😔.

Here's also an example of a rule it decided to simplify on a whim (it did this for multiple rules):

✅ Rule 4: A scope MAY be provided after a type. A scope MUST consist of a noun describing a section of the codebase surrounded by parenthesis, e.g., fix(parser):

vs

✅ Rule 4: A scope MAY be provided after a type.

This is an interesting experiment, but I wonder if it would be most sensible to lint coder/websocket PR titles with a more traditional approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting experiment, but I wonder if it would be most sensible to lint coder/websocket PR titles with a more traditional approach.

I agree, it eliminates aicommit from linting competition, at least today. Maybe there is a secret prompt than can ensure response stability, but so far I haven't discovered it yet. I think I'm going to pause the experiment now.

prompt.go Outdated Show resolved Hide resolved
prompt.go Outdated Show resolved Hide resolved
prompt.go Outdated Show resolved Hide resolved
@mtojek mtojek requested a review from mafredri December 18, 2024 10:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature: lint PR titles
2 participants