feat: add commit message linter #12

mtojek · 2024-12-06T12:33:55Z

Fixes: #11

For reviewers: if you want to play with the tool, you need to set up an OpenAI key. If you need one, please let me know.

This PR introduces a linter that checks commit messages (or PR titles) against the repository style guide.

The biggest challenge is ensuring that the results are consistent. AI often draws incorrect conclusions and tends to struggle with maintaining precise line lengths. We might need to revise our COMMITS.md guidelines: instead of saying "limit line length to 50 characters", we could try "keep commit messages concise".

In a follow-up PR, we can introduce a custom output format, e.g. a JSON struct.
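
For illustration only, here is a minimal sketch of what such a JSON report could look like. The type and field names (`LintReport`, `RuleResult`, `encodeReport`) are hypothetical and not part of this PR; the sample data is taken from the example run below.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RuleResult is a hypothetical shape for the verdict on a single rule.
type RuleResult struct {
	Rule   string `json:"rule"`
	Passed bool   `json:"passed"`
}

// LintReport is a hypothetical top-level report for one linted message.
type LintReport struct {
	Message    string       `json:"message"`
	Results    []RuleResult `json:"results"`
	Suggestion string       `json:"suggestion,omitempty"`
}

// OK reports whether every rule in the report passed.
func (r LintReport) OK() bool {
	for _, res := range r.Results {
		if !res.Passed {
			return false
		}
	}
	return true
}

// encodeReport serializes the report as compact JSON.
func encodeReport(r LintReport) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	report := LintReport{
		Message: "fix: make GetWorkspacesEligibleForTransition return even less false positives",
		Results: []RuleResult{
			{Rule: "Limit the subject line to 50 characters.", Passed: false},
			{Rule: "Use the imperative mood in the subject line.", Passed: true},
		},
		Suggestion: "fix: reduce false positives in GetWorkspacesEligibleForTransition",
	}
	out, err := encodeReport(report)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	fmt.Println("ok:", report.OK())
}
```

A structured report like this would let callers derive the exit code from `OK()` instead of parsing emoji markers.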

make build
cd ../coder
../aicommit/bin/aicommit --lint "fix: make GetWorkspacesEligibleForTransition return even less false positives"

../aicommit/bin/aicommit --lint "fix: make GetWorkspacesEligibleForTransition return even less false positives"
❌ Limit the subject line to 50 characters.
✅ Use the imperative mood in the subject line.
✅ Capitalize the subject line such as "Fix Issue 886" and don't end it with a period.
✅ The subject line should summarize the main change concisely.
✅ Only include a body if absolutely necessary for complex changes.
✅ If a body is needed, separate it from the subject with a blank line.
✅ Wrap the body at 72 characters.
✅ In the body, explain the why, not the what (the diff shows the what).
✅ Use bullet points in the body only for truly distinct changes.
✅ Be extremely concise. Assume the reader can understand the diff.
✅ Never repeat information between the subject and body.
✅ Do not repeat commit messages from previous commits.
✅ Prioritize clarity and brevity over completeness.
✅ Adhere to the repository's commit style if it exists.

suggestion: fix: reduce false positives in GetWorkspacesEligibleForTransition

Troubleshooting:

export AICOMMIT_DEBUG=true # see prompt and tokens

mafredri

Nice, it will be interesting to try this out. A few thoughts/suggestions, but otherwise looks good to me.

cmd/aicommit/lint.go

prompt.go

mafredri · 2024-12-09T09:18:51Z

prompt.go

+		Role:    openai.ChatMessageRoleSystem,
+		Content: "Here is the commit message to lint:\n" + commitMessage,
+	})
+	return resp, nil


Should we include past commit titles here as well, like in BuildPrompt, for reference of the style? Ideally a solid set of rules is probably better than what was written in the past, so this is just a thought.

mtojek · 2024-12-10

I'm not sure. Theoretically, the style guide rules should be sufficient. I skipped them to significantly reduce the number of $ tokens in the prompt.

Do you think we should experiment with previous commit messages?

cmd/aicommit/lint.go

mafredri · 2024-12-09T09:32:18Z

Looking at the example output you posted, I flagged a few things:

✅ Capitalize the subject line such as "Fix Issue 886" and don't end it with a period. - Why? This sounds incorrect, the subject wasn't capitalized
✅ The subject line should summarize the main change concisely. - This is nonsensical for a title that includes no changes
✅ Do not repeat commit messages from previous commits. - We didn't include any, fwiw
etc.

mtojek · 2024-12-10T08:47:06Z

> Looking at the example output you posted, I flagged a few things:

What you're saying is accurate, but it seems to be a matter of visualization. For instance:

> Capitalize the subject line such as "Fix Issue 886" and don't end it with a period. - Why? This sounds incorrect, the subject wasn't capitalized

The most accurate answer from Chat would be "N/A", but currently we default to ✅

> This is nonsensical for a title that includes no changes

> We didn't include any, FWIW

Ha! Do you suggest we should introduce PR_TITLES.MD? Probably not, but otherwise it will be hard to let Chat review the linting rules and keep only the ones that could apply to PR titles. Unless we tweak COMMITS.md to have a separate section for PR titles? Thoughts?

mafredri · 2024-12-10T11:57:31Z

> The most accurate answer from Chat would be "N/A", but currently we default to ✅

I agree that N/A is better, but would also like to point out that as per conventional commits:

Any casing may be used, but it’s best to be consistent.

It's not correct to say either "capitalize" or "don't capitalize" without a reference to other commits. I'm not sure if or how we should address this, but it may in fact be worth making it N/A, as the output is just confusing.

> Ha! Do you suggest we should introduce PR_TITLES.MD? Probably not, but otherwise it will be hard to let Chat review the linting rules and keep only the ones that could apply to PR titles. Unless we tweak COMMITS.md to have a separate section for PR titles? Thoughts?

Heh, no, let's not. Perhaps we can just add this to the prompt for linting:

These rules apply when generating commit titles based on changes and repository history, but right now you are operating as a commit title linter and don't have access to that information. Don't make assumptions outside of what is explicitly stated in the rules.

mtojek · 2024-12-12T11:50:35Z

@mafredri I rephrased the prompts to verify whether each linting rule is applicable. This is the result of running it against coder/coder:

 ../aicommit/bin/aicommit --lint "fix: make GetWorkspacesEligibleForTransition return even less false positives" -m gpt-4o
❌ Limit the subject line to 50 characters.
✅ Use the imperative mood in the subject line.
✅ Capitalize the subject line such as "Fix Issue 886" and don't end it with a period.
✅ The subject line should summarize the main change concisely.
🤫 Only include a body if absolutely necessary for complex changes.
🤫 If a body is needed, separate it from the subject with a blank line.
🤫 Wrap the body at 72 characters.
🤫 In the body, explain the why, not the what (the diff shows the what).
🤫 Use bullet points in the body only for truly distinct changes.
🤫 Be extremely concise. Assume the reader can understand the diff.
🤫 Never repeat information between the subject and body.
🤫 Do not repeat commit messages from previous commits.
✅ Prioritize clarity and brevity over completeness.
✅ Adhere to the repository's commit style if it exists.

suggestion: fix: optimize GetWorkspacesEligibleForTransition accuracy

The biggest concern is still response stability, but I'm afraid this is something we have to deal with.
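
One way to make the stability concern measurable is to compare the verdicts of two runs on the same title. The sketch below assumes the report format shown above (one rule per line, prefixed with ✅, ❌ or 🤫); the function names `parseReport` and `unstableRules` are hypothetical and not part of aicommit.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseReport maps each rule text to the status marker that precedes it.
// Lines without a known marker (e.g. the suggestion line) are ignored.
func parseReport(output string) map[string]string {
	verdicts := map[string]string{}
	for _, line := range strings.Split(output, "\n") {
		line = strings.TrimSpace(line)
		for _, marker := range []string{"✅", "❌", "🤫"} {
			if strings.HasPrefix(line, marker) {
				rule := strings.TrimSpace(strings.TrimPrefix(line, marker))
				verdicts[rule] = marker
				break
			}
		}
	}
	return verdicts
}

// unstableRules returns, in sorted order, the rules whose verdict differs
// between two runs or that appear in only one of them.
func unstableRules(a, b map[string]string) []string {
	var diff []string
	for rule, va := range a {
		if vb, ok := b[rule]; !ok || vb != va {
			diff = append(diff, rule)
		}
	}
	for rule := range b {
		if _, ok := a[rule]; !ok {
			diff = append(diff, rule)
		}
	}
	sort.Strings(diff)
	return diff
}

func main() {
	// Excerpts from the two example runs posted in this PR.
	run1 := "❌ Limit the subject line to 50 characters.\n✅ Wrap the body at 72 characters."
	run2 := "❌ Limit the subject line to 50 characters.\n🤫 Wrap the body at 72 characters."
	for _, rule := range unstableRules(parseReport(run1), parseReport(run2)) {
		fmt.Println("unstable:", rule)
	}
}
```

Because the model sometimes rewrites the rule text itself, a rule that is reworded between runs shows up here as two unstable entries.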

mafredri

Nice improvement 👍🏻. I second your worry about stable output, and I also worry about the number of rules we're applying here. I think the only way this can work to a satisfactory degree is to either include commit context/history when linting or to explicitly mark which rules apply to linting and which to auto commits. Wdyt?

mafredri · 2024-12-12T12:10:05Z

prompt.go

+			Content: strings.Join([]string{
+				"You are `aicommit`, a tool designed to lint commit messages and generate a detailed linting report.",
+				"You are operating in pull request (PR) title linting mode.",
+				"In this mode, linting rules for commit subjects, bodies, or bullet points are not applicable.",


This feels like it makes too many assumptions about the actual style guide. It's based on this repository's style guide, but others may have completely different rules. Maybe we need a different way to signify which rules can be used for title linting?

Also, this rule is still applicable but disabled by this:

If a body is needed, separate it from the subject with a blank line.

If people need to tweak rules based on mode, we could allow a syntax like:

* Every commit title must contain an emoji [commit, lint]
* Title must reference changes [commit]
* EVERYTHING MUST BE IN CAPS (this rule applies to all modes)
* everything must be in lower case (this rule also applies to all modes?) [all]
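
A rough sketch of how such mode tags could be parsed, assuming a trailing `[...]` on a rule line lists its modes and that no tag (or `[all]`) means the rule applies everywhere. The `Rule` type and `parseRule` function are hypothetical; nothing like this exists in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a style-guide rule plus the modes it applies to.
// An empty Modes slice means the rule applies to all modes.
type Rule struct {
	Text  string
	Modes []string
}

// parseRule strips the leading bullet and an optional trailing
// "[mode, mode]" tag from a single style-guide line.
func parseRule(line string) Rule {
	text := strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(line), "*"))
	rule := Rule{Text: text}
	if !strings.HasSuffix(text, "]") {
		return rule // no tag: applies to all modes
	}
	open := strings.LastIndex(text, "[")
	if open < 0 {
		return rule
	}
	rule.Text = strings.TrimSpace(text[:open])
	for _, tag := range strings.Split(text[open+1:len(text)-1], ",") {
		tag = strings.ToLower(strings.TrimSpace(tag))
		if tag == "all" {
			rule.Modes = nil // explicit "all" overrides any other tag
			return rule
		}
		if tag != "" {
			rule.Modes = append(rule.Modes, tag)
		}
	}
	return rule
}

// AppliesTo reports whether the rule is active in the given mode.
func (r Rule) AppliesTo(mode string) bool {
	if len(r.Modes) == 0 {
		return true
	}
	for _, m := range r.Modes {
		if m == mode {
			return true
		}
	}
	return false
}

func main() {
	lines := []string{
		"* Every commit title must contain an emoji [commit, lint]",
		"* Title must reference changes [commit]",
		"* EVERYTHING MUST BE IN CAPS (this rule applies to all modes)",
		"* everything must be in lower case (this rule also applies to all modes?) [all]",
	}
	for _, line := range lines {
		rule := parseRule(line)
		fmt.Printf("lint=%v  %s\n", rule.AppliesTo("lint"), rule.Text)
	}
}
```

One limitation of this sketch: a rule whose own text ends in a bracketed phrase would be misread as carrying a tag, so a real syntax would need an escape or a stricter marker.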

mtojek · 2024-12-12

> either include commit context/history when linting or to explicitly mark which rules apply to linting and which to auto commits.

Speaking of the first option (commit history), how will it affect the linting process? Should the tool use these commits to inform the linter that these messages are actually compliant with the repo style guide?

> Maybe we need a different way to signify what rules can be used for title linting?

I can try implementing this approach as well. Honestly, I can't predict what the outcome will be; it is a magical black box :)

mtojek · 2024-12-18

OK, we're including Git history in the prompt now. It is hard to say whether it improved the overall linting process. Suggestions are usually fine, but linting is not very stable.

Can we prepare COMMITS.md for coder/websocket and validate aicommit against it? We will need the style guide anyway. I can arrange this, but I'm curious what you think, @mafredri.

mafredri · 2024-12-19

Cool, thanks for trying that out. The stability seems like the biggest issue, and it means the linter can't be used for enforcement, unfortunately.

I tried copying the Conventional Commits specification into the coder/coder COMMITS.md and linting the sample title, but it keeps modifying the rules between invocations and also arbitrarily marking rules as non-applicable. :/

It's even failing the first rule occasionally even though it starts with fix: ... 😔.

Here's also an example of a rule it decided to simplify on a whim (it did this for multiple rules):

✅ Rule 4: A scope MAY be provided after a type. A scope MUST consist of a noun describing a section of the codebase surrounded by parenthesis, e.g., fix(parser):

vs

✅ Rule 4: A scope MAY be provided after a type.

This is an interesting experiment, but I wonder if it would be most sensible to lint coder/websocket PR titles with a more traditional approach.
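
For comparison, a more traditional check could be as small as the sketch below: a regular expression for the `type(scope): description` shape plus a deterministic length limit. The list of allowed types and the 50-character limit are assumptions for illustration (Conventional Commits itself only defines `feat` and `fix`), and `lintTitle` is a hypothetical function.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

// titlePattern matches "type(scope)!: description". The set of types is
// an assumed convention, not something mandated by the specification.
var titlePattern = regexp.MustCompile(
	`^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([a-z0-9/_-]+\))?!?: \S.*$`)

// maxSubjectLen mirrors the 50-character rule from the style guide.
const maxSubjectLen = 50

// lintTitle returns a list of problems; an empty list means the title passes.
func lintTitle(title string) []string {
	var problems []string
	if !titlePattern.MatchString(title) {
		problems = append(problems, `title must look like "type(scope): description"`)
	}
	// Count runes, not bytes, so non-ASCII titles are measured correctly.
	if n := utf8.RuneCountInString(title); n > maxSubjectLen {
		problems = append(problems,
			fmt.Sprintf("title is %d characters long, limit is %d", n, maxSubjectLen))
	}
	if strings.HasSuffix(title, ".") {
		problems = append(problems, "title must not end with a period")
	}
	return problems
}

func main() {
	title := "fix: make GetWorkspacesEligibleForTransition return even less false positives"
	for _, p := range lintTitle(title) {
		fmt.Println("❌", p)
	}
}
```

Unlike the model-based linter, this gives the same verdict on every invocation, at the cost of only covering rules that can be checked mechanically.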

mtojek · 2024-12-19

> This is an interesting experiment, but I wonder if it would be most sensible to lint coder/websocket PR titles with a more traditional approach.

I agree, it eliminates aicommit from the linting competition, at least for today. Maybe there is a secret prompt that can ensure response stability, but so far I haven't discovered it. I think I'm going to pause the experiment now.


mtojek added 3 commits December 6, 2024 12:49

feat: lint commit message

63e7eb5

unicode chars

0290f3d

exit code

2bf3e0f

mtojek self-assigned this Dec 6, 2024

mtojek requested review from ammario and mafredri December 6, 2024 12:37

mafredri reviewed Dec 9, 2024


minor fixes

f6d6294

Rephrase prompts

b414a9b

mafredri reviewed Dec 12, 2024


mtojek added 3 commits December 12, 2024 14:44

fix: rule emoji

a79afdc

Rephrase prompt

6e1ff48

Use Git history

ed93239

mtojek requested a review from mafredri December 18, 2024 10:27

mafredri removed their request for review January 20, 2025 09:33
