-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue #344: add multi-thread support to CheckstyleReportsParser #521
Conversation
876c2b1
to
5c67b71
Compare
Running diff on openjdk14 with modules <module name="Indentation"/>
<module name="MagicNumber"/>
<module name="MissingCtor"/>
<module name="RequireThis"/>
<module name="ReturnCount">
<property name="max" value="1"/>
<property name="maxForVoid" value="0"/>
</module>
<module name="SimplifyBooleanExpression"/>
<module name="SimplifyBooleanReturn"/>
<module name="StringLiteralEquality"/>
<module name="VariableDeclarationUsageDistance"/>
<module name="AvoidStarImport"/>
<module name="AvoidStaticImport"/>
<module name="MissingJavadocMethod">
<property name="allowMissingPropertyJavadoc" value="true"/>
</module>
<module name="MissingJavadocPackage"/>
<module name="MissingJavadocType"/>
<module name="JavaNCSS"/>
<module name="NPathComplexity"/> produces checkstyle-result.xml with size 140M Diff generation timings (best result from 10 runs)Master:
Patch:
|
Heh, from 13 second to 4 seconds, in worse case. |
@@ -76,6 +77,7 @@ public Statistics getStatistics() { | |||
public void addRecords(List<CheckstyleRecord> newRecords, | |||
String filename) { | |||
if (!newRecords.isEmpty()) { | |||
newRecords.sort(new PositionOrderComparator()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we are using PositionOrderComparator
for sorting lines within a file?
This comparator uses only line
and column
, as a result there may be false positives for
line 100, col 100, messageA
line 100, col 100, messageB
versus
line 100, col 100, messageB
line 100, col 100, messageA
I'm going to use same specificCompare
method here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did not review this code thoroughly, it is just a code that works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Well, now i wonder why CheckstyleRecord
does not implement Comparable
but instead exposes a method used in sorting? This should be reviewed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just fix it if you think it weird
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
5c67b71
to
1b3ea10
Compare
1b3ea10
to
9889f52
Compare
9889f52
to
5c70a4b
Compare
It really depends on the differences. When I created the issue, I had a 1 hour run and MT reduced it to 20 minutes. |
* Severity level for a {@link CheckstyleRecord}. | ||
* Used to compare two records in predefined order: "info", "warning", "error", all other. | ||
*/ | ||
public enum SeverityLevel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Diff report tool has no dependency on Checkstyle itself. This enum is to provide an order other then alphabetical to record severity. May be this enum is redundant and confusing. I'm going to review it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. @rnveach please review
patch-diff-report-tool/src/main/java/com/github/checkstyle/data/DiffReport.java
Outdated
Show resolved
Hide resolved
5c70a4b
to
611ef49
Compare
statistics.incrementUniqueMessageCount(rec.getIndex()); | ||
} | ||
} | ||
records.entrySet().parallelStream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parallelizing this loop gives 5% more speed for large diffs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upd: not in this PR. The class Statistics
is not thread-safe.
fd8f6ba
to
b65b85a
Compare
b65b85a
to
f48cea1
Compare
The diff report is available at https://pbludov.github.io/ The first report is built with patch-diff-report-tool from They are all equal. |
To be updated as jobs run: Files used: http://rveach.no-ip.org/checkstyle/regression/reports/294/ (Warning XML is 169 megs) (Patch and master XMLs are 100% identical)
master run:
this PR's run:
my PR's run with 6 processors:
|
@pbludov I do like your changes more. The thing is your changes aren't multi-threading. It is just pure optimization and so it all still runs single threaded. The quick run is just from 2 exact same files, I would like to see what is the time when everything is different. I assume it would then result in O(N^2) time since we have to go through each list twice, plus adding a sorting step which may result in longer times the more random the data gets. I am fine if we want to keep this but still look into some way to multi-thread this. |
Issue #344
@rnveach @romani @strkkk
Here is my attempt to speed up the diff report generation
CheckstyleRecord
now implementsComparable
specificEquals
nowComparable::compareTo
as we need the order of recordsproduceDiff
now is O(N) as both lists are sortedPositionOrderComparator
removedproduceDiff
now is a generic method and extracted toDiffUtils
classCliPaths
renamed toCliOptions
TODO: