Optimize start position of DOMPage findRecord()
At least in our use case, we tend to scan the page either for the same record as last time or for one that sits later in the page. So instead of beginning each scan from 0, record the position of the last successful match and start from there. If that scan fails, wrap around and scan the region before the start position.

That seems to save us a significant amount of time:
DOMFile$DOMPage$findRecord was 25% of CPU in our “before” workload; it is 3% after.
DOMFile.findRecord was 41% of CPU, and is now 23% of CPU.
Query timings have gone from about 4.9s to about 3.6s.
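
The scan strategy described above can be sketched in isolation. This is a minimal illustration, not the eXist-db code: the class `HintedScanner`, its flat `short[]` of ids, and the `comparisons` counter are all hypothetical stand-ins for the page data and are only there to make the effect of the remembered start position visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a page: a flat array of record ids. */
final class HintedScanner {
    private final short[] ids;
    // Index at which the previous successful lookup matched (starts at 0).
    private final AtomicInteger lastFound = new AtomicInteger(0);
    // Counts id comparisons so the saving from the hint can be observed.
    int comparisons = 0;

    HintedScanner(final short[] ids) {
        this.ids = ids;
    }

    /** Returns the index of targetId, or -1 if it is not present. */
    int find(final short targetId) {
        final int start = lastFound.get();
        // First pass: from the last hit to the end of the data.
        int idx = findInRange(targetId, start, ids.length);
        if (idx < 0) {
            // Second pass: wrap around and cover the region skipped above.
            idx = findInRange(targetId, 0, start);
        }
        if (idx >= 0) {
            // Remember the hit so the next lookup starts here.
            lastFound.set(idx);
        }
        return idx;
    }

    private int findInRange(final short targetId, final int from, final int to) {
        for (int pos = from; pos < to; pos++) {
            comparisons++;
            if (ids[pos] == targetId) {
                return pos;
            }
        }
        return -1;
    }
}
```

In this sketch, repeating a lookup for the same id costs one comparison, and a lookup for an id just after the previous hit costs only a few. A lookup for an id before the previous hit pays for the tail of the array first and then the wrapped pass, so the approach only helps when access is mostly repeated or forward-moving, as the commit message says is the case here.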
alanpaxton authored and adamretter committed Dec 1, 2024
1 parent cd3bc3a commit 26fe77f
Showing 1 changed file with 28 additions and 2 deletions.
30 changes: 28 additions & 2 deletions exist-core/src/main/java/org/exist/storage/dom/DOMFile.java
@@ -29,6 +29,7 @@
import java.text.NumberFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
@@ -3048,6 +3049,8 @@ protected final class DOMPage implements Cacheable {
// set to true if the page has been removed from the cache
boolean invalidated = false;

AtomicInteger lastFound = new AtomicInteger(0);

DOMPage() {
this.page = createNewPage();
this.pageHeader = (DOMFilePageHeader) page.getPageHeader();
@@ -3093,10 +3096,33 @@ Page createNewPage() {
}
}

/**
* Optimize scanning for records in a page
* Based on the fact that we are looking for the same thing again,
* Or we are looking for something after the last thing.
*
* So, start the scan where we left off before.
*
* @param targetId the tuple id we are looking for in the page
*
* @return a record describing the tuple, if we found it, otherwise null
*/
RecordPos findRecord(final short targetId) {
final int dlen = pageHeader.getDataLength();

int startScan = lastFound.get();
RecordPos rec = findRecordInRange(targetId, startScan, pageHeader.getDataLength());
if (rec == null) {
rec = findRecordInRange(targetId, 0, startScan);
}
if (rec != null) {
// start from here again next time; step back over the tuple id
lastFound.set(rec.offset - LENGTH_TID);
}
return rec;
}
RecordPos findRecordInRange(final short targetId, final int from, final int to) {
RecordPos rec = null;
- for (int pos = 0; pos < dlen;) {
+ for (int pos = from; pos < to;) {
final short tupleID = ByteConversion.byteToShort(data, pos);
pos += LENGTH_TID;
if (ItemId.matches(tupleID, targetId)) {