Potential for significant perf improvements in large repos #53
cc @wmhilton |
I'm in the same boat as @JacksonKearl .
I can see there is a separate effort (#28) to support native fs in the future, but that feature still seems to be in quite an early phase of adoption. I can understand that the concept of batching write operations does kind of break the fs contract. @wmhilton, I'd love to hear your opinion and suggestions. Thanks! |
I don't see how this violates the FS contract? The contract only says that once I've called writeFile, I get a promise, and when that promise is resolved I can read the file. It doesn't require anything about the backend. |
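A minimal sketch of that contract (hedged: it assumes a promise-based fs such as lightning-fs; the mount name and encoding option are illustrative, not taken from this thread):

// Sketch only: the write promise is the synchronisation point.
import FS from "@isomorphic-git/lightning-fs"

const pfs = new FS("demo").promises

async function roundTrip() {
  const write = pfs.writeFile("/hello.txt", "hi")
  // Before the promise resolves, nothing is guaranteed about /hello.txt.
  await write
  // Once it has resolved, the contract says the file is readable.
  const text = await pfs.readFile("/hello.txt", { encoding: "utf8" })
  console.log(text) // "hi"
}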
@JacksonKearl I apologise if I haven't fully understood your proposal, but my understanding was that the idea is to wrap several file write operations in a single IDB transaction, rather than having a separate transaction for each file. In a pure |
I don't know what you mean by "pure" here. Any asynchronous filesystem (which is pretty much all of them, besides solely in-memory ones) doesn't guarantee being able to read until the write promise resolves. That's kinda the whole point really. That property is preserved here. |
I was referring to reading from a file already saved in the context of the current transaction, rather than the file currently being written. By wrapping all file write operations in a single transaction, the overall time required to write all files is significantly reduced, but the time required before you can read the first file in the batch is significantly increased. It's a trade-off. Operations like |
@JacksonKearl That sounds really promising for speeding up git checkout! Have you got a branch I could look at? Off the top of my head, I don't see how you could implement |
@wmhilton no unfortunately we opted for a different route (downloading via archive endpoints) and are no longer using this project. The code sample I included in the original post (and below) has an implementation of batching with no extra cost for single operations. Basically you send off operations immediately if there are none in progress, or batch them together if there are some in progress, and send them all off when the in progress batch finishes. It's the kind of thing that you'd think the DB would do for you but I guess not yet. It's interesting to note that in my testing when firing a sequence of N single put requests the first one wont resolve until the last one is fired, so there is still some sort of batching going on, but just much less efficiently than grouping all changes into a single transaction. The Batcher class: class Batcher<T> {
  // The promise for the batch currently being flushed, if any.
  private ongoing: Promise<void> | undefined
  // Items queued for the next batch, each paired with its caller's resolve callback.
  private items: { item: T, onProcessed: () => void }[] = []

  constructor(private executor: (items: T[]) => Promise<void>) { }

  // Flush everything queued so far in one executor call, then immediately
  // start another batch if more items arrived while this one was in flight.
  private async process() {
    const toProcess = this.items;
    this.items = [];
    await this.executor(toProcess.map(({ item }) => item))
    // Resolve the promises returned to the callers of queue().
    toProcess.map(({ onProcessed }) => onProcessed())
    if (this.items.length) {
      this.ongoing = this.process()
    } else {
      this.ongoing = undefined
    }
  }

  // Returns a promise that resolves once the item's batch has been executed.
  async queue(item: T): Promise<void> {
    const result = new Promise<void>((resolve) => this.items.push({ item, onProcessed: resolve }))
    if (!this.ongoing) this.ongoing = this.process()
    return result
  }
}

(this is roughly inspired by the Throttler class used extensively internally in vscode) |
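A minimal usage sketch (the executor below is just a stand-in for whatever flushes a whole group of writes at once, e.g. one IndexedDB transaction containing many puts; it is illustrative, not from the original patch):

// Illustrative executor: performs all queued writes in one backend operation.
async function flushWrites(writes: { path: string, data: Uint8Array }[]): Promise<void> {
  // e.g. open one readwrite transaction and put every entry into it
}

const batcher = new Batcher<{ path: string, data: Uint8Array }>(flushWrites)

async function demo() {
  // Each caller awaits its own promise; calls that arrive while a flush is
  // in flight are grouped into the next flush.
  await Promise.all([
    batcher.queue({ path: "/a.txt", data: new Uint8Array([1]) }),
    batcher.queue({ path: "/b.txt", data: new Uint8Array([2]) }),
  ])
}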
Another thing to watch out for is memory footprint, especially when dealing with fast-emitting sources. I think File System Access API adoption (#28), plus support for writable streams, would bring an important leap in performance. I would personally put my efforts on that front :) |
First, thanks for your work on this project 🙂
The current implementation is fairly slow with large repos, for instance vscode, which has around 5,000 files, or typescript, which has around 50k. It takes about a minute to clone vscode with --singleBranch and --depth 1, and it doesn't manage to clone typescript in the ~15 minutes I waited.

By adding batching to the IndexedDB writes (put all writes into a single transaction rather than one transaction per file) and changing the autoinc in the cachefs to increment a counter rather than search for the highest inode (the search means writing N files is O(N^2) time), I am able to see vscode clone in ~20 seconds and typescript clone in about 2 minutes. This is approx 3x slower than native for vscode and 6x slower than native for typescript.
Batching:
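A rough sketch of the idea (one readwrite transaction containing every put, instead of one transaction per file; the "files" store name and string keys are illustrative assumptions, not the actual cachefs schema):

// Sketch: write a whole group of files inside a single IndexedDB transaction.
function batchedPut(db: IDBDatabase, writes: { path: string, data: Uint8Array }[]): Promise<void> {
  return new Promise((resolve, reject) => {
    const tx = db.transaction("files", "readwrite")
    const store = tx.objectStore("files")
    for (const { path, data } of writes) store.put(data, path)
    // All the puts commit (or fail) together when the transaction completes.
    tx.oncomplete = () => resolve()
    tx.onerror = () => reject(tx.error)
  })
}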
Counter:
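A rough sketch of the counter change (the bookkeeping is simplified and the names are illustrative, not lifted from cachefs):

// Sketch: hand out inode numbers from a monotonically increasing counter
// instead of scanning all existing entries for the current maximum,
// which is what makes writing N files O(N^2).
class InodeAllocator {
  private next: number

  constructor(highestExistingInode: number) {
    // Seeded once from existing state; every allocation after that is O(1).
    this.next = highestExistingInode + 1
  }

  allocate(): number {
    return this.next++
  }
}

Under this scheme a deleted file's inode value is never reused, which is the trade-off mentioned just below.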
Please let me know if you'd consider incorporating these changes. The batching should be safe; I'm not super sure about the autoinc, but I don't see a reason why it would cause issues (the main difference is that deleting a file would free up its inode value in the original implementation but doesn't here, which shouldn't be a problem AFAIK).