-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Poor performance when checking if track already exists #8
Comments
So I made quick test, open file, rename it and try to
|
That takes it down from 13 to 1.5 minutes. I would still like to try my idea. But this little change can be implemented nonetheless and is a quick win. It would even improve my first implementation, where I would fall back to the old method in many scenarios. |
Originally new files data would overwrite old ones, but that got lost when I changed things, since files rarely change ignoring existing files seems like a good idea they definitely shouldn't be parsed again. On path mapping, I didn't even know that relative file paths were exposed and I am certainly not opposed on seeing it implemented as a prototype. Also comparing logic should probably move to the worker thread, so it doesn't block main thread as much. And only new tracks are sent to the main thread. But that's for the future. |
For that, I would store File.lastModified in the track data and compare that on second import.
It doesn't, but you can write something that concatenates the names of the folders and files while iterating over them. interface SnaeFile {
rootDirectoryHandle: FileSystemDirectoryHandle;
directoryHandle: FileSystemDirectoryHandle;
fileHandle: FileSystemFileHandle;
}
const getAllFileHandlesFromDirectory = async (
directoryHandle: FileSystemDirectoryHandle,
path = directoryHandle.name,
rootDirectoryHandle = directoryHandle,
) => {
const fileSystemHandles: Record<string, SnaeFile> = {};
for await (const fileSystemHandle of directoryHandle.values()) {
if (fileSystemHandle.kind === 'directory') {
Object.assign(
fileSystemHandles,
await getAllFileHandlesFromDirectory(
fileSystemHandle,
`${path}/${fileSystemHandle.name}`,
rootDirectoryHandle,
),
);
} else {
fileSystemHandles[`${path}/${fileSystemHandle.name}`] = {
rootDirectoryHandle,
directoryHandle,
fileHandle: fileSystemHandle,
};
}
}
return fileSystemHandles;
};
Yeah, even saving things to the indexed db could be done in a worker. Maybe an extra SharedWorker just for accessing the database would be a good idea. |
To access file from In MDN there is example of directoryHandle.resolve(handle) I am not quite sure what it does but seems to return relative paths of given file handle. That probably is still slow, and your approach is likely faster.
We can't use Currently all data is managed inside SolidJS createStore, IndexedDB is only called when data inside that store changes. Basically a redux. Doing the reverse is tricky, indexedDB doesn't provide a way to subscribe to changes, so you still need notify main thread manually, with new data. |
@lennyanders could you check by how much #9 improves things? Moving diffing to web worker and using relative paths for comparison are still thing I would want to do. |
I left a comment in the PR. I also have a test now for my method, and it already works really well: chrome-fast-fs-diffing One thing that would be good for this to work properly, is some kind of list with the selected directories in the settings/library page. Just a simple list with the selected directories, with an x icon or something to remove each directory. This would bring the Player also more close to other "traditional" music players. |
I really like the approach of saving
I had that at the beginning, when moving from Electron to web app. I thought it was bit confusing since you could only can show directory names, since paths are unavailable, so what if you add folders with same name:
How do you know which is which? Maybe showing track count in each folder could help. I wish web supported open this folder api, but besides that I am in favor on adding this. |
When you look into my repo, you can see that I already use directoryHandle.resolve() (thanks for finding that cool function btw). i only need to check the case where a new directory is a parent of multiple directories. I have it working for one.
Brilliant how you had the same thoughts as me, just way earlier than me.
This is also the only thing I had in mind and I think it's better than not having a list of directories and not having the ability to re-scan the selected directories. |
Opps didn't mean to close this. While performance is good enough for now it still can be improved. Sorry I only briefly looked at your repo before. The whole approach looks good. The cover art thing would be a separate discussion at the later time, but yeah your after these changes it would be hard to scan for artwork files.. I wonder even if app itself should store artwork as hidden files on the filesystem instead of in indexedDB. |
I updated my repo now, so if you select multiple directories on one layer and then select a parent directory of these, the old filenames get updated and than everything gets scanned again and I can check which files are new and also which are removed. |
For the most part everything looks good and I think it's worth doing it. Storing relative paths as ids is great idea. Right now I have pending moderate code and ui design changes which impact a lot places. So maybe next week, after that is done, we can start looking into how to make it work here. |
Hey, I finally looked a bit more into this, and implemented checking for changed files too. It was pretty easy too, I just save the Overall I think my current logic for getting and updating track data is pretty good (I think I can do some smaller optimizations and code reductions but nothing major) and you can look it up here: https://github.com/lennyanders/euphonium/blob/4a6adc1293ccbc5f5e069c52956b8ee1d4e49141/src/worker/library.ts#L78 I guess that would be everything from me. I think it would be really cool if you implement it, but yeah, I build my own thing now, so for me personally, it's not that important anymore. |
Currently, when adding new tracks to the library, each tracks file is compared to each of the already existing tracks files. isSameEntry is used for this and is the only by the browser provided method I know to do this. Sadly, this is really slow. When I compare ~3500 files to the same ~3500 files, it takes around 13 minutes.
On top of that, the added files get first parsed and only after that get compared to the existing tracks. I would say that, at the time of importing (selecting a directory) the files should be checked, so that after that only the new files get parsed.
For improving the main problem, I have an idea, but I don't know if it will work yet. I'm open to other ideas.
My plan would be, to save the path of each file, relative to the selected directory, and map that to their corresponding FileSystemHandle (or a custom object containing that somewhere). Then all the path mappings should be saved in the indexed db. Also, the directly imported directories need to be saved in the db and should have some kind of relation to the file path mappings.
After that, when importing the same folder again, we could check if that newly imported folder exists in our database and could the compare the newly generated file path mappings to the one belonging to the existing directory in the db.
When selecting a directory that is a parent of a directory in the db, we could check the difference in file path mappings from the point we get to the directory that already is in the db. This would mean that we needed to check every sub-directory when importing against our saved import directories. This should still be way more performant than the current implementation, but is not ideal.
I also don't have a good idea how to handle an import of a sub-directory of an import directory that is already in the db. We could probably store all scanned directories in the db, and compare every directory to every directory, but I think performance would get with a relatively big amount of directories pretty unperformant too.
The first step would be to implement just the path mapping from the imported folder and use my hopefully performant concept when the exact same directory is used again, and fall back to the current method if not.
I would be interested in implementing, or helping with the implementation of, this. I will try it nonetheless and will probably share a CodePen or something here, if my plan works out.
I hope everything I wrote here makes sense.
The text was updated successfully, but these errors were encountered: