Multi Processing Indexer #91

Open
felixhummel opened this issue Jun 3, 2013 · 3 comments

Comments

@felixhummel
Collaborator

Hi,

I hacked around a bit to make shiva-indexer run on multiple CPUs. "Rough around the edges" would be a compliment, because it breaks lastfm and took some heavy refactoring, but I am happy with the results.

I'm running the following command, with my DB set to /dev/shm/shiva.db to keep disk I/O out of the timings:

python setup.py develop && rm -f /dev/shm/shiva.db && shiva-indexer > /dev/shm/indexer.log && tail /dev/shm/indexer.log

On master:

Run in 23 seconds. Avg 0.006s/track.
Found 4088 tracks. Skipped: 0. Indexed: 4088.
flac: 14 tracks
mp3: 32 tracks
ogg: 4042 tracks

On my multi-processing branch:

Run in 7 seconds. Avg 0.002s/track.
Found 4088 tracks. Skipped: 0. Indexed: 4088.

Yes, I also removed the counters for now.

Problem is that instance methods do not work with Pool.map without heavy workarounds.
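
For context, Pool.map has to pickle the callable it sends to worker processes, and on Python 2 bound methods cannot be pickled by default. One common workaround, sketched below, is a module-level function that builds the object inside the worker. This is not the code from the branch: `Indexer`, `index_track`, `index_one` and `index_all` are hypothetical names, and the per-track work is a placeholder that only returns the file extension.

```python
import os
from multiprocessing import Pool


class Indexer(object):
    """Hypothetical stand-in for the real indexer class."""

    def index_track(self, path):
        # Placeholder for the real per-file work (reading tags, DB writes).
        # Here it only returns the lowercased file extension.
        return os.path.splitext(path)[1].lstrip(".").lower()


def index_one(path):
    # Module-level function: it is pickled by name, so Pool.map can hand it
    # to worker processes. Each call builds its own Indexer, so no bound
    # method ever has to cross the process boundary.
    return Indexer().index_track(path)


def index_all(paths, processes=2):
    # Map the flat list of paths over a pool of worker processes.
    pool = Pool(processes=processes)
    try:
        return pool.map(index_one, paths)
    finally:
        pool.close()
        pool.join()
```

On platforms that spawn rather than fork worker processes, the call to `index_all` belongs under an `if __name__ == "__main__":` guard.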

Question: Should I run further in this direction? I think another evening and we are back on track. I also began writing some tests for MediaDirs, because I needed a flat file list for map to work.
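
A flat file list of the kind mentioned above could be built roughly like this. It is only a sketch under assumptions: `flat_file_list` is a hypothetical helper, not the MediaDirs API, and the extension filter is taken from the three formats in the log output.

```python
import os


def flat_file_list(media_dirs, extensions=(".flac", ".mp3", ".ogg")):
    """Walk every directory in media_dirs and return one flat list of paths.

    Hypothetical helper; the real MediaDirs class may work differently.
    """
    paths = []
    for root_dir in media_dirs:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in filenames:
                # str.endswith accepts a tuple of suffixes.
                if name.lower().endswith(extensions):
                    paths.append(os.path.join(dirpath, name))
    return paths
```

The result is a plain list of strings, which is what map needs: each element can be pickled and handed to a worker on its own.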

Cheers,

Felix

@tooxie
Owner

tooxie commented Jun 4, 2013

Man, I'm impressed 😮 Great work!

I've been checking out your code. It looks like the indexer has to be rewritten, and we should be careful with that. To begin with, I've set up Travis to test with Python 2.6 and 2.7. Looks good; once that's merged we'll have Travis testing every PR.

Of course, for that to work we need tests first 😅 I'll add some unit tests, but it's still not clear to me how to test the indexer. I don't like the idea of including test music files in the project.

Anyway, going back to your multi-processing branch, have you thought of using a non-blocking network I/O framework, like Tornado? It has neat async features that may simplify things quite a bit.

Cool initiative! 👍

@felixhummel
Collaborator Author

Thanks! I'm looking forward to having Travis.

About testing: See #93.

I do not see the point of having non-blocking I/O. Four cores --> four long-running processes via multiprocessing from the stdlib. I find that simple enough for the indexing process.
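
As a rough illustration of the "one long-running process per core" idea, the sketch below sizes the pool with `multiprocessing.cpu_count()` and gives each process one contiguous chunk of the flat file list. The names `split_evenly`, `index_chunk` and `index_in_parallel` are made up for this example, and `index_chunk` only counts paths where the real indexer would do its work.

```python
import multiprocessing


def split_evenly(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def index_chunk(paths):
    # Placeholder for the real work: one worker handles a whole chunk
    # and reports how many tracks it processed.
    return len(paths)


def index_in_parallel(paths, processes=None):
    # Default to one worker process per CPU core.
    processes = processes or multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=processes)
    try:
        return sum(pool.map(index_chunk, split_evenly(paths, processes)))
    finally:
        pool.close()
        pool.join()
```

With the 4088 tracks from the timings above and four processes, each chunk would hold 1022 paths.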

Another story would be incremental indexing using inotify or the like.

@felixhummel
Collaborator Author

Please have a look at https://github.com/felixhummel/shiva-server/blob/multi-processing/thoughts_about_the_indexer.rst and let me know what you think.
