CDX files generated are not sorted #2

Open
thomaspreece opened this issue Oct 20, 2017 · 3 comments

@thomaspreece

thomaspreece commented Oct 20, 2017

Similar to the wayback indexer, this indexer doesn't produce a sorted CDX file, so when you try to use its output with pywb, lookups fail to find links correctly. Was there a particular design decision behind leaving the output unsorted?

I should add that I am only looking at using CDX files. This is because I want to test out pywb and openwayback, and as far as I can tell from the docs and code, openwayback 2.3.2 doesn't support CDXJ. I found some mention of CDXJ in reference to openwayback 3.0.0, but as that is a stale branch on GitHub I assume it has been abandoned.
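To illustrate why an unsorted index can cause missed lookups: a replay tool that locates captures with a binary search over the index (as pywb is generally understood to do) can only find an entry if the lines are in sorted order. The following is a minimal sketch, not pywb's actual code; the keys are hypothetical, simplified stand-ins for real CDX lines, and `lookup` is an illustrative helper name.

```python
import bisect

# Hypothetical, simplified index keys in the order an unsorted indexer
# might emit them. Real CDX lines carry more fields than this.
unsorted_keys = ["org,example)/b", "com,example)/a", "org,example)/a"]


def lookup(keys, target):
    """Binary-search for target. The result is only reliable if keys are sorted."""
    i = bisect.bisect_left(keys, target)
    return i < len(keys) and keys[i] == target


# The entry exists in both lists, but the search only finds it in the sorted one.
found_unsorted = lookup(unsorted_keys, "com,example)/a")
found_sorted = lookup(sorted(unsorted_keys), "com,example)/a")
print(found_unsorted, found_sorted)
```

The entry is present in both cases; only the ordering differs, which matches the symptom of links not being found.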

@ikreymer
Member

This package is still in development, and I just haven't had a chance to add sorting yet.

Perhaps sorting should be the default, or be enabled via a -s flag (consistent with the cdx-indexer in pywb).

Of course, you can also just pipe the index through a command-line sort tool: `cdxj-indexer | sort > file.cdx`
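If a command-line sort is not convenient, the same post-processing step can be sketched in Python. This is a minimal sketch under assumptions: it sorts lines by raw byte value (comparable to running `sort` with `LC_ALL=C`, which is what byte-wise binary search expects), the function name `sort_cdx_lines` is hypothetical, and the sample lines are simplified placeholders rather than real indexer output.

```python
def sort_cdx_lines(raw: bytes) -> bytes:
    """Return the index with its lines sorted by raw byte value."""
    # Drop empty lines so a trailing newline does not produce a blank entry.
    lines = [line for line in raw.split(b"\n") if line]
    return b"".join(line + b"\n" for line in sorted(lines))


# Hypothetical, simplified sample: one header line and two entries, unsorted.
sample = (
    b"org,example)/ 20171020120000 http://example.org/\n"
    b" CDX N b a\n"
    b"com,example)/ 20171020110000 http://example.com/\n"
)

sorted_index = sort_cdx_lines(sample)
print(sorted_index.decode("ascii"), end="")
```

Because the CDX header line begins with a space, a byte-order sort keeps it at the top of the file without special handling.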

Or you could use the cdx-indexer in pywb, which defaults to regular CDX. Basically, this package is an effort to split that functionality into its own package in a cleaner way, but I haven't had a chance to make as much progress on it.

@thomaspreece
Author

Ah, I didn't realise the pywb cdx-indexer would run separately from pywb. From some testing, I'm finding that in all cases your cdx(j) indexers are significantly faster than the openwayback versions, good job! :)

@ikreymer
Member

Several years later, coming back to this project... this will be fixed in the 1.1.0 release, finally :)
