Performance is non-linear, thus very bad as input gets large #18

Open
james-antill opened this issue Jun 20, 2020 · 5 comments

@james-antill

james-antill commented Jun 20, 2020

% dd if=/dev/random of=abcd10 bs=1 count=10000
10000+0 records in
10000+0 records out
10000 bytes transferred in 0.051649 secs (193615 bytes/sec)
% dd if=/dev/random of=abcd100 bs=1 count=100000
100000+0 records in
100000+0 records out
100000 bytes transferred in 0.423879 secs (235916 bytes/sec)
% dd if=/dev/random of=abcd1000 bs=1 count=1000000
1000000+0 records in
1000000+0 records out
1000000 bytes transferred in 4.197053 secs (238262 bytes/sec)
% time ./base58 -i abcd10 | wc -c
   13658
./base58 -i abcd10  0.12s user 0.01s system 93% cpu 0.132 total
% time ./base58 -i abcd100 | wc -c
  136567
./base58 -i abcd100  11.44s user 0.17s system 95% cpu 12.184 total
% time ./base58 -i abcd1000 | wc -c
 1365659
./base58 -i abcd1000  1298.50s user 20.66s system 93% cpu 23:33.18 total
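
A minimal Go reproduction sketch of the same measurement, without dd or shell timing. It assumes the package under test is this library's Encode (imported here as github.com/mr-tron/base58); the input sizes mirror the files above.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"

	"github.com/mr-tron/base58"
)

func main() {
	for _, n := range []int{10000, 100000, 1000000} {
		buf := make([]byte, n)
		if _, err := rand.Read(buf); err != nil { // random input, like the dd runs above
			panic(err)
		}

		start := time.Now()
		out := base58.Encode(buf)
		fmt.Printf("n=%8d  encoded=%8d chars  took %v\n", n, len(out), time.Since(start))
	}
}
```

If encoding were linear, each step would take roughly 10x the previous one; the timings above grow by roughly 100x per step instead.
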
@james-antill
Author

Note that this isn't just you; "github.com/btcsuite/btcutil/base58" gives:

% time ./base58-2 -i abcd10 | wc -c
   13658
./base58-2 -i abcd10  0.27s user 0.01s system 96% cpu 0.288 total
% time ./base58-2 -i abcd100 | wc -c
  136567
./base58-2 -i abcd100  26.02s user 0.37s system 95% cpu 27.780 total

It's just very unintuitive, especially given that the performance of base64 is roughly the same as base16.

@ribasushi
Contributor

It's just very unintuitive, especially given that the performance of base64 is roughly the same as base16.

Base64/32/16/8 use a radically different algorithm: they map every fixed-size chunk of input to a corresponding set of output characters, thus "streaming" the result one chunk at a time.
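
For contrast, a small standard-library illustration of that chunked, streaming behaviour: Go's base64 encoder consumes fixed 3-byte groups and writes 4 characters for each, so it can stream through an io.Writer in constant memory.

```go
package main

import (
	"encoding/base64"
	"os"
	"strings"
)

func main() {
	enc := base64.NewEncoder(base64.StdEncoding, os.Stdout)
	// Every 3 input bytes map to 4 output characters, independent of the
	// rest of the input, so encoding time grows linearly with input size.
	enc.Write([]byte(strings.Repeat("chunked input ", 4)))
	enc.Close() // flush the final partial chunk (adds '=' padding if needed)
}
```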

Base58, base36, and any other big-integer-based encoding (the kind that preserves leading zeroes) read the entire input, treat it as a single very large integer, convert that integer's representation to the new base, and then write it out.

While there are a number of ways to speed this conversion up, it is bound to remain non-linear: the larger the input, the more work it takes to build the internal integer representation, and the longer it takes to convert that integer to the new base.
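
A hedged sketch of that whole-input integer approach (not this library's actual code): the input becomes one math/big integer and base-58 digits are peeled off by repeated division by 58. Each DivMod walks the entire remaining number, roughly one division per output digit, which is why the total work grows faster than linearly with input size.

```go
package main

import (
	"fmt"
	"math/big"
)

// Standard Bitcoin-style base58 alphabet (no 0, O, I, l).
const alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

func encodeViaBigInt(input []byte) string {
	num := new(big.Int).SetBytes(input) // whole input as a single integer
	radix := big.NewInt(58)
	mod := new(big.Int)

	var out []byte
	for num.Sign() > 0 {
		num.DivMod(num, radix, mod) // touches every remaining word of num
		out = append(out, alphabet[mod.Int64()])
	}
	// Leading zero bytes are conventionally kept as leading '1' characters.
	for _, b := range input {
		if b != 0 {
			break
		}
		out = append(out, '1')
	}
	// Digits come out least-significant first; reverse into reading order.
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

func main() {
	fmt.Println(encodeViaBigInt([]byte("hello world")))
}
```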

@james-antill
Author

Yeah, it's probably worth pointing that out somewhere obvious. I ran across it when I heard about multihash/IPFS, and the information about it mostly suggested it as a more human-friendly replacement for base64 ... Wikipedia does mention that it converts to large numbers, but I assumed that was because the other implementations used bignum, and that it could instead be implemented using groups of 29 bytes.

@ribasushi
Contributor

Nope, this really does convert to a "bignum". In IPFS all the hashes are extremely short (512 bits at most, plus some padding for multiformats), so this doesn't become a factor.

For actual binary encoding into ASCII with a minimal amount of "bloat", look into https://en.wikipedia.org/wiki/Ascii85#Example_for_Ascii85
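
For what it's worth, Go's standard library ships an Ascii85 codec (the scheme linked above), which maps every 4 input bytes to 5 output characters and therefore scales linearly; a minimal usage sketch:

```go
package main

import (
	"encoding/ascii85"
	"fmt"
)

func main() {
	src := []byte("hello world")
	dst := make([]byte, ascii85.MaxEncodedLen(len(src)))
	n := ascii85.Encode(dst, src)
	fmt.Println(string(dst[:n])) // 5 output characters per 4 input bytes
}
```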

@james-antill
Author

I ended up writing a base50 API, which had the features I wanted from base58 ... probably overkill. Feel free to close.
