Performance is non-linear, thus very bad as input gets large #18

Open
james-antill opened this issue Jun 20, 2020 · 5 comments

@james-antill

james-antill commented Jun 20, 2020

% dd if=/dev/random of=abcd10 bs=1 count=10000
10000+0 records in
10000+0 records out
10000 bytes transferred in 0.051649 secs (193615 bytes/sec)
% dd if=/dev/random of=abcd100 bs=1 count=100000
100000+0 records in
100000+0 records out
100000 bytes transferred in 0.423879 secs (235916 bytes/sec)
% dd if=/dev/random of=abcd1000 bs=1 count=1000000
1000000+0 records in
1000000+0 records out
1000000 bytes transferred in 4.197053 secs (238262 bytes/sec)
% time ./base58 -i abcd10 | wc -c
   13658
./base58 -i abcd10  0.12s user 0.01s system 93% cpu 0.132 total
% time ./base58 -i abcd100 | wc -c
  136567
./base58 -i abcd100  11.44s user 0.17s system 95% cpu 12.184 total
% time ./base58 -i abcd1000 | wc -c
 1365659
./base58 -i abcd1000  1298.50s user 20.66s system 93% cpu 23:33.18 total
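
A minimal Go reproduction sketch of the same measurement, without dd or shell timing. It assumes the package under test is this library's Encode (imported here as github.com/mr-tron/base58); the input sizes mirror the files above.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"

	"github.com/mr-tron/base58"
)

func main() {
	for _, n := range []int{10000, 100000, 1000000} {
		buf := make([]byte, n)
		if _, err := rand.Read(buf); err != nil { // random input, like the dd runs above
			panic(err)
		}

		start := time.Now()
		out := base58.Encode(buf)
		fmt.Printf("n=%8d  encoded=%8d chars  took %v\n", n, len(out), time.Since(start))
	}
}
```

If encoding were linear, each step would take roughly 10x the previous one; the timings above grow by roughly 100x per step instead.
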
@james-antill
Author

Note that this isn't just you; "github.com/btcsuite/btcutil/base58" gives:

% time ./base58-2 -i abcd10 | wc -c
   13658
./base58-2 -i abcd10  0.27s user 0.01s system 96% cpu 0.288 total
% time ./base58-2 -i abcd100 | wc -c
  136567
./base58-2 -i abcd100  26.02s user 0.37s system 95% cpu 27.780 total

It's just very unintuitive, especially given that the performance of base64 is roughly the same as base16.

@ribasushi
Contributor

It's just very unintuitive, especially given that the performance of base64 is roughly the same as base16.

Base64/32/16/8 use a radically different algorithm: they map every fixed-size chunk of input to a corresponding set of output characters, thus "streaming" the result one chunk at a time.
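
For contrast, a small standard-library illustration of that chunked, streaming behaviour: Go's base64 encoder consumes fixed 3-byte groups and writes 4 characters for each, so it can stream through an io.Writer in constant memory.

```go
package main

import (
	"encoding/base64"
	"os"
	"strings"
)

func main() {
	enc := base64.NewEncoder(base64.StdEncoding, os.Stdout)
	// Every 3 input bytes map to 4 output characters, independent of the
	// rest of the input, so encoding time grows linearly with input size.
	enc.Write([]byte(strings.Repeat("chunked input ", 4)))
	enc.Close() // flush the final partial chunk (adds '=' padding if needed)
}
```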

Base58, base36, and any other big-integer-based encoding (the kind that preserves leading zeroes) read the entire input, treat it as a single very large integer, convert that integer's representation to the new base, and then write it out.

While there are a number of ways to speed this conversion up, it is bound to remain non-linear: the larger the input, the more work it takes to build the internal integer representation, and the longer it takes to convert that integer to the new base.
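
A hedged sketch of that whole-input integer approach (not this library's actual code): the input becomes one math/big integer and base-58 digits are peeled off by repeated division by 58. Each DivMod walks the entire remaining number, roughly one division per output digit, which is why the total work grows faster than linearly with input size.

```go
package main

import (
	"fmt"
	"math/big"
)

// Standard Bitcoin-style base58 alphabet (no 0, O, I, l).
const alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

func encodeViaBigInt(input []byte) string {
	num := new(big.Int).SetBytes(input) // whole input as a single integer
	radix := big.NewInt(58)
	mod := new(big.Int)

	var out []byte
	for num.Sign() > 0 {
		num.DivMod(num, radix, mod) // touches every remaining word of num
		out = append(out, alphabet[mod.Int64()])
	}
	// Leading zero bytes are conventionally kept as leading '1' characters.
	for _, b := range input {
		if b != 0 {
			break
		}
		out = append(out, '1')
	}
	// Digits come out least-significant first; reverse into reading order.
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

func main() {
	fmt.Println(encodeViaBigInt([]byte("hello world")))
}
```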

@james-antill
Author

Yeah, it's probably worth pointing that out somewhere obvious. I ran across it when I heard about multihash/IPFS, and the information about it mostly suggested it as a more human-friendly replacement for base64 ... Wikipedia does mention that it converts to large numbers, but I assumed that was because the other implementations used bignum, and that it could instead be implemented using groups of 29 bytes.

@ribasushi
Contributor

Nope, this really does convert to a "bignum". In IPFS all the hashes are extremely short (512 bits at most, plus some padding for multiformats), so this doesn't become a factor.

For actual binary encoding into ASCII with a minimal amount of "bloat", look into https://en.wikipedia.org/wiki/Ascii85#Example_for_Ascii85
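
For what it's worth, Go's standard library ships an Ascii85 codec (the scheme linked above), which maps every 4 input bytes to 5 output characters and therefore scales linearly; a minimal usage sketch:

```go
package main

import (
	"encoding/ascii85"
	"fmt"
)

func main() {
	src := []byte("hello world")
	dst := make([]byte, ascii85.MaxEncodedLen(len(src)))
	n := ascii85.Encode(dst, src)
	fmt.Println(string(dst[:n])) // 5 output characters per 4 input bytes
}
```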

@james-antill
Author

I ended up writing a base50 API, which had the features I wanted from base58 ... probably overkill. Feel free to close.
