Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spurious use of hex.Encode for hashmaps #30

Open
diagprov opened this issue Jan 21, 2020 · 5 comments
Open

Spurious use of hex.Encode for hashmaps #30

diagprov opened this issue Jan 21, 2020 · 5 comments
Labels
C-Enhancement New feature or request P-Low Low priority issue

Comments

@diagprov
Copy link
Member

Currently we're using hex encoding in order to use hashes as keys in a hashmap, e.g.

e4go/client.go

Line 427 in d0f4bc6

topicKeyTs, ok := c.TopicKeys[hashOfHash]

However, this isn't actually necessary, as the following code example shows:

package main

import (
    "fmt"
    "golang.org/x/crypto/sha3"
)

func main() {

    keymap := make(map[string]string)

    data := "some data"
    topic := "some topic"
    topichash := sha3.Sum256([]byte(topic))
    keymap[string(topichash[:])] = data
    fmt.Println(keymap)
}

Here the hash is directly encoded as a string type, albeit not one you can print easily. In this, no conversion between formats is required.

We probably want to encode these keys for json output, but during normal operation I'm not sure we need it. What do you think @odeke-em ? (I would key a hashmap in C by passing the raw bytes through the hash function for the hashmap).

@diagprov diagprov added C-Enhancement New feature or request P-Low Low priority issue labels Jan 21, 2020
@diagprov
Copy link
Member Author

The motivation for the use of hex is the following issue that @daeMOn63 pointed out to me:

package main

import (
	"crypto/rand"
	"encoding/json"
	"fmt"
	"reflect"
)

func main() {
	id := make([]byte, 16)
	rand.Read(id)

	origMap := make(map[string][]byte)
	origMap[string(id[:])] = []byte("something")

	fmt.Printf("Original map:\n%#v\n", origMap)

	mMap, _ := json.Marshal(origMap)

	decodedMap := make(map[string][]byte)
	json.Unmarshal(mMap, &decodedMap)

	fmt.Printf("Decoded map:\n%#v\n", decodedMap)

	if reflect.DeepEqual(decodedMap, origMap) == false {
		fmt.Println("maps are differents")
	}
}

@diagprov
Copy link
Member Author

diagprov commented Jan 23, 2020

Fun fact: this works with yaml:

package main

import (
	"crypto/rand"
	"gopkg.in/yaml.v2"
	"fmt"
	"reflect"
)

func main() {
	id := make([]byte, 16)
	rand.Read(id)

	origMap := make(map[string][]byte)
	origMap[string(id[:])] = []byte("something")

	fmt.Printf("Original map:\n%#v\n", origMap)

	mMap, _ := yaml.Marshal(origMap)

	decodedMap := make(map[string][]byte)
	yaml.Unmarshal(mMap, &decodedMap)

	fmt.Printf("Decoded map:\n%#v\n", decodedMap)

	if reflect.DeepEqual(decodedMap, origMap) == false {
		fmt.Println("maps are different")
	} else {
		fmt.Println("maps are the same")
	}
}

Edit: removed dumb comment. it appears json is mangling the input bytes, whereas yaml does not.

@diagprov
Copy link
Member Author

@daeMOn63 another idea I received through asking, which was to serialize only on I/O as follows:

package main

import (
	"crypto/rand"
	"encoding/json"
	"encoding/base64"
	"fmt"
	"reflect"
)

type Bytes [16]byte

func (b Bytes) MarshalText() ([]byte, error) {
	res := base64.StdEncoding.EncodeToString(b[:])
	return []byte(res), nil
}

func (b *Bytes) UnmarshalText(in []byte) error {
	_, err := base64.StdEncoding.Decode(b[:], in)
	return err
}

func main() {
	var id Bytes
	rand.Read(id[:])

	origMap := make(map[Bytes][]byte)
	origMap[id] = []byte("something")

	fmt.Printf("Original map:\n%#v\n", origMap)

	mMap, err := json.Marshal(origMap)
	if err != nil {
		panic(err)
	}
	
	fmt.Println(string(mMap))

	decodedMap := make(map[Bytes][]byte)
	if err := json.Unmarshal(mMap, &decodedMap); err != nil {
		panic(err)
	}

	fmt.Printf("Decoded map:\n%#v\n", decodedMap)

	if reflect.DeepEqual(decodedMap, origMap) == false {
		fmt.Println("maps are different")
	} else {
		fmt.Println("maps are the same")
	}
}

I wonder if this, or something similar, is what yaml is doing internally.

@daeMOn63
Copy link
Member

The yaml encoder doesn't seems to apply any specific encoding to map keys. It just support utf-16 (for yaml 1.1 - https://yaml.org/spec/1.1/current.html#id868742) or utf-32 (for 1.2 - https://yaml.org/spec/1.2/spec.html#id2771184) that makes it happy with random bytes.

Now on above snippet, this would probably be cleaner to update our code like this, at least to prevent inserting bad characters to the map keys and ensure a consistent encoding / decoding. I would still stick to hex encoding instead of b64, just to avoid those issues with all the different b64 implementations

@diagprov
Copy link
Member Author

Yeah I think the MarshalText is best as well. This way we only incur a cost on encode/decode, which is already expensive relatively speaking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-Enhancement New feature or request P-Low Low priority issue
Projects
None yet
Development

No branches or pull requests

2 participants