Wonky performance numbers when encoding the same thing through different interfaces #45

karalabe opened this issue Jun 26, 2024 · 3 comments

Comments

karalabe commented Jun 26, 2024

I've tried a few different combinations of types and interfaces to encode the same thing. Interestingly, there's a 30% speed variation depending on which one I call, which seems extreme. I'd expect the same performance regardless of where the data enters the encoder.

BenchmarkMarshal2String-12     	     232	   5220942 ns/op
BenchmarkMarshal2RawJSON-12    	     283	   4093803 ns/op
BenchmarkMarshal2Texter-12     	     222	   5399327 ns/op
BenchmarkMarshal2Jsoner-12     	     265	   4748703 ns/op
BenchmarkMarshal2Jsoner2-12    	     271	   4422361 ns/op

package test

import (
	"bytes"
	"encoding/hex"
	"encoding/json"
	"testing"

	json2 "github.com/go-json-experiment/json"
	"github.com/go-json-experiment/json/jsontext"
)

func BenchmarkMarshalString(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	str := hex.EncodeToString(src)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		json.Marshal(str)
	}
}

func BenchmarkMarshal2String(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	str := hex.EncodeToString(src)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		json2.Marshal(str)
	}
}

func BenchmarkMarshalRawJSON(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	msg := json.RawMessage(`"` + hex.EncodeToString(src) + `"`)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		json.Marshal(msg)
	}
}

func BenchmarkMarshal2RawJSON(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	msg := json.RawMessage(`"` + hex.EncodeToString(src) + `"`)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		json2.Marshal(msg)
	}
}

func BenchmarkMarshalTexter(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	txt := &Texter{str: hex.EncodeToString(src)}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		json.Marshal(txt)
	}
}

func BenchmarkMarshal2Texter(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	txt := &Texter{str: hex.EncodeToString(src)}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		json2.Marshal(txt)
	}
}

func BenchmarkMarshalJsoner(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	jsn := &Jsoner{str: hex.EncodeToString(src)}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		json.Marshal(jsn)
	}
}

func BenchmarkMarshal2Jsoner(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	jsn := &Jsoner{str: hex.EncodeToString(src)}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		json2.Marshal(jsn)
	}
}

func BenchmarkMarshal2Jsoner2(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	jsn := &Jsoner2{str: hex.EncodeToString(src)}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		json2.Marshal(jsn)
	}
}

func BenchmarkMarshalCopyString(b *testing.B) {
	src := bytes.Repeat([]byte{'0'}, 4194304)
	str := hex.EncodeToString(src)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		buf := make([]byte, len(str)+2)
		buf[0] = '"'
		copy(buf[1:], str)
		buf[len(buf)-1] = '"'
	}
}

type Texter struct {
	str string
}

func (t Texter) MarshalText() ([]byte, error) {
	return []byte(t.str), nil
}

type Jsoner struct {
	str string
}

func (j Jsoner) MarshalJSON() ([]byte, error) {
	return []byte(`"` + j.str + `"`), nil
}

type Jsoner2 struct {
	str string
}

func (j Jsoner2) MarshalJSONV2(enc *jsontext.Encoder, opts json2.Options) error {
	return enc.WriteValue([]byte(`"` + j.str + `"`))
}

dsnet commented Jun 27, 2024

Presently, I'm unable to reproduce those results on my Ryzen 5900x. I get:

BenchmarkMarshal2String       	     192	   5595562 ns/op	 8485392 B/op	       3 allocs/op
BenchmarkMarshal2RawJSON      	     224	   5672734 ns/op	 8585393 B/op	       3 allocs/op
BenchmarkMarshal2Texter       	     122	   9177080 ns/op	16857143 B/op	       3 allocs/op
BenchmarkMarshal2Jsoner       	     133	   7760851 ns/op	25256777 B/op	       5 allocs/op
BenchmarkMarshal2Jsoner2      	     181	   7379420 ns/op	16841790 B/op	       3 allocs/op

Texter, Jsoner, and Jsoner2 are notably slower because they allocate one (or more) intermediate copies of the string (~8MiB).
In the case of String and RawJSON, the allocated amount approximately matches the string length needed for the output buffer.


dsnet commented Jun 27, 2024

Out of curiosity, what's the relationship between the lifetime of these strings and how they're marshaled?

Do you create the strings once, but marshal them multiple times? Or is it a 1:1 relationship where the creation of a string exactly correlates with a single marshal call?

The relevance of this is an idea that @mvdan once had of having jsontext.String precompute properties about the string, allowing future marshaling of the string to bypass certain checks (e.g., whether escaping is necessary). However, this only helps your situation if these large blobs are constructed once and marshaled multiple times.

karalabe (Author) commented

My specific use case is a small control HTTP RPC API between two local processes (same machine or same LAN). Originally this API was specced to use JSON because it was simple, and it represents binary blobs as hex strings (for legacy reasons). The simplifications all came from the need to support 9 different implementations of the different sides of this API in different languages, so we've tried to keep it simple.

The purpose of the API is nonetheless "control", so latency is very relevant (we're expecting LAN-style millisecond latencies, not internet-style 50+ms latencies). Our packets were on the order of 50KB, so we hadn't bothered much about how performant the json package is.

Fast forward a year, however, and our small control API sometimes needs to send 1-2MB blobs of data. 2MB within the OS or a LAN is still very much acceptable, but the json overhead starts to be felt. It's not yet a problem, but it's not irrelevant either.

We have yet another proposal in the works which would introduce another message type that can grow to 10-20MB. That's where the latency starts to bite us unacceptably, and it seems to originate from Go's hex encoding for a smaller part and, apparently, Go's json package for the main part. That was very surprising to me, so I started looking into why it's doing what it's doing.

Now, I completely agree that in an internet latency/bandwidth scenario the package overhead is not relevant. In a local-network or within-OS setting, however, it is, so it would be nice to address if possible.

As for JSON being the wrong format for low-latency apps: yes, I agree, and I will probably push for replacing it. But still, it would be nice to fix json while it's being reworked anyway :)
