Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode query params in binary instead of text #295

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

jgaskins
Copy link
Contributor

@jgaskins jgaskins commented Dec 1, 2024

Been messing with this the past couple days. I'm not sure yet, but this may actually not be a good idea. Encoding values as binary doesn't seem to get the benefit of autocasting, such as from int4 up to int8. Either that or I've screwed something up.

I'll annotate some of the weirdness I saw in review comments. Maybe someone knows how to fix/work around those things.

It did cut down pretty significantly on the amount of heap memory allocated per query with bind args, though. It was a lot more than I expected for a simple query — about 44% fewer bytes allocated across the complete query execution. There are still some places I want to optimize heap allocations (like in serializing array types), but that was pretty nice to see.

Benchmark code
require "benchmark"
require "../src/pg"

pg = DB.open "postgres:///"
pg.exec <<-SQL
  CREATE TABLE IF NOT EXISTS temp_data (
    id UUID PRIMARY KEY,
    name TEXT NOT NULL,
    login_count INT4 NOT NULL,
    rating FLOAT8 NOT NULL,
    created_at TIMESTAMPTZ NOT NULL
  )
  SQL

Benchmark.ips do |x|
  time = Time.utc
  x.report do
    pg.exec <<-SQL, UUID.v7, "User Name", 42, rand, time
      INSERT INTO temp_data (id, name, login_count, rating, created_at)
      VALUES ($1, $2, $3, $4, $5)
      SQL
  end
end

at_exit { pg.exec "DROP TABLE IF EXISTS temp_data" }

On the master branch:

➜  crystal-pg git:(master) crystal run --release bench/encoding.cr
  30.88k ( 32.38µs) (± 1.64%)  992B/op  fastest

With this PR:

➜  crystal-pg git:(use-binary-encoding) crystal run --release bench/encoding.cr
  31.65k ( 31.59µs) (± 2.15%)  560B/op  fastest

Fixes #294

Comment on lines -23 to +29
test_insert_and_read "float", 12.34
test_insert_and_read "int2", 123i16
test_insert_and_read "int4", 123i32
test_insert_and_read "int8", 123i64

test_insert_and_read "float4", 12.34f32
test_insert_and_read "float8", 12.34f64
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I don't specify the sizes of numeric types, I get errors that I don't understand yet. I figured I'd look into it later.

Spec errors
  1) PG::Driver inserts 123 as int8

       insufficient data left in message (PQ::PQError)
         from src/pq/connection.cr:214:7 in 'handle_error'
         from src/pq/connection.cr:197:7 in 'handle_async_frames'
         from src/pq/connection.cr:173:7 in 'read'
         from src/pq/connection.cr:168:7 in 'read'
         from src/pq/connection.cr:446:31 in 'expect_frame'
         from src/pq/connection.cr:445:5 in 'expect_frame'
         from src/pg/statement.cr:19:5 in 'perform_query'
         from src/pg/statement.cr:35:14 in 'perform_exec'
         from lib/db/src/db/statement.cr:93:9 in 'perform_exec_and_release'
         from lib/db/src/db/statement.cr:78:7 in 'exec:args'
         from lib/db/src/db/pool_statement.cr:19:30 in 'exec:args'
         from lib/db/src/db/query_methods.cr:275:7 in 'exec:args'
         from spec/pg/encoder_spec.cr:12:5 in '->'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/example.cr:50:13 in 'internal_run'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/example.cr:38:16 in 'run'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/context.cr:20:23 in 'internal_run'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/context.cr:346:14 in 'run'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/context.cr:20:23 in 'internal_run'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/context.cr:155:7 in 'run'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/dsl.cr:237:7 in 'execute_examples'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/dsl.cr:220:13 in '->'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/crystal/at_exit_handlers.cr:14:19 in 'run'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/crystal/main.cr:53:14 in 'exit'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/crystal/main.cr:48:5 in 'main'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/crystal/main.cr:130:3 in 'main'


  2) PG::Driver inserts 123 as int2

       incorrect binary data format in bind parameter 1 (PQ::PQError)
         from src/pq/connection.cr:214:7 in 'handle_error'
         from src/pq/connection.cr:197:7 in 'handle_async_frames'
         from src/pq/connection.cr:173:7 in 'read'
         from src/pq/connection.cr:168:7 in 'read'
         from src/pq/connection.cr:446:31 in 'expect_frame'
         from src/pq/connection.cr:445:5 in 'expect_frame'
         from src/pg/statement.cr:19:5 in 'perform_query'
         from src/pg/statement.cr:35:14 in 'perform_exec'
         from lib/db/src/db/statement.cr:93:9 in 'perform_exec_and_release'
         from lib/db/src/db/statement.cr:78:7 in 'exec:args'
         from lib/db/src/db/pool_statement.cr:19:30 in 'exec:args'
         from lib/db/src/db/query_methods.cr:275:7 in 'exec:args'
         from spec/pg/encoder_spec.cr:12:5 in '->'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/example.cr:50:13 in 'internal_run'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/example.cr:38:16 in 'run'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/context.cr:20:23 in 'internal_run'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/context.cr:346:14 in 'run'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/context.cr:20:23 in 'internal_run'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/context.cr:155:7 in 'run'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/dsl.cr:237:7 in 'execute_examples'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/spec/dsl.cr:220:13 in '->'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/crystal/at_exit_handlers.cr:14:19 in 'run'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/crystal/main.cr:53:14 in 'exit'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/crystal/main.cr:48:5 in 'main'
         from /opt/homebrew/Cellar/crystal/1.14.0_1/share/crystal/src/crystal/main.cr:130:3 in 'main'

Comment on lines 49 to +52
test_insert_and_read "integer[]", [] of Int32
test_insert_and_read "integer[]", [1, 2, 3]
test_insert_and_read "integer[]", [[1, 2], [3, 4]]
test_insert_and_read "integer[][]", [[1, 2], [3, 4]]
test_insert_and_read "integer[][][]", [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually surprised that the integer[] example worked before with a nested array. It still does when encoded as binary, too, but I figured I would make this more explicit.

spec/pg/encoder_spec.cr Show resolved Hide resolved
spec/pg/encoder_spec.cr Show resolved Hide resolved
Comment on lines -504 to +505
@@decoders = Hash(Int32, PG::Decoders::Decoder).new(ByteaDecoder.new)
# :nodoc:
class_getter decoders = Hash(Int32, PG::Decoders::Decoder).new(ByteaDecoder.new)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because I was going to use decoders as the source for oids, but I didn't end up using it. I can probably undo this change.

binary Bytes.empty, -1
end

def self.encode(val : Bool, into slice : Bytes = Bytes.new(1))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being able to encode a value into an existing slice is important for arrays and also allowed me to encode things like intervals and geo types in terms of numeric types.

end

{% for type in %w[Int16 Int32 Int64 Float32 Float64] %}
def self.encode(val : {{type.id}}, into slice : Bytes = Bytes.new(sizeof(typeof(val))))
IO::ByteFormat::NetworkEndian.encode val, slice
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoever added the ability encode numeric values directly into slices without an intermediate IO gets the world's biggest high-five from me.

src/pq/param.cr Outdated Show resolved Hide resolved
src/pq/param.cr Outdated Show resolved Hide resolved
spec/pg/encoder_spec.cr Show resolved Hide resolved
Comment on lines +154 to +176
private OID_MAP = {
Bool.name => 16, # boolean
Bytes.name => 17, # bytea
Char.name => 18, # char
Int16.name => 21, # int2
Int32.name => 23, # int4
Int64.name => 20, # int8
String.name => 25, # text
Float32.name => 700, # float4
Float64.name => 701, # float8
UUID.name => 2950, # uuid
PG::Geo::Point.name => 600, # point
PG::Geo::Path.name => 602, # path
PG::Geo::Polygon.name => 604, # polygon
PG::Geo::Box.name => 603, # box
PG::Geo::LineSegment.name => 601, # lseg
PG::Geo::Line.name => 628, # line
PG::Geo::Circle.name => 718, # circle
JSON::Any.name => 3802, # jsonb
Time.name => 1184, # timestamptz
Time::Span.name => 1186, # interval
PG::Interval.name => 1186, # interval
} of String => Int32
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I was hoping to avoid when making decoders accessible from the PQ namespace. I used type names as keys rather than the types themselves because I wanted to make it extensible, but I don't know how realistic that is.

I think that, in order to do this extensibly, it would require writing both encoders and decoders, because on top of the OID mapping, it also has to know the size of a given value to encode it, and I ended up making a SIZE_MAP below. I don't like that three different data structures have to know about the types available.

This is part of why I don't know whether this PR was a good idea. I may have bitten off more than I had the motivation to chew here and I'm not sure it's worth restructuring the shard to support binary encoding.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah I see what you're saying but also 44% less heap can make a lot of things worth it though

This file tests functionality that no lonfer applies
@will
Copy link
Owner

will commented Dec 2, 2024

I like this in general. Not only for the less allocations on the crystal side, but I would have to imagine that postgres not having to parse as much text on the database side can be significant with write heavy workloads.

@jgaskins
Copy link
Contributor Author

jgaskins commented Dec 2, 2024

That's a solid point. Scaling Postgres (especially for writes) is far more challenging than scaling out app servers when you hit a CPU threshold in Crystal app code.

@jgaskins jgaskins mentioned this pull request Dec 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use binary encoding for bind arguments
2 participants