Skip to content

Latest commit

 

History

History
302 lines (234 loc) · 9.25 KB

CDB.md

File metadata and controls

302 lines (234 loc) · 9.25 KB

CockroachDB

Many applications use CockroachDB (CDB) as their datastore. Here are a list of utilities that have been built to support working with CDB.

CDB Single Node

For desktop testing you can install CDB and run a local instance in single-node mode. Download from:

https://www.cockroachlabs.com/docs/releases/

Make sure to download the version that corresponds to the version used in production, which can be checked by looking at the YAML file in the k8s-config repo(This is v22.1 as of Apr 2024).

cd /tmp; cockroach start-single-node --insecure --listen-addr=127.0.0.1

Note that this will start and print out useful information, such as the location of the admin dashboard (webui), and also how to connect to the running instance using the cockroach command line application (RPC client flags):

CockroachDB node starting at 2023-10-04 15:10:35.315566725 +0000 UTC m=+1.911432965 (took 1.6s)
build:               CCL v22.2.3 @ 2023/01/23 19:11:57 (go1.19.1)
webui:               http://127.0.0.1:8080
sql:                 postgresql://[email protected]:26257/defaultdb?sslmode=disable
sql (JDBC):          jdbc:postgresql://127.0.0.1:26257/defaultdb?sslmode=disable&user=root
RPC client flags:    cockroach <client cmd> --host=127.0.0.1:26257 --insecure
logs:                /tmp/cockroach-data/logs
temp dir:            /tmp/cockroach-data/cockroach-temp2650087369
external I/O path:   /tmp/cockroach-data/extern
store[0]:            path=/tmp/cockroach-data
storage engine:      pebble
clusterID:           dbef3f5f-ef2c-48ef-a39b-5cdbeb219e70
status:              initialized new cluster
nodeID:              1

Generating SQL Schema

Defining the schema for the database is done in Go code. By doing this, we have the source of truth as a documented Go struct, which can be used in a more flexible way than having the SQL as the source of truth. For example see:

https://skia.googlesource.com/buildbot/+/refs/heads/main/perf/go/sql/tables.go

package sql
//go:generate bazelisk run --config=mayberemote //:go -- run ./tosql

...

// Tables represents the full schema of the SQL database.
type Tables struct {
	Alerts      []alertschema.AlertSchema
	Commits     []gitschema.Commit
	ParamSets   []traceschema.ParamSetsSchema
	Postings    []traceschema.PostingsSchema
	Regressions []regressionschema.RegressionSchema
	Shortcuts   []shortcutschema.ShortcutSchema
	SourceFiles []traceschema.SourceFilesSchema
	TraceValues []traceschema.TraceValuesSchema
}

Each SQL table is defined by a public member of the Tables struct.

Let's look at the gitschema.Commit type:

// Commit represents a single commit stored in the database.
//
// JSON annotations make it serialize like the legacy cid.CommitDetail.
type Commit struct {
	CommitNumber types.CommitNumber `sql:"commit_number INT PRIMARY KEY"`
	GitHash      string             `sql:"git_hash TEXT UNIQUE NOT NULL"`
	Timestamp    int64              `sql:"commit_time INT"` // Unix timestamp, seconds from the epoch.
	Author       string             `sql:"author TEXT"`
	Subject      string             `sql:"subject TEXT"`
}

Note that struct tags that define how each element will be stored as SQL in the database. For example, CommitNumber will be stored as an INT and also used as a PRIMARY KEY.

If you need to add other information to the schema, such as secondary indexes, add them as struct tags on private members of the struct. In TraceValuesSchema below you can see how a secondary index was added in the bySourceFileIndex struct tag:

// TraceValuesSchema describes the SQL schema of the TraceValues table.
type TraceValuesSchema struct {
	TraceID           []byte   `sql:"trace_id BYTES"`
	CommitNumber      int64    `sql:"commit_number INT"`
	Value             float32  `sql:"val REAL"`
	SourceFileID      int64    `sql:"source_file_id INT"`
	primaryKey        struct{} `sql:"PRIMARY KEY (trace_id, commit_number)"`
	bySourceFileIndex struct{} `sql:"INDEX by_source_file_id (source_file_id, trace_id)"`
}

Looking back at the tables.go file:

package sql
//go:generate bazelisk run --config=mayberemote //:go -- run ./tosql

...

// Tables represents the full schema of the SQL database.
type Tables struct {
	Alerts      []alertschema.AlertSchema
	Commits     []gitschema.Commit
	ParamSets   []traceschema.ParamSetsSchema
	Postings    []traceschema.PostingsSchema
	Regressions []regressionschema.RegressionSchema
	Shortcuts   []shortcutschema.ShortcutSchema
	SourceFiles []traceschema.SourceFilesSchema
	TraceValues []traceschema.TraceValuesSchema
}

You can see to go:generate command at the top of the file. This generates a new Go file that contains the SQL schema as a Go string, and also provides the column names of each table:

package sql

// Generated by //go/sql/exporter/
// DO NOT EDIT

const Schema = `CREATE TABLE IF NOT EXISTS Alerts (
  id INT PRIMARY KEY DEFAULT unique_rowid(),
  alert TEXT,
  config_state INT DEFAULT 0,
  last_modified INT
);
...
CREATE TABLE IF NOT EXISTS TraceValues (
  trace_id BYTES,
  commit_number INT,
  val REAL,
  source_file_id INT,
  PRIMARY KEY (trace_id, commit_number),
  INDEX by_source_file_id (source_file_id, trace_id)
);
`

var Alerts = []string{
	"id",
	"alert",
	"config_state",
	"last_modified",
}

...

var TraceValues = []string{
	"trace_id",
	"commit_number",
	"val",
	"source_file_id",
}

The full file can be found at:

https://skia.googlesource.com/buildbot/+/refs/heads/main/perf/go/sql/schema.go

Validating SQL Schema

The above section shows how we can generate the SQL commands to create all the tables in the database. Another aspect we need to care about is having the application check the schema on startup to make sure it has all the tables and columns the current version of the application expects.

To do that we need to export information about the tables, in this case the column names, their types, and any extra indexes.

https://pkg.go.dev/go.skia.org/infra/go/sql/schema#Description

The description can be exported to a JSON file:

https://skia.googlesource.com/buildbot/+/refs/heads/main/perf/go/sql/exportschema/main.go

Then the JSON file that describes the table structure can be then be embedded into the application:

// Package expectedschema contains the schema the database is expected to have.
package expectedschema

import (
	"embed" // Enable go:embed.
	"encoding/json"

	"go.skia.org/infra/go/skerr"
	"go.skia.org/infra/go/sql/schema"
)

// FS is a filesystem with the schema.json file.
//
//go:embed schema.json
var FS embed.FS

// Load returns the deserialized schema.Description stored in the schema.json file.
func Load() (schema.Description, error) {
	var ret schema.Description
	b, err := FS.ReadFile("schema.json")
	if err != nil {
		return ret, skerr.Wrap(err)
	}

	err = json.Unmarshal(b, &ret)
	if err != nil {
		return ret, skerr.Wrap(err)
	}
	return ret, nil
}

And finally that schema can be compared to the current schema running in production:

// Confirm the database has the right schema.
expectedSchema, err := expectedschema.Load()
if err != nil {
	return nil, skerr.Wrap(err)
}

actual, err := schema.GetDescription(singletonPool, sql.Tables{})
if err != nil {
	return nil, skerr.Wrap(err)
}
if diff := assertdeep.Diff(expectedSchema, *actual); diff != "" {
	return nil, skerr.Fmt("Schema needs to be updated: %s.", diff)
}

https://skia.googlesource.com/buildbot/+/refs/heads/main/perf/go/builders/builders.go#93

Schema Changes

This system does not offer any automatic generation of schema migrations, which would be incomplete and error prone, so migrations need to be manually written, but the above schema description and comparison functions can be used to test a schema migration.

For an example of how to do this see Test_LiveToNextSchemaMigration in

https://skia.googlesource.com/buildbot/+/refs/heads/main/machine/go/machine/store/cdb/cdb_test.go#365

The test below loads the previous version of the schema, applies the migrations, and then compares the resulting schema to the current schema:

func Test_LiveToNextSchemaMigration(t *testing.T) {
	ctx := context.Background()
	db := cdbtest.NewCockroachDBForTests(t, "desc")

	expectedSchema := GetSchema(t, db)

	_, err := db.Exec(ctx, "DROP TABLE IF EXISTS Description")
	require.NoError(t, err)
	_, err = db.Exec(ctx, "DROP TABLE IF EXISTS TaskResult")
	require.NoError(t, err)

	_, err = db.Exec(ctx, LiveSchema)
	require.NoError(t, err)

	_, err = db.Exec(ctx, FromLiveToNext)
	require.NoError(t, err)

	migratedSchema := GetSchema(t, db)

	assertdeep.Equal(t, expectedSchema, migratedSchema)

	// Test the test, make sure at least one known column is present.
	require.Equal(t, "text def: nullable:NO", migratedSchema.ColumnNameAndType["description.machine_id"])
}

Utilities for writing queries

The sqlutil package has utilities that work with the generated column names to make writing some SQL queries easier.

For example, for a table with a lot of columns the insert statement may be easier to write. In the following code presume that Description contains the column names of the Description table:

queryTemplate := fmt.Sprintf(`INSERT INTO Description (%s) VALUES %s`,
   strings.Join(Description, ","), sqlutil.ValuesPlaceholders(len(Description), 1),