Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Change varbinary to varchar in schemas used in examples folder #17318

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

arthmis
Copy link

@arthmis arthmis commented Dec 2, 2024

Description

This changes usage of varbinary to varchar in schemas used in the /examples folder. Schema fields like keyspace_id that use varbinary are excluded from the change.

Related Issue(s)

This PR fixes #15997

Checklist

Copy link
Contributor

vitess-bot bot commented Dec 2, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Dec 2, 2024
@github-actions github-actions bot added this to the v22.0.0 milestone Dec 2, 2024
@arthmis arthmis changed the title Change varbinary to varchar in schemas used for local example #15997 Change varbinary to varchar in schemas used in examples folder #15997 Dec 2, 2024
@arthmis arthmis requested a review from deepthi as a code owner December 2, 2024 18:18
@arthmis arthmis changed the title Change varbinary to varchar in schemas used in examples folder #15997 Feature Request: Change varbinary to varchar in schemas used in examples folder #15997 Dec 4, 2024
@mattlord mattlord added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Examples NeedsWebsiteDocsUpdate What it says and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Dec 5, 2024
Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @arthmis !

We should also update the website docs to reflect the new column types where they are referenced:

❯ git grep varbinary | grep 22
content/en/docs/22.0/concepts/shard.md:A `varbinary` of arbitrary length can also be mapped as is to a keyspace id. This is what the `binary` vindex does.
content/en/docs/22.0/get-started/local-mac.md:  sku varbinary(128),
content/en/docs/22.0/get-started/local-mac.md:  description varbinary(128),
content/en/docs/22.0/get-started/local-mac.md:  email varbinary(128),
content/en/docs/22.0/get-started/local-mac.md:  sku varbinary(128),
content/en/docs/22.0/get-started/local.md:  sku varbinary(128),
content/en/docs/22.0/get-started/local.md:  description varbinary(128),
content/en/docs/22.0/get-started/local.md:  email varbinary(128),
content/en/docs/22.0/get-started/local.md:  sku varbinary(128),
content/en/docs/22.0/get-started/operator.md:  sku varbinary(128),
content/en/docs/22.0/get-started/operator.md:  description varbinary(128),
content/en/docs/22.0/get-started/operator.md:  email varbinary(128),
content/en/docs/22.0/get-started/operator.md:  sku varbinary(128),
content/en/docs/22.0/reference/features/global-routing.md:| sku         | varbinary(128) | YES  |     | NULL    |       |
content/en/docs/22.0/reference/features/global-routing.md:| sku         | varbinary(128) | NO   | PRI | NULL    |       |
content/en/docs/22.0/reference/features/global-routing.md:| description | varbinary(128) | YES  |     | NULL    |       |
content/en/docs/22.0/reference/features/global-routing.md:| email       | varbinary(128) | YES  |     | NULL    |       |
content/en/docs/22.0/reference/features/global-routing.md:| sku         | varbinary(128) | YES  |     | NULL    |       |
content/en/docs/22.0/reference/features/global-routing.md:| sku         | varbinary(128) | NO   | PRI | NULL    |       |
content/en/docs/22.0/reference/features/global-routing.md:| description | varbinary(128) | YES  |     | NULL    |       |
content/en/docs/22.0/reference/programs/vtctldclient/vtctldclient_Materialize/vtctldclient_Materialize_create.md:    "create_ddl": "create table sales_by_sku (sku varbinary(128) not null primary key, orders bigint, revenue bigint)"
content/en/docs/22.0/reference/programs/vtctldclient/vtctldclient_Materialize/vtctldclient_Materialize_create.md:vtctldclient --server localhost:15999 materialize --workflow product_sales --target-keyspace commerce create --source-keyspace commerce --table-settings '[{"target_table": "sales_by_sku", "create_ddl": "create table sales_by_sku (sku varbinary(128) not null primary key, orders bigint, revenue bigint)", "source_expression": "select sku, count(*) as orders, sum(price) as revenue from corder group by sku"}]' --cells zone1 --cells zone2 --tablet-types replica
content/en/docs/22.0/reference/vreplication/internal/keys.md:  `sku` varbinary(128) DEFAULT NULL,
content/en/docs/22.0/reference/vreplication/materialize.md:vtctldclient --server localhost:15999 Materialize --workflow product_sales --target-keyspace commerce create --source-keyspace commerce --table-settings '[{"target_table": "sales_by_sku", "create_ddl": "create table sales_by_sku (sku varbinary(128) not null primary key, orders bigint, revenue bigint)", "source_expression": "select sku, count(*) as orders, sum(price) as revenue from corder group by sku"}]' --cells zone1 --cells zone2 --tablet-types replica
content/en/docs/22.0/reference/vreplication/materialize.md:    "create_ddl": "create table sales_by_sku (sku varbinary(128) not null primary key, orders bigint, revenue bigint)"
content/en/docs/22.0/user-guides/configuration-advanced/createlookupvindex.md:| sku         | varbinary(128) | YES  |     | NULL    |       |
content/en/docs/22.0/user-guides/configuration-advanced/createlookupvindex.md:| sku         | varbinary(128) | NO   | PRI | NULL    |       |
content/en/docs/22.0/user-guides/configuration-advanced/createlookupvindex.md:| keyspace_id | varbinary(128) | YES  |     | NULL    |       |
content/en/docs/22.0/user-guides/configuration-advanced/region-sharding.md:  `fullname` varbinary(256) DEFAULT NULL,
content/en/docs/22.0/user-guides/configuration-advanced/region-sharding.md:  `nationalid` varbinary(256) DEFAULT NULL,
content/en/docs/22.0/user-guides/configuration-advanced/region-sharding.md:  `country` varbinary(256) DEFAULT NULL,
content/en/docs/22.0/user-guides/configuration-advanced/region-sharding.md:| keyspace_id | varbinary(128) | YES  |     | NULL    |       |
content/en/docs/22.0/user-guides/migration/materialize.md:  sku varbinary(128) DEFAULT NULL,
content/en/docs/22.0/user-guides/migration/materialize.md:  sku varbinary(128) DEFAULT NULL,
content/en/docs/22.0/user-guides/schema-changes/audit-and-control.md:  `sku` varbinary(128) DEFAULT NULL,
content/en/docs/22.0/user-guides/schema-changes/audit-and-control.md:  `sku` varbinary(128) DEFAULT NULL,
content/en/docs/22.0/user-guides/schema-changes/audit-and-control.md:  `sku` varbinary(128) DEFAULT NULL,
content/en/docs/22.0/user-guides/vschema-guide/lookup-as-primary.md:create table corder_event(corder_event_id bigint, corder_id bigint, ename varchar(128), keyspace_id varbinary(10), primary key(corder_id, corder_event_id));
content/en/docs/22.0/user-guides/vschema-guide/non-unique-lookup.md:create table oname_keyspace_idx(oname varchar(128), corder_id bigint, keyspace_id varbinary(10), primary key(oname, corder_id));
content/en/docs/22.0/user-guides/vschema-guide/unique-lookup.md:create table corder_keyspace_idx(corder_id bigint, keyspace_id varbinary(10), primary key(corder_id));
content/zh/docs/22.0/get-started/kubernetes.md:  sku varbinary(128),
content/zh/docs/22.0/get-started/kubernetes.md:  description varbinary(128),
content/zh/docs/22.0/get-started/kubernetes.md:  email varbinary(128),
content/zh/docs/22.0/get-started/kubernetes.md:  sku varbinary(128),
content/zh/docs/22.0/get-started/kubernetes.md:任意长度的`varbinary`也可以按原样映射到keyspace id。这就是`binary`vindex所做的。
content/zh/docs/22.0/get-started/local.md:  sku varbinary(128),
content/zh/docs/22.0/get-started/local.md:  description varbinary(128),
content/zh/docs/22.0/get-started/local.md:  email varbinary(128),
content/zh/docs/22.0/get-started/local.md:  sku varbinary(128),
content/zh/docs/22.0/get-started/local.md:任意长度的`varbinary`也可以按原样映射到keyspace id。这就是`binary`vindex所做的。
content/zh/docs/22.0/reference/vitess-api.md:| <code>BYTES</code> | <code>2</code> | BYTES is when an array of bytes is used. This is represented as 'varbinary' in mysql  |
content/zh/docs/22.0/schema-management/consistent-lookup.md:The guidance for implementing lookup vindexes has been to create a two-column table. The first column (from column) should match the type of the column of the main table that needs the vindex. The second column (to column) should be a `binary` or a `varbinary` large enough to accommodate the keyspace id.

Are you OK doing that as well? I can help as needed.

Thanks again!

@arthmis
Copy link
Author

arthmis commented Dec 5, 2024

Are you OK doing that as well? I can help as needed.

Yes I can update the website docs.

@mattlord mattlord removed the NeedsWebsiteDocsUpdate What it says label Dec 6, 2024
@mattlord mattlord changed the title Feature Request: Change varbinary to varchar in schemas used in examples folder #15997 Feature Request: Change varbinary to varchar in schemas used in examples folder Dec 6, 2024
@mattlord
Copy link
Contributor

mattlord commented Dec 6, 2024

@arthmis thanks again! One other thing... if we're going to mark this PR as fixing #15997 then we should do the second task listed there as well: Change the vindex type used for those columns in related vschema definitions from hash to xxhash:

examples/compose/test_keyspace_vschema.json:					"name": "hash"
examples/compose/test_keyspace_vschema.json:					"name": "hash"
examples/compose/test_keyspace_vschema.json:		"hash": {
examples/compose/test_keyspace_vschema.json:			"type": "hash"
examples/compose/lookup_keyspace_vschema.json:					"name": "hash"
examples/compose/lookup_keyspace_vschema.json:					"name": "hash"
examples/compose/lookup_keyspace_vschema.json:		"hash": {
examples/compose/lookup_keyspace_vschema.json:			"type": "hash"
examples/compose/vtcompose/vtcompose.go:		vSchemaFile = applyJsonInMemoryPatch(vSchemaFile, generatePrimaryVIndex(tableName, firstColumnName, "hash"))
examples/compose/vtcompose/base_vschema.json:    "hash": {
examples/compose/vtcompose/base_vschema.json:      "type": "hash"
examples/compose/default_vschema.json:    "hash": {
examples/compose/default_vschema.json:      "type": "hash"
examples/demo/schema/customer/vschema.json:    "hash": {
examples/demo/schema/customer/vschema.json:      "type": "hash"
examples/demo/schema/customer/vschema.json:        "name": "hash"
examples/demo/schema/customer/vschema.json:        "name": "hash"
examples/local/vschema_customer_sharded.json:        "hash": {
examples/local/vschema_customer_sharded.json:            "type": "hash"
examples/local/vschema_customer_sharded.json:                    "name": "hash"
examples/local/vschema_customer_sharded.json:                    "name": "hash"
examples/local/vschema.json:    "hash": {
examples/local/vschema.json:      "type": "hash"
examples/local/vschema.json:          "name": "hash"
examples/operator/vschema_customer_sharded.json:        "hash": {
examples/operator/vschema_customer_sharded.json:            "type": "hash"
examples/operator/vschema_customer_sharded.json:                    "name": "hash"
examples/operator/vschema_customer_sharded.json:                    "name": "hash"

That will impact the docs changes as well.

I'm sorry, I was forgetting about these details myself. Does this make sense?

@arthmis
Copy link
Author

arthmis commented Dec 6, 2024

@arthmis thanks again! One other thing... if we're going to mark this PR as fixing #15997 then we should do the second task listed there as well: Change the vindex type used for those columns in related vschema definitions from hash to xxhash

@mattlord Could you actually show an example of a vindex that needs to change in any of the example docs? I did look through the vindexes in the vschema definitions but i didn't find any references to the columns that were changed. That's why I didn't include it in the PR. I could also be misreading some of the vschemas. So an example would be helpful to me.

@deepthi
Copy link
Member

deepthi commented Dec 8, 2024

@arthmis thanks again! One other thing... if we're going to mark this PR as fixing #15997 then we should do the second task listed there as well: Change the vindex type used for those columns in related vschema definitions from hash to xxhash

@mattlord Could you actually show an example of a vindex that needs to change in any of the example docs? I did look through the vindexes in the vschema definitions but i didn't find any references to the columns that were changed. That's why I didn't include it in the PR. I could also be misreading some of the vschemas. So an example would be helpful to me.

Look at vschema.json and vschema_customer_sharded.json. We want to s/hash/xxhash.

@arthmis
Copy link
Author

arthmis commented Dec 9, 2024

Look at vschema.json and vschema_customer_sharded.json. We want to s/hash/xxhash.

Oh does the type for vindexes need to be changed from hash to xxhash as long as they are part of the table? I thought I had to change vindexes if the columns that were changed, like sku, were part of the vschema.

so using vschema.json in the local example is this correct?

  "vindexes": {
    "hash": {
      "type": "hash" -> "xxhash"
    }
  },

And in the rest of that file there is

  "tables": {
    "messages": {
      "column_vindexes": [
        {
          "column": "page",
          "name": "hash" // does this need to be changed to xxhash as well?
        }
      ]
    }

@deepthi
Copy link
Member

deepthi commented Dec 10, 2024

// does this need to be changed to xxhash as well?

No. It is the "name" of the vindex, which is defined here

  "vindexes": {
    "hash": {

@arthmis
Copy link
Author

arthmis commented Dec 10, 2024

ahh thank you!

@arthmis
Copy link
Author

arthmis commented Dec 11, 2024

So I did update the vindex type to xxhash because what was asked for was

Change the vindex type used for those columns in related vschema definitions from hash to xxhash

I only ended up updating 2 files because I assume I only needed to update the vindex for schemas that affected the updated tables. Is that right? Or did I actually have to update everywhere that had a type of hash like @mattlord indicated here

@mattlord
Copy link
Contributor

@arthmis the issue is that the hash vindex uses md5 and thus gets flagged by security audits (even though it's not used for encryption here). xxhash is also faster, so we want to use that by default in examples/demos/docs etc.

It's really quite simple, replace all instances of hash in the .vschema files under the ./examples directory with xxhash. Does that make sense?

@arthmis
Copy link
Author

arthmis commented Dec 11, 2024

oh ok yeah how to change it makes sense. I just wasn't sure whether all of it needed to change, based on the description from the issue, because a lot of the instances of hash weren't related to the tables updated in this Pr. I can go ahead and do that now!

@arthmis
Copy link
Author

arthmis commented Dec 11, 2024

ok @mattlord I've changed all instances of "hash" to "xxhash"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Examples Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: Move most VARBINARY columns to VARCHAR in examples
3 participants