CITEXT data type #20028
2690813 to 1885cd1
src/current/v25.3/citext.md (Outdated)

The `CITEXT` [data type]({% link {{ page.version.version }}/data-types.md %}) stores case-insensitive strings.

All `CITEXT` values are folded to lowercase before comparison. This is handled internally with the [`lower()`]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions) function.
> This is handled internally with the [`lower()`]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions) function

We actually diverged a bit from how we handle CITEXT internally. Instead of `lower()`, CRDB handles CITEXT similarly to a collated string with the `"und-u-ks-level2"` locale.
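For illustration, the comparison behavior described in this comment can be approximated with explicitly collated `STRING` values. This is only a sketch: it assumes that `COLLATE` accepts the quoted `"und-u-ks-level2"` locale name in this version, and it says nothing about how `CITEXT` values are stored internally. The column aliases are hypothetical.

```sql
-- Sketch only: compare two strings under the case-insensitive,
-- accent-sensitive collation named in the comment above.
SELECT 'foo' COLLATE "und-u-ks-level2" = 'FOO' COLLATE "und-u-ks-level2" AS differs_by_case;

-- Same collation, but here the values differ by accents rather than by case.
SELECT 'resume' COLLATE "und-u-ks-level2" = 'résumé' COLLATE "und-u-ks-level2" AS differs_by_accent;
```

If the collation behaves as described, the first comparison should evaluate to true and the second to false.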
I think the goal here is to describe the behavior, not the implementation details like collated strings. Here's an alternative rewording I'll propose:

> The `CITEXT` data type represents a case-insensitive string. The casing of values is preserved through storage and retrieval, just like `STRING` values. The key difference with `STRING` values is that comparisons between `CITEXT` values are case-insensitive.
And add some examples that show what we mean by that, e.g.:
CREATE TABLE t (c0 CITEXT, c1 CITEXT);
-- CREATE TABLE
INSERT INTO t VALUES ('foo', 'FOO');
-- INSERT 0 1
-- Casing is preserved during storage and retrieval.
SELECT * FROM t;
-- c0 | c1
-- ------+------
-- foo | FOO
-- (1 row)
-- Comparisons are insensitive to casing.
SELECT c0 = c1 FROM t;
-- ?column?
-- ------------
-- t
-- (1 row)
@mgartner There's a fuller example near the end of the doc that conveys the above -- do you think that will suffice?
src/current/v25.3/citext.md (Outdated)

## Collations

`CITEXT` compares values as a `STRING` column with the `und-u-ks-level2` [collation]({% link {{ page.version.version }}/collate.md %}), meaning it is case-insensitive but accent-sensitive. If you need accent-insensitive behavior, consider using `STRING` with a nondeterministic collation instead.
Do we need to define what a "nondeterministic collation" is?
It might be clearer to omit the "nondeterministic" part. If we have a docs page for collations and their locale extensions, we should link that here. Otherwise, an example of an accent-insensitive collation is one with the `-u-ks-level1` locale extension, such as `"und-u-ks-level1"`.
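As a hedged illustration of the accent-insensitive alternative mentioned in this comment, the sketch below compares two collated `STRING` values under `"und-u-ks-level1"`. It assumes that `COLLATE` accepts this quoted locale name in this version; the column alias is hypothetical.

```sql
-- Sketch only: a level1 (primary strength) collation is expected to
-- ignore both case and accent differences when comparing strings.
SELECT 'resume' COLLATE "und-u-ks-level1" = 'Résumé' COLLATE "und-u-ks-level1" AS differs_by_case_and_accent;
```

Under that assumption, the comparison should evaluate to true, whereas the same values compared as `CITEXT` would differ because `CITEXT` is described as accent-sensitive.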
Removed that part -- we do actually have the docs page you described (it's linked there), but I can see that this is broadening the scope too much.
src/current/v25.3/citext.md (Outdated)

t
~~~

With `CITEXT`, equality operators (`=`, `!=`, `<>`), ordering operators (`<`, `>`, etc.), and [`STRING` functions]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions), treat values as case-insensitive by default. Refer to the [example](#example).
Is there an example of one of these string functions that treat the values as case-insensitive? I think this might not be true.
@paulniziolek Can you help with this one?
Actually nm. Although Paul confirmed the statement is true, I don't know if there is a good example of a function that demonstrates the case insensitivity. So I'm removing this line for simplicity.
@paulniziolek @mgartner TFTRs -- I ended up simplifying this doc, in part to address your comments. Please have a look!
DOC-14015