CITEXT data type #20028

Open: wants to merge 3 commits into base: main

taroface
Contributor

netlify bot commented Jul 30, 2025

Netlify Preview

Name Link
🔨 Latest commit 2690813
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-docs/deploys/688a4520afbdc80008a458bd
😎 Deploy Preview https://deploy-preview-20028--cockroachdb-docs.netlify.app

The `CITEXT` [data type]({% link {{ page.version.version }}/data-types.md %}) stores case-insensitive strings.

All `CITEXT` values are folded to lowercase before comparison. This is handled internally with the [`lower()`]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions) function.

This is handled internally with the [lower()]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions) function

We actually diverged a bit from how we handle CITEXT internally. Instead of lower(), CRDB handles CITEXT similarly to a collated string with the "und-u-ks-level2" locale.

Contributor

I think the goal here is to describe the behavior, not the implementation details like collated strings. Here's an alternative rewording I'll propose:

The CITEXT data type represents a case-insensitive string. The casing of values is preserved through storage and retrieval, just like STRING values. The key difference from STRING values is that comparisons between CITEXT values are case-insensitive.

And add some examples that show what we mean by that, e.g.:

CREATE TABLE t (c0 CITEXT, c1 CITEXT);
-- CREATE TABLE

INSERT INTO t VALUES ('foo', 'FOO');
-- INSERT 0 1

-- Casing is preserved during storage and retrieval.
SELECT * FROM t;
--   c0  | c1
-- ------+------
--   foo | FOO
-- (1 row)

-- Comparisons are insensitive to casing.
SELECT c0 = c1 FROM t;
--   ?column?
-- ------------
--      t
-- (1 row)

Contributor Author

@mgartner There's a fuller example near the end of the doc that conveys the above -- do you think that will suffice?


## Collations

`CITEXT` compares values as a `STRING` column with the `und-u-ks-level2` [collation]({% link {{ page.version.version }}/collate.md %}), meaning it is case-insensitive but accent-sensitive. If you need accent-insensitive behavior, consider using `STRING` with a nondeterministic collation instead.
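
To make the case-insensitive but accent-sensitive distinction concrete, here is a minimal sketch. It assumes a CockroachDB version that includes the `CITEXT` type from this PR. The literal values are illustrative, and the expectations in the comments follow from the behavior described above, not from a captured session.

```sql
-- The values differ only by case, so they are expected to compare as equal.
SELECT 'Resume'::CITEXT = 'RESUME'::CITEXT;

-- The values differ by accents, so they are expected to compare as not equal,
-- because CITEXT comparisons are described as accent-sensitive.
SELECT 'resume'::CITEXT = 'résumé'::CITEXT;
```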

Contributor

Do we need to define what a "nondeterministic collation" is?

It might be clearer to omit the "nondeterministic" part. If we have a docs page for collations and their locale extensions, we should link that here. Otherwise, an example of an accent-insensitive collation is one with the -u-ks-level1 locale extension, such as "und-u-ks-level1".
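
As a rough illustration of the two locale extensions mentioned in this thread, the sketch below compares the same pair of literals under each collation. It assumes both collation names are accepted as written (quoted because they contain hyphens), and the values are made up for the example.

```sql
-- Level 2 strength: case is ignored but accents are significant,
-- so this comparison is expected to be false.
SELECT ('resume' COLLATE "und-u-ks-level2") = ('Résumé' COLLATE "und-u-ks-level2");

-- Level 1 strength: both case and accents are ignored,
-- so this comparison is expected to be true.
SELECT ('resume' COLLATE "und-u-ks-level1") = ('Résumé' COLLATE "und-u-ks-level1");
```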

Contributor Author

Removed that part -- we do actually have the docs page you described (it's linked there), but I can see that this is broadening the scope too much.


With `CITEXT`, equality operators (`=`, `!=`, `<>`), ordering operators (`<`, `>`, etc.), and [`STRING` functions]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions) treat values as case-insensitive by default. Refer to the [example](#example).
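
For the ordering operators specifically, a small sketch of the difference from `STRING` might look like the following. It covers operators only, not string functions, and assumes a CockroachDB version that includes `CITEXT`. The values are illustrative.

```sql
-- STRING compares byte by byte, and uppercase 'B' sorts before lowercase 'a',
-- so this is expected to be false.
SELECT 'apple'::STRING < 'Banana'::STRING;

-- CITEXT ignores case when comparing, so 'apple' is expected to sort
-- before 'Banana' and this is expected to be true.
SELECT 'apple'::CITEXT < 'Banana'::CITEXT;
```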

Contributor

Is there an example of one of these string functions that treats the values as case-insensitive? I think this might not be true.

Contributor Author

@paulniziolek Can you help with this one?

Contributor Author

Actually nm. Although Paul confirmed the statement is true, I don't know if there is a good example of a function that demonstrates the case insensitivity. So I'm removing this line for simplicity.

@taroface
Contributor Author

@paulniziolek @mgartner TFTRs -- I ended up simplifying this doc, in part to address your comments. Please have a look!
