Fulltext case-sensitive index behavior #1145

Open
gar1t opened this issue Aug 21, 2024 · 1 comment

gar1t commented Aug 21, 2024

In Quick Start, the query:

SELECT 
  ts,
  api_path,
  log
FROM
  app_logs
WHERE
  matches(log, 'timeout');

shows results that are case-sensitive:

+---------------------+------------------+--------------------+
| ts                  | api_path         | log                |
+---------------------+------------------+--------------------+
| 2024-07-11 20:00:10 | /api/v1/billings | Connection timeout |
| 2024-07-11 20:00:10 | /api/v1/resource | Connection timeout |
+---------------------+------------------+--------------------+
2 rows in set (0.01 sec)

However, the table definition is:

Create Table: CREATE TABLE IF NOT EXISTS `app_logs` (
...
`log` STRING NULL FULLTEXT WITH(analyzer = 'English', case_sensitive = 'false'),
...)

The docs for CREATE indicate that the default value of case_sensitive for FULLTEXT is true. Based on what I see after following Quick Start, the default is false.

In any event, the query behavior is case sensitive.

Issues as I see them:

  • Possible error in either docs or implementation for default value of case_sensitive for fulltext index
  • Case-sensitive match behavior when schema shows case_sensitive to be false

zhongzc (Contributor) commented Aug 21, 2024

Thank you for your thorough review; the issue does indeed exist.

The cause is that matches is evaluated separately in the frontend and in the datanode. The datanode does respect the case_sensitive configuration, but the frontend part has not been completed yet (see the TODO): https://github.com/GreptimeTeam/greptimedb/blob/9c1704d4cbbfab8af07a77da598a1cfe2a5e7b22/src/common/function/src/scalars/matches.rs#L75-L95. As it stands, the implementation is case-sensitive.
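
To make the mismatch concrete, here is a minimal Python sketch. It is not GreptimeDB code: the function names `datanode_matches` and `frontend_matches` and the sample log lines are hypothetical, and plain substring search stands in for real full-text analysis. It only models the situation described above, where one evaluation path honors `case_sensitive` and the other ignores it.

```python
def datanode_matches(text: str, term: str, case_sensitive: bool) -> bool:
    """Hypothetical model of a path that honors the index option."""
    if case_sensitive:
        return term in text
    return term.casefold() in text.casefold()


def frontend_matches(text: str, term: str) -> bool:
    """Hypothetical model of a path that ignores the option (always case-sensitive)."""
    return term in text


# Made-up sample rows, not taken from the Quick Start data set.
logs = ["Connection timeout", "Request Timeout exceeded"]

# With case_sensitive = false, the two paths disagree on the second row.
honoring = [line for line in logs if datanode_matches(line, "timeout", case_sensitive=False)]
ignoring = [line for line in logs if frontend_matches(line, "timeout")]
```

In this model `honoring` keeps both rows while `ignoring` keeps only the lowercase one, which is the kind of inconsistency the issue reports.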

Therefore, until this work is completed, I think we can keep things consistent in one of two ways: hardcode this configuration to true and make it unchangeable, or hardcode it to false and change https://github.com/GreptimeTeam/greptimedb/blob/9c1704d4cbbfab8af07a77da598a1cfe2a5e7b22/src/common/function/src/scalars/matches.rs#L205 to use ilike. The second option would be more practical.
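
For readers unfamiliar with the distinction, the following Python sketch illustrates how `like` and `ilike` differ. It assumes the linked line currently builds a LIKE-style comparison, which the suggestion to switch to ilike implies but this thread does not show. The helpers `like_to_regex`, `like`, and `ilike` are hypothetical illustrations and deliberately ignore escape characters.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern (% and _ wildcards, no escape handling) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def like(text: str, pattern: str) -> bool:
    """Case-sensitive match of the whole string against the pattern."""
    return re.fullmatch(like_to_regex(pattern), text, flags=re.DOTALL) is not None


def ilike(text: str, pattern: str) -> bool:
    """Same as like(), but ignoring letter case."""
    flags = re.DOTALL | re.IGNORECASE
    return re.fullmatch(like_to_regex(pattern), text, flags=flags) is not None
```

With these helpers, `like("Connection Timeout", "%timeout%")` is false while `ilike("Connection Timeout", "%timeout%")` is true, which is the behavior a schema with case_sensitive = 'false' would lead a user to expect.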

In any case, it was indeed an oversight, and I will arrange for a prompt fix.

cc @waynexia
