Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(rfcs): multi-dimension partition rule #3350

Merged
merged 7 commits into from
Mar 4, 2024

Conversation

waynexia
Copy link
Member

@waynexia waynexia commented Feb 21, 2024

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

Propose a new region partition scheme.

🖥️ rendered

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

Refer to a related PR or issue link (optional)

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Feb 21, 2024
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
@MichaelScofield
Copy link
Collaborator

As to this syntax:

PARTITION ON COLUMNS (c, b, a) (
  a < 10,
  10 >= a AND a < 20,
  20 >= a AND b < 100,
  20 >= a AND b > 100
)

Do we have to ensure it's legit in that the partition columns do divide the value space correctly? For example, a range is missing in this partition rule:

PARTITION ON COLUMNS (c, b, a) (
  a < 10,
  10 >= a AND a < 20,
  20 >= a AND b < 1,
# 20 >= a AND b >= 1 AND b <= 100 is missing
  20 >= a AND b > 100
)

If we do, I'm afraid it's not an easy task to do, especially facing multiple partition columns and partition data types (numbers, chars and dates).
(I think it's one reason mysql choose to only use "less than" in its partition rule, lol.)
If we don't, then we either have to fill the missing ranges when creating the table (still difficult because we first need to find then) or throw errors at data insertion time. Neither is good.

@waynexia
Copy link
Member Author

If we don't, then we either have to fill the missing ranges when creating the table (still difficult because we first need to find then) or throw errors at data insertion time. Neither is good.

I'm thinking have a "default" region which is for all the remaining. It works as there is no partition rule.

@MichaelScofield
Copy link
Collaborator

Thinking of the flexibility of our repartitioning in the future, a "default" region is acceptable. Just how to define the syntax?

@waynexia
Copy link
Member Author

Thinking of the flexibility of our repartitioning in the future, a "default" region is acceptable. Just how to define the syntax?

No need to specify. It exists by default

@MichaelScofield
Copy link
Collaborator

Why it's existed by default? Which region to put the data in range "20 >= a AND b >= 1 AND b <= 100 in the following partition rule?

PARTITION ON COLUMNS (c, b, a) (
  a < 10,
  10 >= a AND a < 20,
  20 >= a AND b < 1,
  20 >= a AND b > 100
)

@waynexia
Copy link
Member Author

Which region to put the data in range "20 >= a AND b >= 1 AND b <= 100 in the following partition rule?

The default one.

Why it's existed by default?

For simplicity. This is something like a switch ... case ... grammar. Has a default branch for all the exceptions. We can remove it (internally) if the provided rule set is complete.

@killme2008
Copy link
Contributor

Thinking of the flexibility of our repartitioning in the future, a "default" region is acceptable. Just how to define the syntax?

No need to specify. It exists by default

But we don't provide a default region currently, am I right?

I think it's better to add it to the RFC too.

Copy link
Contributor

@killme2008 killme2008 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost LGTM

Signed-off-by: Ruihang Xia <[email protected]>
@waynexia
Copy link
Member Author

I think it's better to add it to the RFC too.

Updated. PTAL @MichaelScofield

Signed-off-by: Ruihang Xia <[email protected]>
@MichaelScofield
Copy link
Collaborator

Which region to put the data in range "20 >= a AND b >= 1 AND b <= 100 in the following partition rule?

The default one.

Why it's existed by default?

For simplicity. This is something like a switch ... case ... grammar. Has a default branch for all the exceptions. We can remove it (internally) if the provided rule set is complete.

But if the default region definition doesn't show up in the partition rule, it's a little surprise to user. If the default region is some region in the partition rule, for example, the first one, then it's against the intuition that the region can only contain the data with the range in it.

@waynexia
Copy link
Member Author

But if the default region definition doesn't show up in the partition rule, it's a little surprise to user. If the default region is some region in the partition rule, for example, the first one, then it's against the intuition that the region can only contain the data with the range in it.

No, this is not how the default rule works and for. It's a conventional setting for usability. Or would an explicit declaration be no such surprise? Like

PARTITION ON COLUMNS (c, b, a) (
  a < 10,
  10 >= a AND a < 20,
  20 >= a AND b < 1,
# 20 >= a AND b >= 1 AND b <= 100 is missing
  DEFAULT,
  20 >= a AND b > 100
)

If the DEFALUT is missing there won't be a default region. Only if it is present, no matter the place, will enable this default region for this table.

@killme2008
Copy link
Contributor

@MichaelScofield PTAL

@waynexia waynexia added this pull request to the merge queue Mar 4, 2024
Merged via the queue into GreptimeTeam:main with commit ae2c18e Mar 4, 2024
12 checks passed
@waynexia waynexia deleted the rfc-partition-rule branch March 4, 2024 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required This change does not impact docs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants