docs(rfcs): multi-dimension partition rule #3350

waynexia · 2024-02-21T13:24:49Z

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

Propose a new region partition scheme.

🖥️ rendered

Checklist

I have written the necessary rustdoc comments.
I have added the necessary unit tests and integration tests.
This PR does not require documentation updates.

Refer to a related PR or issue link (optional)

Signed-off-by: Ruihang Xia <[email protected]>

MichaelScofield · 2024-02-22T11:46:20Z

As to this syntax:

PARTITION ON COLUMNS (c, b, a) (
  a < 10,
  a >= 10 AND a < 20,
  a >= 20 AND b < 100,
  a >= 20 AND b > 100
)

Do we have to validate that the partition expressions divide the value space completely? For example, a range is missing in this partition rule:

PARTITION ON COLUMNS (c, b, a) (
  a < 10,
  a >= 10 AND a < 20,
  a >= 20 AND b < 1,
# a >= 20 AND b >= 1 AND b <= 100 is missing
  a >= 20 AND b > 100
)

If we do, I'm afraid it's not an easy task, especially with multiple partition columns and different partition data types (numbers, chars and dates).
(I think that's one reason MySQL chose to only use "less than" in its partition rule, lol.)
If we don't, then we either have to fill the missing ranges when creating the table (still difficult, because we first need to find them) or throw errors at data insertion time. Neither is good.
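
To make the gap concrete, here is a minimal brute-force sketch that samples integer points and reports those no expression matches. The predicate encoding, the `find_uncovered` helper, and the sample values are all hypothetical, and the lower bounds are read as a >= 10 / a >= 20, which is an assumption about what the expressions intend. Sampling like this only works for small discrete domains, which is part of why a general check across numbers, chars and dates is hard.

```python
from itertools import product

# Hypothetical encoding of the rule above as predicates over (a, b).
# Lower bounds are read as a >= 10 / a >= 20 (an assumption about intent).
RULES = [
    lambda a, b: a < 10,
    lambda a, b: a >= 10 and a < 20,
    lambda a, b: a >= 20 and b < 1,
    lambda a, b: a >= 20 and b > 100,
]


def find_uncovered(rules, a_values, b_values):
    """Return the sampled points that no rule matches (a spot check, not a proof)."""
    return [
        (a, b)
        for a, b in product(a_values, b_values)
        if not any(rule(a, b) for rule in rules)
    ]


# Sample a few values on both sides of every boundary in the rule.
gaps = find_uncovered(RULES, range(0, 31, 5), [0, 1, 50, 100, 101])
print(gaps)
```

Every reported point has a >= 20 and 1 <= b <= 100, which is exactly the range marked as missing in the example.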

waynexia · 2024-02-22T11:50:47Z

If we don't, then we either have to fill the missing ranges when creating the table (still difficult, because we first need to find them) or throw errors at data insertion time. Neither is good.

I'm thinking of having a "default" region that holds all the remaining data. It works as if there were no partition rule.

MichaelScofield · 2024-02-22T12:20:16Z

Considering the flexibility we want for repartitioning in the future, a "default" region is acceptable. But how would we define the syntax?

waynexia · 2024-02-22T12:23:39Z

Considering the flexibility we want for repartitioning in the future, a "default" region is acceptable. But how would we define the syntax?

No need to specify it. It exists by default.

MichaelScofield · 2024-02-23T01:33:39Z

Why does it exist by default? Which region holds the data in the range "a >= 20 AND b >= 1 AND b <= 100" under the following partition rule?

PARTITION ON COLUMNS (c, b, a) (
  a < 10,
  a >= 10 AND a < 20,
  a >= 20 AND b < 1,
  a >= 20 AND b > 100
)

waynexia · 2024-02-23T06:39:38Z

Which region holds the data in the range "a >= 20 AND b >= 1 AND b <= 100" under the following partition rule?

The default one.

Why does it exist by default?

For simplicity. This is something like a switch ... case ... statement, which has a default branch for everything the cases don't cover. We can remove it (internally) if the provided rule set is complete.
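
As a rough illustration of the switch ... case ... analogy, the sketch below routes a row to the first expression that matches and falls through to an implicit default region otherwise. The `route` function, the predicate encoding, and the "default" label are hypothetical and are not taken from the RFC or from GreptimeDB's code.

```python
def route(rules, a, b):
    """Return the position of the first matching rule, else the implicit default."""
    for region, rule in enumerate(rules):
        if rule(a, b):
            return region
    return "default"  # plays the role of the `default:` branch in a switch


# Hypothetical predicates; lower bounds read as a >= 10 / a >= 20 (an assumption).
rules = [
    lambda a, b: a < 10,
    lambda a, b: a >= 10 and a < 20,
    lambda a, b: a >= 20 and b < 1,
    lambda a, b: a >= 20 and b > 100,
]

print(route(rules, 15, 0))   # matched by the second expression
print(route(rules, 25, 50))  # matched by none, so it falls through
```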

killme2008 · 2024-02-26T03:27:19Z

Considering the flexibility we want for repartitioning in the future, a "default" region is acceptable. But how would we define the syntax?

No need to specify it. It exists by default.

But we don't provide a default region currently, am I right?

I think it's better to add it to the RFC too.

killme2008

Almost LGTM

Signed-off-by: Ruihang Xia <[email protected]>

waynexia · 2024-02-26T06:54:46Z

I think it's better to add it to the RFC too.

Updated. PTAL @MichaelScofield

Signed-off-by: Ruihang Xia <[email protected]>

MichaelScofield · 2024-02-26T08:29:14Z

Which region holds the data in the range "a >= 20 AND b >= 1 AND b <= 100" under the following partition rule?

The default one.

Why does it exist by default?

For simplicity. This is something like a switch ... case ... statement, which has a default branch for everything the cases don't cover. We can remove it (internally) if the provided rule set is complete.

But if the default region's definition doesn't show up in the partition rule, it will be a bit surprising to users. And if the default region is one of the regions in the partition rule, for example the first one, then it goes against the intuition that a region contains only the data within its declared range.

waynexia · 2024-02-26T08:39:23Z

But if the default region's definition doesn't show up in the partition rule, it will be a bit surprising to users. And if the default region is one of the regions in the partition rule, for example the first one, then it goes against the intuition that a region contains only the data within its declared range.

No, that is not how the default rule works or what it is for. It's a convention for usability. Or would an explicit declaration be less surprising? For example:

PARTITION ON COLUMNS (c, b, a) (
  a < 10,
  a >= 10 AND a < 20,
  a >= 20 AND b < 1,
# a >= 20 AND b >= 1 AND b <= 100 is missing
  DEFAULT,
  20 >= a AND b > 100
)

If DEFAULT is missing, there won't be a default region. The default region is enabled for the table only if DEFAULT is present, regardless of its position in the list.
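
A minimal sketch of how this opt-in behaviour could work, assuming region ids are simply positions in the list and that an unmatched row is rejected when DEFAULT is absent. The `DEFAULT` sentinel, `build_router`, and the error type are all hypothetical names for illustration, not part of the proposal.

```python
DEFAULT = object()  # hypothetical stand-in for the DEFAULT keyword


def build_router(entries):
    """Build a first-match router; region ids are list positions (an assumption)."""
    # The default region exists only if DEFAULT appears somewhere in the list.
    default_region = next(
        (i for i, entry in enumerate(entries) if entry is DEFAULT), None
    )

    def route(a, b):
        for region, entry in enumerate(entries):
            if entry is not DEFAULT and entry(a, b):
                return region
        if default_region is None:
            raise ValueError(f"no partition matches a={a}, b={b}")
        return default_region

    return route


# Lower bounds are read as a >= 10 / a >= 20 (an assumption about intent).
exprs = [
    lambda a, b: a < 10,
    lambda a, b: a >= 10 and a < 20,
    lambda a, b: a >= 20 and b < 1,
    DEFAULT,
    lambda a, b: a >= 20 and b > 100,
]
with_default = build_router(exprs)
without_default = build_router([e for e in exprs if e is not DEFAULT])
```

With this setup, `with_default(25, 50)` lands in the region declared by DEFAULT, while `without_default(25, 50)` raises, which corresponds to the "throw errors at data insertion time" option raised earlier in the thread.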

killme2008 · 2024-03-04T08:08:01Z

@MichaelScofield PTAL

waynexia added 2 commits February 21, 2024 21:14

docs(rfcs): multi-dimension partition rule

5541fa9

Signed-off-by: Ruihang Xia <[email protected]>

change math block type

d5f72aa

Signed-off-by: Ruihang Xia <[email protected]>

github-actions bot added the docs-not-required label Feb 21, 2024

waynexia added 2 commits February 21, 2024 21:25

fix typo

bb6d78a

Signed-off-by: Ruihang Xia <[email protected]>

update tracking issue

726c93d

Signed-off-by: Ruihang Xia <[email protected]>

This was referenced Feb 21, 2024

Tracking issue for new region partition rule #3351

Closed

ci: align docs workflow jobs with develop.yml #3356

Merged

Merge branch 'main' into rfc-partition-rule

7d86fc8

killme2008 approved these changes Feb 26, 2024

update discussion

cd23c5d

Signed-off-by: Ruihang Xia <[email protected]>

fix typo

3af56f0

Signed-off-by: Ruihang Xia <[email protected]>

waynexia enabled auto-merge February 26, 2024 11:33

killme2008 requested a review from MichaelScofield March 4, 2024 08:07

MichaelScofield approved these changes Mar 4, 2024

waynexia added this pull request to the merge queue Mar 4, 2024

Merged via the queue into GreptimeTeam:main with commit ae2c18e Mar 4, 2024
12 checks passed

waynexia deleted the rfc-partition-rule branch March 4, 2024 08:22
