Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table broken after running incorrect alter statement (standalone mode) ? #2425

Closed
NiwakaDev opened this issue Sep 18, 2023 · 6 comments · Fixed by #2437
Closed

Table broken after running incorrect alter statement (standalone mode) ? #2425

NiwakaDev opened this issue Sep 18, 2023 · 6 comments · Fixed by #2437
Assignees
Labels
C-bug Category Bugs

Comments

@NiwakaDev
Copy link
Collaborator

NiwakaDev commented Sep 18, 2023

What type of bug is this?

Unexpected error

What subsystems are affected?

Standalone mode (latest commit: e7e254c)

What happened?

Table is broken after running incorrect alter statement. (standalone mode)

You can reproduce this issue by following steps:

  1. Create a table like this:
create table table_name(value_double double not null, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP TIME INDEX);
  1. Run incorrect alter table:
alter table table_name add column host string not null;

This should fail since we can't add a new_column which doesn't allow null and have default value. So, this is no problem.

Once we run the above steps, we can't execute select statements like select * from table_name. (maybe any statement for this table)

You can any statements which aren't related to this table, like so:

show databases;

What operating system did you use?

Mac OS (Ventura 13.3) (Maybe any OS)

Relevant log output and stack trace

Here's a history of mysql client.


mysql> create table table_name(value_double double not null, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP TIME INDEX);
Query OK, 0 rows affected (0.01 sec)

mysql> select * from table_name;
Empty set (0.00 sec)

mysql> alter table table_name add column host string not null;
ERROR 1815 (HY000): Failed to execute query, source: Failed to execute ddl, source: Failed to wait procedure done, source: Failed to execute procedure, source: Failed to execute procedure due to external error, source: Failed to operate on datanode: peer-0(), source: Failed to handle request for region 4402341478400(1025, 0), source: Invalid region request, region_id: 4402341478400(1025, 0), err: no default value for column host, location: src/store-api/src/region_request.rs:323:9, location: src/mito2/src/worker/handle_alter.
mysql> select * from table_name;
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id:    8
Current database: public

ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id:    8
Current database: public

ERROR 2013 (HY000): Lost connection to MySQL server during query

By the way, after closing mysql client, it seems like there's the same problem.

How can we reproduce the bug?

  1. Create a table like this:
create table table_name(value_double double not null, ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP TIME INDEX);
  1. Run incorrect alter table:
alter table table_name add column host string not null;

This should fail since we can't add a new_column which doesn't allow null and have default value.

Once we run the above steps, we can't execute select statements like select * from table_name. (maybe any statement for this table)

@NiwakaDev NiwakaDev added the C-bug Category Bugs label Sep 18, 2023
@NiwakaDev
Copy link
Collaborator Author

I don't know whether or not this issue depends on only my local environment.

@WenyXu
Copy link
Member

WenyXu commented Sep 18, 2023

I don't know whether or not this issue depends on only my local environment.

It seems like the verify_alter didn't work as expected.

@killme2008
Copy link
Contributor

I don't know whether or not this issue depends on only my local environment.

It seems like the verify_alter didn't work as expected.

The problem is that the table was broken after this operation. Let's take a look.

@evenyag
Copy link
Contributor

evenyag commented Sep 18, 2023

@WenyXu I think we should submit the alter requests first and then update the metadata and invalidate caches. Now we always update the metadata first.

AlterTableState::UpdateMetadata => self.on_update_metadata().await,
AlterTableState::InvalidateTableCache => self.on_broadcast().await,
AlterTableState::SubmitAlterRegionRequests => self.submit_alter_region_requests().await,

Users should not see the new schema until all regions are altered. Once a region returns an invalid argument error, we should abort the procedure.

But we still need to fix verify_after() and checks whether the column is nullable.

@evenyag
Copy link
Contributor

evenyag commented Sep 18, 2023

TableInfo and RegionMetadata are two different structs. We should find a way to reuse some request structs and validators. Now we need to write similar validating methods twice.

@NiwakaDev
Copy link
Collaborator Author

It seems like the verify_alter didn't work as expected.

I think we should submit the alter requests first and then update the metadata and invalidate caches. Now we always update the metadata first.

I see. I learned a lot of things from your comments about the procedure systems. Thank you!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category Bugs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants