Support utf8mb3 charset #26226

Closed
GMHDBJD opened this issue Jul 14, 2021 · 6 comments · Fixed by #44655
Labels
component/parser type/feature-request Categorizes issue or PR as related to a new feature.

Comments

@GMHDBJD
Contributor

GMHDBJD commented Jul 14, 2021

Feature Request

Is your feature request related to a problem? Please describe:

On MySQL 8.0.24+ and MariaDB 10.6.1+, create a table with the utf8 charset:

create table tb(id int) DEFAULT CHARACTER SET=utf8;

SHOW CREATE TABLE then reports the charset as utf8mb3:

show create table tb;
CREATE TABLE `tb` (
  `id` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3

TiDB doesn't support utf8mb3, so running that generated statement on TiDB fails:

CREATE TABLE `tb` (
    ->   `id` int(11) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
ERROR 1115 (42000): Unknown character set: 'utf8mb3'

Describe the feature you'd like:

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

@GMHDBJD GMHDBJD added the type/feature-request Categorizes issue or PR as related to a new feature. label Jul 14, 2021
@wojiushixiaobai

Following this issue; I ran into the same problem.

@niubell
Contributor

niubell commented Mar 14, 2022

/assign WizardXiao

@WizardXiao
Contributor

WizardXiao commented Mar 14, 2022

When using DM to sync data, you can use a binlog filter or handle-error as a workaround. You can also switch to a different charset/collation such as utf8mb4_bin (see the ref doc).
However, supporting utf8mb3 is the long-term solution.

@dveeden
Contributor

dveeden commented Aug 31, 2022

MySQL 8.0.30:

mysql> CREATE TABLE t1(id char(1) collate utf8_general_ci);
Query OK, 0 rows affected, 1 warning (0.05 sec)

mysql> CREATE TABLE t2(id char(1) collate utf8mb3_general_ci);
Query OK, 0 rows affected, 1 warning (0.05 sec)

mysql> show warnings;
+---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                                           |
+---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 3778 | 'utf8mb3_general_ci' is a collation of the deprecated character set UTF8MB3. Please consider using UTF8MB4 with an appropriate collation instead. |
+---------+------+---------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

TiDB v6.2.0:

sql> CREATE TABLE t1(id char(1) collate utf8_general_ci);
Query OK, 0 rows affected (0.1375 sec)

sql> CREATE TABLE t2(id char(1) collate utf8mb3_general_ci);
ERROR: 1273 (HY000): Unknown collation: 'utf8mb3_general_ci'

I think this should be part of #7968

@dveeden
Contributor

dveeden commented Aug 31, 2022

To me it looks like we should:

  1. Rename utf8_general_ci to utf8mb3_general_ci and the same for other 3-byte UTF-8 collations.
  2. Map utf8 to utf8mb3 in the parser, and the same for the related collations
  3. Consider sending warnings for the use of 3-byte UTF-8 character sets and collations.
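
The three steps above can be sketched as a small name-normalization function. This is only an illustration of the proposal, not TiDB's actual parser code: the function name `normalize_utf8_name` and the warning text are hypothetical, and it assumes that every `utf8` / `utf8_*` name should be canonicalized to the `utf8mb3` spelling.

```python
from typing import Optional, Tuple


def normalize_utf8_name(name: str) -> Tuple[str, Optional[str]]:
    """Canonicalize a charset or collation name (hypothetical helper).

    'utf8' and 'utf8_*' are mapped to 'utf8mb3' and 'utf8mb3_*' (steps 1 and 2).
    A warning message is returned for any 3-byte UTF-8 name (step 3).
    Other names, including 'utf8mb4', are only lowercased.
    """
    lowered = name.lower()
    if lowered == "utf8" or lowered.startswith("utf8_"):
        # Swap the 'utf8' prefix for 'utf8mb3', keeping any collation suffix.
        canonical = "utf8mb3" + lowered[len("utf8"):]
    else:
        canonical = lowered

    is_3byte = canonical == "utf8mb3" or canonical.startswith("utf8mb3_")
    # Assumed wording; the real message would be up to the implementation.
    warning = (
        f"'{canonical}' is 3-byte UTF-8; consider utf8mb4 instead"
        if is_3byte
        else None
    )
    return canonical, warning


if __name__ == "__main__":
    for example in ("utf8", "utf8_general_ci", "utf8mb3_general_ci", "utf8mb4"):
        print(example, "->", normalize_utf8_name(example))
```

With a mapping like this, `utf8_general_ci` and `utf8mb3_general_ci` would resolve to the same collation instead of one of them failing with error 1273.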

@Fanduzi

Fanduzi commented Apr 19, 2023

My current understanding is that this is a problem in the TiDB parser, which does not recognize utf8mb3. The SQL audit tool we built on top of the TiDB parser ran into the same problem. I hope TiDB can fix it!
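
Until the parser accepts utf8mb3, a tool built on it could pre-process statements with a textual rewrite. The sketch below is a stopgap, not something proposed in this thread: the helper name `rewrite_utf8mb3` is hypothetical, and it relies on `utf8` being accepted where `utf8mb3` is rejected, as the TiDB v6.2.0 output above shows for `utf8_general_ci`. It is a naive regex substitution, so it will also rewrite matches inside string literals, comments, or quoted identifiers.

```python
import re

# Match 'utf8mb3' as a whole name or as a collation prefix ('utf8mb3_*'),
# case-insensitively. The lookbehind rejects a preceding letter, digit, or
# underscore; the lookahead rejects a following letter or digit but allows '_'.
_UTF8MB3 = re.compile(r"(?<![A-Za-z0-9_])utf8mb3(?![A-Za-z0-9])", re.IGNORECASE)


def rewrite_utf8mb3(sql: str) -> str:
    """Replace utf8mb3 charset/collation names with their utf8 spelling.

    Hypothetical pre-processing step: purely textual, no SQL parsing.
    """
    return _UTF8MB3.sub("utf8", sql)


if __name__ == "__main__":
    ddl = (
        "CREATE TABLE `tb` (`id` int(11) DEFAULT NULL) "
        "ENGINE=InnoDB DEFAULT CHARSET=utf8mb3"
    )
    print(rewrite_utf8mb3(ddl))
    print(rewrite_utf8mb3("CREATE TABLE t2(id char(1) collate utf8mb3_general_ci)"))
```

Names such as `utf8mb4` are left untouched because the pattern requires the full `utf8mb3` token.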
