索引

索引概述

索引是帮助快速获取数据的数据结构(有序)
- 优缺点
  - 优点
    - 提高查询速度, 降低 IO 成本
    - 通过索引对数据进行了排序, 降低了排序的成本
  - 缺点
    - 索引需要占用额外空间
    - 为了维护索引 insert, update, delete 效率降低

索引结构

B+Tree 索引: 最常见的索引
- MySQL 索引对经典的 B+Tree 进行了优化, 增加了一个指向相邻叶子节点的链表指针, 提高了区间访问速度
Hash 索引: 基于哈希表实现, 不支持范围查询
- 只能用于对等比较(=, in), 不支持范围查询(between...and..., <, >, ...)
- 无法用于排序
- 查询效率高,通常只需要一次检索
R-Tree 索引(空间索引): MyISAM 的一个特殊索引类型, 用于地理空间数据类型
Full-Text(全文索引): 通过建立倒排索引, 快速匹配文档, 类似 Lucene, Solr, ES

索引	InnoDB	MyISAM	Memory
B+Tree	√	√	√
Hash	-	-	√
R-Tree	-	√	-
Full-Text	5.6+	√	-

索引分类

分类	关键字	描述
主键索引	primary key	默认自动创建, 唯一
唯一索引	unique	多个
常规索引	-	多个
全文索引	fulltext	多个

在 InnoDB 中, 索引又可以分为
- 聚集索引(Clustered Index): 数据与索引存储在一块, 索引结构的叶子节点保存了行数据, 必须有, 唯一
  - 选取规则:
    - 主键
    - 第一个 unique
    - 自动生成 rowid
- 二级索引(Secondary Index): 数据与索引分开存储, 索引结构的叶子节点关联的是对应的主键, 多个

索引语法

-- 准备测试数据
drop database if exists `java_basic`;
create database if not exists `java_basic` charset utf8mb4;
use `java_basic`;
drop table if exists `student`;
create table if not exists `student`
(
    `id`         int auto_increment primary key not null comment '编号',
    `name`       varchar(32)                    not null comment '姓名',
    `phone`      varchar(16)                    not null comment '电话',
    `email`      varchar(32)                    not null comment '邮箱',
    `gender`     tinyint unsigned               not null comment '性别',
    `age`        tinyint unsigned               not null comment '年龄',
    `entry_time` datetime                       not null comment '时间'
) comment '学生表';

drop procedure if exists add_student;
create procedure add_student(in num int)
begin
    declare i int default 0;
    declare id varchar(8);
    declare name varchar(32);
    declare phone varchar(11);
    declare email varchar(32);
    declare gender tinyint;
    declare age tinyint;
    declare time datetime;

    start transaction;

    select ifnull(max(`id`) + 1, 0)
    into i
    from `student`;
    set
        num := num + i;
    while
        i != num
        do
            set i := i + 1;
            set id := lpad(i, 8, 0);
            set name := concat('N', id);
            set phone := concat('123', id);
            set email := concat(name, '@school.edu');
            set gender := if(rand() < 0.5, 1, 2);
            set age := 16 + rand() * 8;
            set time = date_add(date_add(date_add(now(),
                                                  interval rand() * -16 year)
                                    , interval rand() * -12 month),
                                interval rand() * -31 day);
            insert into `student` (`name`, `phone`, `email`, `gender`, `age`, `entry_time`)
            values (name, phone, email, gender, age, time);
            if
                i % 100000 = 0 then
                select i as '当前', num as '总数';
            end if;
        end while;
    commit;
end;

truncate table `student`;
call add_student(10);

alter table `student` rename to `stu`;

select *
from `student`;

show index from `student`;
create index `idx_student_name` on `student` (`name`);
create unique index `idx_student_phone` on `student` (`phone`);
create index `idx_student_email_gender_age` on `student` (`email`, `gender`, `age`);

create unique index `idx_student_email` on `student` (`email`);
drop index `idx_student_email` on `student`;

性能分析

查询统计信息

show global status like 'Com_______';

慢查询日志

show variables like 'slow_query_log';
show variables like 'long_query_time';
set @@long_query_time = 10;

vim C:\ProgramData\MySQL\MySQL Server 8.0\my.ini
# slow_query_log=1 # 开启慢查询日期
# long_query_time=2; # 超时时间为2秒

profile

show variables like 'have_profiling';
show variables like 'profiling';
set @@profiling = 1;
show profiles;
show profile for query 10;
show profile cpu for query 10;

explain
- id: 指示操作表的顺序, 相同顺序执行, 不同值越大越先执行
- select_type:
  - simple
  - primary
  - union
  - subquery
- type, 性能有好到差为 null, system, const, eq_ref, ref, range, index, all
- possible_key: 显示可能应用在这张表上的索引
- key: 实际用的索引
- key_len: 索引使用字节数, 为索引字段最大可能长度, 不是实际长度, 在不损失精度的情况下, 越短越好
- rows: MySQL 认为必须要执行查询的行数, 在 InnoDB 引擎的表中, 只是一个预估值
- filtered: 表示返回结果的行数占需读取行数的百分比, 越大越好
- Extra: 额外信息

explain
select *
from `student`
where `id` > 4;

使用规则

最左前缀法则
- 如果索引了多列(联合索引), 要遵守最左前缀法则, 查询从索引的最左列开始, 不跳过索引中的列
- 如果跳过了某一列, 索引将部分失效
- 只需要包含该条件就行, 不要求条件在 where 中的先后顺序
- 在联合索引中, 出现范围查询(<, >), 范围查询的右侧将失效

-- key_len 132
explain
select *
from `student`
where `email` = 'student0020@school.edu'
  and `gender` >= 1
  and `age` >= 18;

-- key_len 130
explain
select *
from `student`
where `email` = 'student0020@school.edu';

-- key_len 131
explain
select *
from `student`
where `email` = 'student0020@school.edu'
  and `gender` >= 1;

-- key_len 130
explain
select *
from `student`
where `email` = 'student0020@school.edu'
  and `age` >= 18;

-- 范围查询
-- key_len 131
explain
select *
from `student`
where `email` = 'student0020@school.edu'
  and `gender` > 0
  and `age` >= 18;

索引失效

索引列运算
- 索引列上使用运算操作, 索引将失效
字符串未加引号
- 字符串查询时未加引号, 索引将失效
模糊匹配
- 头部模糊匹配, 索引将失效
or 连接条件
- 用 or 分开的条件任意一侧条件中的列没有索引时, 索引将失效
数据分布影响
- MySQL 评估使用索引比全表扫描还慢, 索引将失效
- is null 和 is not null 是否使用索引也取决于数据分布情况

-- key_len 66
explain
select *
from `student`
where `phone` = '12300000012';

-- 索引列运算
-- key_len NULL
explain
select *
from `student`
where substr(`phone`, 10, 2) = '12';

-- 字符串未加引号
-- key_len NULL
explain
select *
from `student`
where `phone` = 12300000012;

-- 模糊匹配
-- key_len 66
explain
select *
from `student`
where `phone` like '1230000000%';

-- key_len NULL
explain
select *
from `student`
where `phone` like '%2';

-- or
-- key_len NULL
explain
select *
from `student`
where `id` = 2
   or `age` > 18;

-- key_len 4, 66
explain
select *
from `student`
where `id` = 2
   or `phone` = '12300000001';

-- 数据分布影响
-- key_len NULL
explain
select *
from `student`
where `phone` >= '12300000000';

-- key_len 66
explain
select *
from `student`
where `phone` >= '12300000015';

索引使用

SQL 提示
- 数据库优化的重要手段
- use index: 建议使用索引
- ignore index: 忽略索引
- force index: 强制使用索引
覆盖索引
- 尽量使用覆盖索引( 查询使用了索引, 并且需要返回的列, 在该索引中已经全部能够找到), 减少使用 select *
- Extra
  - Using where; Using index
    - 查找使用了索引, 但是所需的数据都在索引列中能找到, 不需要回表查询数据
  - Using index condition
    - 查找使用了索引, 但需要回表查询数据
前缀索引
- 将字符串的一部分前缀建立索引, 大大节约索引空间, 提高检索效率
单列索引与联合索引
- 如果存在多个查询条件, 考虑针对查询字典简历索引时, 建议建立联合索引

-- SQL提示
create unique index `idx_student_email` on `student` (`email`);

-- possible_keys idx_student_email,idx_student_email_gender_age
explain
select *
from `student`
where `email` = 'student0002@school.edu';

-- possible_keys idx_student_email_gender_age
explain
select *
from `student`
         use index (`idx_student_email_gender_age`)
where `email` = 'student0002@school.edu';

-- possible_keys idx_student_email_gender_age
explain
select *
from `student`
         ignore index (`idx_student_email`)
where `email` = 'student0002@school.edu';

-- 覆盖索引
drop index `idx_student_email` on `student`;

-- Extra Using where; Using index
explain
select `id`, `email`, `gender`, `age`
from `student`
where `email` = 'student0012@school.edu'
  and `gender` >= 1
  and `age` >= 18;

-- Extra Using index condition
explain
select `id`, `name`, `email`, `gender`, `age`
from `student`
where `email` = 'student0012@school.edu'
  and `age` >= 18;

select email
from student;

select count(distinct substr(`email`, 1, 11)) / count(*)
from `student`;

create index `idx_student_email` on `student` (`email`(11));
show index from `student`;

explain
select *
from `student`
         use index (idx_student_email)
where `email` = 'student0012@school.edu'

索引设计原则

针对数据量较大的表, 且查询频繁的表
针对常作为查询条件(where), 排序(order by), 分组(group by)操作的字段建立索引
尽量选择区分度高的列作为索引, 区分度越高, 使用索引的效率越高
如果是字符串类型的字段, 字段的长度较长, 可以针对与字段的特点, 建立前缀索引
尽量使用联合索引, 减少单列索引, 查询时, 联合索引可以覆盖索引, 节省存储空间, 避免回表查询, 提高查询效率
要控制索引的数量, 不是越多越好, 索引越多, 维护索引结构的代价就越大, 会影响增删改效率
如果列不能存储 NULL 指, 务必使用 not null 约束, 当优化器知道每列是否包含 NULL 值时, 它可以更好地确定哪个索引最有效地用于查询

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!