Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some queries from Web interface aren't using indexes requiring too much time on v3.1.3 #580

Open
mchehab opened this issue Jan 24, 2024 · 5 comments
Labels
database Issues with the database or database migrations help wanted

Comments

@mchehab
Copy link
Contributor

mchehab commented Jan 24, 2024

This query: state=*&archive=true
Is producing a complex sql statement that it is not using indexes and are taking a long time to complete:

explain SELECT (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=1), 0)) AS `tag_1_count`, (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=2), 0)) AS `tag_2_count`, (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=3), 0)) AS `tag_3_count`, (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=4), 0)) AS `tag_4_count`, (coalesce((SELECT count FROM patchwork_patchtag WHERE patchwork_patchtag.patch_id=patchwork_patch.id AND patchwork_patchtag.tag_id=5), 0)) AS `tag_5_count`, `patchwork_patch`.`id`, `patchwork_patch`.`msgid`, `patchwork_patch`.`date`, `patchwork_patch`.`submitter_id`, `patchwork_patch`.`project_id`, `patchwork_patch`.`name`, `patchwork_patch`.`delegate_id`, `patchwork_patch`.`state_id`, `patchwork_patch`.`series_id`, `patchwork_person`.`id`, `patchwork_person`.`email`, `patchwork_person`.`name`, `patchwork_person`.`user_id`, `auth_user`.`id`, `auth_user`.`password`, `auth_user`.`last_login`, `auth_user`.`is_superuser`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`date_joined`, `patchwork_state`.`id`, `patchwork_state`.`name`, `patchwork_state`.`slug`, `patchwork_state`.`ordering`, `patchwork_state`.`action_required`, `patchwork_series`.`id`, `patchwork_series`.`name` FROM `patchwork_patch` INNER JOIN `patchwork_person` ON (`patchwork_patch`.`submitter_id` = `patchwork_person`.`id`) LEFT OUTER JOIN `auth_user` ON (`patchwork_patch`.`delegate_id` = `auth_user`.`id`) LEFT OUTER JOIN `patchwork_state` ON (`patchwork_patch`.`state_id` = `patchwork_state`.`id`) LEFT OUTER JOIN `patchwork_series` ON (`patchwork_patch`.`series_id` = `patchwork_series`.`id`) WHERE (`patchwork_patch`.`project_id` = 1 AND `patchwork_patch`.`archived`) ORDER BY `patchwork_patch`.`date` DESC LIMIT 1000;
+------+--------------------+--------------------+--------+----------------------------------------------------------------------+--------------------------+---------+----------------------------------------+-------+-----------------------------+
| id   | select_type        | table              | type   | possible_keys                                                        | key                      | key_len | ref                                    | rows  | Extra                       |
+------+--------------------+--------------------+--------+----------------------------------------------------------------------+--------------------------+---------+----------------------------------------+-------+-----------------------------+
|    1 | PRIMARY            | patchwork_patch    | ref    | patchwork_patch_499df97c,patchwork_patch_1a37f020,patch_covering_idx | patchwork_patch_499df97c | 4       | const                                  | 30607 | Using where; Using filesort |
|    1 | PRIMARY            | patchwork_person   | eq_ref | PRIMARY                                                              | PRIMARY                  | 4       | patchwork.patchwork_patch.submitter_id | 1     |                             |
|    1 | PRIMARY            | auth_user          | eq_ref | PRIMARY                                                              | PRIMARY                  | 4       | patchwork.patchwork_patch.delegate_id  | 1     | Using where                 |
|    1 | PRIMARY            | patchwork_state    | eq_ref | PRIMARY                                                              | PRIMARY                  | 4       | patchwork.patchwork_patch.state_id     | 1     | Using where                 |
|    1 | PRIMARY            | patchwork_series   | eq_ref | PRIMARY                                                              | PRIMARY                  | 4       | patchwork.patchwork_patch.series_id    | 1     | Using where                 |
|    6 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
|    5 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
|    4 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
|    3 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
|    2 | DEPENDENT SUBQUERY | patchwork_patchtag | eq_ref | patch_id                                                             | patch_id                 | 8       | patchwork.patchwork_patch.id,const     | 1     |                             |
+------+--------------------+--------------------+--------+----------------------------------------------------------------------+--------------------------+---------+----------------------------------------+-------+-----------------------------+

As reported by mysql slow log, it takes more than 1:30 mins to complete:

# Thread_id: 5213  Schema: patchwork  QC_hit: No
# Query_time: 92.553408  Lock_time: 0.000355  Rows_sent: 1000  Rows_examined: 101124
# Rows_affected: 0  Bytes_sent: 367857

Issue noticed at https://patchwork.linuxtv.org.

@mchehab
Copy link
Contributor Author

mchehab commented Jan 24, 2024

Analyze results for such query:

analyze.json

@stephenfin stephenfin added database Issues with the database or database migrations help wanted labels Jan 24, 2024
@stephenfin
Copy link
Member

This is likely above my paygrade 😅 I can reproduce on the linuxtv instance but have yet to do so locally (and even then, I'm not certain how I'd address it yet...)

@mchehab
Copy link
Contributor Author

mchehab commented Jan 24, 2024

This is likely above my paygrade 😅 I can reproduce on the linuxtv instance but have yet to do so locally (and even then, I'm not certain how I'd address it yet...)

Yeah, this one seems tricky, and it generates 3 sub-queries that don't use indexes. Btw, if the query keeps having the blank fields there, like ?q=&series=&.., the number of sub-queries without indexes rise to 4.

Funny enough, a query like ?archive=false is fast. I suspect that the problem here is how patchwork/Django converts state=*. into a query.

I mean, running:

analyze select state_id, count(*) from patchwork_patch where NOT archived GROUP BY state_id;
+------+-------------+-----------------+------+--------------------+--------------------+---------+-------+-------+----------+----------+------------+--------------------------+
| id   | select_type | table           | type | possible_keys      | key                | key_len | ref   | rows  | r_rows   | filtered | r_filtered | Extra                    |
+------+-------------+-----------------+------+--------------------+--------------------+---------+-------+-------+----------+----------+------------+--------------------------+
|    1 | SIMPLE      | patchwork_patch | ref  | patch_covering_idx | patch_covering_idx | 1       | const | 30608 | 44236.00 |   100.00 |     100.00 | Using where; Using index |
+------+-------------+-----------------+------+--------------------+--------------------+---------+-------+-------+----------+----------+------------+--------------------------+
1 row in set (0.047 sec)

is really fast. So, IMO, the problem is how patchwork currently handle "state=*", creating this really complex query, while something a lot simpler would produce the same result while using indexes.

@mchehab
Copy link
Contributor Author

mchehab commented Jan 30, 2024

Looking at the code, it sounds it comes from here:

class PatchQuerySet(models.query.QuerySet):
    def with_tag_counts(self, project=None):
        if project and not project.use_tags:
            return self

        # We need the project's use_tags field loaded for Project.tags().
        # Using prefetch_related means we'll share the one instance of
        # Project, and share the project.tags cache between all patch.project
        # references.
        qs = self.prefetch_related('project')
        select = OrderedDict()
        select_params = []

        # All projects have the same tags, so we're good to go here
        if project:
            tags = project.tags
        else:
            tags = Tag.objects.all()

        for tag in tags:
            select[tag.attr_name] = (
                "coalesce("
                "(SELECT count FROM patchwork_patchtag"
                " WHERE patchwork_patchtag.patch_id=patchwork_patch.id"
                " AND patchwork_patchtag.tag_id=%s), 0)"
            )
            select_params.append(tag.id)

        return qs.extra(select=select, select_params=select_params)

No idea why it is trying to count patches there via such complex query, instead of just doing the query.

@BtbN
Copy link

BtbN commented Nov 14, 2024

I also ran into this while upgrading our FFmpeg instance from 3.0 to 3.2.
It got so bad the instance ran out of request workers (100 of them), and the site became unuseable, due to 10-20 seconds of response time for each request.

I managed to reduce the issue somewhat by examining some of the most heavy-hitting queries and adding missing indices (using mariadb 11.5):

ALTER TABLE `patchwork_patch` ADD INDEX `patchwork_patch_by_date` (`id`, `date`);
ALTER TABLE `patchwork_patch` ADD INDEX `patchwork_patch_by_project_and_state` (`project_id`, `state_id`);

# not entirely sure this one is needed/used
ALTER TABLE `patchwork_patch` ADD INDEX `patchwork_patch_by_date_and_project` (`project_id`, `date`, `archived`);

This brought down some query times from 5-15 seconds to 0.xxx seconds. But some heavy hitters remain, even with them using indices.

The core issue seems to be that even a relatively simple query like this takes 20+ seconds:

SELECT
    `patchwork_patch`.`id`,
    `patchwork_patch`.`msgid`,
    `patchwork_patch`.`date`,
    `patchwork_patch`.`submitter_id`,
    `patchwork_patch`.`project_id`,
    `patchwork_patch`.`name`,
    `patchwork_patch`.`delegate_id`,
    `patchwork_patch`.`state_id`,
    `patchwork_patch`.`series_id`
FROM `patchwork_patch`
WHERE `patchwork_patch`.`project_id` = 1
ORDER BY `patchwork_patch`.`date` DESC LIMIT 10 OFFSET 36300;

That's purely because of the large offset. The query uses all the indices it can get. I don't think this can really be optimized.
This is largely an issue cause various bots are hitting the deep-links to the last page of patches I guess.
So I'm gonna try removing the links to deep pages and see if it improves stability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database Issues with the database or database migrations help wanted
Projects
None yet
Development

No branches or pull requests

3 participants