Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration Command Fails to Update All LogEntry Objects Due to Batch Processing Issue #667

Open
fabianallendorf opened this issue Sep 13, 2024 · 1 comment

Comments

@fabianallendorf
Copy link

fabianallendorf commented Sep 13, 2024

Description:

While following the migration guide to update the django-auditlog package to version 3, I encountered an issue where the migration command did not fully migrate all LogEntry records.

Upon executing the migration command:

$ python manage.py auditlogmigratejson
Updated 101,000 records using Django operations.
There are 100,906 records that need migration.

I noticed that a significant number of LogEntry objects still had null values in the changes field, even though their changes_text fields were populated. This indicated that the migration did not affect these records.

Investigation:

While debugging, I examined the migrate_using_django method in auditlog.management.commands.auditlogmigratejson.Command:

def migrate_using_django(self, batch_size=None):
    logs = self.get_logs()

    if not batch_size:
        return _apply_django_migration(logs)

    total_updated = 0
    for _ in range(ceil(logs.count() / batch_size)):
        total_updated += _apply_django_migration(self.get_logs()[:batch_size])
    return total_updated

The issue appears to stem from this line:

total_updated += _apply_django_migration(self.get_logs()[:batch_size])

When there are more than batch_size DELETE and CREATE LogEntry objects that precede any UPDATE entries, self.get_logs() consistently returns the same initial set of records in each iteration. Since DELETE and CREATE entries do not require changes, the migration process becomes stuck on these records, leaving subsequent UPDATE entries unmigrated.

Possible Solution:

To address this issue, the batching logic should ensure that each batch processes a new subset of LogEntry objects. This can be achieved by implementing an offset mechanism or utilizing Django's Paginator for proper batch iteration. Here is a suggested fix using Paginator:

from django.core.paginator import Paginator

def migrate_using_django(self, batch_size=None):
    logs = self.get_logs()

    if not batch_size:
        return _apply_django_migration(logs)

    paginator = Paginator(logs, batch_size)
    total_updated = 0

    for page_number in paginator.page_range:
        page = paginator.page(page_number)
        total_updated += _apply_django_migration(page.object_list)
    return total_updated

This modification ensures that each batch processes a unique set of LogEntry records, allowing the migration to progress through all entries without repetition.

Environment:

  • django-auditlog version: 2.x upgrading to 3.x
  • Django version: 4.2.15
  • Python version: 3.11
  • Database: PostgreSQL 15

Steps to Reproduce:

  1. Have a database with a large number of LogEntry records, including many DELETE and CREATE entries preceding UPDATE entries.
  2. Follow the migration guide to upgrade to django-auditlog version 3.
  3. Run python manage.py auditlogmigratejson with batching enabled.
  4. Observe that the migration does not update all LogEntry records.

Expected Behavior:

All LogEntry records should have their changes field populated after the migration, with no unmigrated entries remaining.

Actual Behavior:

A significant number of LogEntry records remain unmigrated, with null values in the changes field.

Workaround:

It is possible to avoid the issue by setting batch size to the amount of all existing LogEntry records.


I hope this report helps in identifying and resolving the issue. Please let me know if additional information is required.

@hramezani
Copy link
Member

Thanks @fabianallendorf for reporting this issue.

@aqeelat do you have time to take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants