Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BulkSaveChanges/BulkInsert(IncludeGraph=true): Issue with Parent-Child Entity Mapping #1635

Open
Zinoberous opened this issue Dec 12, 2024 · 2 comments

Comments

@Zinoberous
Copy link

Zinoberous commented Dec 12, 2024

Description

There is an issue with mapping parent (Document) and child (LineItem) entities when using BulkSaveChanges. The error occurs only on SQL Server and not with in-memory SQLite or when using EF Core's native SaveChanges.

Problem

The issue is likely caused by the structure of the database schema, where tables that include EmployeeId also include CompanyId as a FK depite Employee itself has a FK on Company. This setup introduces mapping problems during the bulk save operation when BulkSaveChangesAsync is used.

The error occurs when Document entities with associated LineItem entities are saved. Both Document and LineItem reference CompanyId and EmployeeId fields. During bulk save, these relationships appear to be misaligned, resulting in incorrect mapping of child entities (LineItem) to their parent entity (Document).

Reproduction

The issue can be reproduced with the following setup:

  1. Database Schema: I created the following DB structure: image
  2. Scenario: Create 2 companies, each with 23 employees, where each employee has 1 turnover. Generate a Document for each employee, with LineItem entities created for each of their turnovers.
  3. Expected Result: LineItem entities should be correctly mapped to their parent Document.
  4. Actual Result: While 46 Document entities are created (23 employees × 2 companies), their child LineItem entities are incorrectly assigned to documents. For employees with multiple turnovers, the LineItem entities belonging together are grouped and assigned to the same (wrong) document.
  5. Code Example:
using Bogus;
using Bogus.Extensions;
using EFCore.BulkExtensions;
using FluentAssertions;
using Microsoft.Data.Sqlite;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

#region sql

/*
CREATE TABLE [Company] (
    [Id] INT NOT NULL IDENTITY(1, 1) CONSTRAINT [PK_Company_Id] PRIMARY KEY CLUSTERED ([Id] ASC),
    [Name] NVARCHAR(250) NOT NULL CONSTRAINT [UIX_Company_Name] UNIQUE NONCLUSTERED ([Name] ASC)
)

CREATE TABLE [Employee] (
    [Id] INT NOT NULL IDENTITY(1, 1) CONSTRAINT [PK_Employee_Id] PRIMARY KEY CLUSTERED ([Id] ASC),
    [CompanyId] INT NOT NULL CONSTRAINT [FK_Employee_CompanyId_Company_Id] FOREIGN KEY ([CompanyId]) REFERENCES [Company] ([Id])
)

-- the employees can switch the company so it's important to have the company id in the turnover table
CREATE TABLE [Turnover] (
    [Id] INT NOT NULL IDENTITY(1, 1) CONSTRAINT [PK_Turnover_Id] PRIMARY KEY CLUSTERED ([Id] ASC),
    [EmployeeId] INT NOT NULL CONSTRAINT [FK_Turnover_EmployeeId_Employee_Id] FOREIGN KEY ([EmployeeId]) REFERENCES [Employee] ([Id]),
    [CompanyId] INT NOT NULL CONSTRAINT [FK_Turnover_CompanyId_Company_Id] FOREIGN KEY ([CompanyId]) REFERENCES [Company] ([Id]),
    [Date] DATE NOT NULL,
    [Amount] MONEY NOT NULL
)

-- the employees can switch the company so it's important to have the company id in the document table
CREATE TABLE [Document] (
    [Id] INT NOT NULL IDENTITY(1, 1) CONSTRAINT [PK_Document_Id] PRIMARY KEY CLUSTERED ([Id] ASC),
    [EmployeeId] INT NOT NULL CONSTRAINT [FK_Document_EmployeeId_Employee_Id] FOREIGN KEY ([EmployeeId]) REFERENCES [Employee] ([Id]),
    [CompanyId] INT NOT NULL CONSTRAINT [FK_Document_CompanyId_Company_Id] FOREIGN KEY ([CompanyId]) REFERENCES [Company] ([Id]),
    [TotalAmount] MONEY NOT NULL,
    [CreatedAt] DATETIMEOFFSET NOT NULL CONSTRAINT [DF_Document_CreatedAt_SYSDATETIMEOFFSET_UTC] DEFAULT (SYSDATETIMEOFFSET() AT TIME ZONE 'UTC')
)

-- reference on turnover + snapshot
CREATE TABLE [LineItem] (
    [Id] INT NOT NULL IDENTITY(1, 1) CONSTRAINT [PK_LineItem_Id] PRIMARY KEY CLUSTERED ([Id] ASC),
    [DocumentId] INT NOT NULL CONSTRAINT [FK_LineItem_DocumentId_Document_Id] FOREIGN KEY ([DocumentId]) REFERENCES [Document] ([Id]),
    [TurnoverId] INT NOT NULL CONSTRAINT [FK_LineItem_TurnoverId_Turnovere_Id] FOREIGN KEY ([TurnoverId]) REFERENCES [Turnover] ([Id]),
    [Date] DATE NOT NULL,
    [Amount] MONEY NOT NULL
)
*/

#endregion

#region scaffold autogenerated

// Scaffold-DbContext -Connection "" -Provider "Microsoft.EntityFrameworkCore.SqlServer" -Context "MyDbContext" -Tables "Company", "Employee", "Turnover", "Document", "LineItem" -DataAnnotations -NoOnConfiguring -Force

[Table("Company")]
[Index("Name", Name = "UIX_Company_Name", IsUnique = true)]
public partial class Company
{
    [Key]
    public int Id { get; set; }

    [StringLength(250)]
    public string Name { get; set; } = null!;

    [InverseProperty("Company")]
    public virtual ICollection<Document> Documents { get; set; } = new List<Document>();

    [InverseProperty("Company")]
    public virtual ICollection<Employee> Employees { get; set; } = new List<Employee>();

    [InverseProperty("Company")]
    public virtual ICollection<Turnover> Turnovers { get; set; } = new List<Turnover>();
}

[Table("Employee")]
public partial class Employee
{
    [Key]
    public int Id { get; set; }

    public int CompanyId { get; set; }

    [ForeignKey("CompanyId")]
    [InverseProperty("Employees")]
    public virtual Company Company { get; set; } = null!;

    [InverseProperty("Employee")]
    public virtual ICollection<Document> Documents { get; set; } = new List<Document>();

    [InverseProperty("Employee")]
    public virtual ICollection<Turnover> Turnovers { get; set; } = new List<Turnover>();
}

[Table("Turnover")]
public partial class Turnover
{
    [Key]
    public int Id { get; set; }

    public int EmployeeId { get; set; }

    public int CompanyId { get; set; }

    public DateOnly Date { get; set; }

    [Column(TypeName = "money")]
    public decimal Amount { get; set; }

    [ForeignKey("CompanyId")]
    [InverseProperty("Turnovers")]
    public virtual Company Company { get; set; } = null!;

    [ForeignKey("EmployeeId")]
    [InverseProperty("Turnovers")]
    public virtual Employee Employee { get; set; } = null!;

    [InverseProperty("Turnover")]
    public virtual ICollection<LineItem> LineItems { get; set; } = new List<LineItem>();
}

[Table("Document")]
public partial class Document
{
    [Key]
    public int Id { get; set; }

    public int EmployeeId { get; set; }

    public int CompanyId { get; set; }

    [Column(TypeName = "money")]
    public decimal TotalAmount { get; set; }

    public DateTimeOffset CreatedAt { get; set; }

    [ForeignKey("CompanyId")]
    [InverseProperty("Documents")]
    public virtual Company Company { get; set; } = null!;

    [ForeignKey("EmployeeId")]
    [InverseProperty("Documents")]
    public virtual Employee Employee { get; set; } = null!;

    [InverseProperty("Document")]
    public virtual ICollection<LineItem> LineItems { get; set; } = new List<LineItem>();
}

[Table("LineItem")]
public partial class LineItem
{
    [Key]
    public int Id { get; set; }

    public int DocumentId { get; set; }

    public int TurnoverId { get; set; }

    public DateOnly Date { get; set; }

    [Column(TypeName = "money")]
    public decimal Amount { get; set; }

    [ForeignKey("DocumentId")]
    [InverseProperty("LineItems")]
    public virtual Document Document { get; set; } = null!;

    [ForeignKey("TurnoverId")]
    [InverseProperty("LineItems")]
    public virtual Turnover Turnover { get; set; } = null!;
}

public partial class MyDbContext : DbContext
{
    public MyDbContext(DbContextOptions<MyDbContext> options)
        : base(options)
    {
    }

    public virtual DbSet<Company> Companies { get; set; }

    public virtual DbSet<Document> Documents { get; set; }

    public virtual DbSet<Employee> Employees { get; set; }

    public virtual DbSet<LineItem> LineItems { get; set; }

    public virtual DbSet<Turnover> Turnovers { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Company>(entity =>
        {
            entity.HasKey(e => e.Id).HasName("PK_Company_Id");
        });

        modelBuilder.Entity<Document>(entity =>
        {
            entity.HasKey(e => e.Id).HasName("PK_Document_Id");

            entity.Property(e => e.CreatedAt).HasDefaultValueSql("((sysdatetimeoffset() AT TIME ZONE 'UTC'))");

            entity.HasOne(d => d.Company).WithMany(p => p.Documents)
                .OnDelete(DeleteBehavior.ClientSetNull)
                .HasConstraintName("FK_Document_CompanyId_Company_Id");

            entity.HasOne(d => d.Employee).WithMany(p => p.Documents)
                .OnDelete(DeleteBehavior.ClientSetNull)
                .HasConstraintName("FK_Document_EmployeeId_Employee_Id");
        });

        modelBuilder.Entity<Employee>(entity =>
        {
            entity.HasKey(e => e.Id).HasName("PK_Employee_Id");

            entity.HasOne(d => d.Company).WithMany(p => p.Employees)
                .OnDelete(DeleteBehavior.ClientSetNull)
                .HasConstraintName("FK_Employee_CompanyId_Company_Id");
        });

        modelBuilder.Entity<LineItem>(entity =>
        {
            entity.HasKey(e => e.Id).HasName("PK_LineItem_Id");

            entity.HasOne(d => d.Document).WithMany(p => p.LineItems)
                .OnDelete(DeleteBehavior.ClientSetNull)
                .HasConstraintName("FK_LineItem_DocumentId_Document_Id");

            entity.HasOne(d => d.Turnover).WithMany(p => p.LineItems)
                .OnDelete(DeleteBehavior.ClientSetNull)
                .HasConstraintName("FK_LineItem_TurnoverId_Turnovere_Id");
        });

        modelBuilder.Entity<Turnover>(entity =>
        {
            entity.HasKey(e => e.Id).HasName("PK_Turnover_Id");

            entity.HasOne(d => d.Company).WithMany(p => p.Turnovers)
                .OnDelete(DeleteBehavior.ClientSetNull)
                .HasConstraintName("FK_Turnover_CompanyId_Company_Id");

            entity.HasOne(d => d.Employee).WithMany(p => p.Turnovers)
                .OnDelete(DeleteBehavior.ClientSetNull)
                .HasConstraintName("FK_Turnover_EmployeeId_Employee_Id");
        });

        OnModelCreatingPartial(modelBuilder);
    }

    partial void OnModelCreatingPartial(ModelBuilder modelBuilder);
}

#endregion

#region partial extensions

public partial class MyDbContext : DbContext
{
    partial void OnModelCreatingPartial(ModelBuilder modelBuilder)
    {
        if (Database.IsSqlite())
        {
            modelBuilder.Entity<Document>()
                .Property(e => e.CreatedAt)
                // SQLite (use for unit tests) does not support (sysdatetimeoffset() AT TIME ZONE 'UTC')
                .HasDefaultValueSql("CURRENT_TIMESTAMP");
        }
    }
}

#endregion

public class BulkInsertTests
{
    private readonly MyDbContext _context;

    public BulkInsertTests()
    {
        var sqlite = true; // for "Sqlite = false;" it should be ensured that the tables defined in the "SQL" region (see above) exist

        var services = new ServiceCollection()
            .AddDbContext<MyDbContext>(options =>
            {
                options
                    .EnableDetailedErrors()
                    .EnableSensitiveDataLogging();

                if (sqlite)
                {
                    options.UseSqlite(new SqliteConnection("DataSource=:memory:"));
                }
                else
                {
                    options.UseSqlServer("");
                }
            });

        var provider = services.BuildServiceProvider();

        _context = provider.GetRequiredService<MyDbContext>();

        _context.Database.OpenConnection();
        _context.Database.EnsureCreated();
    }

    [Fact(Skip = "https://github.com/borisdj/EFCore.BulkExtensions/issues/1622")]
    public async Task BulkSaveChanges_Insert_Defaults()
    {
        #region seeding

        // 1 company with 5 employee

        var company = new Faker<Company>()
            .RuleFor(x => x.Name, f => f.Company.CompanyName().ClampLength(max: 250))
            .RuleFor(x => x.Employees, new Faker<Employee>().Generate(100))
            .Generate();

        await _context.Companies.AddAsync(company);

        await _context.SaveChangesAsync();

        #endregion

        // 1 doc for each employee
        var docs = company.Employees
            .Select(employee =>
            {
                return new Document
                {
                    EmployeeId = employee.Id,
                    CompanyId = company.Id,
                    TotalAmount = 0,
                    //CreatedAt = DateTimeOffset.UtcNow,
                };
            })
            .ToList();

        await _context.Documents.AddRangeAsync(docs);

        BulkConfig bulkConfig = new() { SetOutputIdentity = true };

        await _context.BulkSaveChangesAsync(bulkConfig);

        var actual = await _context.Documents.ToListAsync();

        actual.ForEach(x => x.CreatedAt.Should().NotBe(default));
    }

    [Fact(Skip = "sqlite = false: error with assignment between Document and LineItem (see 'error:'-comments)")]
    public async Task BulkSaveChanges_Insert_ParentWithChilds()
    {
        // error: it seems that the error occurs because everythere we have the employeeId we also have the companyId because the employee can switch the company

        #region seeding

        // <companiesCount> companies with <employeesCount> employees each with <turnoversCount> turnovers each

        // error: we need at least 2 companies with 23 employees each with 1 turnover each to cause the error, if we use less the error doesn't occur

        const int companiesCount = 2;
        const int employeesCount = 23;
        const int turnoversCount = 1;

        var companies = new Faker<Company>()
            .RuleFor(x => x.Name, f => f.Company.CompanyName().ClampLength(max: 250))
            .Generate(companiesCount);

        companies.ForEach(company =>
        {
            var employees = new Faker<Employee>()
                .RuleFor(x => x.Company, company)
                .Generate(employeesCount);

            employees.ForEach(employee =>
            {
                var turnovers = new Faker<Turnover>()
                    .RuleFor(x => x.Employee, employee)
                    .RuleFor(x => x.Company, company)
                    .RuleFor(x => x.Date, f => f.Date.PastDateOnly())
                    .RuleFor(x => x.Amount, f => f.Finance.Amount())
                    .Generate(turnoversCount);

                employee.Turnovers = turnovers;
            });

            company.Employees = employees;
        });

        await _context.Companies.AddRangeAsync(companies);

        await _context.SaveChangesAsync();

        #endregion

        // 1 doc for each employee with 1 line item for each turnover

        List<Document> docs = [];

        foreach (var company in companies)
        {
            foreach (var employee in company.Employees)
            {
                var lineItems = employee.Turnovers
                    .Select(turnover =>
                    {
                        return new LineItem
                        {
                            TurnoverId = turnover.Id,
                            Date = turnover.Date,
                            Amount = turnover.Amount,
                        };
                    })
                    .ToList();

                Document doc = new()
                {
                    EmployeeId = employee.Id,
                    CompanyId = company.Id,
                    TotalAmount = lineItems.Sum(x => x.Amount),
                    LineItems = lineItems,
                };

                docs.Add(doc);
            }
        }

        await _context.Documents.AddRangeAsync(docs);

        BulkConfig bulkConfig = new() { SetOutputIdentity = true };

        await _context.BulkSaveChangesAsync(bulkConfig);

        var actual = await _context.Documents
            .Include(x => x.LineItems)
            .ToListAsync();

        actual.ForEach(doc => doc.TotalAmount.Should().Be(doc.LineItems.Sum(x => x.Amount)));

        #region error check sql

        /*
        SELECT CASE
                WHEN Document.Id IS NULL OR LineItems.DocumentId IS NULL THEN 0
                ELSE 1 END AS SuccessfulJoin,
            CASE
                WHEN Document.CompanyId != LineItems.CompanyId THEN 0
                ELSE 1 END AS SameCompany,
            CASE
                WHEN Document.EmployeeId != LineItems.EmployeeId THEN 0
                ELSE 1 END AS SameEmployee,
            CASE
                WHEN Document.Id != LineItems.DocumentId THEN 0
                ELSE 1 END AS SameFile,
            CASE
                WHEN Document.TotalAmount != LineItems.TotalAmount THEN 0
                ELSE 1 END AS SameAmount,
            *
        FROM (
            SELECT Id, EmployeeId, CompanyId, TotalAmount
            FROM Document
        ) AS Document
            FULL OUTER JOIN (
            SELECT DocumentId, EmployeeId, CompanyId, SUM(t.Amount) AS TotalAmount
            FROM LineItem li
                INNER JOIN Turnover t
                ON t.Id = li.TurnoverId
            GROUP BY DocumentId, EmployeeId, CompanyId
        ) AS LineItems
            -- ON LineItems.DocumentId = Document.Id
            -- ON LineItems.EmployeeId = Document.EmployeeId
            -- ON LineItems.CompanyId = Document.CompanyId
            ON LineItems.TotalAmount = Document.TotalAmount
        ORDER BY 1, 2, 3, 4, 5
        */

        #endregion
    }
}
@pajacz
Copy link

pajacz commented Jan 12, 2025

Yep, the BulkSaveChanges works only on simple case without related entities... Just wondering why BulkExtensions can't reuse the dependency tree builder from EF as it works just fine. :(

@Zinoberous
Copy link
Author

Zinoberous commented Jan 27, 2025

We removes the duplicated CompanyId from our tables but the error still occurs.
We also tried to do our inserts with BulkInsert without IncludeGraph by insert grouped by entity and set the parentId manually, but the error still occurs.

using var transaction = await _context.Database.BeginTransactionAsync();

try
{
   BulkConfig bulkConfig = new()
   {
      BatchSize = 100_000,
      SetOutputIdentity = true,
   };

   // documents is list of Document and each Document has a list of LineItem
   await context.BulkInsertAsync(documents, bulkConfig); 

   var lineItems = documents.SelectMay(doc =>
   {
      foreach (var li in doc.LineItems)
      {
         li.DocumentId = doc.Id
      }
      return doc.LineItems;
   });
   await context.BulkInsert(lineItems, bulkConfig);

   await transaction.CommitTransactionAsync();
}
catch
{
   await transaction.RollbackTransactionAsync();
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants