
Do not track after SaveChanges() #9118

Open
Tracked by #28479 ...
smitpatel opened this issue Jul 8, 2017 · 30 comments
Labels: area-change-tracking, area-perf, area-save-changes, punted-for-7.0, type-enhancement
Milestone: Backlog

@smitpatel (Contributor)

Based on https://stackoverflow.com/questions/44923147/how-to-avoid-inserted-entity-to-be-selected-after-insert-in-ef-core

Whenever we have generated columns (on insert or update), we do a select after the insert/update to retrieve the generated values and populate the entity in memory. In some cases, the user may want to just save the data and not care about updating the entities in memory any further (because they are not used later).

At present there is no mechanism for this functionality.
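
For example, with a store-generated key, today's behavior looks roughly like this (a minimal sketch; the Blog type and the context instance are illustrative, not from the issue):

var blog = new Blog { Name = "example" };
context.Add(blog);
context.SaveChanges();      // EF reads the generated value back into blog.Id;
                            // on some providers this is a separate SELECT round trip
Console.WriteLine(blog.Id); // populated - this issue asks for a way to skip that read-back

public class Blog
{
    public int Id { get; set; }      // value generated by the database on insert
    public string Name { get; set; }
}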

@Giancarlos

Try AsNoTracking:

https://msdn.microsoft.com/en-us/library/gg679352(v=vs.103).aspx

@ajcvickers (Contributor)

@Giancarlos This issue is not related to AsNoTracking--that will tell EF not to track entities returned from the query. This issue is about SaveChanges ignoring store-generated values for already tracked entities.
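
To make the distinction concrete (a sketch, reusing the illustrative Blog type and context from above):

// AsNoTracking only affects entities materialized by a query:
var blogs = context.Blogs.AsNoTracking().ToList(); // not tracked

// This issue is about the save side:
var blog = new Blog();
context.Add(blog);     // tracked, necessarily, so it can be inserted
context.SaveChanges(); // still tracked afterwards, and store-generated
                       // values are fetched back into the instance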

@AndriySvyryd (Member)

AndriySvyryd commented Dec 6, 2017

To the implementer:

  • Disable fixup when detaching to avoid messing up the FKs and navigations
  • Once implemented this should be used for data migrations.

@stap123

stap123 commented Mar 15, 2018

It might be nice to be able to pass an optional "get me the entity" flag on a SaveChanges call. That way you add it when you need it, and leave it off when you don't.
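
For illustration, such a flag might look like this (a purely hypothetical overload; no such API exists today):

context.Add(entity);
context.SaveChanges(refreshEntitiesFromStore: false); // hypothetical parameter name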

@leus

leus commented Jul 4, 2019

Are there plans for this to be implemented? I'm using Autofac + EF Core in a console application and this is really biting me.

@divega (Contributor)

divega commented Jul 4, 2019

@leus this is in our backlog milestone, meaning that we don't have specific plans to implement it. Could you elaborate on how this is biting you?

@leus

leus commented Jul 5, 2019

Sure - I have a .NET Core console application that uses Autofac to create a big list of Tasks that receive a DbContext to do work. Everything works beautifully, except that I take a big hit on memory, most of which is used by inserted entities. In the few places where we do queries we make sure to use AsNoTracking() (but we also set QueryTrackingBehavior.NoTracking when creating the context, so this is probably overkill).

[memory profiler screenshot]

Here the SqlChecklist.Processor.Writers.Sql.Model.TextData object is just an entity that contains a big string (ntext).

My contexts are created as follows:

var dbOptions = new DbContextOptionsBuilder<ProcessorContext>();
dbOptions.UseSqlServer(connectionString);
dbOptions.UseQueryTrackingBehavior(QueryTrackingBehavior.NoTracking);

@chrisckc

chrisckc commented Oct 8, 2019

Regarding this issue and also this one: #11852

As noted in previous comments, and as I have already observed, using AsNoTracking queries or turning tracking off for the whole context only prevents entities from being added to the change tracker as a result of query operations; SaveChanges still keeps inserted/updated entities in the change tracker.

@divega @ajcvickers Suggestion for a new configuration option:

I think that having a SaveChanges option (at context level and method level) which removes newly inserted entities from the change tracker after saving to the database would be useful. As with the original opening comment on this issue, when there are database-generated columns such as auto-incrementing primary keys, with such an option enabled there would be no need for EF to perform a select to retrieve the newly generated key value, because the entity would effectively be discarded by EF after the save.

I use the description "option which removes newly inserted entities from the change tracker after saving to the database" rather than "option which prevents newly inserted entities from being added to the change tracker after saving to the database" because, as far as I know, the only way to insert an entity using SaveChanges is to first attach it to the change tracker using Context.Add(entity), so the entity will always be in an attached state before SaveChanges is called.

I think an option such as this would be less relevant when database-generated columns are not used: there is no behind-the-scenes select query performed by EF in that case, so the option would provide no performance benefit. Instead, the entity can simply be detached from the change tracker by setting Entry(entity).State = EntityState.Detached; after calling SaveChanges, achieving the same result (sketched below). Of course, doing this does not prevent the behind-the-scenes select query from running right after SaveChanges to retrieve database-generated key values that are then never used.
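
That manual opt-out looks like this today (a sketch):

context.Add(entity);
context.SaveChanges();                              // the read-back select still happens here
context.Entry(entity).State = EntityState.Detached; // then drop the entity from the tracker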

Maybe the option should instead target the SaveChanges behaviour when database-generated columns are used, i.e. don't go and get the updated values, just fire and forget; but this will cause issues with the change tracker unless the saved entity is detached. Having a side effect of detaching the entity only when database-generated columns are in use would be really bad. It sounds much better as an option to detach entities after saving, which has the performance-improving side effect of preventing an unnecessary select query when database-generated columns are used.

So rather than just detaching newly inserted entities, it would be better if the new option made SaveChanges detach any entity that it has just persisted to the database, both inserts and updates, similar to what @roji suggested here: #11852 (comment)

So what I am suggesting is an option which changes the behaviour of save operations so that they become fire-and-forget database writes, with the affected entities detached from the change tracker afterwards.

Use Cases:

I think an option such as this would be useful in scenarios where it is not possible or desirable to use BulkCopy tools, for example when bulk importing data into a database where, depending on the record being imported, there sometimes need to be query-then-update or lookup-and-populate-related-entities operations rather than just plain inserts.

In that scenario it can be desirable to use EF for importing the data, especially if it is already being used in the project and the model already exists, rather than bringing in an additional framework or tool. The issue with using EF to insert/update large numbers of records is that the build-up of entities in the change tracker causes serious performance problems unless the context is regularly disposed and re-created during the import cycle. This is rarely an issue for normal use cases such as API and MVC controllers, where request scopes and DI ensure that the context is short-lived.

To get the most performance benefit, in most cases the context needs to be recreated after each record is imported. Although the overhead of doing this is tiny compared to the benefit of starting with a fresh context and an empty change tracker, it still seems wasteful, and it does not always make sense, or look nice, to structure the import code so that the context is recreated after each record.

To solve this kind of issue in the past, I have used a mixture of context recreation and looping through the change tracker to detach all entities after each save. This is fine when using Guids as keys generated by EF, but with database-generated keys there will still be wasteful select queries happening.

@chrisckc

chrisckc commented Oct 8, 2019

Here is an extension method on DbContext that I have been using to clear out the change tracker:

public static void DetachAllEntities(this DbContext context)
{
    // Snapshot the entries first, since changing State mutates the tracked set
    var entries = context.ChangeTracker.Entries()
        .Where(e => e.State != EntityState.Detached)
        .ToList();

    foreach (var entry in entries)
    {
        if (entry.Entity != null)
        {
            entry.State = EntityState.Detached;
        }
    }
}

But recently I have tried using the suggestion here: #11852 (comment)

public void ResetState()
{
    // https://stackoverflow.com/questions/46685823/how-to-use-the-executionstrategy-properly
    ((IDbContextPoolable)this).ResetState();
    ((IDbContextPoolable)this).Resurrect(((IDbContextPoolable)this).SnapshotConfiguration());
}

There appears to be no difference in performance that I can measure; both seem to have the same effect.

As an example, clearing out the change tracker after each record during an import of 220 records from a CSV file resulted in a 25x performance increase. Without clearing the change tracker after each record, the import ends with 220 entities in the change tracker and takes 25 times longer to complete; that is not a huge amount of entity build-up, but enough to cripple performance.

In that example, when importing multiple files, I recreate the context after each file import, but just clear the change tracker after each record import.

I am sure this is one of the reasons why EF gets such a bad rep in terms of performance: you have to be very careful how it is used, and if you are not aware of issues like this it is easy to dismiss it as slow and inefficient.
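
Putting that together, the shape of the import loop is roughly as follows (MapToEntity is a hypothetical helper; DetachAllEntities is the extension method above):

foreach (var record in records)
{
    var entity = MapToEntity(record); // hypothetical mapping from a CSV record to an entity
    context.Add(entity);
    context.SaveChanges();            // still selects back any database-generated keys
    context.DetachAllEntities();      // keep the change tracker from accumulating entities
}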

@chrisckc

chrisckc commented Oct 8, 2019

A good way of getting a fresh context is to take advantage of the ServiceScopeFactory:

foreach (FileInfo file in files) {
    string filePath = file.FullName;
    using (var scope = ServiceScopeFactory.CreateScope()) {
        var myCsvImporter = scope.ServiceProvider.GetRequiredService<MyCsvImporter>();
        var importResult = await myCsvImporter.ImportFromFileAsync(filePath);
        // ...
    }
}

Of course MyCsvImporter could also use the same technique to create a fresh context for each record. I am not sure whether that would be faster than clearing out the change tracker, but in this case I suspect the difference would not be measurable, because the database writes take up the vast majority of the time. The main reason not to create a fresh context for each record is code structure and readability, and the desire to make at least some use of the framework-provided DI convenience rather than always grabbing the required services from a ServiceScope.

I also agree with @roji here: #11852 (comment)

@chrisckc

chrisckc commented Oct 8, 2019

It looks like the performance benefit of this option will depend not just on whether the primary key is generated by EF or by the database, but also on the EF Core provider used.

For example, when using Npgsql and a Guid primary key with database-generated values via the Postgres uuid_generate_v4() function, it looks like there will still be no select query performed after the save, thanks to the Postgres feature described in this comment: /~https://github.com/npgsql/Npgsql.EntityFrameworkCore.PostgreSQL/issues/81#issuecomment-236642569
Npgsql uses the Postgres RETURNING clause to avoid a second query to fetch the newly generated Guid.

@roji I am not sure how this is handled when there are other or multiple database-generated values in a table?

@roji (Member)

roji commented Oct 9, 2019

@chrisckc you're right that Npgsql has a somewhat lighter mechanism for retrieving database-generated values after insert. However, even for Npgsql, if we somehow know that we don't need to bring back database-generated values, we can omit the RETURNING clause and save some network bandwidth (for the values coming back), so this issue is still relevant.

Having multiple database-generated values doesn't change anything here - they should simply all appear in the RETURNING clause.
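
To sketch the difference (hypothetical table and column names, issued via ExecuteSqlRaw purely for illustration):

// What EF + Npgsql effectively send today: generated values ride back on the insert itself.
context.Database.ExecuteSqlRaw(
    "INSERT INTO people (tenant_id) VALUES (3) RETURNING id, created_date;");

// What this issue would allow: no RETURNING clause, nothing sent back.
context.Database.ExecuteSqlRaw(
    "INSERT INTO people (tenant_id) VALUES (3);");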

@timker

timker commented Oct 13, 2020

My use case is having only INSERT permission, not SELECT (audit tables).
In this case I just want to fire and forget.

@ErikEJ (Contributor)

ErikEJ commented Oct 13, 2020

@timker why not use ExecuteSqlRaw then?

@timker

timker commented Oct 13, 2020

That would work (or I could create a stored proc), and I'm happy to do either, but I would prefer to use EF types.

@asoursos

This forces you to write SQL when you write backward-compatible code. For example, if I add a column and the host has been updated but the database hasn't, the code below would work if this select were optional:

DbContext.Entry(entity).Property(x => x.Firstname).IsModified = false;
DbContext.SaveChanges();

IMO this select should be optional.

@asoursos

If anyone has the same backward-compatibility issue, there is a better solution than raw SQL. If you keep the entity model classes isolated, you can copy them into a new namespace (**.Backward) and have a factory that decides which DbContext to choose.

@roji (Member)

roji commented Feb 20, 2022

Note: benchmarking in #27372 (comment) shows that fetching back generated values when inserting is mostly negligible perf-wise, at least on SQL Server. As such, the advantage of doing this would mainly be to unlock high-perf techniques which don't support fetching back values, e.g. SqlBulkCopy.

@roji (Member)

roji commented Feb 20, 2022

Design discussion:

  • We'll introduce a flag on ChangeTracker, which stops us from fetching back generated values and causes the change tracker to automatically be cleared after SaveChanges completes (a usage sketch follows below). There's no need to add a DbContextOption to change the default, since the ChangeTracker flag can be set in the context constructor (like other flags we already have).
  • Note that this isn't only about clearing the change tracker; it's also about not propagating generated values to entity instances which are already in memory.
  • When the flag is enabled, calling SaveChanges(true) should probably throw.
  • We considered disposing the context instead of clearing its change tracker, or otherwise setting some state which causes any operation to fail on the context after SaveChanges. The advantage would be to force users to instantiate a new context after SaveChanges, with fail-fast throwing in case the user gets it wrong. However, this is incompatible with context pooling, and some users consider creation of a new context too verbose (that's why we introduced ChangeTracker.Clear).
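
Usage of the proposed flag might look something like this (the property name and context type are purely illustrative; the design above does not fix a name):

using var context = new OrdersContext();               // hypothetical context type
context.ChangeTracker.FireAndForgetSaveChanges = true; // hypothetical flag name
context.Add(new Order());
context.SaveChanges(); // generated values are not fetched back; the tracker is cleared
// Per the notes above, SaveChanges(acceptAllChangesOnSuccess: true) would throw here.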

@ajcvickers (Contributor)

Notes from triage:

  • Perf benefit seems minimal based on recent testing--if you are seeing otherwise, then please post details.
  • Clearing the change tracker after saving changes can be achieved by calling ChangeTracker.Clear after calling SaveChanges (see the sketch below).
  • There is overlap here with bulk insert (Bulk import for efficient importing of data from the client into the database #27333), which would likely be a better solution in many cases since it would avoid all tracking.

For these reasons we are punting this for EF7, but still intend to implement it in the future.
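
For reference, the ChangeTracker.Clear pattern from the second bullet (Clear is a real API, available since EF Core 5.0):

context.Add(entity);
context.SaveChanges();
context.ChangeTracker.Clear(); // detaches every tracked entity in one call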

@ajcvickers modified the milestones: 7.0.0, Backlog (Apr 23, 2022)
@ajcvickers added the punted-for-7.0 label and removed the propose-punt label (Apr 23, 2022)
@stevendarby (Contributor)

Re: perf benefit, it's hard to benchmark something that hasn't been implemented, but I took the INSERT SQL generated by 6.0, modified it to what the SQL might look like if optimised not to return generated values, and benchmarked both. Not returning the values was 2.5x faster.

Method             Mean        Error     StdDev
BulkWithSelect     2,285.9 us  47.60 us  127.88 us
BulkWithoutSelect    829.4 us  27.91 us   78.71 us

I don't think it's too surprising that not returning the generated values is faster, and this is probably why you're still pursuing other methods for bulk insertion. Perhaps I got the wrong end of the stick re: the first point about the perf benefit being minimal?

Code

It uses a simple table with a composite key: an Id column that's DB-generated and a DateTime that has a default. I inserted a bunch of data beforehand, as I imagine the amount of existing data affects how quickly the generated values are read back. It only benchmarks the SQL, not reading the values back and setting them on entities etc., which would add more time.

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.EntityFrameworkCore;
using System;
using System.Linq;

BenchmarkRunner.Run<InsertSpeed>();

public class InsertSpeed
{
    private MyContext _context;

    [GlobalSetup]
    public void Setup()
    {
        var random = new Random();

        using (var context = new MyContext())
        {
            context.Database.EnsureDeleted();
            context.Database.EnsureCreated();

            var people = Enumerable.Range(0, 500000)
                .Select(x => new Person { TenantId = random.Next(1, 6) })
                .ToList();

            context.AddRange(people);
            context.SaveChanges();
        }

        _context = new MyContext();
    }

    [Benchmark]
    public void BulkWithSelect()
    {
        _context.Database.ExecuteSqlRaw(
@"SET NOCOUNT ON;
DECLARE @inserted0 TABLE ([Id] int, [TenantId] int, [_Position] [int]);
MERGE [People] USING (
VALUES (3, 0),
(3, 1),
(3, 2),
(3, 3),
(3, 4),
(3, 5),
(3, 6),
(3, 7),
(3, 8),
(3, 9),
(3, 10),
(3, 11),
(3, 12),
(3, 13),
(3, 14),
(3, 15),
(3, 16),
(3, 17),
(3, 18),
(3, 19),
(3, 20),
(3, 21),
(3, 22),
(3, 23),
(3, 24),
(3, 25),
(3, 26),
(3, 27),
(3, 28),
(3, 29),
(3, 30),
(3, 31),
(3, 32),
(3, 33),
(3, 34),
(3, 35),
(3, 36),
(3, 37),
(3, 38),
(3, 39),
(3, 40),
(3, 41)) AS i ([TenantId], _Position) ON 1=0
WHEN NOT MATCHED THEN
INSERT ([TenantId])
VALUES (i.[TenantId])
OUTPUT INSERTED.[Id], INSERTED.[TenantId], i._Position
INTO @inserted0;
      
SELECT [t].[Id], [t].[CreatedDate] FROM [People] t
INNER JOIN @inserted0 i ON ([t].[Id] = [i].[Id]) AND ([t].[TenantId] = [i].[TenantId])
ORDER BY [i].[_Position];");
    }

    [Benchmark]
    public void BulkWithoutSelect()
    {
        _context.Database.ExecuteSqlRaw(
@"SET NOCOUNT ON;
INSERT INTO [People] ([TenantId])
VALUES (3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3);");
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        _context?.Dispose();
    }
}

public class MyContext : DbContext
{
    public DbSet<Person> People { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseSqlServer("Server=.;Database=SaveChanges;Trusted_Connection=True;");

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Person>(e =>
        {
            e.HasKey(x => new { x.TenantId, x.Id });
            e.Property(x => x.Id).ValueGeneratedOnAdd();
            e.Property(x => x.CreatedDate).HasDefaultValueSql("SYSUTCDATETIME()");
        });
    }
}

public class Person
{ 
    public int TenantId { get; set; }
    public int Id { get; set; }
    public DateTime CreatedDate { get; set; }
}

@roji (Member)

roji commented Apr 24, 2022

@stevendarby you probably want to take a look at #27372, which was implemented and can be tested in 7.0.0-preview.3. In a nutshell, EF Core 7.0 uses a slightly different technique for bulk inserting which still fetches generated values, but uses the OUTPUT clause without a temporary table variable (OUTPUT, not OUTPUT INTO). I've updated your benchmark code to add the new technique used by EF Core 7.0 (see below); here are the results:

Method                 Mean      Error      StdDev
Bulk_with_Output_Into  6.126 ms  0.1180 ms  0.1405 ms
Bulk_with_Output       2.328 ms  0.0460 ms  0.0829 ms
Bulk_without_Output    2.276 ms  0.0668 ms  0.1905 ms
Bulk_with_insert       2.903 ms  0.0656 ms  0.1914 ms

In other words, the slowness in your benchmark comes not from fetching the actual values (sending those two ints plus a position is quite negligible in the grand scheme of things), but rather from the SQL Server temporary table variable being used. Now, I'm not saying further gains couldn't be reached by not fetching, but they currently seem quite minimal for most cases. Of course, if the value fetched were a huge string or binary data, the situation would be quite different, but that seems like an edge-case scenario that we haven't seen much. Another reason to remove the fetching is that it would allow using techniques which don't allow fetching at all (in theory even SqlBulkCopy could be used, in certain cases); but that goes in a complicated direction and we don't have concrete plans to do it right now. If we do go in that direction, this issue would be part of it.

It would be great if you could run the added benchmarks below and confirm that you see the same results. If you have any other ideas or input, that would of course be welcome!

I don't think it's too surprising that it's faster to not return the generated values [...]

I think this also shows how, when it comes to perf, nothing is straightforward and one should avoid assumptions... Contrary to expectations (including my own initial ones!), fetching the generated values isn't a significant factor here.

PS Any specific reason you're seeding the database with 500000 rows in this benchmark?

Updated benchmark code
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.EntityFrameworkCore;
using System;
using System.Linq;

BenchmarkRunner.Run<InsertSpeed>();

public class InsertSpeed
{
    private MyContext _context;

    [GlobalSetup]
    public void Setup()
    {
        var random = new Random();
        
        using (var context = new MyContext())
        {
            context.Database.EnsureDeleted();
            context.Database.EnsureCreated();
        
            // var people = Enumerable.Range(0, 500000)
            //     .Select(x => new Person { TenantId = random.Next(1, 6) })
            //     .ToList();
            //
            // context.AddRange(people);
            // context.SaveChanges();
        }

        _context = new MyContext();
    }

    [Benchmark]
    public void Bulk_with_Output_Into()
    {
        _context.Database.ExecuteSqlRaw(
@"SET NOCOUNT ON;
DECLARE @inserted0 TABLE ([Id] int, [TenantId] int, [_Position] [int]);
MERGE [People] USING (
VALUES (3, 0),
(3, 1),
(3, 2),
(3, 3),
(3, 4),
(3, 5),
(3, 6),
(3, 7),
(3, 8),
(3, 9),
(3, 10),
(3, 11),
(3, 12),
(3, 13),
(3, 14),
(3, 15),
(3, 16),
(3, 17),
(3, 18),
(3, 19),
(3, 20),
(3, 21),
(3, 22),
(3, 23),
(3, 24),
(3, 25),
(3, 26),
(3, 27),
(3, 28),
(3, 29),
(3, 30),
(3, 31),
(3, 32),
(3, 33),
(3, 34),
(3, 35),
(3, 36),
(3, 37),
(3, 38),
(3, 39),
(3, 40),
(3, 41)) AS i ([TenantId], _Position) ON 1=0
WHEN NOT MATCHED THEN
INSERT ([TenantId])
VALUES (i.[TenantId])
OUTPUT INSERTED.[Id], INSERTED.[TenantId], i._Position
INTO @inserted0;
      
SELECT [t].[Id], [t].[CreatedDate] FROM [People] t
INNER JOIN @inserted0 i ON ([t].[Id] = [i].[Id]) AND ([t].[TenantId] = [i].[TenantId])
ORDER BY [i].[_Position];");
    }

    [Benchmark]
    public void Bulk_with_Output()
    {
        _context.Database.ExecuteSqlRaw(
            @"SET NOCOUNT ON;
MERGE [People] USING (
VALUES (3, 0),
(3, 1),
(3, 2),
(3, 3),
(3, 4),
(3, 5),
(3, 6),
(3, 7),
(3, 8),
(3, 9),
(3, 10),
(3, 11),
(3, 12),
(3, 13),
(3, 14),
(3, 15),
(3, 16),
(3, 17),
(3, 18),
(3, 19),
(3, 20),
(3, 21),
(3, 22),
(3, 23),
(3, 24),
(3, 25),
(3, 26),
(3, 27),
(3, 28),
(3, 29),
(3, 30),
(3, 31),
(3, 32),
(3, 33),
(3, 34),
(3, 35),
(3, 36),
(3, 37),
(3, 38),
(3, 39),
(3, 40),
(3, 41)) AS i ([TenantId], _Position) ON 1=0
WHEN NOT MATCHED THEN
INSERT ([TenantId])
VALUES (i.[TenantId])
OUTPUT INSERTED.[Id], INSERTED.[TenantId], i._Position;");
    }

    [Benchmark]
    public void Bulk_without_Output()
    {
        _context.Database.ExecuteSqlRaw(
            @"SET NOCOUNT ON;
MERGE [People] USING (
VALUES (3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3)) AS i ([TenantId]) ON 1=0
WHEN NOT MATCHED THEN
INSERT ([TenantId])
VALUES (i.[TenantId]);");
    }

    [Benchmark]
    public void Bulk_with_insert()
    {
        _context.Database.ExecuteSqlRaw(
@"SET NOCOUNT ON;
INSERT INTO [People] ([TenantId])
VALUES (3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3);");
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        _context?.Dispose();
    }
}

public class MyContext : DbContext
{
    public DbSet<Person> People { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseSqlServer("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Trust Server Certificate=true");

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Person>(e =>
        {
            e.HasKey(x => new { x.TenantId, x.Id });
            e.Property(x => x.Id).ValueGeneratedOnAdd();
            e.Property(x => x.CreatedDate).HasDefaultValueSql("SYSUTCDATETIME()");
        });
    }
}

public class Person
{ 
    public int TenantId { get; set; }
    public int Id { get; set; }
    public DateTime CreatedDate { get; set; }
}

@stevendarby (Contributor)

stevendarby commented Apr 24, 2022

@roji that looks really promising, and I’ll take a closer look soon for my own interest.

Re: the 500000 rows, the 6.0 SQL returns the generated Id and CreatedDate values by selecting from the People table, so I figured populating it with a healthy amount of data could help demonstrate the impact of traversing the index and reading from the table etc.
Hopefully reading from the outputted 'inserted' values in 7.0 means the amount of existing data has very little impact, which is great!

@roji (Member)

roji commented Apr 24, 2022

Re: the 500000 rows, the 6.0 SQL returns the generated Id and CreatedDate values by selecting from the People table, so I figured populating it with a healthy amount of data could help demo the impact of traversing the the index and reading from the table etc.

@stevendarby I see - though the lookup is by the Id (primary key), which uses an index and therefore shouldn't be affected too much by existing rows (though there indeed should be some impact).

Note that the new 7.0 SQL is the default but does not work when there are triggers; EF Core 7.0 now allows you to declare that a table has a trigger, in which case we revert to the 6.0 (inefficient) SQL.

@stevendarby (Contributor)

@roji FWIW I had another look at this with your updated benchmarks, plus a few tweaks of my own to check a few scenarios, and I'm satisfied that the 7.0 SQL you highlighted already has the performance improvements I previously thought could only come from implementing the changes proposed in this issue. I do see a slight gain from not returning the values at all, but it's nothing like the improvement from using a direct OUTPUT instead of OUTPUT INTO. Thank you. (Not using triggers in my DB!)

@roji (Member)

roji commented Apr 25, 2022

Thanks for confirming @stevendarby! Yeah, we do intend to implement this issue at some point, but priority-wise it seems much less important than it previously did.

@DaleMckeown

+1, this would be a good option to have.

In EF Core 6, I've just come across a situation where being able to opt out of returning a key would be beneficial.

EF Core was generating the SQL below to insert an object into a request-logs table. Under high load, the table was getting locked by the SELECT statement, meaning all requests were timing out and failing.

(@p0 varchar(255),@p1 bigint,@p2 varchar(255),@p3 varchar(2000),@p4 datetime,@p5 int,@p6 int)SET NOCOUNT ON;
INSERT INTO [request_logs] ([action], [application_row_id], [controller], [path], [request_timestamp], [response_time_ms], [status_code])
VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6);
SELECT [row_id]
FROM [request_logs]
WHERE @@ROWCOUNT = 1 AND [row_id] = scope_identity();

In my specific case I don't care about EF Core checking the object or selecting the primary key after insert. I just need to dump the data into the table and move on.

I ended up having to write a custom method on my DbContext to insert the record:

public async Task<int> AddRequestLog(RequestLog log)
{
    return await Database.ExecuteSqlInterpolatedAsync(@$"insert into request_logs (controller, action, path, status_code, response_time_ms, application_row_id, request_timestamp)
values({log.Controller}, {log.Action}, {log.Path}, {log.StatusCode}, {log.ResponseTimeMilliseconds}, {log.ConfigurationRowId}, {log.RequestTimestamp} )");
}

But I would prefer the option to do something like the below, which would prevent the round trip and generate the SQL accordingly:

DbContext.RequestLogs.Add(log, ignoreConcurrency: true);
DbContext.SaveChanges();

Generated SQL:
(@p0 varchar(255),@p1 bigint,@p2 varchar(255),@p3 varchar(2000),@p4 datetime,@p5 int,@p6 int)SET NOCOUNT ON;
INSERT INTO [request_logs] ([action], [application_row_id], [controller], [path], [request_timestamp], [response_time_ms], [status_code])
VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6);

I understand the SELECT has been replaced with an OUTPUT clause in EF Core 7, but the option to opt out entirely would still be good to have.

@sunnamed434

sunnamed434 commented Apr 6, 2023

Try AsNoTracking:

https://msdn.microsoft.com/en-us/library/gg679352(v=vs.103).aspx

It really helped me a lot, I don't know how, but it did. Thank you @Giancarlos.

Using EF Core v3.1.32.

EDIT: It was probably not a solution after all. It was a problem on the migration side, or an invalid way of creating the database (which was on my side, not EF Core's). I moved to dbup to apply migrations, with migration creation via the dotnet CLI plus a migration script, and that solved most of the problems.

@GertArnold

Real-world case where users only have INSERT permissions: How to prevent primary key reading back after DbContext.Add().

An OUTPUT clause requires SELECT permissions. So this is another case where not reading any values back after insert is desired.

@idris-yakub

Real-world case where users only have INSERT permissions: How to prevent primary key reading back after DbContext.Add().

An OUTPUT clause requires SELECT permissions. So this is another case where not reading any values back after insert is desired.

Dealing with this at the moment. Is there a solution that does not involve running raw SQL?
