Do not track after SaveChanges() #9118
Comments
Try AsNoTracking: https://msdn.microsoft.com/en-us/library/gg679352(v=vs.103).aspx
@Giancarlos This issue is not related to AsNoTracking--that will tell EF not to track entities returned from the query. This issue is about SaveChanges ignoring store-generated values for already tracked entities.
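To make the distinction concrete, a minimal sketch (the `Blog` entity and `BloggingContext` below are made up for illustration and assume the SQL Server provider):

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Blog
{
    public int Id { get; set; }           // store-generated key
    public string Name { get; set; } = "";
}

public class BloggingContext : DbContext
{
    public DbSet<Blog> Blogs => Set<Blog>();

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer("<connection string>");
}

public static class TrackingDemo
{
    public static void Run()
    {
        using var context = new BloggingContext();

        // AsNoTracking only affects entities materialized by a query:
        var untracked = context.Blogs.AsNoTracking().ToList();

        // It has no effect on SaveChanges: the new blog is tracked, and EF
        // still reads the generated Id back from the database into blog.Id.
        var blog = new Blog { Name = "example" };
        context.Add(blog);
        context.SaveChanges();
    }
}
```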
To the implementer:
It might be nice to be able to pass an optional "get me the entity" flag on a SaveChanges call. That way you add it when you need it, and leave it off when you don't.
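A rough sketch of what that flag could look like; the option itself doesn't exist in EF Core, so the helper below only approximates it with what is available today:

```csharp
using Microsoft.EntityFrameworkCore;

public static class SaveChangesSketch
{
    // Illustrative only: there is no such flag on DbContext.SaveChanges.
    // When refreshEntitiesFromStore is false we can at least stop tracking the
    // saved entities afterwards; EF has still already fetched and applied the
    // store-generated values, and only a built-in option could skip that round trip.
    public static int SaveChanges(DbContext context, bool refreshEntitiesFromStore)
    {
        var rows = context.SaveChanges();

        if (!refreshEntitiesFromStore)
        {
            context.ChangeTracker.Clear(); // EF Core 5.0+
        }

        return rows;
    }
}
```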
Are there plans for this to be implemented? I'm using Autofac + EF Core in a console application and this is really biting me.
@leus this is our backlog milestone, meaning that we don't have specific plans to implement it. Could you elaborate on how this is biting you?
Sure - I have a .NET Core console application that uses Autofac to create a big list of [...]. My contexts are created as follows:
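The registration code itself didn't survive in this copy of the thread; a rough sketch of a typical Autofac setup for this kind of console app (the lifetimes are assumptions, with `MyContext` and `MyCsvImporter` standing in for the types mentioned later in the thread):

```csharp
using Autofac;

public static class ContainerSetup
{
    public static IContainer Build()
    {
        var builder = new ContainerBuilder();

        // One DbContext per lifetime scope: in a long-running console app, a scope
        // that lives for the whole import means every saved entity keeps
        // accumulating in the same change tracker.
        builder.RegisterType<MyContext>().InstancePerLifetimeScope();
        builder.RegisterType<MyCsvImporter>().InstancePerDependency();

        return builder.Build();
    }
}
```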
Regarding this issue and also #11852: as noted in previous comments and as I have already observed, using AsNoTracking queries or turning tracking off for the whole context only prevents entities being added to the change tracker as a result of query operations; SaveChanges still keeps inserted/updated entities in the change tracker.

@divega @ajcvickers Suggestion for a new configuration option: I think that having a SaveChanges option (at context level and method level) which removes newly inserted entities from the change tracker after saving to the database would be useful. As with the original opening comment on this issue, when there are database-generated columns such as auto-incrementing primary keys, with such an option enabled there would be no need for EF to perform a select to retrieve the newly generated key value, because the entity would effectively be discarded by EF after the save.

I use the description "option which removes newly inserted entities from the change tracker after saving to the database" rather than "option which prevents newly inserted entities from being added to the change tracker after saving to the database" because, as far as I know, the only way to insert an entity using SaveChanges is to first attach it to the change tracker using [...].

I think an option such as this would be less relevant when database-generated columns are not used, as there is no select query performed behind the scenes by EF, so there would be no performance benefit provided by the option. Instead the entity can just be detached from the change tracker by using [...].

Maybe the option should instead be targeted at the SaveChanges behaviour when database-generated columns are used, i.e. don't go and get the updated values, just fire and forget - but this will cause issues with the change tracker unless the saved entity is detached? Having a side effect of detaching the entity only when database-generated columns are in use is going to be really bad. It sounds much better as an option to detach entities after saving, which has the performance-improving side effect of preventing an unnecessary select query when database-generated columns are used. So rather than just detaching newly inserted entities, it would be better if the new option made SaveChanges detach any entity that it has just persisted to the database, both inserts and updates, similar to what @roji suggested here: #11852 (comment)

So what I am suggesting is an option which changes the behaviour of save operations so that they result in fire-and-forget database writes and then detach the affected entities from the change tracker.

Use cases: I think an option such as this would be useful in scenarios where it is not possible or desirable to use BulkCopy tools. For example, when bulk importing data into a database where, depending on each record being imported, there sometimes need to be query-then-update or lookup-and-populate-related-entities operations rather than just plain inserts. In that scenario it can be desirable to use EF for importing data, especially if it is already being used in the project and we already have the model to work with, rather than bringing in some additional framework or tools. The issue with using EF for inserting/updating large numbers of records is that the change tracker causes serious performance issues due to build-up of entities unless the context is regularly disposed and re-created during the import cycle.

This is never much of an issue for normal use cases such as API and MVC controllers, where request scopes and DI make sure that the context is short-lived. To get the most performance benefit, in most cases the context will need to be recreated after each record is imported; although the overhead of doing this is tiny compared to the performance benefit of starting with a fresh context and an empty change tracker, it still seems wasteful. Another issue is that it does not always make sense or look nice to structure the import code so that the context is recreated after each record import. To solve this kind of issue in the past I have used a mixture of context recreation and looping through the change tracker to detach all entities after each save. This is fine when using Guids as keys which are generated by EF, but if using database-generated keys there will still be wasteful select queries happening.
Here is a method on the context that I have been using to clear out the change tracker:
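(The snippet itself is missing from this copy of the thread; the following is a sketch of the usual shape of such a method, written here as an extension method, with the EF Core 5.0+ one-liner noted in a comment.)

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

public static class ChangeTrackerExtensions
{
    // Detach everything the context is currently tracking, so entity build-up
    // doesn't slow down change tracking during a long import loop.
    public static void DetachAllEntities(this DbContext context)
    {
        foreach (var entry in context.ChangeTracker.Entries().ToList())
        {
            entry.State = EntityState.Detached;
        }

        // On EF Core 5.0 or later this is a single built-in call instead:
        // context.ChangeTracker.Clear();
    }
}
```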
But recently I have tried using the suggestion here: #11852 (comment)
There appears to be no difference in terms of performance that I can measure; both seem to have the same effect. As an example, clearing out the change tracker after each record import during an import of 220 records from a CSV file resulted in a 25x performance increase. Without clearing out the change tracker after each record, the import ends with 220 entities in the change tracker and takes 25 times longer to complete - so not a huge amount of entity build-up, but enough to cripple the performance. In that example, when importing multiple files, I am recreating the context after each file import, but just clearing the change tracker after each record import. I am sure this is one of the reasons why EF gets such a bad rep in terms of performance: you have to be very careful how it is used, and if you are not fully aware of things like this it is easy to just dismiss it as slow and inefficient.
A good way of getting a fresh context is to take advantage of the ServiceScopeFactory:
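The code block is missing from this copy; a sketch of the pattern being described, assuming MyContext is registered as a scoped service and MyCsvImporter (referred to below) takes the context and does the per-record work:

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.DependencyInjection;

public class ImportRunner
{
    private readonly IServiceScopeFactory _scopeFactory;

    public ImportRunner(IServiceScopeFactory scopeFactory)
        => _scopeFactory = scopeFactory;

    public void ImportFiles(IEnumerable<string> csvFiles)
    {
        foreach (var file in csvFiles)
        {
            // A new scope per file yields a fresh MyContext with an empty change
            // tracker, and disposes it as soon as the file has been processed.
            using var scope = _scopeFactory.CreateScope();
            var context = scope.ServiceProvider.GetRequiredService<MyContext>();

            new MyCsvImporter(context).Import(file);
        }
    }
}
```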
Of course MyCsvImporter could also use the same technique to create a fresh context for each record. I am not sure if this would be faster than clearing out the change tracker, but in this case I suspect the difference would not be measurable, because the database writes take up the vast majority of the time. The main reason not to create a fresh context for each record is code structure and readability, and the desire to at least make some use of the framework-provided DI convenience rather than always grabbing the required services from a ServiceScope. I also agree with @roji here: #11852 (comment)
It looks like the performance benefit of this option will not just depend on whether the primary key is generated by EF or by the database, but also on the EF Core provider used. For example, if using Npgsql and a Guid primary key, but with database-generated values via the Postgres uuid_generate_v4() function, it looks like there will still be no select query performed after the save, due to the Postgres feature described in this comment:
@roji I am not sure how this is handled when there are other or multiple database-generated values in a table?
@chrisckc you're right that Npgsql has a somewhat lighter mechanism for retrieving database-generated values after insert. However, even for Npgsql, if we somehow know that we don't need to bring back database-generated values, we can omit the RETURNING clause entirely. Having multiple database-generated values doesn't change anything here - they should simply all appear in the RETURNING clause.
My use case is only having INSERT and not SELECT permission (audit tables).
@timker why not use ExecuteSqlRaw then?
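For instance, something along these lines - ExecuteSqlInterpolated is a real EF Core API, but the table, columns and variables below are only illustrative:

```csharp
using System;
using Microsoft.EntityFrameworkCore;

var userName = "some-user";
var action = "created-order";

using var context = new MyContext();

// Writes the row without tracking anything and without reading any generated
// values back - useful when the login only has INSERT permission on the table.
context.Database.ExecuteSqlInterpolated(
    $@"INSERT INTO [AuditLog] ([UserName], [Action], [CreatedAt])
       VALUES ({userName}, {action}, {DateTime.UtcNow})");
```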
That would work (or I could create a stored proc, and I'm happy to do either), but I would prefer to use EF types.
This forces you to write SQL when you write backward-compatible code.
IMO this select should be optional.
If anyone has the same backward-compatibility issue, there is a better solution than raw SQL. If you have the entity model classes isolated, you can copy them into a new namespace (**.Backward) and have a factory that decides which DbContext to use.
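A rough sketch of that idea, with entirely made-up type names (the real entity classes and model configuration are omitted):

```csharp
using Microsoft.EntityFrameworkCore;

namespace App
{
    public class AppDbContext : DbContext
    {
        public AppDbContext(DbContextOptions options) : base(options) { }
        // current entity classes / model configuration
    }

    public static class ContextFactory
    {
        // Hand back the context whose model matches the schema of the target
        // database: the current one, or the copy kept under the Backward namespace.
        public static DbContext Create(bool targetIsOldSchema, string connectionString)
        {
            var options = new DbContextOptionsBuilder()
                .UseSqlServer(connectionString)
                .Options;

            return targetIsOldSchema
                ? new Backward.AppDbContext(options)
                : new AppDbContext(options);
        }
    }
}

namespace App.Backward
{
    public class AppDbContext : DbContext
    {
        public AppDbContext(DbContextOptions options) : base(options) { }
        // copies of the older entity classes, matching the old schema
    }
}
```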
Note: benchmarking in #27372 (comment) shows that fetching back generated values when inserting is mostly negligible perf-wise, at least on SQL Server. As such, the advantage of doing this would mainly be to unlock high-perf techniques which don't support fetching back values, e.g. SqlBulkCopy.
Design discussion:
Notes from triage:
For these reasons we are punting this for EF7, but still intend to implement it in the future.
Re: perf benefit, it's hard to benchmark it when it hasn't been implemented, but I took the INSERT SQL generated in 6.0, then modified it to what SQL might look like if optimised not to return generated values, and benchmarked both. Not returning values was 2.5x faster.
I don't think it's too surprising that it's faster not to return the generated values, and that is probably why you're still pursuing other methods for bulk insertion. Perhaps I got the wrong end of the stick re: the first point about the perf benefit being minimal?

Code

It uses a simple table with a composite key, an Id column that's DB-generated, and a DateTime that has a default. I inserted a bunch of data beforehand, as I imagine the amount of data affects how quickly it reads back generated values. It only benchmarks the SQL and not also reading the values back, setting them on entities, etc., which would add more time.

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.EntityFrameworkCore;
using System;
using System.Diagnostics;
using System.Linq;
var summary = BenchmarkRunner.Run<InsertSpeed>();
public class InsertSpeed
{
private MyContext _context;
[GlobalSetup]
public void Setup()
{
var random = new Random();
using (var context = new MyContext())
{
context.Database.EnsureDeleted();
context.Database.EnsureCreated();
var people = Enumerable.Range(0, 500000)
.Select(x => new Person { TenantId = random.Next(1, 6) })
.ToList();
context.AddRange(people);
context.SaveChanges();
}
_context = new MyContext();
}
[Benchmark]
public void BulkWithSelect()
{
_context.Database.ExecuteSqlRaw(
@"SET NOCOUNT ON;
DECLARE @inserted0 TABLE ([Id] int, [TenantId] int, [_Position] [int]);
MERGE [People] USING (
VALUES (3, 0),
(3, 1),
(3, 2),
(3, 3),
(3, 4),
(3, 5),
(3, 6),
(3, 7),
(3, 8),
(3, 9),
(3, 10),
(3, 11),
(3, 12),
(3, 13),
(3, 14),
(3, 15),
(3, 16),
(3, 17),
(3, 18),
(3, 19),
(3, 20),
(3, 21),
(3, 22),
(3, 23),
(3, 24),
(3, 25),
(3, 26),
(3, 27),
(3, 28),
(3, 29),
(3, 30),
(3, 31),
(3, 32),
(3, 33),
(3, 34),
(3, 35),
(3, 36),
(3, 37),
(3, 38),
(3, 39),
(3, 40),
(3, 41)) AS i ([TenantId], _Position) ON 1=0
WHEN NOT MATCHED THEN
INSERT ([TenantId])
VALUES (i.[TenantId])
OUTPUT INSERTED.[Id], INSERTED.[TenantId], i._Position
INTO @inserted0;
SELECT [t].[Id], [t].[CreatedDate] FROM [People] t
INNER JOIN @inserted0 i ON ([t].[Id] = [i].[Id]) AND ([t].[TenantId] = [i].[TenantId])
ORDER BY [i].[_Position];");
}
[Benchmark]
public void BulkWithoutSelect()
{
_context.Database.ExecuteSqlRaw(
@"SET NOCOUNT ON;
INSERT INTO [People] ([TenantId])
VALUES (3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3);");
}
[GlobalCleanup]
public void Cleanup()
{
_context?.Dispose();
}
}
public class MyContext : DbContext
{
public DbSet<Person> People { get; set; }
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
=> optionsBuilder.UseSqlServer("Server=.;Database=SaveChanges;Trusted_Connection=True;");
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Person>(e =>
{
e.HasKey(x => new { x.TenantId, x.Id });
e.Property(x => x.Id).ValueGeneratedOnAdd();
e.Property(x => x.CreatedDate).HasDefaultValueSql("SYSUTCDATETIME()");
});
}
}
public class Person
{
public int TenantId { get; set; }
public int Id { get; set; }
public DateTime CreatedDate { get; set; }
}
@stevendarby you probably want to take a look at #27372, which was implemented and can be tested in 7.0.0-preview.3. In a nutshell, EF Core 7.0 uses a slightly different technique for bulk-inserting which still fetches generated values, but uses the OUTPUT clause without a temporary table variable (OUTPUT, not OUTPUT INTO). I've updated your benchmark code to add the new technique used by EF Core 7.0 (see below); here are the results:
In other words, the slowness in your benchmark comes not from fetching the actual values (sending those two ints plus the position is quite negligible in the grand scheme of things), but rather from the SQL Server temporary table being used. Now, I'm not saying further gains couldn't be reached by not fetching, but they currently seem quite minimal for most cases. Of course, if the value fetched is a huge string or binary data, the situation would be quite different, but that seems like an edge-case scenario that we haven't seen much. Another reason to remove the fetching is that it would allow using techniques which don't allow fetching at all (in theory even SqlBulkCopy could be used, in certain cases); but that goes in a complicated direction and we don't have concrete plans to do that right now. If we do go in this direction, this issue would be part of that. It would be great if you could run the added benchmarks below and confirm that you see the same results. If you have any other ideas or input, of course that would be welcome!
I think this also shows how in perf nothing is straightforward and one should avoid assumptions... Contrary to expectations (including my own initial ones!), the fetching of generated values isn't a significant factor here.

PS Any specific reason you're seeding the database with 500000 rows in this benchmark?

Updated benchmark code

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.EntityFrameworkCore;
using System;
using System.Linq;
BenchmarkRunner.Run<InsertSpeed>();
public class InsertSpeed
{
private MyContext _context;
[GlobalSetup]
public void Setup()
{
var random = new Random();
using (var context = new MyContext())
{
context.Database.EnsureDeleted();
context.Database.EnsureCreated();
// var people = Enumerable.Range(0, 500000)
// .Select(x => new Person { TenantId = random.Next(1, 6) })
// .ToList();
//
// context.AddRange(people);
// context.SaveChanges();
}
_context = new MyContext();
}
[Benchmark]
public void Bulk_with_Output_Into()
{
_context.Database.ExecuteSqlRaw(
@"SET NOCOUNT ON;
DECLARE @inserted0 TABLE ([Id] int, [TenantId] int, [_Position] [int]);
MERGE [People] USING (
VALUES (3, 0),
(3, 1),
(3, 2),
(3, 3),
(3, 4),
(3, 5),
(3, 6),
(3, 7),
(3, 8),
(3, 9),
(3, 10),
(3, 11),
(3, 12),
(3, 13),
(3, 14),
(3, 15),
(3, 16),
(3, 17),
(3, 18),
(3, 19),
(3, 20),
(3, 21),
(3, 22),
(3, 23),
(3, 24),
(3, 25),
(3, 26),
(3, 27),
(3, 28),
(3, 29),
(3, 30),
(3, 31),
(3, 32),
(3, 33),
(3, 34),
(3, 35),
(3, 36),
(3, 37),
(3, 38),
(3, 39),
(3, 40),
(3, 41)) AS i ([TenantId], _Position) ON 1=0
WHEN NOT MATCHED THEN
INSERT ([TenantId])
VALUES (i.[TenantId])
OUTPUT INSERTED.[Id], INSERTED.[TenantId], i._Position
INTO @inserted0;
SELECT [t].[Id], [t].[CreatedDate] FROM [People] t
INNER JOIN @inserted0 i ON ([t].[Id] = [i].[Id]) AND ([t].[TenantId] = [i].[TenantId])
ORDER BY [i].[_Position];");
}
[Benchmark]
public void Bulk_with_Output()
{
_context.Database.ExecuteSqlRaw(
@"SET NOCOUNT ON;
MERGE [People] USING (
VALUES (3, 0),
(3, 1),
(3, 2),
(3, 3),
(3, 4),
(3, 5),
(3, 6),
(3, 7),
(3, 8),
(3, 9),
(3, 10),
(3, 11),
(3, 12),
(3, 13),
(3, 14),
(3, 15),
(3, 16),
(3, 17),
(3, 18),
(3, 19),
(3, 20),
(3, 21),
(3, 22),
(3, 23),
(3, 24),
(3, 25),
(3, 26),
(3, 27),
(3, 28),
(3, 29),
(3, 30),
(3, 31),
(3, 32),
(3, 33),
(3, 34),
(3, 35),
(3, 36),
(3, 37),
(3, 38),
(3, 39),
(3, 40),
(3, 41)) AS i ([TenantId], _Position) ON 1=0
WHEN NOT MATCHED THEN
INSERT ([TenantId])
VALUES (i.[TenantId])
OUTPUT INSERTED.[Id], INSERTED.[TenantId], i._Position;");
}
[Benchmark]
public void Bulk_without_Output()
{
_context.Database.ExecuteSqlRaw(
@"SET NOCOUNT ON;
MERGE [People] USING (
VALUES (3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3)) AS i ([TenantId]) ON 1=0
WHEN NOT MATCHED THEN
INSERT ([TenantId])
VALUES (i.[TenantId]);");
}
[Benchmark]
public void Bulk_with_insert()
{
_context.Database.ExecuteSqlRaw(
@"SET NOCOUNT ON;
INSERT INTO [People] ([TenantId])
VALUES (3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3),
(3);");
}
[GlobalCleanup]
public void Cleanup()
{
_context?.Dispose();
}
}
public class MyContext : DbContext
{
public DbSet<Person> People { get; set; }
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
=> optionsBuilder.UseSqlServer("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Trust Server Certificate=true");
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Person>(e =>
{
e.HasKey(x => new { x.TenantId, x.Id });
e.Property(x => x.Id).ValueGeneratedOnAdd();
e.Property(x => x.CreatedDate).HasDefaultValueSql("SYSUTCDATETIME()");
});
}
}
public class Person
{
public int TenantId { get; set; }
public int Id { get; set; }
public DateTime CreatedDate { get; set; }
}
@roji that looks really promising, and I'll take a closer look soon for my own interest. Re: the 500000 rows, the 6.0 SQL returns the generated Id and CreatedDate values by selecting from the People table, so I figured populating it with a healthy amount of data could help demo the impact of traversing the index and reading from the table etc.
@stevendarby I see - though the lookup is by the Id (primary key), which uses an index and therefore shouldn't be affected too much by existing rows (though there indeed should be some impact). Note that the new 7.0 SQL is the default, but does not work where there are triggers; EF 7.0 now allows you to declare that there's a trigger on a table, in which case we revert back to the 6.0 (inefficient) SQL.
@roji FWIW I had another look at this with your updated benchmarks, plus a few tweaks of my own to check out a few scenarios, and I'm really satisfied that the 7.0 SQL you highlighted already has the performance improvements I previously thought could only come from implementing the changes proposed for this issue. I do see a slight gain in not returning the values at all, but it's nothing like the improvement from using a direct output instead of output into. Thank you. Not using triggers in my DB!
Thanks for confirming @stevendarby! Yeah, we do intend to implement this issue at some point, but priority-wise it seems much less important than it previously did.
+1, this would be a good option to have. In EF Core 6, I've just come across a situation where being able to opt out of returning a key would be beneficial. EF Core was generating the below SQL to insert an object into a request logs table; when under high load, the table was getting locked by the SELECT statement, meaning all requests were timing out and failing.
In my specific case I don't care about EF Core checking the object, or selecting the primary key after insert. I just need to dump the data into the table and move on. I ended up having to write a custom method on my DbContext to insert the record:
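The method itself isn't shown in this copy; as an illustration, a fire-and-forget insert on the context might look roughly like this (the RequestLogContext name, table and columns are made up):

```csharp
using System;
using Microsoft.EntityFrameworkCore;

public class RequestLogContext : DbContext
{
    // (DbSets and configuration omitted.)

    // Dumps the row straight into the table: nothing is attached to the change
    // tracker and no key or default value is read back afterwards.
    public void InsertRequestLog(string path, int statusCode)
        => Database.ExecuteSqlInterpolated(
            $@"INSERT INTO [RequestLogs] ([Path], [StatusCode], [CreatedAt])
               VALUES ({path}, {statusCode}, {DateTime.UtcNow})");
}
```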
But I would prefer the option to do something like the below, which would prevent the round trip and generate the SQL accordingly.
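For example, something with this kind of shape - to be clear, nothing below exists in EF Core; it only sketches the requested opt-out:

```csharp
using System;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Metadata.Builders;

// Purely hypothetical surface area - none of this is an EF Core API.
public static class NoReadBackSketch
{
    // Imaginary fluent call: mark a store-generated property as one that EF
    // should never read back after SaveChanges.
    public static PropertyBuilder<TProperty> NeverReadBackAfterSave<TProperty>(
        this PropertyBuilder<TProperty> property)
        => throw new NotImplementedException("Sketch only - not an EF Core feature.");

    // Imaginary per-call opt-out: plain INSERT/UPDATE, no OUTPUT/SELECT for
    // generated values, nothing copied back into the entities.
    public static int SaveChangesWithoutReadBack(this DbContext context)
        => throw new NotImplementedException("Sketch only - not an EF Core feature.");
}
```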
I understand the SELECT has been replaced with an OUTPUT in EF Core 7, but the option to opt out entirely would still be good to have.
EDIT: It was probably not a solution. It was a problem on the migration side, or an invalid way of creating the database (which was on my side, not on EF Core). I moved to DbUp to apply migrations, and to creating migrations via the dotnet CLI plus a migration script, and that solved most of the problems. It really helped me a lot, I'm not sure how, but it did - thank you @Giancarlos. Using EF Core v3.1.32.
Real-world case where users only have INSERT permissions: how to prevent the primary key from being read back after DbContext.Add(). An OUTPUT clause requires SELECT permissions, so this is another case where not reading any values back after insert is desired.
Dealing with this at the moment. Is there a solution that does not involve running raw SQL?
Based on https://stackoverflow.com/questions/44923147/how-to-avoid-inserted-entity-to-be-selected-after-insert-in-ef-core
Whenever we have generated columns (on insert or update), we do a SELECT after the insert/update to retrieve the generated values and populate the entity in memory. In some cases, the user may want to just save the data and not care about updating the entities in memory any further (because they are not used later).
At present there is no mechanism for this functionality.
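As a concrete illustration of that behaviour, using the Person and MyContext types from the benchmark code earlier in this thread:

```csharp
using System;

// Person and MyContext as defined in the benchmarks above.
using var context = new MyContext();

var person = new Person { TenantId = 3 };
context.Add(person);

// The insert batch also brings back the store-generated values, and EF copies
// them into the tracked entity - even when the caller never uses them.
context.SaveChanges();

Console.WriteLine(person.Id);          // set from the identity column
Console.WriteLine(person.CreatedDate); // set from the SYSUTCDATETIME() default
```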