NpgsqlDataAdapter doesn't support batches #1830
Comments
To be honest, DataAdapter/DataSet are the ADO.NET areas I know least - they're more or less considered legacy in general. Out of curiosity, in what scenario are you using DataAdapter? Do you feel like taking a look at batching?
We use it to send updates via statically typed tables. We don't use … What's the suggested approach for sending updates to the database? EF Core? I'm not sure I'll have the capacity to look at batching, and it's fine if you put a low priority on this issue; I understand the legacy nature of this.
In general, a simple DbCommand with parameters is the recommended way to send data, and for extra performance you can Prepare() it as well. For inserting only, bulk insert (the PostgreSQL COPY API) is even faster. I'll keep this open; hopefully I or someone else will have time to look at this.
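As a rough illustration of that advice, here is a minimal sketch of a prepared, parameterized insert with Npgsql; the `connString` variable, the `items` table, its columns, and the `rows` collection are all hypothetical.

```csharp
using Npgsql;
using NpgsqlTypes;

// connString, items and rows are assumptions made up for this sketch.
using var conn = new NpgsqlConnection(connString);
conn.Open();

// One parameterized statement, prepared once and reused for every row.
using var cmd = new NpgsqlCommand(
    "INSERT INTO items (name, value) VALUES (@name, @value)", conn);
cmd.Parameters.Add(new NpgsqlParameter("name", NpgsqlDbType.Text));
cmd.Parameters.Add(new NpgsqlParameter("value", NpgsqlDbType.Integer));
cmd.Prepare();

foreach (var (name, value) in rows)
{
    cmd.Parameters["name"].Value = name;
    cmd.Parameters["value"].Value = value;
    cmd.ExecuteNonQuery();
}
```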
I need to insert a large amount of data (approx. 1000 rows) as fast as possible, but with ON CONFLICT DO UPDATE checks, so bulk insert isn't applicable. It would be great to have the possibility of executing these INSERT commands as a batch.
@msvprogs you can definitely do that - simply concatenate all those INSERT statements into the same NpgsqlCommand.CommandText. For extra perf you can prepare the command before executing it; if the statements inside the command are identical (i.e. the same parameterized SQL), that'll give you another nice perf bump.
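A rough sketch of that suggestion, assuming a hypothetical `items` table, an open `conn`, and a `rows` list; every row gets its own parameterized INSERT, and all of them travel to the server in a single command.

```csharp
using System.Text;
using Npgsql;

// Concatenate one parameterized INSERT per row into a single CommandText.
// Table and column names are made up for illustration.
using var cmd = new NpgsqlCommand { Connection = conn };
var sql = new StringBuilder();
for (var i = 0; i < rows.Count; i++)
{
    sql.Append($"INSERT INTO items (name, value) VALUES (@n{i}, @v{i});");
    cmd.Parameters.AddWithValue($"n{i}", rows[i].Name);
    cmd.Parameters.AddWithValue($"v{i}", rows[i].Value);
}
cmd.CommandText = sql.ToString();
cmd.ExecuteNonQuery();   // all INSERTs are sent to the server in one batch
```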
@roji Actually I did so, but this method doesn't allow me to use parameters, because I have to concatenate 1000 INSERTs with different values. Well, I could fill them with 6000 different parameters (6 per row), but it seems too ridiculous :)
There's nothing necessarily wrong with batching 1000 INSERTs with 6000 parameters; have you given it a try? Any particular reason why it's problematic?
@roji Yeah, it worked, thanks. But it still looks ridiculous to execute a SQL statement with thousands of parameters :)
@msvprogs I agree, /~https://github.com/dotnet/corefx/issues/3688 is about having a better standard API for batching in ADO.NET, but for now that's what we have. Keeping this issue open to track implementation.
@msvprogs probably a bit late, but you should look at a slight change in technique: create a temp table with no keys/indexes and do a bulk insert into that. Once the batch is written to the temp table, do the INSERT ... ON CONFLICT DO UPDATE from the temp table into the real table. To make things even quicker, disable/drop indexes on your real table first and recreate them afterwards. There are a bunch of other tweaks you can do too; just do a quick search for the list of options. Just as an example: on a recent project, I'm batching 100k records this way against a table with millions of records in no time at all.
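The flow described above might look roughly like this; the `items`/`tmp_items` tables, their columns, and the conflict key are hypothetical, and the COPY step uses Npgsql's binary import API (Complete() commits the import in Npgsql 4.0 and later).

```csharp
using Npgsql;

// Stage the batch in an unindexed temp table via binary COPY...
new NpgsqlCommand(
    "CREATE TEMP TABLE tmp_items (id int, name text, value int)", conn)
    .ExecuteNonQuery();

using (var importer = conn.BeginBinaryImport(
    "COPY tmp_items (id, name, value) FROM STDIN (FORMAT BINARY)"))
{
    foreach (var row in rows)   // rows is a hypothetical in-memory batch
    {
        importer.StartRow();
        importer.Write(row.Id);
        importer.Write(row.Name);
        importer.Write(row.Value);
    }
    importer.Complete();   // commits the COPY operation
}

// ...then upsert from the temp table into the real table in one statement.
new NpgsqlCommand(@"
    INSERT INTO items (id, name, value)
    SELECT id, name, value FROM tmp_items
    ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, value = EXCLUDED.value",
    conn).ExecuteNonQuery();
```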
I wrote a hackish replacement of …
I'll have a look at this if nobody objects.
A little update: …
Yeah, that definitely doesn't sound trivial. I have no idea how needed this feature is... I personally consider DataAdapter/DataSet really quite legacy/obsolete, but I know there are lots of applications out there using it. Whether they are in need of batching support is anyone's guess. Up to you if you want to take a stab at it :)
At least we had one feature request, so I think it's worth some consideration. Thank you.
It turns out that this issue notoriously drops off the list when I prioritize my work (pretty much a legacy API, a lot of work, rarely requested), so I've unassigned myself.
Sure thing @Brar, I really don't think there's a lot of value in implementing this.
I don't think DataAdapter/DataSet is legacy/obsolete. It's useful for processing data without creating a data model, and it's flexible enough to work with any database because you can query and set values by column name. Should I write every single SQL statement and wire it all up by hand?
@evolcano if you're really looking at moving large numbers of rows (and 100M definitely is large), you should be using binary COPY; nothing will even come close to that.
Thanks @roji. I'm working on a cross-platform application that uses different databases; we need to transfer data across these databases, and we need to transfer it conditionally and modify the data before updating the destination, so binary COPY may not suit my situation. We have created a middleware that uses UpdateBatchSize, but Npgsql doesn't support it, so I have to write the SQL manually for every different table. It only takes about 11 seconds to insert 40,000 records when 2,000 records are batch-inserted each time.
@evolcano if you really are inserting hundreds of millions of rows, then I'd definitely say that handling the import in a high-perf way for each database is important (e.g. bulk copy for PostgreSQL, SqlBulkCopy for SQL Server...). In any case, very few users have requested this feature (no votes), but if someone wants to work on it and submit a PR, they definitely can.
Attempting to set NpgsqlDataAdapter.UpdateBatchSize fails with "Specified method is not supported." This implementation is inherited from DbDataAdapter.UpdateBatchSize. Is it possible to implement batch support similar to SqlDataAdapter?
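For reference, a minimal sketch of the failure described in this issue (the SELECT text, table, and `conn` are placeholders): the setter is inherited from the base DbDataAdapter, which does not support batching.

```csharp
using Npgsql;

var adapter = new NpgsqlDataAdapter("SELECT id, name FROM items", conn);

// The base DbDataAdapter implementation only accepts an UpdateBatchSize of 1,
// so this throws NotSupportedException ("Specified method is not supported.").
adapter.UpdateBatchSize = 100;
```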