
NpgsqlDataAdapter doesn't support batches #1830

Open · dmitry-a-morozov opened this issue Mar 5, 2018 · 21 comments

@dmitry-a-morozov (Contributor)

Attempting to set NpgsqlDataAdapter.UpdateBatchSize fails with "Specified method is not supported."
The implementation is inherited from DbDataAdapter.UpdateBatchSize.
Would it be possible to implement batch support similar to SqlDataAdapter?
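
For illustration, a minimal repro sketch of the failure described above (the connection string and the items table are placeholders, not from the original report):

```csharp
using Npgsql;

class Repro
{
    static void Main()
    {
        // Connection string and table are placeholders.
        using var conn = new NpgsqlConnection("Host=localhost;Database=test;Username=test");
        conn.Open();

        var adapter = new NpgsqlDataAdapter("SELECT id, name FROM items", conn);

        // DbDataAdapter's base setter only accepts 1; any other value throws
        // NotSupportedException ("Specified method is not supported.") because
        // NpgsqlDataAdapter doesn't override the batching hooks.
        adapter.UpdateBatchSize = 100;
    }
}
```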

@roji (Member) commented Mar 5, 2018

To be honest, DataAdapter/DataSet are the ADO.NET areas I know least - they're more or less considered legacy in general. Out of curiosity, in what scenario are you using DataAdapter? Do you feel like taking a look at batching?

@dmitry-a-morozov (Contributor, Author)

We use it to send updates via statically typed tables. We don't use DataSet, only DataTable. Frankly, it's a nice way to send changes back to the database as an alternative to mechanically writing all those UPDATE/INSERT/DELETE statements.
/~https://github.com/demetrixbio/FSharp.Data.Npgsql#data-modifications

What's the suggested approach for sending updates to the database? EF Core?

I'm not sure I'll have the capacity to look at batching, and it's fine if you put low priority on this issue. I understand the legacy nature of this.

@roji (Member) commented Mar 5, 2018

In general, a simple DbCommand with parameters is the recommended way to send data, and for extra performance you can Prepare() it as well. For inserting only, bulk insert (the PostgreSQL COPY API) is even faster.

I'll keep this open; hopefully I or someone else will have time to look at this.
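
As a rough sketch of that approach - a single prepared, parameterized command executed once per row; the connection string and the items(id, name) table are placeholders:

```csharp
using System.Collections.Generic;
using Npgsql;
using NpgsqlTypes;

class PreparedInsert
{
    static void Main()
    {
        // Placeholder data and connection string.
        var rows = new List<(int Id, string Name)> { (1, "a"), (2, "b") };

        using var conn = new NpgsqlConnection("Host=localhost;Database=test;Username=test");
        conn.Open();

        using var cmd = new NpgsqlCommand(
            "INSERT INTO items (id, name) VALUES (@id, @name)", conn);
        var pId = cmd.Parameters.Add("id", NpgsqlDbType.Integer);
        var pName = cmd.Parameters.Add("name", NpgsqlDbType.Text);

        // Prepare() lets PostgreSQL reuse the parsed statement across executions.
        cmd.Prepare();

        foreach (var (id, name) in rows)
        {
            pId.Value = id;
            pName.Value = name;
            cmd.ExecuteNonQuery();
        }
    }
}
```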

@msvprogs commented May 16, 2018

I need to insert a large amount of data (approx. 1000 rows) as fast as possible, but with ON CONFLICT DO UPDATE checks, so bulk insert isn't applicable. It would be great to have the possibility to execute these INSERT commands as a batch.

@roji (Member) commented May 16, 2018

@msvprogs you can definitely do that - simply concatenate all those INSERT statements into the same NpgsqlCommand.CommandText. For extra perf you can prepare the command before executing it; if the statements inside the command are identical (i.e. the same parameterized SQL), that'll give you another nice perf bump.
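
A minimal sketch of what such a concatenated batch might look like, assuming a hypothetical items(id, name) table and an upsert per row:

```csharp
using System.Collections.Generic;
using System.Text;
using Npgsql;

static class InsertBatchBuilder
{
    // Builds one command containing N identical parameterized upsert statements,
    // each with its own parameter names (@id0/@name0, @id1/@name1, ...).
    public static NpgsqlCommand Build(NpgsqlConnection conn, IReadOnlyList<(int Id, string Name)> rows)
    {
        var sql = new StringBuilder();
        var cmd = new NpgsqlCommand { Connection = conn };

        for (var i = 0; i < rows.Count; i++)
        {
            sql.Append($"INSERT INTO items (id, name) VALUES (@id{i}, @name{i}) " +
                       $"ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;");
            cmd.Parameters.AddWithValue($"id{i}", rows[i].Id);
            cmd.Parameters.AddWithValue($"name{i}", rows[i].Name);
        }

        cmd.CommandText = sql.ToString();
        return cmd;
    }
}
```

Because every statement in the batch has the same parameterized shape, calling cmd.Prepare() before cmd.ExecuteNonQuery() should execute the whole batch in a single prepared round trip.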

@msvprogs

@roji Actually I did so, but this method doesn't easily allow the use of parameters, because I have to concatenate 1000 INSERTs with different values. Well, I could fill them with 6000 different parameters (6 per row), but it seems too ridiculous :)

@roji (Member) commented May 17, 2018

There's nothing necessarily wrong with batching 1000 INSERTs with 6000 parameters, have you given it a try? Any particular reason why it's problematic?

@msvprogs

@roji Yeah, it worked, thanks. But it still looks ridiculous to execute SQL statement with thousands of parameters :)

@roji (Member) commented May 18, 2018

@msvprogs I agree, /~https://github.com/dotnet/corefx/issues/3688 is about having a better standard API for batching in ADO.NET, but for now that's what we have.

Keeping this issue open to track implementing NpgsqlDataAdapter.UpdateBatchSize.

roji added this to the Backlog milestone Jun 9, 2018
@danieldecampos

@msvprogs probably a bit late, but you should look at a slight change in technique. Instead, create a temp table with no keys/indexes and do a bulk insert into that. Once the batch is written to the temp table, do the INSERT ... ON CONFLICT DO UPDATE from the temp table into the real table. To make things even quicker, disable/drop the indexes on your real table first and recreate them afterwards. There are a bunch of other tweaks you can do too - just do a quick search for the list of options.

Just as an example: on a recent project I'm batching 100k records this way against a table with millions of records in no time at all.
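
A rough sketch of that staging pattern, assuming hypothetical items and items_stage tables and an Npgsql version with the binary COPY API:

```csharp
using System.Collections.Generic;
using Npgsql;
using NpgsqlTypes;

class StagedUpsert
{
    static void Main()
    {
        // Placeholder data, connection string and table names.
        var rows = new List<(int Id, string Name)> { (1, "a"), (2, "b") };

        using var conn = new NpgsqlConnection("Host=localhost;Database=test;Username=test");
        conn.Open();

        // 1. Unindexed staging table, dropped automatically at end of session.
        using (var create = new NpgsqlCommand(
            "CREATE TEMP TABLE items_stage (id int, name text)", conn))
            create.ExecuteNonQuery();

        // 2. Binary COPY is the fastest way to get the batch into the staging table.
        using (var importer = conn.BeginBinaryImport(
            "COPY items_stage (id, name) FROM STDIN (FORMAT BINARY)"))
        {
            foreach (var (id, name) in rows)
            {
                importer.StartRow();
                importer.Write(id, NpgsqlDbType.Integer);
                importer.Write(name, NpgsqlDbType.Text);
            }
            importer.Complete();
        }

        // 3. One set-based upsert from the staging table into the real table.
        using (var upsert = new NpgsqlCommand(
            @"INSERT INTO items (id, name)
              SELECT id, name FROM items_stage
              ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name", conn))
            upsert.ExecuteNonQuery();
    }
}
```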

@dmitry-a-morozov (Contributor, Author)

I wrote a hackish replacement for NpgsqlDataAdapter in F# that supports batch updates.
/~https://github.com/demetrixbio/FSharp.Data.Npgsql/blob/0388a02b3f9a7211c05a4b5b759ed436ff6b98f6/src/Runtime/BatchDataAdapter.fs

@Brar (Member) commented Jul 6, 2018

I'll have a look at this if nobody objects.

Brar self-assigned this Jul 6, 2018
Brar removed the up for grabs label Jul 6, 2018
@Brar (Member) commented Jul 26, 2018

A little update:
I found out that this involves writing code that parses parameter placeholders out of INSERT, UPDATE and DELETE statements in order to make their names unique when concatenating multiple statements.
Doing this properly is harder than it might look at first sight.
This will definitely take time.
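
To illustrate why this is tricky, here is a deliberately naive sketch of the renaming step (not from the actual implementation); it already mishandles string literals, comments and casts like x::int, which is exactly the kind of case a proper parser has to deal with:

```csharp
using System.Text.RegularExpressions;

static class NaivePlaceholderRenamer
{
    // Suffix every @name or :name placeholder with a statement index so that
    // concatenated statements don't collide (@id -> @id_0, @id_1, ...).
    // Deliberately naive: it also rewrites text inside string literals,
    // dollar-quoted strings and comments, and mangles PostgreSQL casts such
    // as x::int - which is why a real implementation needs a parser.
    public static string Rename(string statement, int statementIndex) =>
        Regex.Replace(statement, @"([@:])(\w+)",
            m => $"{m.Groups[1].Value}{m.Groups[2].Value}_{statementIndex}");
}
```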

@roji (Member) commented Jul 26, 2018

Yeah, that definitely doesn't sound trivial.

I have no idea how needed this feature is... I personally consider DataAdapter/DataSet really quite legacy/obsolete, but I know there are lots of applications out there using it. Whether they are in need of batching support is anyone's guess.

Up to you if you want to take a stab at it :)

@Brar (Member) commented Jul 26, 2018

> Whether they are in need of batching support is anyone's guess.

At least we had one feature request, so I think it's worth some consideration.

> Up to you if you want to take a stab at it :)

Thank you.
Writing parsers has always been my nemesis.
I can do for-loops over characters with a few states, but this is definitely more than that.
I really want to tackle this but I can't promise anything ;)

@Brar (Member) commented Aug 19, 2020

It turns out that this issue notoriously drops off the list when I prioritize my work (pretty much a legacy API, a lot of work, rarely requested), so I've unassigned myself.

@roji (Member) commented Aug 20, 2020

Sure thing @Brar, I really don't think there's a lot of value in implementing this.

@evolcano

I don't think DataAdapter/DataSet is legacy/obsolete. It's useful for processing data without creating a data model, and it's flexible enough to work with any database because you can query and set values by column name.
When I insert 40,000 records into the database, it takes 2 minutes and 25 seconds, and the time drops to 1 minute and 50 seconds after removing the primary key and all indexes. I have more than 500 different tables, each with about 300M-800M records, and every 100M records may take 1 hour to insert. It's really time-consuming.

Should I build every single SQL statement and concatenate it all by hand?

@roji (Member) commented Aug 30, 2022

@evolcano if you're really looking to insert large numbers of rows (and 100M definitely is large), you should be using binary COPY; nothing will even come close to that.

@evolcano

Thanks @roji. I'm working on a cross-platform application that uses different databases; we need to transfer data across these databases, filter it on conditions, and modify it before writing it to the destination. So binary COPY may not suit my situation.

We have built a middleware using UpdateBatchSize, but Npgsql doesn't support it.

So I have to build the SQL manually for every different table. It seems to take only 11 seconds to insert 40,000 records if 2000 records are batch-inserted each time.

@roji (Member) commented Aug 30, 2022

@evolcano if you really are inserting hundreds of millions of rows, then I'd definitely say that handling the import in a high-perf way for each database is important (e.g. bulk copy for PostgreSQL, SqlBulkCopy for SQL Server...).

In any case, very few users have requested this feature (no votes), but if someone wants to work on it and submit a PR, they definitely can.
