Vector<T> does not yield performance results that capacity usually would #24731
Comments
Under the hood, a Vector is an array of T. Currently, when we clear(), we just call resize(0) on the underlying CowData (previously we did the same, btw). Some suggestions: Thoughts, anyone? @reduz perhaps?
Another option could be for Vector to know whether or not it is filled with POD types. We could then do a template specialization for that case and, upon clear()/resize() etc., just drop the elements on the floor, since we won't have to call any dtors anyway. We can probably achieve this with C++03, but it'd be much prettier with C++11 :) (It would still need some kind of notion of a 'capacity', though.)
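To make the suggestion above concrete, here is a minimal sketch, assuming C++11 type traits are available. `ReusableBuffer` is a hypothetical container written for illustration only; it is not Godot's `Vector` or `CowData`, it does no copy-on-write, and it uses `std::is_trivially_destructible` where the comment says "POD". It keeps a separate size and capacity, and `clear()` skips the destructor loop entirely for trivially destructible element types.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical container for illustration; not Godot's Vector or CowData.
// Over-aligned element types and exception safety are not handled.
template <typename T>
class ReusableBuffer {
	T *data_ = nullptr;
	std::size_t size_ = 0;
	std::size_t capacity_ = 0;

	// Trivially destructible elements: nothing to call.
	void destroy_all(std::true_type) {}
	// Everything else: run each destructor.
	void destroy_all(std::false_type) {
		for (std::size_t i = 0; i < size_; ++i) {
			data_[i].~T();
		}
	}

	// Doubles the allocation and moves the existing elements over.
	void grow() {
		std::size_t new_cap = capacity_ ? capacity_ * 2 : 8;
		T *fresh = static_cast<T *>(std::malloc(new_cap * sizeof(T)));
		if (!fresh) {
			throw std::bad_alloc();
		}
		for (std::size_t i = 0; i < size_; ++i) {
			new (fresh + i) T(std::move(data_[i]));
			data_[i].~T();
		}
		std::free(data_);
		data_ = fresh;
		capacity_ = new_cap;
	}

public:
	ReusableBuffer() = default;
	ReusableBuffer(const ReusableBuffer &) = delete;
	ReusableBuffer &operator=(const ReusableBuffer &) = delete;
	~ReusableBuffer() {
		clear();
		std::free(data_);
	}

	void push_back(const T &value) {
		T copy(value); // copy first, in case `value` lives inside this buffer
		if (size_ == capacity_) {
			grow();
		}
		new (data_ + size_) T(std::move(copy));
		++size_;
	}

	// Drops the elements but keeps the allocation for re-use.
	void clear() {
		destroy_all(std::is_trivially_destructible<T>());
		size_ = 0;
	}

	std::size_t size() const { return size_; }
	std::size_t capacity() const { return capacity_; }
	T &operator[](std::size_t i) { return data_[i]; }
};
```

With a layout like this, clearing and refilling a buffer of plain vertices or colours touches no allocator and runs no destructors, as long as the new element count fits in the capacity reached earlier.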
Another option is actually to just use __is_pod, a compiler intrinsic supported by clang, msvc, and gcc, then have a specialization for that and not do anything with capacity at all. This would still mean a resize operation, but it would be much, much faster. It may really be all that's necessary.
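A rough sketch of what dispatching on `__is_pod` could look like, without relying on C++11 type traits. The names `PodTag` and `destroy_range` are hypothetical and do not come from the Godot code base; the helper returns the number of destructor calls it made, purely so the effect is observable.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Hypothetical tag type selected by the __is_pod compiler intrinsic.
template <bool IsPod>
struct PodTag {};

// POD elements: no destructors to run, so do nothing.
template <typename T>
std::size_t destroy_range(T *, std::size_t, PodTag<true>) {
	return 0;
}

// Non-POD elements: run every destructor.
template <typename T>
std::size_t destroy_range(T *first, std::size_t count, PodTag<false>) {
	for (std::size_t i = 0; i < count; ++i) {
		first[i].~T();
	}
	return count;
}

// Entry point: picks one of the overloads above at compile time.
template <typename T>
std::size_t destroy_range(T *first, std::size_t count) {
	return destroy_range(first, count, PodTag<__is_pod(T)>());
}
```

For an `int` or a plain struct of floats, the call compiles down to nothing, which is the "much faster resize" case described above; for a type such as `std::string`, the destructor loop still runs.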
@hpvb here are the times I get from the same benchmark: Debug, without your PR:
Debug, with your PR:
Release-tools, without your PR:
Release-tools, with your PR:
Note: there seems to be some randomness at play, because on rare occasions, for the same test, the total durations I recorded were up to 3 times lower or higher, so I wrote down the most frequently occurring times. I assume it's due to core switches or the way memory is rearranged. Now, to better highlight my concern, I also tested my geometry generation code with an
(for a quick look at what I'm doing, it's a code minimap renderer, which is just a bunch of coloured lines, and I use vectors for batching: https://sebsauvage.net/paste/?c29ea94d9fa3949a#MrxGYB4a5NwuONNYAEBoxcmar9sZyamLZJHnCaJWKHg=) It really looks like the bottleneck lies in |
I got another insight today by switching my Minecrafty mesher from
As you can see, just changing the container at least DOUBLED how fast the mesher can turn voxels into mesh data in a release build, which is directly noticeable in the game without even looking at numbers. That means I can modify more voxels at once with less visual lag. I think that's enough to seriously consider an improvement to this container. This gain is caused by the way I used the vector: I keep them around for re-use, and Another note: |
The pull request that would originally close this wasn't merged, but since #38386 was, I'll close this. |
I need to profile it again some day |
Does this mean that the visual server can take |
Godot 3.1 b60939b
Windows 10 64 bits
Today I got stuck on an optimization problem: I am generating a lot of 2D geometry that gets drawn often with `canvas_item_add_triangle_array`. This forces me to use Vectors, which I decided to re-use each time the geometry gets generated. But it has kept lagging at 6 ms just doing that since I started using it (without even including the call to draw)...
Then I finally realized something about `Vector`: the following two pieces of code take the same amount of time to complete.
So it appears there is no way to re-use a Vector and expect it to be cost-free over repeated uses, which prevents me from making my code faster. Is that really the case? Is it a bug?
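For comparison, this is the behaviour the question expects from a container with capacity, sketched with `std::vector` rather than Godot's `Vector`. The helper `refill_reallocates` is a hypothetical name. It assumes what the common standard library implementations do in practice: `clear()` leaves the capacity untouched, so refilling up to the previous size does not reallocate.

```cpp
#include <cstddef>
#include <vector>

// Clears `buffer`, refills it with `count` values, and reports whether the
// underlying storage was reallocated along the way.
bool refill_reallocates(std::vector<float> &buffer, std::size_t count) {
	const float *storage_before = buffer.data();
	const std::size_t capacity_before = buffer.capacity();

	buffer.clear(); // size becomes 0; capacity is expected to stay as it was
	for (std::size_t i = 0; i < count; ++i) {
		buffer.push_back(static_cast<float>(i));
	}

	return buffer.data() != storage_before || buffer.capacity() != capacity_before;
}
```

Only the first fill of a fresh vector should report a reallocation; later refills of the same size or smaller re-use the existing storage.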
To describe my problem further: I tried rewriting my code so that it never clears vectors and instead resizes only once, to the correct size, before calling `canvas_item_add_triangle_array`... and I gained 200% speed, even though it still has to resize at the end (I am lucky enough to draw geometry that usually has roughly the same number of vertices).
So `Vector` does copy-on-write, it's easy to pass around I guess, and it resizes by powers of two. But how should we deal with this case? Are we stuck allocating memory each time and half-reimplementing capacity on top of it if we want more speed? I mean, it does sound like it has capacity somehow, but it doesn't yield the results I expected.
Note: this would also concern Line2D, since it uses the same method.
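The workaround described above (never clearing, overwriting elements in place, and resizing once at the end) can be sketched roughly as follows. This is an illustration only: `VertexBatch` and `Vertex` are hypothetical names, and `std::vector` stands in for Godot's `Vector`, so it shows the shape of the "half-reimplemented capacity" rather than the actual code from the project.

```cpp
#include <cstddef>
#include <vector>

struct Vertex {
	float x;
	float y;
};

// Hypothetical batching helper: tracks how many elements are in use
// separately from the size of the underlying vector.
class VertexBatch {
	std::vector<Vertex> vertices_;
	std::size_t used_ = 0;

public:
	// Starts a new batch without clearing the storage.
	void begin() { used_ = 0; }

	// Overwrites the next slot, growing the storage only when it runs out.
	void add(float x, float y) {
		if (used_ == vertices_.size()) {
			vertices_.resize(vertices_.empty() ? 64 : vertices_.size() * 2);
		}
		vertices_[used_++] = Vertex{ x, y };
	}

	// Trims to the number of elements written: the single resize at the end.
	const std::vector<Vertex> &finish() {
		vertices_.resize(used_);
		return vertices_;
	}
};
```

When consecutive batches have roughly the same vertex count, as in the case described, `add` almost never grows the storage and `finish` shrinks or keeps the size, which matches the observation that only the final resize remains.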