-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WString: Optimize a bit #7553
WString: Optimize a bit #7553
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, but the PR has some problems. I'm very wary of changing String because it can end up affecting a lot of things down the line if we mess anything up.
Do you have any numbers for the changes here, in terms of code size or String ops/sec? I really doubt these changes have an appreciable size or performance difference.
Can you please give some hard #s on the size difference for a few examples using this vs. master, and the runtime differences, too? You could do a build and then |
|
That shows that your optimization attempts are going the wrong way: 16 bytes increase in IROM. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing jumps out at me.
HOWEVER: we've been bitten before by innocent-seeming minor tweaks here. Please go over the current unit tests covering the String class and see if you can find any uncovered code that is affected by your changes, and report back here.
If you find uncovered code, or uncovered corner cases, please add tests to ensure correct behavior.
cores/esp8266/WString.h
Outdated
union { | ||
struct _ptr ptr; | ||
struct _sso sso; | ||
}; | ||
static constexpr auto CAPACITY_MAX = (1U << sizeof(ptr.cap) * 8) - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not used in the .h file, so why not move it to the .cpp?
Or could it be used from other classes? (it is protected, so only those inheriting from this class)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(it is protected, so only those inheriting from this class)
just only, but still possible.
if so, we should warn: [API change] (of course .isNotSSO
too)
i'd like to know why these internal implementation members are protected but not private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the init()
function was really needed for proper functioning of the class, so if someone wants to make a derived class, this function must be protected (or public).
I can imagine it is not needed for using the String class, so then protected makes perfect sense.
Now init() is no longer needed in the current implementation, but it may indeed break API design if the function is removed. The same for the CAPACITY_MAX
value.
Is the |
cores/esp8266/WString.cpp
Outdated
@@ -159,7 +159,8 @@ unsigned char String::changeBuffer(unsigned int maxStrLen) { | |||
// Fallthrough to normal allocator | |||
size_t newSize = (maxStrLen + 16) & (~0xf); | |||
// Make sure we can fit newsize in the buffer | |||
if (newSize > CAPACITY_MAX) { | |||
size_t maxCapacity = (1U << sizeof(ptr.cap) * 8) - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be careful when refactoring, the original definition was static constexpr auto. At least make this constexpr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the advice, @devyte
i have periodically checked the disassembled binaries using objdump -d
.
in this case, the same as before.
cores/esp8266/WString.h
Outdated
unsigned char isSSO : 1; | ||
unsigned char len : 7; // Ensure only one byte is allocated by GCC for the bitfields | ||
unsigned char isNotSSO : 1; | ||
_sso() : len(0), isNotSSO(0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been proposed to move in the opposite direction: direct member init instead of constructor member init sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
direct member init for bitfield requires c++20, will bring some of CI tests failure (possibly host emulation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not what I meant. You should be able to do direct member init below in the union in some form or other, i. e. {0}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For bitfield members that's not possible, as that's one of the new features of C++20.
So for this special case of the bitfields you can only initialize them in the constructor of the _sso
struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a quick test:
union
{
_sso sso = {{0},{0}};
_ptr ptr;
};
works fine with C++11.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx @devyte:
that's surely fine, in terms of the logic: wholly zeroed, due to funccall of memset(this, 0, sizeof(_sso))
.
zeroing _sso.buff[0]
and _sso.
bitfields would be sufficient (and of course most efficient).
both class member initializers (char member = 0;
) and braces initialization (struct _foo foo = { 0, ... };
) cause whole initialization
(assumed that each of unspecified members has default value).
member initializer (: bar(0), ...
) seems to be only way to realize sparse(holed) initialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sso struct is 12 bytes in size, yet you seem keen on doing microoptimizations here. I don't consider this worthwhile, the only case where this would be significant is in a tight loop where >90% of the time is spent in initing a String object, and in such a case I would likely yell at the author. Personally, I think there are other places where spending time and effort would bring far better returns.
If you decide to pursue this, please:
- find bin size for initialization with {{0},{0}}
- compare against bin size with your member initializer
- compare runtime performance of both cases
- report back here
BTW, we're already on gcc 10.x. Although we have C++17 enabled, in theory we have access to C++20 experimental features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, we're already on gcc 10.x. Although we have C++17 enabled, in theory we have access to C++20 experimental features.
It was already mentioned before, this class is also ported to ESP32.
Are they also using gcc 10.x?
To be honest, if the braces initialization also works, then I would say to stick with what's working and not forcing to only be able to compile with the newest C++ standard.
What I'm more worried about is how the initialization is done when in a union.
In other words, what will be guaranteed to be initialized with the isSSO set to 1 (or the inverted one set to 0)?
We can talk about semantics and how the code may be the most aestetic pleasing to the reader.
But in the end it is important this one flag is always default set to mean isSSO
is active.
If that should be done via automatic zero'ed initialization (thus inverting the flag) or the suggested:
union
{
_sso sso = {{0},{0}};
_ptr ptr;
};
That's all fine by me.
Most important is that this isSSO
is in its 'enabled' state (which isn't with this suggested code by the way) as default member value.
cores/esp8266/WString.h
Outdated
unsigned char len : 7; // Ensure only one byte is allocated by GCC for the bitfields | ||
unsigned char isSSO : 1; | ||
unsigned char len : 7; // Ensure only one byte is allocated by GCC for the bitfields | ||
unsigned char isNotSSO : 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this? Why the change to reverse logic? I see in addition that this is used as if(! isSSO()) in code, so this is a reverse of a reversed name that reverses a bit.
If there is a good reason for reversing, then please change the name to something that reflects the new meaning without the Not. Having to look at things like e. g. !isSSO() and then bool isSSO() {return !isNotSSO;} is just painful.
And I won't even mention the setSSO() body ....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, unsigned char disabled : 1;
or something, might be unambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you need a better term, perhaps "heapBuffer" ? (isSSO
was about not using a heap allocated buffer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before moving forward with any change, please explain why it needs to change at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its easy: the default constructors are setting "SSO" to true. by making the default "false" or "0", the initialization can be similified because almost all compilers and all architectures have some kind of loophole optimisation to find the best possible code pattern for initializing a variable or a set of variables to zero.
I think instead of using "SSO" which is an acronym that is hard to understand in the first place, I would suggest to call it "HeapString" or "isHeapString" or "isLargeString" . "isnotSSO" which is "is not small string optimisation" is really awkward english.
cores/esp8266/WString.h
Outdated
char * buff; | ||
uint16_t cap; | ||
uint16_t len; | ||
}; | ||
// This allows strings up up to 11 (10 + \0 termination) without any extra space. | ||
enum { SSOSIZE = sizeof(struct _ptr) + 4 - 1 }; // Characters to allocate space for SSO, must be 12 or more | ||
static constexpr auto SSOSIZE = sizeof(struct _ptr) + 4 - 1; // Characters to allocate space for SSO, must be 12 or more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for the enum was that we were certain that way didn't create a constant with the value with old gcc compiler, even when using the value in code. We checked with sizeof and looked at the bin dump.
Are you positive that there is no case where this new definition creates a constant that increases the mem footprint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as you indicated, this change has no performance effect (either better or worse).
according to the principle of minimum action, leaving it alone is probably the right way, i understood.
cores/esp8266/WString.cpp
Outdated
buf[0] = c; | ||
buf[1] = 0; | ||
*this = buf; | ||
uint16_t buf = __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ ? (uint16_t)c : (uint16_t)c << 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks duplicated, please refactor.
cores/esp8266/WString.cpp
Outdated
buf[0] = c; | ||
buf[1] = 0; | ||
return concat(buf, 1); | ||
uint16_t buf = __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ ? (uint16_t)c : (uint16_t)c << 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate, please refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think also that most compilers perform this optimisation automatically. there might not be an observable behavior difference between old and new code.
Guys, I'm seeing this PR morphing into two things.
Number 1 seems like a fine change and should be a 2-line PR, while number 2 is big and introduces issues of taste ans style. Maybe the best way forward is to drop any functional changes from this and open a new one with only the "init sso/ptr union" change? That init PR should be easily approvable and get us runtime in end users to see if it has any effect. |
I agree, let's split the "bug fix" off into its own pr to discuss. |
f208e3c
to
b5a8527
Compare
…) stray dtor calls derived from esp8266#7553
cores/esp8266/WString.h
Outdated
struct { | ||
char _offset0 = 0; // wbuffer()[0] = 0; | ||
char _offset1[SSOSIZE - 1]; | ||
char _offset11 = 0; // setSSO(true); setLen(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get why you want to do this, but I think the naming is a bit strange here.
Also, why not just use a constructor in one (not both) of the existing structs _ptr
or _sso
?
I think _sso
is the best choice here for having the constructor to zero all values.
And since it has to be done in the body of the constructor anyway, you can also make a function in that struct and call it both from its constructor and from the String::init()
function.
As done here, using an unnamed struct with _offsetNN
names is confusing.
For example _offset11
is actually the same address as used by the _sso flags.
3fbf724
to
b39d016
Compare
b39d016
to
dcbfd05
Compare
cores/esp8266/WString.h
Outdated
@@ -68,7 +71,9 @@ class String { | |||
explicit String(unsigned long, unsigned char base = 10); | |||
explicit String(float, unsigned char decimalPlaces = 2); | |||
explicit String(double, unsigned char decimalPlaces = 2); | |||
~String(void); | |||
inline ~String() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, but what extra does inline
add to a function implemented in a .h file?
I always thought implementing a function in a .h file always implies inline
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as you said, function bodies inside a class definition (typically in .h) are implicitly marked as inline
.
like extra arithmetic parentheses, may be removed later.
f3a315e
to
d018ce4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jjsuwa also note of the #7611 / d3b3a43
Adding this test to tests/host/core/test_string.cpp fails on master, but not with this patch:
...
TEST_CASE("String::move", "[core][String]")
{
const char buffer[] = "this string goes over the sso limit";
String target;
String source(buffer);
target = std::move(source);
REQUIRE(source.c_str() != nullptr);
REQUIRE(target == buffer);
}
...
Perhaps you could add this test (or my commit) here?
re ALL: all existing change requests seem resolved, please re-review? I do believe initialization issue discussed above was resolved through the use of init()
. Naming might not, but idk if it is still relevant.
What is the new code size difference with these changes, @jjsuwa ? The changing polarity of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also please add in @mcspr 's add'l String test which passes w/the fixed/minimized ::move()
, too?
cores/esp8266/WString.cpp
Outdated
rhs.setLen(0); | ||
rhs.setBuffer(nullptr); | ||
invalidate(); | ||
sso = rhs.sso; // memcpy(this, &rhs, sizeof(*this)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment would be best removed.
cores/esp8266/WString.cpp
Outdated
@@ -225,32 +215,9 @@ String & String::copy(const __FlashStringHelper *pstr, unsigned int length) { | |||
|
|||
#ifdef __GXX_EXPERIMENTAL_CXX0X__ | |||
void String::move(String &rhs) noexcept { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice cleanup of code here!
based on the esp8266#7553 without isSSO -> isHeap rename and inline optimizations additionally, remove useless pre-c++11 preprocessor checks Co-authored-by: Takayuki 'January June' Suwa <jjsuwa@sys3175.com>
* (test) WString: c_str() returns null pointer target = std::move(source) does not reset buffer pointer back to the sso * wstring: correctly do move invalidation & copy based on the #7553 without isSSO -> isHeap rename and inline optimizations additionally, remove useless pre-c++11 preprocessor checks Co-authored-by: Takayuki 'January June' Suwa <jjsuwa@sys3175.com>
62a5a88
to
a07ec5c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes do not reflect the long discussion in this PR.
However approving the proposed changes as they are today.
Nice one. Saves 544 bytes when used with Tasmota. Without 609844 bytes. With 609300 bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me. Oh please, please pretty please.
Tasmota Arduino Core v2.7.4.5 allowing webpassword over 47 characters (#9687)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-edit- latest push has add'l changes that need to be reviewed before re-approval.
While I'm not that big of a fan of the isSSO
->isHeap
conversion, what's left here does seem legit, minimal, functional, and did have a measurable impact on other large projects, so approving. Thx!
bdb7b32
to
fa0ad4d
Compare
cores/esp8266/WString.h
Outdated
inline const char *buffer() const { return (const char *)(isSSO() ? sso.buff : ptr.buff); } | ||
inline char *wbuffer() const { return isSSO() ? const_cast<char *>(sso.buff) : ptr.buff; } // Writable version of buffer | ||
const char *buffer() const { return isSSO() ? sso.buff : ptr.buff; } | ||
char *wbuffer() const { return const_cast<char *>(buffer()); } // Writable version of buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of const_cast<>
I would do the more readable other way around:
const char* buffer() const { return wbuffer(); }
char* wbuffer() const { return isSSO() ? sso.buff : ptr.buff; } // Writable version of buffer
(I wonder why the compiler don't complain against the const in char* wbuffer() const
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I wonder why the compiler don't complain against the const in
char* wbuffer() const
)
You don't change the pointer, only where the pointer points to?
Otherwise you would need something like const char* const wbuffer() const
, which is not open for debate on ugliness IMHO ;)
cores/esp8266/WString.cpp
Outdated
wbuffer()[fromIndex + 1] = '\0'; | ||
char* temp = strrchr(wbuffer(), ch); | ||
wbuffer()[fromIndex + 1] = tempchar; | ||
char *wbuffer = this->wbuffer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The local variable wbuffer
is the same as the class method wbuffer()
so I'd really ask to change this name to something else.
I think the logic is still correct, it's just a readability and sanity thing.
cores/esp8266/WString.cpp
Outdated
wbuffer()[right] = '\0'; | ||
out = wbuffer() + left; // pointer arithmetic | ||
wbuffer()[right] = temp; //restore character | ||
char *wbuffer = this->wbuffer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above.
Also, for these two changes have you seen any code size difference? wbuffer() is a pretty simple call and I think it has a good chance of being inlined to a simple reference to the string array...
@earlephilhower the last change does indeed save some bytes more. |
0294240
to
f879080
Compare
edit Sorry I have been confused by the "forced-push". @earlephilhower @devyte I guess your reviews were addressed. |
* move bodies of dtor, `init()` and `charAt()` to .h (implicitly inlined) * unify descriptions of the initialization into one: `init()` (literally), that is called from each ctors, `invalidate()` and `move()` * invert the SSO state logic in order to make init state zeroed (as a result, each inlined `init()` saves 1 insn) * detab and trim * remove `inline` from .h * cosmetics * optimize the non-SSO -> SSO transition part of `changeBuffer()` * remove duped body of `operator =(StringSumHelper &&rval)` * remove common subexpressions from `lastIndexOf()` and `substring()` * eliminate `strlen(buf)` after calling `sprintf(buf, ...)` that returns # of chars written * eliminate `len()` after calling `setLen(newlen)` * make ctor`(char c)` inlineable * optimize `setLen()` * replace constant-forwarding overload functions with default argument ones * optimize `concat(char c)` IROM size @ FSBrowser: 304916 -> 304372 (saved 544 from master)
f879080
to
f2c7565
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the local variable wbuffer was renamed, thanks! We should put this in and then see if we have any issues at end users, then in a month or so submit the same change to the ESP32 repo.
IROM size @ FSBrowser: 304372 -> 304356 (saved 560 from master)
cores/esp8266/WString.h
Outdated
sso.len = 0; | ||
sso.isHeap = 0; | ||
void init(void) __attribute__((always_inline)) { | ||
sso.buff[0] = 0; // In the Xtensa ISA, in fact, 32-bit store insn ("S32I.N") is one-byte shorter than 8-bit one ("S8I"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the comment; One byte or one bit shorter?
Or do you mean the generated binary code?
@jjsuwa What was the total gain from this optimization effort? Flash size, IRAM, heap, speed etc. Could you please redo your checks from end of August and report here for completeness? |
sorry, now @jjsuwa is unable to login due to my mistake (DNS MX expiration)... @devyte :
end of August... such a long ago :)
for speed... i guess, at least it shouldn't become slow by this PR because of code shrinkage, funccall overhead elimination and so on. |
move bodies of dtor,
init()
andcharAt()
to .h (implicitly inlined)unify the description of initialization into one:
init()
(literally), and called from each ctors,invalidate()
andmove()
.invert the SSO state logic in order to make init state zeroed (as a result, inlined
init()
insns: 4 to 3)IROM size @ FSBrowser: 304836 -> 304596 (saved 240 from master)