Build error in examples/braille #198
Thank you for your feedback. This does indeed look very wrong. I've never seen this before on any platform or compiler tested. Something breaks when parsing and converting regular expressions to code. The subsumption warnings are wrong and indicate that something is broken in regex parsing and conversion. Reflex does not depend on the compiler, the OS, or 32-bit versus 64-bit, all of which were tested. I cannot explain why this happens without tracing the execution (see further below). "braille.l:63:44: warning:" this line has UTF-8 character encodings like all the others:
and the correct generated code is the following with valid UTF-8 in the literal string:
whereas in the error you get, it shows as:
I don't have Clang 16 (only clang 14). I wish there were a Docker container available, or one that could be created, to replicate the problem, because I don't want to mess with my development environments. To trace all of reflex for debugging, change lib/Make to enable this line:
then execute Running the reflex command generates a large DEBUG.log file. This is useful to trace the execution and to spot differences right away e.g. by comparing with |
Thanks for the quick response and happy pi day :) Adding
The log statement in this function was introduced in v4.0.0 and at the same time the I've attached the
Does this tell you anything? |
The diff shows that the regex pattern parsed from the braille.l file is different somehow, or the regex converter in lib/convert.cpp behaves differently to convert the regex pattern. I can see this from the parseN() calls that show this difference at a
versus
which has UTF-8 multibytes, not a single The problem is either somewhere where the patterns are read from the braille.l file or in the next stage, the converter that analyzes the regex pattern strings and converts them when needed, e.g. converting a Unicode regex into one that works with any 8-bit regex engine. So it may not be the pattern-to-DFA construction in pattern.cpp, which is what all this DEBUG.log code covers. A What if you truncate the braille.l definitions to just a few patterns? Like so:
That should make it easier to see what the generated string looks like with the This is a very strange issue. I've avoided undefined behavior and things like char being unsigned or signed that differ with compilers/OS. |
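To make the "UTF-8 multibytes" point concrete, here is a minimal sketch of how a code point from the Unicode Braille Patterns block (U+2800 to U+28FF) turns into three bytes, all of which have the high bit set and are therefore negative when stored in a plain `char` on platforms where `char` is signed. The helpers `utf8_encode` and `byte_value` are hypothetical names for illustration and are not part of RE-flex; the encoder does not reject surrogates or out-of-range values.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not RE-flex code): encode one code point as UTF-8.
// No validation is done for surrogates or values above U+10FFFF.
std::string utf8_encode(std::uint32_t cp)
{
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  }
  else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Hypothetical helper: read a char as a byte value in 0..255,
// independent of whether plain char is signed on this platform.
unsigned int byte_value(char c)
{
  return static_cast<unsigned char>(c);
}
```

Calling `utf8_encode(0x2801)` gives a three-byte string whose bytes read as 0xE2, 0xA0, 0x81 through `byte_value`. Every code point in the Braille Patterns block starts with 0xE2 and has a middle byte in the range 0xA0 to 0xA3.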
Stepping through On my NetBSD system, As a result the pattern recognized on my NetBSD system is just the middle
|
Or this may be caused by passing a char to isspace() without casting to unsigned char first. |
The The |
I think you are mistaken: see, e.g., https://en.cppreference.com/w/cpp/string/byte/isspace . std::isspace expects the int argument to be in the range of an unsigned char, or equal to EOF. Anything else is undefined behavior. (And while preparing a patch for this, I noticed that you do indeed handle this correctly in other places!) |
- Calling ctype functions (isspace, isalpha, tolower, ...) with arguments neither representable as unsigned char nor equal to EOF is undefined behavior.
- This commit fixes issue Genivia#198.
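The rule discussed above can be sketched as a small wrapper that casts to `unsigned char` before calling `std::isspace`. This is only an illustration of the technique, not the actual patch; `is_space` and `trim` are hypothetical names, and the example assumes the default "C" locale, in which bytes above 0x7F are not classified as whitespace.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical wrapper (not the RE-flex patch): casting to unsigned char
// keeps the argument in 0..255, so bytes of a UTF-8 sequence never reach
// std::isspace as negative values on platforms where char is signed.
inline bool is_space(char c)
{
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical example use: strip leading and trailing whitespace from a
// byte string that may contain UTF-8 multibyte sequences.
std::string trim(const std::string& s)
{
  std::size_t b = 0;
  std::size_t e = s.size();
  while (b < e && is_space(s[b]))
    ++b;
  while (e > b && is_space(s[e - 1]))
    --e;
  return s.substr(b, e - b);
}
```

With this wrapper, trimming a string such as `"  \xE2\xA0\x81 \t"` leaves the three UTF-8 bytes intact. Passing `s[b]` directly to `std::isspace` would be undefined behavior for those bytes wherever plain `char` is signed.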
Even after many years of programming in C, C++, and many other programming languages, some complacency sets in with such completely counter-intuitive constructs in C and C++. |
I 100% understand :) This is very unfortunate historical baggage in the C & C++ standards. |
Fixes GH issue Genivia/RE-flex#198. Patches accepted upstream and should be in the next release.
updated configure scripts
cast negative ctype function arguments (problem detected on NetBSD 10) #198
Fixed in 4.1.2 along with a few other improvements based on ugrep issue discussions at Genivia/ugrep#339 |
Trying to build the braille example using clang 16.0.6 or gcc 10.4.0 leads to the following errors:
Am I doing something wrong? I am not very knowledgeable when it comes to character encodings.
Build system is NetBSD 10 (beta).