<u> for code point 128 (C1 control) causes exception #401

Closed
ietf-svn-bot opened this issue Apr 11, 2019 · 8 comments
Labels
medium text Issues in text output

Comments

@ietf-svn-bot

owner:henrik@levkowetz.com resolution_fixed type_defect | by julian.reschke@gmx.de


Input:

<t><u format="lit-num-name">&#x80;</u></t>

Output:

Traceback (most recent call last):
  File "/bin/xml2rfc", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/xml2rfc/run.py", line 549, in main
    writer.write(filename)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 228, in write
    lines = self.render(self.root, width=72, joiners=joiners)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 271, in render
    res = func(e, width, **kwargs)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 2834, in render_rfc
    lines = self.ljoin(lines, c, width, **kwargs)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 446, in ljoin
    res = mklines(self.render(e, width, **kwargs), e)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 271, in render
    res = func(e, width, **kwargs)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 2024, in render_middle
    lines = self.ljoin(lines, c, width, **kwargs)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 446, in ljoin
    res = mklines(self.render(e, width, **kwargs), e)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 271, in render
    res = func(e, width, **kwargs)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 3102, in render_section
    lines = self.ljoin(lines, c, width, **kwargs)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 446, in ljoin
    res = mklines(self.render(e, width, **kwargs), e)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 271, in render
    res = func(e, width, **kwargs)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 3102, in render_section
    lines = self.ljoin(lines, c, width, **kwargs)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 446, in ljoin
    res = mklines(self.render(e, width, **kwargs), e)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 271, in render
    res = func(e, width, **kwargs)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 3470, in render_t
    text = fill(self.inner_text_renderer(e), width=width, **kwargs)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 504, in inner_text_renderer
    text += self.render(c, width, **kwargs)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 271, in render
    res = func(e, width, **kwargs)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/text.py", line 4241, in render_u
    self.err(e, exception)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/base.py", line 1735, in err
    msg = self.msg(e, 'Error:', text)
  File "/usr/lib/python2.7/site-packages/xml2rfc/writers/base.py", line 1676, in msg
    msg = "%s(%s): %s %s" % (file or self.xmlrfc.source, lnum, label, text, )
UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 30: ordinal not in range(128)
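The traceback comes from Python 2.7, where formatting the error message evidently triggered an implicit ASCII conversion; the issue does not say which value was being converted. As a minimal sketch only, the same codec error can be reproduced in Python 3 by encoding the character explicitly (the variable names here are illustrative and not taken from xml2rfc):

```python
# Python 3 sketch: trigger the same codec error explicitly.
# In the Python 2.7 traceback the conversion happened implicitly
# during "%s" formatting; here we call encode() ourselves.
ch = "\u0080"  # the C1 control character from the input above

try:
    ch.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    # The ASCII codec only covers code points 0..127.
    message = str(exc)

print(message)
```

Any code point above U+007F fails this way, so the crash is not specific to C1 controls; it only needs a non-ASCII character to reach the message formatting.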


Issue migrated from trac:401 at 2022-02-05 12:48:15 +0000

@ietf-svn-bot

@julian.reschke@gmx.de changed component from Version 2 cli to Version_3_cli_txt

@ietf-svn-bot

@duerst@it.aoyama.ac.jp commented


An error may be better than an exception. But one or the other is appropriate, because U+0080 is just an undefined control codepoint, and shouldn't turn up in an RFC at all.

@ietf-svn-bot

@julian.reschke@gmx.de commented


AFAICT, it's not an undefined control point in Unicode - otherwise why does it show up in the Unicode database?

But yes, if we want to exclude C1 controls, we need to clarify that in the documentation.

@ietf-svn-bot

@duerst@it.aoyama.ac.jp commented


With "undefined control codepoint", I meant that it's a control, but it's not defined what kind of control. Some other controls have explanations (e.g. 009D = OPERATING SYSTEM COMMAND), see https://www.unicode.org/charts/PDF/U0080.pdf. U+0080 doesn't. On top of that, even for those controls that have explanations, this is just the ISO/IEC 6429:1992 interpretation, and Unicode doesn't force that.

As for documentation, I hope there is already something about how to use non-ASCII characters carefully. Not using U+0080 can easily be seen as covered by that, so there's no need to mention it explicitly.

@ietf-svn-bot

@julian.reschke@gmx.de commented


Martin, the code fails in the same way for U+0092.
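Both observations in this exchange can be checked against Python's standard `unicodedata` module. This is a rough sketch, assuming the name lookup behaves like `unicodedata.name` (the thread does not say which lookup xml2rfc uses): the C1 controls are present in the database with general category `Cc`, yet none of them has a character name, including U+009D, whose chart label is not exposed as a name.

```python
import unicodedata

def describe(cp):
    """Return (general category, character name or None) for a code point."""
    ch = chr(cp)
    # name() raises ValueError for unnamed characters unless a default is given.
    return unicodedata.category(ch), unicodedata.name(ch, None)

# U+0080 and U+0092 are the code points reported in this issue;
# U+009D is the one with a label in the Unicode code chart.
c1_info = {cp: describe(cp) for cp in (0x80, 0x92, 0x9D)}

for cp, (category, name) in sorted(c1_info.items()):
    print("U+%04X category=%s name=%r" % (cp, category, name))
```

So a renderer that relies on a character name has nothing to print for any C1 control, which is consistent with U+0092 failing the same way as U+0080.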

@ietf-svn-bot

@henrik@levkowetz.com changed status from new to closed

@ietf-svn-bot

@henrik@levkowetz.com changed resolution from `` to fixed

@ietf-svn-bot

@henrik@levkowetz.com commented


Fixed in 30d209e:

Added a default rendering (code point number) for code points without unicode code point names. Fixes issues #401 and #402.
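The idea behind the fix, falling back to the code point number when no name exists, might look roughly like the sketch below. This is not the code from commit 30d209e: `code_point_label` is a hypothetical helper, and the `U+XXXX` fallback format is an assumption.

```python
import unicodedata

def code_point_label(ch):
    """Hypothetical helper: Unicode name if one exists, else the code point number."""
    # Passing a default avoids the ValueError that name() raises
    # for unnamed characters such as the C1 controls.
    name = unicodedata.name(ch, None)
    if name is not None:
        return name
    # Assumed fallback format; the commit only says "code point number".
    return "U+%04X" % ord(ch)

print(code_point_label("\u0394"))  # a named character
print(code_point_label("\u0080"))  # the C1 control from this issue
```

With a fallback like this the renderer always has a printable label, so the error path that crashed in the traceback is no longer reached for unnamed code points.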
