Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-breaking spaces replaced with regular, breaking spaces #31

Closed
daformat opened this issue Nov 26, 2017 · 10 comments
Closed

Non-breaking spaces replaced with regular, breaking spaces #31

daformat opened this issue Nov 26, 2017 · 10 comments

Comments

@daformat
Copy link
Contributor

daformat commented Nov 26, 2017

The parser seems to replace non breaking spaces by normal spaces, I've tried using regular html entities, unicode characters, and hard-coding the non breaking spaces (typing alt + space on a Mac), no matter what I cannot get a non-breaking space in the parser output.

Reproducing the issue

Compare the render of <JsxParser> with React's dangerouslySetInnerHtml prop using the following:

Input

<p>This is a « test » without any non-breaking space</p>
<p>This is a «\u00a0test\u00a0» with non-breaking spaces as Unicode character</p>
<p>This is a «&nbsp;test&nbsp;» with non-breaking spaces as regular html entities</p>

You can also try with hard-coded non-breaking spaces but it seems I cannot type nor paste them in github's editor for some reason, so I didn't include this in the example input

Output

With <JsxParser> you should get regular spaces everywhere, with dangerouslySetInnerHtml you should get the non-breaking spaces wherever you put them, which is the expected behavior.

Version info

I'm using React 15.6.1 and react-jsx-parser 1.1.5

@TroyAlford
Copy link
Owner

I'm not actually clear on what is breaking in this example. Are you saying that the &nbsp; is being replaced with a single non-breaking space, when rendered?

@daformat
Copy link
Contributor Author

daformat commented Nov 27, 2017

Using:

<JsxParser
  jsx={`<p>This is a « test » without any non-breaking space</p>
        <p>This is a «\u00a0test\u00a0» with non-breaking spaces as Unicode character</p>
        <p>This is a «&nbsp;test&nbsp;» with non-breaking spaces as regular html entities</p>`}
  onError={error => console.log(error)}
  />

renders to (redacted and formatted for clarity):

<div class="jsx-parser">
  <p>This is a « test » without any non-breaking space</p>
  <p>This is a « test » with non-breaking spaces as Unicode character</p>
  <p>This is a « test » with non-breaking spaces as regular html entities</p>
</div>

Both \u00a0and &nbsp; are non-breaking spaces and should render as such, but for some reason they get converted to regular spaces (breaking spaces that is).

The actual expected render should keep those entities and be:

<div class="jsx-parser">
  <p>This is a « test » without any non-breaking space</p>
  <p>This is a «\u00a0test\u00a0» with non-breaking spaces as Unicode character</p>
  <p>This is a «&nbsp;test&nbsp;» with non-breaking spaces as regular html entities</p>
</div>

Admittedly, this is a boring example :)

@daformat daformat changed the title Non breaking spaces replaced by normal spaces Non-breaking spaces replaced with regular, breaking spaces Nov 27, 2017
@daformat
Copy link
Contributor Author

daformat commented Dec 3, 2017

So, apparently this all comes down to

return (value || '').replace(/\s+/g, ' ')

The \spattern matches any blank character, including non-breaking spaces, which are then converted to regular, breaking, spaces.

I can see why you'd want to replace any multiple spaces by singles spaces, but I'm not sure this is absolutely required as multiple spaces are already ignored upon rendering.

I made a fork where I just removed the replace and I indeed get the expected result, keeping my precious non-breaking spaces as such.

Here are the options I see:

  1. Remove the replace completely (I can open a pull request for you to pull my changes).
  2. Only replace regular spaces, and maybe tabs using a more explicit pattern (I would go with something like /[ \t]{2,}/, using {2,} in order to only match sequences of two or more spaces, including tabs)

As noted in 2. I would tweak the regex so it doesn't match single spaces, as there's no need to replace those systematically.

Hope this helps, let me know your thoughts and... have a nice day!

@TroyAlford
Copy link
Owner

Oh nice. Actually - I think you're right that this is entirely unnecessary, since the repo was switched to using Acorn for parsing JSX. If you want to submit a pull request - please add a couple tests, as well, to ensure that it behaves properly, and I'll be happy to make an update and re-release!

I really appreciate the help!

daformat added a commit to daformat/react-jsx-parser that referenced this issue Dec 11, 2017
daformat added a commit to daformat/react-jsx-parser that referenced this issue Dec 11, 2017
@daformat
Copy link
Contributor Author

Here you go Troy,

Not sure I should have kept the nodeType tests on react renderer comments (the comment - text - comment thing), but in the end I did, so it's consistent with the other tests that were already there. Let me know if I should do things differently.

Happy to help resolving this, the issue was quite trivial but since non-breaking spaces can really make or break a layout, keeping them as such is quite important to me :)

Have a nice day.

TroyAlford pushed a commit that referenced this issue Dec 28, 2017
…such (#33)

* Keep white-space as-is, keeping non-breaking spaces non-breaking - see #31
@TroyAlford
Copy link
Owner

@daformat - I pushed a change in the last few days which should make this work even more consistently. For some reason just returning the value from a JsxText node, when it was an &nbsp; would get thrown out by React. Thus, when you render something like: <b>Bold</b>&nbsp;<a href="...">Link</a> it would come out with nothing between the b and the a tag.

The fix I pushed may be relevant to your use-case, too - in case you run into this. It had to do with returning <Fragment>{value}</Fragment> instead of just value from these nodes - which for some reason causes React to render as an actual non-breaking space character - rather than omitting it as useless whitespace.

I am still confused about the behavior - but as this happens all the time in my use case - I can confirm that the update solves it.

Tl;Dr; - you may want to update your version. :)

@daformat
Copy link
Contributor Author

daformat commented Feb 22, 2018

Thanks for the heads up @TroyAlford! It seems those poor non-breaking spaces don't get much love these days... Will update but I'm not too concerned by this specific use-case in my app.

Maybe React does a filtering for empty text nodes via the same regexp you were using:
the \s (white spaces) character class catches all spaces, including non breaking ones and tabs, so if they test for empty text nodes using this instead of explicitly looking for regular breaking spaces and tabs, it could explain your issue.

Something like:

// Replace multiple spaces by a single space
// even non-breaking spaces, duh
decodeHTMLEntities(
  html
).replace(
  /\s+/g,
  ' '
);

would replace non breaking spaces as soon as &nbsp; entities are parsed to non-breaking spaces, which I suspect happens before the multiple space trimming thing React does.

The correct behavior requires to use something more specific than the \s class.
As mentioned in the previous comments, using /[ \t]{2,}/g instead yields the expected result without removing any non-breaking spaces.

@TroyAlford
Copy link
Owner

Thanks for the thoughtful reply! I tried a bunch of different permutations to fix this (because I really wanted to just output the encoded version into the rendered HTML) - but in the end, settled on something that works.

I suspect you are exactly right - but hopefully this won't crop up again to require further fixing. :) If it does, this gives me some good notes to go back to, to revisit possible solutions.

@daformat
Copy link
Contributor Author

daformat commented Feb 23, 2018

You're very welcome, I don't have much time today but I did a quick search in React source code. I was looking for regexes containing \s without much luck regarding our issue.

But then it occurred to me that the inverse class, \S, could have been used, too (capital S, matching anything but whitespaces, non-breaking spaces being considered as whitespace). I found only one place where this occurs:

/~https://github.com/facebook/react/blob/825682390d825bb51be2e06835cf7e3458352e83/packages/react-dom/src/client/validateDOMNesting.js#L442-L453

As a quick check I compiled the list of all utf spaces I could find information on and ran a quick map to generate a table with the results of checking against different regexp:

image

Digging a little bit more in the React repo, I found Issue 7515

Looking at the comments it seems to me it's related:
image

It's still rough but I cannot look further today, just wanted to add this for reference, here is the gist for testing utf spaces against different regular expressions.

@TroyAlford
Copy link
Owner

Thanks for the great research summary! I'd be totally happy to check out a PR, if you think there's better handling to be done here. At the moment, I'm ok with just outputting the actual non-standard space character inside a <Fragment> block - but perhaps there's something more elegant to do?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants