Description
There are a number of issues with the `go-runewidth` package that make it problematic when used with emojis. Working with runes is a flawed approach, as explained by the author of `uniseg`:
That's because a single code point (rune) is not always a complete character. That's the whole basis of the uniseg package. The README explains this in detail. One example given there is the country flag emoji. These emojis must always consist of two runes. What's the width of only one of these runes then? It's basically undefined because it's an incomplete grapheme cluster. The mattn/go-runewidth package gets this fundamentally wrong and I have tried for a long time to help them do it right but there was never much interest in following up on it.
rivo/uniseg#48 (comment)
All references to `RuneWidth` and `StringWidth` need to be replaced to improve the width calculations when dealing with emojis.
It is trivial to replace `StringWidth` from `go-runewidth` with the one from `uniseg`, as both take a string and return its width as an int. Migrating to the `uniseg` version also brings a performance increase of 4x, so there are some very clear benefits here.
The complexity lies in what to do about the `RuneWidth` function. The right choice is to stop thinking in runes and switch over to grapheme clusters. This will require logic changes to any function that works with runes.
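To make the kind of logic change concrete, here is a minimal sketch contrasting a rune-based truncation with a cluster-based one. Everything in it is an assumption for illustration: `toyClusters` is a deliberately simplified stand-in that only pairs up regional indicators, whereas a real implementation would use the grapheme cluster iteration provided by `uniseg`, and `truncateRunes` and `truncateClusters` are hypothetical function names.

```go
package main

import (
	"fmt"
	"strings"
)

// isRegionalIndicator reports whether r is a regional indicator symbol.
func isRegionalIndicator(r rune) bool {
	return r >= 0x1F1E6 && r <= 0x1F1FF
}

// toyClusters is a simplified stand-in for real grapheme segmentation: it
// pairs consecutive regional indicators into one cluster and treats every
// other rune as its own cluster. It ignores all other segmentation rules.
func toyClusters(s string) []string {
	rs := []rune(s)
	var out []string
	for i := 0; i < len(rs); i++ {
		if isRegionalIndicator(rs[i]) && i+1 < len(rs) && isRegionalIndicator(rs[i+1]) {
			out = append(out, string(rs[i:i+2]))
			i++ // skip the second half of the flag
			continue
		}
		out = append(out, string(rs[i]))
	}
	return out
}

// truncateRunes keeps the first n runes and can cut a flag in half.
func truncateRunes(s string, n int) string {
	rs := []rune(s)
	if len(rs) > n {
		rs = rs[:n]
	}
	return string(rs)
}

// truncateClusters keeps the first n clusters and never splits one.
func truncateClusters(s string, n int) string {
	cs := toyClusters(s)
	if len(cs) > n {
		cs = cs[:n]
	}
	return strings.Join(cs, "")
}

func main() {
	s := "a\U0001F1E9\U0001F1EA" // "a" followed by the German flag

	// Rune-based: "a" plus half a flag, which is an incomplete cluster.
	fmt.Println(len([]rune(truncateRunes(s, 2)))) // prints 2
	// Cluster-based: "a" plus the whole flag.
	fmt.Println(len([]rune(truncateClusters(s, 2)))) // prints 3
}
```

The point of the sketch is the change in unit: once functions iterate over clusters (strings) rather than runes, a per-rune width lookup is no longer needed.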
From an acceptance criteria perspective, the goals should be:
- Remove `go-runewidth` as a dependency.
- No regressions in performance.
We can expect different results for some of the calculations but these should be considered improvements and bug fixes. It is likely this will cause breaking changes to applications that depend on this package. This may warrant a major version bump to reflect that. We should also make it clear in the documentation that the calculations involving some emojis will have changed.