Improve Emoji handling:
- Emoji modes: Differentiate between well-formed Emoji (`:possible`) and any
  ZWJ/modifier sequence (`:all`). The latter is more common and more efficient
  to implement.
- Add alias `emoji: :auto` for `emoji: true` and `emoji: :none` for `emoji: false`
- Add new `:all_no_vs16` mode
- Only consider terminal cells needed when recommending Emoji support level
  (Emoji themselves might display differently)
- Set default Emoji mode for unknown/unsupported terminals to `:none`
  (instead of `:basic`)
janlelis committed Nov 17, 2024
1 parent dc0a9fd commit 698ea9b
Showing 5 changed files with 196 additions and 118 deletions.
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,20 @@
# CHANGELOG

## 3.1.0 (unreleased)

**Further Emoji improvements:**

- Emoji modes: Differentiate between well-formed Emoji (`:possible`) and any
ZWJ/modifier sequence (`:all`). The latter is more common and more efficient
to implement.
- Add alias `emoji: :auto` for `emoji: true` and `emoji: :none` for `emoji: false`
- Add new `:all_no_vs16` mode
- Only consider terminal cells needed when recommending Emoji support level
(Emoji themselves might display differently)
- Set default Emoji mode for unknown/unsupported terminals to `:none`
(instead of `:basic`)


## 3.0.1

- Add WezTerm and foot as good Emoji terminals
74 changes: 41 additions & 33 deletions README.md
@@ -4,7 +4,7 @@ Determines the monospace display width of a string in Ruby, which is useful for

Unicode version: **16.0.0** (September 2024)

## Gem Version 3.0 — Improved Emoji Support
## Gem Version 3 — Improved Emoji Support

**Emoji support is now enabled by default.** See below for description and configuration possibilities.

@@ -81,58 +81,66 @@ Unicode::DisplayWidth.of("a\tb", 1, overwrite: { "\t".ord => 10 })) # => TAB cou

Please note that using overwrites disables some performance optimizations of this gem.

### Emoji Option
### Emoji

The gem detects Emoji and Emoji sequences and adjusts the width of the measured string. This can be disabled by passing `emoji: false` as an argument:
If your terminal supports it, the gem detects Emoji and Emoji sequences and adjusts the width of the measured string. This can be disabled by passing `emoji: false` as an argument:

```ruby
Unicode::DisplayWidth.of "🤾🏽‍♀️" # => 2
Unicode::DisplayWidth.of "🤾🏽‍♀️", emoji: :all # => 2
Unicode::DisplayWidth.of "🤾🏽‍♀️", emoji: false # => 5
```

Disabling Emoji support yields wrong results, as illustrated in the example above, but increases performance of display width calculation. You can configure [the Emoji set to match for](https://www.unicode.org/reports/tr51/#def_rgi_set) by passing a symbol as value:

```ruby
Unicode::DisplayWidth.of "🐻‍❄", emoji: :rgi_mqe # => 3
Unicode::DisplayWidth.of "🐻‍❄", emoji: :rgi_uqe # => 2
```

#### How this Library Handles Emoji Width

There are many Emoji which get constructed by combining other Emoji in a sequence. This makes measuring the width complicated, since terminals might either display the combined Emoji or the separate parts of the Emoji individually.

Another aspect where terminals disagree is whether Emoji characters which have a text presentation by default (width 1) should be turned into full-width (width 2) when combined with Variation Selector 16 (*U+FE0F*).

Emoji Type | Width / Comment
------------|----------------
Basic/Single Emoji character without Variation Selector | No special handling, uses mechanism from table above
Basic/Single Emoji character with VS15 (Text) | No special handling, uses mechanism from table above
Basic/Single Emoji character with VS16 (Emoji) | 2
Emoji Sequence | 2 (only if sequence belongs to configured Emoji set)

The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji sets can be used:
Basic/Single Emoji character without Variation Selector | No special handling
Basic/Single Emoji character with VS15 (Text) | No special handling
Basic/Single Emoji character with VS16 (Emoji) | 2 (except with `emoji: :none` or `emoji: :all_no_vs16`)
Emoji Sequence | 2 (only if the sequence belongs to the configured Emoji set)
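
As a rough illustration of the VS16 row in the table above, the following sketch widens any character followed by VS16 (U+FE0F) to two cells. The `vs16_cells` helper and its catch-all regex are assumptions made for this example only; the gem itself matches against its basic Emoji and keycap regexes rather than arbitrary characters.

```ruby
# Hypothetical helper, not part of the gem's API.
# Returns [cells, rest]: two cells per "character + VS16" pair,
# plus the string with those pairs removed.
def vs16_cells(string)
  cells = 0

  # Simplification (assumption): any character followed by U+FE0F counts
  rest = string.gsub(/.\u{FE0F}/) do
    cells += 2
    ""
  end

  [cells, rest]
end

heart_text  = "\u{2764}"          # HEAVY BLACK HEART, text presentation by default
heart_emoji = "\u{2764}\u{FE0F}"  # same character, followed by VS16

p vs16_cells(heart_text)   # => [0, "\u2764"], nothing widened
p vs16_cells(heart_emoji)  # => [2, ""], the pair takes two cells
```

The remaining string would still have to be measured by the regular width lookup, which is why the helper returns it alongside the cell count.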

The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2 and if VS16-Emoji should be widened. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji settings can be used:

Option | Description | Example Terminals
-------|-------------|------------------
`emoji: true` or `emoji: :auto` | Automatically use recommended Emoji setting for your terminal | -
`emoji: false` or `emoji: :none` | No Emoji adjustments, Emoji characters with VS16 not handled | Gnome Terminal, many older terminals
`emoji: :basic` | Full-width VS16-Emoji, but no width adjustments for Emoji sequences: All partial Emoji treated separately with a width of 2 | ?
`emoji: :rgi_fqe` | Full-width VS16-Emoji, all fully-qualified RGI Emoji sequences are considered to have a width of 2 | ?
`emoji: :rgi_mqe` | Full-width VS16-Emoji, all fully- and minimally-qualified RGI Emoji sequences are considered to have a width of 2 | ?
`emoji: :rgi_uqe` | Full-width VS16-Emoji, all RGI Emoji sequences, regardless of qualification status, are considered to have a width of 2 | ?
`emoji: :possible`| Full-width VS16-Emoji, all possible/well-formed Emoji sequences are considered to have a width of 2 | ?
`emoji: :all` | Full-width VS16-Emoji, all ZWJ/modifier/keycap sequences have a width of 2, even if they are not well-formed Emoji sequences | foot, Contour, WezTerm
`emoji: :all_no_vs16` | VS16-Emoji not handled, all ZWJ/modifier/keycap sequences have a width of 2, even if they are not well-formed Emoji sequences | -

- *RGI Emoji:* Emoji Recommended for General Interchange
- *Qualification:* Whether an Emoji sequence has all required VS16 codepoints
- *ZWJ:* Zero-width Joiner: Codepoint `U+200D`, used in many Emoji sequences

Example:

Option | Descriptions
-------|-------------
`emoji: true` | Use recommended Emoji set on your platform, see section below
`emoji: :basic` | No width adjustments for Emoji sequences: all partial Emoji treated separately
`emoji: :rgi_fqe` | All fully-qualified RGI Emoji sequences are considered to have a width of 2
`emoji: :rgi_mqe` | All fully- and minimally-qualified RGI Emoji sequences are considered to have a width of 2
`emoji: :rgi_uqe` | All RGI Emoji sequences, regardless of qualification status are considered to have a width of 2
`emoji: :all` | All possible/well-formed Emoji sequences are considered to have a width of 2
`emoji: false` | No Emoji adjustments, Emoji characters with VS16 not handled

*RGI Emoji:* Emoji Recommended for General Interchange

*Qualification:* Whether an Emoji sequence has all required VS16 codepoints
```ruby
Unicode::DisplayWidth.of "🐻‍❄", emoji: :rgi_mqe # => 3 (2 for U+1f43b, 1 for U+2744)
Unicode::DisplayWidth.of "🐻‍❄", emoji: :rgi_uqe # => 2
```

See [emoji-test.txt](https://www.unicode.org/Public/emoji/16.0/emoji-test.txt), the [unicode-emoji gem](/~https://github.com/janlelis/unicode-emoji) and [UTS-51](https://www.unicode.org/reports/tr51/#def_qualified_emoji_character) for more details about qualified and unqualified Emoji sequences.
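
The simplistic ZWJ matching behind the `:all` and `:all_no_vs16` settings can be sketched without the gem. The snippet below is a minimal approximation: `SEQUENCE` follows the ZWJ/modifier part of the regex added in this commit but leaves out keycap sequences, and `sequence_cells` is a hypothetical helper name.

```ruby
# Assumption: simplified pattern, keycap sequences are left out.
# One character, optionally followed by a skin tone modifier or VS16,
# then one or more "ZWJ + character (+ optional modifier/VS16)" groups.
SEQUENCE = /.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(?:\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+/

# Hypothetical helper: counts two cells per matched sequence and
# returns the string with the matched sequences removed.
def sequence_cells(string)
  cells = 0

  rest = string.gsub(SEQUENCE) do
    cells += 2
    ""
  end

  [cells, rest]
end

polar_bear = "\u{1F43B}\u{200D}\u{2744}" # bear + ZWJ + snowflake, without VS16
handball   = "\u{1F93E}\u{1F3FD}\u{200D}\u{2640}\u{FE0F}" # the README's handball example

p sequence_cells(polar_bear)   # => [2, ""]
p sequence_cells(handball)     # => [2, ""]
p sequence_cells("no emoji")   # => [0, "no emoji"]
```

Because the pattern never checks whether a sequence is an RGI or even a well-formed Emoji, it treats the unqualified polar bear as two cells, whereas the `:rgi_mqe` setting adds up the widths of its parts.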

#### Emoji Support in Terminals

Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` is used, the gem will attempt to set the best fitting Emoji set for you (e.g. `:rgi_uqe` on "Apple_Terminal" or `:basic` on Gnome's terminal widget).
Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_uqe` on "Apple_Terminal" or `:none` on Gnome's terminal widget).

Note that Emoji display and the number of terminal columns used might differ a lot. For example, a terminal might not understand which Emoji to display, but still manage to calculate the proper number of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](/~https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project], which is a great resource that compares various terminals' Unicode/Emoji capabilities.
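
The environment-based detection can be pictured roughly as follows. This sketch only mirrors the `TERM_PROGRAM` and `TERM` branches visible in this commit's `emoji_support.rb`; the beginning of the real method is not shown in the diff and is omitted here. The method name `recommended_emoji_mode` and its `env` parameter are assumptions that make the example testable without touching the real environment.

```ruby
# Hypothetical stand-in for the gem's terminal detection.
# `env` defaults to the process environment; any Hash-like object works.
def recommended_emoji_mode(env = ENV)
  case env["TERM_PROGRAM"]
  when "Apple_Terminal", "iTerm.app"
    return :all
  when "WezTerm"
    return :all_no_vs16
  end

  case env["TERM"]
  when "contour", "foot"
    return :all
  when /kitty/
    return :basic
  end

  # Unknown/unsupported terminals: no Emoji adjustments
  :none
end

p recommended_emoji_mode("TERM_PROGRAM" => "WezTerm") # => :all_no_vs16
p recommended_emoji_mode("TERM" => "xterm-kitty")     # => :basic
p recommended_emoji_mode({})                          # => :none
```

Passing the environment as an argument keeps the lookup a pure function, which makes it easy to check how a given terminal would be classified.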

---

Please [open an issue](/~https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value.
To terminal implementors reading this: Although handling Emoji/ZWJ sequences as always having a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwhile to implement the `:rgi_uqe` option (see table above) and just give those unknown Emoji the space they need? It is painful to implement, I know, but it kind of underlines the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought…

For the best developer experience, you are encouraged to give your users the option to configure the level of Emoji support in your library or application to match their terminals (the same is true for ambiguous width).
---

### Usage with String Extension

135 changes: 81 additions & 54 deletions lib/unicode/display_width.rb
@@ -25,13 +25,14 @@ class DisplayWidth
WIDTH_TWO: decompress_index(INDEX[:WIDTH_TWO][0][0], 1),
}
EMOJI_SEQUENCES_REGEX_MAPPING = {
rgi_fqe: :REGEX,
rgi_mqe: :REGEX_INCLUDE_MQE,
rgi_uqe: :REGEX_INCLUDE_MQE_UQE,
all: :REGEX_WELL_FORMED,
rgi_fqe: :REGEX,
rgi_mqe: :REGEX_INCLUDE_MQE,
rgi_uqe: :REGEX_INCLUDE_MQE_UQE,
possible: :REGEX_WELL_FORMED,
}
EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/
REGEX_EMOJI_BASIC_OR_KEYCAP = Regexp.union(Unicode::Emoji::REGEX_BASIC, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[🏻-🏿\u{FE0F}]?(\u{200D}.[🏻-🏿\u{FE0F}]?)+/, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
REGEX_EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/

# Returns monospace display width of string
def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
@@ -53,7 +54,7 @@ def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **option
end
options[:overwrite] ||= {}

if options[:emoji] == nil || options[:emoji] == true
if [nil, true, :auto].include?(options[:emoji])
options[:emoji] = EmojiSupport.recommended
end

@@ -87,9 +88,9 @@ def self.width_ascii(string)

def self.width_frame(string, options)
# Retrieve Emoji width
if !options[:emoji]
if options[:emoji] == false || options[:emoji] == :none
res = 0
else options[:emoji]
else
res, string = emoji_width(
string,
options[:emoji],
@@ -163,58 +164,84 @@ def self.width_all_features(string, index_full, index_low, first_ambiguous, over
end


def self.emoji_width(string, sequences = :rgi_fqe)
def self.emoji_width(string, mode = :all)
res = 0

if regex = EMOJI_SEQUENCES_REGEX_MAPPING[sequences]
emoji_sequence_regex = Unicode::Emoji.const_get(regex)
else # sequences == :basic
emoji_sequence_regex = nil
end

# Make sure we have UTF-8
string = string.encode(Encoding::UTF_8) unless string.encoding == Encoding::UTF_8

if emoji_sequence_regex
# For each string possibly an emoji
no_emoji_string = string.gsub(Unicode::Emoji::REGEX_POSSIBLE){ |emoji_candidate|
# Skip notorious false positives
if EMOJI_NOT_POSSIBLE.match?(emoji_candidate)
emoji_candidate

# Check if we have a combined Emoji with width 2
elsif emoji_candidate == emoji_candidate[emoji_sequence_regex]
res += 2
""

# We are dealing with a default text presentation emoji or a well-formed sequence not matching the above Emoji set
else
# Ensure all explicit VS16 sequences have width 2
emoji_candidate.gsub!(Unicode::Emoji::REGEX_BASIC){ |basic_emoji|
if basic_emoji.size == 2 # VS16 present
res += 2
""
else
basic_emoji
end
}

emoji_candidate
end
}
if emoji_set_regex = EMOJI_SEQUENCES_REGEX_MAPPING[mode]
emoji_width_via_possible(string, Unicode::Emoji.const_get(emoji_set_regex))
elsif mode == :all_no_vs16
emoji_width_all(string)
elsif mode == :basic
emoji_width_basic(string)
elsif mode == :all
res_all, string = emoji_width_all(string)
res_basic, string = emoji_width_basic(string)
[res_all + res_basic, string]
else
# Only consider basic emoji

# Ensure all explicit VS16 sequences have width 2
no_emoji_string = string.gsub(REGEX_EMOJI_BASIC_OR_KEYCAP){ |basic_emoji|
if basic_emoji.size >= 2 # VS16 present
res += 2
""
else
basic_emoji
end
}
[0, string]
end
end

# Ensure all explicit VS16 sequences have width 2
def self.emoji_width_basic(string)
res = 0

no_emoji_string = string.gsub(REGEX_EMOJI_BASIC_OR_KEYCAP){ |basic_emoji|
if basic_emoji.size >= 2 # VS16 present
res += 2
""
else
basic_emoji
end
}

[res, no_emoji_string]
end

# Use simplistic ZWJ/modifier/keycap sequence matching
def self.emoji_width_all(string)
res = 0

no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES){
res += 2
""
}

[res, no_emoji_string]
end

# Match possible Emoji first, then refine
def self.emoji_width_via_possible(string, emoji_set_regex)
res = 0

# For each string possibly an emoji
no_emoji_string = string.gsub(Unicode::Emoji::REGEX_POSSIBLE){ |emoji_candidate|
# Skip notorious false positives
if REGEX_EMOJI_NOT_POSSIBLE.match?(emoji_candidate)
emoji_candidate

# Check if we have a combined Emoji with width 2
elsif emoji_candidate == emoji_candidate[emoji_set_regex]
res += 2
""

# We are dealing with a default text presentation emoji or a well-formed sequence not matching the above Emoji set
else
# Ensure all explicit VS16 sequences have width 2
emoji_candidate.gsub!(Unicode::Emoji::REGEX_BASIC){ |basic_emoji|
if basic_emoji.size == 2 # VS16 present
res += 2
""
else
basic_emoji
end
}

emoji_candidate
end
}

[res, no_emoji_string]
end
17 changes: 8 additions & 9 deletions lib/unicode/display_width/emoji_support.rb
@@ -14,23 +14,22 @@ def self.recommended
end

case ENV["TERM_PROGRAM"]
when "iTerm.app", "WezTerm"
when "Apple_Terminal", "iTerm.app"
return :all
when "Apple_Terminal"
return :rgi_uqe
when "WezTerm"
return :all_no_vs16
end

case ENV["TERM"]
when "foot"
when "contour","foot"
# konsole: all, how to detect?
return :all
when "contour"
return :rgi_uqe
when /kitty/
return :rgi_fqe
return :basic
end

# As of last time checked: gnome-terminal, vscode, alacritty, konsole
:basic
# As of last time checked: gnome-terminal, vscode, alacritty
:none
end

# Maybe: Implement something like /~https://github.com/jquast/ucs-detect