Word boundary: \b

A word boundary pattern:\b is a test, just like pattern:^ and pattern:$.

When the regexp engine (program module that implements searching for regexps) comes across pattern:\b, it checks that the position in the string is a word boundary.

There are three different positions that qualify as word boundaries:

  • At string start, if the first string character is a word character pattern:\w.
  • Between two characters in the string, where one is a word character pattern:\w and the other is not.
  • At string end, if the last string character is a word character pattern:\w.
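The three rules above can be checked by hand. The following is a minimal sketch, not how a regexp engine is actually implemented. The helper names `isWordChar` and `wordBoundaryPositions` are hypothetical and exist only for this illustration. A position counts as a boundary when exactly one of its two neighbors is a word character, and the outside of the string is treated as a non-word character.

```js
// Hypothetical helper: true if ch is a single word character (\w).
// A missing neighbor (outside the string) is treated as a non-word character.
function isWordChar(ch) {
  return ch !== undefined && /\w/.test(ch);
}

// Hypothetical helper: returns all positions that qualify as word boundaries.
function wordBoundaryPositions(str) {
  const positions = [];
  for (let i = 0; i <= str.length; i++) {
    const before = isWordChar(str[i - 1]); // character to the left of position i
    const after = isWordChar(str[i]);      // character to the right of position i
    if (before !== after) positions.push(i);
  }
  return positions;
}

console.log(wordBoundaryPositions("Hello, Java!")); // [0, 5, 7, 11]
```

Position 12 (the string end) is not reported, because the last character `!` is not a word character.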

For instance, regexp pattern:\bJava\b will be found in subject:Hello, Java!, where subject:Java is a standalone word, but not in subject:Hello, JavaScript!.

alert( "Hello, Java!".match(/\bJava\b/) ); // Java
alert( "Hello, JavaScript!".match(/\bJava\b/) ); // null
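When the word to search for is not known in advance, the same pattern can be built dynamically. The sketch below assumes two hypothetical helpers, `escapeRegExp` and `hasWholeWord`, that are not part of the language. It also assumes the word starts and ends with a word character, otherwise the boundary tests behave differently. It uses `console.log` instead of `alert` so that it also runs outside the browser.

```js
// Hypothetical helper: escape characters that have a special meaning in a regexp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true if `word` occurs in `text` as a standalone word.
// Assumes `word` begins and ends with a word character (\w).
function hasWholeWord(text, word) {
  const regexp = new RegExp("\\b" + escapeRegExp(word) + "\\b");
  return regexp.test(text);
}

console.log(hasWholeWord("Hello, Java!", "Java"));       // true
console.log(hasWholeWord("Hello, JavaScript!", "Java")); // false
```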

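In the string subject:Hello, Java! the following positions correspond to pattern:\b: the string start (before subject:H), between subject:o and the comma, between the space and subject:J, and between subject:a and the exclamation sign.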
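One way to make these positions visible is to insert a marker at every place where pattern:\b matches. This is only an illustrative sketch; the marker character `|` is an arbitrary choice. Because pattern:\b is a test and consumes no characters, the replacement adds markers without removing anything from the string.

```js
// Insert "|" at every word boundary; \b itself matches no characters.
const marked = "Hello, Java!".replace(/\b/g, "|");

console.log(marked); // |Hello|, |Java|!
```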

So, it matches the pattern pattern:\bHello\b, because:

  1. At the beginning of the string, the first test pattern:\b matches.
  2. Then the word pattern:Hello matches.
  3. Then the test pattern:\b matches again, as we're between subject:o and a comma.

So the pattern pattern:\bHello\b would match, but not pattern:\bHell\b (because there's no word boundary after subject:l) and not pattern:Java!\b (because the exclamation sign is not a word character pattern:\w, so there's no word boundary after it).

alert( "Hello, Java!".match(/\bHello\b/) ); // Hello
alert( "Hello, Java!".match(/\bJava\b/) );  // Java
alert( "Hello, Java!".match(/\bHell\b/) );  // null (no match)
alert( "Hello, Java!".match(/\bJava!\b/) ); // null (no match)
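The last case is worth a closer look. The sketch below, using sample strings chosen for illustration, shows that the trailing pattern:\b fails only because nothing "wordly" follows the exclamation sign. Dropping that test, or putting a word character after `!`, makes the pattern match.

```js
const text = "Hello, Java!";

// "!" is not a word character and the string ends after it: no boundary there.
const withTrailingBoundary = text.match(/\bJava!\b/);
console.log(withTrailingBoundary); // null

// Without the trailing \b the same text matches.
const withoutTrailingBoundary = text.match(/\bJava!/);
console.log(withoutTrailingBoundary[0]); // Java!

// If a word character follows "!", there is a boundary between "!" and it.
const followedByWord = "Hello, Java!x".match(/\bJava!\b/);
console.log(followedByWord[0]); // Java!
```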

We can use pattern:\b not only with words, but with digits as well.

For example, the pattern pattern:\b\d\d\b looks for standalone 2-digit numbers. In other words, it looks for 2-digit numbers that are surrounded by characters different from pattern:\w, such as spaces or punctuation (or text start/end).

alert( "1 23 456 78".match(/\b\d\d\b/g) ); // 23,78
alert( "12,34,56".match(/\b\d\d\b/g) ); // 12,34,56
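To see what the boundaries contribute, compare the same search with and without them. This is a small sketch; the time example and its sample sentence are assumptions added for illustration, not taken from the text above.

```js
const numbers = "1 23 456 78";

// Without \b, two digits inside the longer number 456 are matched too.
const withoutBoundaries = numbers.match(/\d\d/g);
console.log(withoutBoundaries); // ["23", "45", "78"]

// With \b on both sides, only standalone 2-digit numbers remain.
const withBoundaries = numbers.match(/\b\d\d\b/g);
console.log(withBoundaries); // ["23", "78"]

// The same idea applied to a time in the hh:mm format.
const times = "Breakfast at 09:00 in the room 123:456.".match(/\b\d\d:\d\d\b/g);
console.log(times); // ["09:00"]
```

In the last line, `23:45` inside `123:456` is skipped because there is no word boundary between the digits `1` and `2`.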

```warn header="Word boundary pattern:\b doesn't work for non-latin alphabets"
The word boundary test `pattern:\b` checks that there is `pattern:\w` on one side of the position and "not `pattern:\w`" on the other side.

But pattern:\w means a Latin letter a-z (or a digit or an underscore), so the test doesn't work for other characters, e.g. Cyrillic letters or hieroglyphs.