Skip to content

A haxelib for consistent cross-platform UTF-8 string manipulation.

License

Notifications You must be signed in to change notification settings

vegardit/haxe-strings

Repository files navigation

haxe-strings - StringTools on steroids.

Build Status Release License Contributor Covenant

  1. What is it?
  2. The Strings utility class
  3. The String8 type
  4. The spell checker
  5. The string collection classes
  6. The StringBuilder class
  7. The Ansi class
  8. Random string generation
  9. Semantic version parsing with the Version type
  10. Installation
  11. Using the latest code
  12. License

What is it?

A haxelib for consistent cross-platform UTF-8 string manipulation.

All classes are located in the package hx.strings or below.

The library has been extensively unit tested (over 1,700 individual test cases) on the targets C++, C#, Eval, Flash, HashLink, Java, JavaScript (Node.js and PhantomJS), Neko, PHP 7 and Python 3.

Note:

  • When targeting PHP ensure the php_mbstring extension is enabled in the php.ini file. This extension is required for proper UTF-8 support.

Haxe compatiblity

haxe-strings Haxe
1.0.0 to 4.0.0 3.2.1 or higher
5.0.0 to 5.2.4 3.4.2 or higher
6.0.0 or higher 4.0.5 or higher

The Strings utility class

The hx.strings.Strings class provides handy utility methods for string manipulations.

It also contains improved implementations of functions provided by Haxe's StringTools class.

Methods ending with the letter 8 (e.g. length8(), indexOf8(), toLowerCase8()) are improved versions of similar methods provided by Haxe's String class, offering UTF-8 support and consistent behavior across all platforms (including Neko).

The hx.strings.Strings class can also be used as a static extension.

Some examples

using hx.strings.Strings; // augment all Strings with new functions

class Test {

   static function main() {
      // Strings are extended:
      "".isEmpty();      // returns true
      "   ".isBlank();   // returns true
      "123".isDigits();  // returns true
      "a".repeat(3);     // returns "aaa"
      "abc".reverse();   // returns "cba"

      // Int's are extended too:
      32.toChar().isSpace();       // returns true
      32.toChar().toString();      // returns " "
      32.toChar().isAscii();       // returns true
      6000.toChar().isAscii();     // returns false
      6000.toChar().isUTF8();      // returns true
      74.toHex();                  // returns "4A"

      // all functions are null-safe:
      var nullString:String = null;
      nullString.length8();         // returns 0
      nullString.contains("cat");   // returns false

      // all methods support UTF-8 on all platforms:
      "кот".toUpperCase8();         // returns "КОТ"
      "кот".toUpperCaseFirstChar(); // returns "Кот"
      "はいはい".length8();          // returns 4

      // ANSI escape sequence processing:
      "\x1B[1;33mHello World!\x1B[0m".ansiToHtml();            // returns '<span style="color:yellow;font-weight:bold;">Hello World!</span>'
      "\x1B[1;33mHello World!\x1B[0m".ansiToHtml(CssClasses);  // returns '<span class="ansi_fg_yellow ansi_bold">Hello World!</span>'
      "\x1B[1mHello World!\x1B[0m".removeAnsi();               // returns "Hello World!"

      // It is also possible to fully customize the css class names used using a callback:
      "\x1B[1;33mHello World!\x1B[0m".ansiToHtml(CssClassesCallback(function(st:hx.strings.AnsiState):String {
         var a : Array<String> = [];
         if (st.fgcolor != null) a.push("someprefix_fg_" + st.fgcolor + "_somesuffix");
         if (st.bgcolor != null) a.push("someprefix_bg_" + st.fgcolor);
         if (st.bold)            a.push("someprefix_bold");
         if (st.underline)       a.push("someprefix_underline");
         if (st.blink)           a.push("someprefix_blink");
         return a.join(" ");
       })));  // returns '<span class="ansi_fg_yellow ansi_bold">Hello World!</span>'

      // case formatting:
      "look at me".toUpperCamel();       // returns "LookAtMe"
      "MyCSSClass".toLowerCamel();       // returns "myCSSClass"
      "MyCSSClass".toLowerHyphen();      // returns "my-css-class"
      "MyCSSClass".toLowerUnderscore();  // returns "my_css_class"
      "myCSSClass".toUpperUnderscore();  // returns "MY_CSS_CLASS"

      // ellipsizing strings:
      "The weather is very nice".ellipsizeLeft(20);    // returns "The weather is ve..."
      "The weather is very nice".ellipsizeMiddle(20);  // returns "The weath...ery nice"
      "The weather is very nice".ellipsizeRight(20);   // returns "...ther is very nice"

      // string differences:
      "It's green".diffAt("It's red"); // returns 5
      "It's green".diff("It's red");   // returns { left: 'green', right: 'red', at: 5 }

      // hash codes:
      "Cool String".hashCode();       // generate a platform specific hash code
      "Cool String".hashCode(CRC32);  // generate a hash code using CRC32
      "Cool String".hashCode(JAVA);   // generate a hash code using the Java hash algorithm

      // cleanup:
      "/my/path/".removeLeading("/");       // returns "my/path/"
      "/my/path/".removeTrailing("/");      // returns "/my/path"
      "<i>So</i> <b>nice</b>".removeTags(); // returns "So nice"
   }
}

The String8 type

The hx.strings.String8 is an abstract type based on String. All exposed methods are UTF-8 compatible and have consistent behavior across platforms.

It can be used as a drop-in replacement for type String.

Example usage:

class Test {
   static function main() {
      var str:String = "はいはい";  // create a string with UTF8 chars
      str.length;  // will return different values depending on the default UTF8 support of the target platform

      var str8:String8 = str; // we assign the string to a variable of type String8 - because of the nature of Haxe`s abstract types this will not result in the creation of a new object instance
      str8.length;  // will return the correct character length on all platforms

      str8.ellipsizeLeft(2); // the String8 type automatically has all utility string functions provided by the Strings class.
   }
}

The type declaration in the Strings8.hx file is nearly empty because all methods are auto generated based on the static methods provided by the hx.strings.Strings class.

The spell checker

The package hx.strings.spelling contains an extensible spell checker implementation that is based on ideas outlined by Peter Norvig in his article How to write a Spell Checker.

The SpellChecker#correctWord() method can for example be used to implement a Google-like "did you mean 'xyz'?" feature for a custom search engine.

Now let's do some spell checking...

import hx.strings.spelling.checker.*;
import hx.strings.spelling.dictionary.*;
import hx.strings.spelling.trainer.*;

class Test {
   static function main() {
      /*
       * first we use the English spell checker with a pre-trained dictionary
       * that is bundled with the library:
       */
      EnglishSpellChecker.INSTANCE.correctWord("speling");  // returns "spelling"
      EnglishSpellChecker.INSTANCE.correctWord("SPELING");  // returns "spelling"
      EnglishSpellChecker.INSTANCE.correctWord("SPELLING"); // returns "spelling"
      EnglishSpellChecker.INSTANCE.correctWord("spell1ng"); // returns "spelling"
      EnglishSpellChecker.INSTANCE.correctText("sometinG zEems realy vrong!") // returns "something seems really wrong!"
      EnglishSpellChecker.INSTANCE.suggestWords("absance"); // returns [ "absence", "advance", "balance" ]

      /*
       * let's check the pre-trained German spell checker
       */
      GermanSpellChecker.INSTANCE.correctWord("schreibweise");  // returns "Schreibweise"
      GermanSpellChecker.INSTANCE.correctWord("Schreibwiese");  // returns "Schreibweise"
      GermanSpellChecker.INSTANCE.correctWord("SCHREIBWEISE");  // returns "Schreibweise"
      GermanSpellChecker.INSTANCE.correctWord("SCHRIBWEISE");   // returns "Schreibweise"
      GermanSpellChecker.INSTANCE.correctWord("Schre1bweise");  // returns "Schreibweise"
      GermanSpellChecker.INSTANCE.correctText("etwaz kohmische Aepfel ligen vör der Thür"); // returns "etwas komische Äpfel liegen vor der Tür"
      GermanSpellChecker.INSTANCE.suggestWords("Sistem");       // returns[ "System", "Sitte", "Sitten" ]

      /*
       * now we train our own dictionary from scratch
       */
      var myDict = new InMemoryDictionary("English");
      // download some training text with good vocabular
      var trainingText = haxe.Http.requestUrl("http://www.norvig.com/big.txt");
      // populating the dictionary might take a while:
      EnglishDictionaryTrainer.INSTANCE.trainWithString(myDict, trainingText);
      // let's use the trained dictionary with a spell checker
      var mySpellChecker = new EnglishSpellChecker(myDict);
      mySpellChecker.INSTANCE.correctWord("speling");  // returns "spelling"

      // since training a dictionary can be quite time consuming, we save
      // the analyzed words and their popularity/frequency to a file
      myDict.exportWordsToFile("myDict.txt");

      // the word list can later be loaded using
      myDict.loadWordsFromFile("myDict.txt");
   }
}

The string collection classes

The package hx.strings.collection contains some useful collection classes for strings.

  1. StringSet is a collection of unique strings. Each string is guaranteed to only exists once within the collection.

    var set = new hx.strings.collection.StringSet();
    set.add("a");
    set.add("a");
    set.add("b");
    set.add("b");
    // at this point the set only contains two elements: one 'a' and one 'b'
  2. SortedStringSet is a sorted collection of unique strings. A custom comparator can be provided for using different sorting algorithm.

  3. SortedStringMap is a map that is sorted by there keys (which are of type String).

The StringBuilder class

The hx.strings.StringBuilder class is an alternative to the built-in StringBuf. It provides an fluent API, cross-platform UTF-8 support and the ability to insert Strings at arbitrary positions.

import hx.strings.StringBuilder;

class Test {
   static function main() {
      // create a new instance with initial content
      var sb = new StringBuilder("def");

      // insert / add some strings via fluent API calls
      sb.insert(0, "abc")
         .newLine()   // appends "\n"
         .add("ghi")
         .addChar(106);

      sb.toString();  // returns "abcdef\nghij\n"

      sb.clear();     // reset the internal state

      sb.addAll(["a", 1, true, null]);

      sb.toString();  // returns "a1truenull"
   }
}

The Ansi class

The hx.strings.ansi.Ansi class provides functionalities to write ANSI escape sequences in a type-safe manner.

import hx.strings.ansi.Ansi;

class Test {
   static function main() {
      var stdout = Sys.stdout();

      stdout.writeString(Ansi.fg(RED));           // set the text foreground color to red
      stdout.writeString(Ansi.bg(WHITE));         // set the text background color to white
      stdout.writeString(Ansi.attr(BOLD));        // make the text bold
      stdout.writeString(Ansi.attr(RESET));       // reset all color or text attributes
      stdout.writeString(Ansi.clearScreen());     // clears the screen
      stdout.writeString(Ansi.cursor(MoveUp(2))); // moves the cursor 2 lines up

      // now let's work with the fluent API:
      var writer = Ansi.writer(stdout); // supports StringBuf, haxe.io.Ouput and hx.strings.StringBuilder
      writer
         .clearScreen()
         .cursor(GoToPos(10,10))
         .fg(GREEN).bg(BLACK).attr(ITALIC).write("How are you?").attr(RESET)
         .cursor(MoveUp(2))
         .fg(RED).bg(WHITE).attr(UNDERLINE).write("Hello World!").attr(UNDERLINE_OFF)
         .flush();
   }
}

Random string generation

The hx.strings.RandomStrings class contains methods to generate different types of random strings, e.g. UUIDs or alpha-numeric sequences.

import hx.strings.RandomStrings;
using hx.strings.Strings;

class Test {
   static function main() {
      RandomStrings.randomUUIDv4(); // generates a UUID according to RFC 4122 UUID Version 4, e.g. "f3cdf7a7-a179-464b-ae98-83f6659ae33f"
      RandomStrings.randomDigits(4); // generates a 4-char numeric strings, e.g. "4832"
      RandomStrings.randomAsciiAlphaNumeric(8); // generates a 8-char ascii alph numeric strings, e.g. "aZvDF34L"
      RandomStrings.randomSubstring("abcdefghijlkmn", 4); // returns a random 4-char substring of the given string, e.g. "defg"
   }
}

Semantic version parsing with the Version type

The hx.strings.Version type provides functionalities for parsing of and working with version strings following the SemVer 2.0 Specification.

import hx.strings.Version;

class Test {
   static function main() {

      var ver:Version;

      ver = new Version(11, 2, 4);
      ver.major;                  // returns 11
      ver.minor;                  // returns 2
      ver.patch;                  // returns 4
      ver.toString();             // returns '11.2.4'
      ver.nextPatch().toString(); // returns '11.2.5'
      ver.nextMinor().toString(); // returns '11.3.0'
      ver.nextMajor().toString(); // returns '12.0.0'

      ver = Version.of("11.2.4-alpha.2+exp.sha.141d2f7");
      ver.major;            // returns 11
      ver.minor;            // returns 2
      ver.patch;            // returns 4
      ver.isPreRelease;     // returns true
      ver.preRelease;       // returns 'alpha.2'
      ver.buildMetadata;    // returns 'exp.sha.141d2f7'
      ver.hasBuildMetadata; // returns true
      ver.nextPreRelease().toString(); // returns "11.2.4-alpha.3"

      var v1_0_0:Version = "1.0.0";
      var v1_0_1:Version = "1.0.1";

      v1_0_0 < v1_0_1;       // returns true
      v1_0_1 >= v1_0_0;      // returns true

      v1_0_1.isGreaterThan(v1_0_0); // returns true
      v1_0_1.isLessThan(Version.of("1.0.0"); // returns false

      Version.isValid("foobar"); // returns false
   }
}

Installation

  1. install the library via haxelib using the command:

    haxelib install haxe-strings
    
  2. use in your Haxe project

    • for OpenFL/Lime projects add <haxelib name="haxe-strings" /> to your project.xml
    • for free-style projects add -lib haxe-strings to your *.hxml file or as command line option when running the Haxe compiler

Using the latest code

Using haxelib git

haxelib git haxe-strings /~https://github.com/vegardit/haxe-strings main D:\haxe-projects\haxe-strings

Using Git

  1. check-out the main branch

    git clone /~https://github.com/vegardit/haxe-strings --branch main --single-branch D:\haxe-projects\haxe-strings --depth=1
    
  2. register the development release with haxe

    haxelib dev haxe-strings D:\haxe-projects\haxe-strings
    

License

All files are released under the Apache License 2.0.

Individual files contain the following tag instead of the full license text:

SPDX-License-Identifier: Apache-2.0

This enables machine processing of license information based on the SPDX License Identifiers that are available here: https://spdx.org/licenses/.