Possibility to enable passing unicode-range instead of glyphs? #6

Closed
bjrn opened this issue May 18, 2021 · 11 comments

@bjrn

bjrn commented May 18, 2021

subset-font allows passing in a string of glyphs to subset, but would it be interesting to also include an option to pass a unicode-range, as is possible with pyftsubset?

I'm aware that subfont provides a conversion utility to convert a string to a unicode-range (for CSS output) but do you know if there's a standard-ish way of doing the opposite? I'm suspecting there might be a few weird exceptions to take into account?
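For reference, the string-to-unicode-range direction mentioned above can be sketched in a few lines. This is only an illustration and not subfont's actual utility; the function name `textToUnicodeRange` is made up for this example.

```javascript
// Turn a string into a CSS unicode-range value.
// Consecutive code points are merged into U+XXXX-YYYY ranges.
function textToUnicodeRange(text) {
  // Iterating a string with the spread operator yields whole code points,
  // so astral characters (e.g. emoji) are not split into surrogate halves.
  const codePoints = [...new Set([...text].map((ch) => ch.codePointAt(0)))].sort(
    (a, b) => a - b
  );
  const hex = (cp) => cp.toString(16).toUpperCase();
  const parts = [];
  let i = 0;
  while (i < codePoints.length) {
    const start = codePoints[i];
    let end = start;
    // Extend the current range while the next code point is adjacent
    while (i + 1 < codePoints.length && codePoints[i + 1] === end + 1) {
      end = codePoints[++i];
    }
    parts.push(start === end ? `U+${hex(start)}` : `U+${hex(start)}-${hex(end)}`);
    i++;
  }
  return parts.join(', ');
}
```

Calling `textToUnicodeRange('abc')` yields `U+61-63`, since the three code points are adjacent and collapse into one range.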

@papandreou
Owner

Good idea. Should be fairly straightforward.

I also wanted to look into whether it'd be possible to only include ligatures that might actually be exercised by the text provided. E.g. there's no reason to preserve an "ff" ligature if the text is "foof". Maybe it'd make sense to tackle those two ideas together.
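A character-level approximation of the ligature idea could look like the sketch below. It assumes the ligatures are known as plain character sequences; in a real font they are glyph substitutions in the GSUB table and depend on shaping, so this is not how harfbuzz or subset-font handle it. The helper name `findExercisedLigatures` is hypothetical.

```javascript
// Hypothetical helper: keep only the ligatures whose component sequence
// occurs, in order and adjacent, somewhere in the text.
function findExercisedLigatures(text, ligatureSequences) {
  return ligatureSequences.filter((sequence) => text.includes(sequence));
}

const candidates = ['ff', 'fi', 'fl', 'ffi'];

// "foof" contains two f's, but they are never adjacent
const forFoof = findExercisedLigatures('foof', candidates);

// "office" contains "ff", "fi" and "ffi"
const forOffice = findExercisedLigatures('office', candidates);
```

The check needs the text in its original order. Once the text has been reduced to a set of code points, "foof" and "off" look identical, so the adjacency information is gone.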

@papandreou
Owner

papandreou commented May 18, 2021

> do you know if there's a standard-ish way of doing the opposite? I'm suspecting there might be a few weird exceptions to take into account?

Hmm, yeah, the U+4?? syntax looks like fun: https://developer.mozilla.org/en-US/docs/Web/CSS/@font-face/unicode-range

In terms of subsetting I guess it's fine to just expand that to all the possible values, whether or not those codepoints actually exist in the font (or in the Unicode repertoire 😅). The subsetting code should just ignore the codepoints that don't exist in the original font.
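The expansion described above, including the `U+4??` wildcard form, could be sketched as follows. This is a minimal illustration under the assumption that only the three token forms from the MDN page need to be supported (single code point, explicit range, trailing wildcards); the function name `expandUnicodeRange` is made up and this is not the implementation of any library mentioned in this thread.

```javascript
// Expand a CSS unicode-range value into a sorted array of code points.
// Supported token forms: U+26, U+0025-00FF and the wildcard form U+4??.
function expandUnicodeRange(value) {
  const codePoints = new Set();
  for (const rawToken of value.split(',')) {
    const token = rawToken.trim();
    if (token === '') continue;
    const match = /^u\+([0-9a-f?]{1,6})(?:-([0-9a-f]{1,6}))?$/i.exec(token);
    if (!match) {
      throw new Error(`Invalid unicode-range token: ${token}`);
    }
    const [, first, last] = match;
    let start;
    let end;
    if (first.includes('?')) {
      // Wildcards must be trailing and cannot be combined with an explicit end
      if (last !== undefined || !/^[0-9a-f]*\?+$/i.test(first)) {
        throw new Error(`Invalid wildcard in unicode-range token: ${token}`);
      }
      // U+4?? covers U+400 through U+4FF
      start = parseInt(first.replace(/\?/g, '0'), 16);
      end = parseInt(first.replace(/\?/g, 'F'), 16);
    } else {
      start = parseInt(first, 16);
      end = last === undefined ? start : parseInt(last, 16);
    }
    // Clamp to the highest valid Unicode code point
    end = Math.min(end, 0x10ffff);
    for (let cp = start; cp <= end; cp++) {
      codePoints.add(cp);
    }
  }
  return [...codePoints].sort((a, b) => a - b);
}
```

The result contains every code point in the range, whether or not it is assigned in Unicode or present in the font, which matches the "just expand everything" approach. Mapping the result through `String.fromCodePoint` and joining it gives a string that can be used as subsetting text.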

@papandreou
Owner

This module looks like it's up to the task: https://github.com/Japont/unicode-range

@bjrn
Author

bjrn commented May 18, 2021

Good find! Yes, I made a super-naïve test script before I stumbled upon that U+4?? syntax 😬, so I will take a look at that one! I completely understand if you want to keep this library small and focused. Specifying a unicode-range might be an edge case that is better solved with an example in the README where the conversion takes place before calling subset-font. I'll play around with it a bit and get back.

> no reason to preserve an "ff" ligature if the text is "foof"

true … but isn't the text converted to a Set (of sorts, I'm not familiar with harfbuzz) and sorted?

@papandreou
Owner

> I'll play around a bit with it and get back.

Great! Good luck! 🍀

> no reason to preserve an "ff" ligature if the text is "foof"

> true … but isn't the text converted to a Set (of sorts, I'm not familiar with harfbuzz) and sorted?

Yes, I think we'll have to go even more low level when instructing harfbuzz about which glyphs to include -- if that's even supported 😬

@bjrn
Author

bjrn commented May 19, 2021

const path = require('path');
const { readFile, writeFile } = require('fs').promises;
const subsetFont = require('subset-font');
const { UnicodeRange } = require('@japont/unicode-range');

const latinRange = 'U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6, U+02DA, U+02DC, U+2000-206F, U+2074, U+20AC, U+2122, U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD';

// Accept the unicode-range either as a comma-separated string or as an array of ranges
function formatRange(range) {
  if (typeof range === 'string') {
    return range.replace(/\s+/g, '').split(',');
  }
  return range;
}


function getGlyphsFromUnicodeRange(range) {
  // UnicodeRange.parse currently requires an array of ranges …
  const rangeArray = formatRange(range);

  // … and returns code points. Join the characters into a single string,
  // since subset-font documents its text argument as a string.
  const glyphs = UnicodeRange.parse(rangeArray)
    .map((cp) => String.fromCodePoint(cp))
    .join('');

  return glyphs;
}


async function generateFont() {
  const font = await readFile(
    path.resolve(__dirname, 'woff2', 'SomeFontFile.woff2')
  );

  const glyphs = getGlyphsFromUnicodeRange(latinRange);

  const result = await subsetFont(font, glyphs, {
    targetFormat: 'woff2',
  });

  // ... and so on
}

Did a quick try, and from what I can tell so far, that library does the trick 👍🏼.

I don't know how you feel, but figuring out which glyphs to subset might seem a bit out of scope for subset-font after all (in the same way as subfont handles parsing of content etc.). Let me know if you want me to make a PR with an example, or anything regarding this.

Skipping unused ligatures is an interesting one; depending on the language group there might be some savings. I have mostly thought about it as an on/off thing (i.e. liga is either enabled or disabled for the font). In my current use case, there's a mix of static and dynamic content, hence the need to subset fonts based on unicode-range rather than individual codepoints … I would love to dive deeper into it though.

@papandreou
Owner

Great that you got it to work! Thanks for sharing your solution. I agree with your scope concern. Let's leave it here for now and see if it comes up as a common request. Maybe we can even add a link to this issue to the README.

> Skipping unused ligatures is an interesting one; depending on the language group there might be some savings. I have mostly thought about it as an on/off thing (i.e. liga is either enabled or disabled for the font).

I'll probably explore it one day when I have time. I'm not sure that the savings will be big either, it's mostly from a perfectionist angle. Spending years hunting down these kilobyte savings does that to you 🙈

> In my current use case, there's a mix of static and dynamic content, hence the need to subset fonts based on unicode-range rather than individual codepoints … I would love to dive deeper into it though.

Ah yes, that makes sense! Btw. subfont has an experimental --dynamic switch that renders the pages in a headless browser and does additional tracing inside it. But it might not work for you, depends on exactly how dynamic the content is :)

@papandreou
Owner

I'd also be happy to entertain the idea of configuring subfont to include a given unicode-range of characters in the subsets, regardless of what the tracing step says. It wouldn't really be hard to do. I think the main challenge would be to come up with a way to configure it if it has to be configurable per @font-face declaration.

@bjrn
Author

bjrn commented May 19, 2021

Yes, it could actually be a good fit within subfont's scope: much of the tooling around generating @font-face declarations would be useful, except that instead of deriving the unicode-range from parsed content, it would be provided by the configuration.

Regarding the per @font-face declaration configuration, that is a tricky one, since I guess much of the idea behind subfont is to enable it as a drop-in addition to static site generators.

@papandreou
Owner

Yeah, that is the core use case, but I'm not opposed to exposing more controls like that. We could even do it as a custom CSS property in the @font-face rule, e.g.:

@font-face {
  font-family: foo;
  src: ...;
  font-weight: 700;
  -subfont-unicode-range: U+0131, U+0152-0153, U+02BB-02BC;
}
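To illustrate how a tool might read such a property, here is a rough sketch. Note that `-subfont-unicode-range` is only a proposal in this thread, and the function name `extractSubfontUnicodeRanges` is hypothetical. The regex-based parsing is a simplification: it assumes no `}` or `;` inside declaration values (which data: URLs would violate), so a real implementation would more likely use a proper CSS parser.

```javascript
// Collect the proposed -subfont-unicode-range declarations from @font-face rules.
// Simplified: assumes declaration values contain no "}" or ";" characters.
function extractSubfontUnicodeRanges(css) {
  const results = [];
  const fontFaceRegex = /@font-face\s*\{([^}]*)\}/gi;
  let match;
  while ((match = fontFaceRegex.exec(css)) !== null) {
    const body = match[1];
    const family = /(?:^|;)\s*font-family\s*:\s*([^;]+)/i.exec(body);
    const range = /(?:^|;)\s*-subfont-unicode-range\s*:\s*([^;]+)/i.exec(body);
    if (range) {
      results.push({
        // Strip optional surrounding quotes from the family name
        fontFamily: family
          ? family[1].trim().replace(/^['"]|['"]$/g, '')
          : undefined,
        unicodeRange: range[1].trim().split(/\s*,\s*/),
      });
    }
  }
  return results;
}
```

Each returned entry pairs a font family with the list of range tokens, which could then be expanded to code points and merged with whatever the tracing step found.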

@papandreou
Owner

For what it's worth, Munter/subfont#161 implemented the ability to specify text to include in the subset via -subfont-text.

I'll close this for now.
