Add translations and touch-ups for Fun with Fonts on the Web - blog

commit c4f51a0eea15f7bd907391b2508b063482034a8c
parent 55324bf734965d507368599ee23c97d49cec2bc3
Author: Shimmy Xu <shimmy.xu@shimmy1996.com>
Date:   Mon,  2 Dec 2019 22:32:52 -0500

Add translations and touch-ups for Fun with Fonts on the Web

Diffstat:
M content/posts/2019-12-01-fun-with-fonts-on-the-web.en.md  | 20 ++++++++++----------
A content/posts/2019-12-01-fun-with-fonts-on-the-web.zh.md  | 126 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M org/2019.org  | 117 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------

3 files changed, 242 insertions(+), 21 deletions(-)
diff --git a/content/posts/2019-12-01-fun-with-fonts-on-the-web.en.md b/content/posts/2019-12-01-fun-with-fonts-on-the-web.en.md
@@ -8,7 +8,7 @@ draft = false
 A more accurate version of the title probably should be "Fun with Fonts in Web Browsers", but oh well, it sounds cooler that way. Text rendering is [hard](https://gankra.github.io/blah/text-hates-you/), and it certainly doesn't help that we have a plethora of different writing systems (blame the Tower of Babel for that, I guess) which cannot be elegantly fitted into a uniform system. Running a bilingual blog doubles the trouble in font picking, and here's a compilation of the various problems I encountered.
 
 
-## SPAAAAACE! {#spaaaaace}
+## Space Invaders {#space-invaders}
 
 Most browsers join consecutive lines of text in HTML to a single one with an added space in between, so
 
@@ -33,9 +33,9 @@ Such a simplistic rule doesn't work for CJK languages where no separators is use
 If your browser is smart enough (like Firefox), it will join the lines correctly. All the Blink-based browsers, however, still stubbornly shove in the extra space, so it looks like I will be stuck in unwrapped source files like a barbarian for a bit longer. While not a cure-all solution, specifying the `lang` attribute still has the added benefit of enabling language-specific CSS rules, which comes in handy later.
 
 
-## Quotation Marks {#quotation-marks}
+## Return of the Quotation Marks {#return-of-the-quotation-marks}
 
-As mentioned in a [previous post](http://localhost:1313/en/posts/2018-06-24-fun-with-fonts-in-emacs/), CJK fonts would render quotation marks as full-width characters, different from Latin fonts. This won't be a problem as long as a web page doesn't try to mix-and-match fonts: just use language specific font-stack like so:
+As mentioned in a [previous post](https://www.shimmy1996.com/en/posts/2018-06-24-fun-with-fonts-in-emacs/), CJK fonts render quotation marks as full-width characters, unlike Latin fonts. This won't be a problem as long as a web page doesn't try to mix and match fonts: just use a language-specific font stack.
 
 ```css
 body:lang(en) {
@@ -49,7 +49,7 @@ body:lang(zh) {
 
 Coupled with matching `lang` attributes, the story would have ended here. Firefox even allows you to specify default fonts on a per language basis, so you can actually get away with just the fallback values, like `sans-serif` or `serif`, and not even bother writing language specific CSS.
 
-However, what if I want to use Oxygen Sans for Latin characters, Noto Sans SC for CJK characters? While seemingly an sensible solution, specifying font stack like so
+However, what if I want to use Oxygen Sans for Latin characters and Noto Sans SC for CJK characters? While seemingly a sensible solution, specifying the font stack like so,
 
 ```css
 body:lang(zh) {
@@ -78,18 +78,18 @@ body:lang(zh) {
 Now we can enjoy the quotation marks in their full-width glory!
 
 
-## Slicing and Dicing {#slicing-and-dicing}
+## Font Ninja {#font-ninja}
 
 Font files are quite significant in size, and even more so for CJK ones: the Noto Sans SC font just mentioned is [over 8MB](https://github.com/googlefonts/noto-cjk/blob/master/NotoSansSC-Regular.otf) in size. No matter how determined I am to serve everything from my own server, this seems like utter overkill considering the average HTML file size on my site is probably closer to 8KB. How do all the web font services handle this, then?
 
 Most web font services work by adding a bunch of [`@font-face`](https://developer.mozilla.org/en-US/docs/Web/CSS/@font-face) definitions to a website's style sheet, which pull font files from dedicated servers. To reduce the size of the files being served, Google Fonts slices the font file into smaller chunks and declares the corresponding `unicode-range` for each chunk in its `@font-face` block (this is exactly how they handle [CJK fonts](https://fonts.googleapis.com/css?family=Noto+Sans+SC)). They also compress the font files into WOFF2, further reducing file size. On the other hand, [Adobe Fonts](https://fonts.adobe.com/) (previously known as Typekit) seems to have some JavaScript wizardry that dynamically determines which glyphs to load from a font file.
 
-Combining best of both worlds, and thanks to the fact that this is a static site, it is easy to gather all the used characters and serve a font file containing just that. The tools of choice here would be pyftsubset (available as a component of [fonttools](https://pypi.org/project/fonttools/)) and GNU awk. Compressing font files into WOFF2 also requires Brotli, a compression library. Under Arch Linux, the required packages are [python-fonttools](https://www.archlinux.org/packages/community/any/python-fonttools/), [gawk](https://www.archlinux.org/packages/core/x86%5F64/gawk/), [brotli](https://www.archlinux.org/packages/community/x86%5F64/brotli/), and [python-brotli](https://www.archlinux.org/packages/community/x86%5F64/python-brotli/).
+Combining the best of both worlds, and thanks to the fact that this is a static site, it is easy to gather all the used characters and serve a font file containing just those. The tools of choice here would be pyftsubset (available as a component of [fonttools](https://pypi.org/project/fonttools/)) and GNU AWK. Compressing font files into WOFF2 also requires Brotli, a compression library. Under Arch Linux, the required packages are [python-fonttools](https://www.archlinux.org/packages/community/any/python-fonttools/), [gawk](https://www.archlinux.org/packages/core/x86%5F64/gawk/), [brotli](https://www.archlinux.org/packages/community/x86%5F64/brotli/), and [python-brotli](https://www.archlinux.org/packages/community/x86%5F64/python-brotli/).
 
 Here's a shell one-liner to collect all the used glyphs from generated HTML files:
 
 ```sh
-find -type f -name "*.html" -printf "%h/%f " | xargs -l awk 'BEGIN{FS="";ORS=""} {for(i=1;i<=NF;i++){chars[$(i)]=$(i);}} END{for(c in chars){print c;} }' > glyphs.txt
+find . -type f -name "*.html" -printf "%h/%f " | xargs -l awk 'BEGIN{FS="";ORS=""} {for(i=1;i<=NF;i++){chars[$(i)]=$(i);}} END{for(c in chars){print c;} }' > glyphs.txt
 ```
 
 You may need to `export LANG=en_US.UTF-8` (or any other UTF-8 locale) for certain glyphs to be handled correctly. With the list of glyphs, we can extract the useful part of font files and compress them:
@@ -98,7 +98,7 @@ You may need to `export LANG=en_US.UTF-8` (or any other UTF-8 locale) for certai
 pyftsubset NotoSansSC-Regular.otf --text-file=glyphs.txt --flavor=woff2 --output-file=NotoSansSC-Regular.woff2
 ```
 
-Specifying `--no-hinting` and `--desubroutinize` can further reduce size of generated file at the cost of some aesthetic fine-tuning. A similar technique can be used to shrink down Latin fonts to include only ASCII characters (or keep the extended ASCII range with `U+0000-00FF`)
+Specifying `--no-hinting` and `--desubroutinize` can further reduce the size of the generated file at the cost of some aesthetic fine-tuning. A similar technique can be used to shrink down Latin fonts to include only ASCII characters (or keep the extended ASCII range with `U+0000-00FF`):
 
 ```sh
 pyftsubset Oxygen-Sans.ttf --unicodes="U+0000-007F" --flavor=woff2 --output-file=Oxygen-Sans.woff2
@@ -109,13 +109,13 @@ Once this is done, available glyphs can be checked using most font manager softw
 I also played around with the idea of dividing the glyphs into further chunks by popularity, so here's another one-liner to get a list of glyphs sorted by number of appearances:
 
 ```sh
-find -type f -name "*.html" -printf "%h/%f " | xargs -l awk 'BEGIN{FS=""} {for(i=1;i<=NF;i++){chars[$(i)]++;}} END{for(c in chars){printf "%06d %s\n", chars[c], c;}}' | sort -r > glyph-by-freq.txt
+find . -type f -name "*.html" -printf "%h/%f " | xargs -l awk 'BEGIN{FS=""} {for(i=1;i<=NF;i++){chars[$(i)]++;}} END{for(c in chars){printf "%06d %s\n", chars[c], c;}}' | sort -r > glyph-by-freq.txt
 ```
 
 It turns out my blog has around 1000 different Chinese characters, with roughly 400 of them appearing more than 10 times. Since the file size I get from a single round of subsetting is already good enough, I didn't bother proceeding with another split.
 
 
-## See the Web Through My Browser {#see-the-web-through-my-browser}
+## For Your Browsers Only {#for-your-browsers-only}
 
 With all the tricks in my bag, I was able to cut down the combined font file size to around 250KB, still magnitudes above that of an HTML file though. While it is nice to see my site appearing the same across all devices and screens, I feel the benefit is out of proportion compared to the 100-fold increase in page size.
 
diff --git a/content/posts/2019-12-01-fun-with-fonts-on-the-web.zh.md b/content/posts/2019-12-01-fun-with-fonts-on-the-web.zh.md
@@ -0,0 +1,126 @@
++++
+title = "字体配置万维网篇"
+date = 2019-12-01
+slug = "fun-with-fonts-on-the-web"
+draft = false
++++
+
+用《字体配置浏览器篇》作为标题或许更为准确，不过现在的标题听起来更吸引人一些。渲染文本 [不是一件简单的事](https://gankra.github.io/blah/text-hates-you/) ，如果还要考虑书写系统之间的巨大差异（这大概得怪巴别塔）无异于雪上加霜。运行双语博客会导致字体选择的麻烦加倍，这里是我遇到的一些问题的汇总。
+
+
+## 空格侵略者 {#空格侵略者}
+
+大多数浏览器会将 HTML 中的连续文本合并为一行，并在连接处加上空格。所以
+
+```html
+<html>Line one and
+line two.</html>
+```
+
+会被渲染为
+
+```text
+Line one and line two.
+```
+
+这种一刀切的方法显然不适用于字符之间不带分隔的 CJK 语言。解决方案是为页面（或页面上的任何特定元素）指定 `lang` 属性，如下所示：
+
+```html
+<html lang="zh">第一行和
+第二行。</html>
+```
+
+如果你的浏览器足够聪明（例如 Firefox），渲染的结果就不会有额外的空格。但是，所有基于 Blink 的浏览器仍然顽固地将多余的空格塞进去，所以我只能像野蛮人那样继续写一段一行的源文件。尽管不是万能的解决方案，但是指定 `lang` 属性仍然具有启用特定于某种语言的CSS规则的额外好处，这稍后会派上用场。
+
+
+## 引号归来 {#引号归来}
+
+如 [之前的日志](https://www.shimmy1996.com/zh/posts/2018-06-24-fun-with-fonts-in-emacs/) 所说， CJK 字体会将引号显示为全角字符，不同于拉丁字体。只要网页不尝试混搭字体，这就不会成为问题：只需使用特定于语言的字体栈就行。
+
+```css
+body:lang(en) {
+    font-family: "Oxygen Sans", sans-serif;
+}
+
+body:lang(zh) {
+    font-family: "Noto Sans SC", sans-serif;
+}
+```
+
+再加上匹配的 `lang` 属性，所有问题就都解决了。 Firefox 甚至允许为每种语言指定默认字体，所以仅使用后备字体（例如 `sans-serif` 或 `serif` ）也可行，不一定要费心编写特定于语言的CSS。
+
+那么，如果我想用 Oxygen Sans 来渲染拉丁字符，并用 Noto Sans SC 来渲染 CJK 字符怎么办？虽然看似没有问题，但像这样指定字体堆栈，
+
+```css
+body:lang(zh) {
+    font-family: "Oxygen Sans", "Noto Sans SC", sans-serif;
+}
+```
+
+会导致引号被 Oxygen Sans 渲染、显示为半角字符。我的解决方案是通过 `unicode-range` 定义一个涵盖了引号的替代字体，
+
+```css
+@font-face {
+    font-family: "Noto Sans SC Override";
+    unicode-range: U+2018-2019, U+201C-201D;
+    src: local("NotoSansCJKsc-Regular");
+}
+```
+
+并修改字体栈为
+
+```css
+body:lang(zh) {
+    font-family: "Noto Sans SC Override", "Oxygen Sans", "Noto Sans SC", sans-serif;
+}
+```
+
+这样我们就可以享受全角引号了！
+
+
+## 字体忍者 {#字体忍者}
+
+字体文件通常都不小，对于 CJK 字体来说更是如此：刚才提到的 Noto Sans SC 的大小 [超过8MB](https://github.com/googlefonts/noto-cjk/blob/master/NotoSansSC-Regular.otf) 。尽管我已经下定主意要从自己的服务器上提供所有文件，考虑到我网站上的平均 HTML 文件大小更接近 8KB，这显得有些过头了。那么那些网络字体服务如何处理这一问题呢？
+
+大多数网络字体服务的工作方式是在网站的样式表里添加一堆 [`@font-face` ](https://developer.mozilla.org/zh-CN/docs/Web/CSS/@font-face)定义，以从专用服务器上提取字体文件。为了减少所提供的文件大小， Google Fonts 会将字体文件大卸八块，并在 `@font-face` 里声明每一块所对应的 `unicode-range` （这正是它们处理 [CJK 字体](https://fonts.googleapis.com/css?family=Noto+Sans+SC) 的方式）。他们还将字体文件压缩为 WOFF2 以进一步缩减文件大小。而 [Adobe Fonts](https://fonts.adobe.com/) （以前称为 Typekit ）似乎有一些 JavaScript 奇技淫巧，可以动态确定要从字体文件加载的字形。
+
+博采众家之长，得益于这是一个静态站点，我们可以简单地统计所有用到的字符，并提供一个只包含这些字符的字体文件。所要用到的工具主要是 pyftsubset （属于 [fonttools](https://pypi.org/project/fonttools/) 下的一个组件）和 GNU AWK 。将字体压缩为 WOFF2 还需要 Brotli 压缩库。在 Arch Linux 下，获取这些程序需要安装 [python-fonttools](https://www.archlinux.org/packages/community/any/python-fonttools/) 、 [gawk](https://www.archlinux.org/packages/core/x86%5F64/gawk/) 、 [brotli](https://www.archlinux.org/packages/community/x86%5F64/brotli/) 和 [python-brotli](https://www.archlinux.org/packages/community/x86%5F64/python-brotli/) 。
+
+收集生成的HTML文件中的所有使用的字形可以使用这条 shell 命令：
+
+```sh
+find . -type f -name "*.html" -printf "%h/%f " | xargs -l awk 'BEGIN{FS="";ORS=""} {for(i=1;i<=NF;i++){chars[$(i)]=$(i);}} END{for(c in chars){print c;} }' > glyphs.txt
+```
+
+你可能需要 `export LANG=en_US.UTF-8` （或者其他 UTF-8 语言环境）以便正确处理某些字形。有了字形清单，我们就可以提取字体文件的有用部分并进行压缩：
+
+```sh
+pyftsubset NotoSansSC-Regular.otf --text-file=glyphs.txt --flavor=woff2 --output-file=NotoSansSC-Regular.woff2
+```
+
+指定 `--no-hinting` 和 `--desubroutinize` 可以进一步减小生成文件的大小，但会降低字体的美观程度。拉丁字体也可以使用类似的技术来瘦身，例如只提取包含 ASCII 字符的部分（或将范围设为 `U+0000-00FF` 以涵盖 Extended ASCII 字符）：
+
+```sh
+pyftsubset Oxygen-Sans.ttf --unicodes="U+0000-007F" --flavor=woff2 --output-file=Oxygen-Sans.woff2
+```
+
+大部分字体管理器都可以用来检查最后生成文件中可用的字形，也可以使用这一 [在线检查器](http://torinak.com/font/lsfont.html) （不支持 WOFF2，但是可以先试着转为其他格式后查看，例如 WOFF）。
+
+我还考虑过将字形按受欢迎程度划分为更多块。获取按出现次数排序的字形列表可以使用以下命令：
+
+```sh
+find . -type f -name "*.html" -printf "%h/%f " | xargs -l awk 'BEGIN{FS=""} {for(i=1;i<=NF;i++){chars[$(i)]++;}} END{for(c in chars){printf "%06d %s\n", chars[c], c;}}' | sort -r > glyph-by-freq.txt
+```
+
+结果显示我的博客用到了大约 1000 个不同的汉字，其中大约 400 个出现了10次以上。由于上一步中获得的字体文件大小已经足够好，我没有继续进行拆分。
+
+
+## 孔中窥见真理之貌（好像没有啥不对） {#孔中窥见真理之貌-好像没有啥不对}
+
+我最终将字体文件的总大小减少到了 250KB 左右，但这仍然比 HTML 文件大好几个数量级。虽然看到我的网站在所有设备和屏幕上都保持一致很让人开心，但是与页面大小增加上百倍的代价相比，我觉得这点好处不成比例。
+
+费尽心思指定字体或许并不值得。如果你希望看到我眼中本站的样子的话，以下是我的常用字体：
+
+-   比例拉丁字体： [Oxygen Sans](https://github.com/KDE/oxygen-fonts) 。注意 KDE 版本与 [Google Fonts 版本](https://fonts.google.com/specimen/Oxygen) 有一些微妙的区别，我更喜欢前者。
+-   比例 CJK 字体： [Noto Sans CJK](https://www.google.com/get/noto/help/cjk/) ，即思源黑体。
+-   等宽字体： [Iosevka](https://typeof.net/Iosevka/) ，确切地说是 ss09 样式。
diff --git a/org/2019.org b/org/2019.org
@@ -340,7 +340,7 @@ Wissen ist Nacht!
 
 知识就是黑夜！
 
-* TODO Fun with Fonts on the Web
+* DONE Fun with Fonts on the Web
 :PROPERTIES:
 :EXPORT_DATE: 2019-12-01
 :EXPORT_HUGO_SLUG: fun-with-fonts-on-the-web
@@ -354,7 +354,7 @@ Wissen ist Nacht!
 
 A more accurate version of the title probably should be "Fun with Fonts in Web Browsers", but oh well, it sounds cooler that way. Text rendering is [[https://gankra.github.io/blah/text-hates-you/][hard]], and it certainly doesn't help that we have a plethora of different writing systems (blame the Tower of Babel for that, I guess) which cannot be elegantly fitted into a uniform system. Running a bilingual blog doubles the trouble in font picking, and here's a compilation of the various problems I encountered.
 
-*** SPAAAAACE!
+*** Space Invaders
 Most browsers join consecutive lines of text in HTML to a single one with an added space in between, so
 #+BEGIN_SRC html
   <html>Line one and
@@ -372,8 +372,8 @@ Such a simplistic rule doesn't work for CJK languages where no separators is use
 #+END_SRC
 If your browser is smart enough (like Firefox), it will join the lines correctly. All the Blink-based browsers, however, still stubbornly shove in the extra space, so it looks like I will be stuck in unwrapped source files like a barbarian for a bit longer. While not a cure-all solution, specifying the =lang= attribute still has the added benefit of enabling language-specific CSS rules, which comes in handy later.
 
-*** Quotation Marks
-As mentioned in a [[http://localhost:1313/en/posts/2018-06-24-fun-with-fonts-in-emacs/][previous post]], CJK fonts would render quotation marks as full-width characters, different from Latin fonts. This won't be a problem as long as a web page doesn't try to mix-and-match fonts: just use language specific font-stack like so:
+*** Return of the Quotation Marks
+As mentioned in a [[https://www.shimmy1996.com/en/posts/2018-06-24-fun-with-fonts-in-emacs/][previous post]], CJK fonts render quotation marks as full-width characters, unlike Latin fonts. This won't be a problem as long as a web page doesn't try to mix and match fonts: just use a language-specific font stack.
 #+BEGIN_SRC css
   body:lang(en) {
       font-family: "Oxygen Sans", sans-serif;
@@ -385,7 +385,7 @@ As mentioned in a [[http://localhost:1313/en/posts/2018-06-24-fun-with-fonts-in-
 #+END_SRC
 Coupled with matching =lang= attributes, the story would have ended here. Firefox even allows you to specify default fonts on a per language basis, so you can actually get away with just the fallback values, like =sans-serif= or =serif=, and not even bother writing language specific CSS.
 
-However, what if I want to use Oxygen Sans for Latin characters, Noto Sans SC for CJK characters? While seemingly an sensible solution, specifying font stack like so
+However, what if I want to use Oxygen Sans for Latin characters and Noto Sans SC for CJK characters? While seemingly a sensible solution, specifying the font stack like so,
 #+BEGIN_SRC css
   body:lang(zh) {
       font-family: "Oxygen Sans", "Noto Sans SC", sans-serif;
@@ -407,22 +407,22 @@ and revise the font stack as
 #+END_SRC
 Now we can enjoy the quotation marks in their full-width glory!
 
-*** Slicing and Dicing
+*** Font Ninja
 Font files are quite significant in size, and even more so for CJK ones: the Noto Sans SC font just mentioned is [[https://github.com/googlefonts/noto-cjk/blob/master/NotoSansSC-Regular.otf][over 8MB]] in size. No matter how determined I am to serve everything from my own server, this seems like utter overkill considering the average HTML file size on my site is probably closer to 8KB. How do all the web font services handle this, then?
 
 Most web font services work by adding a bunch of [[https://developer.mozilla.org/en-US/docs/Web/CSS/@font-face][=@font-face=]] definitions to a website's style sheet, which pull font files from dedicated servers. To reduce the size of the files being served, Google Fonts slices the font file into smaller chunks and declares the corresponding =unicode-range= for each chunk in its =@font-face= block (this is exactly how they handle [[https://fonts.googleapis.com/css?family=Noto+Sans+SC][CJK fonts]]). They also compress the font files into WOFF2, further reducing file size. On the other hand, [[https://fonts.adobe.com/][Adobe Fonts]] (previously known as Typekit) seems to have some JavaScript wizardry that dynamically determines which glyphs to load from a font file.
 
-Combining best of both worlds, and thanks to the fact that this is a static site, it is easy to gather all the used characters and serve a font file containing just that. The tools of choice here would be pyftsubset (available as a component of [[https://pypi.org/project/fonttools/][fonttools]]) and GNU awk. Compressing font files into WOFF2 also requires Brotli, a compression library. Under Arch Linux, the required packages are [[https://www.archlinux.org/packages/community/any/python-fonttools/][python-fonttools]], [[https://www.archlinux.org/packages/core/x86_64/gawk/][gawk]], [[https://www.archlinux.org/packages/community/x86_64/brotli/][brotli]], and [[https://www.archlinux.org/packages/community/x86_64/python-brotli/][python-brotli]].
+Combining the best of both worlds, and thanks to the fact that this is a static site, it is easy to gather all the used characters and serve a font file containing just those. The tools of choice here would be pyftsubset (available as a component of [[https://pypi.org/project/fonttools/][fonttools]]) and GNU AWK. Compressing font files into WOFF2 also requires Brotli, a compression library. Under Arch Linux, the required packages are [[https://www.archlinux.org/packages/community/any/python-fonttools/][python-fonttools]], [[https://www.archlinux.org/packages/core/x86_64/gawk/][gawk]], [[https://www.archlinux.org/packages/community/x86_64/brotli/][brotli]], and [[https://www.archlinux.org/packages/community/x86_64/python-brotli/][python-brotli]].
 
 Here's a shell one-liner to collect all the used glyphs from generated HTML files:
 #+BEGIN_SRC sh
-  find -type f -name "*.html" -printf "%h/%f " | xargs -l awk 'BEGIN{FS="";ORS=""} {for(i=1;i<=NF;i++){chars[$(i)]=$(i);}} END{for(c in chars){print c;} }' > glyphs.txt
+  find . -type f -name "*.html" -printf "%h/%f " | xargs -l awk 'BEGIN{FS="";ORS=""} {for(i=1;i<=NF;i++){chars[$(i)]=$(i);}} END{for(c in chars){print c;} }' > glyphs.txt
 #+END_SRC
 You may need to =export LANG=en_US.UTF-8= (or any other UTF-8 locale) for certain glyphs to be handled correctly. With the list of glyphs, we can extract the useful part of font files and compress them:
 #+BEGIN_SRC sh
   pyftsubset NotoSansSC-Regular.otf --text-file=glyphs.txt --flavor=woff2 --output-file=NotoSansSC-Regular.woff2
 #+END_SRC
-Specifying =--no-hinting= and =--desubroutinize= can further reduce size of generated file at the cost of some aesthetic fine-tuning. A similar technique can be used to shrink down Latin fonts to include only ASCII characters (or keep the extended ASCII range with =U+0000-00FF=)
+Specifying =--no-hinting= and =--desubroutinize= can further reduce the size of the generated file at the cost of some aesthetic fine-tuning. A similar technique can be used to shrink down Latin fonts to include only ASCII characters (or keep the extended ASCII range with =U+0000-00FF=):
 #+BEGIN_SRC sh
   pyftsubset Oxygen-Sans.ttf --unicodes="U+0000-007F" --flavor=woff2 --output-file=Oxygen-Sans.woff2
 #+END_SRC
@@ -430,14 +430,109 @@ Once this is done, available glyphs can be checked using most font manager softw
 
 I also played around with the idea of dividing the glyphs into further chunks by popularity, so here's another one-liner to get a list of glyphs sorted by number of appearances:
 #+BEGIN_SRC sh
-  find -type f -name "*.html" -printf "%h/%f " | xargs -l awk 'BEGIN{FS=""} {for(i=1;i<=NF;i++){chars[$(i)]++;}} END{for(c in chars){printf "%06d %s\n", chars[c], c;}}' | sort -r > glyph-by-freq.txt
+  find . -type f -name "*.html" -printf "%h/%f " | xargs -l awk 'BEGIN{FS=""} {for(i=1;i<=NF;i++){chars[$(i)]++;}} END{for(c in chars){printf "%06d %s\n", chars[c], c;}}' | sort -r > glyph-by-freq.txt
 #+END_SRC
 It turns out my blog has around 1000 different Chinese characters, with roughly 400 of them appearing more than 10 times. Since the file size I get from a single round of subsetting is already good enough, I didn't bother proceeding with another split.
 
-*** See the Web Through My Browser
+*** For Your Browsers Only
 With all the tricks in my bag, I was able to cut down the combined font file size to around 250KB, still magnitudes above that of an HTML file though. While it is nice to see my site appearing the same across all devices and screens, I feel the benefit is out of proportion compared to the 100-fold increase in page size.
 
 Maybe it is just not worth it to force the choice of fonts. In case you want to see my site as I would like to see it, here are my go-to fonts:
 - Proportional Latin font: [[https://github.com/KDE/oxygen-fonts][Oxygen Sans]]. Note that the KDE version has nuanced differences from the [[https://fonts.google.com/specimen/Oxygen][Google Fonts version]], and I like the KDE version much more.
 - Proportional CJK font: [[https://www.google.com/get/noto/help/cjk/][Noto Sans CJK]].
 - Monospace font: [[https://typeof.net/Iosevka/][Iosevka]], the ss09 variant, to be more exact.
+
+** DONE zh
+:PROPERTIES:
+:EXPORT_FILE_NAME: 2019-12-01-fun-with-fonts-on-the-web.zh.md
+:EXPORT_TITLE: 字体配置万维网篇
+:END:
+
+用《字体配置浏览器篇》作为标题或许更为准确，不过现在的标题听起来更吸引人一些。渲染文本 [[https://gankra.github.io/blah/text-hates-you/][不是一件简单的事]] ，如果还要考虑书写系统之间的巨大差异（这大概得怪巴别塔）无异于雪上加霜。运行双语博客会导致字体选择的麻烦加倍，这里是我遇到的一些问题的汇总。
+
+*** 空格侵略者
+大多数浏览器会将 HTML 中的连续文本合并为一行，并在连接处加上空格。所以
+#+BEGIN_SRC html
+  <html>Line one and
+  line two.</html>
+#+END_SRC
+会被渲染为
+#+BEGIN_EXAMPLE
+Line one and line two.
+#+END_EXAMPLE
+这种一刀切的方法显然不适用于字符之间不带分隔的 CJK 语言。解决方案是为页面（或页面上的任何特定元素）指定 =lang= 属性，如下所示：
+#+BEGIN_SRC html
+  <html lang="zh">第一行和
+  第二行。</html>
+#+END_SRC
+如果你的浏览器足够聪明（例如 Firefox），渲染的结果就不会有额外的空格。但是，所有基于 Blink 的浏览器仍然顽固地将多余的空格塞进去，所以我只能像野蛮人那样继续写一段一行的源文件。尽管不是万能的解决方案，但是指定 =lang= 属性仍然具有启用特定于某种语言的CSS规则的额外好处，这稍后会派上用场。
+
+*** 引号归来
+如 [[https://www.shimmy1996.com/zh/posts/2018-06-24-fun-with-fonts-in-emacs/][之前的日志]] 所说， CJK 字体会将引号显示为全角字符，不同于拉丁字体。只要网页不尝试混搭字体，这就不会成为问题：只需使用特定于语言的字体栈就行。
+#+BEGIN_SRC css
+  body:lang(en) {
+      font-family: "Oxygen Sans", sans-serif;
+  }
+
+  body:lang(zh) {
+      font-family: "Noto Sans SC", sans-serif;
+  }
+#+END_SRC
+再加上匹配的 =lang= 属性，所有问题就都解决了。 Firefox 甚至允许为每种语言指定默认字体，所以仅使用后备字体（例如 =sans-serif= 或 =serif= ）也可行，不一定要费心编写特定于语言的CSS。
+
+那么，如果我想用 Oxygen Sans 来渲染拉丁字符，并用 Noto Sans SC 来渲染 CJK 字符怎么办？虽然看似没有问题，但像这样指定字体堆栈，
+#+BEGIN_SRC css
+  body:lang(zh) {
+      font-family: "Oxygen Sans", "Noto Sans SC", sans-serif;
+  }
+#+END_SRC
+会导致引号被 Oxygen Sans 渲染、显示为半角字符。我的解决方案是通过 =unicode-range= 定义一个涵盖了引号的替代字体，
+#+BEGIN_SRC css
+  @font-face {
+      font-family: "Noto Sans SC Override";
+      unicode-range: U+2018-2019, U+201C-201D;
+      src: local("NotoSansCJKsc-Regular");
+  }
+#+END_SRC
+并修改字体栈为
+#+BEGIN_SRC css
+  body:lang(zh) {
+      font-family: "Noto Sans SC Override", "Oxygen Sans", "Noto Sans SC", sans-serif;
+  }
+#+END_SRC
+这样我们就可以享受全角引号了！
+
+*** 字体忍者
+字体文件通常都不小，对于 CJK 字体来说更是如此：刚才提到的 Noto Sans SC 的大小 [[https://github.com/googlefonts/noto-cjk/blob/master/NotoSansSC-Regular.otf][超过8MB]] 。尽管我已经下定主意要从自己的服务器上提供所有文件，考虑到我网站上的平均 HTML 文件大小更接近 8KB，这显得有些过头了。那么那些网络字体服务如何处理这一问题呢？
+
+大多数网络字体服务的工作方式是在网站的样式表里添加一堆 [[https://developer.mozilla.org/zh-CN/docs/Web/CSS/@font-face][=@font-face= ]]定义，以从专用服务器上提取字体文件。为了减少所提供的文件大小， Google Fonts 会将字体文件大卸八块，并在 =@font-face= 里声明每一块所对应的 =unicode-range= （这正是它们处理 [[https://fonts.googleapis.com/css?family=Noto+Sans+SC][CJK 字体]] 的方式）。他们还将字体文件压缩为 WOFF2 以进一步缩减文件大小。而 [[https://fonts.adobe.com/][Adobe Fonts]] （以前称为 Typekit ）似乎有一些 JavaScript 奇技淫巧，可以动态确定要从字体文件加载的字形。
+
+博采众家之长，得益于这是一个静态站点，我们可以简单地统计所有用到的字符，并提供一个只包含这些字符的字体文件。所要用到的工具主要是 pyftsubset （属于 [[https://pypi.org/project/fonttools/][fonttools]] 下的一个组件）和 GNU AWK 。将字体压缩为 WOFF2 还需要 Brotli 压缩库。在 Arch Linux 下，获取这些程序需要安装 [[https://www.archlinux.org/packages/community/any/python-fonttools/][python-fonttools]] 、 [[https://www.archlinux.org/packages/core/x86_64/gawk/][gawk]] 、 [[https://www.archlinux.org/packages/community/x86_64/brotli/][brotli]] 和 [[https://www.archlinux.org/packages/community/x86_64/python-brotli/][python-brotli]] 。
+
+收集生成的HTML文件中的所有使用的字形可以使用这条 shell 命令：
+#+BEGIN_SRC sh
+  find . -type f -name "*.html" -printf "%h/%f " | xargs -l awk 'BEGIN{FS="";ORS=""} {for(i=1;i<=NF;i++){chars[$(i)]=$(i);}} END{for(c in chars){print c;} }' > glyphs.txt
+#+END_SRC
+你可能需要 =export LANG=en_US.UTF-8= （或者其他 UTF-8 语言环境）以便正确处理某些字形。有了字形清单，我们就可以提取字体文件的有用部分并进行压缩：
+#+BEGIN_SRC sh
+  pyftsubset NotoSansSC-Regular.otf --text-file=glyphs.txt --flavor=woff2 --output-file=NotoSansSC-Regular.woff2
+#+END_SRC
+指定 =--no-hinting= 和 =--desubroutinize= 可以进一步减小生成文件的大小，但会降低字体的美观程度。拉丁字体也可以使用类似的技术来瘦身，例如只提取包含 ASCII 字符的部分（或将范围设为 =U+0000-00FF= 以涵盖 Extended ASCII 字符）：
+#+BEGIN_SRC sh
+  pyftsubset Oxygen-Sans.ttf --unicodes="U+0000-007F" --flavor=woff2 --output-file=Oxygen-Sans.woff2
+#+END_SRC
+大部分字体管理器都可以用来检查最后生成文件中可用的字形，也可以使用这一 [[http://torinak.com/font/lsfont.html][在线检查器]] （不支持 WOFF2，但是可以先试着转为其他格式后查看，例如 WOFF）。
+
+我还考虑过将字形按受欢迎程度划分为更多块。获取按出现次数排序的字形列表可以使用以下命令：
+#+BEGIN_SRC sh
+  find . -type f -name "*.html" -printf "%h/%f " | xargs -l awk 'BEGIN{FS=""} {for(i=1;i<=NF;i++){chars[$(i)]++;}} END{for(c in chars){printf "%06d %s\n", chars[c], c;}}' | sort -r > glyph-by-freq.txt
+#+END_SRC
+结果显示我的博客用到了大约 1000 个不同的汉字，其中大约 400 个出现了10次以上。由于上一步中获得的字体文件大小已经足够好，我没有继续进行拆分。
+
+*** 孔中窥见真理之貌（好像没有啥不对）
+我最终将字体文件的总大小减少到了 250KB 左右，但这仍然比 HTML 文件大好几个数量级。虽然看到我的网站在所有设备和屏幕上都保持一致很让人开心，但是与页面大小增加上百倍的代价相比，我觉得这点好处不成比例。
+
+费尽心思指定字体或许并不值得。如果你希望看到我眼中本站的样子的话，以下是我的常用字体：
+- 比例拉丁字体： [[https://github.com/KDE/oxygen-fonts][Oxygen Sans]] 。注意 KDE 版本与 [[https://fonts.google.com/specimen/Oxygen][Google Fonts 版本]] 有一些微妙的区别，我更喜欢前者。
+- 比例 CJK 字体： [[https://www.google.com/get/noto/help/cjk/][Noto Sans CJK]] ，即思源黑体。
+- 等宽字体： [[https://typeof.net/Iosevka/][Iosevka]] ，确切地说是 ss09 样式。
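
The glyph-collecting one-liners in the post can also be sketched in Python, which may help where file names contain spaces or GNU `find`/`awk` are unavailable. This is only an illustration and not part of the commit: the function names `count_glyphs` and `write_glyph_lists` are hypothetical, and the sketch assumes the generated HTML files are UTF-8 encoded.

```python
from collections import Counter
from pathlib import Path


def count_glyphs(root: str) -> Counter:
    """Count every character found in the *.html files under root."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*.html"):
        text = path.read_text(encoding="utf-8")
        # Newlines are skipped; in the awk version they are record
        # separators and never show up as fields either.
        counts.update(ch for ch in text if ch != "\n")
    return counts


def write_glyph_lists(root: str, glyphs_file: str, freq_file: str) -> None:
    """Write the unique glyphs and a frequency-sorted glyph list."""
    counts = count_glyphs(root)
    # One copy of each character, usable as input for --text-file.
    Path(glyphs_file).write_text("".join(sorted(counts)), encoding="utf-8")
    # Zero-padded count followed by the character, most frequent first.
    lines = [f"{n:06d} {ch}\n" for ch, n in counts.most_common()]
    Path(freq_file).write_text("".join(lines), encoding="utf-8")
```

Calling `write_glyph_lists("public", "glyphs.txt", "glyph-by-freq.txt")` would produce files comparable to those from the shell commands, where `public` is assumed to be the directory holding the generated site.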
