Merged
120 changes: 98 additions & 22 deletions README.md
@@ -55,6 +55,16 @@ script:
./check-format.sh --fix
```

## Documentation

The documents under [docs/experimental](docs/experimental) provide some more detailed designs of various aspects of the IFT encoder. Of note:
* [compiler.md](docs/experimental/compiler.md)
* [closure_glyph_segmentation.md](docs/experimental/closure_glyph_segmentation.md)
* [closure_glyph_segmentation_merging.md](docs/experimental/closure_glyph_segmentation_merging.md)
* [closure_glyph_segmentation_complex_conditions.md](docs/experimental/closure_glyph_segmentation_complex_conditions.md)

Together, these documents provide a detailed design of how the two major pieces of IFT font encoding (segmentation and compilation) work.

## Generating compile_commands.json for IDE

This repo is configured to use [hedron](https://github.com/hedronvision/bazel-compile-commands-extractor) to produce a
@@ -67,13 +77,81 @@ bazel run @hedron_compile_commands//:refresh_all

Will generate a compile_commands.json file.

## Producing IFT Encoded Fonts
## Producing IFT Encoded Fonts (with Auto Config)

The simplest way to create IFT fonts is to run the `font2ift` utility in auto configuration mode.
Auto configuration is used whenever the utility is run without a segmentation plan. Example invocation:

```bash
bazel run -c opt @ift_encoder//util:font2ift -- \
--input_font="$HOME/fonts/myfont/MyFont.ttf" \
--output_path=$HOME/fonts/myfont/ift/ \
--output_font="MyFont-IFT.woff2"
```

This will analyze the input font, decide how to segment it, and then produce the final IFT encoded font
and patches.

When utilizing auto config there are two optional flags which can be used to adjust the behaviour:
* `--auto_config_primary_script`: this tells the config generator which language/script the font is intended
to be used with. It has two effects. First, the codepoints of the primary script are eligible to be moved
into the initial font. Second, for scripts with large overlaps, such as CJK, the primary script selects which
of the overlapping scripts to use frequency data from. Values refer to frequency data files in
[ift-encoder-data](https://github.com/w3c/ift-encoder-data/tree/main/data). Example values: "Script_bengali",
"Language_fr".

* `--auto_config_quality`: this is analogous to a quality level in a compression library. It controls how much
effort is spent to improve the efficiency of the final IFT font. Values range from 1 to 8, where higher
values increase encoding times but typically result in a more efficient final IFT font (i.e. fewer bytes
transferred by clients using it).

Example command line with optional flags:

```bash
bazel run -c opt @ift_encoder//util:font2ift -- \
--input_font="$HOME/fonts/NotoSansJP-Regular.otf" \
--output_path=$HOME/fonts/ift/ \
--output_font="NotoSansJP-Regular-IFT.woff2" \
--auto_config_primary_script=Script_japanese \
--auto_config_quality=3
```

*Note: the auto configuration mode is still under development; in particular, the auto selection of quality level
is currently quite simplistic. It's expected to continue to evolve from its current state.*

## Producing IFT Encoded Fonts (Advanced)

Under the hood IFT font encoding happens in three stages:

1. Generate or write a segmenter config for the font.
2. Generate a segmentation plan, which describes how the font is split into patches. Takes the segmenter config as an input.
3. Compile the final IFT encoded font following the segmentation plan.

For more advanced use cases these steps can be performed individually. This allows the segmenter config
and segmentation plans to be fine tuned beyond what auto configuration is capable of.

### Step 1: Generating a Segmenter Config

There are two main options for generating a segmenter config:

1. Write the config by hand. The segmenter is configured via an input configuration file using the
[segmenter_config.proto](util/segmenter_config.proto) schema; see the comments there for more details.
This option is useful when maximum control over segmentation parameters is needed, or when custom frequency
data is being supplied.

2. Auto generate the segmenter config using `util:generate_segmenter_config`.

IFT encoded fonts are produced in two steps:
1. A segmentation plan is generated which specifies how the font file should be split up in the IFT encoding.
2. The IFT encoded font and patches are compiled by the Compiler sub module using the segmentation plan.
```
CC=clang bazel run //util:generate_segmenter_config -- \
--quality=5 \
--input_font=$HOME/MyFont.ttf > config.txtpb
```

### Generating Segmentation Plan
This analyzes the input font and tries to pick appropriate config values automatically. As discussed in
the previous "Producing IFT Encoded Fonts" section there is a configurable quality level. If needed
the auto generated config can be hand tweaked after generation.

### Step 2: Generating Segmentation Plan

Segmentation plans are in a [textproto format](https://protobuf.dev/reference/protobuf/textformat-spec/) using the
[segmentation_plan.proto](util/segmentation_plan.proto) schema. See the comments in the schema file for more information.
@@ -83,17 +161,9 @@ possible to write plans by hand, or develop new utilities to generate plans.

In this repo 3 options are currently provided:

1. `util/generate_table_keyed_config`: this utility generates the table keyed (extension segments that augment non
glyph data in the font) portion of a plan. Example execution:

```sh
bazel run -c opt util:generate_table_keyed_config -- \
--font=$(pwd)/myfont.ttf \
latin.txt cyrillic.txt greek.txt > table_keyed.txtpb
```

2. `util/closure_glyph_keyed_segmenter_util`: this utility uses a subsetting closure based approach to generate a glyph
keyed segmentation plan (extension segments that augment glyph data). Example execution:
1. [Recommended] `util/closure_glyph_keyed_segmenter_util`: this utility uses a subsetting closure based approach
to generate a glyph keyed segmentation plan (extension segments that augment glyph data). It can optionally
generate the table keyed portion of the config as well. Example execution:

```sh
bazel run -c opt util:closure_glyph_keyed_segmenter_util -- \
@@ -109,6 +179,15 @@ In this repo 3 options are currently provided:
Note: this utility is under active development and still very experimental. See
[the status section](docs/experimental/closure_glyph_segmentation.md#status) for more details.

2. `util/generate_table_keyed_config`: this utility generates the table keyed (extension segments that augment non
glyph data in the font) portion of a plan. Example execution:

```sh
bazel run -c opt util:generate_table_keyed_config -- \
--font=$(pwd)/myfont.ttf \
latin.txt cyrillic.txt greek.txt > table_keyed.txtpb
```

3. `util/iftb2config`: this utility converts a segmentation obtained from the
[binned incremental font transfer prototype](https://github.com/adobe/binned-ift-reference)
into an equivalent segmentation plan. Example execution:
@@ -118,23 +197,20 @@ In this repo 3 options are currently provided:
bazel run util:iftb2config > segmentation_plan.txtpb
```

If seperate glyph keyed and table keyed configs were generated using #1 and #2 they can then be combined into one
If separate glyph keyed and table keyed configs were generated using #1 and #2 they can then be combined into one
complete plan by concatenating them:

```sh
cat glyph_keyed.txtpb table_keyed.txtpb > segmentation_plan.txtpb
```
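
As a rough illustration of the concatenation step, the sketch below mirrors in C++ what the `cat` command does. The function name `ConcatenatePlans` is hypothetical and not part of this repo, and the newline guard is an added assumption (plain `cat` inserts nothing between files).

```cpp
#include <string>

// Hypothetical helper (not part of this repo): joins the text of a glyph keyed
// plan and a table keyed plan into one segmentation plan, glyph keyed first.
std::string ConcatenatePlans(const std::string& glyph_keyed,
                             const std::string& table_keyed) {
  std::string out = glyph_keyed;
  // Assumption: make sure the first file ends with a newline so that the last
  // field of one plan cannot run into the first field of the other.
  if (!out.empty() && out.back() != '\n') {
    out.push_back('\n');
  }
  out += table_keyed;
  return out;
}
```

The caller would read the two `.txtpb` files into strings, call the helper, and write the result to `segmentation_plan.txtpb`.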

Additional tools for generating encoder configs are planned to be added in the future.

For concrete examples of how to generate IFT fonts, see the [IFT Demo](https://github.com/garretrieger/ift-demo).
In particular the [Makefile](https://github.com/garretrieger/ift-demo/blob/main/Makefile) and the
[segmenter configs](https://github.com/garretrieger/ift-demo/tree/main/config) may be helpful.

### Generating an IFT Encoding
### Step 3: Generating an IFT Encoding

Once an segmentation plan has been created it can be combined with the target font to produce and incremental font and collection
of associated patches using the font2ift utility which is a wrapper around the compiler. Example execution:
Once a segmentation plan has been created it can be combined with the target font to produce an incremental font and collection of associated patches using the font2ift utility, which is a wrapper around the compiler. Example execution:

```sh
bazel run -c opt util:font2ift -- \
33 changes: 33 additions & 0 deletions ift/encoder/closure_glyph_segmenter.cc
Original file line number Diff line number Diff line change
Expand Up @@ -734,4 +734,37 @@ Status ClosureGlyphSegmenter::FallbackCost(
return absl::OkStatus();
}

void ClosureGlyphSegmenter::AddTableKeyedSegments(
SegmentationPlan& plan,
const btree_map<SegmentSet, MergeStrategy>& merge_groups,
const std::vector<SubsetDefinition>& segments,
const SubsetDefinition& init_segment) {
std::vector<SubsetDefinition> table_keyed_segments;
for (const auto& [segment_ids, _] : merge_groups) {
SubsetDefinition new_segment;
for (uint32_t s : segment_ids) {
new_segment.Union(segments.at(s));
}
new_segment.Subtract(init_segment);
table_keyed_segments.push_back(new_segment);
}

uint32_t max_id = 0;
for (const auto& [id, _] : plan.segments()) {
if (id > max_id) {
max_id = id;
}
}

uint32_t next_id = max_id + 1;
auto* plan_segments = plan.mutable_segments();
for (const SubsetDefinition& def : table_keyed_segments) {
GlyphSegmentation::SubsetDefinitionToSegment(def,
(*plan_segments)[next_id]);
SegmentsProto* segment_ids = plan.add_non_glyph_segments();
segment_ids->add_values(next_id);
next_id++;
}
}

} // namespace ift::encoder
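
The merge and id assignment logic of `AddTableKeyedSegments` can be illustrated with a standalone sketch. It assumes segments are plain codepoint sets and the plan is a `std::map`; `CodepointSet` and `AddTableKeyedSegmentsSketch` are hypothetical stand-ins, not the real `SubsetDefinition` or `SegmentationPlan` types.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for SubsetDefinition: just a set of codepoints.
using CodepointSet = std::set<uint32_t>;

// For each merge group: union the member segments, subtract the init segment,
// and store the result in the plan under a fresh id. Fresh ids start at
// (largest existing id + 1). Returns the newly assigned ids in order.
std::vector<uint32_t> AddTableKeyedSegmentsSketch(
    std::map<uint32_t, CodepointSet>& plan_segments,
    const std::vector<std::set<uint32_t>>& merge_groups,
    const std::vector<CodepointSet>& segments,
    const CodepointSet& init_segment) {
  // Find the largest id already used by the plan (0 if the plan is empty).
  uint32_t max_id = 0;
  for (const auto& entry : plan_segments) {
    if (entry.first > max_id) {
      max_id = entry.first;
    }
  }

  uint32_t next_id = max_id + 1;
  std::vector<uint32_t> new_ids;
  for (const std::set<uint32_t>& group : merge_groups) {
    CodepointSet merged;
    for (uint32_t s : group) {
      // at() throws std::out_of_range on a bad segment index.
      const CodepointSet& segment = segments.at(s);
      merged.insert(segment.begin(), segment.end());
    }
    // Codepoints already in the init font do not need to be loaded again.
    for (uint32_t cp : init_segment) {
      merged.erase(cp);
    }
    plan_segments[next_id] = merged;
    new_ids.push_back(next_id);
    next_id++;
  }
  return new_ids;
}
```

In the real function each new id is also recorded as its own entry in the plan's `non_glyph_segments` list; the returned vector plays that role here.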
8 changes: 8 additions & 0 deletions ift/encoder/closure_glyph_segmenter.h
@@ -4,13 +4,15 @@
#include <optional>
#include <vector>

#include "absl/container/btree_map.h"
#include "absl/status/statusor.h"
#include "ift/encoder/glyph_segmentation.h"
#include "ift/encoder/merge_strategy.h"
#include "ift/encoder/segmentation_context.h"
#include "ift/encoder/subset_definition.h"
#include "ift/freq/probability_calculator.h"
#include "util/common.pb.h"
#include "util/segmentation_plan.pb.h"
#include "util/segmenter_config.pb.h"

namespace ift::encoder {
@@ -89,6 +91,12 @@ class ClosureGlyphSegmenter {
uint32_t& fallback_glyphs_size,
uint32_t& all_glyphs_size) const;

static void AddTableKeyedSegments(
SegmentationPlan& plan,
const absl::btree_map<common::SegmentSet, MergeStrategy>& merge_groups,
const std::vector<SubsetDefinition>& segments,
const SubsetDefinition& init_segment);

private:
uint32_t brotli_quality_;
uint32_t init_font_merging_brotli_quality_;
78 changes: 78 additions & 0 deletions util/BUILD
@@ -64,9 +64,15 @@ cc_binary(
srcs = [
"font2ift.cc",
],
data = [
"@ift_encoder_data//:freq_data",
],
deps = [
":auto_config_flags",
":auto_segmenter_config",
":load_codepoints",
":segmentation_plan_cc_proto",
":segmenter_config_util",
"//common",
"//ift",
"//ift/encoder",
@@ -76,6 +82,7 @@ cc_binary(
"@abseil-cpp//absl/status:statusor",
"@abseil-cpp//absl/strings",
"@harfbuzz",
"//util:segmenter_config_cc_proto",
],
)

@@ -103,6 +110,8 @@ cc_binary(
"@ift_encoder_data//:freq_data",
],
deps = [
":auto_config_flags",
":auto_segmenter_config",
":load_codepoints",
":segmentation_plan_cc_proto",
":segmenter_config_cc_proto",
@@ -137,6 +146,16 @@ cc_binary(
],
)

cc_library(
name = "auto_config_flags",
srcs = ["auto_config_flags.cc"],
hdrs = ["auto_config_flags.h"],
visibility = ["//visibility:public"],
deps = [
"@abseil-cpp//absl/flags:flag",
],
)

cc_library(
name = "convert_iftb",
srcs = [
@@ -154,6 +173,23 @@ cc_library(
],
)

cc_library(
name = "auto_segmenter_config",
srcs = [
"auto_segmenter_config.cc",
],
hdrs = [
"auto_segmenter_config.h",
],
deps = [
":load_codepoints",
":segmenter_config_cc_proto",
"//common",
"@abseil-cpp//absl/container:flat_hash_set",
"@harfbuzz",
],
)

cc_library(
name = "load_codepoints",
srcs = [
@@ -185,10 +221,30 @@ cc_library(
],
deps = [
":load_codepoints",
":segmentation_plan_cc_proto",
":segmenter_config_cc_proto",
"//common",
"//ift/encoder",
"@abseil-cpp//absl/status:statusor",
"@harfbuzz",
],
)

cc_test(
name = "auto_segmenter_config_test",
size = "small",
srcs = [
"auto_segmenter_config_test.cc",
],
data = [
"//common:testdata",
"@ift_encoder_data//:freq_data",
],
deps = [
":auto_segmenter_config",
"//common",
"@googletest//:gtest_main",
"@harfbuzz",
],
)

@@ -247,6 +303,28 @@ cc_test(
],
)

cc_binary(
name = "generate_segmenter_config",
srcs = [
"generate_segmenter_config.cc",
],
data = [
"@ift_encoder_data//:freq_data",
],
deps = [
":auto_segmenter_config",
":load_codepoints",
":segmenter_config_cc_proto",
"//common",
"@abseil-cpp//absl/flags:flag",
"@abseil-cpp//absl/flags:parse",
"@abseil-cpp//absl/log:initialize",
"@abseil-cpp//absl/status",
"@harfbuzz",
"@protobuf",
],
)

cc_binary(
name = "iftb2config",
srcs = [
15 changes: 15 additions & 0 deletions util/auto_config_flags.cc
@@ -0,0 +1,15 @@
#include "util/auto_config_flags.h"

#include <string>

#include "absl/flags/flag.h"

ABSL_FLAG(int, auto_config_quality, 0,
"The quality level to use when generating a segmenter config. A value of 0 "
"means auto pick. Valid values are 1-8.");

ABSL_FLAG(std::string, auto_config_primary_script, "Script_latin",
"When auto_config is enabled this sets the primary script or "
"language frequency data file to use. "
"The primary script is eligible to have codepoints moved to the init font. "
"For CJK primary script can be used to specialize against a specific language/script.");
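
The help text for `auto_config_quality` implies three cases: 0 (auto pick), 1 through 8 (explicit), and everything else. A minimal sketch of that classification follows; `QualityChoice` and `ClassifyQuality` are hypothetical names, and how the real tools treat out-of-range values is not shown in this diff.

```cpp
// Hypothetical classification of an --auto_config_quality value.
enum class QualityChoice {
  kAutoPick,  // flag left at its default of 0
  kExplicit,  // caller asked for a specific level in [1, 8]
  kInvalid,   // anything else (handling here is an assumption)
};

QualityChoice ClassifyQuality(int flag_value) {
  if (flag_value == 0) {
    return QualityChoice::kAutoPick;
  }
  if (flag_value >= 1 && flag_value <= 8) {
    return QualityChoice::kExplicit;
  }
  return QualityChoice::kInvalid;
}
```

A caller could pass `absl::GetFlag(FLAGS_auto_config_quality)` to this function before deciding whether to pick a level automatically.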
11 changes: 11 additions & 0 deletions util/auto_config_flags.h
@@ -0,0 +1,11 @@
#ifndef UTIL_AUTO_CONFIG_FLAGS_H_
#define UTIL_AUTO_CONFIG_FLAGS_H_

#include <string>

#include "absl/flags/declare.h"

ABSL_DECLARE_FLAG(int, auto_config_quality);
ABSL_DECLARE_FLAG(std::string, auto_config_primary_script);

#endif // UTIL_AUTO_CONFIG_FLAGS_H_