

5.37 GNU troff Internals

GNU troff processes input in three steps. It gathers one or more input characters into a token, the smallest meaningful unit of troff input. The process of formatting translates tokens into nodes that populate a pending output line (recall Manipulating Filling and Adjustment). A node is a data structure representing any object that may ultimately appear in the output, like a glyph or motion on the page. When the pending output line breaks, the formatter applies any relevant adjustment, line number, and margin character, and finally appends it to the current diversion. Periodically, the formatter flushes accumulated output line(s) to the output device, a process that translates each node into a device-independent output language representation understood by all output drivers. Copy mode tokenizes but does not format; diversions (apart from that at the top level) format but do not write output.

For example, GNU troff converts the input ‘Gi\[:u]\%seppe’ into a character token for ‘G’, a character token for ‘i’, a special character token for ‘:u’ (representing ‘u’ with an umlaut), a token encoding a hyphenation break point, and further character tokens. You can observe this process by storing the foregoing input into a string (which, because its contents are read in copy mode, is only tokenized, not formatted) and dumping it with the pm request. (Using printf(1) requires us to double the ‘\’ and ‘%’ characters.)

$ printf '.ds str Gi\\[:u]\\%%seppe\n.pm str\n' \
    | groff 2>&1 | jq

Similarly, we can observe the details of the formatting process by interpolating the string, or supplying its contents directly as input, and invoking the pline request.

$ printf 'Gi\\[:u]\\%%seppe\n.pline\n' | groff -z 2>&1 | jq

We now see a list of nodes, including an output line start node, several glyph nodes, a discretionary break node containing a glyph node for the special character ‘:u’ and a glyph node for the special character ‘hy’ (hyphen), and a word space node at the end corresponding to the newline at the end of input.

If we change ‘G’ to ‘f’, we see that the first two glyph nodes, for ‘f’ and ‘i’, become contained by a ligature node (provided the current font has a glyph for this ligature). All output glyph nodes are “processed”, which means that they are associated with a given font, type size, advance width, and so forth.
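To make the token stage of the ‘Gi\[:u]\%seppe’ example concrete, here is a toy tokenizer in Python. It is only a sketch of the idea and not GNU troff's implementation: the function name tokenize, the token kind labels, and the regular expression are all assumptions made for illustration, and only the two escape sequences used in the example are recognized.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, value); the kind labels are illustrative only

# Hypothetical pattern covering just the escapes used in the example input.
_TOKEN_RE = re.compile(
    r"\\\[(?P<special>[^\]]+)\]"   # \[name] -> special character token
    r"|(?P<hyphen>\\%)"            # \%      -> hyphenation break point token
    r"|(?P<char>[^\\])"            # any other single character
)


def tokenize(text: str) -> List[Token]:
    """Split input into tokens without formatting it (loosely, copy mode)."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        m = _TOKEN_RE.match(text, pos)
        if m is None:
            # An escape sequence this toy model does not know about.
            raise ValueError(f"unsupported escape at offset {pos}")
        if m.group("special") is not None:
            tokens.append(("special character", m.group("special")))
        elif m.group("hyphen") is not None:
            tokens.append(("hyphenation break point", ""))
        else:
            tokens.append(("character", m.group("char")))
        pos = m.end()
    return tokens


tokens = tokenize(r"Gi\[:u]\%seppe")
for kind, value in tokens:
    print(kind, value)
```

The sketch stops at tokens on purpose; turning them into glyph, ligature, and discretionary break nodes requires font and environment data that only the formatter has.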

Macros, diversions, and strings collect elements in two chained lists: a list of tokens that have been passed unprocessed, and a list of nodes. Consider the following diversion.

.di xxx
a
\!b
c
.br
.di

It contains these elements.

node list             token list   element number
line start node                    1
glyph node a                       2
word space node                    3
                      b            4
                      \n           5
glyph node c                       6
vertical size node                 7
vertical size node                 8
                      \n           9

troff inserts elements 1, 7, and 8; the latter two (which are always present) specify the vertical extent of the last line, possibly modified by \x. The br request finishes the pending output line, inserting a newline token, which is subsequently converted to a space when the diversion is interpolated. Note that the word space node has a fixed width that isn’t adjustable anymore. To convert horizontal space nodes back into tokens, use the unformat request.

Macros only contain elements in the token list (and the node list is empty); diversions and strings can contain elements in both lists.

The chop request simply reduces the number of elements in a macro, string, or diversion by one. Exceptions are compatibility save and compatibility ignore tokens, which are ignored. The substring request also ignores those tokens.
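The two-list layout of diversion xxx can be written down as a small Python model. This is a sketch under stated assumptions, not a description of GNU troff's data structures: the Element class and the positions, is_macro_like, and chop helpers are hypothetical names, and the toy chop does not model the exception for compatibility save and compatibility ignore tokens.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    kind: str   # "node" or "token": which of the two lists the element is on
    value: str


# Contents of diversion xxx, in element order, as listed in the table above.
xxx: List[Element] = [
    Element("node", "line start node"),
    Element("node", "glyph node a"),
    Element("node", "word space node"),
    Element("token", "b"),
    Element("token", "\\n"),          # label for a newline token
    Element("node", "glyph node c"),
    Element("node", "vertical size node"),
    Element("node", "vertical size node"),
    Element("token", "\\n"),
]


def positions(elements: List[Element], kind: str) -> List[int]:
    """Return the 1-based element numbers that sit on the given list."""
    return [i for i, e in enumerate(elements, start=1) if e.kind == kind]


def is_macro_like(elements: List[Element]) -> bool:
    """True when only the token list is populated, as is the case for a macro."""
    return not positions(elements, "node")


def chop(elements: List[Element]) -> List[Element]:
    """Toy chop: drop the final element (compatibility tokens are not modeled)."""
    return elements[:-1]


node_positions = positions(xxx, "node")
token_positions = positions(xxx, "token")
chopped = chop(xxx)
print(node_positions, token_positions, len(chopped))
```

In this model the elements troff inserts itself (1, 7, and 8) all sit on the node list, while the text passed through with \! and the newline tokens sit on the token list.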

Some requests like tr or cflags work on glyph identifiers only; this means that the associated glyph can be changed without destroying this association. This can be very helpful for substituting glyphs. In the following example, we assume that glyph ‘foo’ isn’t available by default, so we provide a substitution using the fchar request and map it to input character ‘x’.

.fchar \[foo] foo
.tr x \[foo]

Now let us assume that we install an additional special font ‘bar’ that has glyph ‘foo’.

.special bar
.rchar \[foo]

Since glyphs defined with fchar are searched before glyphs in special fonts, we must call rchar to remove the definition of the fallback glyph. The translation set up with tr remains active throughout; ‘x’ now maps to the real glyph ‘foo’.
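The search order in this example can be illustrated with a toy Python model. Everything here is an assumption made for illustration (the GlyphModel class, its method names, and the sample glyph sets); it mirrors only the rules stated in the text: a translation stores a glyph identifier, and a fallback glyph defined with fchar is consulted after the current font but before special fonts.

```python
from typing import Dict, List, Optional, Set, Tuple


class GlyphModel:
    """Toy model of glyph lookup; not GNU troff's actual implementation."""

    def __init__(self, current_font_glyphs: Set[str]) -> None:
        self.current_font = set(current_font_glyphs)
        self.translations: Dict[str, str] = {}   # tr: input char -> glyph id
        self.fallbacks: Dict[str, str] = {}      # fchar: glyph id -> definition
        self.special_fonts: List[Tuple[str, Set[str]]] = []

    def tr(self, char: str, glyph_id: str) -> None:
        # The translation records an identifier, not a particular glyph.
        self.translations[char] = glyph_id

    def fchar(self, glyph_id: str, definition: str) -> None:
        self.fallbacks[glyph_id] = definition

    def rchar(self, glyph_id: str) -> None:
        self.fallbacks.pop(glyph_id, None)

    def special(self, font_name: str, glyphs: Set[str]) -> None:
        self.special_fonts.append((font_name, set(glyphs)))

    def resolve(self, char: str) -> Optional[Tuple[str, str]]:
        """Return (source, glyph) for an input character, or None."""
        glyph_id = self.translations.get(char, char)
        if glyph_id in self.current_font:
            return ("current font", glyph_id)
        if glyph_id in self.fallbacks:            # checked before special fonts
            return ("fchar", self.fallbacks[glyph_id])
        for font_name, glyphs in self.special_fonts:
            if glyph_id in glyphs:
                return (font_name, glyph_id)
        return None


model = GlyphModel(current_font_glyphs={"a", "b", "x"})
model.fchar("foo", "foo")        # .fchar \[foo] foo
model.tr("x", "foo")             # .tr x \[foo]
before = model.resolve("x")      # served by the fallback definition

model.special("bar", {"foo"})    # .special bar
shadowed = model.resolve("x")    # the fallback still wins over font bar

model.rchar("foo")               # .rchar \[foo]
after = model.resolve("x")       # now found in special font bar
print(before, shadowed, after)
```

Because the translation table holds only the identifier ‘foo’, it needs no update when the fallback definition is removed; the next lookup simply reaches the special font.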

Macro and request arguments preserve compatibility mode enablement.

.cp 1     \" switch to compatibility mode
.de xx
\\$1
..
.cp 0     \" switch compatibility mode off
.xx caf\['e]
    ⇒ café

Since compatibility mode is enabled while de is invoked, the macro xx enables compatibility mode when it is called. Argument $1 can still be handled properly because it inherits the compatibility mode enablement status that was active at the point where xx was called.

After interpolation of the parameters, the compatibility save and restore tokens are removed.

