Qi YACC


 

What is Qi YACC?

A compiler-compiler based on earlier work by Dr Gyorgy Lajos on his METALISP Ph.D. project. The syntax of Qi YACC is based on the BNF notation of Backus. Used at its most basic level, Qi YACC is a generator for recognisors based on grammars. More usefully, Qi YACC can be used to develop efficient compilers for programming languages and transducers for natural language processing. Much of the actual code driving the parsers for post-2000 versions of Qi was developed using Qi YACC. Qi YACC is bundled with Qi.

Qi YACC Parsing Strategy

Qi YACC uses a top-down recursive descent parsing strategy. Left recursion is not tolerated.

Qi YACC as a Recognisor Generator

Consider the following grammar.

<sent> := <np> <vp>
<np> := <det> <n> | <name>
<det> := the | a
<n> := cat | dog
<name> := Bill | Ben
<vp> := <vtrans> <np>
<vtrans> := likes | chases


In
Qi YACC, this grammar would be represented as:

(defcc <sent>
<np> <vp>;)

(defcc <det>
the; a;)

(defcc <np>
<det> <n>;
<name>;)

(defcc <n>
cat; dog;)

(defcc <name>
Bill; Ben;)

(defcc <vp>
<vtrans> <np>;)

(defcc <vtrans>
likes; chases;)


The spacing is left to the judgement of the programmer, but ;s separate rules. Unlike
Qi, Qi YACC gives no significance to symbols beginning in uppercase (i.e. Bill is just a symbol like cat, and not a variable). When one of these definitions (e.g. for <sent>) is entered to the Qi top level, it is compiled into Common Lisp by Qi YACC with the message warning; no semantics in <np> <vp>.

The compiler generated acts as a recogniser for sentences of the language characterised by this grammar. If it is not a sentence of the language, then the failure object #\Escape is returned. If the input to the compiler belongs to this language, then #\Escape is not returned by the compiler and generally the program behaves as an identity function. The compiler is invoked by the higher-order function compile, that receives the name of a
Qi YACC function and parses its second input by that function.

(10-) (compile <sent> [the cat likes the dog])
[the cat likes the dog]

(11-) (compile <sent> [the cat likes the canary])
#\Escape

(12-) (compile <vp> [chases the cat])
[chases the cat]


Note that names of
Qi YACC functions should always be enclosed in angles. Qi YACC is sensitive to left-recursion which will force an infinite regress. Qi YACC code is not type checked, but the code can be tracked just like regular Qi code. Lists are constructed in Qi YACC using […] or cons or list or any of the conventional methods. Unlike Qi, the constructor | cannot be used in the syntax of an expansion (i.e. to the left of :=), though it can be used to the right (in a semantic action) to perform consing. However […] can be used to the left of :=. <bcs>, below, recognises the inputs belonging to [bm][cn].

(defcc <bcs>
[<bs>] [<cs>];)

(defcc <bs>
b <bs>;
b;)

(defcc <cs>
c <cs>;
c;)

(0-) (compile <bcs> [[b b b] [c c]])
[[[b b b]] [[c c]]]

Note that compile is not part of Qi type theory although Qi YACC can be loaded into Qi with type checking enabled.

Semantic Actions in Qi YACC

Semantic actions are attached to grammar rules by following each rule by a :=. This Qi YACC definition receives a list of as and changes them to bs.

(defcc <as>
a <as> := [b | <as>];
a := [b];)


The first rule says that any input of the form a <as> is to be translated into an output consisting of b consed to the translation of <as>. The syntax of <as> requires that the input be a non-empty list of as. So (compile <as> [a a a]) gives [b b b]. The second rule is the base case.

As in
Qi, round brackets signify function applications and square ones form lists. The following reformulation is an example.

(defcc <sent>
<np> <vp> := (question <np> <vp>);)

(define question
NP VP -> (append [Is it true that your father] (append VP [?])))


Here (compile <sent> [the cat likes the dog]) gives [Is it true that your father likes the dog ?].

If semantic actions are not included,
Qi YACC warns the user and inserts a default semantic action. This default action appends non-terminals and conses terminals to form an output. So

(defcc <as>
a <as>; a;)


and

(defcc <as>
a <as> := (cons a <as>);
a := [a];)


generate the same code which acts as an identity function on any list of as.

Note that placing output one after another in the semantic actions of
Qi YACC amounts to forming a list. The definition

(defcc <as>
a <as> := b <as>;
a := b;)

forms an embedded list of bs from a list of as; so (compile <as> [a a a a]) gives [b [b [b b]]].

(defcc <as>
a <as> := [b | <as>];
a := [b];)

produces [b b b b] from [a a a a].

HERE IS A GOOD RULE!!

Qi YACC allows you to forget about semantic actions so that you can focus on getting the syntax of your language right. But once you have done that, relying on Qi YACC to second-guess what you want to do can be dangerous, even though 95% of the time it will do what you want. Fill in the semantic actions unless you are sure of yourself. The warning messages are not for show.

Backtracking in Qi YACC

Different expansions in an Qi YACC function F are tried in the order in which they appear. If one fails, then the next is tried. If all fail then control returns to the last function that called F. In right recursive cases, it is important to place the longest expansion first, since otherwise Qi YACC may fail to recurse fully through the input.

Thus given the choice of

(defcc <as>
a := b;
a <as> := b <as>;)

(defcc <as>
a <as> := b <as>;
a := b;)


The right-hand version is preferred. In other words, the situation with
Qi YACC is the opposite of functional programming; the base case generally comes last.

Qi YACC practices a limited form of backtracking. If an expansion fails, then Qi YACC does not backtrack within an expansion to find an alternative parse. Backtracking only takes place between and not within expansions. Thus given the expansion <a> <b> <c>; suppose the parse should fail at <c>. In that case, the expansion as a whole would fail, and Qi YACC would not backtrack to consider reallocating different parts of the input to <a> or <b>.

Special Symbols in Qi YACC

Qi YACC reserves special symbols with a predetermined meaning. These are

Symbol Semantics

-*- The head of the input stream.
-s- The tail of the input stream.
-o- The output stream
<e> The empty expansion.
; The end of a rule.
<...> A non-terminal

The input stream is the list of objects being parsed.

The output stream is the list of objects being produced.

This program takes a list and searches for the first occurrence of a digit (0 to 9) in the list. The non-terminal <morestuff> just throws away the input after the digit is found. The line -*- <find-digit> := <find-digit>; causes Qi YACC to discard the first element of the input stream if it is not a digit. Notice that we get the unit list containing the digit. This is because when we return a non-terminal as an output, by default Qi YACC returns the list of objects that fall under the non-terminal.

(defcc <find-digit>
<digit> <morestuff> := <digit>;
<digit> := <digit>;
-*- <find-digit> := <find-digit>;)

(defcc <morestuff>
-*- <morestuff>;
-*-;)

(defcc <digit>
0; 1; 2; 3; 4; 5; 6; 7; 8; 9;)


(36-) (compile <find-digit> [a v f g 6 y u])
[6]


This program returns the tail of the input stream after the first digit found in the stream.

(defcc <find-digit>
<digit> <morestuff> := -s-;
<digit> := -s-;
-*- <find-digit> := <find-digit>;)

(38-) (compile <find-digit> [a v f g 6 y u])
[y u]


This program recognises all lists of the form aibjck ; i.e. a series of i as followed by j bs followed by k cs all in one list where j is allowed to be 0. The empty expansion <e> caters for the 0 case.

(defcc <asbscs>
<as> <bs> <cs>;)

(defcc <as>
a <as>;
a ;)

(defcc <bs>
b <bs>;
b;
<e>;)

(defcc <cs>
c <cs>;
c;)


(compile <asbscs> [a a a b b c])
[a a a b b a a a c]

If <e> is used and no semantic action is associated with the empty expansion, Qi YACC returns as the output for <e> everything currently on the output stream. Using <e> without a corresponding semantic action is not encouraged because the exact effects are hard to judge. Here <bs> is recoded to indicate that if the empty expansion is used, then [ ] should be placed on the output stream.

(defcc <bs>
b <bs>;
b;
<e> := [ ];)


This program nows acts by returning the input if it is an aibjck stream or returns #\Escape if not.

Forcing Backtracking in Qi YACC

Returning #\Escape as the result of a semantic action will automatically cause the expansion to fail. In effect, Qi YACC treats the return of the failure object as a parse failure and triggers backtracking. This allows reformulations of certain expansions. For example

(defcc <digit>
0; 1; 2; 3; 4; 5; 6; 7; 8; 9;)


can be written

(defcc <digit>
-*- := (if (element? -*- [0 1 2 3 4 5 6 7 8 9]) -*- #\Escape))


Using -*- and #\Escape together, elegant and computationally efficient shorthands can be constructed. The following creates a new 2-place function one_of, that returns the head of the input, if this head is an element of its second argument. If not, one_of forces a backtrack.

(defcc <digit>
-*- := (one_of -*- [0 1 2 3 4 5 6 7 8 9]);)

(define one_of
X Y -> (if (element? X Y) X #\Escape))


Here is another elegant higher-order construction is_a? that returns the head of the input, if this head satisfies a predicate. If not, is_a? forces a backtrack. This program finds the first number in a list. This time we get the number and not the list containing the number.

(defcc <find-number>
<number> <morestuff> := <number>;
<number> := <number>;
-*- <find-number> := <find-number>;)

(defcc <number>
-*- := (is_a? number? -*-))

(define is_a?
F X -> (if (F X) X #\Escape))


(compile <find-number> [a v g t 7 k l])
7

The ability to control backtracking within semantic actions gives Qi YACC the ability to parse context-sensitive grammars. A good example is a grammar that recognises all lists of the form anbncn where n > 0. Here is the program.

(defcc <anbncn>
<as> <bs> <cs> := (if (equal-length? [<as> <bs> <cs>]) (appendall [<as> <bs> <cs>]) #\Escape);)

(defcc <as>
a <as>;
a;)

(defcc <bs>
b <bs>;
b;)

(defcc <cs>
c <cs>;
c;)

(define equal-length?
[] -> true
[L] -> true
[L1 L2 | Ls] -> (and (= (length L1) (length L2)) (equal-length? [L2 | Ls])))

(define appendall
[] -> []
[L | Ls] -> (append L (appendall Ls)))

(46-) (compile <anbncn> [a a a b b b c c c])
[a a a b b b c c c]

(47-) (compile <anbncn> [a a a b b b c c])
#\Escape

It is important to note that Qi YACC follows an eager model of evaluation whereby the syntax is fully parsed according to the syntax side of a rule, before any semantics is called. Thus

(defcc <foo>
-*- <large_and_nasty> := (if (= -*- 0) #\Escape <large_and_nasty>);)

will call <large_and_nasty> even if the head of the parse stream is 0.

(compile <foo> [0 1 2 1 1 3])
FUNCALL: undefined function <large_and_nasty>

Only after the completion of the parse will the head of the parse stream be checked.

HERE IS ANOTHER GOOD RULE!!

Qi YACC allows you to force backtracking from semantic actions, but this is not a good idea if the syntax rule entails lots of processing. Use semantic backtracking when

  1. You need to do some context sensitive tests OR ....
  2. ....the syntax rule is short and fast and the semantics is called quickly.
Final Remarks on Qi YACC

Qi YACC is a powerful but sometimes tricky tool. More examples are needed to help the programmer than are currently available. What is on this page is all that exists for now. The code for Qi in the Sources download file contains a fair bit of Qi YACC. The source for Qi YACC is very small (around 150 lines of Lisp) and can be read in the Sources directory of the Qi download. See also http://www.pps.jussieu.fr/~jch/software/cl-yacc/ for a larger system written for Common Lisp.

Copyright (c) 2005, Mark Tarver
dr.mtarver@ukonline.co.uk