![]() |
Qi YACC |
| What is Qi YACC? |
A compiler-compiler based on earlier work by Dr Gyorgy Lajos on his METALISP Ph.D. project. The syntax of Qi YACC is based on the BNF notation of Backus. Used at its most basic level, Qi YACC is a generator for recognisors based on grammars. More usefully, Qi YACC can be used to develop efficient compilers for programming languages and transducers for natural language processing. Much of the actual code driving the parsers for post-2000 versions of Qi was developed using Qi YACC. Qi YACC is bundled with Qi.
| Qi YACC Parsing Strategy |
Qi YACC uses a top-down recursive descent parsing strategy. Left recursion is not tolerated.
| Qi YACC as a Recognisor Generator |
Consider the following grammar.
<sent> := <np> <vp>
<np> := <det> <n> | <name>
<det> := the | a
<n> := cat | dog
<name> := Bill | Ben
<vp> := <vtrans> <np>
<vtrans> := likes | chases
In Qi YACC, this grammar would be represented as:
(defcc <sent>
<np> <vp>;)
(defcc <det>
the; a;)
(defcc <np>
<det> <n>;
<name>;)
(defcc <n>
cat; dog;)
(defcc <name>
Bill; Ben;)
(defcc <vp>
<vtrans> <np>;)
(defcc <vtrans>
likes; chases;)
The spacing is left to the judgement of the programmer, but ;s
separate rules. Unlike Qi, Qi YACC gives no significance to symbols
beginning in uppercase (i.e. Bill is just a
symbol like cat, and not a variable). When one
of these definitions (e.g. for <sent>) is
entered to the Qi top level, it is compiled into Common
Lisp by Qi YACC with the message warning; no
semantics in <np> <vp>.
The compiler generated acts as a recogniser for sentences of the
language characterised by this grammar. If it is not a sentence
of the language, then the failure object #\Escape
is returned. If the input to the compiler belongs to this
language, then #\Escape is not returned by the
compiler and generally the program behaves as an identity
function. The compiler is invoked by the higher-order function compile,
that receives the name of a Qi YACC function and parses its second input
by that function.
(10-) (compile <sent> [the cat likes the dog])
[the cat likes the dog]
(11-) (compile <sent> [the cat likes the canary])
#\Escape
(12-) (compile <vp> [chases the cat])
[chases the cat]
Note that names of Qi YACC functions should always be enclosed
in angles. Qi YACC is sensitive to left-recursion which
will force an infinite regress. Qi YACC code is not type checked, but the
code can be tracked just like regular Qi code. Lists are constructed in Qi YACC
using [
] or cons or list
or any of the conventional methods. Unlike Qi, the constructor |
cannot be used in the syntax of an expansion (i.e. to the left of
:=), though it can be used to the right (in a
semantic action) to perform consing. However [
]
can be used to the left of :=. <bcs>,
below, recognises the inputs belonging to [bm][cn].
(defcc <bcs>
[<bs>] [<cs>];)
(defcc <bs>
b <bs>;
b;)
(defcc <cs>
c <cs>;
c;)
(0-) (compile
<bcs> [[b b b] [c c]])
[[[b b b]] [[c c]]]
Note that compile is not part of Qi type theory although Qi YACC can be loaded into Qi with type checking enabled.
| Semantic Actions in Qi YACC |
Semantic actions are attached to
grammar rules by following each rule by a :=. This Qi YACC
definition receives a list of as and changes them to bs.
(defcc <as>
a <as> := [b | <as>];
a := [b];)
The first rule says that any input of the form a
<as> is to be translated into an output consisting
of b consed to the translation of <as>.
The syntax of <as> requires that the input
be a non-empty list of as. So (compile
<as> [a a a]) gives [b b b]. The
second rule is the base case.
As in Qi, round brackets signify function applications and
square ones form lists. The following reformulation is an
example.
(defcc <sent>
<np> <vp> := (question <np> <vp>);)
(define question
NP VP -> (append [Is it true that your father] (append VP
[?])))
Here (compile <sent> [the cat likes the dog]) gives
[Is it true that your father likes the dog ?].
If semantic actions are not included, Qi
YACC warns the user and
inserts a default semantic action. This default action appends
non-terminals and conses terminals to form an output. So
(defcc <as>
a <as>; a;)
and
(defcc <as>
a <as> := (cons a <as>);
a := [a];)
generate the same code which acts as an identity function on any
list of as.
Note that placing output one after another in the semantic
actions of Qi YACC amounts to forming a list. The
definition
(defcc <as>
a <as> := b <as>;
a := b;)
forms an embedded list of bs from a list of as; so (compile <as> [a a a a]) gives [b [b [b b]]].
(defcc <as>
a <as> := [b | <as>];
a := [b];)
produces [b b b b] from [a a a a].
HERE IS A GOOD RULE!!
Qi YACC allows you to forget about semantic actions so that you can focus on getting the syntax of your language right. But once you have done that, relying on Qi YACC to second-guess what you want to do can be dangerous, even though 95% of the time it will do what you want. Fill in the semantic actions unless you are sure of yourself. The warning messages are not for show.
| Backtracking in Qi YACC |
Different expansions in an Qi YACC
function F are tried in the order in which they appear. If one
fails, then the next is tried. If all fail then control returns
to the last function that called F. In right recursive cases, it
is important to place the longest expansion first, since
otherwise Qi YACC may fail to recurse fully through the input.
Thus given the choice of
(defcc <as>
a := b;
a <as> := b <as>;)
(defcc <as>
a <as> := b <as>;
a := b;)
The right-hand version is preferred. In other words, the
situation with Qi YACC is the opposite of functional
programming; the base case generally comes last.
Qi YACC practices a limited form of backtracking. If
an expansion fails, then Qi YACC does not backtrack within an
expansion to find an alternative parse. Backtracking only takes
place between and not within expansions. Thus given
the expansion <a> <b> <c>;
suppose the parse should fail at <c>. In
that case, the expansion as a whole would fail, and Qi YACC
would not backtrack to consider reallocating different parts of
the input to <a> or <b>.
| Special Symbols in Qi YACC |
Qi YACC reserves special symbols with a predetermined meaning. These are
| Symbol | Semantics |
| -*- | The head of the input stream. |
| -s- | The tail of the input stream. |
| -o- | The output stream |
| <e> | The empty expansion. |
| ; | The end of a rule. |
| <...> | A non-terminal |
The input stream is the list of objects being parsed.
The output stream is the list of objects being produced.
This program takes a list and
searches for the first occurrence of a digit (0 to 9) in the
list. The non-terminal <morestuff> just
throws away the input after the digit is found. The line -*-
<find-digit> := <find-digit>; causes Qi YACC to
discard the first element of the input stream if it is not a
digit. Notice that we get the unit list containing the digit.
This is because when we return a non-terminal as an output, by
default Qi YACC returns the list of objects that fall under
the non-terminal.
(defcc <find-digit>
<digit> <morestuff> := <digit>;
<digit> := <digit>;
-*- <find-digit> := <find-digit>;)
(defcc <morestuff>
-*- <morestuff>;
-*-;)
(defcc <digit>
0; 1; 2; 3; 4; 5; 6; 7; 8; 9;)
(36-) (compile <find-digit> [a v f g 6 y u])
[6]
This program returns the tail of the input stream after the first
digit found in the stream.
(defcc <find-digit>
<digit> <morestuff> := -s-;
<digit> := -s-;
-*- <find-digit> := <find-digit>;)
(38-) (compile <find-digit> [a v f g 6 y u])
[y u]
This program recognises all lists of the form aibjck
; i.e. a series of i as followed by j bs followed by k cs all in
one list where j is allowed to be 0. The empty expansion <e>
caters for the 0 case.
(defcc <asbscs>
<as> <bs> <cs>;)
(defcc <as>
a <as>;
a ;)
(defcc <bs>
b <bs>;
b;
<e>;)
(defcc <cs>
c <cs>;
c;)
(compile <asbscs> [a a a b b c])
[a a a b b a a a c]
If <e> is
used and no semantic action is associated with the empty
expansion, Qi YACC returns as the output for <e>
everything currently on the output stream. Using <e>
without a corresponding semantic action is not encouraged because
the exact effects are hard to judge. Here <bs>
is recoded to indicate that if the empty expansion is used, then [
] should be placed on the output stream.
(defcc <bs>
b <bs>;
b;
<e> := [ ];)
This program nows acts by returning the input if it is an aibjck
stream or returns #\Escape if not.
| Forcing Backtracking in Qi YACC |
Returning #\Escape
as the result of a semantic action will automatically cause the
expansion to fail. In effect, Qi YACC treats the return of the failure
object as a parse failure and triggers backtracking. This allows
reformulations of certain expansions. For example
(defcc <digit>
0; 1; 2; 3; 4; 5; 6; 7; 8; 9;)
can be written
(defcc <digit>
-*- := (if (element? -*- [0 1 2 3 4 5 6 7 8 9]) -*- #\Escape))
Using -*- and #\Escape together, elegant and computationally
efficient shorthands can be constructed. The following creates a
new 2-place function one_of, that returns the
head of the input, if this head is an element of its second
argument. If not, one_of forces a backtrack.
(defcc <digit>
-*- := (one_of -*- [0 1 2 3 4 5 6 7 8 9]);)
(define one_of
X Y -> (if (element? X Y) X #\Escape))
Here is another elegant higher-order construction is_a?
that returns the head of the input, if this head satisfies a
predicate. If not, is_a? forces a backtrack.
This program finds the first number in a list. This time we get
the number and not the list containing the number.
(defcc <find-number>
<number> <morestuff> := <number>;
<number> := <number>;
-*- <find-number> := <find-number>;)
(defcc <number>
-*- := (is_a? number? -*-))
(define is_a?
F X -> (if (F X) X #\Escape))
(compile <find-number> [a v g t 7 k l])
7
The ability to control
backtracking within semantic actions gives Qi
YACC the ability to
parse context-sensitive grammars. A good example is a grammar
that recognises all lists of the form anbncn
where n > 0. Here is the program.
(defcc <anbncn>
<as> <bs> <cs> := (if (equal-length?
[<as> <bs> <cs>]) (appendall [<as>
<bs> <cs>]) #\Escape);)
(defcc <as>
a <as>;
a;)
(defcc <bs>
b <bs>;
b;)
(defcc <cs>
c <cs>;
c;)
(define equal-length?
[] -> true
[L] -> true
[L1 L2 | Ls] -> (and (= (length L1) (length L2))
(equal-length? [L2 | Ls])))
(define appendall
[] -> []
[L | Ls] -> (append L (appendall Ls)))
(46-) (compile <anbncn> [a a a b b b c c c])
[a a a b b b c c c]
(47-) (compile <anbncn> [a a a b b b c c])
#\Escape
It is important to note that Qi YACC follows an eager model of evaluation whereby the syntax is fully parsed according to the syntax side of a rule, before any semantics is called. Thus
(defcc <foo>
-*- <large_and_nasty> := (if (= -*- 0) #\Escape
<large_and_nasty>);)
will call <large_and_nasty> even if the head of the parse stream is 0.
(compile <foo> [0
1 2 1 1 3])
FUNCALL: undefined function <large_and_nasty>
Only after the completion of the parse will the head of the parse stream be checked.
HERE IS ANOTHER GOOD RULE!!
Qi YACC allows you to force backtracking from semantic actions, but this is not a good idea if the syntax rule entails lots of processing. Use semantic backtracking when
| Final Remarks on Qi YACC |
Qi YACC is a powerful but sometimes tricky tool. More examples are needed to help the programmer than are currently available. What is on this page is all that exists for now. The code for Qi in the Sources download file contains a fair bit of Qi YACC. The source for Qi YACC is very small (around 150 lines of Lisp) and can be read in the Sources directory of the Qi download. See also http://www.pps.jussieu.fr/~jch/software/cl-yacc/ for a larger system written for Common Lisp.
Copyright (c) 2005, Mark Tarver
dr.mtarver@ukonline.co.uk