Shen doc 1.8
The Official Shen Standard

26/09/2011, copyright Mark Tarver


Chinese characters for 'shen' meaning 'spirit'

Preamble

Shen and Qi
Kl
The Motivation for Shen
Future Development of Shen
The Shen License

Basic Types in Shen and Kl
The Primitive Functions of Kl
The Syntax of Kl
Notes on the Implementation of Kl
Kl and the Shen Evaluator
Boolean Operators
The Syntax of Symbols
The Semantics of Symbols in Shen and KLambda
Packages
Prolog
Strings
Strings and Pattern Matching
Lists
Characters
Streams
Reader Macros
Vectors
Standard Vectors and Pattern Matching
Non-standard Vectors and Tuples
Equality
Printing
Generic Functions
Type Declarations
External Global Variables
Property Lists and Hashing
Error Handling
Numbers
Floats and Integers
The Timer
Comments
Extracting Code
Special Forms
Error Handling
Abstract Types for Kl and Shen
Functions Present in Qi II; dropped from Shen
Functions Present in Shen; not in Qi II
Shen and Qi II: differences at a glance

Shen and Qi

Shen is directly derived from Qi. Qi is a hypermodern functional language that offers many features not currently available under other functional platforms. For an introduction to Qi see www.lambdassociates.org.

Qi was written solely to run under Common Lisp. The motive for Shen came from the observation that the implementation of Qi used only 15% of the system functions described in Common Lisp: the Language.

The Shen mission is to develop an ultra-portable version of Qi that can run under a wide variety of platforms and which incorporates missing features in Qi such as streams. The targeted platforms are CL, Clojure, NewLisp, Emacs Lisp, ECL, Scheme, Javascript and Python as well as C/C++, DVM, CLR or the JVM. This approach involved rewriting and redesigning Qi to fit within the smallest feasible instruction set.

Kl

The means of achieving this involves developing a small fast Lisp called Kl which is rich enough to encode Qi, but small enough to be easily mapped into other platforms or implemented as a standalone.

In terms of design, Kl reflects the following design criteria.

1. It should be powerful enough to express Qi.
2. It should be minimal, with very few primitives and so easy to implement. No macros, optional arguments etc. because Shen can support all these things already.
3. It is not designed as a language to compete against Common Lisp; deliberately. It is a Lisp assembly language for a higher level language. The minimalism is deliberate.
4. It should be clean; i.e. it should support partial applications, lexical scoping and tail recursion optimisation.
5. It should be fast; it should support type declarations.

Some comparable small Lisps to Kl are PicoLisp, FemtoLisp, TinyScheme and SmallLisp.

Unlike Common Lisp, Kl is very portable over different platforms. In place of Common Lisp's 1150 pages of specification, Kl requires only 45 system functions. It is much easier to develop a stand-alone implementation of Kl than to build a complete version of CL. Even more significantly, it is possible to map Kl into popular existing platforms such as Clojure, Python as well as Common Lisp.

The implementation, Shen, attached to this document runs Qi on this reduced instruction set and, in terms of primitives required, has nearly 1/3 of the footprint of Qi II. Shen is entirely defined in Kl. Hence Shen is no more difficult to port than Kl itself. All that needs to be done is to map the simple reduced instruction set of Kl into the chosen platform. Our distribution in fact contains one such mapping - of Kl to Common Lisp; and this is done in 250 lines of code.

Kl misses many of the features of bigger Lisps (like macros) because Shen effectively incorporates many of these features. Even LIST and QUOTE are missing from Kl because they are not needed for Shen and Qi has never needed or missed them. Similarly we do not impose a comment convention on Kl because we do not consider that people need to or should write in Kl. The emphasis is on devising an absolutely efficient minimal set needed to port Qi which is easy for people to code in other languages.

However anybody who implements Kl, and wants to write directly in this notation, can add extra features to their implementation to Kl. This does not affect the standard; just as a CL which includes (e.g.) threads does not cease to be a CL even though it incorporates a non-standard feature. However the systems code for Shen will not rely on extra features outside the standard. It is intentionally minimal.

The Motivation for Shen

The development of Shen was motivated by the desire to use Qi outside the ambit of Common Lisp. Though Common Lisp is a very powerful language, it's usage and libraries do not compare well with other languages such as Python. It is, for example, difficult to find providers who support Common Lisp, though many providers will offer Python as part of their services. Hence Shen was devised as the solution. Shen is Qi packaged to run under many platforms.

People have asked why Shen is called 'Shen' . There is a deep reason.

The words 'qi' and 'shen' are part of the Taoist vocabulary. The concept of shen belongs to a triad; jing, qi, shen. They represent stages in the refinement of energy. Jing is the sexual essence; it is the most hormonal and least refined of the life energies but important in the alchemical transformation of our energy into spirit. Qi is better known as life-force or vitality, which accumulates when jing is conserved and our kidney and natal energy is nourished. Shen is the spiritual energy that shows in shining eyes and an alert mind. In Taoist alchemy, the transmutation of jing into qi, and qi into shen is the nature of the highest Taoist practice which leads to seperation of the shen from the corporeal form, immortality and liberation from the wheel of life and death. For this reason shen is translated as 'spirit'

In terms of this process, Qi was nourished within the physical body of a specific platform which was Common Lisp. Having nurtured it to become strong, the goal must be now to seperate Qi from conceptual dependence on Common Lisp to be able to exist as a spirit that can run on any LISP. Hence the process of our work mirrors the ancient Taoist alchemists.

Future Development of Shen

Shen is that of a RISC version of Qi II with extensions to incorporate streams; it is Qi running on a reduced instruction set and that reduced instruction set defines (much of) Kl. However this release and the accompanying language definition are by no means the final word. We are actively developing standards for other aspects of a modern programming language beyond what exists here.

The release of Shen follows this convention: all spec changes increment the leading digit; all patches increment the number following the point. Amplification or rewording of the spec follows the same pattern; it increments the number following the point. Changes to the standards in the spec increment the leading digit.

The Shen License

The Shen license is a freeware license that enables you to freely use Shen for commercial purposes. You are free to write application programs for Shen and do what you want with them. However you are not free to modify Shen or produce a derivative versions of this work and distribute it unless what you produce conforms to the language standard.

This standard is fixed by Dr Mark Tarver and is described in the latest Shen doc document. Essentially the standard is exactly that of Qi in Functional Programming in Qi (second edition) modulo the changes described in that document. Eventually the standard will be fixed in a new edition The Book of Shen.

The motivation for these restrictions is simple. We want to build a community around a reliable standard platform that people can depend on. Unlicensed, non-conformant and buggy hacks do not help us in this mission. The detailed license and it's explanation can be read here.

The Basic Types in Shen and Kl

There are 12 basic types in Shen.

1. symbols .... abc hi-there, The_browncow_jumped_over_the_moon
2. strings ..... any characters enclosed in "s
3. numbers .... all objects closed under +, /, -, *
4. booleans ... true, false
5. streams
6. errors
7. vectors
8. functions
9. lists
10. tuples
11. closures
12. continuations

Note that the last two categories can merge depending on the platform.

Any symbol, string, number or boolean is an atom. Booleans are true and false. Atoms are self-evaluating. As in Qi, booleans are not counted as symbols (in Shen they have their own type). All symbols apart from those used to apply functions are treated as implicitly quoted.

The type system of Shen differs from Qi II in not placing variables and symbols into different types. This arises mainly from dropping rule closures that appeared in Qi II. The type stream is now inbuilt since several primitives use this concept.

The Primitive Functions of Kl

The following set represents the set of 45 primitive functions which Shen requires and which are used in Kl. All other functions are included in the Shen sources. The CL definitions are given to explain the semantics of these functions. Note in some cases these primitives can be reduced still further (e.g and in terms of if etc). In fact lambda alone is sufficient, but impractical. The instruction set is therefore a balance between economy of primitives (easy to implement but inefficient in execution) and practicality (more difficult to implement but possibly faster).

Kl uses strict applicative order evaluation except for the boolean operations etc. which follow the usual pattern. NIL is just a symbol with no special status. () denotes the empty list.

Category

Common Lisp Definition

Type

Boolean Operations (4)
if (DEFMACRO if (X Y Z)
`(LET ((*C* ,X))
***(COND ((EQ *C* 'true) ,Y)
*********((EQ *C* 'false) ,Z)
*********(T (error "~S is not a boolean~%" *C*)))))
boolean --> A --> A --> A
and (DEFMACRO and (X Y)
`(if ,X (if ,Y 'true 'false) 'false))
boolean --> boolean --> boolean
or (DEFMACRO or (X Y)
`(if ,X 'true (if ,Y 'true 'false)))
boolean --> boolean --> boolean
cond

(DEFMACRO cond (Case &REST Cases)
`(if (CAR ,Case) (CADR ,Case) ,(CONS 'cond (CDR Cases))))

 
Symbols (2)    
intern (DEFUN intern (String) (INTERN String)) {approximately} string --> symbol
function (DEFUN function (X) (SYMBOL-FUNCTION (shen-maplispsym X))) (A --> B)--> (A --> B)
Strings (5)    
pos (DEFUN pos (X N) (FORMAT NIL "~C" (CHAR X N))) string --> number --> string
tlstr (DEFUN tlstr (X) (SUBSEQ X 1)) string --> string
cn (DEFUN cn (Str1 Str2)
(CONCATENATE 'STRING Str1 Str2))
string --> string --> string
str (DEFUN str (X)
(COND ((NULL X)
xxxxxxxx(ERROR "[] is not an atom in Shen; str cannot convert it to a string.~%"))
xxxxxxx((SYMBOLP X) (STRING X))
xxxxxxx((ATOM X) (FORMAT NIL "~S" X))
xxxxxxx(T (ERROR "~S is not an atom; str cannot convert it to a string.~%" X))))
A --> string
string? (DEFUN string? (X) (IF (STRINGP X) 'true 'false)) A --> boolean
Assignments (2)
set (DEFUN set (X Y) (SET X Y)) (value X) : A;
X : symbol;
Y : A;______
(set X Y) : A;
value (DEFUN value (X) (SYMBOL-VALUE X))  
Error Handling (3)
simple-error (DEFUN simple-error (String) (ERROR String)) string --> A
trap-error (DEFMACRO trap-error (X F)
`(HANDLER-CASE ,X
(ERROR (Condition) (FUNCALL ,F Condition))))
A --> (error --> A) --> A
error-to-string (DEFUN error-to-string (E) xxx
(IF (TYPEP E 'CONDITION) xxxxxx
x
(FORMAT NIL "~A" E)
x(ERROR "~A is not anxexception~")))
error --> string
Lists (4)
cons (DEFUN cons (X Y) (CONS X Y)) A --> (list A) --> (list A)
hd (DEFUN hd (X) (CAR X)) _
tl (DEFUN tl (X) (CDR X)) (list A) --> (list A)
cons? (DEFUN cons? (X) (IF (CONSP X) 'true 'false)) A --> boolean
Generic Functions (7)
defun (DEFMACRO defun (F Params Code)
`(PROG2 (DEFUN ,F ,Params (kl-to-lisp ,Params ,Code))
(COMPILE (QUOTE ,F))))
 
lambda (DEFMACRO lambda (X Y)
`(FUNCTION (LAMBDA (,X) ,Y)))
X : A >> Y : B;
(lambda X Y) : (A --> B);
let (DEFMACRO let (X Y Z)
`(LET ((,X ,Y)) ,Z))
Y : B;
X : B >> Z : A;
(let X Y Z ) : A
= (DEFUN equal? (X Y) (IF (shen-ABSEQUAL X Y) 'true 'false))

(DEFUN shen-ABSEQUAL (X Y)
(COND ((AND (CONSP X) (CONSP Y)
xxxxxxxx(shen-ABSEQUAL (CAR X) (CAR Y)))
xxxxxxx(shen-ABSEQUAL (CDR X) (CDR Y)))
xxxxxxx((AND (STRINGP X) (STRINGP Y)) (STRING= X Y))
xxxxxxx((AND (NUMBERP X) (NUMBERP Y)) (= X Y))
xxxxxxx((AND (ARRAYP X) (ARRAYP Y))
xxxxxxx(CF-VECTORS X Y (LENGTH X) (LENGTH Y)))
xxxxxx(T (EQUAL X Y))))

(DEFUN CF-VECTORS (X Y LX LY)
xxxxx(AND (= LX LY) (CF-VECTORS-HELP X Y 0 (1- LX))))

(DEFUN CF-VECTORS-HELP (X Y COUNT MAX)
xxx(COND ((= COUNT MAX)
xxxxxxxxxxx(shen-ABSEQUAL (AREF X MAX) (AREF Y MAX)))
xxxxxxxxxx((shen-ABSEQUAL (AREF X COUNT) (AREF Y COUNT)) xxxxxxxxxxx(CF-VECTORS-HELP X Y (1+ COUNT) MAX))
xxxxxxxxxx(T NIL)))
A --> A --> boolean
eval-without-reader-macros (DEFUN eval-without-macros (X)
xxxxxxx(EVAL (kl-to-lisp NIL (shen-shen->kl X))))
 
freeze (DEFMACRO freeze (X)
**`(FUNCTION (LAMBDA () ,X)))
A --> (lazy A)
type (DEFUN type (X Type) (DECLARE (IGNORE Type)) X) X : A;
(type X A) : A;
Vectors (4)
absvector (DEFUN absvector (N) (MAKE-ARRAY (LIST N)))  
address-> (DEFUN address-> (Vector N Value)
(SETF (AREF Vector N) Value) Vector)
 
<-address (DEFUN <-address (Vector N)
xxxxx(AREF Vector N))
 
absvector? (DEFUN absvector? (X) (IF (ARRAYP X) 'true 'false)) A --> boolean
Streams and I/O (4)
pr (DEFUN pr (X S)
(WRITE-STRING (shen-printer-routines X shen-*print-routines*) S))

(DEFUN shen-printer-routines (X Routines)
xxx(IF (NULL Routines)
xxxxxxxX
xxxxxxx(shen-printer-routines
xxxxxxxxxxx(FUNCALL (CAR Routines) X) (CDR Routines))))
string --> (stream out) --> string
read-byte (DEFUN read-byte (S)
(IF (EQUAL S *STANDARD-INPUT*)
xxxx(CHAR-CODE (READ-CHAR S))
xxxx(READ-BYTE S NIL -1)))
(stream in) --> number
open (DEFUN open (Type String Direction)
(LET ((Path (FORMAT NIL "~A~A" *home-directory* String)))
(COND ((EQ Type 'file) (file-stream Path Direction))
xxxxxxx(T (ERROR "invalid stream type")))))

(SETQ *home-directory* NIL)

(DEFUN file-stream (String Direction)
(COND ((EQ Direction 'in)
xxxxxxxx(OPEN String :DIRECTION :INPUT
xxxxxxxxxxxxxxxxxxxxx:ELEMENT-TYPE 'UNSIGNED-BYTE))
xxxxxxx((EQ Direction 'out)
xxxxxxxx(OPEN String :DIRECTION :OUTPUT
xxxxxxxxxxxxxxxxxxxxx:IF-EXISTS :OVERWRITE
xxxxxxxxxxxxxxxxxxxx :IF-DOES-NOT-EXIST :CREATE))
xxxxxxx(T (ERROR "invalid direction"))))
a dependent type (see notes)
close (DEFUN close (Stream) (CLOSE Stream) NIL) (stream A) --> (list B)
Time (1)
get-time (DEFUN get-time (Units)
(* 1.0 (/ (GET-INTERNAL-RUN-TIME) Units)))
number --> number
Arithmetic (9)
+ See number package number --> number --> number
- See number package number --> number --> number
* See number package number --> number --> number
/ See number package number --> number --> number
> See number package number --> number --> boolean
< See number package number --> number --> boolean
>= See number package number --> number --> boolean
<= See number package number --> number --> boolean
number? (DEFUN number? (X) (IF (NUMBERP X) 'true 'false)) A --> boolean
Total 45 functions

The Syntax of Kl

is very simple and conforms to a Lisp. Well-formed sentences of Kl are symbolic expressions (s-exprs).

  1. Any symbol, boolean, string or number is an atom.

  2. Any atom is an s-expr.

  3. Any abstraction (lambda X Y) is an s-expr if X is a symbol and Y is an s-expr.

  4. Any local assignment (let X Y Z) is an s-expr if X is a symbol and Y and Z are s-exprs.

  5. Any definition (defun F X Y) is an s-expr if F is a symbol and X is a (possibly empty) list of symbols (formal parameters) and Y is an s-expr.

  6. Any application (X1 ... Xn) is an s-expr if X1, ... Xn are s-exprs.

Notes on the Implementation of Kl

These notes are for programmers wishing to implement Kl.

Tail recursion optimisation is a must and part of the Kl standard. Quite of few of the reader routines for Shen will not work in a language like Python which lacks it. In such a case, a platform provider working to move Shen to a non-TCO language has to consider how to port tail recursive Kl code.

Kl uses lexical scoping. This is pretty standard now. Qi II actually used dynamic scoping in its implementation of Prolog to implement variable binding but in Shen this is now done with vectors.

Kl follows applicative order evaluation. Except for the following constructions.

cond, and, or, if, freeze, let, defun

Kl follows a dual namespace model. A classic model for a Lisp views 'defun' and 'set' as interchangable.  Both are thought of as asserting a math'l identity so that  (defun f (x y) y) is sugar for (set 'f (lambda x (lambda y y))).  This model supposes a single namespace for functions and variables.  You have this in Python.

So in the classic model 'defun' and 'set' together are logically superfluos. It is more reasonable to regard 'set' as logically prior since incrementing a global variable is much more easily carried out through 'set' than 'defun'. Incrementing should be very fast - function compilation is generally very slow.

In a dual namespace model, the 'defun' and 'set' are regarded as creating an association between the symbol and something else; it is a mapping not an assertion of identity.

Generally Qi and Shen require a dual namespace for symbols.  The reason for this is to do with the Qi evaluation strategy for symbols which is that symbols are self-evaluating.  In Qi the symbol 'f' evaluates to itself.  If we want to get at the value associated with 'f', we type '(value f)'.  Hence f is not thought of as shorthand for a value, but is merely a symbol to which objects (definitions, global assignments etc) can be attached.

Generally, if we had a single namespace, we would have a convention of a different kind. For instance if '(set f 6)' entailed that 'f' really meant (was just a shorthand for) 6, then we would expect that typing in 'f' to the REPL would return 6.  In Python that's exactly what happens with symbols.

>>> f = 9
>>> f
9
>>> def f (): return(9)
...
>>> f
<function f at 0x024574F0>

But Qi does not work that way.  Allowing symbols as self-evaluating really points away from this model.  Hence Kl and Shen are committed to the dual namespace approach.  

Partial applications are mandatory. Thus (defun f (x y) ....) and (defun f (x) (lambda y ...)) are equivalent. There are efficient techniques for compiling Kl into languages which don't respect this rule (like CL) - see the document on porting.

Kl and the Shen Evaluator

The Shen evaluator compiles Shen into Kl and Kl into native code. The Shen evaluator accepts S-exprs as legal input and since Kl expressions are such, any Kl expression is a legal input to Shen. Note that if a Kl expression is typed into Shen, that special constructions such as the list/string/vectors constructors in Shen which are outside the Kl spec (see the sections below) will actually work within such expressions because they are compiled into legal Kl. Thus the function;

A. (defun list-all (x y z) [x y z])

is not legal Kl and would have to be written as follows to be legal Kl

B. (defun list-all (x y z) (cons x (cons y (cons z ()))))

However expression A. will be accepted and compiled by Shen into expression B. Hence hybrid programming will work in Shen. We don't actually recommend this style, because Kl is not designed for the purposes of programming, but for easy porting and implementation. However you can write Kl code in Shen which is as compact as Common Lisp.

Boolean Operators

The list of boolean operators contains some logical redundancy. Logically only if is required to define the rest. However even ANSI C contains a version of the basic repertoire if, and, or, not and the cond is trivially compilable into a nested if. Because Kl does not contain macros and uses strict applicative order evaluation outside boolean operations, these are not efficiently interdefinable in the language itself. Note that in Kl a cond that fails in all its cases does not deliver NIL (as in CL) but an error.

Shen includes a cases statement which has the syntax

(cases <test a> <result a>
xxxxxx<test b> <result b>
...............)

and which is equivalent to (if <test a> <result a> (if <test b> <result b> ....)). If no cases apply an error is returned.

The Syntax of Symbols

In Shen an atom is either a symbol, boolean, string or number. All atoms, thus including symbols, are self-evaluating. The empty list is not counted as an atom but is also self-evaluating. Hence there is no quote in Kl or Shen. Mapping into CL which does use quote is trivial. The function symbol? is the recognisor for symbols; their BNF is as follows.

<symbol> := <alpha> <symbolchars> | <alpha>
<symbolchars> := <alpha> <symbolchars> | <digit> <symbolchars> | <alpha> | <digit>

<alpha> := a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
<alpha> := A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
<alpha> := = | - | * | / | + | _ | ? | $ | ! | @ | ~ | > | < | & | % | ' | #| ` | ; | : | { | }

<digit> := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

As in Qi, a symbol is a variable if it begins in an uppercase letter.

The function intern maps a string to a symbol or boolean; (intern "foo") returns foo. Note that like Qi, Shen reads certain symbols of special significance like { and ; by inserting whitespace; {a --> b} is read as { a --> b } and foo; as foo ;. It is possible to use intern to create composite symbols containing these characters.

The Semantics of Symbols in Shen and Kl

Qi II and Common Lisp are all members of the Lisp family of languages. In Lisp, symbols are often semantically context sensitive; that is, the interpretation of a symbol may depend on the context in which it occurs. For example in this Common Lisp expression; (list 'John 'put 'the 'car 'into 'reverse), or in Qi II [John put the car into reverse], the symbol 'reverse' denotes a symbol. But in Common Lisp the expressions (reverse '(1 2 3)) (in Qi II (reverse [1 2 3])) and (mapcar 'reverse '((1 2) (2 3) (3 4))), (in Qi II (map reverse [[1 2] [2 3] [3 4]])), the symbol 'reverse' actually denotes a function.

That means in the Lisp family, symbols are semantically context sensitive. At the head of an application they can only denote a function and there is no ambiguity; but within the body of the application they can denote either a function or the symbol quoted. Lets us say a symbol is innocent in Qi if it is used simply to refer to itself and non-innocent if not

To help reading, Common Lisp includes the expression 'function' which disambiguates innocent symbols from the other kind; the expression (mapcar (function reverse) '((1 2) (2 3) (3 4))) does what (mapcar 'reverse '((1 2) (2 3) (3 4))) does but the semantics of 'reverse' is clarified to show that the programmer is referring to the function not the symbol itself.

Qi I and Qi II followed Common Lisp in making symbols semantically context sensitive, relying on the Common Lisp compiler to decipher the ambiguity and make the correct choice. Being specifically tailored for multiplatform development, Shen can make no such assumption. It is quite possible that the native platform does not support symbols as data objects in the manner of Lisp; in which case it is possible to construct a simulation of Lisp symbols by using tagged 2 element vectors (one element a tag to show it is impersonating a symbol, the other to hold a string representation of the symbol; see Coping with Missing Symbols in the document of porting).

However the problem arises of the context sensitivity of symbols; when should a symbol should be 'vectorised' in a non-Lisp language and when not? For instance, if the symbol 'reverse' is vectorised in the context (map reverse [[1 2] [2 3] [3 4]]), then the map function will crash, since it receives a vector and not a function as an argument. But in the context [John put the car into reverse], it is right to vectorise the symbol 'reverse' because it is naming itself and not calling a function. How does one proceed?

There are several solutions to this problem. One is to simply avoid innocent symbols and insist that strings do all the work of innocent symbols. This follows the path of languages like ML where innocent symbols do not exist. It would mean that in Shen, [p & q] could not be written; only ["p" "&" "q"]. However this would probably not appeal to those Lispers, like myself, who enjoy the convenience of working with innocent symbols.

The second is to treat all non-variable symbols in the body of an application (apart from the leading symbol which must denote a function or procedure of some form) as innocent, thus vectorising them. In this case, to prevent any higher-order function H from failing, H must, when applying any vectorised symbol S, look for the function F associated with S and use F instead. This introduces a delay into the execution of H and moreover means that native higher-order functions in the platform, which are not set up to interface to vectorised symbols, will still fail in the manner described above.

The third solution is to provide 'function' as a means of disambiguation and this is what Shen and KLambda do. Hence any symbol S which is an argument to an application is treated as innocent. But if S is enclosed as in '(function S)', the whole expression denotes a function. Hence to write portable Shen; one should avoid writing an expression like '(map reverse [[1 2] [2 3] [3 4]])' and write '(map (function reverse) [[1 2] [2 3] [3 4]])'. The former construction will certainly work under a Common Lisp platform, but the latter will run under Common Lisp and platforms outside the Lisp family.

Packages

Packages exist to avoid the usual danger of overwriting when two programmers accidently choose the same symbols in their programs to identify different values. The polyadic function package has the form (package S L E1 ... En) where

1. S is a symbol beginning in lowercase which is the name of a package; (e.g mypackage).
2. A list L (possibly empty) of non-variable symbols.
3. E1 ... En are a series of Shen expressions.

The Shen reader prepends the package symbol before all the symbols when evaluating E1 ... En apart from those symbols which are

(a) symbols listed as belonging to the system (such as ->, define, cons, append etc) or ...
(b) symbols which are variables or ...
(c) ... symbols which are listed in L or .....
(d) ... symbols internal to the Shen package or ....
(e) ... symbols consisting entirely of underscores or entirely of equals signs.

Symbols which are prepended we say are internal (to the package). Symbols which are not are external.

Hence (package mypackage [main] (define main ...) ....) will cause Shen to make all user functions read in its scope to be internal to mypackage apart from the symbol main which will be external.

The philosophy of Shen is that once a programmer has decided a symbol is internal to a package and is hidden from view that decision cannot be overridden except by changing the definition of the package. Hence the complexities of IMPORT and EXPORT found in Common Lisp are not reproduced in Shen. You cannot be 'in' a package in Shen within the REPL. It is possible to declare a package in a package.

The null package written (package null ....) has no effect on its contents. This used in certain advanced applications involving reader macros.

All the code for Shen, which is written in Shen, is contained in the package 'shen-', apart from those functions and symbols which are part of the specification of the language.

Prolog

Shen contains a Prolog, just as Qi II, but the m-prolog syntax has been dropped. The main reason for this was that embedding executable code in a string (to preserve conformancy with Edinburgh syntax) generated awkward anomalies with respect of the rest of the system. For example a special string searching routine had to be developed for m-prolog declarations embedded in a package; symbol overloading had to be used because Edinburgh Prolog uses '=' to mean something different from simple equality; you cannot insert comments inside an m-prolog program and searching in an m-prolog program is more difficult since the structure is in a string not a list. To compensate Qi developed a low level s-prolog convention in which Prolog programs were s-exprs.

In place of the awkward dual convention, Shen has one Prolog notation consistent with the rest of Shen which uses defprolog. Here are the member, reverse and append functions in Shen Prolog.

(defprolog member
xxX [X | _] <--;
xxX [_ | Y] <-- (member X Y);)

(defprolog rev
xx[] [] <--;
xx[X | Y] Z <-- (rev Y W) (conc W [X] Z);)

(defprolog conc
xx[] X X <--;
xx[X | Y] Z [X | W] <-- (conc Y Z W);)

The following functions are found in Shen Prolog

predicate arity description
unify 2 unifies terms
unify! 2 unifies terms with an occurs check
identical 2 succeeds if the terms are identical
is 2 binds the variable which is the first term to the result of evaluating the second. All variables in the second are completely dereferenced.
bind 2 as 'is' except all variables in the second term are dereferenced only so far as to derive a non-variable result.
findall 3 takes a variable X , a literal (list) L and a variable Y and finds all values for X in L for which L is provable and binds the list of these values to Y.
when 1 the term is evaluated to true or false and the call succeeds if it is true. All variables in the term are completely dereferenced.
fwhen 1 the term is evaluated to true or false and the call succeeds if it is true. All variables in the term are dereferenced only so far as to derive a non-variable result.
call 1 apply the predicate at the head of the list to the terms of the tail. (Prolog apply)
return 1 terminate Prolog returning the dereferenced value of the term.
! 0 Prolog cut

Strings

A string in Shen begins with " and ends with ". To embed a string in a string the escape character is \ which instructs the reader to ignore the conventions attaching to the succeeding character and read it as part of the string.

Note: \ has a very limited role in 1.7 and this is a placeholder only. We are working on extending \ to embrace control characters. As of writing (September 10, 2011), the conventions have not yet been finalised.

The basic functions for strings are cn - which concatenates two strings together, pos, which takes a non-negative number N (starting from zero) and returns the Nth unit string of the string and tlstr which returns all of the string apart from its leading unit string. The function str places an atom into a string (effectively enclosing it in quotes); it is an error to use str on anything but an atom. A more general function make-string will place any Shen object into a string, preserving the appearance in a way that makes it conformant to Shen printing conventions (see printing). string? recognises all and only strings.

The Shen reader reads unsigned 8 bit bytes into unit strings and parses these strings into other tokens as required. By default the list of acceptable unsigned bytes is a subset of the ASCII code points between 0 and 127, including all the points from 32 to 126 and the points 9,10 and 13. The function byte->string maps a code point to the corresponding unit string. It is consistent with the Shen specification to extend the domain of this function to incorporate extended ASCII or Unicode, but the mentioned code points must be supported.

Strings and Pattern Matching

The polyadic function @s can be used to concatenate n (n >= 2) strings. The polyadicity is a syntactic fiction maintained by the Shen reader. (@s "a" "b" "c") is parsed as (@s "a" (@s "b" "c")) exactly in the manner of @p.

The Shen reader parses (@s "123" "456") in a special way; as (@s "1" (@s "2" (@s "3" "456"))). The leading argument, if a string, is decomposed into a series of concatenations of unit strings. The significance of this is realised in the use of @s for pattern-matching over strings.

Within a function @s may be used for pattern-matching. For example; the following removes all occurences of my Christian name from a string.

(define remove-my-name
xx{string --> string}
xx
"" -> ""
xx(@s "Mark" Str) -> (remove-my-name Str)
xx(@s S Str) -> (@s S (remove-my-name Str)))

which is parsed into the following.

(define remove-my-name
xx{string --> string}
xx"" -> ""
xx(@s "M" (@s "a" (@s "r" (@s "k" Str)))) -> (remove-my-name Str)
xx(@s S Str) -> (@s S (remove-my-name Str)))

Lists

In Shen as in Qi, a list consists of a series of items, seperated by whitespace, and flanked by [ and ] to the left and right. [] is the empty list as is (). Note that Kl does not understand [...] and that the Shen reader translates this idiom into Kl. The basic constructors are cons, hd and tl and cons? corresponding to CONS, CAR and CDR, and CONSP in Lisp.

It is an error to apply hd or tl to anything but a list.

There is the question of how to treat the application of hd to the empty list []. Ideally this should produce an error. In Common Lisp the CAR of the empty list is the empty list. Actually coding hd so that it returns an error in Common Lisp requires encoding a non-empty list test into the definition of hd. This is generally unnecessarily expensive in such a heavily utilised function, because often the programmer knows before applying hd that the list is non-empty. Hence in Shen hd does not presuppose a CONSP test and the result of applying hd to the empty list is platform dependent. For implementors building Kl from scratch we recommend raising an error, as applying hd to the empty list is a deprecated operation.

For that reason in Shen, hd is not given a type since its behaviour is type unpredictable. There is a function head of type (list A) --> A in Shen which is well-behaved and which does make a non-empty list test and which raises an error if applied to the empty list.

Similar observations apply to tl which if applied to the empty list in Common Lisp produces an empty list. In other languages, an error may arise. Hence by parity of reasoning, the result of (tl ()) is platform dependent and there is no type for tl. There is a function tail of type (list A) --> (list A) in Shen which is well-behaved and which does make a CONSP test and which raises an error if applied to the empty list.

Note that cons applied to A and B where B is not a list provides a result which is called a dotted pair. This form of application is needed in Shen in the internals of Shen Prolog. In Shen cons does have a type because the type checker is capable of failing dotted pair applications as type insecure. Hence the type of cons is A --> (list A) --> (list A). In Shen, the dotted pair (cons a b) is printed off as [a | b].

Characters

Shen does not incorporate characters as a data type. The failure object is no longer #\Escape as in Qi, but (fail).

Streams

Streams in Shen were introduced to encode some of the functions that were hard-wired into Qi II like write-file and read-file. As with the rest of this first release, the goal was to capture just as much in the way of primitives as is necessary to reproduce the functionality of Qi II. It is likely that this section will expand to incorporate other kinds of streams than the ones mentioned here,

The basic functions for streams are

open
close
read-byte

Note: there is a motion to add write-byte to this list of primitives and to move pr from the status of primitive to defined function. As of writing (September 10, 2011), this has not yet been finalised.

The function open creates a stream and the first argument, determines the nature of the stream, the second argument, a string, is the sink or source, and the final argument, the direction, which is either in or out determines whether the second argument is a source or sink . So far in the first release, streams are used only for reading to and from files, and file is the only legal first argument for open. (open file "myfile.txt" in) creates a stream for reading a file. Material written to a file overwrites the contents already in it. The function close closes a stream and returns the empty list.

The open function works in hand with the cd command in Shen which fixes the home directory. All files are opened relative to the value for the home directory, which is held in the global *home-directory*. The value of this global is changed by the cd function common to Qi and Shen.

Every Shen stream is read in as a series of unsigned 8 bit bytes and the primitive function is read-byte which takes a stream as an argument and returns a byte as a decimal number. read-byte is a sugared function which can appear with no arguments; in that case the stream read is the standard input (terminal) stream *stinput*. If the stream is empty then read-byte returns -1. Note read-byte is a destructive function and hence Shen streams are mutable (the other destructive functions are address->, vector-> and set).

Having such base reader, it is possible to build any reader on top of it. A UTF8, UTF16 or UTF32 stream reader may be built from 8 bit bytes stream reader, likewise a reader for Shen tokens. Such a reader for files (read-file) is in fact supplied in Shen (as in Qi II) as a system function.

This short (non-TRO) program reads a file as a list of bytes.

(define read-file-as-bytelist
xxxFile -> (read-file-as-bytelist-help (open file File in)))

(define read-file-as-bytelist-help
xxxStream -> (let Byte (read-byte Stream)
xxxxxxxxxxxxxxxxx(if (= Byte -1)
xxxxxxxxxxxxxxxxxxxx[]
xxxxxxxxxxxxxxxxxxxx[Byte | (read-file-as-bytelist-help Stream)])))

The type stream is a parametric type in Shen and in the 2011 release has only two possible parameters - in or out. The type of the open function is a dependent type which cannot be given a type in arrow notation but requires the following sequent calculus definition.

File : string; Direction : direction;
(open file File Direction) : (stream Direction);

where the inhabitants of the type direction are 'in' and 'out'.

Reader Macros

The Shen reader sits on top of the primitive 8 bit bytes stream reader and reads from either a file (using open) or the input stream. The reader is programmable just as in Qi II, but uses the defmacro construction, which is actually cleaner and easier to use than the Qi II sugar function. However internally, the defmacro construction is handled using similar techniques as for sugar functions. A macro in Shen is a 1-place function that is used by the reader to parse the user input. Here for instance is a macro used for benchmarking Shen.

(defmacro exec-macro
xx[exec Expr] -> [trap-error [time Expr] [/. E failed]])

The action of this macro is to macroexpand every instance of (exec Expr) as typed or loaded into the top level by the macroexpansion of (trap-error (time Expr) (/. E failed)). There is no need to include a default X -> X at the bottom of a macro (as is needed in Qi II) - this is inserted automatically. Shen macros are tied into the Shen eval function (see eval below). *macros* contains the list of current macros. The function macroexpand applies the list of current macros to the top level of an expression.

Vectors

In Kl and Shen, (1 dimensional) vectors fulfil all the tasks of arrays. In Kl there are only 4 primitive functions concerned with vectors.

1. absvector - which creates an absolute vector with N addresses.
2. address-> - which places an element E into a vector address N in vector V and returns the resultant vector.
Note this function is destructive and hence absolute vectors are mutable.
3. <-address - which extracts an element from a vector address N in vector V.
4. absvector? which recognises a vector.

All of these functions are accessible from Shen but only the last has a type, since Kl vectors may have elements of any type. It is an error to try to access an address beyond the limit of the vector or to supply any number which is not a whole number between 0 and the limit of the vector.

Note that absvector? plugs into the native vector recognition routine. It is possible for this function to return true to objects which are not vectors under other platfroms. In Lisp for instance, strings are vectors of characters and the default representation of tuples in Shen is by use of vectors. The function vector? (see below) is better behaved in this respect.

Vectors are numbered from 0; so (absvector 100) creates a vector with addresses from 0 to 100 and the limit is 100.

If a vector V is created and nothing has been stored address V(N) then the result returned by (<-address V N) is platform dependent.

In Shen absolute vectors are partioned into standard and non-standard vectors. In Shen, by convention, when a standard vector V is created two operations are performed.

1. The 0th element of V is allocated a positive integer which indicates the size (limit) of the vector.
2. Every other address is allocated the failure object designated by the expression (fail).

The function vector of type number --> (vector A) creates such a vector. If the 0th element of V is not a non-negative integer then the vector is non standard. Hence access to the user contents of the standard vector begins with the index N = 1.

The shortest standard vector is created by expression (vector 0) which creates a standard vector which contains one element whose contents are zero. This called the empty vector and is significant in pattern-matching over vectors (see next section). It is impossible to write to the empty vector in a type secure way and hence under type security, the empty vector is immutable. Shen permits the user to write <> as shorthand for the empty vector. The type of the empty vector is (vector A).

In Kl the primitive function <-address in (<-address N V) accesses the Nth element of the vector V including the 0th element which indicates the limit of the vector. This function has no type because the 0th element of a standard vector V will be an integer irrespective of the rest of the vector.

The type secure version of <-address is the function <-vector of type (vector A) --> number --> A, which accesses the contents of the standard vector. The operation (<-vector V 0) results in an error. If <-vector accesses the failure object then an exception is returned as an error message. Otherwise <-vector behaves exactly as does <-address. The function limit has the type (vector A) --> number and accesses the 0th element of a standard vector V. Both are simply defined in terms of <-address.

A 2-dimensional array is simply a vector of vectors and therefore has a type which is an instance of (vector (vector A)). Note that a vector of vectors may incorporate vectors of different sizes (the result is called a jagged array).

For changing the contents of a vector, the function address-> in (address-> X N V) places X in the Nth element of V. The function vector-> is type secure version of address-> of type ((vector A) --> number --> A --> (vector A) and raises an error for N = 0.

The function vector? returns true iff the argument is a standard vector.

Standard Vectors and Pattern Matching

The polyadic function @v can be used to add elements to a vector or to create a vector. The vector consisting of the numbers 1, 2 and 3 can be created by (@v 1 2 3 <>). The polyadicity is a syntactic fiction maintained by the Shen reader. (@v 1 2 3 <>) is parsed as (@v 1 (@ v 2 (@v 3 <>))) exactly in the manner of @p.

The semantics of @v is as follows: given (@v X Y), if Y is a standard vector of size N, then @v creates and outputs a new vector V of size N+1 and places X in the first position of V, copying the Ith element of Y to the I+1 element of V.

Shen accepts pattern-matching using @v. The following function adds 1 to every element of a vector

(define add1
xx{(vector number) --> (vector number)}
xx<> -> <>
xx(@v X Y) -> (@v (+ X 1) (add1 Y)))

(@v X Y) matches the Y to the tail of the vector - so that matching (@v X Y) to <1 2> matches Y to <2> not 2. Note because @v uses copying, pattern-directed vector manipulation is non-destructive and treats vectors as immutable objects.

Non Standard Vectors and Tuples

A non-standard vector is a vector where the 0th element is not a non-negative integer. The utility of non-standard vectors is that they can be used to construct other data types like tuples.

Tuples in Shen are not primitive to Kl but are represented in the Shen sysfile code as non-standard three element vectors where the 0th element is a tag tuple indicating that the vector represents a tuple and the next two elements are the first and second elements of that tuple. The basic operations such as @p, fst and snd are easily definable as is the recognisor tuple?.

Because tuples are defined internally using Kl primitives as non-standard vectors, the type system of Shen does not recognise them as standard vectors and hence, though they can be manipulated using vector operations they cannot be manipulated like this in a type secure way (i.e. with type checking enabled). Only @p, fst and snd can be used. The significance of this is that with type checking enabled, tuples are immutable objects; that is they cannot be destructively changed with address-> or vector->, merely interrogated for their parts or combined into other data structures.

The identity of tuples with non-standard vectors is purely one of convenience. In platforms which support tuples as a native type, the native type may be used. However the following equations must hold.

  1. The recognisor tuple? returns 'true' to tuples, but not to any other type checked Shen datatype.

  2. The tuple (@p 1 2) is printed off as such.

  3. @p is a two place function which associates to the right; (@p a b c) is just (@p a (@p b c)).

  4. (fst (@p a b)) = a and (snd (@p a b)) = b for all a and b.

Equality

= tests for equality between objects, returning either true or false. Note unlike Qi II, = will work on all vectors generally; two vectors V1 and V2 are equal iff for all I (<-address V1 I) = (<-address V2 I) . However comparison of closures or streams will return false under =. Lists, atoms, tuples and vectors are the proper range of this function. Kl is case sensitive, (= a A) returns false.

Printing

All the print routines in Shen are defined (ultimately) in terms of the primitive pr which takes a string and a stream and prints the string to the stream and returns the string. If the stream argument is omitted then the stream defaults to the terminal.

Note: there is a motion to add write-byte to this list of primitives and to move pr from the status of primitive to defined function. As of writing (September 10, 2011), this has not yet been agreed nor has the status of control characters with respect to the printer.

As with Qi, Shen includes the print, error and output statements in much the same way as in Qi II; they are explained here http://www.lambdassociates.org/Book/page042.htm. print prints its argument exactly as it appears and returns that argument, output prints a string using slots if needed and returns a string. Note in Shen, the slot ~S is supported as well as ~A. The effect of ~S (as in Common Lisp) is that string quoting is preserved. ~% forces a new line. Thus (output "~A" "hello there") prints hello there but (output "~S" "hello there") prints "hello there".

Note that output returns a string as a value which is the same string that it prints.

Shen provides an extra formatting command ~R, which prints the argument using ()s rather than []s, which is useful for printing logical and mathematical formulae.

All three functions depend on make-string which builds a string from a template and a series of arguments. Thus (make-string "~A loves ~A and ~A" John Mary Tim) returns "John loves Mary and Tim". Use make-string to coerce an arbitrary object into a string. To write to a stream, you can use pr and make-string combined as in (pr (make-string "~A loves ~A and ~A" John Mary Tim) <stream>).

There is also a function (nl N) which prints off N new lines and returns zero. If the argument is omitted, then (nl) prints off a single new line.

Note that print, error, output and make-string are all polyadic and that therefore they are syntactic fictions which are represented by internal functions of a fixed arity.

Vectors and lists are printed off using <...>. The vector whose elements from address 1 to the end are 1, 2 and 3 is printed off as <1 2 3>. Vectors or lists of more than 20 elements have the remaining elements printed off under 'etc' e.g a standard vector of the first 100 positive integers would be printed off as

<1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20... etc>

The global *maximum-print-sequence-size* controls this feature and should always be set to a positive whole number.

Note: vectors may in future be printed off in (@v ...) notation so that they can be read back. As of writing (September 10, 2011), this has not yet been finalised.

Non-standard vectors are printed off in a special manner. For example, in porting Shen to a platform which lacked symbols as a basic datatype, the programmer can define symbol as a new immutable datatype comprised of a non-standard vector whose zeroth address holds a tag indicating that the vector represents a symbol and whose first address holds a string representing the symbol.

In this case the programmer can indicate in the non-standard vector itself how the object is supposed to be printed off by making the tag into a print function. This is called a print vector in Shen. Thus the representation of the tuple (@p a b) in Shen is:

tuple a b

The Shen printer, when confronted with a non-standard vector V whose 0th address contains a non positive integer F, uses F as as a formatting function - the function that determines how the non-standard vector is printed. This formatting function tells Shen how the print representation of V is to be bundled into the string which is eventually printed off and hence how that print vector will appear when printed. Hence the tuple function will map the tuple to a string. In Shen it is defined as follows.

(define tuple
X -> (make-string "(@p ~S ~S)" (fst X) (snd X)))

If the non-standard vector has no associated print function then it is printed off as a normal vector but with the 0th element included.

The global *printer* is set by default to a unit list containing the 1-place function shen-sys-print. This global is tied into the pr function and the contents are applied to the input of this function. As such, it affects the printing of all outputs. Functions placed in this global should always be of type string --> string.

Generic Functions

This section deals with the generic functions; defun, let, lambda, eval-without-reader-macros, freeze and thaw.

defun

defun in Kl requires little explanation except to note that all functions defined using it must sustain currying and that the namespace model is dual (see the document on porting for more on this). There is no necessity to support nested definitions in the manner of Scheme.

lambda

lambda in Kl is deliberately spartan; following lambda calculus in defining an abstraction that accepts only one argument. let is strictly otiose being definable by the equation.

(let X Y Z) = ((lambda X Z) Y)

However this form is less natural and less familiar than the traditional local assignment and is not definable except by a macro. Note that in Shen (lambda X X) is legal.

eval-without-macros

eval-without-macros essentially compiles Shen into the native platform. Thus given the list [cons 1 []] as input; eval-without-macros evaluates this to an expression in the platform language that when evaluated by that language returns the list (1) - printed as [1] by Shen.

The function eval is not a primitive in Kl, but eval-without-macros is. eval in Shen applied to an an expression E returns the expression E' that results from evaluating the expression E'' where E'' results from E by the replacement of all the square brackets in E' by round ones (see www.lambdassociates.org/Book/page131.htm). Thus (eval [+ 1 2]) evaluates to what (+ 1 2) evaluates to - which is 3.

The difference between eval and eval-without-macros is that in Shen, when eval is evoked, the input is first macroexpanded by tree walking the expression before being passed to eval-without-macros. The reason for this is to allow the user to evaluate expressions using the same syntax as the Shen repl, which uses these macros in the reader to permit polyadic functions.

eval-without-macros evaluates any Shen expression to its normal form, including Shen definitions, by compiling that expression to native code and executing it. Constructions like Shen-YACC and Shen Prolog which use macros to compile them, will not work under this function though they will work under eval.

freeze and thaw

The function freeze freezes the computation represented by its argument which is not evaluated. Effectively freeze returns a continuation; defined by Wikipedia as "[the reification of] an instance of a computational process at a given point in the process's execution" see http://www.lambdassociates.org/Book/page153.htm. The counterpart to freeze is the function thaw which unfreezes the computation and returns the evaluated result. thaw is not primitive being readily defined in Kl as

(defun thaw (F) (F))

Type Declarations

The primitive type is used to annotate any s-expression. The notation is

(type <S-expr> <type>)

as in (type (+ 1 2) number). This is not in any sense a normal function since the type <type> is discarded in evaluation (i.e. (type <S-expr> <type>) evaluates to the normal form of <S-expr>. The type declaration is not ignored by the type checker which actually expects the expression to have the type attached to it and will signal an error if this is not so. Note that if <type> should be monomorphic

The degree to which any implementation of Kl uses the information provided by this annotation will depend on the platform and the implementation. Note that later versions of the Shen compiler will automatically generate copious type annotations in Kl which can be used by Kl implementors or platform providers of Shen to optimise the resultant code.

External Global Variables

The following variables are external to the Shen package. The variable *language* is bound to a string which indicates the platform under which Shen is running. For Common Lisp this is "Common Lisp". The global *implementation* is bound to the implementation; e.g. "CLisp 2.45". *port* and *porters* are bound to the port version on that platform and the authors of the port. The variable *stinput* is bound to the standard input stream. *home-directory* is bound to a string which denotes the directory relative to which all files are read or written. *version* is bound to a string detailing the release. *maximum-print-sequence-size* determines the maximum size to which a sequence is fully printed. *printer* is bound to the list of functions used to print off any output. *macros* contains the list of current macros.

There is a motion to include *stoutput* to this list. As of 10th September 2011, this has not been finalised.

Property Vectors and Hashing

Like Qi, Shen includes property lists. However they are not implemented using CL property lists, but instead rely on a hashing function into a standard vector *property-vector* which is internal to the Shen package and by default set to hold 20,000 elements.

The expression (put Mark sex male) creates a pointer sex from Mark to the object male. The expression (get Mark sex) retrieves male. If no pointer exists from the object then get returns an error.

The functions put and get index into the vector by converting their first argument A into a number via the hashing function (see below) that involves summing the code point values of constituent components of A. This hashing function h may be a many-one function, hence *property-vector* is a vector of lists and a list search is used at position (h A) in V to locate the correct value. Current performance for hash coding an object is 16,000 hashes per second under CLisp using a 1.3 Ghz Pentium for a 10-character object. This function is subject to any improvements and changes that are consistent with the language specification (see optimising the system functions in the document on porting).

In Qi, get-prop was a 3-place function (get-prop X P Y) where the third argument was returned if no pointer P existed from X. In Shen, if no pointer exists then an error is returned. Using exception handling, assuming X and P are well-defined, get-prop is easily defined in Shen.

(get-prop X P Y) = (trap-error (get X P) (/. E Y))

Unline CL, arguments to get and put can be objects of any type provided their constituents are representable within the default string set. If the hash value (h A) of the argument A exceeds the limit of the property list vector, the modulus M of (h A) to the size of the vector is taken and the data is placed in the Mth address of the vector.

put and get are actually polyadic functions which appear as such by the grace of the Shen reader. There is an optional final argument which should be a standard vector; thus (put Mark sex male (value *myvector*)) will use the vector *myvector* as the hashing table. If the optional argument is missing then *property-vector* is used.

The hash function hash takes as arguments a Shen object X and a positive number N (which need not be whole) and returns a number M between zero and N, where M represents the hash value of X within that interval.

Error Handling

The basic error function is simple-error which takes a string and returns it as an exception. A more useful version is error which, as in Qi II, shares the same syntax as output, except that object returned is an exception. Shen includes a basic type secure exception handling mechanism drawn from the Qi library called trap-error. trap-error receives two arguments, an expression E and a function F. If E evaluates to normal form E', then E' is returned. If an exception C is raised it is sent to F and (F C) is returned. The function error-to-string allows exceptions to be turned into strings and printed or examined. This function is defined only for exceptions and should return an error for any other type of object.

Note that trap-error is type secure and its type is A --> (error --> A) --> A.

Some examples

(trap-error (/ 1 0) (/. E -1)) gives -1 : number

(trap-error (/ 1 0) (/. E (error-to-string E))) gives the error message as a string "division by zero".

Numbers

The numbers in Shen (and Kl) are as follows.

1. Numbers may be positive or negative or zero.
2. Numbers are either integers or floats; there are no complex numbers.
3. +,-,/,* are 2-place and operate on floats and integers. Float contagion is used; if one argument is a float, then the result is a float. / applied to two integers A and B produces an integer if B is a divisor of A otherwise a float.
4. >=, <, <=, >, = operate over all numbers (> 3.5 3) is meaningful and true. (= 1 1.0) is true.
5. The maximum size of any integer or float and the precision of the arithmetic is implementation dependent.

The BNF is

<number> := <sign> <number>
<number> := <digit> <digits> . <digit> <digits> | <digit> <digits>
<digits> :=
e | <digit> <digits>
<digit> := 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
<sign> := + | -

This is deliberately kept simple. There is no distinction between bignums, fixnums etc. There is already a standard maths library developed by Dr Willi Riha which will load in a range of numeric functions including trigonometric functions and complex numbers. There may be packages which offer a richer range of types for numbers (fixnum and bignum etc.) with much greater scope for optimisation within the compiler and which are platform specific. However these will be plugins, not in the language standard or in the standard maths library.

Shen uses simple cancellation when reading signs; thus --3 is read as 3 and ---3 as -3. +3 is just 3. Note (- 3) returns a closure that subtracts its argument from 3.

The maximum size of any number and the precision of the arithmetic are platform dependent.

Shen 1.7 does not use e-notation which is likely to appear in Shen 2.0. As of 10th September, 2011, this has not been finalised.

For users running Shen on other platforms, it is highly likely that the platform has already defined >, +, -, *, /, <, >=, =, etc. in a way that is inconsistent with the semantics allotted to these symbols in Kl. The .kl sources use these symbols in their native dress because implementors wishing to implement Kl will want to use '*' for multiply etc. But platform providers compiling .kl sources to another language and who experience a name clash with a native function, should read carefully the notes on porting in the porting document.

Shen includes a random number generator (which works up to three figures and will be upgraded to handle numbers of any size consistent with the platform) which given a positive whole number N generates a random integer between and inclusive of 0 and N.

Floats and Integers

Division between integers in Shen will yield a float if the divisor is not a whole divisor of the numerator. In some languages, e.g. Python, (/ 3 2) gives 1 (integer division) and not 1.5. It was argued as to whether Shen should follow suit. It was felt that in such cases it was more intuitive to return an answer with a fractional component - most people would consider 3/2 = 1 as false and for people using Shen to do wages calculations etc. 'street' division is more appealing. Integer division has been placed in the standard maths library.

An interesting question concerns the comparison of floats and integers is (= 1 1.0) true or not? Mathematically the decimal notation is simply a shorthand for a sum.

i.e. 1.43 = 1 + 4/10 + 3/100

Hence 1.0 = 1 + 0/100 = 1

Therefore if = represents identity then (= 1 1.0) is true. In Shen (= 1 1.0) is true and 1.0 entered to the top level is returned as 1.

A contrary approach is taken in Prolog where '1 = 1.0' is false. In ML the comparison is meaningless (returns an error) because 1.0 and 1 belong to different types - real and int. This is wrong. Computing has fogged the issues here and committed the traditional error of confusing use and mention in its treatment of integers and floats, in other words, between numbers and the signs we use to represent them. We should not confuse the identity of two numbers with the identity of their representation. If we want to say that 1.0 is not an integer and 1 is, we commit an error, because 1.0 = 1; unless we mean by 'integer' an expression which is applied to the representation itself i.e. '1.0'. In which case the expression 'integer' is predicated of something which is a numeral, in computing terms, a string. In Shen, the integer? test is taken as predicating of numbers and 1.0 is treated as an integer.

The Timer

The get-time function is a 1 place function that returns a floating point number. The range of arguments supported by this function is implementation dependent as are the results.

For the argument real it should return a value representing real time i.e. two invocations of this function on real time should be seperated by an amount of time equal to the difference of their values. As such get-time can be used to measure the wall time elapsed between calls. The exact value of the results returned is implementation dependent as are the number of places used.

It is optional to set this function to record not real time, but run time; that is the actual CPU time. The argument run should return a value that is implementation dependent, but the difference between the two invocations of this function on run time should be seperated by an amount of time equal to the CPU time expended by Shen between the two calls. The exact value of the results returned is implementation dependent as are the number of places used.

The argument date should return a list of numbers representing the year, month, day, hour, minute and second.

Note that not all platforms may support the full range of arguments to this function and it is permitted to exclude date, but either run or real should be supported. In either case the timer function built into Shen should reflect in the information provided whether run time or real time is being measured. If an argument is not supported then an appropriate error should be raised.

There is a motion to include a compulsory argument that returns Coordinated Universal Time, as of September 2011, this has not yet been finalised.

Comments in Shen

One flaw in Qi II and Qi I was that comments were made using \ ..... \ which meant that comments could not themselves be commented out and the syntax clashed with that for characters which also used \. Shen follows the C/C++ convention of starting comments with \* and ending with *\. Comments can include comments.

Extracting Code

The dump function translates and dumps the code generated from a Shen file. This is a 2 place function that receives the name of a Shen file as a string and the desired object language as a string. It is a requirement that Kl code can be generated from this command, but no other languages are mandatory. The object code is placed in a file of the same name but with an extension corresponding to the name of the object language and the corresponding string returned.

As of September 2011, the following conventions are used for the second argument

Extension Language Must be supported?
"kl" Kl yes
"cl" Common Lisp no
"scm" Scheme no
"js" Javascript no

Special Forms

The following are all special forms in Shen and have their own hard-coded type rules; they are not type checked in curried form and do not sustain type secure currying.

@p @s @v cons lambda let type where input+ define defmacro datatype /. synonyms open

Abstract Types for Kl and Shen

A basic insight of C20 philosophy of mathematics and model theory has been the idea that abstract objects are defined by the formal theories that explain them and it is possible that a formal theory can have several models; all equally valid. Such as turned out to be the case with the Peano axioms which received interpretations with respect to ordinal numbers, cardinal numbers (and later Church and Barendregt numerals). This development formed the basis of a famous essay by the American philosopher W.V.O. Quine on ontological relativity and in computing gave rise to the idea of an abstract data type.

Several times in this document and also in the companion document on porting, we refer to basic datatypes like symbols and discuss alternative representations of them as either primitives or tagged vectors or whatever. It is important to grasp that Shen does not insist on a specific representation and what is important is that the observable properties of the type be maintained. This can be more precisely expressed by saying that certain basic logical formulae must remain invariantly true under all representations or all platforms. Such a requirement can be given teeth by stating those formulae.

The following equations represent an initial assay to explicate the abstract equations of Shen and Kl which should be maintained under porting. The metalanguage of the equations is a typed second order logic with a simple vocabulary of metasymbols. The types x : string, x : vector are self-explanatory. natnum is the type of natural numbers. ^ is the concatenation operator which is overloaded for both strings and symbols e.g. "ab"^"cd" = "abcd", ab ^ cd = abcd. The symbol bottom (^ ) indicates the error condition. Bolded expressions are object level symbols of Shen and Kl.

Lists

 

"x cons?(x) = true v cons?(x) = false

"x,y cons(x,y) ^

"x cons?(x) = true  $y,z x = cons(y,z)  

 

"x,y hd(cons (x,y)) = x

"x,y tl(cons (x,y)) = y

 

"x cons?(x)  ~string?(x) 

"x cons?(x)  ~vector?(x)

"x cons?(x)  ~number?(x) 

"x cons?(x)  ~tuple?(x) 

 

Strings

 

"x string?(x) = true v string?(x) = false

"x : string string?(x) = true

 

"x,y : string cn(x,y) = x ^ y

"x : string cn(x,””) = x = cn(“”,x)

"x : string pos(x,0) = y $y,z y ^ z = x & unitstr(y)  

"x : string "n : natnum n > 0 pos(x,n) = pos(tlstr(x), n-1)

"x y : string tlstr(x) = y $z z ^ y = x & unitstr(z)

 

"x : string unitstr(x) "y,z : string y ^ z = x (x = y & z = “”)

 

"x string?(x)  ~vector?(x)

"x string?(x)  ~number?(x) 

"x string?(x)  ~tuple?(x) 

 

Tuples

 

"x tuple?(x) = true v tuple? (x) = false

"x,y @p(x,y) ^

"x tuple?(x) = true  $y,z x = @p(y,z)  

 

"x,y fst(@p (x,y)) = x

"x,y snd(@p (x,y)) = y

 

"x tuple?(x)  ~vector?(x) 

 

Symbols

 

"x symbol?(x) $y : string intern(y) = x

"x ~(string?(x)) intern(x) = ^

"x,y : string intern(x^y) = intern(x)^intern(y)

"x : string str(intern(x)) = x

 

Vectors

 

"x vector?(x) = true v vector?(x) = false

"n : natnum vector?(vector(n)) = true

"x absvector?(x) = true v absvector?(x) = false

"n : natnum absvector?(absvector(n)) = true

 

vector(0) = <>

 

"n : natnum limit(vector(n)) = n = <-address(vector(n), 0)

"x : vector <-address(x, n) = fail() <-vector(x, n) = ^

"x : vector <-vector(x, 0) = ^

"x : vector <-address(x, n) fail() & n 0 <-vector(x, n) = <-address(x, n)

"x : vector "y "n : natnum  n 0 address-> (x, n, y) = vector-> (x, n, y)

"x : vector "n : natnum "y <-address(address->(x, n, y), n) = y

 

Exceptions

 

"x,y string?(error-to-string(^)) = true

"x x ^ (error-to-string(x)) = ^

"x,f trap-error(x,f) = x  x ^

"f trap-error(^, f) = f(^)

 

 Functions Present in Qi II; dropped from Shen

Most of these functions discarded from Shen and present in Qi II either relate specifically to Common Lisp or to the maths library. The biggest change is dropping the rule function in Qi II. This function appeared in Qi II and is undoubtedly very powerful but was dropped in Shen for these reasons.

1. Rule closures are not type secure as explained in here http://www.lambdassociates.org/Book/page328.htm.
2. Their implementation is rather messy and inherently not efficient.
3. They are mainly of interest to those studying computational logic.
4. Much of their functionality, together with efficiency and type security, can be got from using macros.

The following functions are omitted.

array?, assoc-type, character?, congruent?, complex?, debug, echo, delete-file, float?, get-array, get-prop, make-array, m-prolog, newsym, newvar, opaque, put-array, put-prop, quit, rational?, real?, rule, speed, s-prolog, transparent, read-char, read-file-as-charlist, save, strong-warning, sugar, unassoc-type, undebug, unsugar, warning

Functions Present in Shen; not in Qi II

Function

Type

Description

@s   see above
@v   see above
absvector   see above
absvector? A --> boolean recognisor for vectors
address->   see above
<-address   see above
adjoin A --> (list A) --> (list A) cons the first argument X to the second Y if X is not an element of Y
arity A --> number returns the arity of a function or -1 if the argument has no arity
average number --> number --> number return the average of two numbers
bound? symbol --> boolean returns true iff the argument is globally bound

byte->string

number --> string

returns a unit string given the ASCII code

close (stream A) --> (list B) see above
defmacro   see above
defprolog   see above
dump string --> string --> string see above
error-to-string error --> string see above
eval-without-macros   see above
floor number --> number floors a number
function (A --> B) --> (A --> B) see above
get-time symbol --> number see above
get   see above
hash A --> number --> number returns a hashing of the first argument subject to the restriction that the encoding must not be greater than the second argument
hdstr string --> string returns the first element of a non-empty string
hdv (vector A) --> A returns the first element of a standard vector
intern   see above
lambda   see above
limit (vector A) --> number see above
mod number --> number --> number the modulus of X and Y
nl number --> number see above
open   see above
package   see above
pos string --> number --> string see above
pr string --> (stream out) --> string see above
read-byte (stream in) --> number see above
read-file-as-bytelist string --> (list number) read a file as a list of bytes
read-file-as-string string --> string read a file to a string
str A --> string see above
simple-error string --> A see above
systemf

symbol --> (list symbol)

gives the symbol the status of an identifier for a system function; its definition may not then be overwritten. Returns the list of symbols with this status.
tl   returns the tail of a list; if the list is empty the result is implementation defined.
tlstr string --> string returns the tail of a non-empty string
tlv (vector A) --> (vector A) returns the tail of a standard vector using non-destructive copying
vector-> (vector A) --> number --> A --> (vector A) see above
<-vector (vector A) --> number --> A see above
vector? A --> boolean recognisor for standard vectors

Shen and Qi II: differences at a glance

Shen is almost 1/3 the size of Qi II with respect to the primitive instructions needed to implement it and Kl is nearly 20X smaller than Common Lisp with respect to the size of the instruction set. However Shen is actually richer in system functions than Qi II under Common Lisp and more powerful.

The following table summarises the differences as of August 2011.

Qi II Shen
1. Defined in 118 primitive functions. 1. Defined in 45 primitive functions.
2. Streams not part of the language. 2. Streams in standard and used for I/O to files.
3. Cannot read binary files. 3. Uses byte streams to read binary files.
4. Property lists based on CL calls. 4. Property lists defined in Shen using hashing.
5. Equality not defined over vectors. 5. Equality defined over vectors. Tuples are non-standard vectors and immutable if used in a type secure environment.
6. Characters included. 5. Characters not used.
7. rule function present. 7. rule function dropped.
8. No provision for exception handling. 8. Type secure exception handling.
9. Comments not nestable. 9. Comments nestable.
10. Maths follows CL. 10. Maths follows Shen spec.
11. Runs under CL only. 11. Runs under CL, Javascript, and Scheme (2011).
12. Pattern matching over lists and tuples. 12. Pattern matching over lists, tuples, strings and vectors.
13. No packages. 13. Packages.
14. Print macros affect non-string tokens only. 14. Print macros affect all string tokens including error messages from Shen.
15. Commercial license with book. 15. Free $ universal license. No non-standard forks. Any implementation must meet Shen standard.

Mark

copyright (c) 2011, Dr Mark Tarver
dr.mtarver@gmail.com