Saturday, September 28, 2013

Terms with Variables

Cosineau and Mauny present a type suitable for modeling terms with variables convenient for applications involving variables and substitutions. It is defined like this:
(*term_types.ml*)

type ('a, 'b) term = 
| Term of 'a * ('a, 'b) term list
| Var of 'b

Analyzing syntax

To proceed, let us restrict our attention to the type (string, string) term. The concrete syntax we use for such expressions is parenthesized.
  • a (t1, ..., tn) is the syntax of Term (a, [t1; ...; tn])
We disambiguate variables from constants by representing constants as functions applied to an empty list of arguments
  • a (b (), c) is the syntax of Term ("a", [Term ("b", []); Var "c"])
We start towards a function to analyze syntax. First the lexer.
(*lexer.mll*)
{
  open Parser  (*The type token is defined in parser.mli.*)
}

let alpha=['a'-'z' 'A'-'Z']

rule token = parse
    [' ' '\t' '\n']         (* skip blanks *) { token lexbuf }
  | '('                                             { LPAREN }
  | ')'                                             { RPAREN }
  | ','                                              { COMMA }
  | ((alpha)(alpha)*) as s                         { STRING s}
  | eof                                                { EOI }
Now the parser.
/*parser.mly*/

%token  STRING
%token LPAREN RPAREN COMMA EOI

%start main
%type <(string, string) Term_types.term> main
%%
main:
 | exp EOI { $1 }
 ;
exp:
 | tag exp_more           
     { 
       match $2 with
       | None -> Term_types.Var $1
       | Some l -> Term_types.Term ($1, l)
     }
 ;
exp_more:
 | /*nothing*/ { None }
 | LPAREN exp_list RPAREN { Some $2 }
;
exp_list:
 | /*nothing*/ { [] }
 | exp exp_list_more { $1::$2 }
 ;
exp_list_more:
 | /*nothing*/ {[]}
 | COMMA exp exp_list_more { $2::$3 }
 ;
tag:
 | STRING { $1 }
 ;
With these pieces in place we can write the analyzer term_of_string like so
(*term.ml*)

let term_of_string s =
  let parse_buf lexbuf =
    try Parser.main Lexer.token lexbuf
    with 
    | Parsing.Parse_error ->
      begin
        let curr = lexbuf.Lexing.lex_curr_p in
        let line = curr.Lexing.pos_lnum in
        let cnum = curr.Lexing.pos_cnum - curr.Lexing.pos_bol in
        let tok = Lexing.lexeme lexbuf in
        raise 
          (Failure
             (Printf.sprintf 
                "file \"\", line %d, character %d\n\Error : Syntax error \"%s\"" line cnum tok))
      end
  in parse_buf (Lexing.from_string s)
Reverse parsing is done with string_of_term:
let rec string_of_term (t:(string, string)term) : string = 
  let string_of_list (f:'a -> string) (l:'a list) : string = 
    "(" ^ String.concat "," (List.map f l) ^ ")"  in
  match t with
  | Term (s, tl) -> Printf.sprintf "%s" (s^(string_of_list string_of_term tl))
  | Var s -> s
A simple function for printing a term then goes like this:
let print_term (t:(string, string) term) : unit = Printf.printf "%s" (string_of_term t)
For example, this program let tt = term_of_string "a(b(), c)" in print_term tt prints "a(b(), c)" recovering the input as we'd hope.

Traversal

The function term_trav is a workhorse for doing computations over terms. It generalizes all functions defined by recursion on the structure of a term.
(* [term_trav] is a generalization of all functions defined by
   recusion on the structure of an [('a, 'b) term] - a homomorphism.

   The definition of [term_trav] below gives the following signature.

   {[
   val term_trav : ('a * 'b -> 'c) ->
   ('c -> 'b -> 'b) -> 'b -> ('d -> 'c) -> ('a, 'd) term -> 'c
   ]}

   Proof:

   Assume [term_trav] to operate on arguments of type [('a, 'd)
   term]. It "returns" either via [f] or [v], and so [f], [v] and
   [term_trav] must share the same return type, ['c] say.

   Since the intermediate list results from recursive calls to
   [term_trav] then it must have type ['c list]. This means [g] must
   be of type ['c -> 'b -> 'b] for some type ['b] which fixes [x] to
   ['b] whilst completing [f] as ['a * 'b -> 'c].

   Lastly [v] takes arguments of type ['d] meaning it types to ['d ->
   'c].

*)
let rec term_trav f g x v = function
  | Term (a, tl) -> 
    let l = List.map (term_trav f g x v) tl in
    let res = List.fold_right g l x in
    f (a, res)
  | Var b -> v b
For a trivial example, we can express a function to compute the "size" of a term this way:
let term_size (t:('a, 'b) term) : int = 
  (term_trav (fun (_, s) -> s + 1) (fun x y -> x + y) 0 (fun _ -> 1))
    t
I appreciate there's an efficiency gain to be had by eliminating the intermediate list in term_trav but the resulting function definition is more challenging to reason about (well, at least for me).