Saturday, September 28, 2013

Terms with Variables

Cosineau and Mauny present a type suitable for modeling terms with variables convenient for applications involving variables and substitutions. It is defined like this:
(*term_types.ml*)
type (α, β) term = 
| Term of α * (α, β) term list
| Var of β

Analyzing syntax

To proceed, let us restrict our attention to the type (string, string) term. The concrete syntax we use for such expressions is parenthesized.
  • a (t1, ..., tn) is the syntax of Term (a, [t1; ...; tn])
We disambiguate variables from constants by representing constants as functions applied to an empty list of arguments
  • a (b (), c) is the syntax of Term ("a", [Term ("b", []); Var "c"])
We start towards a function to analyze syntax. First the lexer.
(*lexer.mll*)
{
  open Parser  (*The type token is defined in parser.mli.*)
}

let alpha=['a'-'z' 'A'-'Z']

rule token = parse
    [' ' '\t' '\n']         (* skip blanks *) { token lexbuf }
  | '('                                             { LPAREN }
  | ')'                                             { RPAREN }
  | ','                                              { COMMA }
  | ((alpha)(alpha)*) as s                         { STRING s}
  | eof                                                { EOI }
Now the parser.
/*parser.mly*/

%token  STRING
%token LPAREN RPAREN COMMA EOI

%start main
%type <(string, string) Term_types.term> main
%%
main:
 | exp EOI { $1 }
 ;
exp:
 | tag exp_more           
     { 
       match $2 with
       | None -> Term_types.Var $1
       | Some l -> Term_types.Term ($1, l)
     }
 ;
exp_more:
 | /*nothing*/ { None }
 | LPAREN exp_list RPAREN { Some $2 }
;
exp_list:
 | /*nothing*/ { [] }
 | exp exp_list_more { $1::$2 }
 ;
exp_list_more:
 | /*nothing*/ {[]}
 | COMMA exp exp_list_more { $2::$3 }
 ;
tag:
 | STRING { $1 }
 ;
With these pieces in place we can write the analyzer term_of_string like so
(*term.ml*)

let term_of_string s =
  let parse_buf lexbuf =
    try Parser.main Lexer.token lexbuf
    with 
    | Parsing.Parse_error ->
      begin
        let curr = lexbuf.Lexing.lex_curr_p in
        let line = curr.Lexing.pos_lnum in
        let cnum = curr.Lexing.pos_cnum - curr.Lexing.pos_bol in
        let tok = Lexing.lexeme lexbuf in
        raise 
          (Failure
             (Printf.sprintf 
                "file \"\", line %d, character %d\n\Error : Syntax error \"%s\"" line cnum tok))
      end
  in parse_buf (Lexing.from_string s)
Reverse parsing is done with string_of_term:
let rec string_of_term (t:(string, string)term) : string = 
  let string_of_list (f:α -> string) (l:α list) : string = 
    "(" ^ String.concat "," (List.map f l) ^ ")"  in
  match t with
  | Term (s, tl) -> Printf.sprintf "%s" (s^(string_of_list string_of_term tl))
  | Var s -> s
A simple function for printing a term then goes like this:
let print_term (t:(string, string) term) : unit = Printf.printf "%s" (string_of_term t)
For example, this program let tt = term_of_string "a(b(), c)" in print_term tt prints "a(b(), c)" recovering the input as we'd hope.

Traversal

The function term_trav is a workhorse for doing computations over terms. It generalizes all functions defined by recursion on the structure of a term.
(* [term_trav] is a generalization of all functions defined by
   recusion on the structure of an [(α, β) term] - a homomorphism.

   The definition of [term_trav] below gives the following signature.

   {[
   val term_trav : (α * β -> γ) ->
   (γ -> β -> β) -> β -> (δ -> γ) -> (α, δ) term -> γ
   ]}

   Proof:

   Assume [term_trav] to operate on arguments of type [(α, δ)
   term]. It "returns" either via [f] or [v], and so [f], [v] and
   [term_trav] must share the same return type, [γ] say.

   Since the intermediate list results from recursive calls to
   [term_trav] then it must have type [γ list]. This means [g] must
   be of type [γ -> β -> β] for some type [β] which fixes [x] to
   [β] whilst completing [f] as [α * β -> γ].

   Lastly [v] takes arguments of type [δ] meaning it types to [δ ->
   γ].

*)
let rec term_trav f g x v = function
  | Term (a, tl) -> 
    let l = List.map (term_trav f g x v) tl in
    let res = List.fold_right g l x in
    f (a, res)
  | Var b -> v b
For a trivial example, we can express a function to compute the "size" of a term this way:
let term_size (t:(α, β) term) : int = 
  (term_trav (fun (_, s) -> s + 1) (fun x y -> x + y) 0 (fun _ -> 1))
    t
I appreciate there's an efficiency gain to be had by eliminating the intermediate list in term_trav but the resulting function definition is more challenging to reason about (well, at least for me).