Sunday, October 19, 2014

Tail-recursion

Tail-recursion

Stack overflow refers to a condition in the execution of a computer program whereby the stack pointer exceeds the address space allocated for the stack. The usual result of "blowing the stack" is swift and brutal abnormal termination of the program.

The amount of memory allocated by the operating system for a given program's stack is finite and generally little can be done by the programmer to influence the amount that will be made available. The best the programmer can really do is to use what's given wisely.

We can get a sense of the limits of the stack in practical terms with a program like the following.

let rec range s e = 
  if s >= e then [] 
  else s :: range (s + 1) e

let rec loop i =
  let n = 2.0 ** (i |> float_of_int) |> int_of_float in
  try
    let _ = range 0 n in
    loop (i + 1)
  with
  | Stack_overflow -> 
    Printf.printf "Stack overflow at i = %d, n = %d\n" i n
    
let () = loop 0
range is a function that produces the half-open range $\left[s, e\right)$ - the ordered sequence $\left\{s, s + 1, s + 2, \dots, e - 2, e - 1\right\}$. Note that range is defined in terms of itself, that is, it is a recursive function. The idea is to use it in an unbounded loop to build sequences of increasing lengths of powers of $2$ : ${2^0, 2^1, 2^2, \dots}$. We set it off and when we encounter stack overflow, terminate the program gracefully reporting on the power of $2$ found to give rise to the condition. In my case I found that I was able to make $\approx 2^{19} = 524,288$ recursive calls to range before the stack limit was breached. That's very limiting. For realistic programs, one would hope to be able to produce sequences of lengths much greater than that!

What can be done? The answer lies in the definition of range and that thing called tail-recursion. Specifically, range is not tail-recursive. To be a tail-recursive function, the last thing the function needs do is to make a recursive call to itself. That's not the case for range as written as the recursive call to itself is the second to last thing it does before it returns (the last thing it does is to 'cons' a value onto the list returned by the recursive call).

Why being tail-recursive is helpful is that tail-calls can be implemented by the compiler without requiring the addition of a new "stack frame" to the stack. Instead, the current frame can be replaced in setting up the tail-call being modified as necessary and effectively the recursive call is made to be a simple jump. This is called tail-call elimination and its effect is to allow tail-recursive functions to circumvent stack overflow conditions.

Here's a new definition for range, this time implemented with tail-calls.
let range s e = 
  let rec aux acc s e = 
    if s >= e then acc
  else aux (s :: acc) (s + 1) e
  in List.rev (aux [] s e)
With this definition for range I find I can build sequences of length up to around $\approx 2^{26} = 67,108,864$ elements long without any sign of stack overflow which is a huge improvement! At around this point though, my sequence building capabilities start to be limited by the amount of physical memory present on my PC but that's a different story entirely.

Sunday, October 12, 2014

Dimensional analysis (and the units of the universal gas constant 'R')

Dimensional analysis

The problem at hand is to find by dimensional analysis, the SI units of the universal gas constant $R$ (forgive me - whilst this entry is not explicitly about computer programming - it is in fact one of my daughter's homework problems - the obvious relationship to type systems makes it seem to me at least tangentially relevant).

$R$ is defined by the Ideal Gas Law: $PV = nRT$ were $P$ is the absolute pressure of the gas, $V$ is the volume of the gas, $n$ is the amount of substance of gas (measured in moles), and $T$ is the absolute temperature of the gas.

The obvious dimensions are as follows :

$\left[P\right]$: $M \cdot L^{-1}\cdot T^{-2}$, $\left[V\right]$: $L^3$ and $\left[T\right]$: $\Theta$.

Now, one mole of a substance is defined to be $6.0221367\times 10^{23}$ atoms of that substance (Avogardro's number) but even dimensionless numbers can be part of a dimensioned system. The trick is to realize that if one quantity in an equation is "per mole" then so too must be any quantity added to it. Accordingly, if we define a (pseudo) dimension $\Lambda$ for the amount $n$ we can reason that $\left[R\right]$: $M \cdot L^{2} \cdot T^{-2} \cdot \Theta^{-1} \cdot \Lambda^{-1}$. This is enough for us to say the fundamental units for $R$ are

$kg \cdot m^{2} \cdot s^{-2} \cdot K^{-1} \cdot mol^{-1}$.

We can go a little further though. Since $work = force \times length$ we see that $M \cdot L \cdot T^{-2}$ can be expressed in units of energy and indeed $1J = kg \cdot m^{2} \cdot s^{-2}$. Thus we arrive at our final conclusion. $R$ can be written with units

$J \cdot K^{-1} \cdot mol^{-1}$.

The beautiful thing though is this. The physical interpretation of the ideal gas law is saying that, for an ideal gas, of any kind, "the energy per degree per mole" is a constant (that constant being $\approx 8.3144 \frac{^J/_K}{mol}$)!

Friday, October 3, 2014

Recursive list

Recursive list

Way, way back we looked at a function to estimate the value of the sin function utilizing its representation as a Taylor polynomial about x = 0. When one looks closely at that Taylor polynomial, one can observe a pattern in the coefficients involving the infinite sequence 0, 1, 0, -1, 0, 1, 0, -1, .... My buddy Marcelo in his formulation of that Taylor polynomial wrote this function that for a given n produces the first n values of this sequence as a list:

let rec sin_coeff n =
  match n with
  | 0 -> []
  | 1 -> [0.]
  | 2 -> [0.; 1.]
  | 3 -> [0.; 1.; 0. ]
  | _  -> 0. :: 1. :: 0. :: -1. :: sin_coeff (n-4)
He mused to me on the day that "surely there is a more elegant way to produce this list?". Oh, by golly Marcelo there certainly is!
let rec sin_coeff = 0 :: 1 :: 0 :: -1 :: sin_coeff
This little construction builds the list value sin_coeff recursively (that is, in terms of itself). Now, of course it's not impossible to define structures like this in C or C++ but good luck matching the brevity and elegance afforded to us in OCaml for this sort of thing! We still need a little function of course that can pull off this endless list a list containing just the first n elements. The ubiquitous take function will do for this purpose and I show it here just for completeness.
let rec list_take k l =
  if k <= 0 then []
  else
    match l with
    | [] -> []
    | (x :: xs)  -> x :: (list_take (k-1) xs)

Sunday, September 21, 2014

Correlation Coefficient

Correlation coefficient

It's been a while since we looked at anything from the domain of statistics so here's another little bite-sized piece - a function to compute Pearson's "product moment correlation coefficient".

It's a measure of dependence between two data sets. We'll express it in terms of unbiased standard deviation which I didn't write out before so I'll include that function too.

let unbiased_standard_deviation t =
  (*http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation

    In statistics and in particular statistical theory, unbiased
    estimation of a standard deviation is the calculation from a
    statistical sample of an estimated value of the standard deviation
    (a measure of statistical dispersion) of a population of values,
    in such a way that the expected value of the calculation equals
    the true value.

  *)
 let av = arithmetic_mean t in
 let squared_diffs =
   List.fold_left (fun acc xi -> ((xi -. av) *. (xi -. av)) :: acc) [] t
 in sqrt ((sum squared_diffs)/.((float_of_int (List.length t)) -. 1.0))

let correlation_coefficient x y =
  (*http://en.wikipedia.org/wiki/Correlation_and_dependence

    The most familiar measure of dependence between two quantities is
    the Pearson product-moment correlation coefficient, or "Pearson's
    correlation coefficient", commonly called simply "the correlation
    coefficient". It is obtained by dividing the covariance of the two
    variables by the product of their standard deviations.  
  *) 

  let x_b = arithmetic_mean x in
  let y_b = arithmetic_mean y in
  let s_x = unbiased_standard_deviation x in
  let s_y = unbiased_standard_deviation y in

  if s_x = 0. || s_y = 0. then 0.
  else
    let f acc x_i y_i =
      acc +. ((x_i -. x_b) *. (y_i -. y_b)) in
    let n = float_of_int (List.length x) in
    let s = List.fold_left2 f 0.0 x y  in
    s/.((n -. 1.) *. s_x *. s_y)

Saturday, September 6, 2014

Concatenation of a list of strings

Concatenation of a list of strings

Here's another fun (but probably silly) exercise. Its value I posit, is in highlighting the fundamental similarities that exist between the C++ and OCaml languages (that emerge when one "peeks" beyond the apparent dissimilarities on the surface). Maybe this sort of comparison aids in "lowering the barrier to entry" for the C++ programmer embarking on a journey into OCaml? Anyway, here we go.

The OCaml String module contains a function concat which concatenates a list of strings whilst inserting a separator between each of the elements. Prior to OCaml 4.02 at least, it's implementation went as follows:

let concat sep l =
  match l with
    [] -> ""
  | hd :: tl ->
      let num = ref 0 and len = ref 0 in
      List.iter (fun s -> incr num; len := !len + length s) l;
      let r = create (!len + length sep * (!num - 1)) in
      unsafe_blit hd 0 r 0 (length hd);
      let pos = ref(length hd) in
      List.iter
        (fun s ->
          unsafe_blit sep 0 r !pos (length sep);
          pos := !pos + length sep;
          unsafe_blit s 0 r !pos (length s);
          pos := !pos + length s)
        tl;
      r

So, a faithful translation of this program into C++ (unsafe_blit 'n all), yields this:

#include <boost/range.hpp>

#include <string>
#include <numeric>
#include <cstring>

namespace string_util /*In honor of Stefano of http://spacifico.org/ :)*/ 
{
  template <class RgT>
  std::string concat (std::string const& sep, RgT lst)
  {
    if (boost::empty (lst)) return "";

    std::size_t num = 0, len = 0;
    std::accumulate (
      boost::begin (lst), boost::end (lst), 0,
      [&](int _, std::string const& s) -> 
      int { ++num, len += s.size(); return _; } );
    std::string r(len + sep.size () * (num - 1), '\0');
    std::string const& hd = *(boost::begin (lst));
    std::memcpy ((void*)(r.data ()), (void*)(hd.data ()), hd.size());
    std::size_t pos = hd.size();
    std::accumulate (
      boost::next (boost::begin (lst)), boost::end (lst), 0,
      [&](int _, std::string const& s) -> 
      int {
        std::memcpy((void*)(r.data()+pos),(void*)(sep.data()),sep.size());
        pos += sep.size ();
        std::memcpy ((void*)(r.data()+pos),(void*)(s.data()),s.size());
        pos += s.size ();
        return _; });
  
    return r;
  }
}//namespace<string_util>
For example, this fragment
  #include <boost/assign/list_of.hpp>

  // ...

  std::list  lst = boost::assign::list_of ("foo")("bar")("baz");
  std::string r = string_util::concat (",", lst);
will produce the string "foo,bar,baz".

So there it is... As usual, a little more verbosity required on the C++ side but otherwise, not much between them IMHO. Agree?

Monday, August 25, 2014

Terms With Variables (C++)

Terms with Variables (C++)

In this earlier post I showed a nifty OCaml type for modeling terms with variables for problems involving substitutions. I got interested in what it would take to implement the type in C++(03) (doing 'sum' types in C++ elegantly is a perennial problem). It ain't nowhere near as succinct but we got there nonetheless.
#include <list>

#include <boost/variant.hpp>

/*
  type ('a, 'b) term =
    | Term of 'a * ('a, 'b) term list
    | Var of 'b
*/

template <class A, class B> struct term;
template <class B> struct var;

template <class A, class B>
struct make_tree
{
  typedef boost::variant <
    boost::recursive_wrapper<term <A, B> >, var<B> >   type;
};

template <class A, class B>
struct term
{
  typedef typename make_tree <A, B>::type tree;
  A a;
  std::list <tree> children;
  term (A a
    , std::list<tree> const& children)
    :       a (a), children (children)
  {}
};

template <class A, class B> 
inline term <A, B> make_term (
  A a, std::list<typename make_tree<A, B>::type> const& c, B const*)
{
  return term<A, B> (a, c);
}

template <class B> 
struct var
{
  B tag;
  var (B tag) : tag (tag) {}
};

template <class B>
inline var<B> make_var (B tag) { return var<B> (tag); }
For example, this little program builds the term represented by the concrete syntax "a(b(), c)".
int main ()
{
  typedef make_tree<std::string, std::string>::type tree;
  typedef std::list<tree> term_list;
  std::string const* tag_str=(std::string const*)0L;

  // a(b(), c)
  term_list l;
  l.push_back (make_term(std::string("b"), term_list (), tag_str));
  l.push_back (make_var(std::string("c")));
  tree t = make_term(std::string("a"), l, tag_str);
  
  return 0;
}

Saturday, August 9, 2014

Digits

Digits

Counting is a common problem in computer programming. I recently had a need for an algorithm to compute the number of base 10 digits of an integer (for example, there are 3 digits in the number 569). Here's what I worked out.
module type S=sig
    (**Count the number of base 10 digits of an integer*)
    val digits : int -> int 

    (**Prepend with zeroes as necessary to get the 'right'
       number of columns *)
    val digits_10 : int -> int -> string

end

module A : S = struct 
  let digits n =
    let rec loop acc n =
      let k = n / 10 in
      if k = 0 then acc else loop (acc + 1) k
    in loop 1 n

  let digits_10 i n =
    let l = digits n in
    let s = string_of_int (abs n) in
    let maybe_sign = if n < 0 then "-" else "" in
    if l >= i then (maybe_sign^s) 
    else
      let lz=string_of_int 0 in
      let rec aux acc j = 
        if j = 0 then acc 
        else 
          aux (lz^acc) (j - 1) in
      maybe_sign ^ aux s (i - l)
end

module Test_io = struct

  let test_0 () = 
    let print_line i = 
      Printf.printf "There are %d digits in %d\n" (A.digits i) i
    in
    let rec loop i n =
      if i = n then () 
      else
        let x  =
          int_of_float (10. ** (float_of_int i)) in
        print_line x; loop (i + 1) n
    in (*Range over 0 to to 10^19 *) loop 0 19

  let test_1 () =
    let print_line i = 
      Printf.printf "%s\n" i
    in
    let rec loop i n =
      if i = n then () 
      else
        let x = A.digits_10 3 i in
        print_line x; loop (i + 1) n
    in (*Range over 0 to 100, 3 columns*)loop 0 100

  let run () = test_0 (); test_1 ()

end

let () = Test_io.run ()
The program outputs
There are 1 digits in 1
There are 2 digits in 10
There are 3 digits in 100
There are 4 digits in 1000
There are 5 digits in 10000
There are 6 digits in 100000
There are 7 digits in 1000000
There are 8 digits in 10000000
There are 9 digits in 100000000
There are 10 digits in 1000000000
There are 11 digits in 10000000000
There are 12 digits in 100000000000
There are 13 digits in 1000000000000
There are 14 digits in 10000000000000
There are 15 digits in 100000000000000
There are 16 digits in 1000000000000000
There are 17 digits in 10000000000000000
There are 18 digits in 100000000000000000
There are 19 digits in 1000000000000000000
000
001
002
003
004
005
006
 .
 .
 .
097
098
099

Fun, fun, fun!