Sunday, September 21, 2014

Correlation Coefficient

Correlation coefficient

It's been a while since we looked at anything from the domain of statistics so here's another little bite-sized piece - a function to compute Pearson's "product moment correlation coefficient".

It's a measure of dependence between two data sets. We'll express it in terms of unbiased standard deviation which I didn't write out before so I'll include that function too.

let unbiased_standard_deviation t =
  (*http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation

    In statistics and in particular statistical theory, unbiased
    estimation of a standard deviation is the calculation from a
    statistical sample of an estimated value of the standard deviation
    (a measure of statistical dispersion) of a population of values,
    in such a way that the expected value of the calculation equals
    the true value.

  *)
 let av = arithmetic_mean t in
 let squared_diffs =
   List.fold_left (fun acc xi -> ((xi -. av) *. (xi -. av)) :: acc) [] t
 in sqrt ((sum squared_diffs)/.((float_of_int (List.length t)) -. 1.0))

let correlation_coefficient x y =
  (*http://en.wikipedia.org/wiki/Correlation_and_dependence

    The most familiar measure of dependence between two quantities is
    the Pearson product-moment correlation coefficient, or "Pearson's
    correlation coefficient", commonly called simply "the correlation
    coefficient". It is obtained by dividing the covariance of the two
    variables by the product of their standard deviations.  
  *) 

  let x_b = arithmetic_mean x in
  let y_b = arithmetic_mean y in
  let s_x = unbiased_standard_deviation x in
  let s_y = unbiased_standard_deviation y in

  if s_x = 0. || s_y = 0. then 0.
  else
    let f acc x_i y_i =
      acc +. ((x_i -. x_b) *. (y_i -. y_b)) in
    let n = float_of_int (List.length x) in
    let s = List.fold_left2 f 0.0 x y  in
    s/.((n -. 1.) *. s_x *. s_y)

Saturday, September 6, 2014

Concatenation of a list of strings

Concatenation of a list of strings

Here's another fun (but probably silly) exercise. Its value I posit, is in highlighting the fundamental similarities that exist between the C++ and OCaml languages (that emerge when one "peeks" beyond the apparent dissimilarities on the surface). Maybe this sort of comparison aids in "lowering the barrier to entry" for the C++ programmer embarking on a journey into OCaml? Anyway, here we go.

The OCaml String module contains a function concat which concatenates a list of strings whilst inserting a separator between each of the elements. Prior to OCaml 4.02 at least, it's implementation went as follows:

let concat sep l =
  match l with
    [] -> ""
  | hd :: tl ->
      let num = ref 0 and len = ref 0 in
      List.iter (fun s -> incr num; len := !len + length s) l;
      let r = create (!len + length sep * (!num - 1)) in
      unsafe_blit hd 0 r 0 (length hd);
      let pos = ref(length hd) in
      List.iter
        (fun s ->
          unsafe_blit sep 0 r !pos (length sep);
          pos := !pos + length sep;
          unsafe_blit s 0 r !pos (length s);
          pos := !pos + length s)
        tl;
      r

So, a faithful translation of this program into C++ (unsafe_blit 'n all), yields this:

#include <boost/range.hpp>

#include <string>
#include <numeric>
#include <cstring>

namespace string_util /*In honor of Stefano of http://spacifico.org/ :)*/ 
{
  template <class RgT>
  std::string concat (std::string const& sep, RgT lst)
  {
    if (boost::empty (lst)) return "";

    std::size_t num = 0, len = 0;
    std::accumulate (
      boost::begin (lst), boost::end (lst), 0,
      [&](int _, std::string const& s) -> 
      int { ++num, len += s.size(); return _; } );
    std::string r(len + sep.size () * (num - 1), '\0');
    std::string const& hd = *(boost::begin (lst));
    std::memcpy ((void*)(r.data ()), (void*)(hd.data ()), hd.size());
    std::size_t pos = hd.size();
    std::accumulate (
      boost::next (boost::begin (lst)), boost::end (lst), 0,
      [&](int _, std::string const& s) -> 
      int {
        std::memcpy((void*)(r.data()+pos),(void*)(sep.data()),sep.size());
        pos += sep.size ();
        std::memcpy ((void*)(r.data()+pos),(void*)(s.data()),s.size());
        pos += s.size ();
        return _; });
  
    return r;
  }
}//namespace<string_util>
For example, this fragment
  #include <boost/assign/list_of.hpp>

  // ...

  std::list  lst = boost::assign::list_of ("foo")("bar")("baz");
  std::string r = string_util::concat (",", lst);
will produce the string "foo,bar,baz".

So there it is... As usual, a little more verbosity required on the C++ side but otherwise, not much between them IMHO. Agree?

Monday, August 25, 2014

Terms With Variables (C++)

Terms with Variables (C++)

In this earlier post I showed a nifty OCaml type for modeling terms with variables for problems involving substitutions. I got interested in what it would take to implement the type in C++(03) (doing 'sum' types in C++ elegantly is a perennial problem). It ain't nowhere near as succinct but we got there nonetheless.
#include <list>

#include <boost/variant.hpp>

/*
  type ('a, 'b) term =
    | Term of 'a * ('a, 'b) term list
    | Var of 'b
*/

template <class A, class B> struct term;
template <class B> struct var;

template <class A, class B>
struct make_tree
{
  typedef boost::variant <
    boost::recursive_wrapper<term <A, B> >, var<B> >   type;
};

template <class A, class B>
struct term
{
  typedef typename make_tree <A, B>::type tree;
  A a;
  std::list <tree> children;
  term (A a
    , std::list<tree> const& children)
    :       a (a), children (children)
  {}
};

template <class A, class B> 
inline term <A, B> make_term (
  A a, std::list<typename make_tree<A, B>::type> const& c, B const*)
{
  return term<A, B> (a, c);
}

template <class B> 
struct var
{
  B tag;
  var (B tag) : tag (tag) {}
};

template <class B>
inline var<B> make_var (B tag) { return var<B> (tag); }
For example, this little program builds the term represented by the concrete syntax "a(b(), c)".
int main ()
{
  typedef make_tree<std::string, std::string>::type tree;
  typedef std::list<tree> term_list;
  std::string const* tag_str=(std::string const*)0L;

  // a(b(), c)
  term_list l;
  l.push_back (make_term(std::string("b"), term_list (), tag_str));
  l.push_back (make_var(std::string("c")));
  tree t = make_term(std::string("a"), l, tag_str);
  
  return 0;
}

Saturday, August 9, 2014

Digits

Digits

Counting is a common problem in computer programming. I recently had a need for an algorithm to compute the number of base 10 digits of an integer (for example, there are 3 digits in the number 569). Here's what I worked out.
module type S=sig
    (**Count the number of base 10 digits of an integer*)
    val digits : int -> int 

    (**Prepend with zeroes as necessary to get the 'right'
       number of columns *)
    val digits_10 : int -> int -> string

end

module A : S = struct 
  let digits n =
    let rec loop acc n =
      let k = n / 10 in
      if k = 0 then acc else loop (acc + 1) k
    in loop 1 n

  let digits_10 i n =
    let l = digits n in
    let s = string_of_int (abs n) in
    let maybe_sign = if n < 0 then "-" else "" in
    if l >= i then (maybe_sign^s) 
    else
      let lz=string_of_int 0 in
      let rec aux acc j = 
        if j = 0 then acc 
        else 
          aux (lz^acc) (j - 1) in
      maybe_sign ^ aux s (i - l)
end

module Test_io = struct

  let test_0 () = 
    let print_line i = 
      Printf.printf "There are %d digits in %d\n" (A.digits i) i
    in
    let rec loop i n =
      if i = n then () 
      else
        let x  =
          int_of_float (10. ** (float_of_int i)) in
        print_line x; loop (i + 1) n
    in (*Range over 0 to to 10^19 *) loop 0 19

  let test_1 () =
    let print_line i = 
      Printf.printf "%s\n" i
    in
    let rec loop i n =
      if i = n then () 
      else
        let x = A.digits_10 3 i in
        print_line x; loop (i + 1) n
    in (*Range over 0 to 100, 3 columns*)loop 0 100

  let run () = test_0 (); test_1 ()

end

let () = Test_io.run ()
The program outputs
There are 1 digits in 1
There are 2 digits in 10
There are 3 digits in 100
There are 4 digits in 1000
There are 5 digits in 10000
There are 6 digits in 100000
There are 7 digits in 1000000
There are 8 digits in 10000000
There are 9 digits in 100000000
There are 10 digits in 1000000000
There are 11 digits in 10000000000
There are 12 digits in 100000000000
There are 13 digits in 1000000000000
There are 14 digits in 10000000000000
There are 15 digits in 100000000000000
There are 16 digits in 1000000000000000
There are 17 digits in 10000000000000000
There are 18 digits in 100000000000000000
There are 19 digits in 1000000000000000000
000
001
002
003
004
005
006
 .
 .
 .
097
098
099

Fun, fun, fun!

Friday, July 25, 2014

Cartesian product (redux)

Cartesian product (redux)

In this blog post we looked at programs to compute Cartesian Products. One algorithm (given here in OCaml) if you recall is
let prod l r =
  let g acc a =
    let f acc x =
      (a, x) :: acc
    in List.fold_left f acc r
  in List.fold_left g [] l |> List.rev

In that earlier post I translated the above program into C++. This time I'm doing the same straightforward translation but using the Boost.MPL library to compute such products at compile time. It looks like this:

#include <boost/mpl/pair.hpp>
#include <boost/mpl/vector.hpp>
#include <boost/mpl/push_front.hpp>
#include <boost/mpl/fold.hpp>
#include <boost/mpl/reverse.hpp>
#include <boost/mpl/placeholders.hpp>

using namespace boost::mpl::placeholders;

template <class L, class R>
struct product
{
  struct g {
    template <class AccT, class A>
    struct apply {
       typedef typename boost::mpl::fold <
         R, AccT
        , boost::mpl::push_front<_1, boost::mpl::pair<A, _2> > 
         >::type type;
    };
  };

  typedef typename boost::mpl::reverse <
    typename boost::mpl::fold <
                  L, boost::mpl::vector<>, g>::type>::type type;
};
The translation proceeds almost mechanically! Does it work though? You betcha! Here's a test we can convince ourselves with:
#include <boost/mpl/equal.hpp>
#include <boost/mpl/for_each.hpp>
#include <boost/mpl/int.hpp>

#include <iostream>

using namespace boost::mpl;

typedef vector<int_<1>, int_<2> > A;
typedef vector<int_<3>, int_<4>, int_<5> > B;
typedef product <A, B>::type result;

BOOST_MPL_ASSERT((
  equal<
    result
  , vector<
         pair<int_<1>, int_<3> >
       , pair<int_<1>, int_<4> >
       , pair<int_<1>, int_<5> >
       , pair<int_<2>, int_<3> >
       , pair<int_<2>, int_<4> >
       , pair<int_<2>, int_<5> >
      > 
  > ));

struct print_class_name {
    template <typename T>
    void operator()( T t ) const {
       std::cout << typeid(t).name() << " ";
    }
};

int main ()
{
  boost::mpl::for_each<result>(print_class_name());

  return 0;
}
The takeaway is, learning yourself some functional programming is a great way to improve your template meta-programming fu! (That of course should come as no surprise... )

Friday, July 18, 2014

Merge sort

Merge sort

Here's another divide and conquer algorithm. This one is for sorting a sequence.

Conceptually, a merge sort works like this (see http://en.wikipedia.org/wiki/Merge_sort):

  • Divide the unsorted list into n sub-lists, each containing one element (a list of one element is trivially sorted);
  • Repeatedly merge sublists to produce new sorted sub-lists until there is only one sub-list remaining : this will be the sorted list.

In the following OCaml, we are drawing on inspiration from the the Haskell standard prelude for the part of the algorithm concerned with dividing the unsorted list : functions take, drop and split.

(**{b Merge sort}, an {i O(n log n)} comparison based sorting
   algorithm *)
module type MERGESORT = sig

  (**[take n] applied to a list [xs], returns the prefix of [xs] of
     length [n] or [xs] itself if [n > List.length xs] e.g. [take 2
     [1; 2; 3]] {i = } [[1; 2]]*)
  val take : int -> 'a list -> 'a list

  (**[drop n] applied to a list [xs], returns the suffix of [xs]
     after the first [n] elements or, [[]] if [n > List.length xs]
     e.g. [drop 2 [1; 2; 3]] {i = } [[3]]*)
  val drop : int -> 'a list -> 'a list

  (**[split n xs] is equivalent to [take n xs, drop n xs] e.g.
     [split 2 [1; 2; 3]] {i = } [([1; 2], [3])]*)
  val split : int -> 'a list -> 'a list * 'a list

  (**[merge] given two {b sorted} sequences [xs] and [ys] returns a
     single sorted sequence of the elements of [xs] and [ys]
     e.g. [merge [1; 3] [2; 4]] {i = } [[1; 2; 3; 4]]*)
  val merge : 'a list -> 'a list -> 'a list

  (**[merge_sort] works by splitting a sequence into two parts,
     recursively sorting the two parts and merging the results into
     a single sorted sequence e.g. [merge_sort [1; 2; -1; 0; 3]] 
     {i = } [[-1; 0; 1; 2; 3]]*)
  val merge_sort : 'a list -> 'a list
end

module Merge_sort : MERGESORT = struct

  let rec take k l =
    match (k, l) with
    | n, _ when n <= 0 -> []
    | _, [] -> []
    | n, (x :: xs) -> x :: take (n - 1) xs

  let rec drop k l =
    match (k, l) with
    | n, xs when n <= 0 -> xs
    | _, [] -> []
    | n, (_ :: xs) -> drop (n - 1) xs

  let rec split k l = take k l, drop k l

  let rec merge l m =
    match (l, m) with
    | [], ys -> ys
    | xs, [] -> xs
    | ((x :: xs) as t), ((y :: ys) as s) -> 
      if x <= y then x :: (merge xs s) else y :: (merge t ys)
        
  let rec merge_sort l =
    let i = (List.length l) / 2 in
    if i = 0 then l
    else
      let u, v = split i l in
      let xs, ys = merge_sort u, merge_sort v in
      merge xs ys

end

We can test our little program in the top-level like this:

# let l = Merge_sort.merge_sort [-1; 2; 0; 4; 3; 5];;
val l : int list = [-1; 0; 2; 3; 4; 5]

Here are the functions in C++ phrased as range algorithms.

//"c:/users/fletch/desktop/setenv.cmd"
//cl /EHsc /Femerge.exe /I c:/project/boost_1_55_0 merge.cpp 

#include <boost/next_prior.hpp>
#include <boost/range.hpp>
#include <boost/range/algorithm/copy.hpp>

#include <vector>
#include <cstdlib>

namespace algo
{
  //take

  template <class S, class D>
  D take (std::size_t n, S src, D dst)
  {
    typedef boost::range_iterator<S>::type it_t;
    it_t curr = boost::begin (src), last = boost::end (src);
    if (n <= 0) return dst;
    if (boost::empty (src))  return dst;
    take (n-1, S (boost::next (curr), last), *dst++ = *curr);
    return dst;
  }

  //drop

  template <class S, class D>
  D drop (std::size_t n, S src, D dst)
  {
    typedef boost::range_iterator<S>::type it_t;
    it_t curr = boost::begin (src), last = boost::end (src);
    if (n <= 0) return boost::range::copy (src, dst);
    if (boost::empty (src))return dst;
    drop (n-1, S (boost::next (curr), last), dst);
    return dst;
  }

  //split

  template <class S, class D1, class D2>
  void split (int n, S src, D1 dst1, D2 dst2)
  { take (n, src, dst1); drop (n, src, dst2); }

  //merge

  template <class S1, class S2, class D>
  D merge (S1 src1, S2 src2, D dst)
  {
    typedef boost::range_iterator<S1>::type it1_t;
    typedef boost::range_iterator<S2>::type it2_t;
    typedef std::iterator_traits<it1_t>::value_type value1_t;
    typedef std::iterator_traits<it2_t>::value_type value2_t;
    if (boost::empty (src1)) return boost::copy (src2, dst);
    if (boost::empty (src2)) return boost::copy (src1, dst);
    it1_t curr1 = boost::begin (src1), last1 = boost::end (src1);
    it2_t curr2 = boost::begin (src2), last2 = boost::end (src2);
    value1_t x = *curr1;
    value2_t y = *curr2;
    if (x <= y)
      merge (S1 (boost::next (curr1), last1), src2, *dst++ = x);
    else
      merge (src1, S2 (boost::next (curr2), last2), *dst++ = y);
    return dst;
  }

  //merge_sort

  template <class S, class D>
  D merge_sort (S src, D dst)
  {
    typedef boost::range_iterator<S>::type it_t;
    typedef std::iterator_traits<it_t>::value_type value_t;
    std::size_t i = boost::size (src)/2;
    if (i == 0) return boost::range::copy (src, dst);
    std::vector<value_t> u, v;
    split (i, src, std::back_inserter (u), std::back_inserter (v));
    std::vector<value_t> xs, ys;
    merge_sort (u, std::back_inserter (xs));
    merge_sort (v, std::back_inserter (ys));
    merge (xs, ys, dst);
    return dst;
  }
  
}//namespace algo

The following usage example provides a lightweight test.

#include <boost/assign/list_of.hpp>

#include <utility>
#include <iterator>
#include <iostream>

int main ()
{
  using std::pair;
  using std::make_pair;
  using std::ostream_iterator;
  using boost::assign::list_of;

  int ord[] = {1, 2, 3, 4};

  auto src=make_pair(ord, ord + 4);
  auto dst=ostream_iterator<int>(std::cout, ", ");

  std::cout << "\ntake ():\n";

  algo::take (0u, src, dst); std::cout << std::endl;
  algo::take (1u, src, dst); std::cout << std::endl;
  algo::take (2u, src, dst); std::cout << std::endl;
  algo::take (3u, src, dst); std::cout << std::endl;
  algo::take (4u, src, dst); std::cout << std::endl;
  algo::take (5u, src, dst); std::cout << std::endl;

  std::cout << "\ndrop ():\n";

  algo::drop (5u, src, dst); std::cout << std::endl;
  algo::drop (4u, src, dst); std::cout << std::endl;
  algo::drop (3u, src, dst); std::cout << std::endl;
  algo::drop (2u, src, dst); std::cout << std::endl;
  algo::drop (1u, src, dst); std::cout << std::endl;
  algo::drop (0u, src, dst); std::cout << std::endl;

  std::cout << "\nsplit ():\n\n";

  algo::split (0u, src, dst, dst); std::cout << std::endl;
  algo::split (1u, src, dst, dst); std::cout << std::endl;
  algo::split (2u, src, dst, dst); std::cout << std::endl;
  algo::split (3u, src, dst, dst); std::cout << std::endl;
  algo::split (4u, src, dst, dst); std::cout << std::endl;
  algo::split (5u, src, dst, dst); std::cout << std::endl;

  std::cout << "\nmerge ():\n\n";

  std::vector <int> l=list_of(-1)(2), r=list_of(0)(1);
  algo::merge (l, r, dst); std::cout << std::endl;

  std::cout << "\nmerge_sort ():\n\n";

  int unord[] = {-1, 2, 0, 4, 3, 5};
  algo::merge_sort (make_pair (unord, unord + 6), dst);

  return 0;
}
The above program produces the following output.
take ():

1,
1, 2,
1, 2, 3,
1, 2, 3, 4,
1, 2, 3, 4,

drop ():


4,
3, 4,
2, 3, 4,
1, 2, 3, 4,

split ():

1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4,

merge ():

-1, 0, 1, 2,

merge_sort ():

-1, 0, 2, 3, 4, 5,

Saturday, June 21, 2014

Permutations

Permutations

Combinations are selections that disregard order. The powerset P(S) of a set S is the set of all possible combinations of the n elements of S. There are 2n of these. Permutations are arrangements of sequences into orders. The number of permutations of a set is denoted n! For example, the set {x, y, z} has 6 permutations : {x, y, z}, {x, z, y}, {y, x, z}, {y, z, x}, {z, y, x} and {z, x, y}. Suppose f exists that for given k provides all permutations of S that start with sk. All permutations could then be found by f with k ranging over [0, N - 1]. But f trivially exists as it can be computed by prepending sk to the permutations of the smaller set S - { sk }. Here are two programs to compute the permutations of a set based on this idea.