Saturday, September 6, 2014

Concatenation of a list of strings

Concatenation of a list of strings

Here's another fun (but probably silly) exercise. Its value I posit, is in highlighting the fundamental similarities that exist between the C++ and OCaml languages (that emerge when one "peeks" beyond the apparent dissimilarities on the surface). Maybe this sort of comparison aids in "lowering the barrier to entry" for the C++ programmer embarking on a journey into OCaml? Anyway, here we go.

The OCaml String module contains a function concat which concatenates a list of strings whilst inserting a separator between each of the elements. Prior to OCaml 4.02 at least, it's implementation went as follows:

let concat sep l =
  match l with
    [] -> ""
  | hd :: tl ->
      let num = ref 0 and len = ref 0 in
      List.iter (fun s -> incr num; len := !len + length s) l;
      let r = create (!len + length sep * (!num - 1)) in
      unsafe_blit hd 0 r 0 (length hd);
      let pos = ref(length hd) in
      List.iter
        (fun s ->
          unsafe_blit sep 0 r !pos (length sep);
          pos := !pos + length sep;
          unsafe_blit s 0 r !pos (length s);
          pos := !pos + length s)
        tl;
      r

So, a faithful translation of this program into C++ (unsafe_blit 'n all), yields this:

#include <boost/range.hpp>

#include <string>
#include <numeric>
#include <cstring>

namespace string_util /*In honor of Stefano of http://spacifico.org/ :)*/ 
{
  template <class RgT>
  std::string concat (std::string const& sep, RgT lst)
  {
    if (boost::empty (lst)) return "";

    std::size_t num = 0, len = 0;
    std::accumulate (
      boost::begin (lst), boost::end (lst), 0,
      [&](int _, std::string const& s) -> 
      int { ++num, len += s.size(); return _; } );
    std::string r(len + sep.size () * (num - 1), '\0');
    std::string const& hd = *(boost::begin (lst));
    std::memcpy ((void*)(r.data ()), (void*)(hd.data ()), hd.size());
    std::size_t pos = hd.size();
    std::accumulate (
      boost::next (boost::begin (lst)), boost::end (lst), 0,
      [&](int _, std::string const& s) -> 
      int {
        std::memcpy((void*)(r.data()+pos),(void*)(sep.data()),sep.size());
        pos += sep.size ();
        std::memcpy ((void*)(r.data()+pos),(void*)(s.data()),s.size());
        pos += s.size ();
        return _; });
  
    return r;
  }
}//namespace<string_util>
For example, this fragment
  #include <boost/assign/list_of.hpp>

  // ...

  std::list  lst = boost::assign::list_of ("foo")("bar")("baz");
  std::string r = string_util::concat (",", lst);
will produce the string "foo,bar,baz".

So there it is... As usual, a little more verbosity required on the C++ side but otherwise, not much between them IMHO. Agree?