Using Harlan in C++ Programs

So far, Harlan programs have primarily existed in a vacuum. You'd compile them, run them, and they might produce some output. Certainly none of them received any input from the outside world. Most of the test cases use small sets of data, and the larger benchmarks generated incredibly synthetic data, like a vector of 16 million ones. My focus has been on building the compiler itself, so this has been a tolerable situation up to this point. However, Harlan is at the point where it needs more realistic applications and it's clear the foreign function interface (FFI) story just won't cut it anymore.

I'm happy to report that it's now possible to pass non-trivial amounts of data from C++ to Harlan. Two new features made this possible. First, there are library functions like unsafe-deref-float and unsafe-set!-float, which read from and write to raw pointers. Second, there's a new primitive form called unsafe-vec-ptr, which gives a raw pointer to the contents of a vector. These are very low level, but they give us the tools we need to build a reasonably usable FFI. Let's see how to use them to implement a dot product in Harlan and call it from a C++ program.

First, we need to write the dot product function. This is pretty short in Harlan.

(module
  (import ffi)

  (define (harlan_dot N pa pb)
    (let ((a (import-float-vec pa N))
          (b (import-float-vec pb N)))
      (reduce + (kernel ((a a) (b b)) (* a b))))))

For the most part, this is a straightforward dot product written in Harlan. The main new thing is the call to import-float-vec, which copies a C-style array into a Harlan vector. If you're curious, its implementation is in ffi.kfc.

Unlike most Harlan programs, this does not define a main function. Instead, we compile it to a shared library by running the following from your Harlan directory.

./harlanc --shared dotprod.kfc

When this is done, you'll have a dotprod.so that you can link into your C++ programs. The harlan_dot function is exposed with the signature float harlan_dot(int N, float *pa, float *pb).

Now, let's plug this function into the dot product benchmark I wrote about previously. Basically, we add a prototype for harlan_dot, then add a call to TIME(harlan_dot) alongside the rest of the benchmarks. You can see the full set of changes here. I commented out the cuBLAS version because I ran into runtime errors that I didn't feel like debugging. Below is a graph of how Harlan compares with the other implementations.

Execution time for dot product on 33,554,432 element vectors (shorter bars are better).

Yikes!

Clearly I've got some performance issues to deal with. On the bright side, Harlan runs faster on the GPU than it does on the CPU. I'll be investigating these performance problems soon.

As far as the FFI goes, some usability issues remain as well. For example, the Harlan compiler and the code it produces have some relative paths hard-coded, so both must be run from the Harlan source directory. These shouldn't be hard to fix. In the meantime, it's now possible to integrate Harlan code into projects written in other languages.