• ## Matching Patterns with Scheme

A while back, I wrote a post about macros in Scheme. Today I want to take a look at how one might begin to implement a macro system. In Scheme, whether you use syntax-rules or syntax-case to write your macros, at some point you’ll write patterns and templates. Macros match their input against a pattern and then use the result of that match to instantiate a template. Let’s consider a two-way or macro:
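To make the match-then-instantiate idea concrete, here is a rough sketch in Python rather than Scheme. It is not the code from the post: the list-based representation of S-expressions, the helper names `match`, `instantiate`, and `expand`, and the particular pattern and template for a two-way or are all assumptions chosen for illustration. It also ignores hygiene and literal keywords entirely.

```python
# Illustrative sketch only: S-expressions are nested Python lists and
# symbols are plain strings. All names here are hypothetical.

def match(pattern, form, bindings=None):
    """Return a dict of pattern-variable bindings, or None if no match."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str):
        if pattern != "_":          # "_" matches anything and binds nothing
            bindings[pattern] = form
        return bindings
    # A list pattern only matches a list of the same length.
    if not isinstance(form, list) or len(pattern) != len(form):
        return None
    for sub_pattern, sub_form in zip(pattern, form):
        if match(sub_pattern, sub_form, bindings) is None:
            return None
    return bindings


def instantiate(template, bindings):
    """Copy the template, replacing pattern variables with their bindings."""
    if isinstance(template, str):
        return bindings.get(template, template)
    return [instantiate(part, bindings) for part in template]


def expand(pattern, template, form):
    bindings = match(pattern, form)
    if bindings is None:
        raise SyntaxError("form does not match pattern")
    return instantiate(template, bindings)


# Assumed pattern and template for a two-way or:
#   (_ e1 e2)  =>  (let ((t e1)) (if t t e2))
# Note: a real macro system would rename t to avoid capturing a user's t.
OR_PATTERN = ["_", "e1", "e2"]
OR_TEMPLATE = ["let", [["t", "e1"]], ["if", "t", "t", "e2"]]

expanded = expand(OR_PATTERN, OR_TEMPLATE, ["or", "a", ["f", "b"]])
print(expanded)
```

The matcher walks the pattern and the input in lockstep, recording what each pattern variable stands for, and the template is then copied with those variables substituted.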

• ## Access Patterns Matter, Part 2

A couple of readers pointed out some improvements and corrections to my last post on GPU access patterns. These were pretty significant, so I thought it’d be worth doing a follow-up post to see how they change things.

• ## Access patterns matter

One of the oft-cited difficulties of GPU programming is dealing with memory layout and access patterns. In order to achieve maximum memory bandwidth, it is important to structure your application so that different threads do not access the same bank of memory at the same time. In other words, you need to avoid bank conflicts.
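As a rough way to see when threads collide on a bank, here is a small Python model. It assumes a simplified layout that the post does not specify: 32 banks, consecutive words assigned to consecutive banks, and 32 threads where thread `tid` accesses word `tid * stride`. The function name `conflict_degree` is hypothetical, and the model ignores hardware details such as broadcasts when several threads read the same address.

```python
from collections import Counter

def conflict_degree(stride, num_threads=32, num_banks=32):
    """Largest number of threads that hit the same bank in one access.

    Assumed model: word i lives in bank i % num_banks, and thread tid
    accesses word tid * stride. A result of 1 means no bank conflicts.
    """
    banks = Counter((tid * stride) % num_banks for tid in range(num_threads))
    return max(banks.values())

for stride in (1, 2, 32, 33):
    print(stride, conflict_degree(stride))
# prints:
# 1 1
# 2 2
# 32 32
# 33 1
```

Under this model a stride of 32 sends every thread to the same bank, while padding the stride to 33 spreads the accesses across all banks again.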

• ## Modeling How Programmers Read Code (via Mike Hansen)

My last post includes a video of my eye movements as I read and interpret a piece of code. I mentioned that this was part of an experiment being conducted by Mike Hansen. He just put up a new post with more details about his work and a video of another programmer reading a similar program. Check it out!

• ## How do we read code?

I recently got to participate in a psychological experiment for programmers. A friend of mine, Mike Hansen, is doing research on how people comprehend programs. The goal is to figure out some way of measuring what features in programming systems help programmers understand what they are doing, and how this can be used to make systems that lead to higher quality software. Mike is currently running an experiment where he shows people several short Python programs and asks them to predict the output of each program. The test subject sits in front of an eye tracker, so afterwards Mike can see where they were looking at various times during the experiment.

• ## Optimizing Dot Product

Lately I’ve seen quite a few papers on GPU programming languages that use dot product as a benchmark, including a paper I’ve written. As I’ve thought about it some more, it seems like this may not be the most useful benchmark. The reason is that dot product does very little actual computation, but accesses a lot of data. Any decent dot product implementation should be bound by the memory bandwidth. This is true of many algorithms, but many offer opportunities to exploit caches due to data reuse. Because dot product only reads each value once, we do not have this benefit.
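A back-of-the-envelope calculation shows why dot product ends up memory bound. The Python sketch below assumes 4-byte floats, one multiply and one add per pair of elements, and that every input value is read from memory exactly once. The function names and the 100 GB/s bandwidth figure are made up for illustration and do not come from any measurement.

```python
def dot_product_intensity(n, bytes_per_element=4):
    """Floating-point operations per byte read for a length-n dot product."""
    flops = 2 * n                            # one multiply and one add per pair
    bytes_read = 2 * n * bytes_per_element   # two input vectors, each read once
    return flops / bytes_read

def bandwidth_bound_seconds(n, bandwidth_bytes_per_s, bytes_per_element=4):
    """Lower bound on run time if memory bandwidth is the only limit."""
    return 2 * n * bytes_per_element / bandwidth_bytes_per_s

n = 1_000_000
print(dot_product_intensity(n))            # 0.25 flops per byte
print(bandwidth_bound_seconds(n, 100e9))   # hypothetical 100 GB/s device
```

At a quarter of a floating-point operation per byte, the arithmetic units spend most of their time waiting for data, and with no reuse there is nothing for a cache to exploit.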

• ## Compiling Rust for GPUs

A couple of days back, I tweeted that I had just run code written in Rust on the GPU. It’s about time I provided some more details. This is a project I worked on with Milinda Pathirage, a fellow student at IU. I should emphasize that this is very much in the proof of concept stage. I doubt it will work well enough to do anything useful, but it does work well enough to do something and it would certainly be possible to extend this. That said, I will include links to our code so the valiant hackers out there can try it out if they wish. For posterity’s sake, here is, to my knowledge, the first fragment of Rust code to ever execute on a GPU:

• ## A Look at Macros in Scheme

One of the features that sets Scheme apart as a programming language is its powerful macro system. In the same way that procedures allow you to reuse bits of code, macros allow you to reuse syntax. Macros and procedures can express many of the same things, but macros are particularly useful when you want to be careful about control flow and effects. Consider the following program.

• ## A look at GPU memory transfer

One of the trickier things in programming with multiple devices is managing the transfer of data between devices. This applies whether you’re programming a cluster or a machine with a CPU and GPU. Transferring data takes time, and the programmer must be careful that the transfer time doesn’t overpower any performance gains from parallelizing the algorithm. When talking about transfer time, we usually think of it as having two components: the time due to latency and the time due to bandwidth. The total time to transfer the data is then,
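One common way to write the two-component model described above is a fixed latency plus the data size divided by the bandwidth. The Python sketch below assumes that form. The function name `transfer_time` and the latency and bandwidth values are hypothetical placeholders, not measurements of any real device.

```python
def transfer_time(num_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated transfer time: fixed latency plus size over bandwidth."""
    return latency_s + num_bytes / bandwidth_bytes_per_s

# Hypothetical link: 10 microseconds of latency, 8 GB/s of bandwidth.
LATENCY_S = 10e-6
BANDWIDTH = 8e9

for size in (8, 8_000, 8_000_000, 8_000_000_000):
    total = transfer_time(size, LATENCY_S, BANDWIDTH)
    latency_share = LATENCY_S / total
    print(f"{size:>13} bytes: {total:.6f} s, {latency_share:.1%} latency")
```

With these made-up numbers, small transfers are dominated by latency and large ones by bandwidth, which is why batching many small transfers into one large transfer usually helps.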

• ## Hello, World!

I’ve decided to try entering the brave new world of Octopress. My old blog was hosted by WordPress, which is a perfectly fine blogging framework. However, I found that it seems to have a lot of features for large teams of writers that I don’t really need. More importantly, I found writing about code snippets really tedious, since I had to do the HTML myself and avoid the WYSIWYG editor. In reality, I ended up writing my posts in Markdown and then pasting the generated HTML into WordPress. This would require further tweaking to make sure everything would still look nice after the import. Since I was using Markdown anyway, it makes sense to try a blogging framework based around that.