GoLisp: Same great taste, less filling

31 Jan 2015

by Dave Astels

When I first wrote GoLisp I wasn’t overly concerned with space efficiency as the immediate use was quite small scale. I was also fairly new to Go at the time. As time has gone on, we have been writing more and more production code in GoLisp. Space and time performance is something that we would now like to improve as we go forward.

Brute force

A data element in GoLisp can be one of several types:

ConsCell
Integer
Float
Boolean
String
Symbol
Function
Macro
Primitive
BoxedObject
Frame

To keep things simple and easy to work with as I was evolving the language, I went with a simple, brute force approach to storage management:

  type Data struct {
  	Type    int
  	Car     *Data
  	Cdr     *Data
  	String  string
  	Integer int64
  	Float   float32
  	Func    *Function
  	Mac     *Macro
  	Prim    *PrimitiveFunction
  	Frame   *FrameMap
  	ObjType string
  	Obj     unsafe.Pointer
  }

The type field is used by every data element, and indicates the type of the element. This in effect defines what other field (or fields) are used. For most types that’s just a single other field. Conscells and boxed objects each take two fields.

In 64-bit Go with 8 byte alignment this takes 112 bytes per data element. For any given element, most of this is unused.

Safety last

If I were using C, I could just use a union and do away with the unused space. If I were using Ruby (and I have see http://daveastels.com/rubylisp/language-ref.html) I could simply use a single field for the element’s value.

But, I’m using Go and those aren’t on the table. What Go does provide is a way to step outside the type system and use what amounts to a plain old void*.

Let’s look back at the Data structure for a second. As I mention, most types require a single field in addition to the type field. There are two exceptions: cons cells which require car and cdr pointers to a Data, and boxed objects which have a type and value.

Go provides the unsafe package for letting us avoid the constraints of the type system. Of particular interest to us is unsafe.Pointer which is basically a way of having void* in a Go friendly way.

You convert an object (called something) to an unsafe.Pointer very simply:

unsafe.Pointer(&something)

The result is a generic pointer with no attached type information. If you later want to get the data that is being pointed to, you must cast the unsafe.Pointer back to the appropriate type and dereference it:

x := *((*int)(my_unsafe_pointer))

Note that we have to put the *int in parens to make it clear that we’re casting to a pointer.

Putting GoLisp on a diet

Using unsafe pointers we can change the Data struct to:

type Data struct {
  Type uint8
  Value unsafe.Pointer
}

Noticed the Type field has been constrained to 8 bits, although we don’t realize much of a savings due to field alignment.

For the cons cell and boxed object types I added structs to group the required fields:

type ConsCell struct {
	Car *Data
	Cdr *Data
}

type BoxedObject struct {
	ObjType string
	Obj unsafe.Pointer
}

There’s one other slight wrinkle: you can’t take the address of a boolean literal. That means you can’t convert it directly to an unsafe.Pointer. To get around this I simply made a variable for each value, and used these to make the boolean constants:

var b_true = true
var b_false = false

var True *Data = &Data{Type: BooleanType, Value: unsafe.Pointer(&b_true)}
var False *Data = &Data{Type: BooleanType, Value: unsafe.Pointer(&b_false)}

With these changes, the size of the core data entity is 16 bytes. That’s 14% of the original size. Granted, cons cells (which are very common) require another 16 bytes for the Car and Cdr pointers. That’s a total of 32 bytes for a cons cell compared to the previous 112. Still a significant savings.

Busy work

So now data has to be converted to and from unsafe.Pointers as it goes in and out of Lisp data elements. Thankfully that work is all in the data.go file. For each type of Lisp data there is a set of functions that check the type, wrap, and unwrap raw values. Code outside of data.go knows nothing about how storage is managed. For example, here are the function for integers (ignoring coercion):

func IntegerP(d *Data) bool {
  return d != nil && TypeOf(d) == IntegerType
}

func IntegerWithValue(n int64) *Data {
  return &Data{Type: IntegerType, Value: unsafe.Pointer(&n)}
}

func IntegerValue(d *Data) int64 {
  if d == nil {
    return 0
  }

  if IntegerP(d) {
    return *((*int64)(d.Value))
  }

  return 0
}

Each type of Lisp data element will have a similar set of functions.

Conclusion

This project proved the value of two aspects of the GoLisp project. First, data storage encapsulated in one file. Actually there had been some leakage of that, but now all data access is done via functions in data.go. Second, a quite exhaustive suite of tests, split between Go and Lisp level testing. Getting all the tests passing after what was fairly major surgery, resulted in a fully functioning system.

Initially this will live on the smaller branch. It looks solid so far, and once it’s proved to be ready for production I’ll move it to master.

Update: This has been merged into the master branch and is now just how GoLisp works.