Dan's Musings

Benchmarking the CLOS

Introduction

I asked this question on the Lisp Discord recently:

I keep wanting to use generic methods and the CLOS, but I've heard bad things about its speed. Anyone have experience putting a generic method in the hot path?

I also read a posted article where someone went from classes to structs for a 4.5x speedup. The article seemed only to mention moving from classes to structs, not abandoning generic methods altogether. On the channel, people seemed to think that avoiding generic methods and moving to using etypecase for dispatch might be a good idea. I wanted to know for myself what was faster.

So I decided to benchmark the CLOS.

The usual disclaimers about benchmarks apply before we get to the main results. I'm "just on my own machine", I didn't have time to make the set-up perfect, the results are just rough, etc. However, I think they are still interesting results worth sharing.

Method

I benchmarked on a 2021 HP Spectre x360 laptop, with Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz - 1.50 GHz, with 8 GB RAM.

I created a repository called clos-speed.

In it, there are three files of note: src/classes.lisp, src/structs.lisp, and src/structs-etypecase.lisp.

I ran (declaim (optimize (speed 3) (safety 1) (debug 0))) at the beginning of each test.

The files all implement an arithmetic tree of addition and subtraction operations. I chose this approach in hopes that the time taken resolving CLOS methods was much higher than the actual method bodies. Addition and subtraction are very quick operations on machines. A randomly generated tree of 2^16 such operations are generated. 64 such trees are generated, and asked to evaluate the result of the operation using some form of dispatch: either generic methods or using etypecase.

To give the reader an idea, here is an excerpt from the main body of the classes.lisp file:

(defgeneric evaluate (thing))

(defclass op ()
  ((a
     :initarg :a
     :accessor a)
   (b
     :initarg :b
     :accessor b)))

(defclass minus (op)
  ())

(defclass plus (op)
  ())

(defmethod evaluate ((thing number))
  thing)

(defmethod evaluate ((thing plus))
  (+ (evaluate (a thing)) (evaluate (b thing))))

(defmethod evaluate ((thing minus))
  (- (evaluate (a thing)) (evaluate (b thing))))

(defun generate (depth)
  (if (zerop depth)
      (random 10000)
      (make-instance
        (case
            (random 2)
          (0
           'minus)
          (1
           'plus))
        :a
        (generate (1- depth))
        :b
        (generate (1- depth)))))

(defun prepare ()
  (loop for i from 1 to 64
        collect (generate 16)))

(defun speed-test (thing)
  (loop for x in thing
        collect (evaluate x)))

The idea is that I copied this implementation to structs.lisp, but replaced classes with structs:

(defstruct (op (:conc-name))
            a b)

(defstruct (minus (:include op)))

(defstruct (plus (:include op)))

Then, I copied that into structs-etypecase.lisp, but replaced the generic function and methods with an etypecase dispatch function:

(defun evaluate (thing)
  (etypecase thing
    (number thing)
    (minus (- (evaluate (a thing))
              (evaluate (b thing))))
    (plus (+ (evaluate (a thing))
             (evaluate (b thing))))))

When testing any given implementation, I fired up the REPL, copied and pasted the file into it, ensured the declaim line was run, and then ran the following and recorded the results:

(let ((thing (prepare)))
  (time (speed-test thing)))

To check results, I occasionally ran the same test over to test for locality/warm CPU cache effects. I observed in some cases a slight speedup, but not enough to invalidate the results I obtained.

Results

For Classes with Generic Methods case:

Real Time Run Time Bytes Consed Impl Version
0.371 0.364675 0 SBCL 2.4.1
0.427779 0.426224 1334 CCL 1.13
2.082 2.081 95984 CLASP 2.5.0
6.936 5.546 268435648 ECL 24.5.10
20.937 234879359 ABCL 1.9.2

For the Structs with Generic Methods case:

Real Time Run Time Bytes Consed Impl Version
0.233 0.230753 1079632 SBCL 2.4.1
0.479867 0.476301 1440 CCL 1.13
11.262 11.262 ???* CLASP 2.5.0
6.28 6.28 268437408 ECL 24.5.10
20.798 234879359 ABCL 1.9.2

* = I accidentally deleted this cell in the spreadsheet, sorry :(

For the Structs with etypecase Dispatch:

Real Time Run Time Bytes Consed Impl Version
0.169 0.166186 0 SBCL 2.4.1
0.325479 0.324396 1040 CCL 1.13
4.914 4.914 1560 CLASP 2.5.0
12.685 11.351 536867024 ECL 24.5.10
24.86 33554252 ABCL 1.9.2

I tried to benchmark allegro, but the heap size was maxed out and it core dumped, even for workloads of half the size.

~~I tried to benchmark LispWorks, but I only have the free version, and I exceeded the free versions' maximum heap size.~~ UPDATE! apr3vau of reddit ran some benchmarks for LispWorks for us! the results are at the end of this blog post.

Discussion

Even with real-world variance and such, there are some very interesting conclusions to be drawn, at least with this dataset.

SBCL and CCL are Fast

To get this rather obvious fact out of the way, SBCL and CCL are so much faster than their neighbors as to be in their own league. Perhaps this is why some say that "CLOS is slow" is a myth. At those speeds, does this discussion really matter?

SBCL Likes Structs, But No One Else Does

SBCL ran the fastest using structs, but no other tested implementation did better with structs over classes.

CCL did better with structs+etypecase than with classes, but this might be due to how much faster etypecase is, not how much faster structs are. When comparing structs vs classes against generic methods, classes were faster.

Other implementations agreed with CCL in more dramatic ways. ECL performed better with classes, and ABCL saw a tie between classes and structs when only looking at the cases that used generic methods.

etypecase is a Mixed Bag

Again, here we see SBCL and CCL doing great with etypecase. SBCL was twice as fast without generic methods as with them, and we see a marked speed increase with CCL as well.

However! All other implementations were SLOWER using structs+etypecase than using classes with generic methods. With the exception of CLASP, they were even slower than using structs with generic methods too. ECL was rougly twice as slow with structs and etypecase than the other two cases.

Conclusion

Based on this data, I make the following conclusion:

Just use classes and generic methods.

This is slower on SBCL and CCL, but those implementations are super fast, so it doesn't matter as much. Also, classes didn't use up any memory during use on SBCL, during evaluation, and structs use tons of memory on SBCL unless etypecase is also used. On the other implementations, classes and generic methods represent the best option. If on CLASP, this is the clear winner, and beats other strategies on ECL as well. ABCL likes it about as well as structs, and much prefers the use of generic functions over etypecase.

I was very surprised at these results. They seem to buck the trend of the conventional wisdom of the community. Perhaps this may be due to my tests being very simple, not covering all cases. However, the data is pretty good, and represents at least one set of cases. Looking at this data, classes and generic methods represent the most portable solution for speed when dealing with multiple-dispatch problems.

Just Kidding, There's More: Consing

After publishing the above, I decided to do a second benchmark, this time including the consing of the structures/classes:

(time (let ((thing (prepare)))
  (speed-test thing)))

Here is the data:

ABCL, sorted by fastest:

Case Real Time Total Run Bytes Consed
Structs 41.148 255850638
Structs-etypecase 47.527 54525531
Classes 118.869 448785963

CCL, sorted by fastest:

Case Real Time Total Run Bytes Consed
Structs-etypecase 2.508045 2.481351 134217760
Structs 2.552674 2.527197 134218160
Classes 13.831 13.776429 268437680

CLASP, sorted by fastest:

Case Real Time Total Run Bytes Consed
Classes 12.21 12.21 369274080
Structs 18.251 18.251 369096240
Structs-etypecase 19.546 19.546 369096240

ECL, sorted by fastest:

Case Real Time Total Run Bytes Consed
Structs 17.422 22.345 2214569696
Structs-etypecase 25.557 39.433 2483001744
Classes 57.494 62.041 2751436384

SBCL, sorted by fastest:

Case Real Time Total Run Bytes Consed
Structs 0.859 0.856056 135348176
Structs-etypecase 1.381 1.365472 134415328
Classes 2.544 2.522089 571356720

LispWorks Benchmarks

apr3vau was kind enough to run benchmarks for us on a Lispworks 8.0.1 macintosh, MacBook Pro 2020 with Intel CPU, Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz.

Here are the runs for non-consing speed tests (source), ordered by speed:

Case Real Time Total Run Bytes Consed
Structs-etypecase .140 .142 7736
Structs .217 .220 12040
Classes .261 .265 8672

Here are the results from speed tests that included consing (source), ordered by speed:

Case Real Time Total Run Bytes Consed
Structs-etypecase .479 .495 134396592
Structs .764 .779 134417040
Classes 1.387 1.408 469951856

LispWorks seems to favor etypecase, but structs are still faster than classes by nearly a factor of 2 when including consing. Thanks, apr3vau!

Updated Conclusion

So that's why people say "structs are faster", because they are, if you include consing them.

All in all, I'd say that the most portable thing might be to use structs as the data structure and generic methods as the dispatch scheme (when doing something performance sensitive). Most implementations seem to handle this combination well.