Introduction
I asked this question on the Lisp Discord recently:
I keep wanting to use generic methods and the CLOS, but I've heard bad things about its speed. Anyone have experience putting a generic method in the hot path?
I also read a posted article where someone went from classes to structs for a 4.5x
speedup. The
article seemed only to mention moving from classes to structs, not abandoning
generic methods altogether. On the channel, people seemed to think that avoiding
generic methods and moving to using etypecase
for dispatch might be a good
idea. I wanted to know for myself what was faster.
So I decided to benchmark the CLOS.
The usual disclaimers about benchmarks apply before we get to the main results. I'm "just on my own machine", I didn't have time to make the set-up perfect, the results are just rough, etc. However, I think they are still interesting results worth sharing.
Method
I benchmarked on a 2021 HP Spectre x360 laptop, with Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz - 1.50 GHz, with 8 GB RAM.
I created a repository called clos-speed.
In it, there are three files of note: src/classes.lisp
, src/structs.lisp
,
and src/structs-etypecase.lisp
.
I ran (declaim (optimize (speed 3) (safety 1) (debug 0)))
at the beginning of each test.
The files all implement an arithmetic tree of addition and subtraction
operations. I chose this approach in hopes that the time taken resolving CLOS methods was much higher than the actual method bodies. Addition and subtraction are very quick operations on machines. A randomly generated tree of 2^16 such operations are generated. 64
such trees are generated, and asked to evaluate the result of the operation
using some form of dispatch: either generic methods or using etypecase
.
To give the reader an idea, here is an excerpt from the main body of the
classes.lisp
file:
(defgeneric evaluate (thing)) (defclass op () ((a :initarg :a :accessor a) (b :initarg :b :accessor b))) (defclass minus (op) ()) (defclass plus (op) ()) (defmethod evaluate ((thing number)) thing) (defmethod evaluate ((thing plus)) (+ (evaluate (a thing)) (evaluate (b thing)))) (defmethod evaluate ((thing minus)) (- (evaluate (a thing)) (evaluate (b thing)))) (defun generate (depth) (if (zerop depth) (random 10000) (make-instance (case (random 2) (0 'minus) (1 'plus)) :a (generate (1- depth)) :b (generate (1- depth))))) (defun prepare () (loop for i from 1 to 64 collect (generate 16))) (defun speed-test (thing) (loop for x in thing collect (evaluate x)))
The idea is that I copied this implementation to structs.lisp
, but replaced
classes with structs:
(defstruct (op (:conc-name)) a b) (defstruct (minus (:include op))) (defstruct (plus (:include op)))
Then, I copied that into structs-etypecase.lisp
, but replaced the generic
function and methods with an etypecase
dispatch function:
(defun evaluate (thing) (etypecase thing (number thing) (minus (- (evaluate (a thing)) (evaluate (b thing)))) (plus (+ (evaluate (a thing)) (evaluate (b thing))))))
When testing any given implementation, I fired up the REPL, copied and pasted
the file into it, ensured the declaim
line was run, and then ran the following
and recorded the results:
(let ((thing (prepare))) (time (speed-test thing)))
To check results, I occasionally ran the same test over to test for locality/warm CPU cache effects. I observed in some cases a slight speedup, but not enough to invalidate the results I obtained.
Results
For Classes with Generic Methods case:
Real Time | Run Time | Bytes Consed | Impl | Version |
---|---|---|---|---|
0.371 | 0.364675 | 0 | SBCL | 2.4.1 |
0.427779 | 0.426224 | 1334 | CCL | 1.13 |
2.082 | 2.081 | 95984 | CLASP | 2.5.0 |
6.936 | 5.546 | 268435648 | ECL | 24.5.10 |
20.937 | 234879359 | ABCL | 1.9.2 |
For the Structs with Generic Methods case:
Real Time | Run Time | Bytes Consed | Impl | Version |
---|---|---|---|---|
0.233 | 0.230753 | 1079632 | SBCL | 2.4.1 |
0.479867 | 0.476301 | 1440 | CCL | 1.13 |
11.262 | 11.262 | ???* | CLASP | 2.5.0 |
6.28 | 6.28 | 268437408 | ECL | 24.5.10 |
20.798 | 234879359 | ABCL | 1.9.2 |
*
= I accidentally deleted this cell in the spreadsheet, sorry :(
For the Structs with etypecase
Dispatch:
Real Time | Run Time | Bytes Consed | Impl | Version |
---|---|---|---|---|
0.169 | 0.166186 | 0 | SBCL | 2.4.1 |
0.325479 | 0.324396 | 1040 | CCL | 1.13 |
4.914 | 4.914 | 1560 | CLASP | 2.5.0 |
12.685 | 11.351 | 536867024 | ECL | 24.5.10 |
24.86 | 33554252 | ABCL | 1.9.2 |
I tried to benchmark allegro, but the heap size was maxed out and it core dumped, even for workloads of half the size.
~~I tried to benchmark LispWorks, but I only have the free version, and I exceeded the free versions' maximum heap size.~~ UPDATE! apr3vau of reddit ran some benchmarks for LispWorks for us! the results are at the end of this blog post.
Discussion
Even with real-world variance and such, there are some very interesting conclusions to be drawn, at least with this dataset.
SBCL and CCL are Fast
To get this rather obvious fact out of the way, SBCL and CCL are so much faster than their neighbors as to be in their own league. Perhaps this is why some say that "CLOS is slow" is a myth. At those speeds, does this discussion really matter?
SBCL Likes Structs, But No One Else Does
SBCL ran the fastest using structs, but no other tested implementation did better with structs over classes.
CCL did better with structs+etypecase
than with classes, but this might be due
to how much faster etypecase
is, not how much faster structs are. When
comparing structs vs classes against generic methods, classes were faster.
Other implementations agreed with CCL in more dramatic ways. ECL performed better with classes, and ABCL saw a tie between classes and structs when only looking at the cases that used generic methods.
etypecase
is a Mixed Bag
Again, here we see SBCL and CCL doing great with etypecase
. SBCL was twice
as fast without generic methods as with them, and we see a marked speed increase
with CCL as well.
However! All other implementations were SLOWER using structs+etypecase
than
using classes with generic methods. With the exception of CLASP, they were
even slower than using structs with generic methods too. ECL was rougly twice as slow
with structs and etypecase
than the other two cases.
Conclusion
Based on this data, I make the following conclusion:
Just use classes and generic methods.
This is slower on SBCL and CCL, but those implementations are super fast, so
it doesn't matter as much. Also, classes didn't use up any memory during use on
SBCL, during evaluation, and structs use tons of memory on SBCL unless
etypecase
is also used. On the other implementations, classes and generic
methods represent the best option. If on CLASP, this is the clear winner, and
beats other strategies on ECL as well. ABCL likes it about as well as structs,
and much prefers the use of generic functions over etypecase
.
I was very surprised at these results. They seem to buck the trend of the conventional wisdom of the community. Perhaps this may be due to my tests being very simple, not covering all cases. However, the data is pretty good, and represents at least one set of cases. Looking at this data, classes and generic methods represent the most portable solution for speed when dealing with multiple-dispatch problems.
Just Kidding, There's More: Consing
After publishing the above, I decided to do a second benchmark, this time including the consing of the structures/classes:
(time (let ((thing (prepare))) (speed-test thing)))
Here is the data:
ABCL, sorted by fastest:
Case | Real Time | Total Run | Bytes Consed |
---|---|---|---|
Structs | 41.148 | 255850638 | |
Structs-etypecase | 47.527 | 54525531 | |
Classes | 118.869 | 448785963 |
CCL, sorted by fastest:
Case | Real Time | Total Run | Bytes Consed |
---|---|---|---|
Structs-etypecase | 2.508045 | 2.481351 | 134217760 |
Structs | 2.552674 | 2.527197 | 134218160 |
Classes | 13.831 | 13.776429 | 268437680 |
CLASP, sorted by fastest:
Case | Real Time | Total Run | Bytes Consed |
---|---|---|---|
Classes | 12.21 | 12.21 | 369274080 |
Structs | 18.251 | 18.251 | 369096240 |
Structs-etypecase | 19.546 | 19.546 | 369096240 |
ECL, sorted by fastest:
Case | Real Time | Total Run | Bytes Consed |
---|---|---|---|
Structs | 17.422 | 22.345 | 2214569696 |
Structs-etypecase | 25.557 | 39.433 | 2483001744 |
Classes | 57.494 | 62.041 | 2751436384 |
SBCL, sorted by fastest:
Case | Real Time | Total Run | Bytes Consed |
---|---|---|---|
Structs | 0.859 | 0.856056 | 135348176 |
Structs-etypecase | 1.381 | 1.365472 | 134415328 |
Classes | 2.544 | 2.522089 | 571356720 |
LispWorks Benchmarks
apr3vau was kind enough to run benchmarks for us on a Lispworks 8.0.1 macintosh, MacBook Pro 2020 with Intel CPU, Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz.
Here are the runs for non-consing speed tests (source), ordered by speed:
Case | Real Time | Total Run | Bytes Consed |
---|---|---|---|
Structs-etypecase | .140 | .142 | 7736 |
Structs | .217 | .220 | 12040 |
Classes | .261 | .265 | 8672 |
Here are the results from speed tests that included consing (source), ordered by speed:
Case | Real Time | Total Run | Bytes Consed |
---|---|---|---|
Structs-etypecase | .479 | .495 | 134396592 |
Structs | .764 | .779 | 134417040 |
Classes | 1.387 | 1.408 | 469951856 |
LispWorks seems to favor etypecase
, but structs are still faster than classes by nearly a factor of 2 when including consing. Thanks, apr3vau!
Updated Conclusion
So that's why people say "structs are faster", because they are, if you include consing them.
All in all, I'd say that the most portable thing might be to use structs as the data structure and generic methods as the dispatch scheme (when doing something performance sensitive). Most implementations seem to handle this combination well.