numericals | dense-numericals - Performance of NumPy with the convenience of Common Lisp

1. Quick Tutorial

2. Quick API

Configuration Variables:

*multithreaded-threshold*
*default-float-format*
*inline-with-multithreading*
*array-element-type* and *array-element-type-alist*
*array-layout*
*broadcast-automatically*

Simple Utilities:

asarray
zeros zeros-like
ones ones-like
rand rand-like
full full-like
eye
aref
astype
array=
copy-array transpose concat reshape

Arithmetic Operations:

+ - * /
add subtract multiply divide
two-arg-matmul

Comparison Operators:

< <= = /= > >=

Transcendental Operators:

sin asin sinh asinh
cos acos cosh acosh
tan atan tanh atanh
exp expt
log

Rounding Operators:

abs ffloor floor fceiling ftruncate

Reduction Operators:

sum
vdot
max
min
maximum
minimum

3. Optimization Overview

Guidelines

Avoid allocations wherever possible: use in-place operators ending with '!' or equivalently, provide an OUT argument and BROADCAST as NIL
Declare types and (optimize speed)
Trade off between the use of simple-array and avoiding broadcasts
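
The first two guidelines can be illustrated without the library itself. The following is a minimal sketch in plain Common Lisp, assuming single-float vectors; the names sf-vector and add-into are hypothetical and are not part of numericals or dense-numericals. It only shows the general pattern of declaring types, compiling with (optimize speed), and writing into a caller-supplied OUT array instead of allocating a new one.

```lisp
;; Hypothetical helper for illustration only; not part of numericals.
(deftype sf-vector () '(simple-array single-float (*)))

(declaim (ftype (function (sf-vector sf-vector sf-vector) sf-vector) add-into))
(defun add-into (x y out)
  "Element-wise X + Y written into the preallocated vector OUT."
  (declare (optimize speed)
           (type sf-vector x y out))
  ;; No broadcasting: all three vectors must have the same length.
  (let ((n (length out)))
    (assert (= n (length x) (length y)))
    (dotimes (i n out)
      (setf (aref out i) (+ (aref x i) (aref y i))))))

(defparameter *x* (make-array 4 :element-type 'single-float
                                :initial-contents '(1f0 2f0 3f0 4f0)))
(defparameter *y* (make-array 4 :element-type 'single-float
                                :initial-element 10f0))

;; Passing *x* as OUT makes the operation in-place: no new array is allocated.
(add-into *x* *y* *x*)
```

With the type declarations in place, a compiler such as SBCL can specialize the array accesses and the float addition, and the only memory touched is that of the three existing vectors.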

4. Benchmark

The above numbers indicate how many times faster the corresponding lisp function is than the corresponding numpy or pytorch function. The numbers were obtained on an Intel i7-8750H running at 3GHz. The lisp code was compiled with (optimize speed) settings for arrays of sizes smaller than 80000. Appropriate type declarations were also used. The dense-numericals code used 6 threads for arrays of sizes larger than 80000, while PyTorch was left at default settings, with torch.get_num_threads() returning a value of 6. Thus, lisp and PyTorch had the benefit of multithreading, while numpy did not.

SBCL Version: 2.2.6

Python Version: 3.8.5

Numpy Version: 1.19.0

PyTorch Version: 1.7.1.post2

And while it is true that the lisp code was inlined and compiled wherever appropriate, that is precisely the point. Note that lisp uses incremental compilation and incremental typing, giving you the benefits of dynamicity by default. In addition, certain compilers like SBCL also bring you the safety and performance of static typing. The safety and type guarantees are certainly far from those of languages like Haskell, but they get the work done.

None of the code used (safety 0) declarations; thus, in some cases, even higher performance may be attainable, at the cost of safety and with the risk of running into segmentation faults if types are declared incorrectly.

5. Goals

I think that keeping numericals / dense-numericals as projects separate from existing projects is justified, because several of the goals listed below are unachievable for numcl or magicl without a significant rewrite or a change of goals or approach. This is perhaps because numericals, dense-numericals, and their dependencies depend on the CLTL2 API, while numcl and magicl try to stick to ANSI CL. On the other hand, it also seems that Common Lisp without the CLTL2 API would be terribly ill-suited for numerical computing. (See here for my wishlist of features for a numerical computing library.)

Goals

1. Remain AOT, avoid JAOT. Like SBCL, a combination of (optimize speed) and appropriate type declarations should result in maximal inlining and minimal runtime spent on function calls.
2. Inlining should avoid code bloat.
3. Like SBCL, provide useful compiler-notes to the user to help them optimize their code.
4. Keep the API close to numpy.
5. Provide ways to avoid copying arrays.
6. The printed representation of the array object should be transparent in terms of its properties method.
7. Array broadcasting should be optional, to avoid confusion.
8. Cooperate with existing libraries wherever possible.

Implications of the goals

Enable high performance even for arrays as small as size 10. (Goals 1 and 2)
Need CLTL2, through cl-environments, compiler-notes, and polymorphic-functions. (Goals 1 and 3)
It is easy to start coding. (Goal 4)
It is easy to optimize. (Goal 3)
Needs a custom array class that provides multidimensional strides and offsets. This is provided through abstract-arrays, dense-arrays, and dense-numericals. (Goals 4 and 6)
Wherever appropriate, functions should have an OUT parameter. (Goal 5)
Provide inplace operators ending with '!' to avoid explicitly supplying an OUT parameter. For instance, (numericals:sin array :out array :broadcast nil) can be shortened to (numericals:sin! array). (Goal 5)
Wherever appropriate, functions should have a BROADCAST parameter that determines whether or not the arrays should be broadcast. Amongst the lisp libraries I know of, only numcl provides broadcasting, and even there it is compulsory. (Goal 7)
Arrays created with numcl should "just work". (Goal 8)
Interoperability with magicl should be easy if not seamless. (Goal 8)
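
The relationship between the OUT parameter, the BROADCAST parameter, and the '!' operators described above can be sketched in plain Common Lisp. This is a hypothetical illustration, not the numericals implementation: vsin and vsin! are made-up names, the sketch handles only one-dimensional single-float vectors, and "broadcasting" here means nothing more than stretching a length-1 input across OUT.

```lisp
;; Hypothetical sketch for illustration; not the numericals implementation.
(defun vsin (x &key (out nil) (broadcast t))
  "Element-wise sine of the single-float vector X, written into OUT.
OUT is allocated when not supplied."
  (let* ((out (or out (make-array (length x) :element-type 'single-float)))
         (n (length out)))
    (cond ((= (length x) n))                  ; shapes already agree
          ((and broadcast (= (length x) 1)))  ; stretch a length-1 X over OUT
          (t (error "Shape mismatch: ~D vs ~D" (length x) n)))
    (dotimes (i n out)
      (setf (aref out i)
            (sin (aref x (if (= (length x) 1) 0 i)))))))

(defun vsin! (x)
  "In-place shorthand for (vsin x :out x :broadcast nil)."
  (vsin x :out x :broadcast nil))

(defparameter *a* (make-array 3 :element-type 'single-float
                                :initial-contents '(0f0 1f0 2f0)))

;; Overwrites *a* with the sines of its elements; nothing is allocated.
(vsin! *a*)
```

Making BROADCAST an explicit keyword means that a shape mismatch is reported as an error when the caller passes NIL, rather than being silently resolved by stretching one of the arguments.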

Comparison against numcl and magicl in terms of the (achievability of) goals

Goal	numericals	numcl	magicl	Description
1. Maximal inlining	✔	✔	?	needs compiler-macros or polymorphic-functions to dispatch on specialized arrays
2. Inlining without code bloat	✔	?	?	needs separate handling of broadcast and non-broadcasting operations
3. Compiler-notes	✔	?	?	can use compiler-macro-notes, but requires compiler-macros
4. Numpy-like API	✔	✔	＋	-
5. OUT parameter	✔	＋	✔	-
6. Transparent printed representation	✔	?	✔	needs wrapper structures/classes, for example dense-arrays or magicl:tensor
7. Optional array broadcasting	✔	＋	?	-

✔ - available
? - perhaps unachievable without significant rewrite or change of goals/approach
＋ - can be improved, or is doable without much rewrite or change of goals