Prime Number Searches — Functional and Imperative revisited

tl;dr In a previous blog post, I compared an imperative implementation of the Sieve of Eratosthenes prime number search algorithm with trial by division and a naïve functional version. Based on feedback, comments and further reading of Melissa O’Neill’s paper [1], I revisit the functional Sieve of Erathosthenes with an improved version using lists of prime factors instead of a single sieve, which outperforms trial by division. I then further improve the functional sieve by using a priority queue and lazy evaluation. The performance of the lazy sieve is a couple of orders olf magnitude slower than the imperative sieve, but does not appear to get significantly worse than that with increasing sieve sizes. Furthermore, it is far more scalable due to lower memory requirements.

Recap

Imperative Sieve of Eratosthenes

My initial imperative implementation of the Sieve of Eratosthenes was straightforward but I missed a further optimisation mentioned in the Rosetta Code guidelines [2]: if we’re finding primes up to n then we can discontinue “crossing off” prime multiples from the sieve when we reach the square root of n.

(defun primes (n)
  (let ((sieve (make-array (1+ n) :element-type '(unsigned-byte 8)
                                  :initial-element 0)))

    (loop for i from 2 upto (isqrt n) when (zerop (aref sieve i)) do
       (loop for j from (* i i) upto n by i do
          (setf (aref sieve j) 1)))

    (loop for i from 2 upto n when (zerop (aref sieve i)) collect i)))

Trial by Division

I’ve also made some tweaks to my original recursive trial by division version to improve readability and to reduce consing and garbage collection.

(defun primes (n)
   (labels ((find-primes (primes candidates)
     (if (null candidates)
        primes
        (let ((i (first candidates)))
          (find-primes (cons i primes)
                       (delete-if #'(lambda (x) (zerop (rem x i))) 
                                 (rest candidates)))))))

      (nreverse 
        (find-primes nil (loop for i from 2 upto n collect i)))))

Performance

The times to compute primes up to a search limit n of the imperative sieve and trial by division functions above, and the naïve functional sieve from my previous blog post, are plotted below (SBCL 1.0.58, Microsoft Windows 7 64-bit, 2.8 GHz Intel Core i5 760).

Performance: imperative sieve, trial by division and naive functional sieve
Trial by division is an order of magnitude slower than the imperative sieve for n=100 and its relative performance steadily grows worse with increasing n. The naïve functional sieve shows the same trend as trial by division but starts off two orders of magnitude slower than the imperative sieve at n=100. For calculating primes to 1,000,000, the imperative sieve takes around 20 milliseconds while trial by division takes over a minute. I got bored waiting for the naïve functional sieve.

Functional Sieve of Eratosthenes revisited

A list-based sieve

My first attempt at a functional Eratosthenes sieve stuck too rigidly to the idea of the imperative version by representing the sieve as a single “cross-off” list with an entry for every number. In my previous blog post, I defined a k-sieve for prime number k as the multiples of k from k2 up to our prime search limit n. (These are the numbers we “cross out” from the sieve for k.) Each time a prime was discovered, its k-sieve was calculated and then combined into the cross-off list which, in accordance with the functional programming tenet of no mutable data, was recreated each time. Further, although primes are found increasingly less frequently as the search progresses, searching for the next prime involved traversing the cross-off list to find the first zero value, which became progressively further from the head of the list.

To avoid repeatedly merging the k-sieves for each prime into a single sieve, we can store each k-sieve separately in a list of sieves. To avoid searching through numbers we’ve already eliminated, we can remove numbers from the list as they are crossed off. I argue that this is still the Sieve of Eratosthenes algorithm because we’re still “checking off” multiples of primes, even though the sieve itself is not structured as a linear list or array and is progressively discarded. The algorithm is as follows:

  1. Start with a list of candidate numbers from 2 to n, an empty list of primes and an empty list of k-sieves.
  2. Search for the next candidate i in all the k-sieves. (If the sieves are generated in order, we only have to test i against the first element of each k-sieve.)
  3. If we don’t find a match, i must be prime. Add it to the list of primes, and if i is less than the square root of n, add its k-sieve to the list of sieves.
  4. If we do find a match, then i cannot be prime. Remove it from the all of the k-sieves in which it occurs. If any k-sieves become empty as a result, remove them from the sieve list.
  5. Iterate until no candidates remain.

Here’s an example to illustrate this. We start with an empty list of sieves, an empty list of primes and a list of candidates from 2 to n.

Sieves     -> ()
Primes     -> ()
Candidates -> ( 2 3 4 5 6 ... n )

Our first candidate number is 2. There are no sieves, so we declare it to be prime, add it to the list of primes, and add the 2-sieve to the list of sieves.

Sieves     -> ( ( 4  6  8 10 12 14 ... ) )
Primes     -> ( 2 )
Candidates -> ( 3 4 5 6 7 ... n )

We now look at the next candidate, 3. It doesn’t occur in any of the k-sieves, so we again declare it to be prime and add the 3-sieve to the list of sieves.

Sieves     -> ( ( 9 12 15 18 21 24 ...)
                ( 4  6  8 10 12 14 ...) )
Primes     -> ( 2 3 )
Candidates -> ( 4 5 6 7 8 ... n )

The next candidate is 4. It occurs in the 2-sieve, so it can’t be prime. We delete it from any k-sieves in which it occurs, and continue to the next candidate.

Sieves     -> ( ( 9 12 15 18 21 24 ... )
                ( 6  8 10 12 14 16 ... ) )
Primes     -> ( 2 3 )
Candidates -> ( 5 6 7 8 9 ... n)

Similarly to the imperative sieve above, when a prime greater than the square root of n is found, we do not need to generate its k-sieve since its first element will be greater than n.

Knocking up a simple implementation is straightforward and takes less than 60 lines of code, including a few comments.

(defun make-sieve (k n)
  "Create a sieve for prime k consisting of the
   square of k, then each multiple of k up to n."
  (loop for i from (* k k) upto n by k collect i))

(defun in-sieve-p (i sieve)
  "Test whether i is in the sieve. (Tests whether i is equal 
   to the first element of sieve.)
   Returns the result of the test and either a copy of the
   sieve with i removed or the original sieve."
  (let ((in-sieve (equal i (first sieve))))
    (values in-sieve
            (if in-sieve (rest sieve) sieve))))

(defun prime-p (i sieve-list)
  "Test whether i is in any of the sieves in sieve-list.
   Returns the result and updated sieves."
  (labels ((test-sieve (is-prime out-sieves in-sieves)
             (if (null in-sieves)
                 (values is-prime out-sieves)
                 (multiple-value-bind (not-prime new-sieve)
                                      (in-sieve-p i (first in-sieves))
                   (test-sieve (if not-prime nil is-prime)
                               (cons new-sieve out-sieves)
                               (rest in-sieves))))))

     (test-sieve t nil sieve-list)))

(defun primes (n)
  (let ((sqrt-n (isqrt n)))

    (labels ((test-prime (primes sieves candidates)
               (if (null candidates)
                   primes
                   (let ((i (first candidates)))
                     (multiple-value-bind (is-prime new-sieve-list)
                                          (prime-p i sieves)
                       (test-prime (if is-prime (cons i primes) primes)
                                   (if (and is-prime (< i sqrt-n))
                                       (cons (make-sieve i n) new-sieve-list)
                                       new-sieve-list)
                                   (rest candidates)))))))

      (nreverse
        (test-prime nil nil (loop for i from 2 upto n collect i))))))

What about its performance?
Untitled-2

This is a major win. It not only performs an order of magnitude better than than the naïve functional sieve for n=100, it also outperforms trial by division despite a more complex implementation. Not only that, the gradient is flatter; it remains within two orders of magnitude of the imperative sieve performance until around n=1,000,000. Can we do better still?

Sorting the list: a Priority Queue

Further consideration of our list-based approach reveals a few possible optimisations. New k-sieves are added onto the front of the sieve list, and the prime number search function prime-p examines all the sieves. However, If we can keep the list of k-sieves ordered by their first element, we can stop searching as soon as we encounter a sieve with a first element greater than i. To keep the list sorted, we need to consider whether or not i prime.

  • If i is prime, we add the i-sieve to the list. Its first element will be greater than that of any other sieve in the list, so it will be added to the end of the list. For a purely functional (immutable) list structure, this is an expensive operation since it means that we will have to copy all of the cons cells in the original list. Fortunately, primes occur progressively less frequently as i increases.
  • If i is not prime, we delete the corresponding first elements from the matching k-sieves. This means we will have to sort the list, but the list will be in an almost sorted state. Sorting can be accomplished by removing the k-sieves that have i as their first element from the sieve list, deleting their first elements, then re-inserting them into the sieve list at the appropriate locations.

This in fact will effectively implement a simple priority queue [3], with the sieve head elements determining the priority. Melissa O’Neill also examines this approach in her paper.

I implemented a simple purely functional (i.e. immutable) priority queue package with the following functions: pq-insert and pq-remove, which removes and returns all elements of a given priority as well as the updated queue. The code is as follows:

(defun pq-insert (element queue)
  "Insert element into the priority queue."
   (if (null queue)
       (list element)
       (let ((priority (first element))
	     (next-element (first queue)))
	 (if (>= (first next-element) priority)
	     (cons element queue)
	     (cons next-element (pq-insert element (rest queue)))))))

(defun pq-remove (priority queue)
  "Remove elements with a given priority from the queue
   and return them along with the updated queue."
  (labels ((delete-elements (deleted q)
             (if (null q)
                 (values deleted nil)
                 (let* ((next-element (first q))
                        (cmp (- priority (first next-element))))
                   (cond ((minusp cmp)
                          (values deleted q))
                         ((zerop cmp)
                          (delete-elements 
                             (cons next-element deleted)
                             (rest q)))
                         (t
                          (multiple-value-bind (del-lst in-lst)
                                               (delete-elements deleted (rest q))
                            (values del-lst (cons next-element in-lst)))))))))

  (delete-elements nil queue)))

(I implement a combined “delete-and-return matching sieves” function because if search and delete are separate functions, in the majority of cases, where i is not prime, we will need to search the list twice — first for matching sieves and then to delete them. However, there is a penalty when a prime is encountered because the entire list will be copied. In practice, there doesn’t seem to be a huge performance difference between the two approaches.)

The Eratosthenes sieve is implemented based on this data structure as follows:

(defun build-sieve (k n) 
  (loop for i from (* k k) upto n by k collect i))

(defun prime-p (i sieves)
  (multiple-value-bind (matches new-sieves) 
                       (pq-remove i sieves)
    (if (null matches)
        (values t sieves)
        (values nil (reduce #'(lambda (q s) (pq-insert s q))
                            (remove-if 'null
                                       (mapcar 'rest matches))
                            :initial-value new-sieves)))))

(defun primes (n) 
  (let ((sqrt-n (isqrt n)))

    (labels ((test-prime (primes sieves candidates)
                (if (null candidates)
                primes
                (let ((i (first candidates)))
                   (multiple-value-bind (is-prime new-sieves)
                                        (prime-p i sieves)
                      (test-prime (if is-prime (cons i primes) primes)
                                  (if (and is-prime (< i sqrt-n))
                                      (pq-insert (make-sieve i n) new-sieves)
                                      new-sieves)
                                  (rest candidates)))))))

      (nreverse
        (test-prime nil nil (loop for i from 2 upto n collect i))))))

The question is whether the overhead of maintaining the priority queue is worth the effort. Let’s see:

Surprisingly, the priority queue overhead is sufficiently high that our brain-dead list-based implementation beats it for n less than around 100,000, and also performs worse than trial by division for n less than around 2,000. To understand why, let’s find out what the maximum length the sieve list will reach for n=100,000 — it should be equal to the number of primes found by sqrt(100,000):

CL-USER> (isqrt 100000)
316
CL-USER> (length (functional-list-primes 316))
66

So the maximum number of k-sieves in the list in this case will be 66, which will be attained at i=316, after which it will continue to diminish as we approach i=100,000.

This explains why simply using an unsorted list and testing all its elements serves us well for so long; the maximum number of sieves in the list only grows slowly with increasing n. The greater overhead of sorting (be it a priority queue, a set, a binary tree or whatever abstraction) therefore doesn’t pay off until n is fairly large. Even after the crossover point the performance gap only grows gradually because primes become increasingly scarce so the sieve list growth becomes slower and slower with increasing n. Moreover, by the time we start getting performance wins from sorting, our approach starts running into another scalability issue: memory consumption.

No sets please, we’re British

What about a more sophisticated data structure? Finding whether a candidate number i is prime can be viewed as testing whether i is a member of a set that is the union of all k-sieves. (In fact, we only need to test whether it is in the set of lowest values in all k-sieves if we remove elements as they are “crossed off”.) Sets (and general priority queues, for that matter) are often implemented based on a binary search tree [4] or some variant thereof, so would this not outperform a simple list?

To test this, I implemented a pure functional binary search tree and tried again. While a BST may be suitable for general applications which require sorting and rapid seaching, it may not be so useful for an Erathosthenes sieve. Because we generate the sieve incrementally, the tree becomes highly unbalanced; k-sieves are only added to the right subtree, and the tree thus degenerates into a linear linked list. (There may be the occasional left subtree of some nodes depending on the order in which we re-insert k-sieves after removing their first elements.) A self-balancing tree would be more complex to implement and might actually reduce performance because (a) the balancing operation adds overhead, and (b) the most frequently accessed nodes are those with the lowest values (the 2-sieve, followed by the 3-sieve, followed by the 5-sieve, etc.), but these would be pushed deeper into the tree by re-balancing.

Lazy Evaluation

Our list- or priority queue-based sieves run into a memory usage problem — the input list of candidates contains n-2 elements and each k-sieve is initialised with all multiples of k from k2 to n. Memory consumption is an issue that also afflicts the imperative sieve. We can address the issue by using lazy evaluation. A lazy Eratosthenes sieve would not only reduce memory usage considerably, it would also allow us to answer questions like what is the xth prime?, for which we would otherwise need to know the size of the sieve in advance or find it by trial and error.

Although Common Lisp uses strict evaluation, we can implement lazy evaluation quite easily using a generator; a function that, when called, returns the next value in the sequence. A lightweight method would be to store each k-list as a tuple containing the “current value”, the prime k and the square root of the search limit n. Given those three quantities we can simply calculate the next value when required using a general function. A heavier method (but one that is generally used) is to wrap these values in a thunk (a piece of code) that closes around them, hiding them from view and keeping the functional abstraction at a higher level.

With functional programming in non-pure languages, there’s often a tradeoff in terms of the level at which we present the “no mutable data” abstraction. Of course, even languages like Miranda and Haskell must alter memory cells to work but this is hidden behind the scenes by the compiler. Thunks that mutate data inside closures are used in functional programming to implement memoisation and lazy evaluation.

In this implementation, I’ll select the latter approach and represent a k-list as a dotted pair (cons cell) containing the current value in the CAR and the thunk in the CDR. We can access the next value in the k-list in the same way as we did before using first (or car, which does the same thing.) However, to get the next value, instead of using rest we need a lazy equivalent lazy-rest which calls the thunk to generate the next value. Here’s the lazy list abstraction:

(declaim (inline make-lazy-list
                 lazy-first
                 lazy-rest))

(defun make-lazy-list (thunk)
  (let ((value (funcall thunk)))
    (if (null value)
        nil
        (cons value thunk))))

(defun lazy-first (lazy-list)
  (car lazy-list))

(defun lazy-rest (lazy-list)
  (make-lazy-list (cdr lazy-list)))

Here’s a couple of functions that create a lazy k-list and a lazy list of candidates. (These are so similar it’d be worth creating a macro to make an arbitrary lazy list comprehension if we used this technique often.)

(defun make-lazy-sieve (k n)
  "Lazy k-sieve."
  (let ((next-value (* k k)))
    (make-lazy-list #'(lambda ()
             (if (null next-value) 
                 nil
                 (let ((current-value next-value)
                       (nv (+ k next-value)))
                   (setf next-value (if (> nv n) nil nv))
                   current-value))))))

(defun make-lazy-candidates (n)
  "Lazy prime number candidates list."
  (let ((next-value 2))
    (make-lazy-list #'(lambda ()
      (if (null next-value)
          nil
          (let ((current-value next-value)
                (nv (1+ next-value)))
               (setf next-value (if (> nv n) nil nv))
               current-value))))))

Our priority queue functions remain as before. (We could substitute function calls to first with lazy-first but these do exactly the same thing.) There are only slight modifications to the prime-p function and primes (shown underlined):

(defun prime-p (i sieves)
  (multiple-value-bind (matches new-sieves) 
                       (pq-remove i sieves)
    (if (null matches)
        (values t sieves)
        (values nil (reduce #'(lambda (q s) (pq-insert s q))
                            (remove-if 'null
                                       (mapcar 'lazy-rest matches))
                            :initial-value new-sieves)))))

(defun primes (n) 
  (let ((sqrt-n (isqrt n)))

    (labels ((test-prime (primes sieves candidates)
                (if (null candidates)
                primes
                (let ((i (lazy-first candidates)))
                   (multiple-value-bind (is-prime new-sieves)
                                        (prime-p i sieves)
                      (test-prime (if is-prime (cons i primes) primes)
                                  (if (and is-prime (< i sqrt-n))
                                      (pq-insert (make-sieve i n) new-sieves)
                                      new-sieves)
                                  (lazy-rest candidates)))))))

      (nreverse
        (test-prime nil nil (make-lazy-candidates n))))))

Now let’s look and see how big a penalty we had to pay for that laziness.

Basically, very little. The lazy sieve (using a priority queue) keeps pace with the non-lazy priority queue while remaining a couple of orders magnitude slower than the imperative sieve up to the greatest value of n tested (5 x 107). It also boasts far lower memory consumption; the maximum k-list size (number of entries in the priority queue) is the number of primes up to sqrt(50,000,000), which is 908 — a very modest number.

Conclusion

In my original blog article, I concluded that the Sieve of Eratosthenes algorithm was basically imperative, and that even with a superior data structure the functional version would have no advantage over the imperative version so would not be worth the effort. Revisiting the problem has shown that re-framing it in a way more amenable to tackling using functional techniques makes it not only tractable, the functional approach using lazy evaluation is superior to the imperative sieve in an important aspect; it is ultimately more scalable because it has far lower memory consumption. Furthermore, because the sieve is evaluated in a lazy fashion, with some very minor modifications we can write a program to calculate the nth prime number, something which we could not do with non-lazy or imperative sieves without determining the sieve size in advance.

References

[1] M. E. O’Neill, The Genuine Sieve of Eratosthenes.
[2] Rosetta Code. Sieve of Eratosthenes.
[3] Wikipedia. Priority Queue.
[4] Wikipedia. Binary Search Tree.

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s