Saturday, October 23, 2021

Idiomatic Clojure without sacrificing performance

This use case is perfect for transducers. They were written exactly for a series of sequence transformations:

(->> xs
     (map f)
     (filter g)
     (map h))
;; is equivalent to =>
(sequence
 (comp
  (map f)
  (filter g)
  (map h))
 xs)

With the added bonus of removing intermediary allocations. That's pretty neat. The only problem is we're missing a crucial component. There is no transducer equivalent to (partition n step coll).

Are we doomed? Not quite.

There is a close transducer, partition-all, which has no step arity. Let's look at how it's implemented:

(defn partition-all
  ([^long n]
   (fn [rf]
     (let [a (java.util.ArrayList. n)]
       (fn
         ([] (rf))
         ([result]
          (let [result (if (.isEmpty a)
                         result
                         (let [v (vec (.toArray a))]
                           ;;clear first!
                           (.clear a)
                           (unreduced (rf result v))))]
            (rf result)))
         ([result input]
          (.add a input)
          (if (= n (.size a))
            (let [v (vec (.toArray a))]
              (.clear a)
              (rf result v))
            result)))))))

If we wanted a sliding window, all we have to do was replace the ArrayList with a Queue!

(defn sliding
  ([^long n]
   (sliding n 1))
  ([^long n ^long step]
   (fn [rf]
     (let [a (java.util.ArrayDeque. n)] ;; Queue here
       (fn
         ([] (rf))
         ([result] (rf result)) ;; don't need leftovers
         ([result input]
          (.add a input)
          (if (= n (.size a))
            (let [v (vec (.toArray a))]
              ;; Remove `step` elements instead of clear
              (dotimes [_ step] (.removeFirst a))
              (rf result v))
            result)))))))

Let's convinces ourselves it works:

(sequence (sliding 3 1) '[a b c d e]);; => ([a b c] [b c d] [c d e])

Now we can define an equivalent transducer:

(def baseline-xf
  (comp
   (sliding 8 1)
   (map (juxt identity
              (comp (partial apply -)
                    (juxt last first))))
   (filter (comp (partial > 1000) second))))

(cc/quick-bench (doall (sequence baseline-xf times-v)))
;; Evaluation count : 6 in 6 samples of 1 calls.
;;              Execution time mean : 462.921956 ms
;;     Execution time std-deviation : 20.213288 ms
;;    Execution time lower quantile : 453.931650 ms ( 2.5%)
;;    Execution time upper quantile : 497.963799 ms (97.5%)
;;                    Overhead used : 2.079753 ns

And we're already ~2.5x faster



from Hacker News https://ift.tt/3vmc96V

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.