Under the hood of Rosette

Rosette is a solver-aided language. Its documentation has good overview of how it can be used. It has been used to find bug in eBPF, reasoning about SQL equivalence and in many other ways. I have covered Rosette previously in this blog by using it to solve a puzzle. Rosette is implemented in Racket which has extensive support for building programming languages. In this blog, we will try to peek into how Rosette works through following code.

#lang rosette/safe

(define-symbolic x y integer?)

(define (double x) (+ x x))

(define lst
  (filter even? (list x y)))

(define sol
  (solve (begin (assert (< 0 x))
                (assert (< (double x) y))
                (assert (= 1 (length lst))))))

(when (sat? sol)
  (evaluate (list x y) sol))

Syntax

Syntax of Rosette is similar to Racket with additional definitions like define-symbolic, solve, assert, evaluate etc. Creating language which looks like it is breeze in Racket with help of syntax/module-reader, which Rosette makes use of. Rosette overrides various definitions of Racket to work with symbolic values. Internally these are defined with name identical to their Racket counterpart but prefixed with @. Above code will look like following after removing that masking.

#lang racket

(require rosette/base/adt/list)
(require rosette/base/core/bool)
(require rosette/base/core/numerics)
(require rosette/base/core/real)
(require rosette/base/form/define)
(require rosette/query/form)
(require rosette/query/eval)
(require rosette/solver/solution)

(define-symbolic x y @integer?)

(define (double x) (@+ x x))

(define lst
  (@filter @even? (@list x y)))

(define sol
  (solve (begin (@assert (@< 0 x))
                (@assert (@< (double x) y))
                (@assert (@= 1 (@length lst))))))

(when (sat? sol)
  (evaluate (@list x y) sol))

Semantics

Rosette lifts some types (integer?, boolean? etc) so that symbolic constant can be defined for those. Inside Rosette these types (@integer?, @boolean? etc) are represented as struct which holds type specific methods like type-cast, type-name and solver specific methods like solveable-default, solveable-range. Since @integer? is exposed as integer? at surface syntax, it should work as predicate that is (integer? 1) or (integer? x) (x is symbolic integer) should result into true. This is achieved through struct-index-field which allows struct to be used as function.

To introduce symbolic constant in Rosette define-symbolic macro is used. (define-symbolic x integer?) expands to (define-values (x) ((make-const #'x integer?)). All lifted definitions do as much as concrete evaluation as possible. For example (+ 1 2) evaluates to 3 whereas (+ 1 x 3) evaluates to (+ 4 x). Further it reduces symbolic expression in canonical form by sorting subexpressions based on term order. Term order is natural number which accounts for term introduction order in program. In above program term order of x is 0 whereas it is 1 for y, making (+ x y) canonical form of (+ y x). This makes SMT-Lib encoding small. While generating SMT-Lib Rosette will only output (define-fun e1 () Int (+ 4 c0)) (c0 is symbolic integer representing x in SMT-Lib) even when surface syntax contains both expressions (+ x 4) and (+ 4 x). Another way to think about it is reasoning about their equivalence is pushed up at Rosette level instead of pushing down to SMT solvers. This allows Rosette to do some optimizations. For example (if (= (+ x 4) (+ 4 x)) x 4) reduces to x contrast that to naive SMT-Lib encoding which will generate something like following

(declare-fun c0 () Int)
(define-fun e1 () Int (+ c0 4))
(define-fun e2 () Int (+ 4 c0))
(define-fun e3 () Int (ite (= e1 e2) c0 4))

Practically it means SMT-Lib generated by Rosette is easy to read and understand as it strips away various artificial encodings.

In presence of if expression program can branch into different paths depending on predicate. If predicate is concrete value or can be optimized away to it, it is not problem otherwise Rosette uses union to encode it. union is list of predicate and expression pair. In our code lst is a union as filter uses if internally, even? being its predicate. This is how lst union looks like.

>> (union-contents lst)
(list
 (list (&& (= 0 (remainder x 2)) (= 0 (remainder y 2))) x y)
 (list (&& (! (= 0 (remainder x 2))) (! (= 0 (remainder y 2)))))
 (list
  (|| (&& (! (= 0 (remainder x 2))) (= 0 (remainder y 2))) 
      (&& (= 0 (remainder x 2)) (! (= 0 (remainder y 2)))))
  (ite* (⊢ (&& (! (= 0 (remainder x 2))) (= 0 (remainder y 2))) y) 
        (⊢ (&& (= 0 (remainder x 2)) (! (= 0 (remainder y 2)))) x))))

In plain english, it is saying that

If x and y are both even then lst is (list x y)
If x and y are both odd then lst is (list)
If x is even and y is odd then lst is (list x)
If x is odd and y is even then lst is (list y)

Rosette maintains that predicates of union are mutually exclusive and union length is within polynomial bound of program size. From user perspective lst is list so they should able to apply list functions on it.

> (union-contents (cons 1 lst))
(list
 (list (&& (= 0 (remainder x 2)) (= 0 (remainder y 2))) 1 x y)
 (list (&& (! (= 0 (remainder x 2))) (! (= 0 (remainder y 2)))) 1)
 (list
  (|| (&& (! (= 0 (remainder x 2))) (= 0 (remainder y 2))) 
      (&& (= 0 (remainder x 2)) (! (= 0 (remainder y 2)))))
  (ite* (⊢ (&& (! (= 0 (remainder x 2))) (= 0 (remainder y 2))) 1 y) 
        (⊢ (&& (= 0 (remainder x 2)) (! (= 0 (remainder y 2)))) 1 x))))

Sweet, Rosette goes under each guards (predicates) of union and applies cons to underlying expression. Finally we move to solve part of program. Whenever Rosette sees assert statement it adds corresponding boolean expression in vc parameter. In our program after last assert statement, vc contains following asserts that need to be checked. Since we haven’t used any assume in our program assumes in vc is trivial expression #t.

(&& (&& (|| (&& (= 0 (remainder x 2)) (= 0 (remainder y 2)))
            (&& (! (= 0 (remainder x 2))) (! (= 0 (remainder y 2))))
            (|| (&& (! (= 0 (remainder x 2))) (= 0 (remainder y 2)))
                (&& (= 0 (remainder x 2)) (! (= 0 (remainder y 2))))))
        (&& (< 0 x) (< (+ x x) y)))
    (= 1 (ite* (⊢ (&& (= 0 (remainder x 2)) (= 0 (remainder y 2))) 2)
               (⊢ (&& (! (= 0 (remainder x 2))) (! (= 0 (remainder y 2)))) 0)
               (⊢ (|| (&& (! (= 0 (remainder x 2))) (= 0 (remainder y 2)))
                      (&& (= 0 (remainder x 2)) (! (= 0 (remainder y 2))))) 1))))

Notice like cons example above, Rosette strips away length function by applying it to expressions of union. Combination of union and concrete evaluation make it possible for Rosette to avoid encoding list and its associated axioms in SMT-Lib. On syntax level, Rosette can handle various ADT like struct, box, vector etc. But when Rosette emits SMT-Lib it only deals with simple data types like Int, Bool, BitVec of SMT-Lib which is possible due to representation of branch using union and enough concrete evaluation of definitions. Inside solve, Rosette encodes vc to SMT-Lib, send it to SMT solver and read result back.