% How to prove large software projects correct % Gabriella Gonzalez % Big Tech Day - June 12, 2015
We would like to prove large software projects correct:
... without a proof assistant
... without a type-checker
... without a computer!
We will approach complex proofs using the Unix philosophy:
-
Prove one thing and prove it well
-
Compose smaller proofs to build larger proofs
-
Everything is a
StringMonoid
-
Brief introduction to Haskell
-
Prove one thing and prove it well
-
Compose smaller proofs to build larger proofs
-
Everything is a
StringMonoid -
Conclusion
I will be using the Haskell programming language for all examples, because:
- Haskell is the most widely used purely functional language
- I want the examples to fit on the slides
I will teach the language as I go along
Traditional formal methods:
- Define a formal language for expressing properties about your program
- Create a specification for your program in this formal language
- Verify that your program meets this specification
Equational reasoning:
- The programming language is your formal language
- You define equations directly within the programming language
- You specify desired behavior in terms of equations
We specify the behavior of (&&)
in terms of two equations:
False && x = False -- Equation #1
True && x = x -- Equation #2
Then we can use the above two equations to prove properties about (&&)
:
y && True = y
-- Case #1: y = True
True && True = True -- (2)
-- Case #2: y = False
False && True = False -- (1)
-- QED
These equations are "executable"!
>>> True && False
False
"Purely functional" languages support equational reasoning:
- Haskell
- Purescript
- Elm
- Idris
These languages lower the barrier to formal reasoning. You learn two languages for the price of one: a programming language and a formal language.
Haskell programs are just equations
For example, you specify what to run by equating main
with a subroutine
-- example.hs
main :: IO ()
main = putStrLn "Hello, world!"
$ ghc -O2 example.hs
$ ./example
Hello, world!
putStrLn :: String -> IO ()
Haskell functions are defined using equations:
exclaim x = x ++ "!"
main = putStrLn (exclaim "Hello, world")
$ ./example
Hello, world!
exclaim x = x ++ "!"
Equality means that everywhere we see exclaim x
we can replace it with
x ++ "!"
:
putStrLn (exclaim "Hello, world")
-- exclaim x = x ++ "!"
= putStrLn ("Hello, world" ++ "!")
Vice versa, anywhere we see x ++ "!"
, we can replace it with exclaim x
:
putStrLn ("Goodbye, world" ++ "!")
-- "Goodbye, world" ++ "!" = exclaim "Goodbye, world"
= print (exclaim "Goodbye, world")
Bidirectional substitution is safe because evaluation is benign in Haskell; evaluation does not trigger side effects.
- Functions have arguments, but no side effects
even :: Int -> Bool -- A function from `Int` to `Bool`
- Subroutines have side effects, but no arguments
getLine :: IO String -- A subroutine that returns a `String`
What about putStrLn
?
putStrLn :: String -> IO () -- A function from `String` to a subroutine
putStrLn
is a pure function whose result is a subroutine
Evaluating putStrLn
does not trigger the subroutine's side effects!
The only way to run a side effect is to equate the side effect with main
.
x = putStrLn "Goodbye, world!"
main = putStrLn "Hello, world!"
$ ./example
Hello, world!
Even if we force evaluation of x
, nothing changes:
main = x `seq` putStrLn "Hello, world!"
So how do we execute more than one side effect?
Haskell design patterns are "algebraic"
Informally, "algebraic" means that when you combine things you end up back where you started.
(>>) :: IO () -> IO () -> IO ()
-- (x >> y) >> z = x >> (y >> z)
main = putStrLn "Hello, world!" >> putStrLn "Goodbye, world!"
$ ./example
Hello, world!
Goodbye, world!
We don't instrument main
to accept lists of subroutines. Instead, we
combine smaller subroutines into a larger subroutine.
main = do
putStrLn "Hello, world!"
putStrLn "Goodbye, world!"
... desugars to:
main = putStrLn "Hello, world!" >> putStrLn "Goodbye, world!"
main = do
str <- getLine
putStrLn str
... desugars to:
main = getLine >>= (\str -> putStrLn str)
-- same as:
main = getLine >>= putStrLn
(>>=) :: IO a -> (a -> IO b) -> IO b
getLine :: IO String
putStrLn :: String -> IO ()
getLine >>= putStrLn :: IO ()
What if we have nothing to run?
return :: a -> IO a
main = return ()
-
Brief introduction to Haskell
-
Prove one thing and prove it well
-
Compose smaller proofs to build larger proofs
-
Everything is a
StringMonoid -
Conclusion
Let's begin with a simple Haskell program:
main = do
putStrLn "!"
putStrLn "!"
putStrLn "!"
This obviously prints "!"
three times:
$ ./example
!
!
!
We can use the replicateM_
utility from Haskell's standard library to run a
command multiple times:
-- Simplified a little from the original definition
replicateM_ :: Int -> IO () -> IO ()
replicateM_ 0 io = return ()
replicateM_ n io = io >> replicateM_ (n - 1) io
Now we can write:
import Control.Monad (replicateM_)
main = replicateM_ 3 (putStrLn "!")
... and the program behaves the same:
$ ./example
!
!
!
... but is this really equal to the previous program?
We wish to prove that:
replicateM_ 3 (putStrLn "!") = putStrLn "!" >> putStrLn "!" >> putStrLn "!"
We could prove this using the two equations that define replicateM_
:
replicateM_ 0 io = return () -- Equation #1
replicateM_ n io = io >> replicateM_ (n - 1) io -- Equation #2
replicateM_ 3 (putStrLn "!")
= putStrLn "!" >> replicateM_ 2 (putStrLn "!") -- (2)
= putStrLn "!" >> putStrLn "!" >> replicateM_ 1 (putStrLn "!") -- (2)
= putStrLn "!" >> putStrLn "!" >> putStrLn "!" >> replicateM_ 0 (putStrLn "!") -- (2)
= putStrLn "!" >> putStrLn "!" >> putStrLn "!" >> return () -- (1)
-- (io :: IO ()) >> return () = io
= putStrLn "!" >> putStrLn "!" >> putStrLn "!"
However, this won't scale to larger programs
For example, try proving that:
replicateM_ 5 (replicateM_ 4 io) = replicateM_ 10 io >> replicateM_ 10 io
Intuitively, you "know" it's true, but why?
We can make our "intuition" precise by using four equations:
replicateM_ 0 io = return ()
replicateM_ (x + y) io = replicateM_ x io >> replicateM_ y io
replicateM_ 1 io = io
replicateM_ (x * y) io = replicateM_ x (replicateM_ y io)
We can write the latter two equations in a more point-free form:
replicateM_ 1 = id
replicateM_ (x * y) = replicateM_ x . replicateM_ y
id :: a -> a
id x = x
(.) :: (b -> c) -> (a -> b) -> (a -> c)
(f . g) x = f (g x)
replicateM_ 5 (replicateM_ 4 io)
= replicateM_ (5 * 4) io
= replicateM_ 20 io
= replicateM_ (10 + 10) io
= replicateM_ 10 io >> replicateM_ 10 io
We reduced a complex manipulation to a simpler manipulation
replicateM_ 3 io
-- 3 = 1 + 1 + 1
= replicateM_ (1 + 1 + 1) io
-- replicateM_ (x + y) io = replicateM_ x io >> replicateM_ y io
= replicateM_ 1 io >> replicateM_ 1 io >> replicateM_ 1 io
-- replicateM_ 1 io = io
= io >> io >> io
-
Brief introduction to Haskell
-
Prove one thing and prove it well
-
Compose smaller proofs to build larger proofs
-
Everything is a
StringMonoid -
Conclusion
Assume we have a "simple" associative operator named (+₁)
with identity 0₁
:
(x +₁ y) +₁ z = x +₁ (y +₁ z)
x +₁ 0₁ = x
0₁ +₁ x = x
... and a "complex" associative operator named (+₂)
with identity
0₂
:
(x +₂ y) +₂ z = x +₂ (y +₂ z)
x +₂ 0₂ = x
0₂ +₂ x = x
... and a function that bridges between the two operators and their identities::
map (x +₁ y) = (map x) +₂ (map y)
map 0₁ = 0₂
We can use f
to bridge between complex code using (+₂)
/0₂
and simpler code
using (+₁)
/0₁
map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x:xs) = (f x):(map f xs)
>>> map (+ 1) [1,2,3]
[2,3,4]
map = map
(+₁) = (.)
0₁ = id
(+₂) = (.)
0₂ = id
map (f . g) = map f . map g
map id = id
map = map f
(+₁) = (++)
0₁ = []
(+₂) = (++)
0₂ = []
map f (xs ++ ys) = (map f xs) ++ (map f ys)
map f [] = []
map x = replicatem_ x io
(+₁) = (+)
0₁ = 0
(+₂) = (>>)
0₂ = return ()
replicatem_ (x + y) io = replicatem_ x io >> replicatem_ y io
replicateM_ 0 io = return ()
map = replicateM_
(+₁) = (*)
0₁ = 1
(+₂) = (.)
0₂ = id
replicateM_ (x * y) = replicateM_ x . replicateM_ y
replicateM_ 1 = id
length :: [a] -> Int
length (xs ++ ys) = (length xs) + (length ys)
length [] = 0
-- Assuming: x, y >= 0
replicate :: Int -> a -> [a]
replicate (x + y) a = replicate x a ++ replicate y a
replicate 0 a = []
not :: Bool -> Bool
not (x && y) = (not x) || (not y)
not True = False
exp :: Double -> Double
exp (x + y) = (exp x) * (exp y)
exp 0 = 1
null :: [a] -> Bool
null (xs ++ ys) = (null xs) && (null ys)
null [] = True
putStr :: String -> IO ()
putStr (str1 ++ str2) = putStr str1 >> putStr str2
putStr "" = return ()
-- Assuming: x, y > 0
not (null (replicate x a)) || not (null (replicate y a))
= not (null (replicate x a) && null (replicate y a))
= not (null (replicate x a ++ replicate y a))
= not (null (replicate (x + y) a))
-
Brief introduction to Haskell
-
Prove one thing and prove it well
-
Compose smaller proofs to build larger proofs
-
Everything is a
StringMonoid -
Conclusion
class Monoid m where
mempty :: m
mappend :: m -> m -> m
(<>) :: Monoid m => m -> m -> m
(<>) = mappend
-- (x <> y) <> z = x <> (y <> z)
--
-- x <> mempty = x
--
-- mempty <> x = x
instance Monoid [a] where
mempty = []
mappend = (++)
>>> [1,2] <> [3,4]
[1,2,3,4]
>>> mempty :: [Int]
[]
instance Monoid () where
mempty = ()
mappend () () = ()
>>> () <> ()
()
>>> mempty :: ()
()
instance (Monoid a, Monoid b) => Monoid (a, b) where
mempty = (mempty, mempty)
mappend (xL, yL) (xR, yR) = (mappend xL xR, mappend yL, yR)
>>> ([1,2],[5,6]) <> ([3,4],[7,8])
([1,2,3,4],[5,6,7,8])
>>> mempty :: ([Int], [Int])
([],[])
instance Monoid b => Monoid (a -> b) where
mempty = \_ -> mempty
mappend f g = \x -> mappend (f x) (g x)
>>> (id <> reverse) [1,2,3,4]
[1,2,3,4,4,3,2,1]
>>> mempty (True, "story") :: [Int]
[]
instance Monoid b => Monoid (IO b) where
mempty = return mempty
mappend io1 io2 = do
r1 <- io1
r2 <- io2
return (mappend r1 r2)
readLn :: Read a => IO a
>>> readLn :: IO [Int]
[1,2]<Enter>
[1,2]
>>> readLn <> readLn :: IO [Int]
[1,2]<Enter>
[3,4]<Enter>
[1,2,3,4]
>>> mempty :: IO [Int]
[]
The type of putStrLn
is a Monoid
putStrLn :: String -> IO ()
Here is how the compiler deduces that this is a Monoid
:
()
is aMonoid
instance Monoid () where ...
- If
()
is aMonoid
, thenIO ()
is aMonoid
instance Monoid b => Monoid (IO b) where ...
- If
IO ()
is aMonoid
, thenString -> IO ()
is aMonoid
instance Monoid b => Monoid (a -> b)
Let's try this out:
>>> (putStrLn <> putStrLn) "Print me twice"
Print me twice
Print me twice
These are all Monoid
s:
- Nested subroutines:
IO (IO (IO (IO (IO (IO ())))))
- Functions of multiple arguments:
Name -> Age -> Address -> [People]
Name -> (Age -> (Address -> [People]))
- Deeply nested tuples:
(([Int], [Int]), ([Int], [Int]))
- All of the above:
IO (Config -> [IO ([Textures], [Models])])
There's no limit to how many layers we can chain.
For example, this is a valid monoid:
-- +-- Generate files
-- |
-- | +-- Package them
-- | |
-- | | +-- Upload
-- | | |
-- | | | +-- Deploy
-- | | | |
-- | | | | +-- Log
-- | | | | |
-- v v v v v
IO (IO (IO (IO (IO ()))))
-- join :: IO (IO a) -> IO a
deploy :: IO (IO (IO (IO (IO ())))) -> IO ()
deploy = join . join . join . join
import Control.Monad (join)
import Data.Monoid (Monoid(..), (<>))
instance Monoid b => Monoid (IO b) where
mempty = return mempty
mappend io1 io2 = do
r1 <- io1
r2 <- io2
return (mappend r1 r2)
deploy :: IO (IO (IO (IO (IO ())))) -> IO ()
deploy = join . join . join . join
job :: Int -> IO (IO (IO (IO (IO ()))))
job n = do
putStrLn ("Generating job #" <> show n)
return (do
putStrLn ("Packaging job #" <> show n)
return (do
putStrLn ("Uploading job #" <> show n)
return (do
putStrLn ("Deploying job #" <> show n)
return (do
putStrLn ("Logging job #" <> show n)
return () ) ) ) )
Combining commands interleaves their phases:
>>> deploy (job 1 <> job 2 <> job 3)
Generating job #1
Generating job #2
Generating job #3
Packaging job #1
Packaging job #2
Packaging job #3
Uploading job #1
Uploading job #2
Uploading job #3
Deploying job #1
Deploying job #2
Deploying job #3
Logging job #1
Logging job #2
Logging job #3
The empty job does nothing:
>>> deploy mempty
>>> -- Nothing happens
We wrote a staged deploy system:
... in two lines of code (the type signature was optional):
import Control.Monad (join)
deploy = join . join . join . join
... that is well-behaved:
(job1 <> job2) <> job3 = job1 <> (job2 <> job3)
job <> mempty = job
mempty <> job = job
... and we can easily prove correctness by just doing induction on the type:
IO (IO (IO (IO (IO ()))))
The type of a plugin will be:
-- +-- Acquire resources
-- |
-- | +-- Free resources
-- | Char handler |
-- v vvvvvvvvvvvvv v
IO (Char -> IO (), IO ())
sinkTo :: IO (Char -> IO (), IO ()) -> IO ()
sinkTo open = bracket open snd (\(handle, _) -> do
hSetEcho False
let loop = do
eof <- isEOF
unless eof (do
char <- getChar
handle char
loop )
loop )
console :: IO (Char -> IO (), IO ())
console = return (putChar, mempty)
file :: FilePath -> IO (Char -> IO (), IO ())
file path = do
handle <- openFile path WriteMode
return (hPutChar handle, hClose handle)
counter :: IO (Char -> IO (), IO ())
counter = do
counter <- newIORef (0 :: Int)
let handle char = do
n <- readIORef counter
print n
writeIORef counter (n + 1)
return (handle, mempty)
main = sinkTo (console <> file "test.txt" <> file "test2.txt")
-- = sinkTo (counter <> file "test.txt" <> file "test2.txt")
-
Brief introduction to Haskell
-
Prove one thing and prove it well
-
Compose smaller proofs to build larger proofs
-
Everything is a
StringMonoid -
Conclusion
We build complex systems by connecting together small, provably correct pieces:
IO
(a -> b)
(a, b)
[a]
These are analogous to Unix utilities like:
wc
head
grep
cat
We build larger components by connecting smaller components:
IO (IO (IO (IO (IO ()))))
IO (Char -> IO (), IO ())
Composition of components preserves correctness
We use Monoid
as a reusable and general interface for serendipitous
composition of proofs
Each proof takes a Monoid
instance as its input and produces a Monoid
instance as its output:
instance Monoid b => Monoid (IO b) where
We can use the Unix philosophy to scale proofs to complex software:
-
Prove one thing and prove it well
-
Compose smaller components to build larger components
-
Everything is a
StringMonoid
My blog: haskellforall.com
Twitter: @GabriellaG439