I recently started reading Bayesian Reasoning and Machine Learning by David Barber, which I'm really enjoying so far. To help with the exercises, the author wrote the BRMLtoolbox, which he makes available on his website. It's quite extensive, but it's written in matlab. I don't really hate matlab, but I would definitely prefer Haskell. The book starts with exercises on probability, so I looked around and found the probability package for Probabilistic Functional Programming.
So here is the matlab code:
```matlab
function demoClouseau
%DEMOCLOUSEAU inspector clouseau example
butler=1; maid=2; knife=3; % Variable order is arbitrary
murderer=1; notmurderer=2; used=1; notused=2; % define states, starting from 1.
% The following definitions of variable are not necessary for computation,
% but are useful for displaying table entries:
variable(butler).name='butler'; variable(butler).domain = {'murderer','not murderer'};
variable(maid).name='maid'; variable(maid).domain ={'murderer','not murderer'};
variable(knife).name='knife'; variable(knife).domain={'used','not used'};
% Three potentials since p(butler,maid,knife)=p(knife|butler,maid)p(butler)p(maid).
% potential numbering is arbitrary
pot(butler).variables=butler;
pot(butler).table(murderer)=0.6; pot(butler).table(notmurderer)=0.4;
pot(maid).variables=maid;
pot(maid).table(murderer)=0.2; pot(maid).table(notmurderer)=0.8;
pot(knife).variables=[knife,butler,maid]; % define array below using this variable order
pot(knife).table(used, notmurderer, notmurderer)=0.3;
pot(knife).table(used, notmurderer, murderer)   =0.2;
pot(knife).table(used, murderer,    notmurderer)=0.6;
pot(knife).table(used, murderer,    murderer)   =0.1;
pot(knife).table(notused,:,:)=1-pot(knife).table(used,:,:); % due to normalisation
jointpot = multpots(pot([butler maid knife])); % joint distribution
drawNet(dag(pot),variable);
disp('p(butler|knife=used):')
disptable(condpot(setpot(jointpot,knife,used),butler),variable);
```

Inspector Clouseau tries to find a murderer. The suspects are the butler and the maid, and a knife is considered as the murder weapon. Clouseau has prior probabilities for the combinations of suspects and weapon (it is also possible that both suspects are guilty), and he now wants to know the probability of the butler being the murderer given that the knife was used.
Let's see what it looks like in Haskell. After installing the packages we can import the necessary modules:
```haskell
import Control.Monad
import qualified Numeric.Probability.Distribution as Dist
import Numeric.Probability.Distribution ((>>=?))
```

Similarly to lists, probability distributions are Monads, and we will make heavy use of that fact. The function `(>>=?)` is an infix version of `filter` and goes nicely along with `(>>=)`, as we will see.

Next are some definitions:
```haskell
type Probability = Rational
type Dist a = Dist.T Probability a
```

Instead of `Rational` it's also possible to use `Float` or `Double`. `Dist a` is now a distribution that assigns probabilities to values of type `a`.

Now it's time to initialize those values. In the matlab code this was a bit inelegant, as we had to assign a number to every variable that we wanted to use. In Haskell we can do this:
```haskell
data Suspect = Murderer | Notmurderer deriving (Show, Eq, Ord)

type Butler = Suspect
type Maid   = Suspect

data Knife = Used | Notused deriving (Show, Eq, Ord)
```

I think this is self-explanatory. Let's take a look at the corresponding probability distribution.
```haskell
clouseau :: Dist (Butler, Maid, Knife)
clouseau = do
    butler <- Dist.choose 0.6 Murderer Notmurderer
    maid   <- Dist.choose 0.2 Murderer Notmurderer
    knife  <- case (butler, maid) of
        (Murderer,    Murderer)    -> Dist.choose 0.1 Used Notused
        (Murderer,    Notmurderer) -> Dist.choose 0.6 Used Notused
        (Notmurderer, Murderer)    -> Dist.choose 0.2 Used Notused
        (Notmurderer, Notmurderer) -> Dist.choose 0.3 Used Notused
    return (butler, maid, knife)
```

The `do`-notation provides a very convenient way to produce a joint distribution for the three variables `Butler`, `Maid` and `Knife`. The function `choose` returns a distribution with only two values. Its type looks like this:

```haskell
Dist.choose :: Num prob => prob -> a -> a -> Dist.T prob a
```
`prob` is the probability of the first given value; the second value accordingly has probability `1 - prob`. For variables with more than two possible values there is `relative`:

```haskell
Dist.relative :: Fractional prob => [prob] -> [a] -> Dist.T prob a
```

This function also makes sure that the probabilities sum up to 1.
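To illustrate that normalisation, here is a toy stand-in for what `relative` does with its weight list (`relative'` is my own function for illustration, not part of the package):

```haskell
-- Toy illustration of the normalisation performed by Dist.relative:
-- scale the given weights by their sum so the probabilities add up to 1.
relative' :: [Rational] -> [a] -> [(a, Rational)]
relative' weights xs = zip xs (map (/ total) weights)
  where total = sum weights
```

For example, `relative' [1, 2, 1] "abc"` assigns the probabilities 1/4, 1/2 and 1/4 to the three characters.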
The `Monad` implementation then puts it all together, and we can be sure that all the probabilities have been taken care of. The result looks like this:
```
ghci> Dist.decons clouseau
[((Murderer,Murderer,Used),3 % 250),((Murderer,Murderer,Notused),27 % 250),
 ((Murderer,Notmurderer,Used),36 % 125),((Murderer,Notmurderer,Notused),24 % 125),
 ((Notmurderer,Murderer,Used),2 % 125),((Notmurderer,Murderer,Notused),8 % 125),
 ((Notmurderer,Notmurderer,Used),12 % 125),((Notmurderer,Notmurderer,Notused),28 % 125)]
```
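Every probability in this table is the product along one branch of the do-block, and that multiplication comes from the Monad instance. To make this concrete, here is a minimal list-based sketch of how such an instance can work (an illustration only, not the package's actual implementation; `P` and `choose` are my own toy names):

```haskell
-- A toy probability monad: a distribution is just a list of
-- (value, probability) pairs.
newtype P a = P { runP :: [(a, Rational)] }

instance Functor P where
  fmap f (P xs) = P [ (f x, p) | (x, p) <- xs ]

instance Applicative P where
  pure x = P [(x, 1)]
  P fs <*> P xs = P [ (f x, p * q) | (f, p) <- fs, (x, q) <- xs ]

instance Monad P where
  -- binding multiplies each outcome's probability with the
  -- probabilities of everything that follows from it
  P xs >>= f = P [ (y, p * q) | (x, p) <- xs, (y, q) <- runP (f x) ]

-- toy analogue of Dist.choose
choose :: Rational -> a -> a -> P a
choose p a b = P [(a, p), (b, 1 - p)]
```

With definitions like these, each entry of the joint distribution is simply the product of the probabilities along its branch, e.g. 0.6 * 0.2 * 0.1 = 3/250 for (Murderer, Murderer, Used).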
`Dist.decons` deconstructs the distribution so that it can be displayed. Perhaps there is a prettier way to print it, but that was not important for me. The numbers all agree with the matlab computation.

In the last step we're interested in the distribution for the butler given the fact that the knife was used. This is done by:
```haskell
butlerGivenKnife :: Dist (Butler, Maid, Knife) -> Dist Butler
butlerGivenKnife x = Dist.norm $ x >>=? givenknife >>= onlybutler
  where
    givenknife (_, _, knife)  = knife == Used
    onlybutler (butler, _, _) = return butler
```

So, what happens here? First, `>>=?` filters the given distribution `x` through the function `givenknife`, which returns `True` if the knife was used. The resulting distribution is then passed to the function `onlybutler`, which throws away everything that is not the butler. Finally, `Dist.norm` is called on the result. This can never do harm, as it ensures that the distribution is properly normalized; in this case it is necessary because applying `onlybutler` results in many duplicates in the distribution.

And that's it! 28 lines of code, including the imports and the two `type` definitions for convenience at the beginning. Granted, it was a very easy example. The final result is this:
```
ghci> Dist.decons $ butlerGivenKnife clouseau
[(Murderer,75 % 103),(Notmurderer,28 % 103)]
```

So, chances are pretty high that the butler is guilty. Again, the numbers are identical to matlab's result. An interesting question would be how efficiently the Haskell code runs; maybe the compiler is able to reduce it all to multiplications and summations. On the other hand, the package uses lists for the probability distributions, which are not very efficient. But overall I'm very satisfied with the result.
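Finally, for anyone who wants to double-check the arithmetic without installing the package, the same posterior can be reproduced with plain lists. `condition` below is my own stand-in for what `>>=?` together with `Dist.norm` achieve in this example, not the package's code:

```haskell
-- Encode a distribution as a plain list of (value, probability) pairs.
type Weighted a = [(a, Rational)]

-- p(knife used | butler, maid), from the tables above.
pUsed :: Bool -> Bool -> Rational
pUsed True  True  = 0.1
pUsed True  False = 0.6
pUsed False True  = 0.2
pUsed False False = 0.3

-- The joint distribution as (butlerGuilty, maidGuilty, knifeUsed) triples.
joint :: Weighted (Bool, Bool, Bool)
joint = [ ((b, m, k), pb * pm * pk)
        | (b, pb) <- [(True, 0.6), (False, 0.4)]
        , (m, pm) <- [(True, 0.2), (False, 0.8)]
        , (k, pk) <- [(True, pUsed b m), (False, 1 - pUsed b m)]
        ]

-- Keep only the outcomes satisfying a predicate and renormalise.
condition :: (a -> Bool) -> Weighted a -> Weighted a
condition p xs = [ (x, q / total) | (x, q) <- xs, p x ]
  where total = sum [ q | (x, q) <- xs, p x ]

-- p(butler = murderer | knife = used)
pButler :: Rational
pButler = sum [ q | ((b, _, _), q) <- condition (\(_, _, k) -> k) joint, b ]
```

Evaluating `pButler` gives `75 % 103`, matching both the matlab result and the Haskell result above.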