Saturday, December 14, 2013

Example from BRML in Haskell

I recently started reading Bayesian Reasoning and Machine Learning by David Barber, which I'm really enjoying so far. To help with the exercises, the author wrote the BRMLtoolbox, which he makes available on his website. It's quite extensive, but it's written in matlab. I don't really hate matlab, but I would definitely prefer Haskell. The book starts with exercises on probability, so I looked around and found the probability package for Probabilistic Functional Programming.
To get acquainted with this library I decided to reimplement the first demo example from the BRMLtoolbox in Haskell. Mostly I wanted to know if it was possible to do this and if the resulting code would maybe even be more concise and instructive than the matlab code (spoiler: I think it is).
So here is the matlab code:
function demoClouseau
%DEMOCLOUSEAU inspector clouseau example
butler=1; maid=2; knife=3; % Variable order is arbitrary
murderer=1; notmurderer=2; used=1; notused=2; % define states, starting from 1.

% The following definitions of variable are not necessary for computation,
% but are useful for displaying table entries:
variable(butler).name='butler'; variable(butler).domain = {'murderer','not murderer'};
variable(maid).name='maid'; variable(maid).domain ={'murderer','not murderer'};
variable(knife).name='knife'; variable(knife).domain={'used','not used'};

% Three potentials since p(butler,maid,knife)=p(knife|butler,maid)p(butler)p(maid).
% potential numbering is arbitrary
pot(butler).variables=butler;
pot(butler).table(murderer)=0.6;
pot(butler).table(notmurderer)=0.4;

pot(maid).variables=maid;
pot(maid).table(murderer)=0.2;
pot(maid).table(notmurderer)=0.8;

pot(knife).variables=[knife,butler,maid]; % define array below using this variable order
pot(knife).table(used, notmurderer, notmurderer)=0.3;  
pot(knife).table(used, notmurderer, murderer)   =0.2;
pot(knife).table(used, murderer,    notmurderer)=0.6;
pot(knife).table(used, murderer,    murderer)   =0.1;
pot(knife).table(notused,:,:)=1-pot(knife).table(used,:,:); % due to normalisation

jointpot = multpots(pot([butler maid knife])); % joint distribution

drawNet(dag(pot),variable);
disp('p(butler|knife=used):')
disptable(condpot(setpot(jointpot,knife,used),butler),variable);
Inspector Clouseau tries to find a murderer. The suspects are the butler and the maid, and a knife is considered as the murder weapon. Clouseau has prior probabilities for the combinations of suspects and weapon (it is also possible that both suspects are guilty), and he now wants to know the probability that the butler is the murderer given that the knife was used.
Let's see what it looks like in Haskell. After installing the packages we can import the necessary modules:
import Control.Monad
import qualified Numeric.Probability.Distribution as Dist
import Numeric.Probability.Distribution ((>>=?))
Like lists, probability distributions are Monads, and we will make heavy use of that fact. The function (>>=?) is an infix version of filtering a distribution on a predicate, and it goes nicely along with (>>=), as we will see.
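As a quick warm-up of my own (this example is not from the book; I'm assuming Dist.uniform from the package here), a fair die can be conditioned on showing an even number:

```haskell
import qualified Numeric.Probability.Distribution as Dist
import Numeric.Probability.Distribution ((>>=?))

-- a fair six-sided die
die :: Dist.T Rational Int
die = Dist.uniform [1 .. 6]

-- condition on the roll being even
evenDie :: Dist.T Rational Int
evenDie = die >>=? even
```

Dist.decons evenDie should then list 2, 4 and 6 with probability 1/3 each, since (>>=?) renormalises the remaining outcomes.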
Next are some definitions:
type Probability = Rational
type Dist a = Dist.T Probability a
Instead of Rational it's also possible to use Float or Double. Dist a is now a distribution that assigns probabilities to values of type a.
Now it's time to initialize those values. In the matlab code this was a bit inelegant as we had to assign a number to every variable that we wanted to use. In Haskell we can do this:
data Suspect = Murderer | Notmurderer deriving (Show, Eq, Ord)
type Butler = Suspect
type Maid = Suspect
data Knife = Used | Notused deriving (Show, Eq, Ord)
I think this is self-explanatory. Let's take a look at the corresponding probability distribution.
clouseau :: Dist (Butler, Maid, Knife)
clouseau = do
    butler <- Dist.choose 0.6 Murderer Notmurderer
    maid   <- Dist.choose 0.2 Murderer Notmurderer
    knife  <- case (butler, maid) of
            (Murderer, Murderer)       -> Dist.choose 0.1 Used Notused
            (Murderer, Notmurderer)    -> Dist.choose 0.6 Used Notused
            (Notmurderer, Murderer)    -> Dist.choose 0.2 Used Notused
            (Notmurderer, Notmurderer) -> Dist.choose 0.3 Used Notused
    return (butler, maid, knife)
The do-notation provides a very convenient way to produce a joint distribution for the three variables Butler, Maid and Knife. The function choose returns a distribution with only two values. Its type looks like this:
Dist.choose :: Num prob => prob -> a -> a -> Dist.T prob a
prob is the probability of the first given value; the second value accordingly gets probability 1-prob. For variables with more than two possible values there is relative:
Dist.relative :: Fractional prob => [prob] -> [a] -> Dist.T prob a
This function normalises the given weights so that the probabilities sum up to 1.
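For illustration, a hypothetical three-state variable could be defined like this (the variable and its weights are made up by me, not part of the book's example):

```haskell
import qualified Numeric.Probability.Distribution as Dist

-- a made-up three-state variable
data Weapon = Dagger | Poison | Rope deriving (Show, Eq, Ord)

-- the weights 1, 2, 1 are normalised to 1/4, 1/2, 1/4
weapon :: Dist.T Rational Weapon
weapon = Dist.relative [1, 2, 1] [Dagger, Poison, Rope]
```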
The Monad implementation then puts it all together: it multiplies the probabilities along each branch and collects all combinations, so we can be sure that all the probabilities have been taken care of. The result looks like this:
ghci> Dist.decons clouseau 
[((Murderer,Murderer,Used),3 % 250),((Murderer,Murderer,Notused),27 % 250),((Murderer,Notmurderer,Used),36 % 125),((Murderer,Notmurderer,Notused),24 % 125),((Notmurderer,Murderer,Used),2 % 125),((Notmurderer,Murderer,Notused),8 % 125),((Notmurderer,Notmurderer,Used),12 % 125),((Notmurderer,Notmurderer,Notused),28 % 125)]
Dist.decons deconstructs the distribution into a list of value/probability pairs so that it can be displayed. Perhaps there is a prettier way to print it, but that was not important to me. The numbers all agree with the matlab computation.
In the last step we're interested in the distribution for the butler given the fact that the knife was used. This is done by
butlerGivenKnife :: Dist (Butler, Maid, Knife) -> Dist Butler
butlerGivenKnife x =  Dist.norm $ x >>=? givenknife >>= onlybutler
    where
        givenknife (_, _, knife) = (knife==Used)
        onlybutler (butler, _, _) = return butler
So, what happens here? First, >>=? filters the given distribution x through the function givenknife, which returns True if the knife was used. The resulting distribution is then passed to the function onlybutler, which throws away everything that is not butler. Finally, Dist.norm is called on the result. This can never do harm, as it ensures that the distribution is properly normalized; in this case it is necessary because applying onlybutler leaves many duplicate entries in the distribution that need to be merged.
And that's it! 28 lines of code, including the imports and the two type definitions for convenience at the beginning. Granted, it was a very easy example. The final result is this:
ghci> Dist.decons $ butlerGivenKnife clouseau 
[(Murderer,75 % 103),(Notmurderer,28 % 103)]
So, chances are pretty high that the butler is guilty. Again, the numbers are identical to matlab's result. An interesting question would be how efficiently the Haskell code runs. Maybe the compiler is able to reduce it all to multiplications and summations. On the other hand, the package uses lists for the probability distributions, which are not very efficient. But overall I'm very satisfied with the result.
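Finally, to convince myself that nothing magical happens inside the package, here is a self-contained sketch of the same computation using a plain weighted-list monad. This is my own minimal mock-up (the names D, runD, given and so on are made up, and the package's real implementation differs), but it reproduces the same posterior:

```haskell
import Data.List (nub)
import Data.Ratio ((%))

-- a distribution is just a list of value/probability pairs
newtype D a = D { runD :: [(a, Rational)] }

instance Functor D where
  fmap f (D xs) = D [ (f x, p) | (x, p) <- xs ]

instance Applicative D where
  pure x        = D [(x, 1)]
  D fs <*> D xs = D [ (f x, p * q) | (f, p) <- fs, (x, q) <- xs ]

instance Monad D where
  -- branching multiplies probabilities along each path
  D xs >>= f = D [ (y, p * q) | (x, p) <- xs, (y, q) <- runD (f x) ]

-- x with probability p, y with probability 1-p (like Dist.choose)
choose :: Rational -> a -> a -> D a
choose p x y = D [(x, p), (y, 1 - p)]

-- condition on a predicate and renormalise
given :: (a -> Bool) -> D a -> D a
given ok (D xs) = D [ (x, p / total) | (x, p) <- kept ]
  where
    kept  = [ (x, p) | (x, p) <- xs, ok x ]
    total = sum (map snd kept)

-- merge duplicate outcomes (like Dist.norm)
norm :: Eq a => D a -> D a
norm (D xs) =
  D [ (v, sum [ p | (x, p) <- xs, x == v ]) | v <- nub (map fst xs) ]

data Suspect = Murderer | Notmurderer deriving (Show, Eq)
data Knife   = Used | Notused         deriving (Show, Eq)

clouseau :: D (Suspect, Suspect, Knife)
clouseau = do
  butler <- choose 0.6 Murderer Notmurderer
  maid   <- choose 0.2 Murderer Notmurderer
  knife  <- case (butler, maid) of
    (Murderer,    Murderer)    -> choose 0.1 Used Notused
    (Murderer,    Notmurderer) -> choose 0.6 Used Notused
    (Notmurderer, Murderer)    -> choose 0.2 Used Notused
    (Notmurderer, Notmurderer) -> choose 0.3 Used Notused
  return (butler, maid, knife)

butlerGivenKnife :: D Suspect
butlerGivenKnife =
  norm (fmap (\(b, _, _) -> b) (given (\(_, _, k) -> k == Used) clouseau))
```

Evaluating runD butlerGivenKnife yields [(Murderer,75 % 103),(Notmurderer,28 % 103)], the same answer as above.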