BAYESIAN EVIDENCE AS A DIAGNOSTIC
FOR TESTING MODELS OF GRAVITATIONAL WAVE DATA
The Laser Interferometer Space Antena (LISA) is expected to detect a
large number of gravitational wave sources in a frequency range from
0.0001Hz to 0.1Hz. The problem we address is how well the parameters
such as the number of sources M, the amplitude A, the frequency f and
the phase can be extracted from the data.
First of all, we consider the problem of estimating the number of
sinusoids, M. In the document, 'An Introduction to
the problem of estimating the number of sinusoids, M' I have provided a brief introduction to the
problem and its formulation using Bayesian techniques.
In the document above, we obtained a general expression for the
probability of M given the data and any prior information I. Our next
step is to generate simulated data and attempt to recover the value of
M from it. For our simulated data we take the amplitudes equal to 1,3
and 5 and the frequncies equal to 0.01,0.03 and 0.05 and add random
noise of mean zero and sigma 1. The code we used to generate the
simulated data is given here. The time-series and the power
spectrum might give more of an insight into the simulated data. We
generate 100 time samples evenly spaced with 1 second time intervals
between them. We also set the phase equal to zero in order to consider
a simpler problem at the outset.
Having generated the simulated data, our next task is to attempt to
recover the most probable value of 'M'. At this stage, to simplify the
problem even further, we consider a discrete problem i.e. we allow only
certain values of M, A and f. The exact details of the formulation can
be found in the document 'Formulation of the
discrete problem.'
In order to determine the most probable value of M, it is necessary to
determine the probabilities for each value of M. In the above
document we derived an expression for the probability of M. We use that
expression to calculate the probabilities for each value of M. One of
the programs used for this - 'prob3' which calculates the probability
that 'M' is three is given here.
With the above formulation of the problem, we worked out the
probability distribution for M which you can find here.
After this, we regenerated simulated data with different standard
deviations of the noise. The probability distribution for sigma=5 and sigma=15
show us that the program is able to extract the
value of M, even for sigma = 15. We also looked at other sets of
simulated data such as: 'actual M = 1', 'actual M =2' and data with no signal but just
random noise of a particular sigma. ( sigma = 1, sigma = 10)
Having determined the most probable value of M, we proceed to attempt
to recover the most probable values of the amplitude A and the
frequency f given the value of M. We saw earlier that the probability
of a particular set of amplitudes and frequencies was proportional to
the exponential of the negative chisquared value for the corresponding
set of parameters. In this part we consider only the chisquared values,
hence the minimum of the chisquared surface (or the slice through it)
will give the most probable value. An additional reason for considering
chisquared values rather than probabilities is that we shall have to
perform integrations over the chisquared hypersurface, and it may prove
helpful to achieve a degree of familiarity with them.
We construct a grid of possible amplitude and frequency values and
calculate chisquared values for each set of them. Thus we obtain a 2M
dimensional chisquared hypersurface. We look at particular
lower-dimensional regions of this hypersurface to determine the minimum
values of chisquared in that region and hence the most probable value
of the parameter under consideration. It is worth reiterating that the
simulated data signal is given by a superposition (sum) of three
sinusoids i.e. A1sin(2pif1t)+A2sin(2pif2t)+A3sin(2pif3t).
By considering suitable regions of the chisquared hypersurace, we
obtain probability distributions for the three amplitudes - A1, A2 and A3 (the three amplitudes and their associated
frequencies are completely symmetric in that swapping A1and f1 with
A3and f3 does not change anything) and the three frequencies f1, f2 and f3. We can see that it is easier to determine the
frequency than the amplitude, as the chisquared value is more sensitive
to a small change in frequency than a corresponding one in amplitude.
We can also look at contours where we consider the probability
distributions of two parameters simultaneously. First of all we
consider an amplitude-amplitude contour
(a1=1 and a2=3) with all other parameters at their correct values
and one with slightly erroneous
frequencies. Note that the chisquared values which are given on the
contour plots are much larger for the erroneous frequencies although we
are unable to delineate the actual amplitudes. We go on to consider an amplitude-frequency contour map and a
frequency-frequency one.
So far we have looked at determining particular amplitudes and
frequencies given the value of the other parameters. Now, we look at
determining the most probable value of a particular amplitude (say A1)
or a particular frequency (say f1) by marginalising over all possible
values of the other parameters. However, as was pointed out
earlier, there is a symmetry between the elements of the set of
amplitudes as well as the elements of the set of frequencies i.e. there
is no way of saying which amplitude is A1, which is A2 and so on. This
is illustrated by looking at the probability distribution for A1 and f1 with a
symmetrical prior.
To resolve this problem, we use a hierarchical prior on the amplitude
or frequency. The hierarchical prior on amplitude stipulates that
A1>=A2>=A3. This would enable us to delineate three separate
amplitudes - A1, A2 and A3 along with their associated frequencies f1,
f2 and f3. With the hierarchical prior on amplitude, we produce
the posterior probablity distributions for A1, A2 and A3 and their
associated frequencies f1, f2 and f3. Then, we change over to a
hierarchical prior on frequency i.e. f1>f2>f3. Note that we are
excluding the possibility that two sinusoids may have the same
frequency although we are allowing them to have the same amplitude.
With the hierarchical prior on frequency we produce probability
distributions for f1,
f2 and f3 and their
associated amplitudes A1, A2 and A3.
As this does not give much information about the preferability or
otherwise of each prior, we consider noise of sigma=5 and repeat the
above excercise. For the prior on amplitude, we obtain probability
distributions for
A1, A2, A3, f1, f2 and f3
and for the prior on frequency - A1,
A2, A3, f1, f2 and f3. It is
evident from these graphs that the prior on frequency may be slightly
more preferable than the prior on amplitude. However, the prior on
amplitude has the advantage that all the strong signals appear at the
beginning and the weaker ones appear towards the end.
After this we hope to extend the above analysis to higher values of
'm', the number of signals. We would also like to consider problems
where the signal amplitudes and frequencies are not restricted to
particular values as has been the case so far. A comprehensive
integration (actually a sum over a particular grid) for such a problem
would be beyond the scope of 'present-day' computing power. In order to
address such problems, we require Markov chain Monte-Carlo techniques
(MCMC for short).
If we proceed as before, we arrive at the expression for the
probability of a particular set of parameters, which is the exponential
of the chisquared value divided by two. We need to explore this
multidimensional probability space in order to find the maximum
likelihood or minimum chisquared and explore the space around it.
However, as mentioned earlier a comprehensive exploration of this
multidimensional space is difficult to achieve and thus the requirement
for MCMC.
MCMC makes use of random numbers and is based on the idea of a random
walk. From a given point in the probability distribution, a step is
taken in a random direction, with the length of the step drawn from a
Gaussian distribution with a given mean and standard deviation. If the
probability is higher at the new location, then the step is accepted.
If the probability is lower, then the step is accepted with a certain
probability. After a large number of iterations, we expect the
parameters to converge to their 'true' values.
First of all we generate our simulated data set. This would contain a
superposition of m sinusoids and random noise added to it. The program
used to generate the simulated data is given here.
Then, we set up suitable starting values for all the parameters i.e.
each of the m amplitudes, frequencies and phases. In general we would
use the same starting point for each of the m amplitudes, each of the
frequencies and each of the phases. We set the mean of the step size to
be zero and the standard deviation to decay exponentially with a lower
limit. At the ouset the step size is of the order of the parameter
value. But, towards the end the standard deviation of the step size is
one-hundredth of the order of the parameters.
We also make another modification of the standard MCMC technique.
We
do not allow the program to explore regions of the probability
distribution where the probability is lower than its 'current location'
for the first few iterations. We expect the program to find the maxima
of the probability distribution by this time. After a few thousand
iterations, it is allowed to step into regions of lower probability
with a certain probability. So, we essentially employ a two step
process. In the first step, the program attempts to find the maximum
for each of the parameters and in the second it explores the region
around the maximum. The programs used for this are mcmc.m
and mcmcprog.m.
We attempt to look at the three sinusoid problem considered earlier
with mcmc techniques. We generate three sinusoids of amplitudes 1,3 and
5 with frequencies 0.01,0.03 and 0.05 respectively. The phases for each
of these is zero, 100 time samples are taken with an even spacing of
one second between the samples.Random noise of mean 0 and standard
deviation 1 is added to the sum of the three sinusoids. We use the
programs given above to extract the parameters. The mcmc chains for the
parameters are given here : mcmc chain for amplitude, frequency and phase. All of them converge to the
expected values if one takes into account the fact that a phase of 2pi
is equivalent to a phase of 0.
We then set up a similar problem, only changing the amplitudes to
0.1,0.3 and 0.5 and correspondingly changing the amplitude step size.
The results for this indicate that the program has been unable to
extract the parameters satisfactorily. The chains for amplitude, frequency and phase are given here. We see that
the signals have not converged due to the low signal to noise ratio.
Our next step is to adapt this program to find the probability
distribution for 'm' - the number of sinusoids. We shall try to find
the number of sinusoids 'm' using mcmc techniques.
We inroduce 'm' into the mcmc chains as a parameter. While calculating
probabilities in the MCMC chains, only 'm' sources are considered. We
can do this for example by multiplying the amplitude with a vector
whose first m elememts are 1's and the rest are 0's. At the end of each
iteration of all the parameters, a new candidate value of m is proposed
which would be m plus a random number with mean 0 and standard
deviation 1 which has been rounded off to the nearest integer. We do
this to avoid situations with a non-integer value of m. So, unlike the
other parameters m is a discrete variable.
We set up a similar procedure as in the previous program such that
steps of the mcmc chain for m into regions of higher probability are
always accepted and steps into regions of lower probability are
accepted with a probability. However, for the first few thousand steps,
i.e. for what can be termed the 'burn in' only steps into regions of
higher probability are accepted. The program used for this is given
here - mcmcprogm.m. The function which does
the mcmc chain for m is also given - mcmch.m.
We generated a simulated signal composed of a superposition of three
sinusoids of amplitudes 1,3 & 5 and frequencies 0.01,0.03 &
0.05 respectively. We restricted the values of 'm' - the number of
sinusoids to be between 1 and 5. The probability distribution for m
looked like this. The mcmc chains for the amplitude and frequency
are given here. The red, blue and green coloured chains represent the
first three amplitudes which have converged. The other two chains
represent the fourth and fifth amplitudes. As the most probable value
of m is 3, we need only consider the first three amplitude chains which
correspond to the red, green and blue ones. The other two are in a
sense 'dummy parameters' as they are the parameters for non-existant
signals.
Here , you can find a link to the interim
version of my report. This will be updated whenever possible and I
shall try to have the latest version on the web.