BAYESIAN EVIDENCE AS A DIAGNOSTIC FOR TESTING MODELS OF GRAVITATIONAL WAVE DATA


~ Vimal  Simha  (0204082s@student.gla.ac.uk)
University of Glasgow

The Laser Interferometer Space Antena (LISA) is expected to detect a large number of gravitational wave sources in a frequency range from 0.0001Hz to 0.1Hz. The problem we address is how well the parameters such as the number of sources M, the amplitude A, the frequency f and the phase can be extracted from the data.

First of all, we consider the problem of estimating the number of sinusoids, M. In the document, 'An Introduction to the problem of estimating the number of sinusoids, M' I have provided a brief introduction to the problem and its formulation using Bayesian techniques.

In the document above, we obtained a general expression for the probability of M given the data and any prior information I. Our next step is to generate simulated data and attempt to recover the value of M from it. For our simulated data we take the amplitudes equal to 1,3 and 5 and the frequncies equal to 0.01,0.03 and 0.05 and add random noise of  mean zero and sigma 1. The code we used to generate the simulated data is given here. The time-series and the power spectrum might give more of an insight into the simulated data. We generate 100 time samples evenly spaced with 1 second time intervals between them. We also set the phase equal to zero in order to consider a simpler problem at the outset.

Having generated the simulated data, our next task is to attempt to recover the most probable value of 'M'. At this stage, to simplify the problem even further, we consider a discrete problem i.e. we allow only certain values of M, A and f. The exact details of the formulation can be found in the document 'Formulation of the discrete problem.'

In order to determine the most probable value of M, it is necessary to determine the probabilities for each value of M. In  the above document we derived an expression for the probability of M. We use that expression to calculate the probabilities for each value of M. One of the programs used for this - 'prob3' which calculates the probability that 'M' is three is given here.

With the above formulation of the problem, we worked out the probability distribution for M which you can find here. After this, we regenerated simulated data with different standard deviations of the noise. The probability distribution for sigma=5 and sigma=15  show us that the program is able to extract the
value of M, even for sigma = 15. We also looked at other sets of simulated data such as: 'actual M = 1', 'actual M =2' and data with no signal but just random noise of a particular sigma. ( sigma = 1 sigma = 10)

Having determined the most probable value of M, we proceed to attempt to recover the most probable values of the amplitude A and the frequency f given the value of M. We saw earlier that the probability of a particular set of amplitudes and frequencies was proportional to the exponential of the negative chisquared value for the corresponding set of parameters. In this part we consider only the chisquared values, hence the minimum of the chisquared surface (or the slice through it) will give the most probable value. An additional reason for considering chisquared values rather than probabilities is that we shall have to perform integrations over the chisquared hypersurface, and it may prove helpful to achieve a degree of familiarity with them.

We construct a grid of possible amplitude and frequency values and calculate chisquared values for each set of them. Thus we obtain a 2M dimensional chisquared hypersurface. We look at particular lower-dimensional regions of this hypersurface to determine the minimum values of chisquared in that region and hence the most probable value of the parameter under consideration. It is worth reiterating that the simulated data signal is given by a superposition (sum) of three sinusoids i.e. A1sin(2pif1t)+A2sin(2pif2t)+A3sin(2pif3t).

By considering suitable regions of the chisquared hypersurace, we obtain probability distributions for the three amplitudes - A1, A2 and A3 (the three amplitudes and their associated frequencies are completely symmetric in that swapping A1and f1 with A3and f3 does not change anything) and the three frequencies f1, f2 and f3. We can see that it is easier to determine the frequency than the amplitude, as the chisquared value is more sensitive to a small change in frequency than a corresponding one in amplitude.

We can also look at contours where we consider the probability distributions of two parameters simultaneously. First of all we consider an amplitude-amplitude contour (a1=1 and a2=3)  with all other parameters at their correct values and one with slightly erroneous frequencies. Note that the chisquared values which are given on the contour plots are much larger for the erroneous frequencies although we are unable to delineate the actual amplitudes. We go on to consider an amplitude-frequency contour map and a frequency-frequency one.

So far we have looked at determining particular amplitudes and frequencies given the value of the other parameters. Now, we look at determining the most probable value of a particular amplitude (say A1) or a particular frequency (say f1) by marginalising over all possible values of the other parameters.  However, as was pointed out earlier, there is a symmetry between the elements of the set of amplitudes as well as the elements of the set of frequencies i.e. there is no way of saying which amplitude is A1, which is A2 and so on. This is illustrated by looking at the probability distribution for A1 and f1 with a symmetrical prior.

To resolve this problem, we use a hierarchical prior on the amplitude or frequency. The hierarchical prior on amplitude stipulates that A1>=A2>=A3. This would enable us to delineate three separate amplitudes - A1, A2 and A3 along with their associated frequencies f1, f2 and f3. With the hierarchical prior on amplitude,  we produce the posterior probablity distributions for A1, A2 and A3 and their associated frequencies f1, f2 and f3. Then, we change over to a hierarchical prior on frequency i.e. f1>f2>f3. Note that we are excluding the possibility that two sinusoids may have the same frequency although we are allowing them to have the same amplitude. With the hierarchical prior on frequency we produce probability distributions for f1, f2 and f3 and their associated amplitudes A1, A2 and A3.

As this does not give much information about the preferability or otherwise of each prior, we consider noise of sigma=5 and repeat the above excercise. For the prior on amplitude, we obtain probability distributions for A1, A2, A3, f1, f2 and f3 and for the prior on frequency - A1, A2, A3, f1, f2 and f3. It is evident from these graphs that the prior on frequency may be slightly more preferable than the prior on amplitude. However, the prior on amplitude has the advantage that all the strong signals appear at the beginning and the weaker ones appear towards the end.

After this we hope to extend the above analysis to higher values of 'm', the number of signals. We would also like to consider problems where the signal amplitudes and frequencies are not restricted to particular values as has been the case so far. A comprehensive integration (actually a sum over a particular grid) for such a problem would be beyond the scope of 'present-day' computing power. In order to address such problems, we require Markov chain Monte-Carlo techniques (MCMC for short).

If we proceed as before, we arrive at the expression for the probability of a particular set of parameters, which is the exponential of the chisquared value divided by two. We need to explore this multidimensional probability space in order to find the maximum likelihood or minimum chisquared and explore the space around it. However, as mentioned earlier a comprehensive exploration of this multidimensional space is difficult to achieve and thus the requirement for MCMC.

MCMC makes use of random numbers and is based on the idea of a random walk. From a given point in the probability distribution, a step is taken in a random direction, with the length of the step drawn from a Gaussian distribution with a given mean and standard deviation. If the probability is higher at the new location, then the step is accepted. If the probability is lower, then the step is accepted with a certain probability. After a large number of iterations, we expect the parameters to converge to their 'true' values.

First of all we generate our simulated data set. This would contain a superposition of m sinusoids and random noise added to it. The program used to generate the simulated data is given here. Then, we set up suitable starting values for all the parameters i.e. each of the m amplitudes, frequencies and phases. In general we would use the same starting point for each of the m amplitudes, each of the frequencies and each of the phases. We set the mean of the step size to be zero and the standard deviation to decay exponentially with a lower limit. At the ouset the step size is of the order of the parameter value. But, towards the end the standard deviation of the step size is one-hundredth of the order of the parameters.

We also make another modification of the standard MCMC technique.  We do not allow the program to explore regions of the probability distribution where the probability is lower than its 'current location' for the first few iterations. We expect the program to find the maxima of the probability distribution by this time. After a few thousand iterations, it is allowed to step into regions of lower probability with a certain probability. So, we essentially employ a two step process. In the first step, the program attempts to find the maximum for each of the parameters and in the second it explores the region around the maximum. The programs used for this are mcmc.m and mcmcprog.m.

We attempt to look at the three sinusoid problem considered earlier with mcmc techniques. We generate three sinusoids of amplitudes 1,3 and 5 with frequencies 0.01,0.03 and 0.05 respectively. The phases for each of these is zero, 100 time samples are taken with an even spacing of one second between the samples.Random noise of mean 0 and standard deviation 1 is added to the sum of the three sinusoids. We use the programs given above to extract the parameters. The mcmc chains for the parameters are given here : mcmc chain for amplitude, frequency and phase. All of them converge to the expected values if one takes into account the fact that a phase of 2pi is equivalent to a phase of 0.

We then set up a similar problem, only changing the amplitudes to 0.1,0.3 and 0.5 and correspondingly changing the amplitude step size. The results for this indicate that the program has been unable to extract the parameters satisfactorily. The chains for amplitude, frequency and phase are given here. We see that the signals have not converged due to the low signal to noise ratio.

Our next step is to adapt this program to find the probability distribution for 'm' - the number of sinusoids. We shall try to find the number of sinusoids 'm' using mcmc techniques.

We inroduce 'm' into the mcmc chains as a parameter. While calculating probabilities in the MCMC chains, only 'm' sources are considered. We can do this for example by multiplying the amplitude with a vector whose first m elememts are 1's and the rest are 0's. At the end of each iteration of all the parameters, a new candidate value of m is proposed which would be m plus a random number with mean 0 and standard deviation 1 which has been rounded off to the nearest integer. We do this to avoid situations with a non-integer value of m. So, unlike the other parameters m is a discrete variable.

We set up a similar procedure as in the previous program such that steps of the mcmc chain for m into regions of higher probability are always accepted and steps into regions of lower probability are accepted with a probability. However, for the first few thousand steps, i.e. for what can be termed the 'burn in' only steps into regions of higher probability are accepted. The program used for this is given here - mcmcprogm.m. The function which does the mcmc chain for m is also given - mcmch.m.

We generated a simulated signal composed of a superposition of three sinusoids of amplitudes 1,3 & 5 and frequencies 0.01,0.03 & 0.05 respectively. We restricted the values of 'm' - the number of sinusoids to be between 1 and 5. The probability distribution for m looked like this. The mcmc chains for the amplitude and frequency are given here. The red, blue and green coloured chains represent the first three amplitudes which have converged. The other two chains represent the fourth and fifth amplitudes. As the most probable value of m is 3, we need only consider the first three amplitude chains which correspond to the red, green and blue ones. The other two are in a sense 'dummy parameters' as they are the parameters for non-existant signals.





Here , you can find a link to the interim version of my report. This will be updated whenever possible and I shall try to have the latest version on the web.