# Machine Learning for Compact Binaries

The central goal in analyzing data from gravitational-wave detectors is to determine the properties of the source (e.g., the masses, spins, and location of the binary) that could have given rise to the observed data. This is accomplished with Bayesian inference. Given a *likelihood* $p(d|\theta)$ for data $d$ given system parameters $\theta$, and a *prior* $p(\theta)$ for the parameters, the *posterior* probability is given by
$$
p(\theta|d) = \frac{p(d|\theta)p(\theta)}{p(d)},
$$
where $p(d)$ is a normalizing factor called the *evidence*. The likelihood follows from modeling the data as $d = h(\theta) + n$, where $h(\theta)$ is a signal waveform and $n$ is noise, assumed to be stationary and Gaussian.
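To make the likelihood concrete, here is a minimal sketch under simplifying assumptions: white Gaussian noise with a known standard deviation `sigma` in the time domain, and a toy sinusoidal `waveform`. Both are hypothetical stand-ins; real analyses work with physical waveform models and the measured detector noise spectrum.

```python
import math
import random


def waveform(theta, times):
    """Toy signal model h(theta): a sinusoid with amplitude and frequency (hypothetical)."""
    amplitude, frequency = theta
    return [amplitude * math.sin(2.0 * math.pi * frequency * t) for t in times]


def log_likelihood(data, theta, times, sigma):
    """log p(d | theta) for d = h(theta) + n with white Gaussian noise n ~ N(0, sigma^2)."""
    model = waveform(theta, times)
    chi2 = sum((d - h) ** 2 for d, h in zip(data, model)) / sigma**2
    norm = len(data) * math.log(2.0 * math.pi * sigma**2)
    return -0.5 * (chi2 + norm)


# Simulate data from known parameters, then score two candidate parameter sets.
rng = random.Random(0)
times = [i / 256.0 for i in range(256)]
sigma = 0.5
theta_true = (1.0, 5.0)
data = [h + rng.gauss(0.0, sigma) for h in waveform(theta_true, times)]

logl_true = log_likelihood(data, theta_true, times, sigma)
logl_wrong = log_likelihood(data, (1.0, 7.0), times, sigma)
```

In this toy setting the parameters that generated the data receive a much higher log-likelihood than a candidate with the wrong frequency, which is the comparison that a sampler repeats many times.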

The task of inference is to obtain samples $\theta \sim p(\theta|d)$. This is usually accomplished with a stochastic sampling algorithm such as Markov chain Monte Carlo (MCMC), which explores the parameter space and compares simulated waveforms to the data. This is computationally costly: a single analysis can require millions of waveform simulations, and each simulation is especially expensive for waveform models that include the most realistic physics. In addition, as detectors such as LIGO, Virgo, and KAGRA become more sensitive, the rate of detections will increase, and inference must be performed on each event.
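The cost structure of such samplers can be illustrated with a random-walk Metropolis sketch on a one-dimensional toy problem. The model here is an assumption made for illustration: a prior $\theta \sim \mathcal{N}(0,1)$ and a single observation $d = \theta + n$ with Gaussian noise. The function names and settings are hypothetical and do not correspond to any production code.

```python
import math
import random


def log_posterior(theta, d, sigma):
    """Unnormalized log posterior: Gaussian likelihood for d = theta + n times a N(0, 1) prior."""
    log_like = -0.5 * ((d - theta) / sigma) ** 2
    log_prior = -0.5 * theta**2
    return log_like + log_prior


def metropolis(d, sigma, n_steps, step_size, rng):
    """Random-walk Metropolis sampler; returns the chain and the number of model evaluations."""
    theta = 0.0
    current = log_posterior(theta, d, sigma)
    chain, n_calls = [], 1
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step_size)
        candidate = log_posterior(proposal, d, sigma)
        n_calls += 1  # in a real analysis, each evaluation needs one waveform simulation
        # Accept with probability min(1, p(proposal | d) / p(theta | d)).
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        chain.append(theta)
    return chain, n_calls


rng = random.Random(1)
chain, n_calls = metropolis(d=1.0, sigma=0.5, n_steps=20000, step_size=1.0, rng=rng)
samples = chain[2000:]  # discard burn-in
posterior_mean = sum(samples) / len(samples)
```

For this toy model the exact posterior mean is $d/(1+\sigma^2) = 0.8$, so the chain can be checked against it. Every step costs one model evaluation, which is why the expense grows quickly when each evaluation is a full waveform simulation.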

The goal of this research project is to train a neural network to learn an *inverse* model for the system parameters given the data. Once trained, the network can rapidly provide posterior samples for any observed data. We build a “surrogate” for the posterior using so-called “normalizing flows”, which allow us to represent the complicated posterior distribution in terms of a mapping (depending on the data) from a much simpler distribution,
$$
f_d : u \to \theta,
$$
where $u$ is normally distributed and of the same dimension as $\theta$. This gives rise to a distribution
$$
q(\theta|d) = \mathcal{N}(0,\mathbb{1})(f_d^{-1}(\theta)) \left| \det J_{f_d^{-1}} \right|,
$$
which we then train to approximate $p(\theta|d)$. To train the network, we use millions of simulated data sets $(\theta^{(i)},d^{(i)}) \sim p(\theta,d)$. We also condition the network on the noise characteristics of the detectors, to account for noise nonstationarity from event to event. Training takes $\approx 1$ week, after which inference on any event takes under a minute, compared with roughly a day using standard methods.
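The change-of-variables density and the training objective can be sketched with the simplest possible conditional flow: a one-dimensional affine map $f_d(u) = a\,d + b + e^{s} u$, fitted by gradient descent on the average of $-\log q(\theta|d)$ over simulated pairs. This is a toy under stated assumptions (a $\mathcal{N}(0,1)$ prior, $d = \theta + n$ with Gaussian noise, hand-written gradients, hypothetical parameter names `a`, `b`, `s`). The actual networks are far more expressive.

```python
import math
import random

rng = random.Random(2)
sigma_n = 0.5  # noise standard deviation of the toy model (assumption)

# Simulated training pairs (theta, d) ~ p(theta, d): prior N(0, 1), then d = theta + n.
pairs = []
for _ in range(2000):
    theta = rng.gauss(0.0, 1.0)
    pairs.append((theta, theta + rng.gauss(0.0, sigma_n)))


def log_q(theta, d, params):
    """log q(theta | d) for the affine flow f_d(u) = a*d + b + exp(s)*u."""
    a, b, s = params
    u = (theta - a * d - b) * math.exp(-s)  # u = f_d^{-1}(theta)
    log_base = -0.5 * u**2 - 0.5 * math.log(2.0 * math.pi)  # standard normal density
    return log_base - s  # log |det J_{f_d^{-1}}| = -s


def train(pairs, n_steps=300, lr=0.1):
    """Gradient descent on the mean of -log q(theta | d) over the simulated pairs."""
    a, b, s = 0.0, 0.0, 0.0
    n = len(pairs)
    for _ in range(n_steps):
        grad_a = grad_b = grad_s = 0.0
        for theta, d in pairs:
            u = (theta - a * d - b) * math.exp(-s)
            grad_a += -u * d * math.exp(-s)
            grad_b += -u * math.exp(-s)
            grad_s += 1.0 - u**2
        a, b, s = a - lr * grad_a / n, b - lr * grad_b / n, s - lr * grad_s / n
    return a, b, s


a, b, s = train(pairs)
scale = math.exp(s)
loss = -sum(log_q(theta, d, (a, b, s)) for theta, d in pairs) / len(pairs)

# Sampling from the trained flow for a new observation: theta = f_d(u) with u ~ N(0, 1).
d_obs = 1.0
flow_samples = [a * d_obs + b + scale * rng.gauss(0.0, 1.0) for _ in range(1000)]
```

For this toy model the exact posterior is Gaussian with mean $d/(1+\sigma_n^2)$ and variance $\sigma_n^2/(1+\sigma_n^2)$, so the fitted `a` and `scale` can be compared with $0.8$ and $\sqrt{0.2}$. Note that training never evaluates the likelihood; it only needs simulated pairs.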

Quasicircular binary black hole systems are characterized by 15 parameters, and our networks are able to infer all of them simultaneously, with results in very close agreement with standard analysis codes. Additionally, by treating the network output as a proposal distribution for importance sampling (that is, weighting each sample by the ratio of likelihood times prior to $q(\theta|d)$), we can verify the deep-learning results, correct them if necessary, and obtain an extremely precise evidence estimate.
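The importance-sampling step can be sketched on a one-dimensional toy problem. As assumptions for illustration, the prior is $\mathcal{N}(0,1)$, the data are a single value $d = \theta + n$ with Gaussian noise, and the trained network is replaced by a hand-picked Gaussian proposal `q` that is deliberately a little offset and too wide. All names and numbers are hypothetical.

```python
import math
import random


def normal_logpdf(x, mean, std):
    """Log density of a normal distribution with the given mean and standard deviation."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


rng = random.Random(3)
d, sigma_n = 1.0, 0.5  # one toy observation d = theta + n, n ~ N(0, sigma_n^2)

# Stand-in for the trained network: a Gaussian proposal q(theta | d) that is slightly
# offset and too wide compared with the exact posterior N(0.8, 0.2).
q_mean, q_std = 0.85, 0.5

samples = [rng.gauss(q_mean, q_std) for _ in range(20000)]
weights = []
for theta in samples:
    # Importance weight w = p(d | theta) p(theta) / q(theta | d).
    log_target = normal_logpdf(d, theta, sigma_n) + normal_logpdf(theta, 0.0, 1.0)
    weights.append(math.exp(log_target - normal_logpdf(theta, q_mean, q_std)))

n = len(weights)
evidence = sum(weights) / n  # Monte Carlo estimate of p(d)
efficiency = sum(weights) ** 2 / (n * sum(w * w for w in weights))  # effective sample size / n
corrected_mean = sum(w * t for w, t in zip(weights, samples)) / sum(weights)

# Analytic evidence for this toy model: d ~ N(0, 1 + sigma_n^2).
exact_evidence = math.exp(normal_logpdf(d, 0.0, math.sqrt(1.0 + sigma_n**2)))
```

The sample efficiency acts as a diagnostic: a value near one indicates that the proposal already matches the posterior well, while a low value flags results that need correction. The weighted samples recover the exact posterior mean even though the proposal is offset.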

We are currently working on building a user-friendly code, called “Dingo”, which we plan to make publicly available. Beyond that, the next steps are to:

- Use the code to analyze more real events with the most realistic waveform models, as well as to search for deviations from Einstein’s theory of gravity.
- Extend to binary neutron stars, which have much longer waveforms. This is especially important for rapid alerts to standard telescopes, since these events are much more likely to have multimessenger counterpart signals.
- Include a treatment of more realistic (nonstationary or non-Gaussian) noise. This is challenging for conventional approaches, but can be done naturally using simulation-based inference methods such as ours. This will lead to more accurate inference results than otherwise possible using standard codes.
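
Simulation-based inference accommodates such noise because training needs only samples $(\theta, d)$ and never an explicit likelihood. The sketch below generates training pairs with a toy non-Gaussian noise model: Gaussian noise with occasional large outliers. The waveform, prior ranges, and glitch settings are all illustrative assumptions, not a description of real detector noise.

```python
import math
import random


def sample_noise(n_samples, rng, sigma=0.5, glitch_prob=0.02, glitch_scale=5.0):
    """Gaussian noise with occasional large outliers ("glitches"); all values are illustrative."""
    noise = []
    for _ in range(n_samples):
        scale = sigma * glitch_scale if rng.random() < glitch_prob else sigma
        noise.append(rng.gauss(0.0, scale))
    return noise


def simulate_pair(times, rng):
    """Draw theta from a toy uniform prior and return (theta, d) with d = h(theta) + n."""
    amplitude = rng.uniform(0.5, 2.0)
    frequency = rng.uniform(2.0, 10.0)
    signal = [amplitude * math.sin(2.0 * math.pi * frequency * t) for t in times]
    noise = sample_noise(len(times), rng)
    data = [h + n for h, n in zip(signal, noise)]
    return (amplitude, frequency), data


rng = random.Random(4)
times = [i / 128.0 for i in range(128)]
training_set = [simulate_pair(times, rng) for _ in range(1000)]
```

Swapping in a different noise model changes only `sample_noise`; the training objective is unchanged, whereas a likelihood-based sampler would need a closed-form density for the new noise.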