Point process modelling of the Afghan War Diary
Welcome to the project page for the work on point process modelling of the WikiLeaks data set known as the Afghan War Diary (AWD). The paper, titled "Point process modelling of the Afghan War Diary", was published in the Proceedings of the National Academy of Sciences of the United States of America (PNAS) in July 2012. The official link to the paper is here. The main purpose of this page is to give the interested reader further information and to provide a link to all the code required to replicate the results presented in the paper. You will need MATLAB 7.9 (2009b) or higher for most of the programs and the R software package for the corroboration results. In addition you will frequently require the MATLAB Mapping and Statistics toolboxes; in the main files, dependencies are listed at the top of the scripts.
Modern conflicts are characterised by an ever increasing use of information and sensing technology, resulting in vast amounts of high resolution data. Modelling and prediction of conflict, however, remains a challenging task due to the heterogeneous and dynamic nature of the data typically available. Here we propose the use of dynamic spatiotemporal modelling tools for the identification of complex underlying processes in conflict, such as diffusion, relocation, heterogeneous escalation and volatility. Using ideas from statistics, signal processing and ecology, we provide a predictive framework able to assimilate data and give confidence estimates on the predictions. We demonstrate our methods on the WikiLeaks Afghan War Diary. Our results show that the approach allows deeper insights into conflict dynamics and allows a strikingly statistically accurate forward prediction of armed opposition group activity in 2010, based solely on data from previous years.
The work brings together ideas from statistics and spatiotemporal modelling and applies them to what can only be described as a highly complex social system. We start off with a very generic equation, the stochastic integro-difference equation (IDE), in order to capture the dynamics and, more importantly, the uncertainty present in the spatiotemporal evolution of the conflict scenario. The IDE has previously been used in diverse areas, such as ecology, neuroscience and weather prediction; its popularity is attributed to its intuitive appeal and to the ease with which conventional empirical or fully Bayesian methods may be applied for data-driven inference. Unfortunately we do not exploit the IDE to its full potential, partly because nonparametric methods did not detect any dynamics (such as diffusion) at the scale we are considering; there is ample evidence that such effects would, however, be present at finer resolutions.
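To make the idea of an IDE concrete, the following is a minimal Python sketch of a generic one-dimensional form, z_{t+1}(s) = ∫ k(s, r) z_t(r) dr + e_t(s), discretised on a regular grid. It is an illustration only and is not taken from the project's MATLAB code: the Gaussian kernel, the spatially uncorrelated noise, the unit interval domain and all names and parameter values (`simulate_ide`, `width`, `shift`, `noise_sd`) are assumptions made for this example.

```python
import math
import random


def gaussian_kernel(s, r, width, shift=0.0):
    """Redistribution kernel k(s, r): weight given to location r when updating s."""
    d = s - r - shift
    return math.exp(-0.5 * (d / width) ** 2) / (width * math.sqrt(2.0 * math.pi))


def simulate_ide(n_cells=50, n_steps=10, width=0.05, shift=0.0,
                 noise_sd=0.01, seed=0):
    """Simulate z_{t+1}(s) = integral of k(s, r) z_t(r) dr + e_t(s) on [0, 1]."""
    rng = random.Random(seed)
    dr = 1.0 / n_cells
    grid = [(i + 0.5) * dr for i in range(n_cells)]  # cell centres
    # Discretised kernel: K[i][j] approximates k(s_i, r_j) * dr (Riemann sum).
    K = [[gaussian_kernel(s, r, width, shift) * dr for r in grid] for s in grid]
    # Hypothetical initial field: a single bump centred at 0.5.
    z = [math.exp(-0.5 * ((s - 0.5) / 0.05) ** 2) for s in grid]
    history = [z]
    for _ in range(n_steps):
        # Kernel integral plus independent Gaussian noise in every cell
        # (a simplification; real applications use spatially correlated noise).
        z = [sum(K[i][j] * z[j] for j in range(n_cells)) + rng.gauss(0.0, noise_sd)
             for i in range(n_cells)]
        history.append(z)
    return grid, history


if __name__ == "__main__":
    grid, history = simulate_ide(n_steps=3, shift=0.1, noise_sd=0.0)
    for t, z in enumerate(history):
        peak = max(range(len(z)), key=lambda i: z[i])
        print(t, round(grid[peak], 2), round(z[peak], 3))
```

In this sketch the kernel width controls diffusion (the bump spreads and flattens at each step) and a non-zero `shift` produces relocation (the bump drifts), which are two of the effects an IDE is meant to capture.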
The ensuing model we employ is still able to provide insights into the progression of the conflict in Afghanistan. In particular we estimate parameters relating to the spatially varying growth and volatility of the conflict (e.g. see figure below), which allow the model to predict, with confidence measures, the evolution of the conflict based on the data in previous years. Interestingly, we show that this uncertainty translates to other characteristics in the conflict scenario. In particular, the predictive distribution of armed opposition group activity in 2010 (based on data up to 2009) is seen to match closely the distribution of the observed events on a provincial scale. This strongly suggests that, despite the inherent complexity of this social system, uncertainty may be captured extremely well with the use of appropriate stochastic models. For detailed results, please refer to the associated paper.
We employ an approximate Bayesian inference approach, variational Bayes (VB), for inferring parameters in the model. Since the free-form variational posteriors are not of standard form, we only propagate the first- and second-order moments through Laplace approximations. The ensuing VB-Laplace method has been shown to compare favourably with state-of-the-art MCMC methods. Despite the skewness, the densities associated with log-Gaussian Cox processes (LGCPs) of this sort are generally well-behaved; other methods such as INLA and EP may be applied with equal effect.
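As a rough illustration of the Laplace step only (not the full VB-Laplace scheme used in the paper), the Python sketch below approximates the posterior of a single log-rate x by a Gaussian, assuming counts y_i ~ Poisson(exp(x)) and a Gaussian prior on x. The model, the function name `laplace_poisson_log_rate` and the prior values are hypothetical choices made for this example.

```python
import math


def laplace_poisson_log_rate(counts, prior_mean=0.0, prior_var=4.0,
                             tol=1e-10, max_iter=100):
    """Gaussian (Laplace) approximation to p(x | counts).

    Assumed toy model: counts[i] ~ Poisson(exp(x)), x ~ N(prior_mean, prior_var).
    Returns the posterior mode and the variance implied by the curvature there.
    """
    n = len(counts)
    total = sum(counts)
    x = prior_mean
    for _ in range(max_iter):
        # Gradient and Hessian of the log posterior with respect to x.
        grad = total - n * math.exp(x) - (x - prior_mean) / prior_var
        hess = -n * math.exp(x) - 1.0 / prior_var
        step = grad / hess
        x -= step  # Newton update; the log posterior is concave in x
        if abs(step) < tol:
            break
    # Laplace variance: inverse of the negative Hessian at the mode.
    curvature = n * math.exp(x) + 1.0 / prior_var
    return x, 1.0 / curvature


if __name__ == "__main__":
    mode, var = laplace_poisson_log_rate([3, 5, 4, 6, 2])
    print(mode, var)
```

Only the mode and the curvature at the mode are retained, which is the sense in which first- and second-order moments are propagated; any skewness of the true posterior is discarded by this approximation.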
In line with the work of O'Loughlin et al., we considered several fixed effects, including elevation, gradient, population density, shortest distance to a major city and shortest distance to the Pakistan border. We evaluated the inclusion of each effect by visualising how the intensity of AWD events varies with it; population density and the shortest distance to a major city were subsequently included in the model. Results showed a significant association of conflict with both effects (positive for population density and negative for distance to a major city).
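One simple way to inspect how event intensity varies with a candidate effect is to bin grid cells by the covariate and compute events per unit area in each bin. The Python sketch below shows this under stated assumptions: the function name `intensity_by_covariate`, the bin edges and the toy numbers are invented for illustration and are not AWD data or the authors' procedure.

```python
def intensity_by_covariate(cell_covariate, cell_area, cell_events, bin_edges):
    """Return events per unit area for each covariate bin.

    Bins are half-open [edge_b, edge_{b+1}), except the last, which also
    includes its upper edge. Bins containing no cells yield None.
    """
    n_bins = len(bin_edges) - 1
    events = [0.0] * n_bins
    area = [0.0] * n_bins
    for x, a, c in zip(cell_covariate, cell_area, cell_events):
        for b in range(n_bins):
            is_last = b == n_bins - 1
            in_bin = bin_edges[b] <= x < bin_edges[b + 1]
            if in_bin or (is_last and x == bin_edges[-1]):
                events[b] += c
                area[b] += a
                break
    return [e / a if a > 0 else None for e, a in zip(events, area)]


if __name__ == "__main__":
    # Toy data, invented for this example: population density per cell,
    # cell area, and number of events observed in each cell.
    density = [10, 20, 150, 300, 800, 900]
    area = [1.0] * 6
    events = [0, 1, 3, 4, 12, 10]
    print(intensity_by_covariate(density, area, events, [0, 100, 500, 1000]))
    # prints [0.5, 3.5, 11.0]
```

A clear trend across bins, as in the toy output, is the kind of pattern that would motivate including the covariate as a fixed effect; a flat profile would suggest leaving it out.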
All relevant code can be downloaded in .tar.gz or .zip format from the links at the top of this page (the download size is 200 MB). Alternatively, visit here and download only the parts of the code you are interested in. The code is released under the Simplified BSD License; please refer to the license before use and redistribution.
Point process modelling of the Afghan War Diary is part of a recent trend towards using modelling and statistics for conflict analysis and prediction; see, for example, the recent paper published in Science by Johnson et al. The origins of the present work date back to July 2010, when Michael Dewar uploaded a video with a spatiotemporal heat map of the conflict intensity in Afghanistan, with data extracted from the Afghan War Diary. His work is, in fact, only part of the extensive efforts by @drewconway and company, who provided detailed descriptive summaries of the AWD, all of which ultimately proved extremely valuable in bettering our understanding of the data.
The methodological aspect of this work is described more extensively in a recently published IEEE paper, which in turn is a development of other work in the research group, most prominently this paper and this paper. For an application of the same methodology to a temporal point process, see this paper.
Significant understanding of the WikiLeaks data set from a political perspective was gained through the work of O'Loughlin et al., who also provided several statistical summaries that were verified through our modelling approach. All details can be found on his project page; replication material is also provided.
Finally, this work would not have been possible without the availability of the data sets. The WikiLeaks AWD is available in numerous places on the web; it now also supplements the PNAS paper here. In addition we made use of the ANSO data sets, ACLED data sets and GTD data. We are indebted to the corresponding organisations for making their data freely available to the public.
This project was carried out by:
Please address questions of a technical nature to the first or last author.
This work was supported in part by PASCAL FP7 NoE, and by a studentship from the University of Sheffield to AZ-M. GS is funded by the Scottish Government through the SICSA initiative. VK acknowledges support from the EPSRC Platform Grant EP/H00453X/1.