## Monday

### 9.00-10.00: Opening lecture

Department of Mathematics, University of Oslo, Norway
Hunting high and low

What is high-dimensional statistics in 2016? For more than a decade, the field of genomics in particular has motivated much research in so-called high-dimensional statistics, where the number of covariates $$p$$ is much larger than the number of subjects $$n$$. I will present some variations of lasso-like methods with examples from genomics and epigenomics. Furthermore, statisticians in 2016 also have to handle large, complex datasets facing high-dimensional challenges other than $$p \gg n$$. Paradoxically, having extremely many observations is not necessarily a blessing, but rather complicates inferential procedures and algorithms. I will briefly illustrate some of these emerging difficulties with experiences from the brand new Oslo research centre BigInsight, where data are huge and inference must be distributed in various ways, inspiring new directions for statistical research.

### 10.15-11.15: Survey A1

Arnaud Doucet
Department of Statistics, Oxford University, United Kingdom
Pseudo-marginal methods for inference in latent variable models

For complex latent variable models, the likelihood function of the parameters of interest cannot be evaluated pointwise. In this context, standard Markov chain Monte Carlo (MCMC) strategies used to perform Bayesian inference can be very inefficient. Pseudo-marginal methods are an alternative class of MCMC methods which rely on an unbiased estimator of the likelihood. These techniques have become popular over the past 5-10 years and have found numerous applications in fields as diverse as econometrics, genetics and machine learning.

In the first part of the talk, I will review the standard pseudo-marginal method, present some applications and provide useful guidelines on how to optimize the performance of the algorithm. In the second part of the talk, I will introduce new pseudo-marginal algorithms which rely on novel low variance Monte Carlo estimators of likelihood ratios. The efficiency of computations is increased relative to the standard pseudo-marginal algorithm by several orders of magnitude.

This is joint work with George Deligiannidis (Oxford) and Michael K. Pitt (King's College).
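The core idea is easy to sketch. The snippet below (an illustration, not the speaker's implementation) runs a pseudo-marginal Metropolis-Hastings chain for a toy Gaussian latent variable model, replacing the intractable likelihood with an unbiased importance-sampling estimate; the toy model is chosen so the exact posterior is known.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy latent variable model: x ~ N(theta, 1), y | x ~ N(x, 1).
y_obs = 1.5

def loglik_hat(theta, n_mc=100):
    """Unbiased Monte Carlo estimate of the likelihood p(y | theta),
    averaging p(y | x_i) over draws x_i ~ N(theta, 1)."""
    x = rng.normal(theta, 1.0, size=n_mc)
    w = np.exp(-0.5 * (y_obs - x) ** 2) / np.sqrt(2 * np.pi)
    return np.log(np.mean(w))

def log_prior(theta):
    return -0.5 * theta ** 2  # N(0, 1) prior, up to a constant

# Pseudo-marginal Metropolis-Hastings: the likelihood *estimate* for the
# current state is recycled, which keeps the exact posterior invariant.
theta, ll = 0.0, loglik_hat(0.0)
draws = []
for _ in range(5000):
    prop = theta + rng.normal(0, 0.5)
    ll_prop = loglik_hat(prop)
    if np.log(rng.uniform()) < ll_prop + log_prior(prop) - ll - log_prior(theta):
        theta, ll = prop, ll_prop
    draws.append(theta)

# The exact posterior here is N(y/3, 2/3), so the chain mean should be
# close to y_obs / 3 = 0.5.
print(np.mean(draws[1000:]))
```

The key design point, and the reason the method targets the exact posterior, is that the stored estimate `ll` is not refreshed at the current state between iterations.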

TBA

TBA

### 14.45-17.00: Poster session

See the poster overview for titles and abstracts.

### 17.00-18.00: Topical lecture

Marloes Maathuis
Seminar für Statistik, ETH Zürich, Switzerland
The role of causal modelling in statistics

Causal questions are fundamental in all parts of science. Answering such questions from non-experimental data is notoriously difficult, but there has been a lot of recent interest and progress in this field. I will explain the fundamentals of causal modelling and outline its potential and its limitations. The concepts will be illustrated by several examples.

## Tuesday

### 9.00-10.00: Survey A2

Arnaud Doucet
Department of Statistics, Oxford University, United Kingdom
Pseudo-marginal methods for inference in latent variable models

For complex latent variable models, the likelihood function of the parameters of interest cannot be evaluated pointwise. In this context, standard Markov chain Monte Carlo (MCMC) strategies used to perform Bayesian inference can be very inefficient. Pseudo-marginal methods are an alternative class of MCMC methods which rely on an unbiased estimator of the likelihood. These techniques have become popular over the past 5-10 years and have found numerous applications in fields as diverse as econometrics, genetics and machine learning.

In the first part of the talk, I will review the standard pseudo-marginal method, present some applications and provide useful guidelines on how to optimize the performance of the algorithm. In the second part of the talk, I will introduce new pseudo-marginal algorithms which rely on novel low variance Monte Carlo estimators of likelihood ratios. The efficiency of computations is increased relative to the standard pseudo-marginal algorithm by several orders of magnitude.

This is joint work with George Deligiannidis (Oxford) and Michael K. Pitt (King's College).

### 10.15-11.15: Bayesian Statistics (I1)

Organizer: Jukka Corander

Department of Mathematics, KTH, Stockholm, Sweden

We give a new prior distribution over directed acyclic graphs intended for structured Bayesian networks, where the structure is given by an ordered block model. That is, the nodes of the graph are objects which fall into categories or blocks; the blocks have a natural ordering or ranking. The presence of a relationship between two objects is denoted by a directed edge, from the object in the category of lower rank to the object in the category of higher rank. Hoppe's urn scheme is invoked to generate a random block scheme.

The prior in its simplest form has three parameters that control the sparsity of the graph in two ways: implicitly, in terms of the maximal directed path, and explicitly, by controlling the edge probabilities.

We consider the situation where the nodes of the graph represent random variables, whose joint probability distribution factorizes along the DAG.

We use a minimal layering of the DAG to express the prior. We describe Monte Carlo schemes, with a generative mechanism similar to the one used for the prior, for finding the optimal a posteriori structure given a data matrix.

This is joint work with John M. Noble and Felix Rios.
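A toy generative sketch of this construction (illustrative only; the edge-probability mechanism below is a simplification of the actual prior): Hoppe's urn partitions the nodes into ranked blocks, and directed edges are then drawn only from lower-ranked to higher-ranked blocks, so the resulting graph is acyclic by construction.

```python
import numpy as np

rng = np.random.default_rng(7)

def hoppe_blocks(n, alpha):
    """Partition n nodes into blocks via Hoppe's urn (Chinese-restaurant
    style): a new block opens with probability alpha / (alpha + i)."""
    blocks = []
    for i in range(n):
        if not blocks or rng.uniform() < alpha / (alpha + i):
            blocks.append([i])          # new block; creation order = rank
        else:
            sizes = np.array([len(b) for b in blocks], dtype=float)
            k = rng.choice(len(blocks), p=sizes / sizes.sum())
            blocks[k].append(i)
    return blocks

def random_ordered_dag(blocks, p_edge):
    """Directed edges only from lower-ranked to higher-ranked blocks, each
    present independently with probability p_edge (a simplification of the
    explicit edge-probability part of the prior)."""
    edges = []
    for r, low in enumerate(blocks):
        for high in blocks[r + 1:]:
            for u in low:
                for v in high:
                    if rng.uniform() < p_edge:
                        edges.append((u, v))
    return edges

blocks = hoppe_blocks(10, alpha=1.5)
edges = random_ordered_dag(blocks, p_edge=0.3)
print(len(blocks), len(edges))
```

Because every edge respects the block ranking, acyclicity never needs to be checked separately, which is one of the attractions of ordered block models as DAG priors.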

Department of Mathematical Sciences, Norwegian University of Science and Technology, Norway

Discrete Markov random fields (MRFs) defined on a rectangular lattice are frequently used as prior distributions in image analysis applications. A few articles, for example the early Heikkinen and Högmander (1994) and Higdon et al. (1997), and the more recent Friel et al. (2009) and Everitt (2012), have also considered a corresponding fully Bayesian situation by assigning a hyper-prior to the parameters of the discrete MRF. However, in these articles a fixed first-order neighbourhood and a fixed parametric form for the MRF are assumed.

In this presentation we limit the attention to binary MRFs and discuss the fully Bayesian setting introduced in Arnesen and Tjelmeland (2016). We assign prior distribution to all parts of the MRF specification. In particular we define priors for the neighbourhood structure of the MRF, what interactions to include in the model, and for the parameter values. We consider two parametric forms for the energy function of the MRF, one where the parameters represent interaction strengths and one where the parameters are potential values. Both parameterisations have important advantages and disadvantages, and to combine the advantages of both formulations our final prior formulation is based on both parametrisations. The prior for the neighbourhood and what interactions to include in the MRF is based on the parameterisation using interaction strengths, whereas the prior for the parameter values is based on the parameterisation where the parameters are potential values.

We define a reversible jump Markov chain Monte Carlo (RJMCMC) procedure to simulate from the corresponding posterior distribution when conditioned to an observed scene. Thereby we are able to learn both the neighbourhood structure and the parametric form of the MRF from the observed scene. In particular we learn whether a pairwise interaction model is sufficient to model the scene of interest, or whether a higher-order interaction model is preferable. We circumvent evaluations of the intractable normalising constant of the MRF when running the RJMCMC algorithm by adopting a previously defined approximate auxiliary variable algorithm. We demonstrate the usefulness of our prior in two simulation examples and one real data example.

References
Arnesen and Tjelmeland (2016). Prior specification of neighbourhood and interaction structure in binary Markov random fields, Statistics and Computing.
Everitt, R. G. (2012). Bayesian parameter estimation for latent Markov random fields and social networks, Journal of Computational and Graphical Statistics, 21, 940–960.
Friel, N. and Rue, H. (2007). Recursive computing and simulation-free inference for general factorizable models, Biometrika, 94, 661–672.
Heikkinen, J. and Högmander, H. (1994). Fully Bayesian approach to image restoration with an application in biogeography, Applied Statistics, 43, 569–582.
Higdon, D. M., Bowsher, J. E., Johnsen, V. E., Turkington, T. G., Gilland, D. R. and Jaszczak, R. J. (1997). Fully Bayesian estimation of Gibbs hyperparameters for emission computed tomography data, IEEE Transactions on Medical Imaging, 16, 516–526.

Department of Mathematics and Statistics, University of Helsinki, Finland

Likelihood-free inference, ABC and synthetic likelihood have recently been popularized as techniques for inferring parameters in intractable simulator-based models. In this talk we consider how various machine learning methods can provide ways to both speed-up the computation and to quantify the approximate likelihood in a consistent manner. Examples from ecology and infectious disease epidemiology are used to illustrate the use of machine learning in applications.
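As background, the simplest likelihood-free method that such machine-learning approaches accelerate is ABC rejection sampling, sketched here for a toy Gaussian simulator where the answer is known (model, summary statistic and tolerance are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulator-based model: we can sample data given theta but treat the
# likelihood as intractable. Toy example: y_i ~ N(theta, 1), n = 50.
theta_true = 2.0
y_obs = rng.normal(theta_true, 1.0, size=50)
s_obs = y_obs.mean()                     # summary statistic

def simulate(theta):
    return rng.normal(theta, 1.0, size=50).mean()

# ABC rejection: keep the prior draws whose simulated summary lands
# within a tolerance eps of the observed summary.
prior_draws = rng.uniform(-5, 5, size=20000)
eps = 0.1
accepted = np.array([t for t in prior_draws if abs(simulate(t) - s_obs) < eps])

print(len(accepted), accepted.mean())
```

The inefficiency is visible immediately: most of the 20,000 simulations are wasted, which is exactly what surrogate-model and machine-learning approaches aim to avoid.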

### 10.15-11.15: Stochastic Processes (I2)

Organizer: Mark Podolskij

Aarhus University, Denmark

We introduce a simulation scheme for a large class of rough processes called Brownian semistationary processes. The scheme is based on discretizing the stochastic integral representation of the process in the time domain. We assume that the kernel function of the process is regularly varying at zero. The novel feature of the scheme is to approximate the kernel function by a power function near zero and by a step function elsewhere. The resulting approximation of the process is a combination of Wiener integrals of the power function and a Riemann sum, which is why we call this method a hybrid scheme. The scheme leads to a substantial improvement of accuracy compared to the ordinary forward Riemann-sum scheme, while having the same computational complexity.

This is joint work with Asger Lunde and Mikko S. Pakkanen.

Department of Mathematical Sciences, University of Copenhagen, Denmark

We consider multi-class systems of interacting nonlinear Hawkes processes (Hawkes, 1971) modeling several large families of neurons and study their mean field limits. As the total number of neurons goes to infinity we prove that the evolution within each class can be described by a nonlinear limit differential equation driven by a Poisson random measure, and state associated central limit theorems. We study situations in which the limit system exhibits oscillatory behavior, and relate the results to certain piecewise deterministic Markov processes and their diffusion approximations.

The motivation for this paper comes from the rhythmic scratch-like network activity in the turtle, induced by a mechanical stimulus, and recorded and analyzed by Berg and co-workers (Berg et al., 2007). Oscillations in a spinal motoneuron are initiated by the sensory input, and continue through internal mechanisms for some time after the stimulus is terminated. While mechanisms of rapid processing are well documented in sensory systems, rhythm-generating motor circuits in the spinal cord are poorly understood. The activation leads to an intense synaptic bombardment of both excitatory and inhibitory input, and it is of interest to characterize such network activity and to build models which can generate self-sustained oscillations. Generally, biological rhythms are ubiquitous in living organisms. The brain controls and helps maintain the internal clock for many of these rhythms, and fundamental questions are how they arise and what their purpose is. Many examples of such biological oscillators can be found in the classical book by Glass and Mackey (1988).

The talk is based on the paper Ditlevsen and Löcherbach (2016).

References
Hawkes, A. G. (1971), Spectra of Some Self-Exciting and Mutually Exciting Point Processes. Biometrika, 58, 83-90.
Berg, R.W., Alaburda, A., Hounsgaard, J. (2007). Balanced Inhibition and Excitation Drive Spike Activity in Spinal Half-Centers. Science 315, 390-393.
Glass, L., Mackey, M.C. (1988). From Clocks to Chaos: The Rhythms of Life. Princeton University Press.
Ditlevsen, S., Löcherbach, E. (2016). Multi-class oscillating systems of interacting neurons.
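For readers unfamiliar with the building block: a univariate Hawkes process with exponential kernel can be simulated by Ogata's thinning algorithm. The sketch below (illustrative, not from the paper; all parameter values are made up) exploits the Markovian recursion available for exponential kernels.

```python
import numpy as np

rng = np.random.default_rng(6)

def simulate_hawkes(mu, alpha, beta, T):
    """Univariate Hawkes process with intensity
    lam(t) = mu + alpha * sum_i exp(-beta * (t - t_i)),
    simulated by thinning. For the exponential kernel the excitation
    term eta decays deterministically between candidate points, so the
    current intensity bounds all future intensities until the next jump."""
    t, eta, events = 0.0, 0.0, []
    while True:
        lam_star = mu + eta                  # dominating (current) intensity
        w = rng.exponential(1 / lam_star)    # candidate waiting time
        t += w
        eta *= np.exp(-beta * w)             # excitation decays over the gap
        if t > T:
            break
        if rng.uniform() < (mu + eta) / lam_star:
            events.append(t)
            eta += alpha                     # self-excitation jump
    return np.array(events)

ev = simulate_hawkes(mu=1.0, alpha=0.8, beta=2.0, T=500.0)
# For a stationary Hawkes process the mean intensity is
# mu / (1 - alpha/beta), here 1 / 0.6.
print(len(ev) / 500.0)
```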

Department of Mathematics, Aarhus University, Denmark

In this talk we present some new limit theorems for power variation of stationary increments Lévy driven moving averages. In this infill sampling setting, the asymptotic theory gives very surprising results, which (partially) have no counterpart in the theory of discrete moving averages. More specifically, we will show that the first order limit theorems and the mode of convergence strongly depend on the interplay between the given order of the increments, the considered power, the Blumenthal-Getoor index of the driving pure jump Lévy process and the behaviour of the kernel function near zero. First order asymptotic theory essentially comprises three cases: stable convergence towards a certain infinitely divisible distribution, an ergodic type limit theorem and convergence in probability towards an integrated random process. We also prove the second order limit theorem connected to the ergodic type result.
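As a reminder of the object under study, the $$p$$-th power variation built from $$k$$-th order increments is $$V(p)_n = \sum_i |\Delta_i^k X|^p$$. The snippet below is purely illustrative: it uses Brownian motion rather than a Lévy-driven moving average, so that the ergodic-type limit ($$n^{p/2-1} V(p)_n \to m_p = \mathbb{E}|N(0,1)|^p$$) is available in closed form for a numerical check.

```python
import numpy as np
from math import gamma, sqrt, pi

rng = np.random.default_rng(3)

def power_variation(x, p, k=1):
    """p-th power variation based on k-th order increments of the path x."""
    dx = np.diff(x, n=k)
    return np.sum(np.abs(dx) ** p)

# Infill sanity check: Brownian motion on [0, 1] observed at n points.
n, p = 100000, 2.0
bm = np.cumsum(rng.normal(0, 1 / np.sqrt(n), size=n))

m_p = 2 ** (p / 2) * gamma((p + 1) / 2) / sqrt(pi)   # E|N(0,1)|^p
stat = n ** (p / 2 - 1) * power_variation(bm, p)
print(stat, m_p)
```

For Lévy-driven moving averages the normalization and the limit depend on the interplay of quantities listed in the abstract, which is precisely what makes the theory delicate.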

### 11.30-12.30: Keynote Lecture

Richard Samworth
Statistical Laboratory, University of Cambridge, United Kingdom
Random projection ensemble classification

We introduce a very general method for high-dimensional classification, based on careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower-dimensional space. In one special case that we study in detail, the random projections are divided into non-overlapping blocks, and within each block we select the projection yielding the smallest estimate of the test error. Our random projection ensemble classifier then aggregates the results of applying the base classifier on the selected projections, with a data-driven voting threshold to determine the final assignment. Our theoretical results elucidate the effect on performance of increasing the number of projections. Moreover, under a boundary condition implied by the sufficient dimension reduction assumption, we show that the test excess risk of the random projection ensemble classifier can be controlled by terms that do not depend on the original data dimension. The classifier is also compared empirically with several other popular high-dimensional classifiers via an extensive simulation study, which reveals its excellent finite-sample performance.
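A stripped-down sketch of the block-selection-and-voting scheme (with a nearest-centroid classifier standing in for the arbitrary base classifier, a validation split standing in for the test-error estimate, and all dimensions, block sizes and the voting threshold chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy high-dimensional two-class data: signal in the first 3 coordinates.
n, d, d_low = 200, 100, 3
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, :3] += 3.0 * y[:, None]

def centroid_classifier(Xtr, ytr):
    """Base classifier: assign to the nearest class centroid."""
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    return lambda Z: (np.linalg.norm(Z - c1, axis=1)
                      < np.linalg.norm(Z - c0, axis=1)).astype(int)

def best_projection_in_block(Xtr, ytr, Xval, yval, block_size):
    """Within a block of random projections, keep the one whose projected
    base classifier has the smallest estimated test error."""
    best, best_err = None, np.inf
    for _ in range(block_size):
        A = rng.normal(size=(d, d_low)) / np.sqrt(d)
        clf = centroid_classifier(Xtr @ A, ytr)
        err = np.mean(clf(Xval @ A) != yval)
        if err < best_err:
            best, best_err = A, err
    return best

# Split the data, select one projection per block, aggregate the votes.
tr, val, te = np.split(rng.permutation(n), [100, 150])
n_blocks, block_size = 20, 10
votes = np.zeros(len(te))
for _ in range(n_blocks):
    A = best_projection_in_block(X[tr], y[tr], X[val], y[val], block_size)
    clf = centroid_classifier(X[tr] @ A, y[tr])
    votes += clf(X[te] @ A)

y_hat = (votes / n_blocks > 0.5).astype(int)   # simple 1/2 voting threshold
print(np.mean(y_hat == y[te]))
```

Note that in the talk's method the voting threshold is data-driven rather than fixed at 1/2, and the theory concerns the test excess risk of the aggregated classifier.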

### 13.30-14.30: Forensic Statistics (I3)

Organizer: Anders Nordgaard

School of Criminal Justice, The University of Lausanne; Department of Economics, Ca’ Foscari University of Venice

In forensic science, statistical methods are widely used for assessing the probative value of scientific evidence. The evaluation of measurements on characteristics associated with trace evidence is performed through the derivation of a Bayes factor, a rigorous concept that provides a balanced measure of the degree to which evidence is capable of discriminating among competing propositions that are suggested by opposing parties at trial. The assessment of a Bayes factor may be a demanding task, essentially because of the complexity of the scenario at hand and the possibly limited information at the forensic scientist's disposal. Moreover, forensic laboratories frequently have access to equipment which can readily provide scientific evidence in the form of multivariate data, and available databases may be characterised by a complex dependence structure with several levels of variation and a large number of variables. One of the criticisms levelled against the use of multivariate techniques in forensic science is the lack of background data from which to estimate parameters, and several attempts have been proposed to achieve a dimensionality reduction. Clearly, any statistical methodology which leads to a reduction of the multivariate structure to fewer or even only one dimension needs careful justification in order to avoid the challenge of suppression of evidence. Bayesian multilevel models for the evaluation of multivariate measurements on characteristics associated with questioned material that are capable of dealing with such constraints (e.g., correlation between variables and multiple sources of variation) may be proposed in various forensic domains. Numerical procedures may be implemented to handle the complexity and to compute the marginal likelihoods under competing propositions.
This, along with the acknowledgement of subjective evaluations that are unavoidably involved in the Bayes factor assignment, has given rise to a considerable debate in the forensic community about its admissibility at trial. These ideas will be illustrated with reference to handwriting examination, a forensic discipline that nowadays attracts considerable attention due to its uncertain status under new admissibility standards.

Norwegian University of Life Sciences, Ås, Norway

In forensic genetics one commonly encounters biological stains that contain DNA from several individuals, so-called mixture DNA profiles, where the goal is to identify the individual contributors. One example is a rape case where an evidence sample shows a mixture of the victim and the perpetrator. A further complication is when the individuals in the mixture are also related, because they are likely to have a more similar individual DNA profile than unrelated individuals. Disregarding this relationship may lead to an overestimation of the evidence against the suspect. In other cases one may wish to determine the relationship between individuals based on a DNA mixture. An example is prenatal paternity testing, where the father of a child is determined based on a blood sample from the mother that reveals a DNA mixture of mother and child. We will look at how the weight of evidence can be estimated for DNA mixtures with related contributors. There is a long tradition of statistics in (forensic) genetics and the talk will present and discuss statistical models for the mentioned applications. Stochastic models are becoming increasingly relevant as methods are getting more sensitive and distinguishing noise from signal more challenging.

Swedish Police Authority and National Forensic Centre & Colin Aitken, School of Mathematics, University of Edinburgh

The percentage of the narcotic substance in a drug seizure may vary a lot depending on when and from whom the seizure was taken. Seizures from a typical consumer would in general show low percentages, while seizures from the early stages of a drug dealing chain would show higher percentages (the substance is diluted at later stages). Historical records from the determination of the percentage of narcotic substance in seized drugs reveal that the mean percentage but also the variation of the percentage can differ substantially between years. Some drugs show close to monotonic trends while others are more irregular in the temporal variation. Legal fact finders must have an up-to-date picture of what is an expected level of the percentage and what levels are to be treated as unusually low or unusually high. This is important for the determination of the sentences to be given in a drug case. In this work we treat the probability distribution of the percentage of a narcotic substance in a seizure from year to year as a time series of functions. The functions are probability density functions of beta distributions, which are successively updated with the use of point mass posteriors for the shape parameters. The predictive distribution for a new year is a weighted sum of beta distributions for the previous years where the weights are found from forward validation.
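The flavour of the construction can be sketched as follows (synthetic data; method-of-moments fits stand in for the point-mass posteriors, and the forward-validation weights are simplified to log-score weights on the most recent observed year):

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(11)

def beta_mom(x):
    """Method-of-moments beta fit, a simple stand-in for the point-mass
    posterior on the shape parameters."""
    m, v = x.mean(), x.var()
    c = m * (1 - m) / v - 1
    return m * c, (1 - m) * c

def beta_logpdf(x, a, b):
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + (a - 1) * np.log(x) + (b - 1) * np.log(1 - x))

# Synthetic yearly measurements of the narcotic percentage, with a trend.
yearly = [rng.beta(2 + 0.4 * t, 5, size=80) for t in range(6)]
fits = [beta_mom(x) for x in yearly[:-1]]

# Weight each fitted beta by its log-score on the most recent year
# (a simplified form of forward validation).
scores = np.array([np.mean(beta_logpdf(yearly[-2], a, b)) for a, b in fits])
w = np.exp(scores - scores.max())
w /= w.sum()

def predictive_logpdf(x):
    """Predictive density for a new year: weighted sum of past betas."""
    comps = np.array([beta_logpdf(x, a, b) for a, b in fits])
    return np.log(np.sum(w[:, None] * np.exp(comps), axis=0))

x_grid = np.linspace(0.01, 0.99, 99)
dens = np.exp(predictive_logpdf(x_grid))
print(np.sum(dens) * 0.01)  # crude integral of the predictive density
```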

### 13.30-14.30: Bayesian Computation (I4)

University of Iceland, Iceland

We introduce a new copula-based correction for generalized linear mixed models (GLMMs) within the integrated nested Laplace approximation (INLA) approach for approximate Bayesian inference for latent Gaussian models. While INLA is usually very accurate, some (rather extreme) cases of GLMMs with e.g. binomial or Poisson data have been seen to be problematic. Inaccuracies can occur when there is a very low degree of smoothing or “borrowing strength” within the model, and we have therefore developed a correction aiming to push the boundaries of the applicability of INLA. Our new correction has been implemented as part of the R-INLA package, and adds only negligible computational cost. Empirical evaluations on both real and simulated data indicate that the method works well.

This is joint work with Håvard Rue (NTNU).

University of Iceland, Iceland

Latent Gaussian models (LGMs) form a flexible subclass of Bayesian hierarchical models and have become popular in many areas of statistics and various fields of applications, as LGMs are both practical and readily interpretable. Although LGMs are well suited from a statistical modeling point of view, their posterior inference becomes computationally challenging when latent models are desired for more than just the mean structure of the data density function, or when the number of parameters associated with the latent model increases.

We propose a novel computationally efficient Markov chain Monte Carlo (MCMC) scheme, which we refer to as the MCMC split sampler, that serves to address these computational issues. The sampling scheme is designed to handle LGMs where latent models are imposed on more than just the mean structure of the likelihood; to scale well in terms of computational efficiency when the dimensions of the latent models increase; and to be applicable for any choice of a parametric data density function. The main novelty of the MCMC split sampler lies in how the model parameters of a LGM are split into two blocks, such that one of the blocks exploits the latent Gaussian structure in a natural way and becomes invariant of the data density function.

This is joint work with Birgir Hrafnkelsson (University of Iceland), Helgi Sigurðarson (University of Iceland) and Daniel Simpson (University of Bath).

University of Stavanger, Norway

We consider Particle Gibbs (PG) as a tool for Bayesian analysis of non-linear non-Gaussian state-space models. PG is a Monte Carlo (MC) approximation of the standard Gibbs procedure which uses sequential MC (SMC) importance sampling inside the Gibbs procedure to update the latent and potentially high-dimensional state trajectories. We propose to combine PG with a generic and easily implementable SMC approach known as Particle Efficient Importance Sampling (PEIS). By using SMC importance sampling densities which are closely globally adapted to the targeted density of the states, PEIS can substantially improve the mixing and the efficiency of the PG draws from the posterior of the states and the parameters relative to existing PG implementations. The efficiency gains achieved by PEIS are illustrated in PG applications to a stochastic volatility model for asset returns and a Gaussian nonlinear local level model for interest rates.

This is joint work with Oliver Grothe (Karlsruhe Institute of Technology) and Roman Liesenfeld (University of Cologne).

## Wednesday

### 9.00-10.00: SJS Lecture

Jonathan Taylor
Department of Statistics, Stanford University, USA
Selective inference in linear regression

We consider inference after model selection in linear regression problems, specifically after fitting the LASSO (Lee et al.). A classical approach to this problem is data splitting, using some randomly chosen portion of the data to choose the model and the remaining data for inference in the form of confidence intervals and hypothesis tests. Viewing this problem in the framework of selective inference of (Fithian et al.), we describe a few other randomized algorithms with similar guarantees to data splitting, at least in the parametric setting (Tian and Taylor). Time permitting, we describe analogous results from (Tian and Taylor) for arbitrary statistical functionals obeying a CLT in the classical fixed dimensional setting and inference after choosing a tuning parameter by cross-validation.

References
Lee et al. Exact post-selection inference, with application to the LASSO.
Fithian et al. Optimal Inference After Model Selection.
Tian and Taylor. Selective inference with a randomized response.
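The classical data-splitting baseline described in the abstract is easy to sketch (correlation screening stands in for the lasso selection step, the number of selected variables is fixed for simplicity, and all dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy sparse regression: p = 50 predictors, 3 truly active.
n, p = 400, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = 1.0
y = X @ beta + rng.normal(size=n)

# Data splitting: select the model on the first half of the data...
half = n // 2
sel_scores = np.abs(X[:half].T @ y[:half])
selected = np.argsort(sel_scores)[-3:]   # screening stand-in for the lasso

# ...then do classical OLS inference on the untouched second half.
Xs, ys = X[half:, selected], y[half:]
bhat, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
resid = ys - Xs @ bhat
s2 = resid @ resid / (half - len(selected))
cov = s2 * np.linalg.inv(Xs.T @ Xs)
se = np.sqrt(np.diag(cov))
ci = np.stack([bhat - 1.96 * se, bhat + 1.96 * se], axis=1)
print(sorted(int(i) for i in selected), ci.round(2))
```

Because the second half never touched the selection step, the confidence intervals have their nominal interpretation; the randomized algorithms discussed in the talk aim for similar guarantees while using the data more efficiently.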

### 10.15-11.15: Current trends in observational studies in epidemiology (I5)

Organizer: Niels Richard Hansen

Section of Biostatistics, University of Copenhagen, Denmark

Low front-end cost and rapid accrual make web-based surveys and enrollment in studies attractive. Participants are often self-selected with little reference to a well-defined study base. Of course, high quality studies must be internally valid (validity of inferences for the sample at hand), but web-based sampling reactivates discussion of the nature and importance of external validity (generalization of within-study inferences to a target population or context) in epidemiology. A classical epidemiological approach would emphasize representativity, usually conditional on important confounders. An alternative view held by influential epidemiologists claims that representativity (in a narrow sense) is irrelevant for the scientific nature of epidemiology. Against this background, it is a good time for statisticians to take stock of our role and position regarding surveys and observational research in epidemiology. The central issue is whether conditional effects in the study population may be transported to desired target populations. This will depend on the compatibility of causal structures in study and target populations, and will require subject matter considerations in each concrete case. Statisticians, epidemiologists and survey researchers should work together to develop increased understanding of these challenges and improved tools to handle them.

References
Keiding, N. & Louis, T.A. (2016). Perils and potentials of self-selected entry to epidemiological studies and surveys (with discussion). J.Roy.Statist.Soc. A 179, 319-376.

Section of Biostatistics, University of Copenhagen, Denmark

Martinussen et al. (2015) used semiparametric structural cumulative failure time model and instrumental variables (IV) to estimate causal exposure effects for survival data. They impose no restrictions on the type of the instrument nor on the exposure variable. Furthermore their method allows for nonparametric estimation of possible time changing exposure effect. In this work we extend the methods of Martinussen et al. (2015) to handle competing risk data. Such data are very common in practice when studying the timing of initiation of a specific disease since death will often be a competing event. Also when studying death due to a specific cause, such as death from breast cancer as was of interest in the HIP-study, death from any other cause is a competing event. The HIP- study comprises approximately 60000 women and in the first 10 years of follow-up there are 4221 deaths, but only 340 were deemed due to breast cancer. Hence, competing risks is a major issue in these data. Due to non-compliance it is not straightforward to estimate the screening effect. Randomization can, however, be used as an IV and, hence, for these data it is pertinent to have IV-methods for competing risk data to learn about the causal effect of breast cancer screening on the risk of dying from breast cancer.

This is joint work with Stijn Vansteelandt (Ghent University).

References
Martinussen, T., ... (2015).

### 10.15-11.15: Spatial Statistics (I6)

Organizer: David Bolin

University of Bath, United Kingdom

The EUSTACE project will give publicly available daily estimates of surface air temperature since 1850 across the globe for the first time by combining surface and satellite data using novel statistical techniques. Designing and estimating a stochastic model that can realistically capture the multiscale statistical behaviour of air temperature across a wide range of time-scales is not only a modelling challenge, but also a computational challenge. Existing methods for spatial statistics need to be scaled up to handle a large quantity of non-Gaussian data, as well as to properly quantify the uncertainty of the temperature reconstructions in regions and time periods with small quantities of data.

Norwegian University of Science and Technology, Norway

Gaussian random fields (GRFs) are important building blocks in hierarchical models for spatial data, but their parameters typically cannot be consistently estimated under in-fill asymptotics. Even for stationary Matérn GRFs, the posteriors for range and marginal variance do not contract, and for non-stationary models there is a high risk of overfitting the data. Despite this, there is no practically useful, principled approach for selecting the prior on their parameters, and the prior typically must be chosen in an ad-hoc manner.

We propose to construct priors such that simpler models are preferred, i.e. shrinking stationary GRFs towards infinite range and no effect, and shrinking non-stationary GRFs towards stationary GRFs. We use the recent Penalised Complexity prior framework to construct a practically useful, tunable, weakly informative joint prior on the range and the marginal variance for a Matérn GRF with fixed smoothness, and then extend the prior to non-stationary GRFs controlled by covariates in the covariance structure. We apply the priors to a dataset of annual precipitation in southern Norway and show that the scheme for selecting the hyperparameters of the non-stationary extension leads to improved predictive performance over the stationary model.
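For a Matérn GRF with fixed smoothness in dimension $$d$$, the joint PC prior on range $$\rho$$ and marginal standard deviation $$\sigma$$ has a simple closed form, calibrated through tail probabilities $$P(\rho < \rho_0) = \alpha_1$$ and $$P(\sigma > \sigma_0) = \alpha_2$$. The sketch below (all numbers illustrative) verifies the calibration by simulation, using the fact that $$\rho^{-d/2}$$ is exponentially distributed under this prior.

```python
import numpy as np

# PC prior for the range rho and marginal sd sigma of a Matérn GRF
# (illustrative sketch of the tail-probability calibration).
d = 2                       # spatial dimension
rho0, a1 = 10.0, 0.05       # tuned so that P(rho < 10) = 0.05
sig0, a2 = 3.0, 0.05        # tuned so that P(sigma > 3) = 0.05

lam1 = -np.log(a1) * rho0 ** (d / 2)
lam2 = -np.log(a2) / sig0

def pc_logprior(rho, sigma):
    """Joint PC log-density: rho^{-d/2} is Exp(lam1), sigma is Exp(lam2)."""
    return (np.log(d / 2) + np.log(lam1) - (d / 2 + 1) * np.log(rho)
            - lam1 * rho ** (-d / 2) + np.log(lam2) - lam2 * sigma)

# Check the calibration by sampling: u ~ Exp(lam1) gives rho = u^(-2/d).
rng = np.random.default_rng(2)
rho = rng.exponential(1 / lam1, size=200000) ** (-2 / d)
sigma = rng.exponential(1 / lam2, size=200000)
print(np.mean(rho < rho0), np.mean(sigma > sig0))
```

The two printed frequencies should sit near the chosen tail probabilities of 0.05, which is what makes the prior easy to tune in applied work.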

University of Gothenburg, Sweden

Developing models for multivariate spatial data has been an active research area in recent years. However, most research has focused on Gaussian models and there are few practically useful methods for multivariate non-Gaussian geostatistical data. We present a new class of multivariate Matérn random fields, constructed as solutions to systems of stochastic partial differential equations driven by generalized hyperbolic noise. The fields have flexible marginal distributions and are suitable for problems where Gaussianity cannot be assumed for one or more of the dimensions in the data. The model parameters can be estimated efficiently using a likelihood-based method, also when the fields are incorporated in a geostatistical setting with irregularly spaced observations, measurement errors, and covariates. Finally, a comparison with standard Gaussian models is presented for an application to precipitation data.

### 11.30-12.30: Causal Inference (I7)

Organizer: Kjetil Røysland

Department of Biostatistics, University of Oslo, Norway

Multi-state models, as an extension of traditional models in survival analysis, have proved to be a flexible framework for analysing transitions between various states of sickness absence and work over time using data from national registries. A main aim in sickness absence research is to identify the effects of possible interventions, e.g. with respect to increased work participation. When data on the important confounders are available, either through the same registry data or through linkage with other data sources, we have suggested using methods based on inverse probability weighting or g-computation for identifying such effects.

G-computation, or applying the G-formula, is a very flexible approach for identifying marginal treatment effects in multi-state models. It is closely related to traditional ways of making inference from these types of models, but can also be extended to cover a wide set of exposures and confounding situations, such as more intricate treatment regimes and time-dependent confounding. In this talk we will discuss two ways of performing G-computation in multi-state models; one based on intervening on transition intensities and one based on intervening on additional covariates, and how these can be equivalent given certain specifications of the multi-state models. We will discuss pros and cons of applying the G-formula in multi-state models and how the simple implementations can be extended to address more advanced causal questions.

The methods will be illustrated using Norwegian population-wide registry data on sickness absence, disability and work participation, coupled with data from other registries.
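To make the g-computation idea concrete, the sketch below simulates a hypothetical three-state model (work, sickness absence, disability) under its observed transition intensities and under an intervened set of intensities, and compares state occupation probabilities at a fixed horizon. All states, rates and the intervention are invented for illustration and are not taken from the talk.

```python
import random

# Hypothetical constant transition intensities (per month) for a three-state
# model: 0 = work, 1 = sickness absence, 2 = disability (absorbing).
RATES = {0: {1: 0.10, 2: 0.01}, 1: {0: 0.30, 2: 0.05}, 2: {}}

def simulate_state_at(rates, horizon, rng):
    """Simulate one trajectory from state 0; return the state at the horizon."""
    t, state = 0.0, 0
    while True:
        total = sum(rates[state].values())
        if total == 0.0:
            return state                       # absorbing state reached
        t += rng.expovariate(total)
        if t >= horizon:
            return state
        u, acc = rng.random() * total, 0.0
        for nxt, r in rates[state].items():    # destination chosen proportional to rate
            acc += r
            if u < acc:
                state = nxt
                break

def occupation_probs(rates, horizon=24.0, n=20000, seed=1):
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(n):
        counts[simulate_state_at(rates, horizon, rng)] += 1
    return [c / n for c in counts]

natural = occupation_probs(RATES)
# Hypothetical intervention: halve the work -> sickness-absence intensity.
INTERVENED = {0: {1: 0.05, 2: 0.01}, 1: {0: 0.30, 2: 0.05}, 2: {}}
intervened = occupation_probs(INTERVENED)
```

Comparing `natural` and `intervened` gives the Monte Carlo version of the G-formula contrast for this toy intervention on a transition intensity.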

Department of Biostatistics, University of Oslo, Norway

A wide range of associations in medicine are claimed to be paradoxical. Many of these associations, however, may have plausible explanations. Causal diagrams are often used to argue that counter-intuitive associations are examples of selection bias. Such diagrams do not, in general, let us explore the direction and magnitude of the bias. For real-life analyses, a numerical evaluation of the bias may be essential.

By combining causal DAGs and quantitative frailty models, we improve the understanding of counter-intuitive associations in epidemiology. First, we consider treatments that are examined over time, and we point to a time-dependent Simpson's paradox. For example, we show that a treatment with constant effect can appear beneficial at time $$t=0$$, but harmful at $$t>0$$. Then, we explore a competing risks setting, where being at increased risk of one event may falsely reduce the risk of another event. Finally, we reveal spurious effects that appear in studies of a diseased population (index-event studies). In particular, analyses estimating the effect of a risk factor (e.g. obesity) on an outcome (e.g. mortality) in a population with a chronic disease (e.g. kidney failure) are prone to index-event bias.

Our examples show that frailty will lead to bias in common medical scenarios. It is important that applied researchers recognize these spurious associations. A numerical evaluation of the frailty bias will often be appropriate.
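A minimal worked example of the frailty mechanism, under assumed parameter values: with shared gamma frailty (mean 1, variance $$v$$) and a constant individual-level hazard ratio $$r$$, the observed marginal hazard ratio attenuates toward 1 over time because frail individuals are depleted faster in the higher-risk arm. (The crossings past 1 described in the talk arise in richer settings such as competing risks and index-event studies; this sketch only shows the basic selection effect.)

```python
import math  # not strictly needed here, kept for extensions with non-constant hazards

def marginal_hazard_ratio(t, r=0.5, v=1.0, h0=0.1):
    """Observed (marginal) hazard ratio at time t when the true individual-level
    hazard ratio is constant (r), both arms share gamma frailty with mean 1 and
    variance v, and the baseline cumulative hazard is h0 * t."""
    H = h0 * t
    return r * (1.0 + v * H) / (1.0 + v * r * H)

hr_start = marginal_hazard_ratio(0.0)    # equals the true ratio r = 0.5
hr_late = marginal_hazard_ratio(100.0)   # attenuated toward 1 by frailty selection
```

So even with a genuinely constant effect, a naive reading of the observed hazard ratio at late times would suggest the effect has faded.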

Department of Biostatistics, University of Oslo, Norway

Survival analysis has become one of the fundamental fields of biostatistics. Such analyses are almost always subject to censoring. This necessitates special statistical techniques and forces statisticians to think more in terms of stochastic processes. The theory of stochastic integrals and martingales has therefore been important for the development of such techniques.

Causal inference has lately had a huge impact on how statistical analyses based on non-experimental data are done. The idea is to use data from a non-experimental scenario that could be subject to several spurious effects and then fit a model that would govern the frequencies we would have seen in a related hypothetical scenario where the spurious effects are eliminated. This opens the door to using the Nordic health registries to answer new and more ambitious questions. However, there has been less focus on causal inference based on time-to-event data and survival analysis.

The now well-established theory of causal Bayesian networks is, for instance, not suitable for handling such processes. Motivated by causal inference for event-history data from the health registries, we introduce causal local independence models. We show that they offer a generalization of causal Bayesian networks that enables us to carry out causal inference from non-experimental data when continuous-time processes are involved.

The main purpose of this work is to provide new tools for determining the identifiability of causal effects in a dynamic context. We provide criteria based on local independence graphs for identifiability of causal effects. Typically, one can develop graphical criteria for when unmeasured processes disturb a statistical analysis, or when these can safely be ignored. This is done by combining previous work on local independence graphs and $$\delta$$-separation by Vanessa Didelez and previous work on causal inference for counting processes by Kjetil Røysland.

### 11.30-12.30: Big data in smart cities (I8)

Department of Applied Mathematics and Computer Science, DTU, Lyngby, Denmark

Volume, Velocity, Variety, Veracity. These terms come up whenever Big Data is discussed. Most participants at Nordstat, however, will not be unsettled by them; on the contrary, we are trained to tackle exactly these kinds of problems, so why the big fuss? During this talk I will try to convey experience gained through a sector development project aimed at a societal challenge. Other facets of Big Data pose challenges that call for competences and skills beyond those of statistics and data analysis.

Department of Applied Mathematics and Computer Science, DTU, Lyngby, Denmark

This talk will briefly present some of the results and methods obtained within the DSF (Danish Council for Strategic Research) Center for IT-Intelligent Energy Systems in cities (CITIES). Using big data analytics and methods for stochastic optimization, including stochastic control, the scientific objective of CITIES is to develop methodologies and ICT solutions for the analysis, operation and development of fully integrated urban energy systems. A holistic research approach will aim at providing solutions at all levels between the households and the global energy system, at all essential temporal and spatial scales. The societal objective is to identify and establish realistic pathways to ultimately achieving independence of fossil fuels, by harnessing the latent flexibility of the energy system through big data analytics, IT-intelligence, integration and planning.

### 13.30-14.30: Statistical Theory (I9)

Organizer: Nils Lid Hjort

Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim

The point of departure of the talk is an algorithm for sampling from conditional distributions given a sufficient statistic. In certain cases this can be done by a simple parameter adjustment of the original statistical model [1], but in general one needs to use a weighted sampling scheme [2]. The trick is to introduce a distribution on the parameter space, where the use of improper distributions is seen to have some advantages. As will be demonstrated, the approach is closely related to the problem of sampling from posterior distributions, and is also connected to fiducial inference [4, 5]. Particular emphasis will be given to the role of improper distributions, where a theoretical framework that includes improper laws will be briefly reviewed [3, 6].

This is joint work with Gunnar Taraldsen.

References
[1] Engen, S., & Lillegård, M. (1997). Stochastic simulations conditioned on sufficient statistics. Biometrika, 84(1), 235-240.
[2] Lindqvist, B. H., & Taraldsen, G. (2005). Monte Carlo conditioning on a sufficient statistic. Biometrika, 92(2), 451-464.
[3] Taraldsen, G., & Lindqvist, B. H. (2010). Improper priors are not improper. The American Statistician, 64(2), 154-158.
[4] Taraldsen, G., & Lindqvist, B. H. (2013). Fiducial theory and optimal inference. The Annals of Statistics, 41(1), 323-341.
[5] Taraldsen, G., & Lindqvist, B. H. (2015). Fiducial and posterior sampling. Communications in Statistics - Theory and Methods, 44(17), 3754-3767.
[6] Taraldsen, G., & Lindqvist, B. H. (2016). Conditional probability and improper priors. Communications in Statistics - Theory and Methods, (to appear).

Department of Mathematics, University of Oslo, Norway

I introduce a method for melding together the classic parametric likelihood with the nonparametric empirical likelihood, and work out theory for such schemes. The method is really a class of methods, as the statistician would need to choose both which extra parameters to include in the construction and a certain balance parameter that weighs parametrics against nonparametrics. I will show how the methodology of focused information criteria may be used to aid in these choices.

### 13.30-14.30: Functional Data Analysis (I2)

Organizer: Sara Sjöstedt de Luna

Department of Psychology, McGill University, Montreal, Canada

Discrete observations of curves are often smoothed by attaching a penalty to the error sum of squares, and the most popular penalty is the integrated squared second derivative of the function that fits the data. But it has been known since the earliest days of smoothing splines that, if the linear differential operator $$D^2$$ is replaced by a more general differential operator $$L$$ that annihilates most of the variation in the observed curves, then the resulting smooth has less bias and greatly reduced mean squared error.

This talk will show how we can use the data to estimate such a linear differential operator for a system of one or more variables. The differential equations estimated in this way represent the dynamics of the processes being estimated. This idea can also be used to estimate a forcing function that defines the output of a linear system; applied to handwriting data, it shows that both the static and dynamic aspects of handwriting are well represented by a surprisingly simple second-order differential equation.
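A toy version of this idea, with an invented signal: if observed curves follow $$x(t) = \sin(\omega t)$$, the coefficient $$\beta$$ in a hypothesized annihilating operator $$Lx = D^2x + \beta x$$ can be recovered by least squares on finite-difference second derivatives (here the true value is $$\omega^2 = 9$$). The talk's methodology (principal differential analysis) is of course far more general than this one-coefficient sketch.

```python
import math

# Sample x(t) = sin(omega * t) on a fine grid.
omega, dt = 3.0, 0.01
t = [i * dt for i in range(1001)]
x = [math.sin(omega * ti) for ti in t]

# Finite-difference second derivatives at the interior grid points.
d2 = [(x[i + 1] - 2 * x[i] + x[i - 1]) / dt**2 for i in range(1, len(x) - 1)]
xi = x[1:-1]

# Least-squares slope in the model d2 = -beta * x; beta should be close to 9.
beta = -sum(d * v for d, v in zip(d2, xi)) / sum(v * v for v in xi)
```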

MOX, Department of Mathematics, Politecnico di Milano, Italy

Object Oriented Spatial Statistics (O2S2) addresses a variety of application-oriented statistical challenges in which the atoms of the analysis are complex, spatially distributed data points. The object-oriented viewpoint takes the whole data point as the building block of the analysis, whether it is a curve, a distribution or a positive definite matrix, regardless of its complexity. When data are observed over a spatial domain, an extra layer of complexity derives from the size, the shape or the texture of the domain, posing a challenge related to the impossibility, both theoretical and practical, of employing approaches based on global models for capturing spatial dependence. A powerful non-parametric line of action is obtained by splitting the analysis along an arrangement of neighborhoods generated by a random decomposition of the spatial domain. The local analyses produce auxiliary new data points which are then aggregated to generate the global final result. I will illustrate these ideas with a few examples where the target analysis is dimensional reduction, classification or prediction.

The talk is based on discussions and work developed at MOX, Department of Mathematics, Politecnico di Milano, with Alessandra Menafoglio, Simone Vantini and Valeria Vitelli (the latter is now at the Oslo Center for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo) and with Konrad Abramowicz, Per Arnqvist and Sara Sjöstedt de Luna at the Department of Mathematics and Mathematical Statistics, Umeå University.

References
Abramowicz K., Arnqvist P., Secchi P., Sjöstedt de Luna S., Vantini S., Vitelli V. Clustering misaligned dependent curves - applied to varved lake sediment for climate reconstruction, Manuscript, 2015.
Menafoglio A. and Secchi P. Statistical analysis of complex and spatially dependent data: a review of Object Oriented Spatial Statistics, Manuscript, 2016.
Secchi, P., Vantini, S., and Vitelli, V. Bagging Voronoi classifiers for clustering spatial functional data. International Journal of Applied Earth Observation and Geoinformation, 22, 53-64, 2012.
Secchi, P., Vantini, S., and Vitelli, V. Analysis of spatio-temporal mobile phone data: a case study in the metropolitan area of Milan (with discussion). Statistical Methods and Applications, 24(2), 279-300, 2015.

### 14.45-15.45: Systems Biology (I11)

Organizer: Carsten Wiuf

School of Mathematics & Statistics, University of Newcastle, United Kingdom

Performing inference for the parameters governing the Markov jump process (MJP) representation of a stochastic kinetic model, using data that may be incomplete and subject to measurement error, is a challenging problem. Since transition probabilities are intractable for most processes of interest yet forward simulation is straightforward, fully Bayesian inference typically proceeds through a "likelihood-free" particle MCMC (pMCMC) scheme. In this talk, we describe a recently proposed approach that exploits the tractability of an approximation to the MJP to reduce the computational cost of the algorithm, whilst still targeting the correct posterior. We also demonstrate that it is possible to improve the statistical efficiency of the vanilla implementation by replacing draws from the forward simulator with those obtained from an approximation to the MJP, conditioned on the observations. We illustrate each approach using toy models of gene expression and predator-prey dynamics.
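The forward simulator that such likelihood-free schemes rely on can be sketched with Gillespie's algorithm for a toy birth-death model of gene expression; the rate constants here are invented for illustration.

```python
import random

def gillespie_birth_death(k_prod=10.0, k_deg=1.0, t_end=20.0, seed=11):
    """Exact (Gillespie) forward simulation of a toy gene-expression MJP:
    transcript production at constant rate k_prod, degradation at rate k_deg * x."""
    rng = random.Random(seed)
    t, x, path = 0.0, 0, [(0.0, 0)]
    while True:
        birth, death = k_prod, k_deg * x
        total = birth + death
        t += rng.expovariate(total)              # exponential waiting time to next reaction
        if t > t_end:
            break
        x += 1 if rng.random() * total < birth else -1
        path.append((t, x))
    return path

path = gillespie_birth_death()
final_copy_number = path[-1][1]                  # fluctuates around k_prod / k_deg
```

Within a pMCMC scheme, many such forward paths would be weighted against noisy observations to form an unbiased likelihood estimate.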

German Center for Neurodegenerative Diseases, Bonn, Germany

Networks play a central conceptual role in molecular biology. Molecular networks are often conceived of as encoding causal influences. Then, the task of estimating such networks from data is effectively one of causal discovery. In recent years there has been much innovative methodological work in this area. However, our understanding of empirical performance remains limited in important ways and it remains unclear whether estimation of causal molecular networks is really effective in practice, particularly in relatively complex biomedical settings. I will discuss some aspects of estimation of causal networks in molecular biology, with particular emphasis on the empirical assessment of candidate estimators.

### 14.45-15.45: Applied Probability (I12)

Center for Bioinformatics, Aarhus University, Denmark

In this talk I describe statistical methods and recent results for inferring variability in population size in humans and other species. Variability in population size has traditionally been inferred from the site frequency spectrum, but in the past few years methods based on complete genome sequences have been developed. These new methods are based on a Markov approximation along the sequences of the ancestral process with mutation and recombination. The approximation means that the ancestral process simplifies to a state space model, and we use particle filtering to carry out statistical inference. I also describe how to perform model checking, and draw connections between various mutation pattern summaries and methods from spatial statistics.

Department of Mathematical Sciences, University of Copenhagen, Denmark

New simple methods of simulating multivariate diffusion bridges are presented. Diffusion bridge simulation plays a fundamental role in simulation-based likelihood inference for stochastic differential equations. By a novel application of classical coupling methods, the new approach generalizes the one-dimensional bridge-simulation method proposed by Bladt and Sørensen (2014) to the multivariate setting. A method of simulating approximate, but often very accurate, diffusion bridges is proposed. These approximate bridges are used as proposals in easily implementable MCMC algorithms that produce exact diffusion bridges. The new method is more generally applicable than previous methods because it does not require the existence of a Lamperti transformation, which rarely exists for multivariate diffusions. Another advantage is that the new method works well for diffusion bridges over long intervals, because the computational complexity of the method is linear in the length of the interval. The lecture is based on joint work presented in Bladt, Finch and Sørensen (2016).


References
Bladt, M. and Sørensen, M. (2014). Simple simulation of diffusion bridges with application to likelihood inference for diffusions. Bernoulli, 20, 645-675.
Bladt, M., Finch, S. and Sørensen, M. (2016). Simulation of multivariate diffusion bridges. J. Roy. Statist. Soc. B, 78, 343-369.
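For Brownian motion, whose time reversal is again Brownian motion, the coupling idea can be caricatured as follows: run one path forward from the left endpoint, run an independent path from the right endpoint and reverse it in time, and splice the two at their first crossing. This is only a sketch under that special assumption; the actual method uses the time-reversed diffusion and an MCMC correction to obtain exact bridges.

```python
import math, random

def euler_path(x0, drift, sigma, T, n, rng):
    """Euler-Maruyama path of dX = drift(X) dt + sigma dW on [0, T]."""
    dt = T / n
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] + drift(xs[-1]) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return xs

def coupled_bridge(a, b, drift, sigma, T=1.0, n=200, seed=5, max_tries=1000):
    """Propose an approximate (a -> b) bridge: one path forward from a, one from b
    reversed in time, spliced at their first crossing."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        X = euler_path(a, drift, sigma, T, n, rng)
        Y = euler_path(b, drift, sigma, T, n, rng)[::-1]  # stand-in for the time reversal
        s0 = X[0] - Y[0]
        for i in range(n + 1):
            if s0 == 0.0 or (X[i] - Y[i]) * s0 <= 0.0:    # paths have crossed
                return X[:i] + Y[i:]
    return None                                           # no crossing found

bridge = coupled_bridge(0.0, 1.0, lambda x: 0.0, 1.0)     # Brownian-motion example
```

By construction the spliced path starts at `a` and ends exactly at `b`; in the real method such proposals are accepted or rejected so that exact bridges result.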

Department of Mathematical Sciences, University of Copenhagen, Denmark

In this talk we propose a class of genuinely heavy-tailed distributions which are mathematically tractable in the sense that we can obtain closed-form formulas or exact solutions in applications. The class, NPH, is based on infinite-dimensional phase-type distributions with finitely many parameters. Though the class of finite-dimensional phase-type distributions is dense in the class of distributions on the positive reals, and may hence approximate any such distribution, a finite-dimensional approximation will always be light-tailed. This may be a problem when the functionals of interest are tail dependent, such as a ruin probability.

A characteristic feature of distributions from NPH is that the formulas from finite-dimensional phase-type theory remain valid even in the infinite-dimensional setting. The numerical evaluation of the infinite-dimensional formulas, however, differs from the finite-dimensional theory, and we shall provide algorithms for the numerical calculation of certain functionals of interest, such as the renewal density and a ruin probability.

We present an example from risk theory where we compare ruin probabilities for a classical risk process with Pareto distributed claim sizes to the ones obtained by approximating the Pareto distribution by an infinite-dimensional hyper-exponential distribution.
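The light-tail point can be illustrated numerically: a finite hyperexponential (phase-type) tail is eventually negligible relative to a Pareto tail, no matter how well the mixture matches the body of the distribution. The parameter values below are invented for illustration.

```python
import math

def pareto_tail(x, alpha=2.0):
    """P(X > x) for a Pareto distribution with minimum 1 and tail index alpha."""
    return x ** (-alpha)

def hyperexp_tail(x, probs, rates):
    """P(X > x) for a finite hyperexponential mixture: always light-tailed."""
    return sum(p * math.exp(-r * x) for p, r in zip(probs, rates))

# A two-phase hyperexponential may track the Pareto body, but far in the tail
# its survival function is orders of magnitude too small.
ratio = hyperexp_tail(200.0, [0.5, 0.5], [0.1, 1.0]) / pareto_tail(200.0)
```

This is exactly why tail-dependent functionals such as ruin probabilities computed from a finite-dimensional approximation can be badly misleading.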

### 16.00-17.00: Survey B1

Holger Rootzén
Mathematical Sciences, Chalmers and Gothenburg University, Sweden
Extreme value statistics: from one dimension to many

Extreme value statistics helps protect us from devastating waves, floods, windstorms, and landslides. It is widely used for risk management in finance and insurance, and contributes to material science, bioinformatics, medicine, and traffic safety.

The first talk introduces the well-established and widely used statistical theory for extremes of one-dimensional variables. Topics include the block maxima and peaks over thresholds methods; asymptotic (tail) independence; threshold choice; maximum likelihood methods; and model diagnostics. The theory is illustrated by examples from climate change statistics, insurance, metal fatigue, gene estimation, and driver inattention.

In the second talk I survey some of the intensive research in multivariate extreme value statistics currently under way. Multivariate block maxima methods have so far seen the most development, and the methods have already been directed at important societal problems from hydrology. However, in more than one dimension, block maxima hide information about whether extremes occur at the same time, and likelihoods often become unwieldy in dimensions higher than 3 or 4. Peaks over thresholds methods, by contrast, keep track of whether or not extremes occur at the same time. The last part of the talk surveys work in progress on new parametric multivariate generalized Pareto models. These models, perhaps surprisingly, have simple and tractable likelihoods, and permit use of the entire standard maximum likelihood machinery for estimation, testing, and model checking. I will show how the models can contribute to wind storm insurance, financial portfolio selection, and landslide risk assessment. Throughout, an important issue is how estimated risk should be presented and understood.
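A minimal illustration of the one-dimensional peaks over thresholds method, on simulated data and with simple moment estimators standing in for the maximum likelihood fitting discussed in the talk: excesses over a high threshold are fitted with a generalized Pareto distribution, and exponential data should land near the boundary case of shape zero.

```python
import random, statistics

def fit_gpd_moments(excesses):
    """Moment estimators of the generalized Pareto shape (xi) and scale (sigma),
    valid when xi < 1/2: xi = (1 - m^2/s^2)/2, sigma = m (1 + m^2/s^2)/2."""
    m = statistics.fmean(excesses)
    ratio = m * m / statistics.variance(excesses)
    return 0.5 * (1.0 - ratio), 0.5 * m * (1.0 + ratio)

rng = random.Random(7)
data = [rng.expovariate(1.0) for _ in range(20000)]
u = sorted(data)[int(0.9 * len(data))]          # 90th-percentile threshold
excesses = [x - u for x in data if x > u]
xi, sigma = fit_gpd_moments(excesses)
# exponential tails sit on the GPD boundary xi = 0 with sigma = 1
```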

## Thursday

### 9.00-10.00: Survey B2

Holger Rootzén
Mathematical Sciences, Chalmers and Gothenburg University, Sweden
Extreme value statistics: from one dimension to many

Extreme value statistics helps protect us from devastating waves, floods, windstorms, and landslides. It is widely used for risk management in finance and insurance, and contributes to material science, bioinformatics, medicine, and traffic safety.

The first talk introduces the well-established and widely used statistical theory for extremes of one-dimensional variables. Topics include the block maxima and peaks over thresholds methods; asymptotic (tail) independence; threshold choice; maximum likelihood methods; and model diagnostics. The theory is illustrated by examples from climate change statistics, insurance, metal fatigue, gene estimation, and driver inattention.

In the second talk I survey some of the intensive research in multivariate extreme value statistics currently under way. Multivariate block maxima methods have so far seen the most development, and the methods have already been directed at important societal problems from hydrology. However, in more than one dimension, block maxima hide information about whether extremes occur at the same time, and likelihoods often become unwieldy in dimensions higher than 3 or 4. Peaks over thresholds methods, by contrast, keep track of whether or not extremes occur at the same time. The last part of the talk surveys work in progress on new parametric multivariate generalized Pareto models. These models, perhaps surprisingly, have simple and tractable likelihoods, and permit use of the entire standard maximum likelihood machinery for estimation, testing, and model checking. I will show how the models can contribute to wind storm insurance, financial portfolio selection, and landslide risk assessment. Throughout, an important issue is how estimated risk should be presented and understood.

### 10.15-11.15: Monte Carlo Methods (I13)

Organizer: Jimmy Olsson

Département CITI, Telecom SudParis, France

In this work we consider the implications of "folding" a Markov chain Monte Carlo algorithm, namely restricting simulation of a given target distribution to a convex subset of its support via a projection onto this subset. We argue that this modification should be implemented whenever an MCMC algorithm is considered. In particular, we demonstrate improvements in the acceptance rate and in the ergodicity of the resulting Markov chain. We illustrate these improvements on several examples, and stress that they are universally applicable at negligible computing cost, independently of the dimension of the problem.

Joint work with Christian Robert (Université Paris-Dauphine).
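As a toy illustration of restricting a target to a convex subset, one can fold a standard normal target onto $$[0,\infty)$$ and sample it with a random-walk Metropolis algorithm whose proposals are reflected back into the set. This sketch is not the talk's general construction, only the simplest instance of sampling a projected target.

```python
import math, random

def folded_target(x):
    """Unnormalized density of |Z|, Z ~ N(0,1): the N(0,1) target folded onto
    the convex set [0, inf)."""
    return math.exp(-0.5 * x * x) if x >= 0.0 else 0.0

def folded_rw_metropolis(n=50000, step=1.0, seed=3):
    rng = random.Random(seed)
    x, draws = 1.0, []
    for _ in range(n):
        y = abs(x + rng.gauss(0.0, step))      # reflect the proposal into [0, inf)
        # reflection keeps the proposal symmetric, so the plain MH ratio applies
        if rng.random() < folded_target(y) / folded_target(x):
            x = y
        draws.append(x)
    return draws

draws = folded_rw_metropolis()
mean_abs = sum(draws) / len(draws)             # E|Z| = sqrt(2/pi), about 0.798
```

Because no proposal is wasted outside the support, the acceptance rate is higher than for a naive sampler that rejects negative proposals outright.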

School of Mathematics, Bristol, United Kingdom

We study a distributed particle filter proposed by Bolic et al. (IEEE Trans. Sig. Proc. 2005). This algorithm involves $$m$$ groups of $$M$$ particles, with interaction between groups occurring through a "local exchange" mechanism. We establish a central limit theorem in the regime where $$M$$ is fixed and $$m$$ grows. A formula we obtain for the asymptotic variance can be interpreted in terms of colliding Markov chains, enabling analytic and numerical evaluations of how the asymptotic variance behaves over time, with comparison to a benchmark algorithm consisting of $$m$$ independent particle filters. Subject to regularity conditions, when $$m$$ is fixed both algorithms converge time-uniformly at rate $$M^{-1/2}$$. Through use of our asymptotic variance formula we give counter-examples satisfying the same regularity conditions to show that when $$M$$ is fixed neither algorithm, in general, converges time-uniformly at rate $$m^{-1/2}$$.

This is joint work with Kari Heine (UCL).

References
http://arxiv.org/abs/1505.02390
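The benchmark of $$m$$ independent particle filters can be sketched for an invented linear Gaussian model (all parameters hypothetical); the local-exchange algorithm itself is more intricate than this, interacting groups through particle swaps.

```python
import math, random

def bootstrap_pf(obs, M, rng, phi=0.9, q=1.0, r=1.0):
    """Bootstrap particle filter for the linear Gaussian model
    X_t = phi X_{t-1} + N(0, q),  Y_t = X_t + N(0, r).
    Returns the estimated filtering mean at the final time."""
    parts = [rng.gauss(0.0, 1.0) for _ in range(M)]
    for y in obs:
        parts = [phi * x + rng.gauss(0.0, math.sqrt(q)) for x in parts]   # propagate
        weights = [math.exp(-0.5 * (y - x) ** 2 / r) for x in parts]      # weight
        parts = rng.choices(parts, weights=weights, k=M)                  # resample
    return sum(parts) / M

# Benchmark: average the estimates of m independent filters, each with M particles.
rng = random.Random(42)
obs = [2.0] * 20
m, M = 10, 500
estimate = sum(bootstrap_pf(obs, M, rng) for _ in range(m)) / m
```

For this model the steady-state Kalman filter mean under constant observations of 2.0 is roughly 1.87, which the averaged estimate should approach.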

Department of mathematics, KTH, Stockholm, Sweden

We shall discuss a sequential Monte Carlo-based approach to approximation of probability distributions defined on spaces of decomposable graphs, or, more generally, spaces of junction (clique) trees associated with such graphs. In particular, we apply a particle Gibbs version of the algorithm to Bayesian structure learning in decomposable graphical models, where the target distribution is a junction tree posterior distribution. Moreover, we use the proposed algorithm for exploring certain fundamental properties of decomposable graphs, e.g., clique size distributions. Our approach requires the design of a family of proposal kernels, so-called junction tree expanders, which expand a given junction tree by randomly connecting a new node to the underlying graph. The performance of the estimators is illustrated through a collection of numerical examples demonstrating the feasibility of the suggested approach in high-dimensional domains.

This is joint work with Tatjana Pavlenko and Felix Rios (KTH).

### 10.15-11.15: Biostatistics (I14)

Department of Statistics, Stockholm University, Sweden

The sample size of a clinical trial is often determined based on power to show a statistically significant effect versus control. The conventional significance level of 5% is usually used. In this traditional approach, neither the sample size nor the rule for rejecting the null hypothesis of no effect depends on the size of the population that has the disease.

We are interested in trials targeting a rare disease, where not enough patients exist to conduct a trial of traditional size. We discuss how the sample size for such a population can instead be justified through a decision-theoretic approach. The sample size based on this approach depends on the population size. Our method is applied to real disease cases. We then discuss potential justifications for the choice of significance level.

The talk is based on joint work with Simon Day, Siew Wan Hee, Jason Madan, Martin Posch, Nigel Stallard, Mårten Vågerö and Sarah Zohar for the InSPiRe project, and joint work with Carl Fredrik Burman for the IDEAL project. These projects have received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no 602144 and no 602552.
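For contrast, the conventional power-based calculation that the decision-theoretic approach departs from can be written in a few lines; note that the resulting sample size is independent of the size of the patient population, which is exactly what becomes problematic for rare diseases.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(delta, sigma, alpha=0.05, power=0.8):
    """Conventional two-arm sample size: power to detect a mean difference delta
    (common sd sigma) with a two-sided level-alpha z-test. The population size
    does not enter the formula."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_a + z_b) * sigma / delta) ** 2)

n = sample_size_per_arm(0.5, 1.0)    # standardized effect of 0.5 gives 63 per arm
```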

AstraZeneca, Sweden

Mid-study design modifications are becoming increasingly accepted in confirmatory clinical trials, so long as appropriate methods are applied such that error rates are controlled. It is therefore unfortunate that the important case of time-to-event endpoints is not easily handled by the standard theory. We analyze current methods that allow design modifications to be based on the full interim data, i.e., not only the observed event times but also secondary endpoint and safety data from patients who are yet to have an event. We show that the final test statistic may ignore a substantial subset of the observed event times, and that this leads to inefficiency compared to alternative sample size re-estimation strategies.

Ziad Taib and Mahdi Hashemi
Early Clinical Biometrics, AstraZeneca, Sweden
Interim analyses: the more the merrier?

Continuous administrative interim looks at data during the course of a clinical trial are sometimes suggested as a tool for decision making. Sometimes it is even suggested that, being of an administrative nature, such looks require no statistical formalism, since there is no intention of modifying the ongoing trial. In other circumstances, interim analyses are proposed as part of futility analyses in clinical trials with group sequential designs. We discuss such interim analyses and argue that the need for rigorous statistical methods is no different in administrative analyses than in other types of interim analyses. Moreover, we argue that, quite often, increasing the number of interim looks leads to higher risk and increased cost compared to, say, a single interim analysis.

This presentation is based on joint work with Mahdi Hashemi and Magnus Kjaer from AstraZeneca R&D, Gothenburg, Sweden.

### 11.30-12.30: Closing Lecture

David Lando
Center for Financial Frictions, Copenhagen Business School, Denmark
Models in bank regulation

The core of banking regulation seeks to limit the risk that banks default. Setting limits involves quantifying the risks that banks take when they give loans, hold securities and trade in derivatives, and when they choose how to fund their activities. The models used can be extremely simple or highly complex. Using some basic examples, I will highlight the important trade-offs that regulators face between simplicity and transparency on one side and realism and flexibility on the other. I will also discuss how regulatory rules may affect market prices and have profound effects on bank behavior, sometimes creating perverse incentives.