*Aliaksandr Hubin, Storvik*, University of Oslo

The talk is devoted to Bayesian model selection in generalized linear models with random effects. Such models are used for inference and prediction in a wide range of applications, providing a powerful scientific tool for researchers and analysts from many fields. In most of these fields, more and more sources of data are becoming available, introducing a variety of hypothetical explanatory variables to be considered in these models. Selecting a combination of these variables that is optimal in some sense thus becomes crucial. The posterior marginal distribution of the models can be viewed as a relevant measure of model evidence based on the observed data. However, the number of models to select from is exponential in the number of candidate variables; moreover, the search space in this context is often extremely non-concave and has numerous local extrema or, statistically speaking, modes. Hence efficient search algorithms have to be adopted for selecting a sufficiently good model within a reasonable amount of time. In this work we introduce and implement efficient mode jumping MCMC algorithms for calculating posterior marginal probabilities of the models, and for model selection with respect to them, in generalized linear models with a random effect. Marginal posterior probabilities of models, given a specific choice of priors and any choice of covariates, are efficiently calculated using the integrated nested Laplace approximation (INLA) approach for the class of models addressed. We further apply the algorithms to both simulated data and a real epigenetic data set.

- C. Blum and A. Roli. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. *ACM Computing Surveys*, pages 268–308, 2003.
- E. I. George and R. E. McCulloch. Approaches for Bayesian variable selection. *Statistica Sinica*, pages 339–374, 1997.
- H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. *Journal of the Royal Statistical Society*, 71(2):319–392, 2009.
- G. Storvik. On the flexibility of Metropolis–Hastings acceptance probabilities in auxiliary variable proposal generation. *Scandinavian Journal of Statistics*, 38:342–358, 2011.
- H. Tjelmeland and B. K. Hegstad. Mode jumping proposals in MCMC. *Scandinavian Journal of Statistics*, 28:205–223, 2001.

*Keywords:* Bayesian model selection; High performance computations; Local meta-heuristics; Statistical modeling

*Magne Aldrin*, Norwegian Computing Center

The number of goals scored by team A in a match against team B is assumed Poisson distributed with parameter \(\lambda(A,B)\), which depends on \(\gamma\), the expected number of goals scored by one team in a match between two equally good teams, and on \(s_A\) and \(s_B\), which express the goodness of teams A and B. Furthermore, the number of goals scored by team B is Poisson distributed with parameter \(\lambda(B,A)\), independent of the number of goals scored by team A. *Estimation:* Before the start of the tournament, the model parameters are estimated from an artificial data set based on evaluations from several Norwegian football experts. The experts have guessed the results of several hundred hypothetical games between the 24 participating teams. When the tournament starts, the real games are taken into account as well, and the information value of the hypothetical games (the expert guesses) is weighted against that of the real games. The parameters are estimated by maximising a modified Poisson likelihood. The difference from an ordinary Poisson likelihood is that it is robustified by downweighting large victories and by adding a penalty term that shrinks the individual strength parameters towards a common mean. *Simulation:* The winner probabilities are found by simulating all remaining matches numerous times; in other words, we “play” the remaining part of the tournament many times. From these simulations, we calculate each team’s chances to win the championship, to win its group, to reach the semi-finals, etc.
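The simulation step can be sketched as follows. The parameterisation \(\lambda(A,B)=\gamma\exp(s_A-s_B)\) is an assumption for illustration (the abstract does not state the exact form), chosen so that \(\lambda=\gamma\) when the two teams are equally good:

```python
import math
import random

def poisson(lam, rng):
    """Knuth's method for a Poisson draw (adequate for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def match_goals(gamma, s_a, s_b, rng):
    """Draw (goals_A, goals_B) for one match. The form
    lam(A, B) = gamma * exp(s_A - s_B) is an assumed parameterisation:
    it equals gamma when the two teams are equally good."""
    goals_a = poisson(gamma * math.exp(s_a - s_b), rng)
    goals_b = poisson(gamma * math.exp(s_b - s_a), rng)
    return goals_a, goals_b

def win_probability(gamma, s_a, s_b, n_sim=10_000, seed=1):
    """Estimate P(team A beats team B) by 'playing' the match many times."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sim):
        ga, gb = match_goals(gamma, s_a, s_b, rng)
        wins += ga > gb
    return wins / n_sim
```

In the full procedure, the same idea is applied to all remaining tournament matches, accumulating group wins, semi-final appearances, and championships across the simulated tournaments.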

*Keywords:* Football prediction; Poisson regression; Simulation

*Janne Mannseth*, Mathematics, University of Bergen

Hamiltonian Monte Carlo (HMC) is an MCMC algorithm used to generate samples from complex probability distributions. The poster explores the construction of new numerical integration schemes to be used in HMC and their efficiency. It considers three new integration schemes which are extensions of the commonly used leapfrog method. The performance measures used are the asymptotic expected acceptance probability and the effective sample size. All of the integration schemes are tested within the framework of the No-U-Turn sampler, both with a logistic regression model and a Student-t model. Testing reveals that, for the logistic model, all the new methods exceed the leapfrog method in both asymptotic expected acceptance probability and effective sample size.
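As a baseline for the extensions discussed, here is a minimal sketch of the standard leapfrog integrator that the three new schemes build on; the target (a standard normal, \(U(q)=q^2/2\)), step size, and trajectory length are illustrative choices, not taken from the poster:

```python
def leapfrog(q, p, grad_u, eps, n_steps):
    """Standard leapfrog: half momentum step, alternating full position
    and momentum steps, closing half momentum step. The scheme is volume
    preserving and reversible, which is what makes it valid for HMC."""
    p = p - 0.5 * eps * grad_u(q)
    for i in range(n_steps):
        q = q + eps * p
        if i < n_steps - 1:
            p = p - eps * grad_u(q)
    p = p - 0.5 * eps * grad_u(q)
    return q, p

# Illustrative check on a standard normal target, U(q) = q^2 / 2:
# the Hamiltonian H = U(q) + p^2 / 2 should be nearly conserved along
# the trajectory, which is what drives a high acceptance probability.
grad_u = lambda q: q
q0, p0 = 1.0, 0.5
q1, p1 = leapfrog(q0, p0, grad_u, eps=0.1, n_steps=20)
energy_error = abs((0.5 * q1**2 + 0.5 * p1**2) - (0.5 * q0**2 + 0.5 * p0**2))
```

The extended schemes replace the inner update pattern with different splittings; the acceptance probability is governed by the same energy error quantity computed above.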

*Keywords:* Hamiltonian Monte Carlo; Integration scheme; Leapfrog method; No-U-Turn-Sampler (NUTS)

*Jong Jeong*, Biostatistics, University of Pittsburgh

Time-to-event data may be analyzed based on (1) cumulative information up to a specific time point, through the hazard function or the survival function, or (2) residual information beyond that time point, through the mean or quantile residual life function. In this talk, a new summary measure of lost lifespan, which has several advantages over the existing summary measures for censored time-to-event data, is introduced. The distribution of the lost lifespan is characterized by the reverse hazard function. Nonparametric inference on the lost lifespan for the one-sample and two-sample cases is presented, together with an extension to a regression setting. Asymptotic distributions of the estimators and test statistics are derived, and simulation results on their small-sample behavior will be presented. A real data example will be used as illustration.

*Keywords:* Lost Lifespan; Median; Residual life; Survival analysis

*Horst Lewitschnig*, ATV QM, Infineon Technologies Austria AG

Highly reliable semiconductors are used, e.g., in cars, airplanes, trains, and satellites. Such semiconductors must withstand multiple kinds of stress throughout their lifetime. To ensure this, devices are put through stress tests that simulate their life under usage environments. Within a stress test, one or several stress factors can be applied, such as temperature cycling, voltage and temperature, or voltage and humidity. Classically, stress tests are performed in parallel. We, however, investigated a new approach: an experiment was conducted in which two stress types were applied sequentially, humidity stress and temperature cycling stress. The survival probability of the devices under test depended on the order in which the stress types were applied, i.e. on the stress history.

- When first humidity stress was applied and then temperature cycling stress, almost all of the devices under test survived.
- If the stress sequence was the other way round, i.e. first temperature cycling stress and then humidity stress was applied, most of the devices under test failed.

In this talk, we introduce a new reliability model to capture this situation, which we refer to as the Stress Interaction Model. A bivariate random variable for the lifetime of the device under two types of stress is introduced; it reflects the time spent under each stress type. A probability space is built on this, and the stress sequence is modelled as a curve in \(\mathbb{R}^2\). The hazard function is set up as a vector field. Depending on its vector-analytic properties, this can lead to an additive or a multiplicative hazard function model. If the vector field has non-vanishing curl, the path integral along a curve between two points A and B depends on the shape of the curve. With this setup, it is possible to describe the survival probability of the devices under test as a function of the stress history. This reliability model combines vector analysis and probability theory, and therefore enables reliability modelling whenever the survival probability depends on a sequence of factors. Further applications of the Stress Interaction Model can be:

- Medicine: The survival probability of a patient may depend on the sequence of treatments; e.g. if a conservative treatment is done before an operation or the other way around.
- Quality: The detection probability of flaws in a production line can depend on the sequence of quality inspections; e.g. if it makes a difference whether just one thorough inspection at the end of the line is done or if several inspections throughout the line are performed.
- Finance: The default risk of a managed portfolio over a certain period of time might depend on the state of the economic cycle. For instance, in an economic downturn or upturn, it makes a difference whether first bonds are bought and afterwards they are exchanged by shares or the other way round.
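The role of the curl can be illustrated numerically with a hypothetical hazard vector field (the field, constants, and survival formula \(S=\exp(-\int h\cdot dr)\) below are illustrative, not the paper's model): when the curl is non-zero, the integrated hazard, and hence the survival probability, differs between the two stress orders:

```python
import math

def path_integral(h, path, n=2000):
    """Approximate the line integral of the vector field h along a
    piecewise-linear path given as (x, y) way-points, via midpoint rule."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        for k in range(n):
            t = (k + 0.5) / n
            hx, hy = h(x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            total += hx * (x1 - x0) / n + hy * (y1 - y0) / n
    return total

# Hypothetical hazard field with non-vanishing curl (curl = b - a = -1):
a, b, c, d = 1.0, 0.0, 0.2, 0.3
h = lambda x, y: (a * y + c, b * x + d)

T = 1.0
humidity_first = [(0, 0), (T, 0), (T, T)]   # humidity stress, then cycling
cycling_first  = [(0, 0), (0, T), (T, T)]   # cycling stress, then humidity

s1 = math.exp(-path_integral(h, humidity_first))
s2 = math.exp(-path_integral(h, cycling_first))
# s1 > s2 here: survival depends on the stress history
```

With a curl-free field the two integrals would coincide, reproducing the classical history-independent case.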

- H. Lewitschnig, “Advances on the Stress Interaction Model,” *IEEE Transactions on Reliability*, Vol. 64, No. 1, March 2015, pp. 528–534.
- H. Lewitschnig, “Stress interaction model,” in *Proceedings of the Applied Reliability Symposium*, Berlin, Germany, 2010, Track 2, Session 2.
- G. C. Schwartz and K. V. Srikrishnan, *Handbook of Semiconductor Interconnection Technology*, 2nd Ed., Boca Raton, FL, USA: CRC, 2006.

*Keywords:* Combined Stress Tests; Probability Theory; Reliability; Vector Analysis

*Bart Buelens, Van den Brakel*, Department of Statistical Methods, Statistics Netherlands

We compare two approaches to dealing with measurement errors in sample surveys employing multiple modes of data collection. When measurement errors are mode dependent, the total bias of the final estimates is composed of a mix of different measurement errors, reflecting the distribution of the survey response over the modes. We study sequential survey designs in particular, where responding via the web is offered initially and more expensive modes are only used thereafter in an attempt to reach web non-respondents. When such designs are repeated multiple times, the mode composition realised in the different editions may vary, compromising temporal comparability of survey results.

Two methods have been proposed in the literature to address this issue; both can be seen as adaptations of the commonly used general regression estimator (GREG), \(\sum_{i=1}^n{w_i y_i}\) (Särndal et al. 1992), a weighted sum of the \(n\) observations \(y_i\) in the sample. The weights \(w_i\) account for unequal inclusion probabilities, and they correct for selective non-response by calibrating the weights such that the sums over the weighted auxiliary variables equal the known population totals.

One method consists of correcting the observations \(y_i\) for mode-dependent measurement errors and requires the explicit estimation of these errors (Suzer-Gurtekin et al. 2012). The other method seeks to balance the measurement errors by calibrating the sample to a fixed response mode composition, and achieves this by adjusting the calibration weights \(w_i\) (Buelens and Van den Brakel 2015).

We show that under certain conditions and with identical assumptions about measurement error models and missing data patterns, the two methods, while motivated very differently, give almost equal results. We illustrate the two methods using 36 monthly editions of the Labour Force Survey in the Netherlands, demonstrate the similarities of the methods, and discuss their strengths and weaknesses.
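A minimal sketch of the GREG calibration step described above, using linear calibration weights \(w_i = d_i(1 + x_i'\lambda)\) with \(\lambda\) solved so that the weighted auxiliary totals reproduce the known population totals; the sample, design weights, and totals are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: design weights d_i (inverse inclusion probabilities)
# and an auxiliary vector x_i = (1, age_i) for each of n sampled units.
n = 200
d = np.full(n, 50.0)
x = np.column_stack([np.ones(n), rng.uniform(20, 60, n)])
y = 2.0 + 0.1 * x[:, 1] + rng.normal(0, 1, n)

# Known population totals of the auxiliary variables (size, total age)
t_x = np.array([10_000.0, 405_000.0])

# Linear calibration: w_i = d_i * (1 + x_i' lam), with lam chosen so that
# the weighted sample totals of x equal the known population totals t_x.
lam = np.linalg.solve((d[:, None] * x).T @ x, t_x - d @ x)
w = d * (1.0 + x @ lam)

estimate = w @ y   # calibrated (GREG) estimate of the population total of y
```

Both methods in the abstract modify this scheme: one corrects the \(y_i\) before the weighted sum, the other adds the response mode composition to the calibration constraints that determine \(w_i\).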

- Buelens, B. and J. A. Van den Brakel (2015). Measurement error calibration in mixed-mode sample surveys. Sociological Methods & Research 44(3), 391-426.
- Särndal, C. E., B. Swensson, and J. Wretman (1992). Model Assisted Survey Sampling. Springer-Verlag, New York.
- Suzer-Gurtekin, Z. T., S. Heeringa, and R. Valliant (2012). Investigating the bias of alternative statistical inference methods in sequential mixed-mode surveys. Proceedings of the JSM, Section on Survey Research Methods. American Statistical Association, 4711-25.

*Keywords:* Measurement error; Mixed-mode surveys; Response mode calibration; Survey sampling

*Anders Hildeman, Bolin, Wallin, Yu*, Mathematical sciences, Chalmers University of Technology

Computed tomography (CT) equivalent information is needed for attenuation correction in PET imaging and for dose planning in radiotherapy. Prior work has shown that Gaussian mixture models can be used to generate a substitute CT image from a specific set of MRI modalities. This is highly attractive since MR images can be acquired without exposing the subject to hazardous ionizing radiation, and MRI information is often of interest in its own right. In this work we improve these models by incorporating spatial information, assuming the mixture class probabilities to be distributed according to a discrete Markov random field. Furthermore, the mixtures are extended from Gaussian to normal inverse Gaussian distributions, allowing heavier tails and skewness. Model parameters are estimated from training data using a maximum likelihood approach. Due to the spatial model, there is no closed-form expression for the likelihood function, and a standard EM algorithm is not possible. Instead, an EM gradient algorithm utilizing MCMC approximations is developed. This procedure yields acceptable convergence properties also when the large quantity of data makes other common modifications of the EM algorithm infeasible. The estimation procedure is not only applicable to this specific problem but can be applied to a more general family of problems where the M-step cannot be performed but where the gradient of the likelihood can, at least approximately, be evaluated. The advantages of the spatial model and the normal inverse Gaussian distributions are evaluated in a cross-validation study based on data from 14 patients.

*Keywords:* Markov random field; MRI; Non Gaussian; Spatial

*Juha Karvanen, Tikka*, Department of Mathematics and Statistics, University of Jyväskylä

Automated symbolic simplification of nonparametric probabilistic expressions is needed in the algorithmic approach to causal inference. In a typical use case, we have a graphical model that describes the causal relations in a nonparametric way, and we are interested in expressing the effect of the intervention \(\textrm{do}(x)\) on variable \(y\) using only observed probabilities. This problem can be solved using do-calculus (Pearl 2009), and Tian and Pearl (2002) and Shpitser and Pearl (2006) have derived graph-theoretic algorithms for this task. The algorithm has been implemented in the R package causaleffect (Tikka 2016, Tikka and Karvanen 2015). The results of Shpitser and Pearl’s algorithm are often unnecessarily complex: they contain summations over variables that can be eliminated. We have identified a nontrivial set of rules for automated simplification and implemented them in the causaleffect R package. As a result, the expressions are easier to understand and shorter to report. Moreover, the estimation of causal effects may become faster and more accurate if unnecessary variables suffering from measurement error or missing data are removed from the expressions of the estimators.

- J. Pearl (2009). *Causality: Models, Reasoning and Inference*. 2nd edition. Cambridge University Press, New York.
- I. Shpitser, J. Pearl (2006). Identification of Joint Interventional Distributions in Recursive Semi-Markovian Causal Models. In *Proceedings of the 21st National Conference on Artificial Intelligence*, Volume 2, pp. 1219–1226. AAAI Press.
- S. Tikka (2016). causaleffect: Deriving Expressions of Joint Interventional Distributions and Transport Formulas in Causal Models. R package version 1.2.4.
- S. Tikka, J. Karvanen (2015). Identifying causal effects with the R package causaleffect. Accepted for publication in *Journal of Statistical Software*, https://cran.r-project.org/web/packages/causaleffect/vignettes/causaleffect.pdf
- J. Tian, J. Pearl (2002). A General Identification Condition For Causal Effects. In *Proceedings of the 18th National Conference on Artificial Intelligence*, pp. 567–573. AAAI/The MIT Press.

*Keywords:* Simplification; Probabilistic Expression; Causal Inference; Graphical Model; Graph Theory

*Jens Malmros, LECR Rocha*, Mathematics, Stockholm university

Respondent-driven sampling (RDS) is a link-tracing sampling method that is especially suitable for sampling hidden populations. RDS combines an efficient snowball-type sampling scheme with inferential procedures that yield unbiased population estimates under some assumptions about the sampling procedure and population structure. Several seed individuals are typically used to initiate RDS recruitment. However, standard RDS estimation theory assumes that all sampled individuals originate from only one seed. We present an estimator, based on a random walk with teleportation, which accounts for the multiple seed structure of RDS. The new estimator can also be used on populations with disconnected social networks. We numerically evaluate our estimator by simulations on artificial and real networks. Our estimator outperforms previous estimators, especially when the proportion of seeds in the sample is large. We recommend that our new estimator be used in RDS studies, in particular when the number of seeds is large or the social network of the population is disconnected.
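To give a flavour of the teleportation idea (this is a simplified stand-in, not the authors' estimator), the sketch below computes the stationary distribution of a random walk with teleportation on a disconnected network and uses its reciprocals as sampling weights:

```python
import numpy as np

def stationary_dist(adj, alpha=0.1, iters=500):
    """Stationary distribution of a random walk with teleportation:
    with probability 1 - alpha follow a uniformly chosen edge, with
    probability alpha jump to a uniformly chosen node. Teleportation
    makes the chain irreducible even on a disconnected graph."""
    n = adj.shape[0]
    walk = adj / adj.sum(axis=1, keepdims=True)
    p = (1 - alpha) * walk + alpha / n
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        pi = pi @ p
    return pi

# Hypothetical population: two disconnected triangles, nodes 0-2 and 3-5.
adj = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    adj[u, v] = adj[v, u] = 1.0

pi = stationary_dist(adj)
trait = np.array([1, 1, 0, 0, 0, 0])   # binary trait of interest

# Hansen-Hurwitz-type estimate of the trait proportion from a sample,
# weighting each sampled node by 1 / pi_i.
sample = [0, 2, 4]
w = 1.0 / pi[sample]
estimate = (w * trait[sample]).sum() / w.sum()
```

Without teleportation, the walk-based inclusion probabilities are undefined across components, which is why the standard degree-weighted RDS estimator breaks down on disconnected networks.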

*Keywords:* Disconnected network; Markov process; Random walk; Respondent-driven sampling

*Frederik Riis Mikkelsen, NRH Hansen*, Department of Mathematical Sciences, University of Copenhagen

This poster investigates degrees of freedom for a class of estimators of a mean value parameter in \(\mathbb{R}^n\). This class includes several widely used estimators with discontinuities such as *best subset selection* and *debiased lasso* from linear regression. A general representation of the degrees of freedom is presented. For debiased lasso this general representation leads to an estimate of the degrees of freedom based on the lasso solution path, which in turn can be used for estimating the risk of debiased lasso. The usefulness of the risk estimate for selecting the number of variables is demonstrated via simulations. This shows that appropriate estimation of degrees of freedom is possible also when the estimator is nonlinear and discontinuous.

- Mikkelsen, F. R. & Hansen N. R. (2015) ‘Degrees of Freedom for Piecewise Lipschitz Estimators’. http://arxiv.org/abs/1601.03524
- Tibshirani, R. J. (2015), ‘Degrees of freedom and model search’, Statistica Sinica 25, 1265-1296.
- Tibshirani, R. J. & Taylor, J. (2012), ‘Degrees of freedom in lasso problems’, Ann. Statist. 40(2), 1198-1232.
- Ye, J. (1998), ‘On measuring and correcting the effects of data mining and model selection’, J. Amer. Statist. Assoc. 93(441) 120-131.

*Keywords:* best subset selection; debiased lasso; degrees of freedom; Stein's lemma

*Jonatan Kallus, Sánchez, Jauhiainen, Nelander, Jörnsten*, Mathematical sciences, University of Gothenburg

In the high-dimensional setting, where the number of variables is greater than the number of observations, statistical models need a sparsity constraint. Instead of estimating a correct sparsity, which depends on the unknown noise level and variable interdependence structure, we propose an approach that combines solutions for different sparsity constraints. By choosing a model simultaneously for all sparsity levels, both strength and robustness are improved.

Modeling gene expression data as a graph, where two genes are connected by an edge if their partial correlation is significantly non-zero, has proven useful for classification of cancer patients as well as for finding potential target genes for cancer therapies [1]. This is an example of a high-dimensional model selection problem. In the context of Gaussian graphical models, the *graphical lasso* can estimate a sparse precision matrix which, in turn, corresponds to the adjacency matrix of the above-mentioned graph [2]. Much work has focused on estimating the correct penalization \(\lambda\) for a given data set, along with a maximum likelihood graph [3], [4]. Here we focus, instead, on estimating a graph whose edges achieve a given false detection rate, an objective that is more useful in some use cases. Our method is based on bootstrapping and statistical modeling of the populations of false and true edges. In contrast to existing approaches [5], we bypass the intermediate problem of finding an optimal \(\lambda\). Instead, our method gains both strength and robustness by incorporating graph estimates associated with different penalization levels.

Bootstrapping and estimating a graph with the graphical lasso for each bootstrap sample yields \(B\) graphs, with equal sets of nodes but different sets of edges. Thus, each potential edge \(i\) will have appeared \(X_i\) times, \(X_i\in\{0,\ldots,B\}\). \(X_i\) can be described as coming from a mixture of beta-binomial distributions, with components capturing either the population of truly null edges or the population of truly alternative edges. Fitting this distribution makes it straightforward to choose a threshold \(t\in[0,B]\) corresponding to a given false detection rate, such that edges \(i\) with \(X_i>t\) are declared significant.

Now, performing the above procedure once for each \(\lambda\) in a range of sensible \(\lambda\)'s, it is reasonable to assume that the parameters of the distribution are smooth as functions of \(\lambda\). We can thus state an optimization problem, global in \(\lambda\), where we fit all distributions at once, under smoothness constraints for all parameters. This approach is a novel way of incorporating results associated with different \(\lambda\)'s, hence making a global (in \(\lambda\)) classification of edges as significant or not significant, whereas other methods choose one \(\lambda\) before making a local graph estimate.

The results show that our approach achieves a higher true positive rate than other methods, while controlling the false detection rate at the specified level. Unlike other methods, it is able to use information from a range of penalization levels \(\{\lambda_j\}_{j=1}^k\) to come up with a final joint estimate. Due to its global properties, we can not only make a robust and accurate estimate, but also utilize smoothness to require solutions for fewer different \(\lambda\)'s, thus reducing computation time. Our method has been evaluated with extensive simulation studies under a range of different variable interdependence structures and applied to public gene expression data from The Cancer Genome Atlas [6]. An implementation of our method is made available as an R package.
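A simplified sketch of the thresholding step for a single \(\lambda\): plain binomial components stand in for the beta-binomials of the text, and the EM initialisation and FDR rule are illustrative assumptions:

```python
import math

def em_binom_mixture(counts, B, iters=200):
    """EM for a two-component binomial mixture over bootstrap edge counts:
    null edges appear with small probability p0, true edges with large p1.
    (Binomial components stand in for the beta-binomials of the method.)"""
    w, p0, p1 = 0.8, 0.1, 0.8          # illustrative initial values
    for _ in range(iters):
        r = []                          # posterior prob. of the null component
        for x in counts:
            f0 = w * math.comb(B, x) * p0**x * (1 - p0)**(B - x)
            f1 = (1 - w) * math.comb(B, x) * p1**x * (1 - p1)**(B - x)
            r.append(f0 / (f0 + f1))
        w = sum(r) / len(counts)
        p0 = sum(ri * x for ri, x in zip(r, counts)) / (B * sum(r) + 1e-12)
        p1 = sum((1 - ri) * x for ri, x in zip(r, counts)) / (
            B * (len(counts) - sum(r)) + 1e-12)
        p0, p1 = min(p0, p1), max(p0, p1)
    return w, p0, p1

def fdr_threshold(counts, B, target=0.05):
    """Smallest t such that the estimated false detection rate among
    edges with X_i > t is at most the target."""
    w, p0, p1 = em_binom_mixture(counts, B)
    m = len(counts)
    for t in range(B + 1):
        tail0 = sum(math.comb(B, x) * p0**x * (1 - p0)**(B - x)
                    for x in range(t + 1, B + 1))
        declared = sum(1 for x in counts if x > t)
        if declared and w * m * tail0 / declared <= target:
            return t
    return B
```

In the full method, this fit is performed jointly across the whole \(\lambda\) range with smoothness constraints on the mixture parameters, rather than separately per \(\lambda\) as here.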

- Pe’er D, Hacohen N. Principles and Strategies for Developing Network Models in Cancer. *Cell*. 2011;144(6):864-873.
- Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. *Biostatistics (Oxford, England)*. 2008;9(3):432-441.
- Meinshausen N, Bühlmann P. Stability selection. *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*. 2010;72:417-473.
- Liu H, Roeder K, Wasserman L. Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models. *Advances in Neural Information Processing Systems*. 2010;23:1432-1440.
- Li S, Hsu L, Peng J, Wang P. Bootstrap Inference for Network Construction with an Application to a Breast Cancer Microarray Study. *The Annals of Applied Statistics*. 2013;7(1):391-417.
- The Cancer Genome Atlas Research Network, Weinstein J, Collisson E, Mills G, Shaw K, Ozenberger B, Ellrott K, Shmulevich I, Sander C, Stuart J. The Cancer Genome Atlas Pan-Cancer Analysis Project. *Nature Genetics*. 2013;45:1113-1120.

*Keywords:* bootstrap; high-dimensional data; model selection; sparsity

*Olav Nikolai Breivik, Storvik, Nedreaas*, University of Oslo

Knowledge about how many fish have been killed due to bycatch is an important aspect of ensuring a sustainable ecosystem and fishery. We introduce a Bayesian spatio-temporal prediction method for historical bycatch. The model assumes that the occurrence of bycatch can be described as a log-linear combination of covariates and random effects modeled as Gaussian fields. Integrated nested Laplace approximation (INLA) is used for fast calculations. The method introduced is general, and it is applied to bycatch of juvenile cod (*Gadus morhua*) in the Barents Sea shrimp (*Pandalus borealis*) fishery. In this fishery, we compare our prediction method with the well-known ratio and effort methods, and conclude that our method is more accurate and robust.

*Keywords:* Bycatch; Commercial fishery; INLA; Spatio-temporal

*Kari Krizak Halle, Bakke, Djurovic, Bye, Ryeng, Wisløff, Andreassen, Langaas*, Department of Mathematical Sciences, NTNU

In genome-wide association (GWA) studies, a large number of genetic markers are tested for association with a given phenotype. Multiple testing correction methods can control the familywise error rate (FWER) by estimating a local significance level for the individual tests. We have developed a new method for estimating the local significance level and controlling the FWER in GWA studies, which includes adjustment for covariates such as age and sex. Our method assumes that the relationship between the phenotype and the genetic and environmental covariates can be modelled by a generalized linear model (GLM; McCullagh and Nelder, 1989). We use a multivariate score statistic which, under the complete null hypothesis of no association between the genotypes and the phenotype, asymptotically follows a multivariate normal distribution. Inspired by the work of Moskvina and Schmidt (2008) and Dickhaus and Stange (2013), we use an approximation to the distribution of the multivariate score test statistics to estimate the local significance level. Using simulated and real data (one study on schizophrenia and bipolar disorder and one study on maximal oxygen uptake), we show that our method is a powerful alternative to the popular Bonferroni method. Permutation methods cannot be used for GLMs when environmental covariates are included, because the assumption of exchangeability is in general not satisfied. In that situation, our method is an efficient alternative to permutation methods for multiple testing correction.

- Dickhaus and Stange (2013). Multiple point hypothesis test problems and effective numbers of tests for control of the family-wise error rate. Calcutta Statistical Association Bulletin, 65, 123-144.
- Halle et al. (2016). Efficient and powerful family-wise error control when testing for phenotype-genotype association in genome-wide association studies (in prep.)
- McCullagh and Nelder (1989). Generalized Linear Models, 2nd edition, Chapman and Hall.
- Moskvina and Schmidt (2008). On multiple testing correction in genome-wide association studies. Genetic Epidemiology, 32, 567-573.

*Keywords:*

*Jacob Østergaard, SD Ditlevsen, AR Rahbek*, Mathematics

Cointegration has been applied in econometrics since its introduction in 1981 by Nobel Laureate Clive Granger; later, fellow Nobel Laureate Robert Engle and Granger popularized the idea. The following decades led to extensive research in the field, and in 1996 the seminal monograph by Søren Johansen formalized the theory as a statistically sound framework for analyzing integrated non-stationary time series. Today the *Johansen test* has become a standard statistical procedure for inference on multivariate nonstationary and cointegrated autoregressive time series, and it is included as a standard tool in most statistical software packages. The theory of cointegration has since been developed further to include switching regime models, continuous time models and nonlinear cointegration, but until now applications have mainly been confined to the field of econometrics. Some attempts at bridging cointegration with dynamical systems have been made, but none has led to a fully interpretable result. We show how cointegration can be applied to analyze a multivariate system of coupled oscillators, specified as a general system of rotating processes exhibiting interaction in the phase processes. We define the system of coupled phase processes through a \(p\)-dimensional continuous time stochastic differential equation \[d\phi_t = (f(\phi_t) + \mu)\,dt + \Sigma\, dW_t\] where the coupling of the phases is imposed through the function \(f\). This formulation covers, among others, the famous Kuramoto model as a special case in which \(f\) is nonlinear. In our study we specify a linear interaction in the phases and present a simulation study on an assumed observed data generating process \(z_t = (\gamma_t \cos(\phi_t), \gamma_t \sin(\phi_t))'\), where \(\gamma_t\) is a corresponding amplitude process. We perform a cointegration analysis of the latent \(\phi_t\) process recovered from a numerical solution of \(z_t\), demonstrate how we can identify the interaction in the system, and statistically conclude on independent oscillators. Finally, we discuss our current research on applications of cointegration to systems with non-linear coupling functions.
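A minimal simulation of the linearly interacting case (two oscillators; all parameter values are illustrative) shows the cointegration property: each phase is nonstationary, while the phase difference is mean reverting:

```python
import math
import random

def simulate_phases(k=2.0, mu=(6.28, 6.28), sigma=0.3,
                    dt=0.001, n=20_000, seed=2):
    """Euler-Maruyama simulation of two linearly coupled phase processes
    d phi = (f(phi) + mu) dt + sigma dW with f1 = k (phi2 - phi1) and
    f2 = k (phi1 - phi2). The difference phi1 - phi2 is then an
    Ornstein-Uhlenbeck process (the cointegrating relation), while each
    phase itself drifts like a random walk with trend mu."""
    rng = random.Random(seed)
    phi1, phi2 = 0.0, 1.0
    diffs = []
    sd = sigma * math.sqrt(dt)
    for _ in range(n):
        f1 = k * (phi2 - phi1)
        f2 = k * (phi1 - phi2)
        phi1 += (f1 + mu[0]) * dt + rng.gauss(0, sd)
        phi2 += (f2 + mu[1]) * dt + rng.gauss(0, sd)
        diffs.append(phi1 - phi2)
    return phi1, phi2, diffs
```

From such trajectories, the observed process \(z_t=(\gamma_t\cos\phi_t,\gamma_t\sin\phi_t)'\) can be formed, the phases unwrapped, and the Johansen procedure applied to test for the cointegrating relation \(\phi_1-\phi_2\).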

- Bec, F. & Rahbek, A. (2004) ‘Vector equilibrium correction models with non-linear discontinuous adjustment’. *Econometrics Journal*.
- Dahlhaus, R. & Neddermeyer, J. (2012) ‘On The Relationship Between The Theory Of Cointegration And The Theory Of Phase Synchronization’. Working paper.
- Engle, R. & Granger, C. (1987) ‘Co-Integration and Error Correction: Representation, Estimation, and Testing’. *Econometrica*.
- Granger, C. (1981) ‘Some Properties Of Time Series Data And Their Use In Econometric Model Specification’. *Journal of Econometrics*.
- Johansen, S. (1996) *Likelihood-Based Inference in Cointegrated Vector Autoregressive Models*. Oxford University Press.
- Kammerdiner, A.R. & Pardalos, P.M. (2012) ‘Analysis of multichannel EEG recordings based on generalized phase synchronization and cointegrated VAR’. *Computational Neuroscience*.
- Kessler, M. & Rahbek, A. (2001) ‘Asymptotic Continuous Time Likelihood based Cointegration Inference’. *Scandinavian Journal of Statistics*.
- Kessler, M. & Rahbek, A. (2004) ‘Identification and Inference for Cointegrated and Ergodic Gaussian Diffusions’. *Statistical Inference for Stochastic Processes*.
- Kristensen, D. & Rahbek, A. (2010) ‘Likelihood-Based Inference For Cointegration With Nonlinear Error-Correction’. *Journal of Econometrics*.

*Keywords:* Cointegration; Oscillating system; Phase process; Synchronization

*Søren Wengel Mogensen, AHP Petersen, ASB Buchardt, NRH Hansen*, Department of Mathematical Sciences, University of Copenhagen

Survival prognosis modeling is both challenging and of potential importance when making clinical decisions. The Prostate Cancer DREAM challenge encouraged participating teams to develop prognostic models for patients diagnosed with metastatic castration resistant prostate cancer. Training data originated from the comparator arms of three clinical trials, and data from a fourth clinical trial was used as validation data. In total, data comprised around 2000 patients and more than 100 potential predictors. We used a combination of imputation methods and predictive models to make a systematic evaluation of performance as measured by both cross-validation and the validation data. The prediction for a single patient was given in terms of a *risk score* and integrated AUC (iAUC) was used as a performance measure. We employed three imputation schemes: one under a missing completely at random assumption (MCAR) and two under a missing at random assumption. Of the two missing at random schemes one used only the predictor information for imputation (MAR) and one used both the predictor information and the survival response (MARwR). For some models we used stability selection which is a variable selection method that does variable selection on subsamples of data. Variables that are often selected on these subsamples then enter the model (Meinshausen and Bühlmann 2010). The models applied included a proportional hazards model with lasso penalty, a proportional hazards model with prior stability selection, and a generalized additive model with prior stability selection. Random survival forests (Ishwaran et al. 2008) and gradient boosting machines (Friedman 2002) were fitted using both the full set of predictors and a set of predictors chosen by stability selection. 
The random survival forests without variable selection achieved the highest iAUC (cv-iAUC = 0.70, validation-iAUC = 0.78), whereas the generalized additive model was the best among those applying variable selection (cv-iAUC = 0.69, validation-iAUC = 0.76). We observed that cross-validation and validation gave roughly the same ranking of models; however, there was a marked difference in the absolute values, as the numbers above indicate. This demonstrates that generalization of predictive performance to other data sets is difficult. We saw no clear advantage of the more elaborate imputation schemes MAR and MARwR, and the latter even seemed to hurt performance for some models.
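The stability selection idea described above can be sketched in a few lines. This is a toy illustration with a hypothetical correlation-screening base selector, not the lasso or GAM machinery the authors used:

```python
import numpy as np

def stability_selection(X, y, select, n_subsamples=100, threshold=0.6, rng=None):
    """Return indices of variables selected in at least `threshold`
    fraction of random half-subsamples (Meinshausen & Buhlmann style)."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts[select(X[idx], y[idx])] += 1
    return np.flatnonzero(counts / n_subsamples >= threshold)

# Toy base selector: keep the 2 predictors most correlated with y.
def top2_by_correlation(X, y):
    r = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.argsort(r)[-2:]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)
print(stability_selection(X, y, top2_by_correlation, rng=1))  # columns 0 and 3
```

Any base selector (lasso, boosting, screening) can be plugged in; the aggregation over subsamples is what stabilises the selection.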

- Friedman, Jerome H. (2002) ‘Stochastic Gradient Boosting’. *Computational Statistics and Data Analysis* **38**(4): 367–378.
- Ishwaran, Hemant, Udaya B. Kogalur, Eugene H. Blackstone, and Michael S. Lauer (2008) ‘Random Survival Forests’. *The Annals of Applied Statistics* **2**(3): 841–860.
- Meinshausen, Nicolai, and Peter Bühlmann (2010) ‘Stability Selection’. *Journal of the Royal Statistical Society: Series B (Statistical Methodology)* **72**(4): 417–473.

*Keywords:* Prediction; Prognosis modeling; Prostate cancer; Survival analysis

*Kristoffer Hellton, Thoresen*, University of Oslo, Dept. of Mathematics

When measuring \(M\) high-dimensional genomic data types for the same tissue samples, an integrative approach to analysis, using all data, can strengthen inference. This is also the case when clustering patient samples, and several integrative cluster procedures have been proposed. Common to these methodologies is the restriction to a joint cluster structure, equal in all data types. We present a clustering extension of the Joint and Individual Variation Explained (JIVE) algorithm (Lock et al. 2013), Joint and Individual Clustering (JIC), enabling the construction of both joint and data type-specific clusters simultaneously. This requires specifying \(M+1\) numbers of clusters, which are difficult to find using standard approaches in clustering. Instead, as the procedure utilizes the connection between \(k\)-means clustering and principal component analysis, we use the number of relevant principal components to determine the number of clusters. A component is defined as relevant if the component score distribution deviates significantly from normality, under the assumption of normally distributed noise. The procedure is illustrated using gene expression and miRNA levels in breast cancer (TCGA), and the analysis suggests a division into three joint clusters common to both data types and two expression-specific clusters.
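The "relevant components" idea can be sketched crudely as follows, using excess kurtosis of the principal component scores as a stand-in for a formal normality test (the abstract does not specify which test JIC uses, so this is only an illustration of the principle):

```python
import numpy as np

def n_relevant_components(X, kurtosis_cutoff=1.0):
    """Count principal components whose score distribution deviates from
    normality, here crudely via excess kurtosis (a Gaussian has 0)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                                  # principal component scores
    z = (scores - scores.mean(0)) / scores.std(0)
    excess_kurt = (z ** 4).mean(axis=0) - 3.0
    return int(np.sum(np.abs(excess_kurt) > kurtosis_cutoff))

rng = np.random.default_rng(0)
# Two well-separated latent clusters: the first PC's scores become bimodal,
# so its score distribution deviates strongly from normality.
groups = rng.integers(0, 2, size=300)
X = np.outer(groups * 6.0, np.ones(5)) + rng.normal(size=(300, 5))
print(n_relevant_components(X))
```

A bimodal score distribution has strongly negative excess kurtosis, so the cluster-carrying component is flagged as relevant while pure-noise components are not.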

- Lock, E. F., Hoadley, K. A., Marron, J. S., and Nobel, A. B. (2013). ‘Joint and individual variation explained (JIVE) for integrated analysis of multiple data types’.
*Annals of Applied Statistics* **7**(1), 523–542.

*Keywords:* Clustering; Data combination; Genomics; Principal components

*Rolf Larsson*, Mathematics, Uppsala University

In this article, we focus on the integer-valued autoregressive model, INAR(1). We test the null of serial independence, where the INAR parameter is zero, against the alternative of a positive INAR parameter. To this end, we propose different explicit approximations of the likelihood ratio statistic. We derive the limiting distributions of our statistics under the null. In a simulation study, we compare the size and power of our tests with those of the score test proposed by Sun and McCabe (2013). The size is either asymptotic or derived via response surface regressions of critical values. We use Poisson as well as negative binomial innovations. In the Poisson case, we find that some of our statistics are superior to the score test in terms of power and work just as well in terms of size. In the case with negative binomial innovations, only one of our proposed statistics can compete with the score test in terms of size and power.
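For readers unfamiliar with the model, an INAR(1) process \(X_t = \alpha \circ X_{t-1} + \epsilon_t\) (with \(\circ\) denoting binomial thinning) can be simulated as follows; this is a generic sketch with Poisson innovations, not the authors' code:

```python
import numpy as np

def simulate_inar1(alpha, lam, n, rng=None):
    """Simulate INAR(1): X_t = alpha o X_{t-1} + eps_t, where 'o' is
    binomial thinning and eps_t ~ Poisson(lam)."""
    rng = np.random.default_rng(rng)
    x = np.empty(n, dtype=int)
    x[0] = rng.poisson(lam / (1 - alpha))        # start near the stationary mean
    for t in range(1, n):
        x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)
    return x

x = simulate_inar1(alpha=0.5, lam=2.0, n=5000, rng=42)
print(x.mean())   # stationary mean is lam / (1 - alpha) = 4
```

Under the null of serial independence \(\alpha = 0\), the thinning term vanishes and the series reduces to i.i.d. innovations; the tests in the talk target exactly this boundary.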

*Keywords:* INAR model; Likelihood ratio test;

*Siuli Mukhopadhyay*, Department of Mathematics, IIT Bombay

In this talk, D-optimal Bayesian repeated measures designs for generalized linear models are discussed. Trials with \(t\) treatments and \(p\) periods, for \(t\leq p\), are considered. The designs proposed in this paper minimize the determinant of the variance of the estimated treatment effects over all possible allocations of the \(n\) subjects to the treatment sequences. In the regression model considered for the linear predictor, both the direct effect and the carryover effect of each treatment are included. It is assumed that the \(p\) observations from each subject are mutually correlated while the observations from different subjects are uncorrelated. The correlation between observations within subjects is modeled using a “working correlation structure”, assumed to be either auto-regressive or compound symmetric in nature. Since the main interest is in estimating the treatment effects, the subject effects are treated as nuisance parameters, and generalized estimating equations are used to estimate the marginal means. To address the issue of parameter dependence, a Bayesian approach is employed: a prior distribution is assumed on the parameter vector, and the D-optimal design criterion is integrated over this prior. Two numerical examples, one with binary outcomes in a 4 × 4 trial and another based on count data for a 2 × 2 trial, are used to illustrate the proposed method. The effect of the choice of prior distribution on the designs is also studied.

*Keywords:* Bayesian designs; Carry over effects; D-optimal design; Generalized Estimating Equations

*Thea Bjørnland, M.L. Langaas*, Mathematical Sciences, NTNU

Extreme Phenotype Sampling (EPS; also known as response-dependent or selective genotyping) is a sampling design used in genetic association studies of continuous traits when the number of individuals that can be genotyped is restricted. Extreme phenotype individuals are chosen for genotyping under the assumption that the power to detect causal genetic variants is greater in an extreme phenotype sub-cohort than in a randomly sampled sub-cohort. For example, if the aim is to investigate genetic association with waist circumference, the individuals in the highest and lowest quarters of the waist circumference population can be sampled for genotyping. In practice, case-control methods are often used for association testing: genotype frequencies are compared between the low and high extreme phenotype groups. However, the potential power gained by the extreme sampling can be lost under simplifications such as treating a continuous outcome as a binary variable. We present valid statistical tests and models for the EPS design with a continuous outcome, focusing on association with common genetic variants. Statistical methods for two EPS designs are of relevance. In the first, phenotype and non-genetic information are obtained, along with genetic information, only for extreme phenotype individuals. In the second, genetic information is obtained for extreme individuals, while phenotype and covariate information are obtained for a larger sample. In the latter design, likelihood methods for missing at random covariate data are applied, assuming a parametric model for the genetic covariate. Through simulation studies, we show that tests based on EPS data can have higher power than tests based on random samples to detect associations with common genetic variants. We also discuss how EPS data can be used in studies where the genotyped sample is used to analyze a phenotype other than the one that motivated the sampling.

*Keywords:* Extreme phenotype; Genetic association study; Likelihood methods; Missing data

*Stepan Mazur, Bodnar, Parolya*, Statistics, Lund University

In this poster we consider the asymptotic distributions of functionals of the sample covariance matrix and the sample mean vector obtained under the assumption that the matrix of observations has a matrix variate general skew normal distribution. The central limit theorem is derived for the product of the sample covariance matrix and the sample mean vector. Moreover, we consider the product of the inverse sample covariance matrix and the mean vector, for which a central limit theorem is established as well. All results are obtained under the large dimensional asymptotic regime, where the dimension \(p\) and the sample size \(n\) tend to infinity such that \(p/n \to c \in (0,1)\).

*Keywords:* Large dimensional asymptotics; Random matrix theory; Skew normal distribution; Stochastic representation

*Kevin Burke, MacKenzie*, Mathematics and Statistics, University of Limerick

The Proportional Hazards (PH) model is defined by \[\lambda(t\,|\,x_i) = \exp(x_i^T \beta) \lambda_0(t),\] where \(x_i = (x_{i1},\ldots,x_{ip})^T\) and \(\beta = (\beta_1,\ldots,\beta_p)^T\) are the vectors of covariates and regression coefficients respectively, and \(\lambda_0(t)\) is a baseline hazard function common to all individuals. This is the most popular regression model for survival data; however, non-PH effects are often encountered in practice. One common explanation for non-PH effects is the presence of unobservable subject-specific random effects, whereby the individual hazard function is given by \[\lambda(t\,|\,x_i, U_i) = U_i \lambda(t\,|\,x_i).\] Here \(U_i\) is the random effect, which is commonly assumed to have a gamma distribution. In this so-called “gamma-frailty” model, covariate effects may be proportional at the individual level (if \(\lambda(t\,|\,x_i)\) is of the PH form) but are non-proportional at the marginal level. That is, the individual-level hazard ratio is \[\frac{\lambda(t\,|\, x_{i1}=1, \tilde{x}_i, U_i)}{\lambda(t\,|\, x_{i1}=0, \tilde{x}_i, U_i)} = \exp(\beta_1),\] where \(\tilde{x}_i\) denotes all covariates other than \(x_{i1}\), but it can be shown that the marginal hazard ratio (which is what we actually observe in practice) is \[\frac{\lambda_m(t\,|\, x_{i1}=1, \tilde{x}_i)}{\lambda_m(t\,|\, x_{i1}=0, \tilde{x}_i)} = \exp(\beta_1)\,\rho(t),\] where \(\rho(t)\) is a function of time and \(\lambda_m(t)\) is the marginal hazard function. An alternative explanation is that covariates truly exhibit non-PH effects at the individual level (and, hence, at the marginal level). We introduce the “Multi-Parameter Regression” (MPR) model: a flexible parametric survival model which handles this situation.

Parametric regression models typically relate covariates to one distributional parameter of specific interest; for example, in generalized linear models the location parameter is regressed on covariates while other parameters (e.g., dispersion) are held constant. In the context of parametric survival analysis, the most commonly used regression models relate covariates to a distributional scale parameter. A more flexible approach is to regress multiple parameters simultaneously on covariates; we refer to this practice as “Multi-Parameter Regression” (MPR). For two-parameter (scale and shape) models we have \[g(\lambda_i) = x_i^T\beta, \quad\qquad h(\gamma_i) = z_i^T\alpha,\] where \(\lambda\) and \(\gamma\) are the scale and shape parameters, \(g(\cdot)\) and \(h(\cdot)\) are link functions, and \(x_i\), \(\beta\), \(z_i\) and \(\alpha\) are vectors of scale and shape covariates and regression coefficients. Consider the Weibull model with hazard function parameterised as \[\lambda(t) = \lambda\,\gamma\,t^{\,\gamma-1},\] where \(\lambda, \gamma > 0\). Thus, using the log-link to ensure positivity of the parameters, the Weibull MPR hazard is \[\lambda(t\,|\,x_i,\,z_i) = \exp(x_i^T\beta)\,\exp(z_i^T\alpha)\,t^{\,\exp(z_i^T\alpha)-1}\] and, hence, the hazard ratio is given by \[\frac{\lambda(t\,|\, c=1, \tilde{x}_i, \tilde{z}_i)}{\lambda(t\,|\, c=0, \tilde{x}_i, \tilde{z}_i)} = \exp(\beta_1+\alpha_1)\,t^{\,\exp(\tilde z_i^T\tilde\alpha)[\exp(\alpha_1)-1]},\] where \(c\) represents a covariate common to both the scale and shape regression components which, without loss of generality, is assumed to be in the first position of the vectors \(x_i\) and \(z_i\), so that \(\beta_1\) and \(\alpha_1\) are its scale and shape coefficients respectively, and \(\tilde z_i\) and \(\tilde\alpha\) denote the remaining shape covariates and their coefficients.
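The closed-form Weibull MPR hazard ratio above can be verified numerically. The coefficients and covariate values in this sketch are made up purely for illustration:

```python
import numpy as np

# Hypothetical coefficients and covariates, for illustration only.
beta = np.array([0.8, -0.3])    # scale coefficients (the common covariate c first)
alpha = np.array([0.4, 0.2])    # shape coefficients (c first)
x_tail = np.array([1.5])        # remaining scale covariates (x tilde)
z_tail = np.array([0.7])        # remaining shape covariates (z tilde)

def weibull_mpr_hazard(t, x, z):
    """Weibull MPR hazard: exp(x'beta) * exp(z'alpha) * t^(exp(z'alpha) - 1)."""
    gamma = np.exp(z @ alpha)
    return np.exp(x @ beta) * gamma * t ** (gamma - 1.0)

t = 2.5
num = weibull_mpr_hazard(t, np.r_[1.0, x_tail], np.r_[1.0, z_tail])  # c = 1
den = weibull_mpr_hazard(t, np.r_[0.0, x_tail], np.r_[0.0, z_tail])  # c = 0

# Closed form: exp(b1 + a1) * t^( exp(z~'a~) * (exp(a1) - 1) )
closed_form = (np.exp(beta[0] + alpha[0])
               * t ** (np.exp(z_tail @ alpha[1:]) * (np.exp(alpha[0]) - 1.0)))
print(num / den, closed_form)
```

The ratio is time-dependent whenever \(\alpha_1 \neq 0\), which is exactly how the shape regression component captures non-PH effects.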

The MPR approach directly generalises the parametric PH model to non-PH (i.e., time-dependent effects) status and provides a new test of proportionality. This flexible regression model can produce dramatic improvements in fit compared to the basic PH model and its frailty extension. Combining the MPR and frailty approaches leads to a still more general approach. Interestingly, this MPR-frailty model outperforms the MPR model in our real data example, showing that although both the MPR and frailty approaches allow for non-PH effects, neither obviates the need for the other.

Burke, K. and MacKenzie, G. (submitted). Multi-Parameter Regression Survival Models, Biometrics.

*Keywords:* frailty; non-proportional hazards; survival analysis; time-dependent effects

*Lina Dreižiene, Ducinskas*, Informatics and Statistics, Klaipeda University

The paper discusses the problem of classifying a Gaussian Markov random field (GMRF) observation into one of two populations. The classification rule is based on plug-in Bayes discriminant functions, in which the unknown parameters of the populations are replaced by maximum likelihood estimators obtained from a training sample. Bayes risk is one of the natural performance measures for Bayes discriminant functions; therefore the closed-form expression for the Bayes risk associated with the aforementioned classifier is derived, and an approximation of the expected risk is proposed. A simulation experiment is performed to demonstrate the accuracy of the derived approximation. GMRFs are sampled on a regular 2-dimensional lattice with respect to the neighbourhood structure, and the influence of the statistical parameters on the proposed approximation is studied, as is the influence of different neighbourhood structures.

*Keywords:* Actual risk; Bayes discriminant function; Gaussian Markov random field;

*Henrike Häbel, Nordin, Rudemo, Särkkä*, Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg

Studies on colloidal cluster aggregation have brought forth theories on the stability of colloidal gels and models for aggregation dynamics. However, the link between the theory describing the physics and chemistry involved in gel formation and experimentally obtained observations is still incomplete. This poster presents our research on colloidal particle aggregation in three dimensions, where real aggregates of nano-sized silica are investigated in comparison with two different models for particle aggregation. In particular, we study different silica particle weight percentages (5 and 9 wt%) and particle sizes (5 and 20 nm) under different aggregation conditions controlled by the water salinity (0.5 M and 0.9 M NaCl). The two analyzed aggregation processes are diffusion limited cluster aggregation (DLCA) and reaction limited cluster aggregation (RLCA). These processes are driven by the probability that particles aggregate upon collision, which is one in the DLCA regime and close to zero in the RLCA regime. In order to compare experimentally obtained and simulated aggregates, we use three-dimensional micrographs obtained by high-angle annular dark field scanning transmission electron microscopy tomography and tools from spatial statistics summarizing the spatial characteristics of clusters of particles. The spatial statistical analysis includes common summary functions from spatial statistics, such as the empty space function and Ripley’s \(K\)-function, as well as two recently developed summary functions for cluster analysis based on graph theory. As a result, an evaluation of the experimentally obtained aggregates is presented together with a comparison with the DLCA and RLCA aggregation models.

- C. Hamngren Blomqvist, C. Abrahamsson, T. Gebäck, A. Altskär, A.M. Hermansson, M. Nydén, S. Gustafsson, N. Lorén and E. Olsson (2015). ‘Pore size effects on convective flow and diffusion through nanoporous silica gels’.
*Colloids and Surfaces A: Physicochemical and Engineering Aspects*, **484**, 288–296. doi:10.1016/j.colsurfa.2015.07.032 - H. Häbel, A. Särkkä, M. Rudemo, C. Hamngren Blomqvist, E. Olsson, C. Abrahamsson and M. Nordin (2015). ‘From static micrographs to particle aggregation dynamics in three dimensions’.
*Journal of Microscopy*, Epub ahead of print. doi:10.1111/jmi.12349

*Keywords:* Colloidal cluster aggregation; Silica particle gels; Spatial statistical cluster analysis;

*Anna-Kaisa Ylitalo, Särkkä, Puurtinen, Huovinen*, Department of Music, University of Jyväskylä

Eye-tracking is a method for recording the movements of the gaze with equipment specifically designed for this purpose. Eye movement data contain information on the stops of the gaze, called fixations, that the viewer makes while looking at a stimulus. The method has traditionally been used in psychological studies of text reading, but its applications in other fields of research are increasing. However, most analyses are still based on measures that aggregate over time, e.g., the total number or the average duration of fixations. Thus, more sophisticated analytical tools are needed to truly get hold of these complex processes. In this study we examine eye movements during a special type of visual processing, namely the reading of music notation. Music reading has one characteristic that distinguishes it from the reading of text or the viewing of pictures: while reading and playing, the performance tempo does not allow the performer’s gaze to stay in one spot for long, but forces it to move forward according to the given time frame. Little is also known about how particular symbols in the notation affect the course of visual processing. Thus, the aim of this study is to develop and test ways of describing the visual processing by taking into account both the temporal limitations and the complexity of the score. The data consist of pianists performing simple melodies on an electric piano. Both the eye movements and the piano performances were recorded so that, at any given moment during a musical performance, we could measure how the location of the performer’s gaze on the musical score deviates from where the musical time is currently going. As a first step towards understanding this complex process, we illustrate the visual processing of music notation by merging fixation information with information on the musical performance.
Next, we study the characteristics of fixations made during a performance task, and examine what kinds of features of the performed melody affect them and how. Finally, we aim at modelling the course of visual processing during music reading.

*Keywords:* eye-tracking; fixation; modelling; visual processing

*Anna Jansone, Valeinis*, University of Latvia/ Faculty of Physics and Mathematics

We analyze Neyman’s smooth goodness-of-fit test for composite hypotheses in the independent case, and we propose a modification for the dependent case as well. Using Monte Carlo simulations, the consistency of the proposed test is shown for some ARMA processes. We compare the proposed test with some classical tests for dependent observations in a moderate-sample simulation study.
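For the simple-hypothesis building block of Neyman's smooth test, the statistic can be sketched with orthonormal Legendre polynomials on \([0,1]\); the composite and dependent cases treated in the talk require further adjustments, so this is only the basic ingredient:

```python
import numpy as np
from numpy.polynomial.legendre import legval

def neyman_smooth_stat(u, k=4):
    """Neyman's smooth test statistic for H0: U ~ Uniform(0,1), using
    orthonormal Legendre polynomials; ~ chi^2_k under a simple H0."""
    u = np.asarray(u)
    n = len(u)
    stat = 0.0
    for j in range(1, k + 1):
        c = np.zeros(j + 1)
        c[j] = 1.0
        # sqrt(2j+1) * P_j(2u - 1) is orthonormal on [0, 1]
        phi = np.sqrt(2 * j + 1) * legval(2 * u - 1.0, c)
        stat += (phi.sum() / np.sqrt(n)) ** 2
    return stat

rng = np.random.default_rng(3)
s0 = neyman_smooth_stat(rng.uniform(size=500))      # null: moderate, ~ chi^2_4
s1 = neyman_smooth_stat(rng.beta(2, 5, size=500))   # non-uniform: large
print(round(s0, 2), round(s1, 2))
```

In practice the data are first transformed to \([0,1]\) through the hypothesised (or estimated) distribution function before the statistic is computed.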

- D.R. Cox and D.V. Hinkley.
*Theoretical Statistics*. Chapman & Hall/CRC, 2000 - T. Inglot, W.C.M. Kallenberg, and T. Ledwina (1997). ‘Data Driven Smooth Tests for Composite Hypotheses’.
*The Annals of Statistics*, 25(3): 1222–1250 - A. Munk, J.-P. Stockis, J. Valeinis, and G. Giese (2011). ‘Neyman smooth goodness-of-fit tests for the marginal distribution of dependent data’.
*Annals of the Institute of Statistical Mathematics*, 63(5): 939–959

*Keywords:* ARMA process; data driven test; Neyman’s tests; Smooth test

*Hiba Nassar*, Lund University

In this work, we consider a version of the functional Hodrick-Prescott filter for functional time series. We show that the associated optimal smoothing operator preserves the ‘noise-to-signal ratio’ structure. Moreover, as the main result, we propose a consistent estimator of this optimal smoothing operator.

*Keywords:* adaptive estimation; functional Hodrick-Prescott filter; Hilbert space-valued Gaussian random variables;

*Nina Munkholt Jakobsen, Sørensen*, Department of Mathematical Sciences, University of Copenhagen

This poster is about parametric estimation for diffusion processes observed at high frequency over a fixed time interval. The processes are assumed to solve stochastic differential equations with an unknown parameter in the diffusion coefficient. We present conditions on approximate martingale estimating functions under which the estimators are consistent, rate optimal, and efficient under high frequency (in-fill) asymptotics. The asymptotic distributions of the estimators are seen to be normal variance-mixtures, where the mixing distribution generally depends on the full sample path of the diffusion process over the observation time interval. Utilizing stable convergence in distribution, we also present the more easily applicable result that, after a suitable data-dependent normalization, the estimators converge in distribution to a standard normal distribution. The theory is illustrated by a simulation study that compares an efficient and a non-efficient estimating function for both an ergodic and a non-ergodic model. The work presented in this poster is the topic of the paper Jakobsen & Sørensen (2015).

N. M. Jakobsen & M. Sørensen (2015) ‘Efficient Estimation for Diffusions Sampled at High Frequency Over a Fixed Time Interval’. *Forthcoming in Bernoulli Journal*.

*Keywords:*

*Ann-Sophie Buchardt, Ekstrøm, Christensen*, Department of Mathematical Sciences, University of Copenhagen

We consider data from a randomised controlled trial at Copenhagen University Hospital that aims to determine whether patients with acute leukaemia can benefit from a structured and supervised counselling and exercise program [1]. Intervention and control groups are followed over 12 weeks in order to evaluate the effect of exercise interventions, and the M.D. Anderson Symptom Inventory (MDASI) is administered once a week. Based on methods such as hierarchical clustering, we uncover clusters within the symptom responses, and given different such clusterings we form models of varying statistical quality. The longitudinal nature of the data gives rise to different dependence structures which should be assessed. Thus we compare results from the ordinal regression model, which assumes that responses are independent; from the generalised linear mixed effects model, which can be quite sensitive to variance structure specification; and from generalised estimating equations, which are used to estimate the parameters of a generalised linear model with a possibly unknown correlation structure between responses. As well as composing a model based on a priori determined clusters, we endeavour to uncover clusters from the modelling: modelling interactions between time and item allows for simultaneous estimation of effects and identification of clusters using penalised approaches. Finally, to examine whether the patients can benefit from the program, we analyse the data with a primary hypothesis about the contrast between the intervention and control groups, and we test the effect of additional covariates at hand.

- Jarden M, Møller T, Kjeldsen L, et al. Patient Activation through Counseling and Exercise–Acute Leukemia (PACE-AL)–a randomized controlled trial. BMC Cancer 2013;13:446. doi:10.1186/1471-2407-13-446.

*Keywords:*

*Kasper Kansanen, Vauhkonen, Lähivaara, Mehtätalo*, School of Computing, University of Eastern Finland

*Individual tree detection* (ITD) is a methodology in which individual tree crowns are detected algorithmically from remote sensing data of a forest, for example by fitting geometrical crown models. ITD methods leave smaller trees hiding below larger ones undetected, which leads to a partially detected germ-grain model. Mehtätalo (2006) presented a formula for the probability of detection, the so-called *detectability*, of a tree as a function of its crown size, and a stand density (process intensity) estimator derived utilizing the detectability; both were based on the Boolean model. In this work we present the detectability and estimator of Mehtätalo, and develop a new method to formulate the detectability. We assume that a tree remains undetected if the center point of its crown falls within an erosion set based on the larger tree crowns. These estimators allow a tree to be undetected even if a portion of its crown would be visible. A Horvitz-Thompson type density estimator is also presented. The presented estimators were tested using 36 field plots and their behaviour was quantified by the RMSE and the mean of the estimation errors. Clark-Evans indices for the plots were calculated and their relation to the estimation errors examined. The best estimator performed well compared to the estimator formed directly from the number of algorithmically detected trees: it produced a 54 percent reduction in RMSE and shifted the mean of the errors notably towards zero.
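The Horvitz-Thompson type density estimator mentioned above weights each detected tree by the inverse of its detectability. A toy sketch with a hypothetical detectability function (the actual detectability in the work is derived from the erosion-set construction):

```python
import numpy as np

def horvitz_thompson_density(detected_radii, detectability, area):
    """Horvitz-Thompson stand density estimate: each detected tree
    contributes 1/pi_i, where pi_i is its detection probability."""
    pi = np.array([detectability(r) for r in detected_radii])
    return np.sum(1.0 / pi) / area

# Hypothetical detectability: large crowns always seen, small ones often hidden.
detectability = lambda r: min(1.0, 0.2 + 0.4 * r)

radii = [0.5, 1.0, 2.0, 2.5]    # crown radii (m) of the detected trees
est = horvitz_thompson_density(radii, detectability, area=100.0)
print(est)                      # estimated trees per square metre
```

Each partially hidden tree thus stands in for the similar trees that went undetected, correcting the downward bias of a raw count.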

L. Mehtätalo (2006) ‘Eliminating the effect of overlapping crowns from aerial inventory estimates’. *Canadian Journal of Forest Research*, **36(7)**: 1649-1660.

*Keywords:* individual tree detection; stochastic geometry;

*Sam-Erik Walker, Prof. Lid Hjort*, Dept. of Mathematics, Univ. of Oslo

We present and discuss model selection and focused inference in the context of model misspecification and presence of outliers using various robust estimators. Of particular interest are the classes of robust minimum distance disparity type estimators and robust M-estimators such as the minimum density power divergences. Theoretical developments and some practical examples will be presented.

*Keywords:* Focused inference; Model selection; Robust estimators;

*Mareile Große Ruse, Samson, Ditlevsen*, Department of Mathematical Sciences, University of Copenhagen

Biological data are frequently characterized by repeated measurements on a collection of subjects, and models incorporating mixed effects have become a popular tool to capture both within- and between-subject variation. Models in which the within-subject variability is described by an ordinary differential equation (ODE) have found various applications, especially in pharmacokinetics/pharmacodynamics. A recent example is a study of selenium metabolism in the human body, where gamma imaging data of gastrointestinal compartments and the liver, together with plasma samples, are employed to study the initial selenium kinetics, and a delay-ODE model with mixed effects was successfully fitted. However, a more realistic and robust approach to modeling this kind of data is to include stochasticity in the model structure in order to account for uncertainty in the model itself (due to dynamics that may be unobserved or too complex to model). The natural stochastic generalization is given by the recently proposed stochastic differential mixed-effects models (SDMEMs). As opposed to their deterministic counterparts, the theory of parameter estimation in such models is still in its fledgling stages, especially due to inherent difficulties such as unknown transition densities, which challenge a standard maximum-likelihood estimation approach. We propose a method for maximum-likelihood parameter estimation in multi-dimensional inhomogeneous SDMEMs that are linear in the mixed effects but may be nonlinear in the state variable.

*Keywords:* approximate maximum likelihood; asymptotic normality; mixed effects; stochastic differential equations

*Oscar Espinosa*, Departamento de Estadística, Universidad Nacional de Colombia

This research shows that, under certain mathematical conditions, a threshold autoregressive (TAR) model can represent the leverage effect through its conditional variance function. Furthermore, analytical expressions for the third and fourth moments of the TAR model are obtained when it is weakly stationary. An empirical application is made in which TAR models are fitted using Nieto’s (2005) methodology and multivariate VAR-GARCH models are estimated through the A-BEKK approach for stock indexes in Brazil, Colombia and Japan. Finally, the two statistical models are compared, via conditional and unconditional moments, and in their representation of the leverage effect.

*Keywords:* Bayesian analysis; Leverage effect; Stationary nonlinear stochastic process; TAR model

*Raju Rimal, Prof. Sæbø*, IKBM, Norwegian University of Life Sciences

Modern science is generating enormous amounts of data, and new and advanced analytical methods are constantly being developed to cope with the challenge of extracting information from such ‘big data’. Researchers often use simulated data to assess and document the properties of these new methods, and in this paper we present *simrel-m*, a versatile and transparent tool for simulating linear model data with an extensive range of adjustable properties. The method is based on the concept of relevant components [1], which is equivalent to the envelope model [2]. It is a multi-response extension of *simrel* [3], and like *simrel*, the new approach is essentially based on random rotations of latent relevant components to obtain a predictor matrix \(\mathbf{X}\); in addition, we introduce random rotations of latent components spanning a response space in order to obtain a multivariate response matrix \(\mathbf{Y}\). The properties of the linear relation between \(\mathbf{X}\) and \(\mathbf{Y}\) are defined by a small set of input parameters which allow versatile and adjustable simulations. Subspace rotations also allow for generating data suitable for testing variable selection methods in multi-response settings. The method is implemented as an R package which serves as an extension of the existing *simrel* package [3].
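The core simulation device, randomly rotating latent relevant components into an observed predictor matrix, can be sketched as follows. This is a Python illustration of the idea only; the actual tool is an R package, and the response-space rotation is handled analogously:

```python
import numpy as np

def random_rotation(p, rng):
    """Random orthogonal matrix via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(p, p)))
    return Q * np.sign(np.diag(R))   # sign fix for a proper Haar-like draw

rng = np.random.default_rng(7)
n, p, n_relevant = 100, 8, 2

# Latent predictor components; only the first two carry signal for y.
Z = rng.normal(size=(n, p)) * np.r_[np.full(n_relevant, 2.0), np.ones(p - n_relevant)]
y = Z[:, :n_relevant] @ np.array([3.0, -1.0]) + rng.normal(size=n)

R = random_rotation(p, rng)
X = Z @ R.T            # observed predictors: a rotation of the latent components
print(np.allclose(X @ R, Z))   # the rotation is invertible
```

Because the rotation is orthogonal, the relevant subspace is preserved exactly while being hidden among all \(p\) observed predictors, which is what makes the simulated data useful for benchmarking variable selection.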

- Helland, I. S., & Almøy, T. (1994). ‘Comparison of prediction methods when only a few components are relevant’.
*Journal of the American Statistical Association*, 89(426), 583–591. - Cook, R. D., Helland, I. S., & Su, Z. (2013). ‘Envelopes and partial least squares regression’.
*Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, 75(5), 851–877. - Sæbø, S., Almøy, T., & Helland, I. S. (2015). ‘simrel—A versatile tool for linear model data simulation based on the concept of a relevant subspace and relevant predictors’.
*Chemometrics and Intelligent Laboratory Systems*, 146, 128–135.

*Keywords:* Linear Models; Multi-Response Simulation; Multivariate Simulation; R-Packages

*Gudmund Hermansen, Hjort*, University of Oslo

We extend the focused information criterion (FIC) to general classes of time series processes. In particular, this framework includes models of the type \(Y_t = m(x_t, \beta) + \epsilon_t\), built up via trend functions with covariates and errors of the stationary Gaussian time series kind. Compared to other classical information criteria, like the AIC and BIC, the FIC allows the precise intention (i.e. focus) of the analysis to be taken into account when selecting the model. With dependent data series, several interesting foci are naturally related to future time points, i.e. predictions, or are formulated conditionally on past observations, such as conditional threshold probabilities or conditional means. Such foci require a new and extended modelling framework, which leads to a proper generalisation of the original criterion. The methodology is illustrated and compared to traditional methods in both real and simulated data problems.

*Keywords:* focused inference; locally stationary processes; model selection; time series

*Kang Li, Ditlevsen, Bundesen*, MATH, University of Copenhagen

A fundamental question concerning the way the visual world is represented in our brain is how a cortical cell responds when its classical receptive field contains a plurality of stimuli. Two opposing models have been proposed. In the response-averaging model, the neuron responds with a weighted average of the responses to all individual stimuli, \(\sum\beta_k I_k(t)\). By contrast, in the probability-mixing model, the cell responds to each single stimulus \(S_k(t)\) with probability \(\alpha_k\). Here \(S_k(t)\) is the \(k\)th stimulus and \(I_k(t)\) is the response to \(S_k(t)\). We apply the probability-mixing and the response-averaging models to leaky integrate-and-fire (LIF) neurons to describe neuronal behavior based on observed spike trains. The first goal is to estimate the parameters of either of the two models from spike train data. The second goal is to distinguish the two models via model selection.

Parameter estimation of the LIF model is done by maximum likelihood using first-passage time probabilities of diffusion processes. We solve the first-passage time problem using four numerical methods, two using Fokker-Planck related partial differential equations (PDE) and two using Volterra integral equations (IE), and compare the performance of the four methods. Different types of spike response kernels and stimuli are used in the LIF model, which generate distinct spiking patterns. We also describe and compare two alternative methods for maximizing the likelihood function of the probability-mixing model, which are direct maximization of the marginal likelihood and the Expectation-Maximization (EM) algorithm. Finally, we show that the probability-mixing model and the response-averaging model can be distinguished in the LIF framework by comparing parameter estimates and conducting model selection with deviance information criterion (DIC) and uniform residual tests.
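The first-passage time problem at the heart of this setup can be illustrated by plain Monte Carlo: simulate the LIF diffusion until it hits the firing threshold. This is only a sketch with made-up parameter values, not the PDE or integral-equation solvers compared in the poster.

```python
import numpy as np

# Monte Carlo sketch of the spike (first-passage) time of a leaky
# integrate-and-fire neuron  dV = (-V/tau + I) dt + sigma dW,  with a spike
# when V crosses the threshold 1.  All parameter values are illustrative.
def lif_first_passage(tau=1.0, I=1.5, sigma=0.5, dt=1e-3, t_max=10.0,
                      n_paths=2000, seed=0):
    rng = np.random.default_rng(seed)
    v = np.zeros(n_paths)                 # membrane potentials
    alive = np.ones(n_paths, dtype=bool)  # paths that have not spiked yet
    fpt = np.full(n_paths, np.inf)        # first-passage (spike) times
    t = 0.0
    while t < t_max and alive.any():
        n = int(alive.sum())
        v[alive] += (-v[alive] / tau + I) * dt \
                    + sigma * np.sqrt(dt) * rng.standard_normal(n)
        t += dt
        crossed = alive & (v >= 1.0)
        fpt[crossed] = t
        alive &= ~crossed
    return fpt

fpt = lif_first_passage()
```

The empirical distribution of `fpt` approximates the first-passage density that the Fokker-Planck and Volterra-equation methods compute far more accurately.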

*Keywords:* Model selection; Parameter estimation; Probability-mixing; Response-averaging

*Lars Erik Gangsei*, Chemistry, Biotechnology and Food Science, NMBU

This poster is about a 3D expansion of Dijkstra’s algorithm. In general, Dijkstra’s algorithm is an algorithm for finding the shortest paths between nodes in a graph. The algorithm is a standard tool in (2D) image analysis, where the shortest path represents the border of objects of interest. In the 3D case some conditions of the original Dijkstra’s algorithm are clarified to ensure that the “shortest path” represents a complete connected surface. The algorithm is used for automatic segmentation and identification of the bones in CT images of live pigs, where the identified bones constitute the core for further detailed analysis. For illustration, an example of segmentation of the shoulder blade and upper arm bone is shown.
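For reference, the core algorithm that the poster extends to 3D can be sketched in a few lines; the weighted graph below is a hypothetical toy example, not image data.

```python
import heapq

# Standard Dijkstra shortest-path sketch on an adjacency-dict graph.
# The poster's 3D surface version adds extra connectivity conditions on top
# of this core, which are not reproduced here.
def dijkstra(graph, source):
    # graph: {node: [(neighbour, weight), ...]}
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, skip
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

dist = dijkstra({"a": [("b", 1), ("c", 4)],
                 "b": [("c", 2), ("d", 6)],
                 "c": [("d", 3)],
                 "d": []}, "a")
```

In the imaging application the nodes are pixels (or voxels) and the edge weights are derived from image intensities, so the cheapest path traces an object boundary.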

- Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische mathematik, 1:269–271.
- Gangsei, L. E. & Kongsro, J. (2016). Automatic segmentation of Computed Tomography (CT) images of domestic pig skeleton using a 3D expansion of Dijkstra’s algorithm. Computers and Electronics in Agriculture, 121:191-194.

*Keywords:* Computed Tomography; Dijkstra’s algorithm; Image analysis;

*Karl Bang Christensen*, Department of Biostatistics, University of Copenhagen

Confirmatory factor analysis (CFA) models provide a framework where factor structure based on prior knowledge is postulated and tested against data. They were originally developed for continuous indicators, but have been generalized to ordinal categorical indicators. In many research areas, notably educational testing, the Rasch (1960) model is often preferred due to the sufficiency of the (number correct) score. This poster describes a class of confirmatory factor analysis models that are based on the Rasch model and use the sufficiency property of the Rasch model for likelihood inference without relying on distributional assumptions. The confirmatory factor analysis (CFA) model is written in matrix form \[\mathbf{x}=\boldsymbol\beta+\boldsymbol\Lambda\boldsymbol\theta+\boldsymbol\epsilon\] where \(\mathbf{x}=(x_i)_{i\in I}\) is the vector of observed indicators, \(\boldsymbol\theta=(\theta_d)_{d\in D}\) the vector of latent variables and \(\boldsymbol\Lambda=(\lambda_{id})_{i\in I;d\in D}\) the matrix of factor loadings. Assuming that \(\boldsymbol\theta\sim N(0,\Sigma_{\theta})\) and that \(\boldsymbol\epsilon\sim N(0,\Sigma_{\epsilon})\) the structure imposed on \(COV(\mathbf{x})\) is tested against the empirical covariance. An often specified structure is based on \(\lambda_{id}>0\) for \(i\in I_d\) and zero otherwise, for a partition \(I=\cup_{d=1}^DI_d\) of the set of indicators into disjoint subsets. A CFA model for dichotomous indicators has been proposed based on a CFA model for \(\mathbf{x}^*\) and the assumption that \[x_i=\left\{ \begin{array}{ll} 1, & \hbox{if }x^*_i>\tau_i \\ 0, & \hbox{otherwise} \end{array}\right.\] This corresponds to a CFA model based on tetrachoric correlations (Muthén, 1978). The Rasch confirmatory factor analysis model is very similar in structure, but uses a logit rather than a probit link and imposes restrictions on the factor loadings.
It is given by \[P(\mathbf{X}=\mathbf{x}|\boldsymbol\theta)\propto\exp(\mathbf{t}'\boldsymbol\theta-\mathbf{x}'\boldsymbol\beta)\] where, in the simplest version, \(\mathbf{t}=(t_d)_{d\in D}\) with \(t_d=\sum_{i\in I_d}x_i\), making \(\mathbf{t}\) a sufficient statistic for \(\boldsymbol\theta\). This yields a class of models where conditional inference without distributional assumptions can be applied. The poster illustrates how likelihood inference based on the extended likelihood function (Tjur, 1982) can be applied in this class of models.
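The sufficient statistic \(\mathbf{t}\) is simply the vector of subscale sum scores; a minimal sketch with an invented item partition and response matrix:

```python
import numpy as np

# Sketch: sufficient statistics t_d = sum_{i in I_d} x_i for each latent
# dimension d, given dichotomous responses X (persons x items) and a
# partition of the item set.  Both the data and the partition are made up
# for illustration.
X = np.array([[1, 0, 1, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 1, 1, 1, 1]])
partition = {0: [0, 1], 1: [2, 3, 4]}   # I_1 = {items 0,1}, I_2 = {items 2,3,4}
t = np.column_stack([X[:, items].sum(axis=1)
                     for d, items in sorted(partition.items())])
# each row of t is one person's vector of subscale sum scores
```

Conditioning on these scores is what removes the latent variable distribution from the likelihood.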

- Muthén, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43, 551-560.
- Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish National Institute for Educational Research.
- Tjur, T. (1982). A Connection between Rasch’s item analysis model and a multiplicative Poisson model. Scandinavian Journal of Statistics, 9, 23–30.

*Keywords:* CFA; latent variable; Rasch model;

*Claes Andersson, Rajala, Särkkä*, Mathematical Sciences, Chalmers University of Technology

Diabetic neuropathy is a nerve damaging disorder caused by diabetes. Symptoms associated with the disorder are, among others, painful polyneuropathy and loss of sensation. Although there is no treatment for the disorder, measures can be taken to slow down the progress. Therefore, it is of importance to detect it at an early stage. Several studies have established the diagnostic value of epidermal nerve fibers (ENFs), the thin sensory nerve fibers in the outermost layer of the skin - the epidermis. These can be studied by acquiring skin biopsies in which the nerve fibers are traced using microscopy. It has been found that neuropathy is associated with a decrease in the number of nerve fibers per area of the skin, and a majority of the research has been focused on establishing a normative range for the nerve fiber density. Other studies have observed changes in the spatial structure of the nerve fiber patterns that seem to occur while the nerve count remains within the normative range. This was done using methods from spatial statistics, viewing the locations where the nerve fibers enter the epidermis and where they terminate, typically after branching, as realizations of spatial point processes. However, there has been little research on how single nerve fibers are affected by the neuropathy. In this work we investigate whether the data about the individual nerves can be used to discriminate between healthy subjects and subjects with diabetic neuropathy. The ENFs enter the epidermis as single nerve fibers crossing the junction between the epidermis and the dermis - the skin layer below the epidermis. They then extend into the epidermis, with or without branching, before they terminate. Thus, they form tree-like structures in the skin. We introduce summary measures both for the whole trees and for the individual segments a tree is composed of.
We then construct hierarchical models for these summary measures, together with the number of nerves, in which each subject is assumed to belong to one of \(n\) groups. The models are fitted using the EM-algorithm, and based on the fitted models we obtain a partition of the subjects. That is, we use the models to do unsupervised classification of the subjects. The models are then evaluated in terms of how well they separate the diabetic subjects from the healthy ones. We find that while the number of nerves is still the most important factor for detecting the neuropathy, using the information in the tree structures yields a large improvement. In particular, the information in the nerve trees is often a decisive factor for grouping diabetic subjects with nerve counts in the lower regime of the normative range together with the subjects showing more apparent signs of the neuropathy.

- W.R. Kennedy, G. Wendelschafer-Crabb and T. Johnson (1996) ‘Quantitation of epidermal nerves in diabetic neuropathy’. *Neurology*, **47**(4):1042-1048.
- G. Lauria, M. Bakkers, C. Schmitz, R. Lombardi, P. Penza, G. Devigili, A.G. Smith, S.T. Hsieh, S.I. Mellgren, T. Umapathi and D. Ziegler (2010) ‘Intraepidermal nerve fiber density at the distal leg: a worldwide normative reference study’. *Journal of the Peripheral Nervous System*, **15**(3):202-207.
- L.A. Waller, A. Särkkä, V. Olsbo, M. Myllymäki, I.G. Panoutsopoulou, W.R. Kennedy and G. Wendelschafer-Crabb (2011) ‘Second-order spatial analysis of epidermal nerve fibers’. *Statistics in Medicine*, **30**(23):2827-2841.
- M. Myllymäki, I.G. Panoutsopoulou and A. Särkkä (2012) ‘Analysis of spatial structure of epidermal nerve entry point patterns based on replicated data’. *Journal of Microscopy*, **247**(3):228-239.

*Keywords:* Diabetic neuropathy; Epidermal nerve fibers; Hierarchical models; Unsupervised classification

*Tatjana Pavlenko, Natalia Stepanova*, Mathematics, KTH Royal Institute of Technology

A new class of goodness-of-fit test statistics based on sup-functionals of weighted empirical processes is proposed. The weight functions used are Erdös-Feller-Kolmogorov-Petrovski upper class functions of a Brownian bridge. The whole class of test statistics is shown to be consistent against a fixed alternative. The proposed test is shown to achieve the detection boundary found by Ingster [Math. Meth. Statist. 6 (1997) 47-69] and, when distinguishing between the null and alternative hypotheses, to perform optimally adaptively to unknown sparsity and size of the non-null effects. We show that our test can be used for detecting sparse heterogeneous mixtures in high-dimensional settings where the dimensionality greatly exceeds the sample size. Examples include the problem of feature selection in a high-dimensional classification model when informative features are rare and weak.

*Keywords:* Feature selection; High-dimensional classification; Large scale inference; Rare and weak effects

*Roza Maghsood, Wallin*, Mathematical Statistics, Chalmers university

Driving events, such as maneuvers at slow speed and turns, are important for durability assessments of vehicle components. By counting the number of driving events, one can estimate the fatigue damage caused by the same kind of events. With knowledge of the distribution of driving events for a group of customers, vehicle producers can tailor the design of vehicles for that group. In this article, we propose an algorithm that can be applied on-board a vehicle to estimate online the expected number of driving events occurring, and which can thus be used to estimate the distribution of driving events for a group of customers. Since the driving events are not directly observed, the algorithm uses a hidden Markov model (HMM) to extract the events. The parameters of the HMM are estimated using an online EM algorithm. The introduction of the online version allows the algorithm to be installed on-board vehicles, which was not possible previously due to the high computational complexity of the regular EM algorithm. Typically, the EM algorithm is used to estimate a fixed parameter. By setting a non-decaying forgetting factor, the online EM becomes an adaptive algorithm. This is important in practice since driving conditions change over time and a single trip can contain different road types such as city and highway, making the assumption of fixed parameters unrealistic. Finally, we also derive a method to compute the expected damage online.
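The forgetting-factor idea can be isolated in a one-line recursion. The toy example below tracks a shift between two invented driving regimes with a constant (non-decaying) step size; it is not the poster's full online EM for HMMs, just the adaptive-statistic mechanism it relies on.

```python
import numpy as np

# Minimal illustration of the forgetting factor behind the online EM:
# a running statistic s_t = (1 - gamma) s_{t-1} + gamma * y_t.  With a
# decaying gamma_t the estimate converges (fixed-parameter case); with a
# constant gamma it keeps adapting when the regime changes.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 2000),   # e.g. city driving
                    rng.normal(5.0, 1.0, 2000)])  # e.g. highway driving
gamma = 0.01                                      # constant forgetting factor
s = 0.0
trace = []
for yt in y:
    s = (1 - gamma) * s + gamma * yt
    trace.append(s)
```

The effective memory is roughly `1/gamma` observations, so the statistic settles near 0 in the first regime and re-adapts to 5 after the change point.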

- O. Cappé (2011) ‘Online EM algorithm for hidden Markov models’. *Journal of Computational and Graphical Statistics*, **20**(3):728-749.
- O. Cappé, E. Moulines, and T. Rydén, editors (2005) ‘Inference in Hidden Markov Models’. *Springer*.
- R. Maghsood, I. Rychlik, and J. Wallin (2015) ‘Modeling extreme loads acting on steering components using driving events’. *Probabilistic Engineering Mechanics*, **41**.

*Keywords:* driving events; expected damage; Hidden Markov models; online EM algorithm

*Per Arnqvist, SdL Sjöstedt-de Luna*, Mathematics and mathematical statistics, Umeå university

A model-based functional method to cluster functions into k hidden groups is suggested. The functions are represented by B-splines and modeled in a likelihood approach. The stochastic spline coefficients are modeled using a normal assumption, where a different covariance structure is allowed for each cluster. An EM-algorithm inspired by the ideas of James and Sugar (JASA, 2003) is derived and implemented. The model allows unequally spaced observations as well as different numbers of observations for the sampled functions. The model also admits covariates. The method is illustrated by analyzing a varved lake sediment core from the lake Kassjön (N. Sweden). The sediment record consists of ~6400 varves, each varve around 0.5-1 mm thick. Image analysis was used to generate the observed data of yearly profiles (in terms of grey-scale variation), and varve thickness was measured as the number of pixels within a varve (year).
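The overall pipeline — represent each sampled function by basis coefficients, then cluster the coefficient vectors — can be sketched as follows. A polynomial basis stands in for B-splines and a two-means step for the full likelihood EM, so this is a simplification of the poster's model, not a reimplementation.

```python
import numpy as np

# Sketch of model-based functional clustering: fit each noisy curve with a
# small basis (here polynomials as a stand-in for B-splines), then cluster
# the coefficient vectors (here plain two-means instead of the EM).
rng = np.random.default_rng(2)
t = np.linspace(0, 1, 50)
curves = ([np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(50) for _ in range(10)]
          + [np.cos(2 * np.pi * t) + 0.1 * rng.standard_normal(50) for _ in range(10)])
coef = np.array([np.polyfit(t, y, deg=5) for y in curves])

# two-means (Lloyd's algorithm) on the coefficient vectors
centers = coef[[0, -1]]
for _ in range(20):
    d = np.linalg.norm(coef[:, None] - centers[None], axis=2)
    lab = d.argmin(axis=1)
    centers = np.array([coef[lab == k].mean(axis=0) for k in (0, 1)])
```

Working in coefficient space is what makes unequally spaced and unequally long samples manageable: every curve is reduced to a vector of fixed dimension.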

*Keywords:* climate change; functional data; model based clustering;

*Natalya Pya*, Mathematical Sciences, Nazarbayev University

Epidemiologic dose-response modelling is very useful for causal inference and for assessing relative risks. Typically, categorical analysis and simple parametric models for evaluating dose-response relationships are used by epidemiologists. In this work an approach to constraining the dose-response relationship to be monotonic is presented. In addition we show how the relative risks can be evaluated through the derivatives. Shape constrained additive modelling (SCAM) with constraints on the smooth terms of the linear predictor is employed, which can be split into three stages: representation of constrained smooth model terms using re-parameterized P-splines, model coefficient estimation by penalized log likelihood maximization, and smoothness selection by minimization of the generalized cross validation score (or similar criteria) (Pya and Wood, 2015). In comparison with parametric regression, SCAM allows for much more flexibility in building the relationship between the response and explanatory variables. This approach is applied to particulate pollution (PM10) and respiratory mortality counts for seniors (≥65 years) in greater London during 2002-2005.
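Monotonicity constraints can be illustrated with the simplest shape-constrained estimator, the pool-adjacent-violators algorithm for isotonic regression. SCAM builds the constraint into re-parameterized P-splines instead, so this is only a conceptual stand-in for the monotone-fit idea.

```python
# Pool-adjacent-violators (PAVA): the non-decreasing least-squares fit to a
# sequence.  Adjacent blocks whose means violate monotonicity are merged
# and replaced by their pooled mean.
def pava(y):
    out = []                              # stack of [block mean, block size]
    for value in y:
        out.append([value, 1])
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, n2 = out.pop()
            m1, n1 = out.pop()
            out.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    return [m for m, n in out for _ in range(n)]
```

For example, `pava([1, 3, 2, 4])` pools the violating pair into `[1, 2.5, 2.5, 4]`; a smooth monotone spline fit achieves the same ordering without the piecewise-constant look.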

- Pya, N. and Wood, S.N. (2015) ‘Shape constrained additive models’. *Statistics and Computing*, **25**(3):543-559.
- Wood, S. (2006) *Generalized Additive Models: An Introduction with R*. Boca Raton: Chapman and Hall.

*Keywords:* additive modelling; P-splines; shape constraints;

*Sandra Eriksson Barman, DB Bolin, HR Rootzén*, Mathematical sciences, Chalmers University of Technology

Drug release from oral tablets is determined by the porous structure of tablet coatings. Since the pore structure can be controlled by changing the manufacturing parameters, understanding how the pore structure affects transport properties is important for designing coatings with desired properties. We use a spatial statistical model to investigate how the transport properties of ethylcellulose/hydroxypropylcellulose coatings depend on the pore structure. The model is a thresholded Gaussian Markov random field, where the random field is non-stationary. We generate structures from the model that have varying pore sizes and shapes, and analyze statistically how the transport properties depend on the pore structure by using numerically simulated diffusion through the generated structures. We use a Markov Chain Monte Carlo algorithm to fit the model to confocal laser scanning microscope images of the coatings. The model is found to fit stationary parts of the images well.
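A stationary toy version of the generating idea — smooth white noise into a correlated Gaussian field, then threshold at a quantile to hit a target pore fraction — can be sketched as below. The poster's model is a non-stationary Gaussian Markov random field fitted by MCMC, which this sketch does not attempt; kernel size and porosity are invented.

```python
import numpy as np

# Sketch: thresholded Gaussian field as a binary porous-structure model.
# Smooth white noise with a moving-average kernel (via FFT, circular
# boundary) to get spatial correlation, then threshold at a quantile so a
# chosen fraction of the domain ("porosity") becomes pore space.
rng = np.random.default_rng(3)
noise = rng.standard_normal((128, 128))
k = np.ones((7, 7)) / 49.0                 # illustrative smoothing kernel
field = np.real(np.fft.ifft2(np.fft.fft2(noise) * np.fft.fft2(k, s=noise.shape)))
porosity = 0.3                             # illustrative target pore fraction
pores = field < np.quantile(field, porosity)   # binary pore structure
```

Varying the kernel changes pore size and shape, which is exactly the kind of knob one turns to study how structure drives the simulated diffusion.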

*Keywords:* Gaussian Markov random fields; Markov Chain Monte Carlo; Pharmaceutical coatings; Porous materials

*Jelena Valkovska, Valeinis*, University of Latvia

The receiver operating characteristic (ROC) curve is a popular method for assessing diagnostic test accuracy; it is broadly used in medical studies, machine learning, decision making, etc. The quantitative assessment of the accuracy of a diagnostic test is the area under the ROC curve (AUC). Popular methods for constructing confidence intervals for ROC curves, the whole and partial AUC, and their differences are: empirical likelihood (EL), jackknife empirical likelihood (JEL) and smoothed jackknife empirical likelihood (SJEL). The poster will summarize the methods mentioned above for the estimation of ROC curves and AUC. The comparison of two and more ROC curves will also be presented, using confidence intervals as well as confidence bands. Some new results for the partial AUC using the SJEL method will be presented. In many situations diagnostic decisions are not limited to a binary choice, so the notions of ROC curves and AUC were extended to tests with three diagnostic categories - the ROC surface and the volume under the surface (VUS). These methods will be presented on the poster too.
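The AUC itself has a simple empirical estimate via the Mann-Whitney statistic — the probability that a randomly chosen diseased subject's marker value exceeds a healthy subject's — which is the quantity the jackknife empirical likelihood constructions work with:

```python
import numpy as np

# Empirical AUC as the Mann-Whitney statistic: the fraction of
# (healthy, diseased) pairs where the diseased marker value is larger,
# with ties counted as 1/2.
def auc(healthy, diseased):
    h = np.asarray(healthy, dtype=float)[:, None]
    d = np.asarray(diseased, dtype=float)[None, :]
    return (d > h).mean() + 0.5 * (d == h).mean()
```

A perfectly separating marker gives 1.0, a useless one 0.5; e.g. `auc([0, 2], [1, 3])` is 0.75.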

*Keywords:* area under the ROC curve; ROC curves; ROC surface; smoothed jackknife empirical likelihood

*Per Westerlund, Hilber, Lindquist*, Electromagnetic engineering, KTH Royal Institute of Technology

In maintenance planning a crucial question is when some asset should be maintained. As preventive maintenance often needs outages, it is necessary to predict the condition of an asset until the next possible outage. The degradation of the condition can be modelled by a linear function. One method of estimating the condition is linear regression, which requires a number of measured values for different times and gives an interval within which the asset will reach a condition when it should be maintained [1]. A more sophisticated calculation of the uncertainty of the regression is presented based on [2, section 9.1]. Another method is martingale theory [3, chapter 24], which serves to deduce a formula for the time such that there is a probability of less than a given \(\alpha\) that the condition has reached 0 before that time. The formula contains an integral, which is evaluated numerically for different values of the measurement variance and the variance of the Brownian motion, which must be estimated by knowing the maximum and the minimum degradation per time interval. Then just one measured value is needed together with an estimate of the variance. The two methods are compared, especially with regard to the size of the confidence interval of the time when the condition reaches a predefined level. The application for the methods is the development of so-called health indices for the assets in an engineering system, which should tell which asset needs maintenance first. We present some requirements for a health index and check how the different predictions fulfil these requirements.
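The regression-based method amounts to fitting a linear degradation trend and extrapolating it to the maintenance threshold. A sketch with purely illustrative condition measurements:

```python
import numpy as np

# Sketch of the regression method: fit a linear trend to measured condition
# values and solve for the time at which the fitted line reaches a
# maintenance threshold.  All numbers are made up for illustration.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])          # measurement times
cond = np.array([10.1, 9.0, 8.2, 6.9, 6.1])      # measured condition
slope, intercept = np.polyfit(t, cond, 1)        # linear degradation trend
threshold = 2.0                                  # maintain at this level
t_hit = (threshold - intercept) / slope          # fitted line hits threshold
```

The uncertainty of `t_hit` follows from the prediction interval of the regression; the martingale method instead needs only one measurement plus a variance estimate for the underlying Brownian motion.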

- S.E. Rudd, V.M. Catterson, S.D.J. McArthur, and C. Johnstone. Circuit breaker prognostics using SF<sub>6</sub> data. In IEEE Power and Energy Society General Meeting, Detroit, MI, United States, 2011.
- Bernard W. Lindgren. Statistical theory. Macmillan, New York, 2nd edition, 1968.
- Jean Jacod and Philip Protter. Probability essentials. Springer-Verlag, Berlin, 2000.

*Keywords:* health index; linear regression; maintenance planning; martingale theory

*Jonas Wallin, Bolin*, Mathematical statistics, Chalmers/GU

Markov chain Monte Carlo (MCMC) algorithms are widely used for sampling from complicated distributions. One of the most popular MCMC algorithms is the Metropolis-Hastings random walk algorithm (MHRW). The popularity is due to it being very easy to implement. A sample of the typical MHRW is generated as follows: Given a density \(\pi\) and a previous sample \(x^{(t-1)}\), a new sample, \(x^{(t)}\), is generated by: \[ x^* \sim N(x^{(t-1)}, \Sigma), \] then \(x^{(t)}=x^*\) if \(\log(U) < \log \pi(x^*) - \log \pi(x^{(t-1)})\) where \(U \sim U[0,1]\), else \(x^{(t)}=x^{(t-1)}.\) For high dimensional distributions it is crucial that \(\Sigma\) is well chosen. The classical adaptive MHRW sets \(\Sigma\) to the empirical covariance matrix of the previous samples of the algorithm. However, as the dimension grows larger, the empirical covariance is slow to converge and sampling from the multivariate normal also becomes expensive. To address these two issues, we propose a parallelizable algorithm to estimate a sparse approximation of the inverse of the covariance matrix. We show, by example, that the method outperforms the regular adaptive MCMC.
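The classical adaptive MHRW that the poster improves upon can be sketched in one dimension, where the empirical covariance reduces to a variance; the target and tuning constants below are illustrative.

```python
import numpy as np

# One-dimensional sketch of the classical adaptive MHRW: the proposal scale
# is tuned from the empirical spread of the chain's own past samples
# (the 2.38 * sigma rule of thumb for d = 1).  Target: standard normal.
def log_pi(x):
    return -0.5 * x * x          # log-density up to a constant

rng = np.random.default_rng(4)
x, scale = 0.0, 1.0
samples = []
for t in range(20000):
    x_star = x + scale * rng.standard_normal()      # random-walk proposal
    if np.log(rng.uniform()) < log_pi(x_star) - log_pi(x):
        x = x_star                                  # accept
    samples.append(x)
    if t >= 500 and t % 100 == 0:                   # adapt after burn-in
        scale = 2.38 * np.std(samples)
samples = np.array(samples)
```

In high dimensions the `np.std` step becomes a full empirical covariance (plus a multivariate normal draw), which is exactly where the sparse inverse-covariance approximation pays off.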

*Keywords:* AMCMC; Cholesky estimation; Online estimation; Partial correlation

*Robyn Stuart, Froyland, Pollett*, Mathematical Sciences, University of Copenhagen

Metastable systems have dynamical processes operating on two or more different timescales. Typically there are fast processes which mix the state space and produce fast decorrelation of observables, and there are slow processes that often control switching between different (larger) parts of the state-space. Computational approaches to determine metastable sets as well as (almost) cyclic components are typically based around numerically accessible discretisations of the Perron-Frobenius operator. In this work, we adapt these methods to systems with leaks, and quantify the effect of the leakiness on the metastability properties of the system. We illustrate an application of these methods to a periodically forced two-dimensional system of ODEs.
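The discretisation idea can be illustrated on the smallest possible example: a stochastic matrix with two rarely-communicating blocks, standing in for a discretised Perron-Frobenius operator. The eigenvector of the second-largest eigenvalue reveals the metastable sets by its sign structure (a leaky system would replace the stochastic matrix by a sub-stochastic one).

```python
import numpy as np

# Spectral sketch of metastability detection: two fast-mixing blocks
# {0,1} and {2,3} with rare (0.01) switching between them.  The eigenvalue
# near 1 reflects the slow switching; its eigenvector's signs split the
# state space into the metastable sets.
P = np.array([[0.49, 0.49, 0.01, 0.01],
              [0.49, 0.49, 0.01, 0.01],
              [0.01, 0.01, 0.49, 0.49],
              [0.01, 0.01, 0.49, 0.49]])
w, v = np.linalg.eigh(P)            # P is symmetric here; w is ascending
u = v[:, -2]                        # eigenvector of second-largest eigenvalue
metastable_label = u > 0            # sign structure -> metastable sets
```

Here the second eigenvalue is 0.96: the closer it sits to 1, the longer the system lingers in each block before switching.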

*Keywords:*

*Abhishek Pal Majumder, Prof. Hannig*, Department of Mathematical Science, University of Copenhagen

Generalized Fiducial Inference (GFI; Hannig, 2009) is motivated by R.A. Fisher’s approach (Fisher, 1933, 1935) of obtaining posterior-like distributions when there is no prior information available for the unknown parameter. Without using Bayes’ theorem, GFI proposes a distribution on the parameter space. In this poster we study the frequentist exactness property of the Generalized Fiducial Distribution (GFD) for uni-parameter cases, along with several positive examples (including the shape parameter of a Gamma distribution), under a monotonicity assumption which fails to hold in some non-regular situations, like the mean of a scaled normal population or the correlation coefficient in bivariate normal examples. Motivated by these, we analyze general regularity conditions under which the GFD becomes first and second order exact.

- Jan Hannig, “On generalized fiducial inference.” Statistica Sinica, 19(2):491, 2009.
- R.A. Fisher, “The concepts of inverse probability and fiducial probability referring to unknown parameters.” Proceedings of the Royal Society of London, Series A, 139(838):343–348, 1933.
- R.A. Fisher, “The fiducial argument in statistical inference.” Annals of Eugenics, 6(4):391–398, 1935.

*Keywords:* Bayes’ Theorem; Fiducial Distribution;

*Liis Starkopf, A Gerds, Lange*, Section of Biostatistics, University of Copenhagen

The advances in the causal inference literature, in particular the counterfactual framework, have led to precise definitions and conditions for identification and estimation of direct and indirect effects in mediation analysis. This development has also led to a number of distinct estimation strategies along with software implementations. In this paper we compare the important estimation approaches, both in terms of mathematical details and small sample behaviour (i.e. a simulation study). We examine the following:

- Traditional product of coefficients approach described by Baron and Kenny.
- Closed-form expressions derived from the mediation formula; software solutions have been made available by Valeri et al. as SAS and SPSS macros.
- Generic approach based on Monte-Carlo integration methods implemented by Tingley et al. in the R package **mediation**.
- Approaches proposed by Lange, Vansteelandt et al. using natural effect models for nested counterfactuals, implemented in the R package **medflex**.
- The inverse odds ratio-weighted estimation proposed by Tchetgen Tchetgen.

- Tyler VanderWeele and Stijn Vansteelandt. ‘Conceptual issues concerning mediation, interventions and composition.’ *Statistics and its Interface*, **2**:457–468, 2009.
- Judea Pearl. ‘Direct and indirect effects.’ In *Proceedings of the seventeenth conference on uncertainty in artificial intelligence*, pages 411–420. Morgan Kaufmann Publishers Inc., 2001.
- Kosuke Imai, Luke Keele, and Teppei Yamamoto. ‘Identification, inference and sensitivity analysis for causal mediation effects.’ *Statistical Science*, pages 51–71, 2010.
- Reuben M Baron and David A Kenny. ‘The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations.’ *Journal of Personality and Social Psychology*, **51**(6):1173, 1986.
- Linda Valeri and Tyler J VanderWeele. ‘Mediation analysis allowing for exposure–mediator interactions and causal interpretation: Theoretical assumptions and implementation with SAS and SPSS macros.’ *Psychological Methods*, **18**(2):137, 2013.
- Dustin Tingley, Teppei Yamamoto, Kentaro Hirose, Luke Keele, and Kosuke Imai. ‘**mediation**: R package for causal mediation analysis.’ *Journal of Statistical Software*, **59**(5):1–38, 2014.
- Theis Lange, Stijn Vansteelandt, and Maarten Bekaert. ‘A simple unified approach for estimating natural direct and indirect effects.’ *American Journal of Epidemiology*, **176**(3):190–195, 2012.
- Johan Steen, Tom Loeys, Beatrijs Moerkerke, and Stijn Vansteelandt. ‘**medflex**: Flexible mediation analysis using natural effect models’, 2015. R package version 0.5-0.
- Eric J. Tchetgen Tchetgen. ‘Inverse odds ratio-weighted estimation for causal mediation analysis.’ *Statistics in Medicine*, **32**(26):4567-4580, 2013.

*Keywords:* mediation analysis;

*Mingan Yang*, Biostat, San Diego State University

In this article, we use a Bayesian nonparametric approach for joint variable selection of both fixed and random effects in the presence of nonignorable dropout. We integrate both a shrinkage prior, which is expressed as a scale mixture of normal distributions, and the mixed G-prior in our approach. In this way, we greatly improve efficiency and accuracy. In particular, we show that our approach greatly decreases bias in estimation and selection for nonignorable missing data. A stochastic search Gibbs sampler is implemented for variable selection. We further illustrate the proposed approach using simulated data and a real example.

*Keywords:* Bayesian model selection; mixed effects model; stochastic search; variable selection

*Øyvind Hoveid*, NIBIO

In linear state space models estimation of parameters, \(\theta\), presumes that all latent variables, \(x\), are Gaussian and can be integrated out analytically so that the likelihood of observations in the learning set, \(y\), conditional on parameters, \(\pi(y|\theta) = \int \pi(y|x,\theta) \pi(x|\theta) dx\), can be maximized. In models with state variables that are non-linearly related or non-Gaussian distributed for theoretical reasons, analytic integration is in general impossible. In such cases the parameters also need to be considered stochastic and furnished with informative priors, \(\pi(\theta|\psi)\), that keep the joint likelihood \(\pi(y|x,\theta) \pi(x|\theta) \pi(\theta|\psi)\) well-defined and bounded for given hyper-parameters, \(\psi\). Typically, determinants of precision matrices should be non-zero; inverse Wishart distributions as priors ensure this requirement.

A general Bayesian procedure for estimation is to compute the Laplace approximate distribution at the mode of the joint likelihood, and recover approximations to the posterior distribution \(\pi(x,\theta|y)\) with importance sampling from that of Laplace. However, the Bayesian procedure does not respect the non-linearities of the prior, and thus the posterior is not useful as a prior for another batch of observations. The remedy is to compute an approximate posterior distribution, \(\pi(x,\theta|\widehat{\psi})\), which on the one hand is as close as possible in Kullback-Leibler divergence to the true posterior, and on the other hand respects the non-linearities of the prior with a new vector of (posterior) hyper-parameters, \(\widehat{\psi}\). Effectively, the likelihood of the posterior hyper-parameters is maximized over a weighted sample from the Laplace distribution. Additional batches of observations can then provide further improvements of the posterior.
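The Laplace-plus-importance-sampling step can be sketched on a toy one-dimensional posterior, here an unnormalised Gamma(3, 2) whose true mean 1.5 is known; the mode, Hessian and inflation factor are specific to this toy target.

```python
import numpy as np

# Sketch: Laplace approximation at the mode, then self-normalised
# importance sampling to correct it.  Toy target: Gamma(3, 2) up to a
# constant, with mode 1 and log-density Hessian -2 there.
def log_post(x):
    return 2.0 * np.log(x) - 2.0 * x

mode = 1.0                                  # solves d/dx log_post = 2/x - 2 = 0
hess = -2.0                                 # d2/dx2 log_post at the mode
sd = 2.0 * np.sqrt(-1.0 / hess)             # Laplace sd, inflated for safety

rng = np.random.default_rng(6)
xs = rng.normal(mode, sd, 200000)           # draws from the Laplace proposal
xs = xs[xs > 0]                             # restrict to the target's support
log_w = log_post(xs) + 0.5 * ((xs - mode) / sd) ** 2   # log(target/proposal)
w = np.exp(log_w - log_w.max())
w /= w.sum()
post_mean = np.sum(w * xs)                  # weighted estimate, close to 1.5
```

The reweighted sample is exactly the kind of weighted Laplace sample over which the posterior hyper-parameters \(\widehat{\psi}\) would then be fitted.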

The procedure is demonstrated with a case of food prices determined in dynamic stochastic equilibrium.

*Keywords:*

*Jouni Helske, Vihola*, Department of Mathematics and Statistics, University of Jyväskylä

State space models offer a flexible framework for time series analysis. Popular special cases include Gaussian structural time series models (Harvey 1989) and their non-Gaussian generalizations. This poster introduces a new `R` (R Core Team 2016) package `bssm` (Helske and Vihola 2016) for Bayesian analysis of exponential family state space models where the state dynamics are Gaussian but the observational density is either Gaussian, Poisson, binomial, or negative binomial. The main focus is on easy-to-use and efficient forecasting functions for structural time series models. The general form of the supported models is \[
\begin{aligned}
p(y_t | Z_t \alpha_t, x'_t\beta) \qquad (\textrm{observation equation})\\
\alpha_{t+1} = T_t \alpha_t + R_t \eta_t, \qquad (\textrm{transition equation})
\end{aligned}
\] where \(\eta_t \sim N(0, I_k)\) and \(\alpha_1 \sim N(a_1, P_1)\) independently of each other, \(x'_t\beta\) contains the regression component at time \(t\), and \(p(y_t | Z_t \alpha_t, x'_t\beta)\) belongs to the exponential family. The Bayesian machinery behind `bssm` is based on the Markov chain Monte Carlo (MCMC) framework, using adaptive random walk Metropolis updating by the RAM algorithm (Vihola 2012). This allows robust modelling where the user does not need to be too concerned about the ‘tuning’ of the MCMC algorithm. For computational efficiency, all main algorithms are written in C++. We introduce the main algorithms behind the `bssm` package, and give a short illustration on modelling the yearly deaths by drowning in Finland.
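To make the model class concrete, simulating from its simplest member — a local level (random walk) state with Poisson observations — takes only a few lines. This is a generic sketch with invented parameter values, not code from the `bssm` package.

```python
import numpy as np

# Simulation sketch of the simplest supported model:
#   alpha_{t+1} = alpha_t + eta_t,   eta_t ~ N(0, sigma_eta^2)
#   y_t ~ Poisson(exp(alpha_t))
# i.e. a Poisson local-level model; parameter values are illustrative.
rng = np.random.default_rng(5)
n, sigma_eta = 200, 0.05
alpha = np.empty(n)
alpha[0] = np.log(10.0)                      # a_1: start near intensity 10
for t in range(n - 1):
    alpha[t + 1] = alpha[t] + sigma_eta * rng.standard_normal()
y = rng.poisson(np.exp(alpha))               # observed counts
```

Fitting such a model is where the MCMC machinery comes in: the Gaussian state is latent and only the counts `y` are observed.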

- Harvey, Andrew C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
- Helske, Jouni, and Matti Vihola (2016). bssm: Bayesian Inference of Exponential Family State Space Models. R package version 0.0.3. http://www.github.com/helske/bssm.
- R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
- Vihola, Matti. 2012. “Robust Adaptive Metropolis Algorithm with Coerced Acceptance Rate.” Statistics and Computing 22 (5): 997–1008. doi:10.1007/s11222-011-9269-5.

*Keywords:* Bayesian inference; forecasting; R; time series

*Gry Sparre, Christensen*, University of Copenhagen

In recent years there has been a growing interest in the exercise oncology field, where physical activity has been used as a strategy for rehabilitation in cancer patients to remedy disease and treatment related symptoms and side effects.

Patients with acute leukemia experience a substantial symptom burden and are at risk of developing infections throughout the course of treatment, which consists of repeated cycles of intensive chemotherapy. To date, there are no clinical practice exercise guidelines for patients with acute leukemia undergoing the first parts of chemotherapy. Patients with prostate cancer are treated with Androgen Deprivation Therapy (ADT), but adverse musculoskeletal and cardiovascular effects of ADT are widely reported.

In this thesis, we will study four data sets from two studies. For the ‘FC Prostate’ study we use Role Emotional Functioning data and Fatigue data; for the PACE-AL study we use Patient Activation Measure (PAM) and General Self-Efficacy (GSE) data. We examine the longitudinal person reported outcomes of these questionnaires concerning the patients’ health status. The patients are divided into two groups, A or B. To examine the data we use Item Response Theory (IRT) models, and to see if there is an effect of exercise treatment we use different kinds of IRT models as well as latent regression models. We will show how to estimate the IRT models in theory, but also give applied examples in `R` using different packages.

Through the analysis of the data sets we found that there was no significant difference between the groups for the Role Emotional data, but a single patient experienced an increase in the level of health over time. The model used for the Role Emotional Functioning data did not fulfill the assumption of local independence, so we cannot trust this result. For the Fatigue questionnaire we found only one patient, in group A, to have an increase in the level of fatigue over time, whereas two patients had a decrease in the level of fatigue, one from each group. The Divide-by-total model and the Difference model led to the same conclusion, namely that there was no significant difference between the groups.

The best model for the Patient Activation Measure data did not fulfill the local independence assumption, but we solved this by removing two items from the analysis. None of the patients had a significant change in their level of health over time. The analysis showed no significant difference between the groups. For the General Self-Efficacy data we found a significant difference between the groups for the Divide-by-total models, and we found that four patients had a decrease in the level of self-efficacy, two in each group, while 14 patients had an increase in the level of self-efficacy over time, with 13 of those in group A. Since the model did not fulfill the assumption of local independence, we cannot trust these results.

We compared the packages used for the IRT analysis and we found the `TAM` package to be the most reliable, stable and versatile package.

For further research we suggest implementation of the two-dimensional Graded Response Model in `R`, in order to be able to compare the two types of IRT models. We also suggest a theoretical boundary of the Q3 statistic, since for now we only have a rule of thumb, and this might not hold for smaller data sets.