Abstracts 2006
In many applications it is known that the underlying smooth function is constrained to have a specific form. In the present paper, we propose an estimation method based on the regression spline approach, which allows concavity or convexity constraints to be included in an appealing way. Instead of using linear or quadratic programming routines, we handle the required inequality constraints on basis coefficients by boosting techniques. To this end, recently developed componentwise boosting methods for regression are applied, which allow the restrictions to be controlled in each iteration. The proposed approach is compared to several competitors in a simulation study. We also consider a real-world data set.
Key words: Shape constrained smoothing, Concavity, Regression splines, Boosting.
In general, the risk of an extreme outcome in financial markets can be expressed as a function of the tail copula of a high-dimensional vector after standardizing the marginals. Hence it is important to model and estimate tail copulas. Even in moderate dimensions, nonparametric estimation of a tail copula is very inefficient, and fitting a parametric model to tail copulas is not robust. In this paper we propose a semi-parametric model for tail copulas via an elliptical copula. Based on this model assumption, we propose a novel estimator for the tail copula, which proves favourable compared to the empirical tail copula, both theoretically and empirically.
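The empirical tail copula used as the benchmark can be computed directly from ranks. The following is a minimal sketch. It assumes the common rank-based form $\hat R(x,y)=k^{-1}\sum_{i=1}^n 1\{n+1-R_i^{(1)}\le kx,\ n+1-R_i^{(2)}\le ky\}$, with a user-chosen number $k$ of upper order statistics. The function names are illustrative, and the elliptical-copula estimator proposed in the paper is not implemented here.

```python
def ranks(values):
    """Return 1-based ranks of values (ties broken by order of appearance)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def empirical_tail_copula(xs, ys, k, x=1.0, y=1.0):
    """Rank-based empirical (upper) tail copula evaluated at (x, y).

    Counts observations that lie jointly among roughly the k*x largest values
    of the first margin and the k*y largest values of the second margin, then
    divides by the number k of tail observations (assumed form, see lead-in).
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have equal length")
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    hits = sum(
        1 for a, b in zip(rx, ry)
        if n + 1 - a <= k * x and n + 1 - b <= k * y
    )
    return hits / k
```

Comonotone samples give 1, and countermonotone samples give 0 whenever $k<n/2$.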
For an AR(1) process with ARCH(1) errors, we propose empirical likelihood tests of whether the sequence is strictly stationary but has infinite variance, whether it is an ARCH(1) sequence, or whether it is an iid sequence. Moreover, an empirical likelihood based confidence interval for the parameter in the AR part is proposed. None of these results requires more than a finite second moment of the innovations. This includes the case of t-innovations with any number of degrees of freedom larger than 2, which serves as a prominent model for real data.
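The sketch below makes the empirical likelihood idea concrete. It computes the $-2\log$ empirical likelihood ratio for a scalar estimating equation and inverts it into a confidence interval for the AR coefficient. Several pieces are assumptions for illustration: the least-squares estimating terms $g_t(\phi)=(X_t-\phi X_{t-1})X_{t-1}$, the ARCH(1) simulator with Gaussian innovations, the grid search and all function names. The tests proposed in the paper rest on their own estimating equations.

```python
import math
import random


def el_log_ratio(z, tol=1e-12, max_iter=200):
    """Return -2 log R for H0: E[z] = 0, with R the empirical likelihood ratio.

    The Lagrange multiplier lam solves sum z_i / (1 + lam * z_i) = 0. The
    left-hand side is strictly decreasing in lam, so bisection on the interval
    where all implied weights 1 / (n (1 + lam * z_i)) stay positive finds it.
    """
    zmin, zmax = min(z), max(z)
    if not zmin < 0.0 < zmax:
        return math.inf  # 0 lies outside the convex hull of the z_i
    lo, hi = -1.0 / zmax, -1.0 / zmin
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if sum(zi / (1.0 + mid * zi) for zi in z) > 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


def ar1_estimating_terms(x, phi):
    """Hypothetical least-squares terms g_t(phi) = (x_t - phi x_{t-1}) x_{t-1}."""
    return [(x[t] - phi * x[t - 1]) * x[t - 1] for t in range(1, len(x))]


def el_confidence_interval(x, grid, crit=3.841):
    """Grid approximation of {phi : -2 log R(phi) <= crit} (3.841: chi2(1), 95%)."""
    inside = [phi for phi in grid if el_log_ratio(ar1_estimating_terms(x, phi)) <= crit]
    return (min(inside), max(inside)) if inside else None


def simulate_ar1_arch1(n, phi, omega, alpha, seed=0, burn_in=100):
    """Simulate x_t = phi x_{t-1} + e_t, e_t = sqrt(omega + alpha e_{t-1}^2) eta_t."""
    rng = random.Random(seed)
    x, e_prev, path = 0.0, 0.0, []
    for _ in range(n + burn_in):
        e = math.sqrt(omega + alpha * e_prev ** 2) * rng.gauss(0.0, 1.0)
        x = phi * x + e
        e_prev = e
        path.append(x)
    return path[burn_in:]
```

As a check on the weights, for $z=(-1,2)$ the constraint forces weights $2/3$ and $1/3$, so the statistic equals $2\log(9/8)$.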
Recently there has been an increasing interest in applying elliptical distributions to risk management. Under weak conditions, Hult and Lindskog (2002) showed that a random vector with an elliptical distribution is in the domain of attraction of a multivariate extreme value distribution. In this paper we study two estimators for the tail dependence function, one based on extreme value theory and one based on the structure of an elliptical distribution. After deriving second order regular variation estimates and proving asymptotic normality for both estimators, we show that the estimator based on the structure of an elliptical distribution is better than the one based on extreme value theory in terms of both asymptotic variance and optimal asymptotic mean squared error. Our theoretical results are confirmed by a simulation study.
In this article we introduce a latent variable model (LVM) for mixed ordinal and continuous responses, where covariate effects on the continuous latent variables are modelled through a flexible semiparametric predictor. We extend existing LVMs with simple linear covariate effects by including nonparametric components for nonlinear effects of continuous covariates and interactions with other covariates, as well as spatial effects. Fully Bayesian modelling is based on penalized spline and Markov random field priors and is performed by computationally efficient Markov chain Monte Carlo (MCMC) methods. We apply our approach to a large German social science survey which motivated our methodological development.
Keywords: Latent variable models, mixed responses, penalized splines, spatial effects, MCMC.
Microaggregation is one of the most important statistical disclosure control techniques for continuous data. The basic principle of microaggregation is to group the observations in a data set and to replace them by their corresponding group means. In this paper, we consider single-axis sorting, a frequently applied microaggregation technique where the formation of groups depends on the magnitude of a sorting variable related to the variables in the data set. The paper deals with the impact of this technique on a linear model in continuous variables. We show that parameter estimates are asymptotically biased if the sorting variable depends on the response variable of the linear model. Using this result, we develop a consistent estimator that removes the aggregation bias. Moreover, we derive the asymptotic covariance matrix of the corrected least squares estimator.
Keywords: Asymptotic variance, consistent estimation, disclosure control, linear model, microaggregation, sorting variable.
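Single-axis sorting itself is easy to state in code. The sketch below sorts the records by a sorting variable, forms consecutive groups and replaces every value by its group mean. The function name is an assumption, as is the rule of merging leftover records into the last group. The sketch does not implement the paper's bias-corrected estimator.

```python
def single_axis_microaggregation(rows, sort_key, group_size):
    """Replace each record by the mean of its group after single-axis sorting.

    rows: list of equal-length numeric tuples; sort_key: function mapping a
    record to the sorting variable; group_size: minimum records per group.
    Leftover records are merged into the last group (assumed convention).
    The result is returned in the original record order.
    """
    n = len(rows)
    if group_size < 1 or n < group_size:
        raise ValueError("need 1 <= group_size <= number of records")
    order = sorted(range(n), key=lambda i: sort_key(rows[i]))
    n_groups = n // group_size
    result = [None] * n
    for g in range(n_groups):
        start = g * group_size
        stop = n if g == n_groups - 1 else start + group_size
        members = order[start:stop]
        means = [sum(rows[i][j] for i in members) / len(members)
                 for j in range(len(rows[0]))]
        for i in members:
            result[i] = means[:]
    return result
```

When the sorting key is chosen to depend on the response (for example `lambda r: r[0]` with the response in column 0), this is the setting in which the abstract reports asymptotically biased least squares estimates.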
Most epidemiological studies suffer from misclassification in the response and/or the covariates. Since ignoring misclassification induces bias in the parameter estimates, correcting for such errors is important. For measurement error, the continuous analog of misclassification, a general approach for bias correction is SIMEX (simulation extrapolation), originally suggested by Cook and Stefanski (1994). This approach has recently been extended by Küchenhoff et al. (2005) to regression models with a possibly misclassified categorical response and/or categorical covariates; the extension is called the MC-SIMEX approach. To assess the importance of a regressor, not only its (corrected) estimate but also its standard error is needed. For the original SIMEX approach, Carroll et al. (1996) developed a method for estimating the asymptotic variance. Here we derive asymptotic variance estimators for the MC-SIMEX approach, extending the methodology of Carroll et al. (1996). We also include the case where the misclassification probabilities are estimated in a validation study. An extensive simulation study shows the good performance of our approach. The approach is illustrated using an example from caries research involving a logistic regression model in which the response and a binary covariate are possibly misclassified.
Keywords: Misclassification, SIMEX approach, variance estimation.
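The simulation and extrapolation steps of MC-SIMEX can be illustrated on the simplest possible target: the proportion of ones in a binary variable misclassified with a known symmetric probability $p$. This is a minimal sketch under three assumptions. The misclassification matrix $\Pi$ is a symmetric $2\times 2$ matrix, so $\Pi^\lambda$ has off-diagonal entry $(1-(1-2p)^\lambda)/2$. The extrapolation function is quadratic. The function names are hypothetical. Regression models and the variance estimators derived in the paper are not covered.

```python
import random


def flip_probability(p, lam):
    """Off-diagonal entry of Pi**lam for Pi = [[1-p, p], [p, 1-p]] (needs p < 0.5)."""
    return (1.0 - (1.0 - 2.0 * p) ** lam) / 2.0


def fit_quadratic(xs, ys):
    """Least-squares coefficients (a, b, c) of y = a + b*x + c*x**2."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    A = [[s[r + c] for c in range(3)] for r in range(3)]  # normal equations
    rhs = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    for col in range(3):  # Gaussian elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        coef[r] = (rhs[r] - sum(A[r][c] * coef[c] for c in range(r + 1, 3))) / A[r][r]
    return coef


def mc_simex_proportion(observed, p, lambdas=(0.5, 1.0, 1.5, 2.0), B=20, seed=0):
    """MC-SIMEX estimate of P(Y = 1) from misclassified 0/1 data."""
    rng = random.Random(seed)
    n = len(observed)
    grid, means = [0.0], [sum(observed) / n]  # lambda = 0: naive estimate
    for lam in lambdas:
        q = flip_probability(p, lam)
        total = 0.0
        for _ in range(B):
            # simulation step: misclassify the observed data further with Pi**lam
            total += sum(1 - y if rng.random() < q else y for y in observed) / n
        grid.append(lam)
        means.append(total / B)
    a, b, c = fit_quadratic(grid, means)
    return a - b + c  # extrapolation step: fitted quadratic at lambda = -1
```

Quadratic extrapolation only approximates the true curve $\lambda\mapsto$ E(estimate). With $p=0.2$ and a true proportion of 0.3 it therefore removes most, though not all, of the naive bias.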
Count data often exhibit overdispersion and/or require an adjustment for zero outcomes with respect to a Poisson model. Zero-modified Poisson (ZMP) and zero-modified generalized Poisson (ZMGP) regression models are useful classes of models for such data. So far, only score tests have been used in the literature to test whether this adjustment is necessary. For this testing problem, a simulation study shows how poorly the corresponding score test can perform compared with Wald and likelihood ratio (LR) tests. In particular, in the ZMP case the score test suffers a worst-case power loss of 47% compared with the Wald test, and in the ZMGP case the worst-case loss is 87%. Therefore, despite the computational advantage of score tests, the loss in power compared with Wald and LR tests should not be neglected, and these much more powerful alternatives should be used instead. We also prove consistency and asymptotic normality of the maximum likelihood estimators in these regression models, which gives a theoretical justification for Wald and LR tests.
In this paper we consider regression models for count data allowing for overdispersion in a Bayesian framework. We account for unobserved heterogeneity in the data in two ways. On the one hand, we consider models that are more flexible than the common Poisson model and allow for overdispersion in different ways. In particular, we address the negative binomial and the generalized Poisson (GP) distribution, where overdispersion is modelled by an additional model parameter. Further, zero-inflated models in which overdispersion is assumed to be caused by an excessive number of zeros are discussed.
On the other hand, extra spatial variability in the data is taken into account by adding spatial random effects to the models. This approach allows for an underlying spatial dependency structure which is modelled using a conditional autoregressive prior based on Pettitt et al. (2002).
In an application the presented models are used to analyse the number of invasive meningococcal disease cases in Germany in the year 2004. Models are compared according to the deviance information criterion (DIC) suggested by Spiegelhalter et al. (2002) and using proper scoring rules, see for example Gneiting and Raftery (2004).
We observe a rather high degree of overdispersion in the data, which is best captured by the GP model when spatial effects are neglected. While adding spatial effects to the models allowing for overdispersion yields little or no improvement, a spatial Poisson model is preferred over all other models according to the considered criteria.
Binary outcomes that depend on an ordinal predictor in a non-monotonic way are common in medical data analysis. Such patterns can be addressed in terms of cutpoints: for example, one looks for two cutpoints that define an interval in the range of the ordinal predictor for which the probability of a positive outcome is particularly high (or low). A chi-square test may then be performed to compare the proportions of positive outcomes inside and outside this interval. However, if the two cutpoints are chosen to maximize the chi-square statistic, referring the obtained statistic to the standard chi-square distribution is inappropriate. It is then necessary to correct the p-value for multiple comparisons by considering the distribution of the maximally selected chi-square statistic instead of the nominal chi-square distribution. Here, we derive the exact distribution of the chi-square statistic obtained with the two optimal cutpoints. We suggest a combinatorial computation method and illustrate our approach by a simulation study and an application to varicella data.
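The maximally selected statistic itself is straightforward to compute by scanning all candidate intervals. This minimal sketch assumes a binary outcome coded 0/1, Pearson's statistic without continuity correction, and hypothetical function names. It returns only the maximal statistic and the selected interval. The p-value of that statistic must come from the distribution of the maximum (for example the exact distribution derived in the paper), not from the nominal $\chi^2_1$ distribution.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if den == 0 else n * (a * d - b * c) ** 2 / den


def max_chi2_two_cutpoints(x, y, levels):
    """Maximally selected chi-square over all intervals [levels[i], levels[j]].

    x: ordinal predictor values, y: binary outcomes (0/1), levels: sorted
    distinct levels of x. Returns (maximal statistic, (low, high)).
    """
    best = (-1.0, None)
    for i in range(len(levels)):
        for j in range(i, len(levels)):
            lo, hi = levels[i], levels[j]
            a = b = c = d = 0  # inside/positive, inside/negative, outside/...
            for xv, yv in zip(x, y):
                inside = lo <= xv <= hi
                if inside and yv:
                    a += 1
                elif inside:
                    b += 1
                elif yv:
                    c += 1
                else:
                    d += 1
            stat = chi2_2x2(a, b, c, d)
            if stat > best[0]:
                best = (stat, (lo, hi))
    return best
```

With $L$ levels the scan examines $L(L+1)/2$ intervals. This multiplicity is why the nominal distribution overstates significance.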
We prove that the quasi-score estimator in a mean-variance model is optimal in the class of (unbiased) linear score estimators, in the sense that the difference of the asymptotic covariance matrices of the linear score and quasi-score estimator is positive semi-definite. We also give conditions under which this difference is zero or under which it is positive definite. This result can be applied to measurement error models where it implies that the quasi-score estimator is asymptotically more efficient than the corrected score estimator.
A new method for testing linear restrictions in linear regression models is suggested. It allows the linear restriction to be validated up to a specified approximation error and with a specified error probability. The test relies on asymptotic normality of the test statistic; therefore, normality of the errors in the regression model is not required. A simulation study examines the performance of the suggested method for model selection purposes, compared with standard model selection criteria and the t-test. As an illustration we analyze the US college spending data from 1994.
If rounded data are used in estimating moments and regression coefficients, the estimates are typically biased to some extent. The purpose of the paper is to study the bias-inducing effect of rounding, which is also seen when population moments instead of their estimates are considered. Under appropriate conditions this effect can be approximately specified by versions of Sheppard's correction formula. We discuss the conditions under which these approximations are valid. We also investigate the efficiency loss that comes along with rounding.
The rounding error, which corresponds to the measurement error of a measurement error model, has a marginal distribution which can be approximated by the uniform distribution.
We generalize the concept of simple rounding to that of asymmetric rounding and study its effect on the mean and variance of a distribution under similar circumstances as with simple rounding.
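For the variance, Sheppard's correction subtracts $h^2/12$ from the variance of data rounded to width $h$. This is the variance of a rounding error that is uniform on $[-h/2,h/2]$. The sketch below applies the correction to simply rounded data, using hypothetical helper names. The approximation is only valid under the kind of conditions the abstract refers to; it is not a substitute for the paper's analysis of those conditions.

```python
import math
import random


def round_to_grid(x, h):
    """Simple rounding of x to the nearest multiple of the rounding width h."""
    return h * math.floor(x / h + 0.5)


def sheppard_variance(rounded, h):
    """Return (raw, corrected) variance of rounded data.

    The corrected value subtracts h**2 / 12, the variance of a uniform rounding
    error on [-h/2, h/2] (Sheppard's correction for the second moment).
    """
    n = len(rounded)
    mean = sum(rounded) / n
    raw = sum((r - mean) ** 2 for r in rounded) / n
    return raw, raw - h * h / 12.0
```

For standard normal data rounded to integers ($h=1$), the raw variance is inflated by about $1/12\approx 0.083$. The corrected value is close to the variance of the unrounded data.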
In this paper we introduce an exponential continuous time GARCH(p,q) process. It is defined in such a way that it is a continuous time extension of the discrete time EGARCH(p,q) process. We investigate stationarity and moment properties of the new model. An instantaneous leverage effect can be shown for the exponential continuous time GARCH(p,p) model.
A novel concept for estimating smooth functions by selection techniques based on boosting is developed. It is suggested to put radial basis functions with different spreads at each knot and to do selection and estimation simultaneously by a componentwise boosting algorithm. The methodology of various other smoothing and knot selection procedures (e.g. stepwise selection) is summarized. They are compared to the proposed approach by extensive simulations for various unidimensional settings, including varying degrees of spatial variation and heteroskedasticity, as well as on a real-world data example. Finally, an extension of the proposed method to surface fitting is evaluated numerically on both simulated and real data. The proposed knot selection technique is shown to be a strong competitor to existing methods for knot selection.
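The sketch below shows the idea of placing several radial basis functions at each knot and letting componentwise boosting select among them. It rests on these assumptions: Gaussian radial basis functions, squared-error loss, a fixed step length nu = 0.1 and a fixed number of iterations. In practice the stopping iteration would be chosen by a criterion such as AIC or cross-validation. The function names are hypothetical.

```python
import math


def gaussian_rbf(x, center, spread):
    """Gaussian radial basis function with the given center (knot) and spread."""
    return math.exp(-0.5 * ((x - center) / spread) ** 2)


def componentwise_l2_boost(xs, ys, knots, spreads, n_iter=300, nu=0.1):
    """Componentwise L2-boosting over one RBF per (knot, spread) pair.

    In every iteration each single-basis learner is fitted to the current
    residuals by least squares. The learner with the largest reduction in the
    residual sum of squares is selected, and nu times its fit is added.
    (knot, spread) pairs whose coefficients stay at zero are never selected.
    """
    offset = sum(ys) / len(ys)
    resid = [y - offset for y in ys]
    basis = {(c, s): [gaussian_rbf(x, c, s) for x in xs] for c in knots for s in spreads}
    coef = dict.fromkeys(basis, 0.0)
    for _ in range(n_iter):
        best_key, best_beta, best_gain = None, 0.0, 0.0
        for key, b in basis.items():
            bb = sum(v * v for v in b)
            if bb == 0.0:
                continue
            beta = sum(v * r for v, r in zip(b, resid)) / bb
            gain = beta * beta * bb  # RSS reduction of a full least-squares step
            if gain > best_gain:
                best_key, best_beta, best_gain = key, beta, gain
        if best_key is None:
            break  # residuals are orthogonal to every learner
        coef[best_key] += nu * best_beta
        resid = [r - nu * best_beta * v for r, v in zip(resid, basis[best_key])]
    return offset, coef


def predict(x, offset, coef):
    """Evaluate the boosted fit at x."""
    return offset + sum(beta * gaussian_rbf(x, c, s) for (c, s), beta in coef.items())
```

Allowing several spreads per knot lets the same selection step adapt to both smooth and rapidly varying regions. This is the motivation the abstract gives for combining selection and estimation.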
This paper focuses on an extension of zero-inflated generalized Poisson (ZIGP) regression models for count data. We discuss generalized Poisson (GP) models where dispersion is modelled by an additional model parameter. Moreover, zero-inflated models in which overdispersion is assumed to be caused by an excessive number of zeros are discussed. In addition to ZIGP regression as introduced by Famoye and Singh (2003), we now allow for regression on the overdispersion and zero-inflation parameters. Consequently, we develop tools for exploratory data analysis on the dispersion and zero-inflation level. An application dealing with outsourcing of patent filing processes is used to compare these nonnested models. The model parameters are fitted by maximum likelihood. Asymptotic normality of the ML estimates in this non-exponential setting is proven, and standard errors are estimated on the basis of this asymptotic normality. Model comparison is carried out using AIC statistics and Vuong tests (see Vuong (1989)). For the given data, our extended ZIGP regression model proves superior to GP and zero-inflated Poisson (ZIP) models, and even to ZIGP models with constant overall dispersion and zero-inflation parameters, demonstrating the usefulness of the proposed extensions.
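The building block of these models is the ZIGP probability mass function. The sketch below assumes one common mean-dispersion parametrization of the generalized Poisson distribution: $P(Y=y)=\mu(\mu+(\varphi-1)y)^{y-1}\varphi^{-y}e^{-(\mu+(\varphi-1)y)/\varphi}/y!$, with mean $\mu$ and variance $\varphi^2\mu$ for $\varphi\ge 1$. A point mass $\omega$ at zero is added on top. Regression on all three parameters would enter by passing observation-specific $\mu_i$, $\varphi_i$ and $\omega_i$ to the log-likelihood. The link functions are left open, and all names are hypothetical.

```python
import math


def gp_logpmf(y, mu, phi):
    """Log pmf of the generalized Poisson distribution (mean mu, dispersion phi >= 1)."""
    if mu <= 0 or phi < 1:
        raise ValueError("need mu > 0 and phi >= 1")
    shifted = mu + (phi - 1.0) * y
    return (math.log(mu) + (y - 1) * math.log(shifted) - y * math.log(phi)
            - shifted / phi - math.lgamma(y + 1))


def zigp_pmf(y, mu, phi, omega):
    """Zero-inflated GP pmf with extra probability mass omega at zero."""
    base = math.exp(gp_logpmf(y, mu, phi))
    return omega + (1.0 - omega) * base if y == 0 else (1.0 - omega) * base


def zigp_loglik(ys, mus, phis, omegas):
    """Log-likelihood with observation-specific parameters (links left to the user)."""
    return sum(math.log(zigp_pmf(y, m, p, w))
               for y, m, p, w in zip(ys, mus, phis, omegas))
```

With $\varphi=1$ and $\omega=0$ the pmf reduces to the Poisson pmf. The ZIGP mean is $(1-\omega)\mu$.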
Functional data analysis can be challenging when the functional objects are sampled only very sparsely and unevenly. Most approaches rely on smoothing to recover the underlying functional object from the data, which can be difficult if the data are irregularly distributed. In this paper we present a new approach that can overcome this challenge. The approach is based on the ideas of mixed models. Specifically, we propose a semiparametric mixed model with boosting to recover the functional object. While the model can handle sparse and unevenly distributed data, it also results in conceptually more meaningful functional objects. In particular, we motivate our method within the framework of eBay's online auctions. Online auctions produce monotonically increasing price curves that are often correlated between auctions. The semiparametric mixed model accounts for this correlation in a parsimonious way. It also estimates the underlying increasing trend from the data without imposing model constraints. Our application shows that the resulting functional objects are conceptually more appealing. Moreover, when used to forecast the outcome of an online auction, our approach also results in more accurate price predictions compared to standard approaches. We illustrate our model on a set of 183 closed auctions for Palm M515 personal digital assistants.
In this paper we introduce a fractionally integrated exponential continuous time GARCH(p,d,q) process. It is defined in such a way that it is a continuous time extension of the discrete time FIEGARCH(p,d,q) process. We investigate stationarity and moment properties of the new model. It is also shown that the long memory effect introduced in the log-volatility propagates to the volatility process.
We propose a novel method to model nonlinear regression problems by adapting the principle of penalization to Partial Least Squares (PLS). Starting with a generalized additive model, we expand the additive component of each variable in terms of a generous number of B-spline basis functions. In order to prevent overfitting and to obtain smooth functions, we estimate the regression model by applying a penalized version of PLS. Although our motivation for penalized PLS stems from its use for B-spline transformed data, the proposed approach is very general and can be applied to other penalty terms or to other dimension reduction techniques. It turns out that penalized PLS can be computed virtually as fast as PLS. We prove a close connection of penalized PLS to the solutions of preconditioned linear systems. In the case of high-dimensional data, the new method is shown to be an attractive competitor to other techniques for estimating generalized additive models. If the number of predictor variables is high compared to the number of examples, traditional techniques often suffer from overfitting. We illustrate that penalized PLS performs well in these situations.
A new regularization method for regression models is proposed. The criterion to be minimized contains a penalty term which explicitly links strength of penalization to the correlation between predictors. Like the elastic net, the method encourages a grouping effect where strongly correlated predictors tend to be in or out of the model together. A boosted version of the penalized estimator, which is based on a new boosting method, allows variables to be selected. Real-world data and simulations show that the method compares well to competing regularization techniques. In settings where the number of predictors is smaller than the number of observations, it frequently performs better than its competitors; in high-dimensional settings, prediction measures favor the elastic net, while accuracy of estimation and stability of variable selection favor the newly proposed method.
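The grouping effect can be seen directly from the penalty. The sketch below assumes the correlation-based form $P(\beta)=\sum_{i<j}\{(\beta_i-\beta_j)^2/(1-\rho_{ij})+(\beta_i+\beta_j)^2/(1+\rho_{ij})\}$. This exact form is an assumption, since the abstract only states that the strength of penalization is linked to the correlations. As $\rho_{ij}\to 1$, differences between coefficients are penalized heavily. For $\rho_{ij}=0$, each pair contributes $2(\beta_i^2+\beta_j^2)$, a ridge-type term. Function names are hypothetical.

```python
def correlation_penalty(beta, corr):
    """Assumed correlation-based penalty, summed over all pairs i < j."""
    p = len(beta)
    total = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            r = corr[i][j]
            if abs(r) >= 1.0:
                raise ValueError("penalty undefined for |correlation| = 1")
            total += ((beta[i] - beta[j]) ** 2 / (1.0 - r)
                      + (beta[i] + beta[j]) ** 2 / (1.0 + r))
    return total


def penalized_criterion(y, X, beta, corr, lam):
    """Residual sum of squares plus lam times the penalty (X given as a list of rows)."""
    rss = sum((yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
              for yi, row in zip(y, X))
    return rss + lam * correlation_penalty(beta, corr)
```

For two predictors with correlation 0.9, the split (1, 1) is penalized far less than (2, 0), even though both give the same sum of coefficients. This is the grouping behaviour described above.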
Building on the work of Bedford, Cooke and Joe, we show how multivariate data, which exhibit complex patterns of dependence in the tails, can be modelled using a cascade of pair-copulae, acting on two variables at a time. We use the pair-copula decomposition of a general multivariate distribution and propose a method to perform inference. The model construction is hierarchical in nature, the various levels corresponding to the incorporation of more variables in the conditioning sets, using pair-copulae as simple building blocks. Pair-copula decomposed models also represent a very flexible way to construct higher-dimensional copulae. We apply the methodology to a financial data set. Our approach represents the first step towards developing an unsupervised algorithm that explores the space of possible pair-copula models and that can also be applied to huge data sets automatically.
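A three-dimensional D-vine illustrates the cascade. The first tree has two pair-copulae. The second tree has one conditional pair-copula, evaluated at conditional distribution functions (h-functions). This minimal sketch assumes Gaussian pair-copulae throughout and uses hypothetical function names. Any other bivariate family with a density and an h-function could be substituted. With Gaussian building blocks the result coincides with a trivariate Gaussian copula whose partial correlation $\rho_{13|2}$ is the parameter of the second tree.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def gauss_copula_density(u, v, rho):
    """Density of the bivariate Gaussian copula with correlation rho."""
    x, y = _STD.inv_cdf(u), _STD.inv_cdf(v)
    one_m = 1.0 - rho * rho
    return math.exp(-(rho * rho * (x * x + y * y) - 2.0 * rho * x * y)
                    / (2.0 * one_m)) / math.sqrt(one_m)


def gauss_h(u, v, rho):
    """h-function C(u | v): conditional cdf of a Gaussian pair-copula."""
    x, y = _STD.inv_cdf(u), _STD.inv_cdf(v)
    return _STD.cdf((x - rho * y) / math.sqrt(1.0 - rho * rho))


def dvine3_density(u1, u2, u3, rho12, rho23, rho13_2):
    """D-vine copula density c12(u1,u2) c23(u2,u3) c13|2(h(u1|u2), h(u3|u2))."""
    first_tree = gauss_copula_density(u1, u2, rho12) * gauss_copula_density(u2, u3, rho23)
    a = gauss_h(u1, u2, rho12)  # F(u1 | u2)
    b = gauss_h(u3, u2, rho23)  # F(u3 | u2)
    return first_tree * gauss_copula_density(a, b, rho13_2)
```

Replacing any of the three Gaussian pieces by, for example, a Student-t or Clayton pair-copula is what gives the construction its flexibility in the tails.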
We present Bayesian updating of an imprecise probability measure, represented by a class of precise multidimensional probability measures. Choice and analysis of our class are motivated by expert interviews that we conducted with modelers in the context of climatic change. From the interviews we deduce that, generically, experts hold a much more informed opinion on the marginals of uncertain parameters than on their correlations. Accordingly, we specify the class by prescribing precise measures for the marginals while leaving the correlation structure subject to complete ignorance. For the sake of transparency, our discussion focuses on the tutorial example of a linear two-dimensional Gaussian model. We operationalize Bayesian learning for that class by various updating rules, starting with (a modified version of) the generalized Bayes' rule and the maximum likelihood update rule (after Gilboa and Schmeidler). Over a large range of potential observations, the generalized Bayes' rule would provide non-informative results. We restrict this counter-intuitive and unnecessary growth of uncertainty by two means, the discussion of which applies to any kind of imprecise model, not only to our class. First, we find our class of priors too inclusive and, hence, require certain additional properties of prior measures in terms of smoothness of probability density functions. Second, we argue that both updating rules are unsatisfactory: the generalized Bayes' rule is too conservative, i.e., too inclusive, and the maximum likelihood rule is too exclusive. Instead, we introduce two new ways of Bayesian updating of imprecise probabilities: a ``weighted maximum likelihood method'' and a ``semi-classical method.'' The former bases Bayesian updating on the whole set of priors, but with weighted influence of its members.
By referring to the whole set, the weighted maximum likelihood method allows for more robust inferences than the standard maximum likelihood method and is, hence, easier to justify than the latter. Furthermore, the semi-classical method is more objective than the weighted maximum likelihood method, as it does not require the subjective definition of a weighting function. Both new methods yield much more informative results than the generalized Bayes' rule, as we demonstrate for the example of a stylized insurance model.
Keywords: Generalized Bayes rule, imprecise probabilities, known marginals, maximum likelihood update, modeling expert opinions, robust Bayesians, unknown correlation structure, updating under complex uncertainty.
Nonparametric Predictive Inference (NPI) is a general methodology to learn from data in the absence of prior knowledge and without adding unjustified assumptions. This paper develops NPI for multinomial data where the total number of possible categories for the data is known. We present the general upper and lower probabilities and several of their properties. We also comment on differences between this NPI approach and corresponding inferences based on Walley's Imprecise Dirichlet Model.
Keywords: Imprecise Dirichlet Model, imprecise probabilities, interval probability, known number of categories, lower and upper probabilities, multinomial data, nonparametric predictive inference, probability wheel.
Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale level or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories. Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred in variable selection. The two mechanisms underlying this deficiency are biased variable selection in the individual classification trees used to build the random forest on the one hand, and effects induced by bootstrap sampling with replacement on the other hand. We propose to employ an alternative implementation of random forests that provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale level or their number of categories.
The usage of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and documented thoroughly in an application re-analysing data from a study on RNA editing. Therefore the suggested method can be applied straightforwardly by scientists in bioinformatics research.
The paper is a survey of recent investigations by the authors and others into the relative efficiencies of structural and functional estimators of the regression parameters in a measurement error model. While structural methods, in particular the quasi-score (QS) method, take advantage of knowledge of the regressor distribution (if available), functional methods, in particular the corrected score (CS) method, discard such knowledge and work even if it is not available. Among other results, it has been shown that QS is more efficient than CS as long as the regressor distribution is completely known. However, if nuisance parameters in the regressor distribution have to be estimated, this is no longer true in general. But by modifying the QS method, the adverse effect of the nuisance parameters can be overcome. For small measurement errors, the efficiencies of QS and CS become almost indistinguishable, whether nuisance parameters are present or not. QS is (asymptotically) biased if the regressor distribution has been misspecified, while CS is always consistent and thus more robust than QS.
Precise knowledge about factors influencing the habitat suitability of a certain species forms the basis for the implementation of effective programs to conserve biological diversity. Such knowledge is frequently gathered from studies relating abundance data to a set of influential variables in a regression setup. In particular, generalised linear models are used to analyse binary presence/absence data or counts of a certain species at locations within an observation area. However, one of the key assumptions of generalised linear models, the independence of the observations, is often violated in practice since the points at which the observations are collected are spatially aligned. While several approaches have been developed to analyse and account for spatial correlation in regression models with normally distributed responses, far less work has been done in the context of generalised linear models. In this paper, we describe a general framework for semiparametric spatial generalised linear models that allows for the routine analysis of non-normal spatially aligned regression data. The approach is utilised for the analysis of a data set of synthetic bird species in beech forests, revealing that ignoring spatial dependence may actually lead to false conclusions in a number of situations.
The present article considers the problem of consistent estimation in measurement error models. A linear relation with measurement errors that are not necessarily normally distributed is assumed. Three estimators, constructed as different combinations of the estimators arising from direct and inverse regression, are studied. Their efficiency properties are derived and analyzed, including the effect of non-normally distributed measurement errors. A Monte Carlo experiment is conducted to study the performance of these estimators in finite samples and the effect of a non-normal distribution of the measurement errors.
We consider a regression of $y$ on $x$ given by a pair of mean and variance functions with a parameter vector $\theta$ to be estimated that also appears in the distribution of the regressor variable $x$. The estimation of $\theta$ is based on an extended quasi score (QS) function. We show that the QS estimator is optimal within a wide class of estimators based on linear-in-$y$ unbiased estimating functions. Of special interest is the case where the distribution of $x$ depends only on a subvector $\alpha$ of $\theta$, which may be considered a nuisance parameter. In general, $\alpha$ must be estimated simultaneously together with the rest of $\theta$, but there are cases where $\alpha$ can be pre-estimated. A major application of this model is the classical measurement error model, where the corrected score (CS) estimator is an alternative to the QS estimator. We derive conditions under which the QS estimator is strictly more efficient than the CS estimator. We also study a number of special measurement error models in greater detail.
Keywords: Mean-variance model, measurement error model, quasi score estimator, corrected score estimator, nuisance parameter, optimality property.
This paper presents a general loss function under quadratic loss structure and discusses the comparison of risk functions associated with the unbiased least squares and biased Stein-rule estimators of the coefficients in a linear regression model.
We consider multi-resolution time series models and their application to high-frequency financial data. An individual transaction share price of a specific firm is subject to market microstructure noise. Therefore, we propose trading duration time weighted averages over given time intervals. Averages over long intervals lead to a coarse resolution, and averages over shorter intervals lead to a finer resolution. Arranging sub-intervals of given lengths on scales from coarse to fine resolution implies a structure which can be represented as a directed acyclic graph. Time series models are then formulated using this graph structure. It is shown that these models have a linear state space representation, which allows for efficient computation of the likelihood needed in parameter estimation and for a straightforward treatment of missing observations. Application of these models to the log transaction prices of IBM shares traded at the New York Stock Exchange from February until October 2002 shows that the corresponding one-step prediction errors are heavy-tailed; therefore, a specific variance term is allowed to follow a FIEGARCH specification, which improves the tail behavior and leads to a better fit.
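Duration-weighted averaging and the coarse-to-fine arrangement can be sketched as follows. The sketch makes three assumptions. Each trade price is valid until the next trade. The interval start is covered by an earlier trade. Each level halves the intervals of the level above. The function names are hypothetical. With equal-length halves, each coarse average is exactly the mean of its two children, a simple instance of a parent-child relation in such a graph.

```python
from bisect import bisect_right


def duration_weighted_average(times, prices, start, end):
    """Trading-duration weighted average price over [start, end).

    times must be sorted; each price is weighted by how long it remains the
    latest traded price inside the interval. The price prevailing at start is
    the last trade at or before start.
    """
    if not times or times[0] > start:
        raise ValueError("need a trade at or before the interval start")
    if end <= start:
        raise ValueError("need end > start")
    i = bisect_right(times, start) - 1  # prevailing trade at start
    total, t = 0.0, start
    while t < end:
        nxt = times[i + 1] if i + 1 < len(times) else end
        seg_end = min(nxt, end)
        total += prices[i] * (seg_end - t)
        t = seg_end
        i += 1
    return total / (end - start)


def dyadic_averages(times, prices, start, end, levels):
    """Level l holds the averages over 2**l equal sub-intervals (coarse to fine)."""
    out = []
    for level in range(levels):
        m = 2 ** level
        width = (end - start) / m
        out.append([duration_weighted_average(times, prices,
                                              start + k * width, start + (k + 1) * width)
                    for k in range(m)])
    return out
```

For trades at times 0, 1 and 3 with prices 10, 12 and 11, the average over [0, 4) is (10 + 2·12 + 11)/4 = 11.25. The two halves average 11.0 and 11.5.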
We consider the following problem: estimate the size of a population marked with serial numbers after only a sample of the serial numbers has been observed. Its simple formulation and the inviting possibilities for application make this estimation problem well suited for an undergraduate probability course. Our contribution consists of a Bayesian treatment of the problem. For an improper uniform prior distribution, we show that the posterior mean and variance have nice closed-form expressions, and we demonstrate how to compute highest posterior density intervals. Maple and R code is provided on the authors' web-page to allow students to verify the theoretical results and experiment with data.
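Suppose a sample of $k$ distinct serial numbers with maximum $m$ is drawn without replacement from $1,\dots,N$. Under an improper uniform prior, the posterior is then proportional to $1/\binom{N}{k}$ for $N\ge m$. Summing this series gives the posterior mean $(k-1)(m-1)/(k-2)$ for $k>2$ and the posterior variance $(k-1)(m-1)(m-k+1)/((k-2)^2(k-3))$ for $k>3$. It also gives the tail $P(N>n)=\binom{m-1}{k-1}/\binom{n}{k-1}$. These expressions are our own derivation, not quoted from the paper. The Python sketch below therefore checks them against brute-force summation; it is separate from the authors' Maple and R code, and its function names are hypothetical. Because the posterior decreases in $N$, the highest posterior density set is $\{m,\dots,u\}$.

```python
from math import comb


def posterior_mean_var(m, k):
    """Closed-form posterior mean (k > 2) and variance (k > 3) of N."""
    mean = (k - 1) * (m - 1) / (k - 2)
    var = (k - 1) * (m - 1) * (m - k + 1) / ((k - 2) ** 2 * (k - 3))
    return mean, var


def posterior_tail(n, m, k):
    """P(N > n | data) for n >= m - 1."""
    return comb(m - 1, k - 1) / comb(n, k - 1)


def hpd_interval(m, k, alpha=0.05):
    """Smallest {m, ..., u} with posterior probability at least 1 - alpha."""
    upper = m
    while posterior_tail(upper, m, k) > alpha:
        upper += 1
    return m, upper


def posterior_moments_numeric(m, k, n_max=20000):
    """Brute-force check: weights proportional to 1 / C(N, k), truncated at n_max."""
    support = range(m, n_max + 1)
    weights = [1.0 / comb(N, k) for N in support]
    z = sum(weights)
    mean = sum(N * w for N, w in zip(support, weights)) / z
    second = sum(N * N * w for N, w in zip(support, weights)) / z
    return mean, second - mean * mean
```

For example, `posterior_mean_var(60, 5)` gives approximately (78.67, 734.22). The truncated brute-force sum agrees closely with these values.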
This paper presents a Poisson control chart for monitoring time series of counts typically arising in the surveillance of infectious diseases. The in-control mean is assumed to be time-varying and linear on the log-scale with intercept and seasonal components. If a shift in the intercept occurs, the system goes out of control. What is novel is that the magnitude of the shift does not have to be specified in advance: using the generalized likelihood ratio (GLR) statistic, a monitoring scheme is formulated to detect on-line whether a shift in the intercept has occurred. For this specific Poisson chart the necessary quantities of the GLR detector can be efficiently computed by recursive formulas. Extensions to more general Poisson charts, e.g. charts containing an autoregressive epidemic component, are discussed. Run length properties of the proposed schemes are investigated using Monte Carlo simulations. The practicability of the charts is demonstrated by applying them to the observed number of Salmonella Hadar cases in Germany from 2001 to 2006.
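The GLR detector for an intercept shift can be written down directly. Let the in-control mean be $\mu_t$, and let the shift multiply it by $e^{\kappa}$ from an unknown change point on. Maximizing over $\kappa$ gives $\hat\kappa=\log(S_y/S_\mu)$, where $S_y$ and $S_\mu$ sum $y_t$ and $\mu_t$ over the window after the change point. The resulting statistic is the maximum of $S_y\log(S_y/S_\mu)-(S_y-S_\mu)$ over windows with $S_y>S_\mu$. This sketch makes four assumptions: a one-sided chart for increases, a sine-cosine seasonal form for $\mu_t$, brute-force scanning of all change points in place of the paper's recursive formulas, and hypothetical function names.

```python
import math


def seasonal_mean(t, beta0, gamma, delta, period=52):
    """Hypothetical in-control mean: log-linear intercept plus one harmonic."""
    angle = 2.0 * math.pi * t / period
    return math.exp(beta0 + gamma * math.sin(angle) + delta * math.cos(angle))


def glr_statistic(y, mu):
    """One-sided GLR statistic at the last time point for a shift exp(kappa), kappa >= 0.

    Scans every candidate change point k (window k..n-1); for each window the
    maximized log-likelihood ratio is S_y log(S_y / S_mu) - (S_y - S_mu) when
    S_y > S_mu, and 0 otherwise (kappa_hat truncated at 0).
    """
    best, sy, smu = 0.0, 0.0, 0.0
    for t in range(len(y) - 1, -1, -1):
        sy += y[t]
        smu += mu[t]
        if sy > smu:
            best = max(best, sy * math.log(sy / smu) - (sy - smu))
    return best


def first_alarm(y, mu, threshold):
    """Index of the first time point at which the GLR statistic exceeds threshold."""
    for n in range(1, len(y) + 1):
        if glr_statistic(y[:n], mu[:n]) > threshold:
            return n - 1
    return None
```

Each call to `glr_statistic` costs time linear in the series length, so monitoring is quadratic overall. Restricting the scan to a moving window, or using recursions as in the paper, avoids this.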
The multinomial logit model (MNL) is one of the most frequently used statistical models in marketing applications. It relates an unordered categorical response variable, for example the choice of a brand, to a vector of covariates such as the price of the brand or variables characterising the consumer. In its classical form, all covariates enter the utility function of the MNL model in strictly parametric, linear form. In this paper, we introduce semiparametric extensions, where smooth effects of continuous covariates are modelled by penalised splines. A mixed model representation of these penalised splines is employed to estimate the corresponding smoothing parameters, leading to a fully automated estimation procedure. To validate semiparametric models against parametric ones, we use proper scoring rules and compare both approaches on a number of brand choice data sets.
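The MNL choice probabilities and two standard proper scoring rules fit in a few lines. In a semiparametric model, each utility η_r would contain smooth spline terms such as f(price) instead of a linear price effect; here the utilities are taken as given. The Brier and logarithmic scores are common proper scoring rules, but which scores the paper uses is not stated here, so treat that choice, and the function names, as assumptions.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities for one choice situation.

    utilities: linear predictors eta_r, one per alternative (e.g. brand).
    P(choice = r) = exp(eta_r) / sum_s exp(eta_s). Subtracting the maximum first
    avoids overflow and leaves the probabilities unchanged.
    """
    top = max(utilities)
    expu = [math.exp(u - top) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def brier_score(probs, observed):
    """Quadratic (Brier) score of one forecast; lower is better."""
    return sum((p - (1.0 if r == observed else 0.0)) ** 2 for r, p in enumerate(probs))

def log_score(probs, observed):
    """Logarithmic score of one forecast; higher is better."""
    return math.log(probs[observed])

def mean_score(score, prob_rows, observed):
    """Average a scoring rule over a hold-out set of observed choices."""
    return sum(score(p, y) for p, y in zip(prob_rows, observed)) / len(observed)
```

Comparing `mean_score` values of parametric and semiparametric fits on the same hold-out choices is one way to carry out such a validation.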
Multi-state models provide a unified framework for describing the evolution of discrete phenomena in continuous time. One particular example is Markov processes, which can be characterised by a set of time-constant transition intensities between the states. In this paper, we extend such parametric approaches to semiparametric models with flexible transition intensities based on Bayesian versions of penalised splines. The transition intensities are modelled as smooth functions of time and can further be related to parametric as well as nonparametric covariate effects. Covariates with time-varying effects and frailty terms can be included in addition. Inference is conducted either fully Bayesian, using Markov chain Monte Carlo simulation techniques, or empirically Bayesian, based on a mixed model representation. A counting process representation of semiparametric multi-state models provides the likelihood and also forms the basis for model validation via martingale residual processes. As an application, we consider human sleep data with a discrete set of sleep states such as REM and Non-REM phases. In this case, simple parametric approaches are inappropriate, since the dynamics underlying human sleep vary strongly throughout the night and individual-specific variation has to be accounted for using covariate information and frailty terms.
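The martingale residual for a single transition type compares the observed number of transitions N(t) with the compensator ∫ Y(s) λ(s) ds, where Y(s) = 1 while the individual is at risk, i.e. in the origin state. A minimal sketch, assuming the fitted intensity is available as a Python callable (for instance evaluated from a penalised spline) and approximating the integral by the trapezoidal rule; the function name and grid size are illustrative.

```python
def martingale_residual(event_times, at_risk, intensity, t, n_grid=1000):
    """M(t) = N(t) - integral_0^t Y(s) * lambda(s) ds for one transition type.

    event_times: observed transition times (assumed to lie in at-risk periods).
    at_risk: list of (start, end) intervals during which Y(s) = 1.
    intensity: fitted transition intensity lambda(s) as a callable.
    """
    count = sum(1 for e in event_times if e <= t)
    compensator = 0.0
    for start, end in at_risk:
        lo, hi = start, min(end, t)
        if hi <= lo:
            continue
        h = (hi - lo) / n_grid
        vals = [intensity(lo + i * h) for i in range(n_grid + 1)]
        # Trapezoidal rule: full weight inside, half weight at both ends.
        compensator += h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return count - compensator
```

Large positive residual paths indicate more transitions than the fitted intensity predicts, large negative ones fewer.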
We propose a new class of state space models for longitudinal discrete response data, where the observation equation is specified in an additive form involving both deterministic and random linear predictors. These models allow us to explicitly address the effects of trend, seasonal or other time-varying covariates while preserving the power of state space models in modeling serial dependence in the data. We develop a Markov chain Monte Carlo algorithm to carry out statistical inference for models with binary and binomial responses, in which we invoke de Jong and Shephard's (1995) simulation smoother to establish an efficient sampling procedure for the state variables. To quantify and control the sensitivity of posteriors to the priors of variance parameters, we add a signal-to-noise ratio type parameter to the specification of these priors. Finally, we illustrate the applicability of the proposed state space mixed models for longitudinal binomial response data in both simulation studies and data examples.
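To make the additive observation equation concrete, the sketch below simulates one binomial series whose logit combines a deterministic part x_t'β (for example intercept, trend and seasonal covariates) with a random-walk state α_t. The random-walk state equation, the function name and all parameter values are illustrative assumptions; the sketch does not implement the MCMC algorithm or the simulation smoother.

```python
import math
import random

def simulate_binomial_ssm(n_time, trials, beta, covariates, sigma_state, seed=0):
    """Simulate one binomial series from an additive observation equation.

    logit(p_t) = x_t' beta + alpha_t                     (deterministic + random part)
    alpha_t = alpha_{t-1} + eta_t, eta_t ~ N(0, sigma_state^2)   (random-walk state)
    y_t ~ Binomial(trials, p_t)
    covariates(t) returns the vector x_t.
    """
    rng = random.Random(seed)
    alpha = 0.0
    ys, probs = [], []
    for t in range(n_time):
        alpha += rng.gauss(0.0, sigma_state)
        eta = sum(b * x for b, x in zip(beta, covariates(t))) + alpha
        p = 1.0 / (1.0 + math.exp(-eta))
        # Binomial draw as a sum of independent Bernoulli trials.
        ys.append(sum(rng.random() < p for _ in range(trials)))
        probs.append(p)
    return ys, probs
```

Setting `sigma_state` to zero removes the serial dependence and leaves an ordinary logistic regression, which is a convenient sanity check.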
In this paper we introduce two stochastic volatility models in which the response variable takes on only finitely many ordered values. Such time series occur in high-frequency finance when stocks are traded on a coarse price grid. For parameter estimation we develop an efficient Grouped Move Multigrid Monte Carlo (GM-MGMC) sampler. We apply both models to price changes of the IBM stock at the NYSE in January 2001. Dependencies of the price change process on covariates are quantified and compared with theoretical considerations on such processes. We also investigate whether this data set requires modeling with a heavy-tailed Student-t distribution.
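One common way to link a stochastic volatility process to ordered categories is through a latent continuous variable cut at fixed thresholds; the sketch below simulates such data. It assumes an AR(1) log-volatility, Gaussian errors and hypothetical thresholds (e.g. price changes of -2, ..., +2 ticks), and is not necessarily either of the two models in the paper. A heavy-tailed variant would replace the Gaussian draw for ε_t with a Student-t draw.

```python
import math
import random
from bisect import bisect_left

def simulate_ordinal_sv(n, mu, phi, sigma, thresholds, mean=lambda t: 0.0, seed=0):
    """Simulate ordered categories driven by a latent stochastic-volatility process.

    h_t  = mu + phi * (h_{t-1} - mu) + sigma * eta_t    (AR(1) log-volatility)
    y*_t = mean(t) + exp(h_t / 2) * eps_t               (latent price change)
    y_t  = number of thresholds below y*_t, a category in 0..len(thresholds)
    thresholds must be sorted in increasing order.
    """
    rng = random.Random(seed)
    h = mu
    out = []
    for t in range(n):
        h = mu + phi * (h - mu) + sigma * rng.gauss(0.0, 1.0)
        latent = mean(t) + math.exp(h / 2.0) * rng.gauss(0.0, 1.0)
        out.append(bisect_left(thresholds, latent))
    return out
```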
In this paper, wavelet basis functions are investigated for their suitability for processing and analysing diffusion tensor imaging (DTI) data. First, wavelet theory is introduced and explained by means of 1d and 2d examples (Sections 1.1-1.3). General thresholding techniques, which serve as regularization concepts for wavelet-based models, are presented in Section 1.4. Regularization of DTI data can be performed at two stages: either immediately after acquisition (Wirestam, 2006) or after tensor estimation. The latter stage of denoising is outlined in Section 2, together with the incorporation of the positive definiteness constraint using a log-Cholesky parametrization. In Section 3, the procedure is examined in a simulation study and compared to standard processing and to the space-varying coefficient model (SVCM) based on B-splines (Heim et al., 2007). In addition, a real data example is presented and discussed. Finally, an approach is proposed for how a space-varying coefficient model could reasonably be adapted to wavelet basis functions. Unless stated otherwise, the theoretical parts are based on the books by Gencay et al. (2002, Chap. 1, 4-6), Härdle et al. (1998), Ogden (1997) and Jansen (2001). For an introduction to diffusion tensor imaging, we refer to Heim et al. (2007, Chap. 2).
Key words: Wavelets; Varying coefficient model; Diffusion tensor; Brain imaging
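Wavelet thresholding, the regularization concept mentioned above, can be illustrated on a 1d signal with the orthonormal Haar wavelet: transform, shrink the detail coefficients, and transform back. The sketch uses soft thresholding with a user-chosen threshold `lam` (the universal threshold σ√(2 log n) of Donoho and Johnstone is one common choice). It assumes the signal length is divisible by 2^levels and is only a toy version of the multidimensional procedures applied to DTI data; function names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One level of the orthonormal Haar transform: smooth (s) and detail (d) parts."""
    s = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return s, d

def haar_inverse_step(s, d):
    """Exact inverse of haar_step."""
    x = []
    for a, b in zip(s, d):
        x.extend([(a + b) / SQRT2, (a - b) / SQRT2])
    return x

def soft_threshold(c, lam):
    """Shrink c towards zero by lam; coefficients with |c| <= lam become zero."""
    return math.copysign(max(abs(c) - lam, 0.0), c)

def haar_denoise(x, lam, levels=1):
    """Transform, soft-threshold every detail coefficient, and transform back."""
    s, details = list(x), []
    for _ in range(levels):
        s, d = haar_step(s)
        details.append([soft_threshold(c, lam) for c in d])
    for d in reversed(details):
        s = haar_inverse_step(s, d)
    return s
```

With `lam = 0` the signal is reconstructed exactly; with a very large `lam` and one level, each pair of samples is replaced by its mean.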
In this paper we extend the standard approach of correlation structure analysis in order to reduce the dimension of high-dimensional statistical data. The classical assumption of a linear model for the distribution of a random vector is replaced by the weaker assumption of a model for the copula. For elliptical copulae a correlation-like structure remains, but different margins and non-existence of moments are possible. Moreover, elliptical copulae also allow for a copula structure analysis of dependence in extremes. After introducing the new concepts and deriving some theoretical results, we study the performance of the estimators in a simulation study: the theoretical asymptotic behavior of the statistics can be observed even for a sample of only 100 observations. Finally, we test our method on real financial data and explain the differences between our copula-based approach and the classical approach. Our new method yields a considerable dimension reduction also in non-linear models.
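For elliptical copulas, the correlation-like parameter is linked to Kendall's tau by ρ = sin(πτ/2). Since Kendall's tau depends only on ranks, this yields an estimate that is unaffected by the marginal distributions and needs no moments. A minimal sketch, assuming data without ties (tau-a, no tie correction) and an O(n²) pair loop for clarity; whether the paper estimates the matrix in exactly this way is an assumption, and the function names are illustrative.

```python
import math
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / number of pairs. No tie correction."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))

def elliptical_correlation_matrix(columns):
    """Rank-based estimate of the correlation-like matrix of an elliptical copula.

    Uses rho_ij = sin(pi * tau_ij / 2); columns is a list of equally long samples.
    """
    d = len(columns)
    R = [[1.0] * d for _ in range(d)]
    for i, j in combinations(range(d), 2):
        R[i][j] = R[j][i] = math.sin(math.pi * kendall_tau(columns[i], columns[j]) / 2.0)
    return R
```

In finite samples the resulting matrix need not be positive semidefinite, so it may have to be adjusted before a structure analysis.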
We introduce a new latent variable model with count variable indicators, where the usual linear parametric effects of covariates, nonparametric effects of continuous covariates and spatial effects on the continuous latent variables are modelled through a geoadditive predictor. Bayesian modelling of nonparametric functions and spatial effects is based on penalized spline and Markov random field priors. Full Bayesian inference is performed via an auxiliary variable Gibbs sampling technique, following a recent suggestion of Frühwirth-Schnatter and Wagner (2006). An advantage is that our Poisson indicator latent variable model can be combined with semiparametric latent variable models for mixed binary, ordinal and continuous indicator variables within a unified and coherent framework for modelling and inference. A simulation study investigates the performance of the approach, and an application to post-war human security in Cambodia illustrates it.
Key words: Latent variable models, Poisson indicators, penalized splines, spatial effects, MCMC.
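The structure of such a model can be illustrated by simulating from it. The sketch assumes a single latent variable with a geoadditive predictor (a smooth function of one covariate plus a region effect) and Poisson indicators with log-linear loadings; the function `f`, the region effects and the loadings are hypothetical inputs. Poisson draws use Knuth's multiplication method, which is only adequate for small means, and the auxiliary variable Gibbs sampler is not implemented.

```python
import math
import random

def simulate_poisson_indicators(covariate, region, f, spatial, loadings, seed=0):
    """Simulate count indicators driven by one continuous latent variable.

    z_i  = f(covariate_i) + spatial[region_i] + e_i,  e_i ~ N(0, 1)  (geoadditive predictor)
    y_ij ~ Poisson(exp(lambda0_j + lambda1_j * z_i))                 (count indicators)
    loadings: list of (lambda0_j, lambda1_j) pairs, one per indicator.
    """
    rng = random.Random(seed)

    def poisson(mean):
        # Knuth's method: multiply uniforms until the product drops to exp(-mean).
        limit, k, prod = math.exp(-mean), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        return k

    data = []
    for x, r in zip(covariate, region):
        z = f(x) + spatial[r] + rng.gauss(0.0, 1.0)
        data.append([poisson(math.exp(l0 + l1 * z)) for l0, l1 in loadings])
    return data
```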
The risk of the family of feasible generalized double k-class estimators under the LINEX loss function is derived in a linear regression model. The disturbances are assumed to be non-spherical, with an unknown variance-covariance matrix.
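The LINEX (linear-exponential) loss for an estimation error Δ is L(Δ) = b[exp(aΔ) - aΔ - 1] with a ≠ 0 and b > 0; for a > 0 overestimation is penalised roughly exponentially and underestimation roughly linearly. The sketch evaluates the loss and approximates the risk E[L] by averaging over simulated estimates. It treats a scalar error and does not implement the double k-class estimators themselves; the function names are illustrative.

```python
import math

def linex_loss(delta, a, b=1.0):
    """LINEX loss b * (exp(a * delta) - a * delta - 1) for estimation error delta."""
    return b * (math.exp(a * delta) - a * delta - 1.0)

def empirical_linex_risk(estimates, true_value, a, b=1.0):
    """Monte Carlo approximation of the risk E[L(estimate - true_value)]."""
    return sum(linex_loss(e - true_value, a, b) for e in estimates) / len(estimates)
```

Feeding `empirical_linex_risk` with estimates from repeated simulated samples gives a numerical check of an analytically derived risk.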
Structured additive regression comprises many semiparametric regression models, such as generalized additive (mixed) models, geoadditive models and hazard regression models, within a unified framework. In a Bayesian formulation, nonparametric functions, spatial effects and further model components are specified in terms of multivariate Gaussian priors for high-dimensional vectors of regression coefficients. For several model terms, such as penalised splines or Markov random fields, these Gaussian prior distributions involve rank-deficient precision matrices, yielding partially improper priors. Moreover, hyperpriors for the variances (corresponding to inverse smoothing parameters) may also be specified as improper, e.g. corresponding to Jeffreys' prior or a flat prior for the standard deviation. Hence, propriety of the joint posterior is a crucial issue for full Bayesian inference, in particular when it is based on Markov chain Monte Carlo simulations. We establish theoretical results providing sufficient (and sometimes necessary) conditions for propriety and provide empirical evidence through several accompanying simulation studies.
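The rank deficiency is easy to see for random-walk (difference) penalties, which underlie penalised splines. With a difference matrix D of order q acting on d coefficients, the precision matrix K = D'D has rank d - q: constants (q = 1), or constants and linear trends (q = 2), lie in its null space, so the Gaussian prior with density proportional to exp(-β'Kβ / (2τ²)) is flat in those directions and hence partially improper. A dependency-free sketch with illustrative function names:

```python
def difference_matrix(d, order=1):
    """Rows of D apply the order-th difference to a coefficient vector of length d."""
    rows = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(order):
        rows = [[rows[i + 1][j] - rows[i][j] for j in range(d)] for i in range(len(rows) - 1)]
    return rows

def penalty_matrix(d, order=1):
    """Precision matrix K = D'D of a random-walk prior (up to the factor 1 / tau^2)."""
    D = difference_matrix(d, order)
    return [[sum(D[r][i] * D[r][j] for r in range(len(D))) for j in range(d)] for i in range(d)]
```

For `d = 4` and first-order differences this gives the familiar tridiagonal matrix with rows summing to zero, i.e. K times a constant vector vanishes.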
The internal ratings-based approach of Basel II increases the need for the development of more realistic default probability models. In this paper we follow the approach of McNeil and Wendin (2006) and construct generalized linear mixed models for estimating default probabilities from annual data on companies with different credit ratings. In contrast to McNeil and Wendin (2006), the models considered are parsimonious parametric models that capture the dependence of the default probabilities on time and credit rating simultaneously. Macro-economic variables can also be included. All model parameters are estimated with a Bayesian approach using Markov chain Monte Carlo methods. Special emphasis is given to the investigation of the predictive capabilities of the models considered. In particular, predictable model specifications are used. The empirical study using default data from Standard & Poor's gives evidence that the correlation between credit ratings decreases the further apart the ratings are, and that it is higher than the correlation induced by the autoregressive time dynamics.
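A hypothetical simulation from a generalized linear mixed model of this kind: a probit link, an intercept that is linear in the rating class, and a common AR(1) year effect that induces dependence over time. The functional forms, the parameter names and the function name are illustrative assumptions, not the specifications estimated in the paper.

```python
import random
from statistics import NormalDist

def simulate_defaults(n_firms, theta0, theta1, rho, sigma, seed=0):
    """Simulate annual default counts for rating classes r = 0, 1, ...

    probit(p_{r,t}) = theta0 + theta1 * r + b_t      (rating trend + common year effect)
    b_t = rho * b_{t-1} + sigma * eps_t              (AR(1) latent time dynamics)
    D_{r,t} ~ Binomial(n_firms[t][r], p_{r,t})
    n_firms: list over years, each a list of firm counts per rating class.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf  # standard normal cdf = inverse probit link
    b, counts = 0.0, []
    for year in n_firms:
        b = rho * b + sigma * rng.gauss(0.0, 1.0)
        row = []
        for r, n in enumerate(year):
            p = phi(theta0 + theta1 * r + b)
            # Binomial draw as a sum of independent Bernoulli trials.
            row.append(sum(rng.random() < p for _ in range(n)))
        counts.append(row)
    return counts
```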
Influenza is one of the most common and severe diseases worldwide. Devastating epidemics caused by a new subtype of the influenza A virus occur again and again, the most important example being the Spanish flu of 1918/19 with more than 27 million deaths. For the development of pandemic plans, it is essential to understand how the disease spreads. We employ an extended SIR model for a probabilistic analysis of the spatio-temporal spread of influenza in Germany. The inhomogeneous mixing of the population is taken into account by introducing a network of subregions, connected according to Germany's commuter and domestic air traffic. The infection dynamics are described by a multivariate diffusion process, whose discussion forms a major part of this report. We furthermore present likelihood-based estimates of the model parameters.
Key words: general stochastic epidemic, likelihood inference, Euler scheme, influenza.
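The Euler scheme named in the key words can be sketched for a single, homogeneously mixing population, which is a simplifying assumption (the report couples many subregions through commuter and air traffic). In the diffusion approximation of the general stochastic epidemic, infections occur at rate βSI/N and recoveries at rate γI; each event type contributes a drift equal to its rate and a Gaussian increment whose variance is its rate times dt. The function name and parameter values are placeholders.

```python
import math
import random

def euler_sir(s0, i0, n_pop, beta, gamma, dt, n_steps, seed=0):
    """Euler-Maruyama path of the diffusion approximation of the general stochastic epidemic."""
    rng = random.Random(seed)
    s, i = float(s0), float(i0)
    path = [(s, i)]
    for _ in range(n_steps):
        inf_rate = beta * s * i / n_pop   # infection rate
        rec_rate = gamma * i              # recovery rate
        # Increments of two independent Brownian motions over a step of length dt.
        dw1 = rng.gauss(0.0, math.sqrt(dt))
        dw2 = rng.gauss(0.0, math.sqrt(dt))
        new_inf = inf_rate * dt + math.sqrt(inf_rate) * dw1
        new_rec = rec_rate * dt + math.sqrt(rec_rate) * dw2
        # Crude truncation keeps both compartments non-negative.
        s = max(0.0, s - new_inf)
        i = max(0.0, i + new_inf - new_rec)
        path.append((s, i))
    return path
```

Repeated paths from such a scheme can also serve to approximate transition densities between observation times, which is one route to likelihood-based estimation.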