## Statistics

Researchers: John Butler; Alberto Caimo; Joe Condon

Collaborators: Environmental Sustainability & Health Institute (ESHI)

### Description of area

Bayesian analysis & statistical network analysis (Alberto Caimo)

Research focuses on the development of Bayesian computational and modelling techniques for network data, with various applications. Networks are relational data defined on a complex graph whose nodes represent the actors of the network and whose links represent the relational structure between actors. Statistical network analysis provides the opportunity to explain the complexity of relational behaviour and to investigate how the global features of observed relational data may be related to local relational structures. However, network relationships cannot be regarded as independent, and this dependence means that the estimation of statistical network models is severely hindered by computational intractability.
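To make the intractability concrete, the following is a minimal sketch that assumes one widely used family of network models, the exponential random graph model, with edge and triangle counts as statistics. The model family, the function names and the parameter values are illustrative assumptions, not a description of the group's own software. The likelihood of such a model requires a normalising constant that sums over every possible graph on the node set.

```python
import itertools
import math


def graph_stats(edges, n):
    """Return (number of edges, number of triangles) of an undirected graph."""
    edge_set = set(edges)
    triangles = sum(
        1
        for i, j, k in itertools.combinations(range(n), 3)
        if {(i, j), (i, k), (j, k)} <= edge_set
    )
    return len(edge_set), triangles


def normalising_constant(n, theta_edges, theta_triangles):
    """Brute-force sum of exp(theta . s(y)) over all undirected graphs on n nodes."""
    dyads = list(itertools.combinations(range(n), 2))
    total = 0.0
    n_graphs = 0
    # Each dyad is either absent (0) or present (1), so every 0/1 mask is a graph.
    for mask in itertools.product([0, 1], repeat=len(dyads)):
        edges = [d for d, present in zip(dyads, mask) if present]
        n_edges, n_triangles = graph_stats(edges, n)
        total += math.exp(theta_edges * n_edges + theta_triangles * n_triangles)
        n_graphs += 1
    return total, n_graphs


# Illustrative parameter values only; feasible here because n is tiny.
z, n_graphs = normalising_constant(4, theta_edges=-1.0, theta_triangles=0.5)
print(f"graphs enumerated: {n_graphs}, normalising constant: {z:.3f}")
```

The number of graphs is 2^(n(n-1)/2), so this enumeration is already out of reach for around ten nodes (2^45 graphs), which is why estimation relies on approximate computational methods.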

A crucial challenge for statistical models is to capture and analyse the dependency between observations that gives rise to global relational structure. The development and application of new probabilistic modelling and inferential techniques for analysing network data is therefore a strategic and timely scientific topic. From a computational perspective, Bayesian methods offer the advantage of rich diagnostic information about the probability of competing models and model parameters given the observed data, and they allow for exploration of prior assumptions. Advanced computational methods are often required to carry out estimation procedures for such complex statistical models. Hence, the design of software is a complex yet crucial part of supporting any new methodological or modelling advances in statistical network analysis.

Bioinformatics, Bayes factor, bootstrapping (John Butler)

Bioinformatics is the application of statistical and mathematical methods to biological data. These methods can give insight into real-world problems and help develop research questions. One example is the interpretation and modelling of motor and decision-making processes using electrophysiological signals from people with Parkinson’s disease, with and without freezing of gait.

Classically, when conducting a statistical test between two groups (or conditions), one wishes to see whether there is a significant difference between the groups (the alternative hypothesis) or not (the null hypothesis). Bayes factor analysis quantifies the evidence for each hypothesis, which allows us to embrace the null hypothesis with confidence. This enables a move away from the current experimental approach in science, which is to show differences between groups and/or conditions, towards one that also embraces similarities. For example, when investigating sensory processing in neurotypical individuals and people with autism, this approach allowed the neural unreliability thesis to be dispelled.
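As a rough illustration of how a Bayes factor can express support for the null hypothesis, the sketch below uses the BIC approximation, BF01 ≈ exp((BIC_alt − BIC_null) / 2), for a two-group comparison of means under a Gaussian model. This is one simple approximation chosen for brevity, not necessarily the method used in the studies mentioned above, and the data values are invented.

```python
import math


def bic_gaussian(rss, n, k):
    """BIC (up to a constant) of a Gaussian model with k mean parameters.

    The shared variance parameter is omitted because it cancels in the difference.
    """
    return n * math.log(rss / n) + k * math.log(n)


def bf01_two_groups(a, b):
    """Approximate Bayes factor in favour of equal means (null) over unequal means."""
    pooled = a + b
    n = len(pooled)
    grand_mean = sum(pooled) / n
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    # Null model: one common mean. Alternative model: one mean per group.
    rss_null = sum((x - grand_mean) ** 2 for x in pooled)
    rss_alt = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    bic_null = bic_gaussian(rss_null, n, k=1)
    bic_alt = bic_gaussian(rss_alt, n, k=2)
    return math.exp((bic_alt - bic_null) / 2)


# Hypothetical measurements (invented for illustration).
group_a = [5.1, 4.9, 5.3, 4.8, 5.0, 5.2]
group_b = [5.0, 5.2, 4.9, 5.2, 4.8, 5.3]  # similar to group_a
group_c = [6.0, 6.2, 5.9, 6.3, 6.1, 5.8]  # clearly shifted

bf_similar = bf01_two_groups(group_a, group_b)
bf_different = bf01_two_groups(group_a, group_c)
print(f"BF01 (a vs b): {bf_similar:.2f}")
print(f"BF01 (a vs c): {bf_different:.2e}")
```

Values of BF01 above 1 favour the null hypothesis and values below 1 favour the alternative, so, unlike a non-significant p-value, the result can count as positive evidence that two groups are similar.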

Bootstrapping allows inferences to be drawn about a population by repeatedly resampling, with replacement, from the sample data. This sidesteps the assumptions about the underlying statistical distributions that parametric statistics require. It also supports the statistical analysis of small data sets, such as those arising in rare diseases like Niemann-Pick type C disease.
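The resampling idea can be sketched in a few lines. The example below computes a percentile bootstrap confidence interval for the mean of a small sample; the function name, the number of resamples and the data values are assumptions made for illustration only.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Draw n values with replacement, recompute the statistic, and repeat.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical small data set (values invented for illustration).
scores = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 4.4, 5.8]
low, high = bootstrap_ci(scores)
print(f"mean = {statistics.mean(scores):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

No distributional form is assumed for the data; the spread of the resampled estimates stands in for the sampling distribution of the statistic. Other statistics, such as the median, can be passed through the `stat` argument.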

Regression models with random effects, survival analysis & frailty, classification methods (Joe Condon)

Regression models are among the most widely used data analysis techniques. They attempt to model the relationship between predictor (independent) variables and one or more response (dependent) variables. In recent decades, classes of regression models have been developed to deal with a myriad of different types of predictor and response variables. These include discrete responses, non-Gaussian continuous responses and incomplete time-to-event (i.e. survival) data.

In their classical application, regression models assumed statistical independence between responses and fixed, non-random predictors. However, modern applications of regression frequently address dependence among responses and random-effects (non-fixed) predictors. The influence and structure of these dependencies are often of central interest in the analysis of such data. This dependence may occur over time (e.g. repeated measures) or via spatial or other forms of hierarchical clustering. These features raise significant challenges for the fitting of complex regression models to data, the interpretation of model outputs and the application of robust statistical inference. There is now a well-established suite of software for fitting such models in relatively simple cases. However, extensions allowing for multiple correlated random effects, as well as non-Gaussian and arbitrary mixing distributions, are still underdeveloped. These models are also often poorly understood and/or underused by the general data analysis community.
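A small simulation can show how a random effect induces dependence among responses. The sketch below assumes the simplest case, a Gaussian random intercept shared by all observations in a cluster, and estimates the resulting intraclass correlation with the classical one-way ANOVA estimator for a balanced design. All names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_clusters(n_clusters, n_per_cluster, sd_between, sd_within, seed=1):
    """Simulate responses y_ij = u_i + e_ij with a random intercept u_i per cluster."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, sd_between)  # effect shared by every response in the cluster
        data.append([u + rng.gauss(0.0, sd_within) for _ in range(n_per_cluster)])
    return data


def anova_icc(data):
    """One-way ANOVA estimator of the intraclass correlation (balanced clusters)."""
    k = len(data)
    m = len(data[0])
    grand_mean = statistics.mean([x for cluster in data for x in cluster])
    cluster_means = [statistics.mean(cluster) for cluster in data]
    # Mean squares between and within clusters.
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    msw = sum(
        (x - cm) ** 2 for cluster, cm in zip(data, cluster_means) for x in cluster
    ) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


clustered = simulate_clusters(200, 5, sd_between=1.0, sd_within=1.0)
independent = simulate_clusters(200, 5, sd_between=0.0, sd_within=1.0)
icc_clustered = anova_icc(clustered)
icc_independent = anova_icc(independent)
print(f"ICC with random intercept: {icc_clustered:.2f}")
print(f"ICC without random intercept: {icc_independent:.2f}")
```

With equal between-cluster and within-cluster variances the true intraclass correlation is 0.5, so responses in the same cluster are far from independent. Treating them as independent, as classical regression does, would misstate the uncertainty of the estimates.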

For more information about any of these areas and to discuss opportunities for research please contact the individuals above.