Before it was disbanded in 2021, the group pursued research projects that fall roughly into the following themes.
Algorithms for metagenome analysis
Our work in this area included developing new analysis methods and contributing to community-driven efforts to evaluate the performance of publicly available metagenome data analysis tools. Examples include:
- Contributions to the Critical Assessment of Metagenome Interpretation (CAMI), an ongoing community-driven effort to evaluate metagenome data analysis tools. Our first publication from this effort appeared in Nature Methods in 2017.
- Methods to analyse metagenomic Hi-C data. This stream of work includes the bin3C tool for reconstructing individual genomes from a microbial community sequenced with metagenomic Hi-C, along with the related tools sim3C, a Hi-C read simulator, and qc3C, a Hi-C library quality-control tool.
- Methods to resolve strain-level genomes from metagenomic assemblies. Together with Dr. Chris Quince and other international collaborators, we developed statistical methods for strain genome resolution using time-series samples, Hi-C data, and long-read data. We published one such method, DESMAN, and continued to work on new methods.
- New methods to estimate recombination rates in microbial populations from metagenomic data.
Development of advanced DNA sequencing technologies
We developed advanced DNA sequencing technologies by tightly coupling the design of computational inference methods with the development of next-generation sequencing techniques. Two examples of this work are metagenomic Hi-C and a technique for sequencing full-length 16S ribosomal RNA genes on the Illumina MiSeq, developed in collaboration with Dr. Catherine Burke. This work led to a spin-out company founded to commercialize a new long-read sequencing technology; in 2019 the company emerged from stealth mode with the announcement of Morphoseq.
Understanding the development of the infant gut microbiome
We worked to understand how the infant microbiome develops, in humans and in animals, and how it relates to the maternal microbiota and to infant health.
- In collaboration with the NSW Department of Primary Industries, we participated in a large project evaluating the effects of probiotics and antibiotics on post-weaning piglets. This work yielded a very large metagenomic time-series dataset and suggested that microbial community development in piglets is tightly structured.
- In collaboration with a large team led by Prof. Shyamali Dharmage, and with support from the National Health and Medical Research Council, we studied the microbiota of breast milk from families at risk of atopy.
Bayesian phylogenomic inference
We developed new, scalable phylogenetic analysis methods based on approximate Bayesian inference techniques, including variational inference, Sequential Monte Carlo, MCMC, and approximate Bayesian computation. We had a particular interest in applying these methods to bacteria, and in the analysis challenges introduced by bacterial recombination and horizontal gene transfer. Examples of our work in this area include:
- The PhyloStan software, a research prototype that applies probabilistic programming and variational inference to phylogenetic models. It provides a means to carry out Bayesian inference on continuous phylogenetic model parameters such as branch lengths and historical population sizes. A peer-reviewed manuscript describing the design and evaluation of PhyloStan is available.
- The STS software, a prototype for online phylogenetic inference via Sequential Monte Carlo. The work behind STS is described in two companion manuscripts: one on the underlying theory, and another describing the STS algorithm and its performance.
- The BEAGLE library, which provides a single, uniform programming interface to high-performance implementations of phylogenetic likelihood and gradient calculations across a variety of compute architectures. The library currently contains specialized likelihood calculators for SSE vectorization, OpenMP multithreading, and GPU computation via CUDA and OpenCL kernels.
- New methods to identify the recombinant regions of bacterial genomes and to improve estimation of the clonal genealogy.