Research Interest: The mitigation of key global issues (particularly
enteric methane emissions) whilst maintaining or improving high animal welfare
standards and sustaining agriculture and livestock farming for the future.
Summary of the research completed: A workflow was developed using several
software tools (e.g. R, PLINK, Zanardi and Beagle) to estimate the accuracy
of imputing missing genotypes in heterogeneous livestock populations,
focusing on its impact on prediction accuracy. The data comprised both
genotypic and phenotypic datasets and were subjected to various scenarios,
including different population sizes and structures and percentages of
missing data, all of which affected imputation accuracy. Specific scenarios
were selected for analysis with phenotypic data on rumen methane emissions,
identifying how imputation and population parameters relate to the prediction
accuracy of methane production. The conclusions drawn strengthen predictions
of anthropogenic greenhouse gas emissions, thus assisting their mitigation
and supporting sustainable livestock breeding.
Synthesis and application: The computational workflow comprised four key
stages: data preparation, injection of missing data, imputation of missing
data, and accuracy assessment, which was followed by either the phenotype
assessment or the presentation of accuracies. The workflow was run on
24 different scenarios (e.g. single breed, mixed) and was designed to read
and convert datasets, select specific sample sizes, and remove specified
proportions of genotypes at random from the original dataset. The missing data
were then imputed using the Zanardi (Marras et al., 2016) and Beagle (Browning
and Browning, 2007) programs to obtain a complete dataset again. The accuracy
of this imputation was assessed in R, both overall and separately for the
AA, AB and BB genotypes. The impact of imputation accuracy on the
prediction of methane emissions was assessed using the R package GROAN
(https://cran.r-project.org/web/packages/GROAN/), which required both the imputed
genomic data and the phenotypic data on ruminal methane production (g/d).
R packages such as ggplot2 were used to present the results visually so that
they could be interpreted effectively.
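The injection and accuracy-assessment stages can be illustrated with a minimal sketch. It is written in Python for brevity and is not the workflow's actual R code; the function names, the `None` marker for a masked call, and the in-memory list-of-lists genotype layout are all assumptions made for illustration. Accuracy is taken here to be simple concordance over the masked positions, reported in total and per true genotype class (AA, AB, BB), which may differ from the exact measure used in the study.

```python
import random

MISSING = None  # hypothetical marker for a masked genotype call
CLASSES = ("AA", "AB", "BB")


def inject_missing(genotypes, proportion, seed=0):
    """Mask a random proportion of genotype calls.

    `genotypes` is a list of samples, each a list of calls "AA", "AB" or "BB".
    Returns the masked copy and the set of (sample, marker) positions masked.
    """
    rng = random.Random(seed)
    positions = [(i, j) for i, row in enumerate(genotypes)
                 for j in range(len(row))]
    n_mask = round(proportion * len(positions))
    masked = set(rng.sample(positions, n_mask))
    out = [[MISSING if (i, j) in masked else call
            for j, call in enumerate(row)]
           for i, row in enumerate(genotypes)]
    return out, masked


def imputation_accuracy(original, imputed, masked):
    """Concordance over masked positions: total and per true genotype class."""
    counts = {g: [0, 0] for g in CLASSES}  # class -> [correct, total]
    for i, j in masked:
        truth = original[i][j]
        counts[truth][1] += 1
        if imputed[i][j] == truth:
            counts[truth][0] += 1
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    result = {"total": correct / total if total else float("nan")}
    for g, (c, t) in counts.items():
        # A class with no masked calls has undefined accuracy.
        result[g] = c / t if t else float("nan")
    return result
```

In use, the masked matrix returned by `inject_missing` would be written out and passed to the imputation program, and the imputed result read back and compared with the original through `imputation_accuracy`. Reporting per-class accuracy alongside the total is useful because a high overall concordance can hide poor recovery of a rarer genotype class.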
Wider benefits of the STSM to the participant: The STSM will benefit the
participant in further work on SNP chip data analysis and has provided her with
skills that can be used throughout her PhD and bioinformatics career. The STSM
gave the participant the opportunity to present at a seminar for colleagues and
staff of the National Research Council, Milan, an opportunity that was readily
accepted by both parties. Furthermore, this work should result in a
publication of which the participant will be an author. In addition, the STSM
allowed the participant to travel, experience the culture of a different
country, and gain friends and colleagues with similar research interests,
which is essential for future collaborations.
Successful assessment of imputation accuracy in heterogeneous livestock
populations using SNP array data.
Development of a pipeline to assess the impact of imputation accuracy on the
prediction of phenotypic methane production.
Quote: “The STSM allowed me a great opportunity to learn from the
excellent expertise at the CNRIBBA (Milan, Italy) and PTP (Lodi, Italy) and develop
a new skill set beneficial to a career in bioinformatics, whilst focusing on my key
research interests around livestock production.”