Genotoxic colibactin mutational signature in colorectal cancer is associated with clinicopathological features, specific genomic alterations and better survival., medRxiv, 2023.
Authors
Georgeson P, Steinfelder RS, Harrison TA, Pope BJ, Zaidi SH, Qu C, Lin Y, Joo JE, Mahmood K, Clendenning M, Walker R, Aglago EK, Berndt SI, Brenner H, Campbell PT, Cao Y, Chan AT, Chang-Claude J, Dimou N, Doheny KF, Drew DA, Figueiredo JC, French AJ, Gallinger S, Giannakis M, Giles GG, Goode EL, Gruber SB, Gsur A, Gunter MJ, Harlid S, Hoffmeister M, Hsu L, Huang WY, Huyghe JR, Manson JE, Moreno V, Murphy N, Nassir R, Newton CC, Nowak JA, Obón-Santacana M, Ogino S, Pai RK, Papadimitrou N, Potter JD, Schoen RE, Song M, Sun W, Toland AE, Trinh QM, Tsilidis K, Ugai T, Um CY, Macrae FA, Rosty C, Hudson TJ, Winship IM, Phipps AI, Jenkins MA, Peters U, Buchanan DD
Journal
medRxiv
Volume
None
Issue
None
Year
2023
DOI
10.1101/2023.03.10.23287127
Pubmed ID
37090539
Abstract
BACKGROUND AND AIMS: The microbiome has long been suspected of a role in colorectal cancer (CRC) tumorigenesis. The mutational signature SBS88 mechanistically links CRC development with the strain of METHODS: SBS88-positive CRCs were identified from targeted sequencing data from 5,292 CRCs from 17 studies and tested for their association with clinico-pathological features, oncogenic pathways, genomic characteristics and survival. RESULTS: In total, 7.5% (398/5,292) of the CRCs were SBS88-positive, of which 98.7% (392/398) were microsatellite stable/microsatellite instability low (MSS/MSI-L), compared with 80% (3916/4894) of SBS88 negative tumors (p=1.5×10 CONCLUSION: SBS88-positivity, a biomarker of colibactin-induced DNA damage, can identify a novel subtype of CRC characterized by recurrent somatic mutations, copy number alterations and better survival. These findings provide new insights for treatment and prevention strategies for this subtype of CRC.
A tumor focused approach to resolving the etiology of DNA mismatch repair deficient tumors classified as suspected Lynch syndrome., medRxiv, 2023.
Authors
Walker R, Mahmood K, Joo JE, Clendenning M, Georgeson P, Como J, Joseland S, Preston SG, Antill Y, Austin R, Boussioutas A, Bowman M, Burke J, Campbell A, Daneshvar S, Edwards E, Gleeson M, Goodwin A, Harris MT, Henderson A, Higgins M, Hopper JL, Hutchinson RA, Ip E, Isbister J, Kasem K, Marfan H, Milnes D, Ng A, Nichols C, O'Connell S, Pachter N, Pope BJ, Poplawski N, Ragunathan A, Smyth C, Spigelman A, Storey K, Susman R, Taylor JA, Warwick L, Wilding M, Williams R, Win AK, Walsh MD, Macrae FA, Jenkins MA, Rosty C, Winship IM, Buchanan DD, Family Cancer Clinics of Australia
Journal
medRxiv
Volume
None
Issue
None
Year
2023
DOI
10.1101/2023.02.27.23285541
Pubmed ID
36909643
Abstract
Routine screening of tumors for DNA mismatch repair (MMR) deficiency (dMMR) in colorectal (CRC), endometrial (EC) and sebaceous skin (SST) tumors leads to a significant proportion of unresolved cases classified as suspected Lynch syndrome (SLS). SLS cases (n=135) were recruited from Family Cancer Clinics across Australia and New Zealand. Targeted panel sequencing was performed on tumor (n=137; 80xCRCs, 33xECs and 24xSSTs) and matched blood-derived DNA to assess for microsatellite instability status, tumor mutation burden, COSMIC tumor mutational signatures and to identify germline and somatic MMR gene variants. MMR immunohistochemistry (IHC) and
Evaluating Multiple Next-Generation Sequencing-Derived Tumor Features to Accurately Predict DNA Mismatch Repair Status., J Mol Diagn, 2023.
Authors
Walker R, Georgeson P, Mahmood K, Joo JE, Makalic E, Clendenning M, Como J, Preston S, Joseland S, Pope BJ, Hutchinson RA, Kasem K, Walsh MD, Macrae FA, Win AK, Hopper JL, Mouradov D, Gibbs P, Sieber OM, O'Sullivan DE, Brenner DR, Gallinger S, Jenkins MA, Rosty C, Winship IM, Buchanan DD
Journal
J Mol Diagn
Volume
25
Issue
2
Year
2023
DOI
10.1016/j.jmoldx.2022.10.003
Pubmed ID
36396080
Abstract
Identifying tumor DNA mismatch repair deficiency (dMMR) is important for precision medicine. Tumor features, individually and in combination, derived from whole-exome sequenced (WES) colorectal cancers (CRCs) and panel-sequenced CRCs, endometrial cancers (ECs), and sebaceous skin tumors (SSTs) were assessed for their accuracy in detecting dMMR. CRCs (n = 300) with WES, where mismatch repair status was determined by immunohistochemistry, were assessed for microsatellite instability (MSMuTect, MANTIS, MSIseq, and MSISensor), Catalogue of Somatic Mutations in Cancer tumor mutational signatures, and somatic mutation counts. A 10-fold cross-validation approach (100 repeats) evaluated the dMMR prediction accuracy for i) individual features, ii) Lasso statistical model, and iii) an additive feature combination approach. Panel-sequenced tumors (29 CRCs, 22 ECs, and 20 SSTs) were assessed for the top performing dMMR predicting features/models using these three approaches. For WES CRCs, 10 features provided >80% dMMR prediction accuracy, with MSMuTect, MSIseq, and MANTIS achieving ≥99% accuracy. The Lasso model achieved 98.3% accuracy. The additive feature approach, with three or more of six of MSMuTect, MANTIS, MSIseq, MSISensor, insertion-deletion count, or tumor mutational signature small insertion/deletion 2 + small insertion/deletion 7 achieved 99.7% accuracy. For the panel-sequenced tumors, the additive feature combination approach of three or more of six achieved accuracies of 100%, 95.5%, and 100% for CRCs, ECs, and SSTs, respectively. The microsatellite instability calling tools performed well in WES CRCs; however, an approach combining tumor features may improve dMMR prediction in both WES and panel-sequenced data across tissue types.
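The additive feature combination rule described in this abstract (calling a tumor dMMR when three or more of six features are positive) can be illustrated with a minimal sketch. The abstract does not state how each feature is converted into a positive or negative call, so the sketch assumes those calls have already been made and are passed in as booleans. The function name `predict_dmmr` and the feature keys are hypothetical labels chosen for illustration, not the authors' code.

```python
from typing import Mapping

# The six features named in the abstract. The per-feature thresholds are NOT
# given there, so this sketch takes each feature's call as a ready-made boolean.
# The key names below are hypothetical labels for illustration only.
FEATURES = (
    "MSMuTect",
    "MANTIS",
    "MSIseq",
    "MSISensor",
    "indel_count",
    "ID2_plus_ID7",
)


def predict_dmmr(calls: Mapping[str, bool], min_positive: int = 3) -> bool:
    """Return True (dMMR) when at least `min_positive` of the six features are positive."""
    missing = [name for name in FEATURES if name not in calls]
    if missing:
        raise ValueError(f"missing feature calls: {missing}")
    positives = sum(bool(calls[name]) for name in FEATURES)
    return positives >= min_positive


if __name__ == "__main__":
    # Toy tumor with three positive features out of six (made-up values).
    tumour = {
        "MSMuTect": True,
        "MANTIS": True,
        "MSIseq": False,
        "MSISensor": False,
        "indel_count": True,
        "ID2_plus_ID7": False,
    }
    print(predict_dmmr(tumour))  # True
```

A simple vote of this kind has the practical advantage that no single tool's failure on a given tissue type or sequencing design decides the result on its own.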
2022
Long-read assembly and comparative evidence-based reanalysis of Cryptosporidium genome sequences reveal expanded transporter repertoire and duplication of entire chromosome ends including subtelomeric regions., Genome Res, 2022.
Authors
Baptista RP, Li Y, Sateriale A, Sanders MJ, Brooks KL, Tracey A, Ansell BRE, Jex AR, Cooper GW, Smith ED, Xiao R, Dumaine JE, Georgeson P, Pope BJ, Berriman M, Striepen B, Cotton JA, Kissinger JC
Journal
Genome Res
Volume
32
Issue
1
Year
2022
DOI
10.1101/gr.275325.121
Pubmed ID
34764149
Abstract
Cryptosporidiosis is a leading cause of waterborne diarrheal disease globally and an important contributor to mortality in infants and the immunosuppressed. Despite its importance, the
Identifying colorectal cancer caused by biallelic MUTYH pathogenic variants using tumor mutational signatures., Nat Commun, 2022.
Authors
Georgeson P, Harrison TA, Pope BJ, Zaidi SH, Qu C, Steinfelder RS, Lin Y, Joo JE, Mahmood K, Clendenning M, Walker R, Amitay EL, Berndt SI, Brenner H, Campbell PT, Cao Y, Chan AT, Chang-Claude J, Doheny KF, Drew DA, Figueiredo JC, French AJ, Gallinger S, Giannakis M, Giles GG, Gsur A, Gunter MJ, Hoffmeister M, Hsu L, Huang WY, Limburg P, Manson JE, Moreno V, Nassir R, Nowak JA, Obón-Santacana M, Ogino S, Phipps AI, Potter JD, Schoen RE, Sun W, Toland AE, Trinh QM, Ugai T, Macrae FA, Rosty C, Hudson TJ, Jenkins MA, Thibodeau SN, Winship IM, Peters U, Buchanan DD
Journal
Nat Commun
Volume
13
Issue
1
Year
2022
DOI
10.1038/s41467-022-30916-1
Pubmed ID
35668106
Abstract
Carriers of germline biallelic pathogenic variants in the MUTYH gene have a high risk of colorectal cancer. We test 5649 colorectal cancers to evaluate the discriminatory potential of a tumor mutational signature specific to MUTYH for identifying biallelic carriers and classifying variants of uncertain clinical significance (VUS). Using a tumor and matched germline targeted multi-gene panel approach, our classifier identifies all biallelic MUTYH carriers and all known non-carriers in an independent test set of 3019 colorectal cancers (accuracy = 100% (95% confidence interval 99.87-100%)). All monoallelic MUTYH carriers are classified with the non-MUTYH carriers. The classifier provides evidence for a pathogenic classification for two VUS and a benign classification for five VUS. Somatic hotspot mutations KRAS p.G12C and PIK3CA p.Q546K are associated with colorectal cancers from biallelic MUTYH carriers compared with non-carriers (p = 2 × 10
Rare germline variants in the AXIN2 gene in families with colonic polyposis and colorectal cancer., Fam Cancer, 2022.
Authors
Chan JM, Clendenning M, Joseland S, Georgeson P, Mahmood K, Walker R, Como J, Joo JE, Preston S, Hutchinson RA, Pope BJ, Metz A, Beard C, Purvis R, Arnold J, Vijay V, Konycheva G, Atkinson N, Parry S, Jenkins MA, Macrae FA, Rosty C, Winship IM, Buchanan DD
Journal
Fam Cancer
Volume
21
Issue
4
Year
2022
DOI
10.1007/s10689-021-00283-9
Pubmed ID
34817745
Abstract
Germline loss-of-function variants in AXIN2 are associated with oligodontia and ectodermal dysplasia. The association between colorectal cancer (CRC) and colonic polyposis is less clear despite this gene now being included in multi-gene panels for CRC. Study participants were people with genetically unexplained colonic polyposis recruited to the Genetics of Colonic Polyposis Study who had a rare germline AXIN2 gene variant identified from either clinical multi-gene panel testing (n=2) or from whole genome/exome sequencing (n=2). Variant segregation in relatives and characterisation of tumour tissue were performed where possible. Four different germline pathogenic variants in AXIN2 were identified in four families. Five of the seven carriers of the c.1049delC, p.Pro350Leufs*13 variant, two of the six carriers of the c.1994dupG, p.Asn666Glnfs*41 variant, all three carriers of c.1972delA, p.Ser658Alafs*31 variant and the single proband carrier of the c.2405G>C, p.Arg802Thr variant, which creates an alternate splice form resulting in a frameshift mutation (p.Glu763Ilefs*42), were affected by CRC and/or polyposis. Carriers had a mean age at diagnosis of CRC/polyposis of 52.5 ± 9.2 years. Colonic polyps were typically pan colonic with counts ranging from 5 to >100 (median 12.5) comprising predominantly adenomatous polyps but also serrated polyps. Two CRCs from carriers displayed evidence of a second hit via loss of heterozygosity. Oligodontia was observed in carriers from two families. Germline AXIN2 pathogenic variants from four families were associated with CRC and/or polyposis in multiple family members. These findings support the inclusion of AXIN2 in CRC and polyposis multigene panels for clinical testing.
2021
DNA Methylation Signatures and the Contribution of Age-Associated Methylomic Drift to Carcinogenesis in Early-Onset Colorectal Cancer., Cancers (Basel), 2021.
Authors
Joo JE, Clendenning M, Wong EM, Rosty C, Mahmood K, Georgeson P, Winship IM, Preston SG, Win AK, Dugué PA, Jayasekara H, English D, Macrae FA, Hopper JL, Jenkins MA, Milne RL, Giles GG, Southey MC, Buchanan DD
Journal
Cancers (Basel)
Volume
13
Issue
11
Year
2021
DOI
10.3390/cancers13112589
Pubmed ID
34070516
Abstract
We investigated aberrant DNA methylation (DNAm) changes and the contribution of ageing-associated methylomic drift and age acceleration to early-onset colorectal cancer (EOCRC) carcinogenesis. Genome-wide DNAm profiling using the Infinium HM450K on 97 EOCRC tumour and 54 normal colonic mucosa samples was compared with: (1) intermediate-onset CRC (IOCRC; diagnosed between 50-70 years; 343 tumour and 35 normal); and (2) late-onset CRC (LOCRC; >70 years; 318 tumour and 40 normal). CpGs associated with age-related methylation drift were identified using a public dataset of 231 normal mucosa samples from people without CRC. DNAm-age was estimated using epiTOC2. Common to all three age-of-onset groups, 88,385 (20% of all CpGs) CpGs were differentially methylated between tumour and normal mucosa. We identified 234 differentially methylated genes that were unique to the EOCRC group; 13 of these DMRs/genes were replicated in EOCRC compared with LOCRCs from TCGA. In normal mucosa from people without CRC, we identified 28,154 CpGs that undergo ageing-related DNAm drift, and of those, 65% were aberrantly methylated in EOCRC tumours. Based on the mitotic-based DNAm clock epiTOC2, we identified age acceleration in normal mucosa of people with EOCRC compared with normal mucosa from the IOCRC, LOCRC groups (
Evaluating the utility of tumour mutational signatures for identifying hereditary colorectal cancer and polyposis syndrome carriers., Gut, 2021.
Authors
Georgeson P, Pope BJ, Rosty C, Clendenning M, Mahmood K, Joo JE, Walker R, Hutchinson RA, Preston S, Como J, Joseland S, Win AK, Macrae FA, Hopper JL, Mouradov D, Gibbs P, Sieber OM, O'Sullivan DE, Brenner DR, Gallinger S, Jenkins MA, Winship IM, Buchanan DD
Journal
Gut
Volume
70
Issue
11
Year
2021
DOI
10.1136/gutjnl-2019-320462
Pubmed ID
33414168
Abstract
OBJECTIVE: Germline pathogenic variants (PVs) in the DNA mismatch repair (MMR) genes and in the base excision repair gene DESIGN: Whole-exome sequencing of formalin-fixed paraffin-embedded (FFPE) CRC tissue was performed on 33 MMR germline PV carriers, 12 biallelic RESULTS: The combination of mutational signatures SBS18 and SBS36 contributing >30% of a CRC's signature profile was able to discriminate biallelic CONCLUSION: Assessment of SBS and ID signatures can discriminate CRCs from biallelic
Germline and Tumor Sequencing as a Diagnostic Tool To Resolve Suspected Lynch Syndrome., J Mol Diagn, 2021.
Authors
Pope BJ, Clendenning M, Rosty C, Mahmood K, Georgeson P, Joo JE, Walker R, Hutchinson RA, Jayasekara H, Joseland S, Como J, Preston S, Spurdle AB, Macrae FA, Win AK, Hopper JL, Jenkins MA, Winship IM, Buchanan DD
Journal
J Mol Diagn
Volume
23
Issue
3
Year
2021
DOI
10.1016/j.jmoldx.2020.12.003
Pubmed ID
33383211
Abstract
Patients in whom mismatch repair (MMR)-deficient cancer develops in the absence of pathogenic variants of germline MMR genes or somatic hypermethylation of the MLH1 gene promoter are classified as having suspected Lynch syndrome (SLS). Germline whole-genome sequencing (WGS) and targeted and genome-wide tumor sequencing were applied to identify the underlying cause of tumor MMR deficiency in SLS. Germline WGS was performed on samples from 14 cancer-affected patients with SLS, including two sets of first-degree relatives. MMR genes were assessed for germline pathogenic variants, including complex structural rearrangements and noncoding variants. Tumor tissue was assessed for somatic MMR gene mutations using targeted, whole-exome sequencing or WGS. Germline WGS identified pathogenic MMR variants in 3 of the 14 cases (21.4%), including a 9.5-megabase inversion disrupting MSH2 in a mother and daughter. Excluding these 3 MMR carriers, tumor sequencing identified at least two somatic MMR gene mutations in 8 of 11 tumors tested (72.7%). In a second mother-daughter pair, a somatic cause of tumor MMR deficiency was supported by the presence of double somatic MSH2 mutations in their respective tumors. More than 70% of SLS cases had double somatic MMR mutations in the absence of germline pathogenic variants in the MMR or other DNA repair-related genes on WGS, and, therefore, were confidently assigned a noninherited cause of tumor MMR deficiency.
MSH2-deficient prostate tumours have a distinct immune response and clinical outcome compared to MSH2-deficient colorectal or endometrial cancer., Prostate Cancer Prostatic Dis, 2021.
Authors
McCoy P, Mangiola S, Macintyre G, Hutchinson R, Tran B, Pope B, Georgeson P, Hong MKH, Kurganovs N, Lunke S, Clarkson MJ, Cmero M, Kerger M, Stuchbery R, Chow K, Haviv I, Ryan A, Costello AJ, Corcoran NM, Hovens CM
Journal
Prostate Cancer Prostatic Dis
Volume
24
Issue
4
Year
2021
DOI
10.1038/s41391-021-00379-4
Pubmed ID
34108644
Abstract
BACKGROUND: Recent publications have shown patients with defects in the DNA mismatch repair (MMR) pathway driven by either MSH2 or MSH6 loss experience a significant increase in the incidence of prostate cancer. Moreover, this increased incidence of prostate cancer is accompanied by rapid disease progression and poor clinical outcomes. METHODS AND RESULTS: We show that androgen-receptor activation, a key driver of prostate carcinogenesis, can disrupt the MSH2 gene in prostate cancer. We screened tumours from two cohorts (recurrent/non-recurrent) of prostate cancer patients to confirm the loss of MSH2 protein expression and identified decreased MSH2 expression in recurrent cases. Stratifying the independent TCGA prostate cancer cohort for MSH2/6 expression revealed that patients with lower levels of MSH2/6 had significantly worse outcomes, in contrast to endometrial and colorectal cancer patients with lower MSH2/6 levels. MMRd endometrial and colorectal tumours showed the expected increase in mutational burden, microsatellite instability and enhanced immune cell mobilisation but this was not evident in prostate tumours. CONCLUSIONS: We have shown that loss or reduced levels of MSH2/MSH6 protein in prostate cancer is associated with poor outcome. However, our data indicate that this is not associated with a statistically significant increase in mutational burden, microsatellite instability or immune cell mobilisation in a cohort of primary prostate cancers.
2020
Landscape of somatic single nucleotide variants and indels in colorectal cancer and impact on survival., Nat Commun, 2020.
Authors
Zaidi SH, Harrison TA, Phipps AI, Steinfelder R, Trinh QM, Qu C, Banbury BL, Georgeson P, Grasso CS, Giannakis M, Adams JB, Alwers E, Amitay EL, Barfield RT, Berndt SI, Borozan I, Brenner H, Brezina S, Buchanan DD, Cao Y, Chan AT, Chang-Claude J, Connolly CM, Drew DA, Farris AB, Figueiredo JC, French AJ, Fuchs CS, Garraway LA, Gruber S, Guinter MA, Hamilton SR, Harlid S, Heisler LE, Hidaka A, Hopper JL, Huang WY, Huyghe JR, Jenkins MA, Krzyzanowski PM, Lemire M, Lin Y, Luo X, Mardis ER, McPherson JD, Miller JK, Moreno V, Mu XJ, Nishihara R, Papadopoulos N, Pasternack D, Quist MJ, Rafikova A, Reid EEG, Shinbrot E, Shirts BH, Stein LD, Teney CD, Timms L, Um CY, Van Guelpen B, Van Tassel M, Wang X, Wheeler DA, Yung CK, Hsu L, Ogino S, Gsur A, Newcomb PA, Gallinger S, Hoffmeister M, Campbell PT, Thibodeau SN, Sun W, Hudson TJ, Peters U
Journal
Nat Commun
Volume
11
Issue
1
Year
2020
DOI
10.1038/s41467-020-17386-z
Pubmed ID
32686686
Abstract
Colorectal cancer (CRC) is a biologically heterogeneous disease. To characterize its mutational profile, we conduct targeted sequencing of 205 genes for 2,105 CRC cases with survival data. Our data show several findings in addition to enhancing the existing knowledge of CRC. We identify PRKCI, SPZ1, MUTYH, MAP2K4, FETUB, and TGFBR2 as additional genes significantly mutated in CRC. We find that among hypermutated tumors, an increased mutation burden is associated with improved CRC-specific survival (HR = 0.42, 95% CI: 0.21-0.82). Mutations in TP53 are associated with poorer CRC-specific survival, which is most pronounced in cases carrying TP53 mutations with predicted 0% transcriptional activity (HR = 1.53, 95% CI: 1.21-1.94). Furthermore, we observe differences in mutational frequency of several genes and pathways by tumor location, stage, and sex. Overall, this large study provides deep insights into somatic mutations in CRC, and their potential relationships with survival and tumor features.
2019
Annotation of the Giardia proteome through structure-based homology and machine learning., Gigascience, 2019.
Authors
Ansell BRE, Pope BJ, Georgeson P, Emery-Corbin SJ, Jex AR
Journal
Gigascience
Volume
8
Issue
1
Year
2019
DOI
10.1093/gigascience/giy150
Pubmed ID
30520990
Abstract
BACKGROUND: Large-scale computational prediction of protein structures represents a cost-effective alternative to empirical structure determination with particular promise for non-model organisms and neglected pathogens. Conventional sequence-based tools are insufficient to annotate the genomes of such divergent biological systems. Conversely, protein structure tolerates substantial variation in primary amino acid sequence and is thus a robust indicator of biochemical function. Structural proteomics is poised to become a standard part of pathogen genomics research; however, informatic methods are now required to assign confidence in large volumes of predicted structures. AIMS: Our aim was to predict the proteome of a neglected human pathogen, Giardia duodenalis, and stratify predicted structures into high- and lower-confidence categories using a variety of metrics in isolation and combination. METHODS: We used the I-TASSER suite to predict structural models for ∼5,000 proteins encoded in G. duodenalis and identify their closest empirically-determined structural homologues in the Protein Data Bank. Models were assigned to high- or lower-confidence categories depending on the presence of matching protein family (Pfam) domains in query and reference peptides. Metrics output from the suite and derived metrics were assessed for their ability to predict the high-confidence category individually, and in combination through development of a random forest classifier. RESULTS: We identified 1,095 high-confidence models including 212 hypothetical proteins. Amino acid identity between query and reference peptides was the greatest individual predictor of high-confidence status; however, the random forest classifier outperformed any metric in isolation (area under the receiver operating characteristic curve = 0.976) and identified a subset of 305 high-confidence-like models, corresponding to false-positive predictions. 
High-confidence models exhibited greater transcriptional abundance, and the classifier generalized across species, indicating the broad utility of this approach for automatically stratifying predicted structures. Additional structure-based clustering was used to cross-check confidence predictions in an expanded family of Nek kinases. Several high-confidence-like proteins yielded substantial new insight into mechanisms of redox balance in G. duodenalis-a system central to the efficacy of limited anti-giardial drugs. CONCLUSION: Structural proteomics combined with machine learning can aid genome annotation for genetically divergent organisms, including human pathogens, and stratify predicted structures to promote efficient allocation of limited resources for experimental investigation.
Tumor mutational signatures in sebaceous skin lesions from individuals with Lynch syndrome., Mol Genet Genomic Med, 2019.
Authors
Georgeson P, Walsh MD, Clendenning M, Daneshvar S, Pope BJ, Mahmood K, Joo JE, Jayasekara H, Jenkins MA, Winship IM, Buchanan DD
Journal
Mol Genet Genomic Med
Volume
7
Issue
7
Year
2019
DOI
10.1002/mgg3.781
Pubmed ID
31162827
Abstract
BACKGROUND: Muir-Torre syndrome is defined by the development of sebaceous skin lesions in individuals who carry a germline mismatch repair (MMR) gene mutation. Loss of expression of MMR proteins is frequently observed in sebaceous skin lesions, but MMR-deficiency alone is not diagnostic for carrying a germline MMR gene mutation. METHODS: Whole exome sequencing was performed on three MMR-deficient sebaceous lesions from individuals with MSH2 gene mutations (Lynch syndrome) and three MMR-proficient sebaceous lesions from individuals without Lynch syndrome with the aim of characterizing the tumor mutational signatures, somatic mutation burden, and microsatellite instability status. Thirty predefined somatic mutational signatures were calculated for each lesion. RESULTS: Signature 1 was ubiquitous across the six lesions tested. Signatures 6 and 15, associated with defective DNA MMR, were significantly more prevalent in the MMR-deficient lesions from the MSH2 carriers compared with the MMR-proficient non-Lynch sebaceous lesions (mean ± SD=41.0 ± 8.2% vs. 2.3 ± 4.0%, p = 0.0018). Tumor mutation burden was, on average, significantly higher in the MMR-deficient lesions compared with the MMR-proficient lesions (23.3 ± 11.4 vs. 1.8 ± 0.8 mutations/Mb, p = 0.03). All four sebaceous lesions observed in sun exposed areas of the body demonstrated signature 7 related to ultraviolet light exposure. CONCLUSION: Tumor mutational signatures 6 and 15 and somatic mutation burden were effective in differentiating Lynch-related from non-Lynch sebaceous lesions.
Bionitio: demonstrating and facilitating best practices for bioinformatics command-line software., Gigascience, 2019.
Authors
Georgeson P, Syme A, Sloggett C, Chung J, Dashnow H, Milton M, Lonsdale A, Powell D, Seemann T, Pope B
Journal
Gigascience
Volume
8
Issue
9
Year
2019
DOI
10.1093/gigascience/giz109
Pubmed ID
31544213
Abstract
BACKGROUND: Bioinformatics software tools are often created ad hoc, frequently by people without extensive training in software development. In particular, for beginners, the barrier to entry in bioinformatics software development is high, especially if they want to adopt good programming practices. Even experienced developers do not always follow best practices. This results in the proliferation of poorer-quality bioinformatics software, leading to limited scalability and inefficient use of resources; lack of reproducibility, usability, adaptability, and interoperability; and erroneous or inaccurate results. FINDINGS: We have developed Bionitio, a tool that automates the process of starting new bioinformatics software projects following recommended best practices. With a single command, the user can create a new well-structured project in 1 of 12 programming languages. The resulting software is functional, carrying out a prototypical bioinformatics task, and thus serves as both a working example and a template for building new tools. Key features include command-line argument parsing, error handling, progress logging, defined exit status values, a test suite, a version number, standardized building and packaging, user documentation, code documentation, a standard open source software license, software revision control, and containerization. CONCLUSIONS: Bionitio serves as a learning aid for beginner-to-intermediate bioinformatics programmers and provides an excellent starting point for new projects. This helps developers adopt good programming practices from the beginning of a project and encourages high-quality tools to be developed more rapidly. This also benefits users because tools are more easily installed and consistent in their usage. Bionitio is released as open source software under the MIT License and is available at https://github.com/bionitio-team/bionitio.
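Several of the key features listed in this abstract (command-line argument parsing, error handling, progress logging, defined exit status values, and a version number) can be shown together in a small Python skeleton. This is a minimal sketch written for illustration, not the code Bionitio generates. The task (counting FASTA records), the option names, the version string, and the exit status constant are all assumptions.

```python
import argparse
import logging
import sys

PROGRAM_VERSION = "0.1.0"   # hypothetical version number
EXIT_FILE_IO_ERROR = 1      # hypothetical exit status for unreadable input
# argparse itself exits with status 2 when the command line is invalid.


def parse_args(argv):
    """Parse command-line arguments (option names are illustrative)."""
    parser = argparse.ArgumentParser(description="Count sequences in FASTA files")
    parser.add_argument("--version", action="version", version=PROGRAM_VERSION)
    parser.add_argument("--log", metavar="LOG_FILE",
                        help="write the progress log to this file instead of stderr")
    parser.add_argument("fasta_files", nargs="+", metavar="FASTA_FILE")
    return parser.parse_args(argv)


def count_sequences(lines):
    """Count FASTA records: each record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


def main(argv=None):
    args = parse_args(sys.argv[1:] if argv is None else argv)
    # With filename=None, basicConfig logs to stderr.
    logging.basicConfig(filename=args.log, level=logging.INFO)
    for path in args.fasta_files:
        logging.info("processing %s", path)
        try:
            with open(path) as handle:
                print(path, count_sequences(handle), sep="\t")
        except OSError as error:
            logging.error("cannot read %s: %s", path, error)
            return EXIT_FILE_IO_ERROR
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Returning the exit status from `main` rather than calling `sys.exit` inside it keeps the function easy to call from a test suite, which is another of the practices the abstract lists.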
2018
sEst: Accurate Sex-Estimation and Abnormality Detection in Methylation Microarray Data., Int J Mol Sci, 2018.
Authors
Jung CH, Park DJ, Georgeson P, Mahmood K, Milne RL, Southey MC, Pope BJ
Journal
Int J Mol Sci
Volume
19
Issue
10
Year
2018
DOI
10.3390/ijms19103172
Pubmed ID
30326623
Abstract
DNA methylation influences predisposition, development and prognosis for many diseases, including cancer. However, it is not uncommon to encounter samples with incorrect sex labelling or atypical sex chromosome arrangement. Sex is one of the strongest influencers of the genomic distribution of DNA methylation and, therefore, correct assignment of sex and filtering of abnormal samples are essential for the quality control of study data. Differences in sex chromosome copy numbers between sexes and X-chromosome inactivation in females result in distinctive sex-specific patterns in the distribution of DNA methylation levels. In this study, we present a software tool, sEst, which incorporates clustering analysis to infer sex and to detect sex-chromosome abnormalities from DNA methylation microarray data. Testing with two publicly available datasets demonstrated that sEst not only correctly inferred the sex of the test samples, but also identified mislabelled samples and samples with potential sex-chromosome abnormalities, such as Klinefelter syndrome and Turner syndrome, the latter being a feature not offered by existing methods. Considering that sex and the sex-chromosome abnormalities can have large effects on many phenotypes, including diseases, our method can make a significant contribution to DNA methylation studies that are based on microarray platforms.
2017
Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics., Hum Genomics, 2017.
Authors
Mahmood K, Jung CH, Philip G, Georgeson P, Chung J, Pope BJ, Park DJ
Journal
Hum Genomics
Volume
11
Issue
1
Year
2017
DOI
10.1186/s40246-017-0104-8
Pubmed ID
28511696
Abstract
BACKGROUND: Genetic variant effect prediction algorithms are used extensively in clinical genomics and research to determine the likely consequences of amino acid substitutions on protein function. It is vital that we better understand their accuracies and limitations because published performance metrics are confounded by serious problems of circularity and error propagation. Here, we derive three independent, functionally determined human mutation datasets, UniFun, BRCA1-DMS and TP53-TA, and employ them, alongside previously described datasets, to assess the pre-eminent variant effect prediction tools. RESULTS: Apparent accuracies of variant effect prediction tools were influenced significantly by the benchmarking dataset. Benchmarking with the assay-determined datasets UniFun and BRCA1-DMS yielded areas under the receiver operating characteristic curves in the modest ranges of 0.52 to 0.63 and 0.54 to 0.75, respectively, considerably lower than observed for other, potentially more conflicted datasets. CONCLUSIONS: These results raise concerns about how such algorithms should be employed, particularly in a clinical setting. Contemporary variant effect prediction tools are unlikely to be as accurate at the general prediction of functional impacts on proteins as reported prior. Use of functional assay-based datasets that avoid prior dependencies promises to be valuable for the ongoing development and accurate benchmarking of such tools.
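The benchmark metric reported in this abstract, the area under the receiver operating characteristic curve (AUC), can be computed directly from labels and prediction scores. The sketch below uses the pairwise definition: AUC is the probability that a randomly chosen positive (for example, a functionally damaging variant) scores higher than a randomly chosen negative, with ties counted as one half. The function name `roc_auc` and the toy data are hypothetical. This is not the authors' benchmarking pipeline.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [score for label, score in zip(labels, scores) if label]
    negatives = [score for label, score in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data: 1 = damaging, 0 = tolerated (made-up scores).
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(roc_auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking, which gives context for the 0.52 to 0.63 range reported for the UniFun dataset.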
Diagnostic Impact and Cost-effectiveness of Whole-Exome Sequencing for Ambulant Children With Suspected Monogenic Conditions., JAMA Pediatr, 2017.
Authors
Tan TY, Dillon OJ, Stark Z, Schofield D, Alam K, Shrestha R, Chong B, Phelan D, Brett GR, Creed E, Jarmolowicz A, Yap P, Walsh M, Downie L, Amor DJ, Savarirayan R, McGillivray G, Yeung A, Peters H, Robertson SJ, Robinson AJ, Macciocca I, Sadedin S, Bell K, Oshlack A, Georgeson P, Thorne N, Gaff C, White SM
Journal
JAMA Pediatr
Volume
171
Issue
9
Year
2017
DOI
10.1001/jamapediatrics.2017.1755
Pubmed ID
28759686
Abstract
IMPORTANCE: Optimal use of whole-exome sequencing (WES) in the pediatric setting requires an understanding of who should be considered for testing and when it should be performed to maximize clinical utility and cost-effectiveness. OBJECTIVES: To investigate the impact of WES in sequencing-naive children suspected of having a monogenic disorder and evaluate its cost-effectiveness if WES had been available at different time points in their diagnostic trajectory. DESIGN, SETTING, AND PARTICIPANTS: This prospective study was part of the Melbourne Genomics Health Alliance demonstration project. At the ambulatory outpatient clinics of the Victorian Clinical Genetics Services at the Royal Children's Hospital, Melbourne, Australia, children older than 2 years suspected of having a monogenic disorder were prospectively recruited from May 1 through November 30, 2015, by clinical geneticists after referral from general and subspecialist pediatricians. All children had nondiagnostic microarrays and no prior single-gene or panel sequencing. EXPOSURES: All children underwent singleton WES with targeted phenotype-driven analysis. MAIN OUTCOMES AND MEASURES: The study examined the clinical utility of a molecular diagnosis and the cost-effectiveness of alternative diagnostic trajectories, depending on timing of WES. RESULTS: Of 61 children originally assessed, 44 (21 [48%] male and 23 [52%] female) aged 2 to 18 years (mean age at initial presentation, 28 months; range, 0-121 months) were recruited, and a diagnosis was achieved in 23 (52%) by singleton WES. The diagnoses were unexpected in 8 of 23 (35%), and clinical management was altered in 6 of 23 (26%). The mean duration of the diagnostic odyssey was 6 years, with each child having a mean of 19 tests and 4 clinical genetics and 4 nongenetics specialist consultations, and 26 (59%) underwent a procedure while under general anesthetic for diagnostic purposes. 
Economic analyses of the diagnostic trajectory identified that WES performed at initial tertiary presentation resulted in an incremental cost savings of A$9020 (US$6838) per additional diagnosis (95% CI, A$4304-A$15 404 [US$3263-US$11 678]) compared with the standard diagnostic pathway. Even if WES were performed at the first genetics appointment, there would be an incremental cost savings of A$5461 (US$4140) (95% CI, A$1433-A$10 557 [US$1086-US$8004]) per additional diagnosis compared with the standard diagnostic pathway. CONCLUSIONS AND RELEVANCE: Singleton WES in children with suspected monogenic conditions has high diagnostic yield, and cost-effectiveness is maximized by early application in the diagnostic pathway. Pediatricians should consider early referral of children with undiagnosed syndromes to clinical geneticists.
Single nucleotide-level mapping of DNA double-strand breaks in human HEK293T cells., Genom Data, 2017.
Authors
Pope BJ, Mahmood K, Jung CH, Georgeson P, Park DJ
Journal
Genom Data
Volume
11
Issue
None
Year
2017
DOI
10.1016/j.gdata.2016.11.007
Pubmed ID
27942458
Abstract
Constitutional biological processes involve the generation of DNA double-strand breaks (DSBs). The production of such breaks and their subsequent resolution are also highly relevant to neurodegenerative diseases and cancer, in which extensive DNA fragmentation has been described (Stephens et al., 2011; Blondet et al., 2001). Tchurikov et al. (2011, 2013) have reported previously that frequent sites of DSBs occur in chromosomal domains involved in the co-ordinated expression of genes. This group report that hot spots of DSBs in human HEK293T cells often coincide with H3K4me3 marks, associated with active transcription (Kravatsky et al., 2015), and that frequent sites of DNA double-strand breakage are likely to be relevant to cancer genomics (Tchurikov et al., 2013, 2016). Recently, they applied a RAFT (rapid amplification of forum termini) protocol that selects for blunt-ended DSB sites and mapped these to the human genome within defined co-ordinate 'windows'. In this paper, we re-analyse public RAFT data to derive sites of DSBs at the single-nucleotide level across the built genome for human HEK293T cells (https://figshare.com/s/35220b2b79eaaaf64ed8). This refined mapping, combined with accessory ENCODE data tracks and ribosomal DNA-related sequence annotations, will likely be of value for the design of clinically relevant targeted assays such as those for cancer susceptibility, diagnosis, treatment-matching and prognostication.
Diagnostic and cost utility of whole exome sequencing in peripheral neuropathy., Ann Clin Transl Neurol, 2017.
Authors
Walsh M, Bell KM, Chong B, Creed E, Brett GR, Pope K, Thorne NP, Sadedin S, Georgeson P, Phelan DG, Day T, Taylor JA, Sexton A, Lockhart PJ, Kiers L, Fahey M, Macciocca I, Gaff CL, Oshlack A, Yiu EM, James PA, Stark Z, Ryan MM, Melbourne Genomics Health Alliance
Journal
Ann Clin Transl Neurol
Volume
4
Issue
5
Year
2017
DOI
10.1002/acn3.409
Pubmed ID
28491899
Abstract
OBJECTIVE: To explore the diagnostic utility and cost effectiveness of whole exome sequencing (WES) in a cohort of individuals with peripheral neuropathy. METHODS: Singleton WES was performed in individuals recruited through one pediatric and one adult tertiary center between February 2014 and December 2015. Initial analysis was restricted to a virtual panel of 55 genes associated with peripheral neuropathies. Patients with uninformative results underwent expanded analysis of the WES data. Data on the cost of prior investigations and assessments performed for diagnostic purposes in each patient were collected. RESULTS: Fifty patients with a peripheral neuropathy were recruited (median age 18 years; range 2-68 years). The median time from initial presentation to study enrollment was 6 years 9 months (range 2 months-62 years), and the average cost of prior investigations and assessments for diagnostic purposes was AU$4013 per patient. Eleven individuals received a diagnosis from the virtual panel. Eight individuals received a diagnosis following expanded analysis of the WES data, increasing the overall diagnostic yield to 38%. Two additional individuals were diagnosed with pathogenic copy number variants through SNP microarray. CONCLUSIONS: This study provides evidence that WES has a high diagnostic utility and is cost effective in patients with a peripheral neuropathy. Expanded analysis of WES data significantly improves the diagnostic yield in patients in whom a diagnosis is not found on the initial targeted analysis. This is primarily due to diagnosis of conditions caused by newly discovered genes and the resolution of complex and atypical phenotypes.
An Emerging Female Phenotype with Loss-of-Function Mutations in the Aristaless-Related Homeodomain Transcription Factor ARX., Hum Mutat, 2017.
Authors
Mattiske T, Moey C, Vissers LE, Thorne N, Georgeson P, Bakshi M, Shoubridge C
Journal
Hum Mutat
Volume
38
Issue
5
Year
2017
DOI
10.1002/humu.23190
Pubmed ID
28150386
Abstract
The devastating clinical presentation of X-linked lissencephaly with abnormal genitalia (XLAG) is invariably caused by loss-of-function mutations in the Aristaless-related homeobox (ARX) gene. Mutations in this X-chromosome gene contribute to intellectual disability (ID) with co-morbidities including seizures and movement disorders such as dystonia in affected males. The detection of affected females with mutations in ARX is increasing. We present a family with multiple affected individuals, including two females. Two male siblings presenting with XLAG were deceased prior to full-term gestation or within the first few weeks of life. Of the two female siblings, one presented with behavioral disturbances, mild ID, a seizure disorder, and complete agenesis of the corpus callosum (ACC), similar to the mother's phenotype. A novel insertion mutation in Exon 2 of ARX was identified, c.982delCinsTTT predicted to cause a frameshift at p.(Q328Ffs
2016
UNDR ROVER - a fast and accurate variant caller for targeted DNA sequencing., BMC Bioinformatics, 2016.
Authors
Park DJ, Li R, Lau E, Georgeson P, Nguyen-Dumont T, Pope BJ
Journal
BMC Bioinformatics
Volume
17
Issue
None
Year
2016
DOI
10.1186/s12859-016-1014-9
Pubmed ID
27083325
Abstract
BACKGROUND: Previously, we described ROVER, a DNA variant caller which identifies genetic variants from PCR-targeted massively parallel sequencing (MPS) datasets generated by the Hi-Plex protocol. ROVER permits stringent filtering of sequencing chemistry-induced errors by requiring reported variants to appear in both reads of overlapping pairs above certain thresholds of occurrence. ROVER was developed in tandem with Hi-Plex and has been used successfully to screen for genetic mutations in the breast cancer predisposition gene PALB2. ROVER is applied to MPS data in BAM format and, therefore, relies on sequence reads being mapped to a reference genome. In this paper, we describe an improvement to ROVER, called UNDR ROVER (Unmapped primer-Directed ROVER), which accepts MPS data in FASTQ format, avoiding the need for a computationally expensive mapping stage. It does so by taking advantage of the location-specific nature of PCR-targeted MPS data. RESULTS: The UNDR ROVER algorithm achieves the same stringent variant calling as its predecessor with a significant runtime performance improvement. In one indicative sequencing experiment, UNDR ROVER (in its fastest mode) required 8-fold less sequential computation time than the ROVER pipeline and 13-fold less sequential computation time than a variant calling pipeline based on the popular GATK tool. UNDR ROVER is implemented in Python and runs on all popular POSIX-like operating systems (Linux, OS X). It requires as input a tab-delimited format file containing primer sequence information, a FASTA format file containing the reference genome sequence, and paired FASTQ files containing sequence reads. Primer sequences at the 5' end of reads associate read-pairs with their targeted amplicon and, thus, their expected corresponding coordinates in the reference genome. 
The primer-intervening sequence of each read is compared against the reference sequence from the same location and variants are identified using the same algorithm as ROVER. Specifically, for a variant to be 'called' it must appear at the same location in both of the overlapping reads above user-defined thresholds of minimum number of reads and proportion of reads. CONCLUSIONS: UNDR ROVER provides the same rapid and accurate genetic variant calling as its predecessor with greatly reduced computational costs.
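The calling rule described in the abstract above (a variant must appear in both overlapping reads, and pass thresholds on count and proportion) can be illustrated with a minimal sketch. This is not the UNDR ROVER source code: the function name `call_variants`, the representation of a variant as a `(position, ref, alt)` tuple, the default threshold values, and the choice to count support per read pair are all assumptions made for illustration.

```python
from collections import Counter


def call_variants(read_pairs, min_pairs=2, min_proportion=0.05):
    """Call variants for one amplicon (hypothetical sketch).

    read_pairs: list of (variants_in_read1, variants_in_read2), where each
    element is a set of (position, ref, alt) tuples observed in that read.
    min_pairs and min_proportion are illustrative, user-defined thresholds.
    """
    total = len(read_pairs)
    if total == 0:
        return []

    support = Counter()
    for read1_variants, read2_variants in read_pairs:
        # Only variants present in BOTH reads of the overlapping pair count.
        for variant in set(read1_variants) & set(read2_variants):
            support[variant] += 1

    # A variant is called when it passes both the absolute and the
    # proportional threshold.
    return sorted(
        variant
        for variant, n in support.items()
        if n >= min_pairs and n / total >= min_proportion
    )
```

In this sketch a variant seen in only one read of a pair contributes no support, which mirrors the stated aim of filtering sequencing chemistry-induced errors.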
A prospective evaluation of whole-exome sequencing as a first-tier molecular test in infants with suspected monogenic disorders., Genet Med, 2016.
Authors
Stark Z, Tan TY, Chong B, Brett GR, Yap P, Walsh M, Yeung A, Peters H, Mordaunt D, Cowie S, Amor DJ, Savarirayan R, McGillivray G, Downie L, Ekert PG, Theda C, James PA, Yaplito-Lee J, Ryan MM, Leventer RJ, Creed E, Macciocca I, Bell KM, Oshlack A, Sadedin S, Georgeson P, Anderson C, Thorne N, Melbourne Genomics Health Alliance, Gaff C, White SM
Journal
Genet Med
Volume
18
Issue
11
Year
2016
DOI
10.1038/gim.2016.1
Pubmed ID
26938784
Abstract
PURPOSE: To prospectively evaluate the diagnostic and clinical utility of singleton whole-exome sequencing (WES) as a first-tier test in infants with suspected monogenic disease. METHODS: Singleton WES was performed as a first-tier sequencing test in infants recruited from a single pediatric tertiary center. This occurred in parallel with standard investigations, including single- or multigene panel sequencing when clinically indicated. The diagnosis rate, clinical utility, and impact on management of singleton WES were evaluated. RESULTS: Of 80 enrolled infants, 46 received a molecular genetic diagnosis through singleton WES (57.5%) compared with 11 (13.75%) who underwent standard investigations in the same patient group. Clinical management changed following exome diagnosis in 15 of 46 diagnosed participants (32.6%). Twelve relatives received a genetic diagnosis following cascade testing, and 28 couples were identified as being at high risk of recurrence in future pregnancies. CONCLUSIONS: This prospective study provides strong evidence for increased diagnostic and clinical utility of singleton WES as a first-tier sequencing test for infants with a suspected monogenic disorder. Singleton WES outperformed standard care in terms of diagnosis rate and the benefits of a diagnosis, namely, impact on management of the child and clarification of reproductive risks for the extended family in a timely manner. Genet Med 18 11, 1090-1096.