HiTIME speeds up detection of unknown drug metabolites in LC-MS data

Researchers using liquid chromatography-mass spectrometry to identify unknown drug metabolites can now process the large datasets they generate more quickly and efficiently with HiTIME, a tool built around a novel memory-efficient and scalable parallelisation algorithm that allows timely processing on commodity computing hardware.

Computer scientists and chemists from the University of Melbourne and the University of NSW have published a paper documenting this work in SoftwareX, Elsevier’s open-access journal: Michael G. Leeming, Andrew P. Isaac, Luke Zappia, Richard A.J. O’Hair, William A. Donald and Bernard J. Pope, HiTIME: An efficient model-selection approach for the detection of unknown drug metabolites in LC-MS data.

The identification of metabolites plays an important role in understanding drug efficacy and safety; however, these compounds are often difficult to identify in complex mixtures. One approach to identifying drug metabolites uses differentially isotopically labelled drug compounds to create distinctive isotopic signals that can be detected by liquid chromatography-mass spectrometry (LC-MS). User-friendly, efficient computational tools that allow selective detection of these signals have been lacking. Our computer scientists have developed an efficient open-source software tool called HiTIME (High-Resolution Twin-Ion Metabolite Extraction), which filters twin-ion signals in LC-MS data.

HiTIME is a sensitive tool for detecting twin-ion signals in LC-MS data. It has been successfully demonstrated for the detection of paracetamol (APAP) metabolites in the blood plasma of APAP-treated rats, and of endogenous proteins covalently bound to electrophilic APAP metabolites. HiTIME accepts inputs and produces outputs in the standard mzML format, facilitating integration with other tools and workflows. A significant advantage of HiTIME is that it supports inputs in both profile and centroid modes, and its novel memory-efficient and scalable parallelisation algorithm allows timely processing of large datasets on commodity computing hardware.
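To make the idea of a twin-ion signal concrete, here is a minimal Python sketch, assuming a single centroided spectrum given as (m/z, intensity) pairs. It is not HiTIME's method: it simply looks for pairs of peaks separated by a fixed m/z offset whose intensity ratio is close to an expected value. The function name `find_twin_ions`, its parameters (`mz_delta`, `expected_ratio`, `ppm_tol`, `ratio_tol`) and the spectrum values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity) of one centroided peak


def find_twin_ions(
    peaks: List[Peak],
    mz_delta: float,             # hypothetical m/z offset between light and heavy forms
    expected_ratio: float = 1.0,  # hypothetical expected heavy/light intensity ratio
    ppm_tol: float = 10.0,        # m/z matching tolerance in parts per million
    ratio_tol: float = 0.2,       # allowed relative deviation from expected_ratio
) -> List[Tuple[float, float, float]]:
    """Return (light_mz, heavy_mz, observed_ratio) for candidate twin-ion pairs."""
    peaks = sorted(peaks)  # sort by m/z so we can binary-search partners
    mzs = [mz for mz, _ in peaks]
    pairs = []
    for light_mz, light_int in peaks:
        if light_int <= 0:
            continue  # a ratio is meaningless without a positive light peak
        target = light_mz + mz_delta
        tol = target * ppm_tol * 1e-6
        # Scan every peak whose m/z falls inside the tolerance window.
        i = bisect_left(mzs, target - tol)
        while i < len(mzs) and mzs[i] <= target + tol:
            heavy_mz, heavy_int = peaks[i]
            ratio = heavy_int / light_int
            if abs(ratio - expected_ratio) <= ratio_tol * expected_ratio:
                pairs.append((light_mz, heavy_mz, ratio))
            i += 1
    return pairs


# Illustrative spectrum: (150, 153) is a plausible twin pair; (200, 203) has
# the right spacing but an intensity ratio far from the expected 1:1.
spectrum = [(150.0, 1000.0), (153.0, 950.0), (200.0, 500.0), (203.0, 100.0)]
pairs = find_twin_ions(spectrum, mz_delta=3.0)
print(pairs)
```

For ions of charge z, `mz_delta` would be the labelled mass difference divided by z. The real tool works on mzML data across retention time, and its paper title describes a model-selection approach, which this hard-threshold sketch does not attempt to reproduce.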


This work was completed with support from a University of Melbourne Interdisciplinary Seed Grant, the Victorian Life Sciences Computation Initiative (now Melbourne Bioinformatics), Assoc. Prof. Pope’s Victorian Health and Medical Research Fellowship, Australia and Dr Michael Leeming’s Elizabeth and Vernon Puzey PhD scholarship and The University of Melbourne’s Norma Hilda Schuster scholarship.

Galaxy Australia part of global effort to research coronavirus

Two weeks ago, members of Melbourne Bioinformatics, Andrew Lonie (Director, Australian BioCommons) and Simon Gladman, contributed to an extraordinary open science effort to pull together all the current public genomic data on SARS-CoV-2, demonstrating a collaborative, globally accessible, rapid and reproducible research response to the current public health crisis. It’s a wonderful example of the benefit of shared bioinformatics infrastructure for medical research.

No more business as usual: agile and effective responses to emerging pathogen threats require open data and open analytics (preprint on bioRxiv)

Abstract
The current state of much of the Wuhan pneumonia virus (COVID-19) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies, and requires unimpeded access to data, analysis tools, and computational infrastructure. Here we show that community efforts in developing open analytical software tools over the past ten years, combined with national investments into scientific computational infrastructure, can overcome these deficiencies and provide an accessible platform for tackling global health emergencies in an open and transparent manner. Specifically, we use all COVID-19 genomic data available in the public domain so far to (1) underscore the importance of access to raw data and to (2) demonstrate that existing community efforts in curation and deployment of biomedical software can reliably support rapid, reproducible research during global health crises.

Their draft paper concludes:
“… anyone can use the open workflows described here to analyze the new data. In an age of digital connectedness, open, highly accessible, globally shared data and analysis platforms have the potential to transform the way biomedical research is done, opening the way to ‘global research markets’, where competition arises from deriving understanding rather than access to samples and data. Other disciplines have embraced the benefits of global data generation and sharing, astronomy and high energy physics being two highly successful examples. We have the opportunity to mirror their successes in infrastructure funding by demonstrating that biological research can embrace the same global perspective on common infrastructure investment and data sharing…”

This resource is also empowering our global Galaxy community (we’ve run workshops in Africa, SE Asia, the EU, the USA and Australia) and the students who have trained with us on the Galaxy platform, who can now get in and do their own research.

Galaxy Australia is hosted by the University of Melbourne and the Queensland Cyber Infrastructure Foundation and funded through NCRIS. It is one of the four global Galaxy platforms that participated in the project.

‘Janis’ is now being used to analyse a mixture of whole genome and exome sequencing data at Peter Mac, WEHI & Melbourne Bioinformatics

Researchers across the Parkville Precinct and beyond now have a simpler way to run their data analysis pipelines on multiple computing platforms thanks to recent developments in the Portable Pipelines Project.

During 2019, our team of experts across WEHI, Peter Mac and Melbourne Bioinformatics built Janis, a new Python framework for building and running workflows. Janis provides a simple Application Programming Interface for building workflows that are portable and reproducible across many compute environments, and it produces workflow specifications (Common Workflow Language [CWL] and Workflow Description Language [WDL]) as publishable artifacts.

2019 developments

  • Janis’ portability has been validated on systems at Peter Mac, WEHI, University of Melbourne, Pawsey Supercomputing Centre in WA and the Google Cloud Platform.
  • The project has generated considerable community awareness and support on GitHub and other online platforms.
  • Janis is in alpha testing now and is openly available for researchers to download and use through GitHub (support is available through GitHub and Gitter).
  • It is being successfully used by researchers at Peter Mac, WEHI and Melbourne Bioinformatics to analyse a mixture of whole genome and exome sequencing data.
  • Documentation is now available.

Community uptake

Uptake across the life science community has further validated Janis’ usefulness, with Kersten Breuer from the German Cancer Research Center adding support for Janis workflows to CWLab. At a joint Broad Institute and AWS hackathon, Janis was adopted to improve cloud support for Cromwell.

The project has now contributed to open-source repositories including those for CWL, WDL and Cromwell, and in particular the team has worked to document Singularity support for Cromwell.

Presentations

  • Bioinformatics Open Source Conference (BOSC) 2019 – Basel
  • Victorian Bioinformatics Seminar (VCBS) 2019 – Melbourne
  • Australian Bioinformatics and Computational Biology Society (ABACBS) 2019 – Sydney
  • WEHI seminar series – Melbourne
  • Australian BioCommons briefing online across Australia

Upcoming events

In early 2020, workshops on Janis will run as part of the Parkville Bioinformatics Training Group’s activities led out of Melbourne Bioinformatics.

What’s next for Janis?

In 2020 the team is looking to support more researchers to use Janis to analyse their data and build new workflows. The team aims to build exemplar pipelines for analysing RNA-seq data, run workshops, complete the documentation and work closely with researchers to increase functionality.

This has been an exciting and productive collaboration to date. Because Janis is applicable to all research domains, it will also be interesting to see how it develops beyond its original intended use.

Prostate Cancer Research project

2019 update:

Chol Hee Jung has been carrying out the quality control and processing of Australian data and the identification and analysis of genomic variants using various analysis pipelines on this major research project. He also handled the local management of data and organised data sharing with international collaborators.

Further funding

Preliminary investigations have contributed to the award of $4 million over 3 years for the Australian PRECEPT Program, funded by the Prostate Cancer Research Alliance, a collaboration between the Australian Government and the Movember Foundation. The lead investigator is A/Prof Niall Corcoran, and the chief investigators include A/Prof Bernie Pope and A/Prof Danny Park from Melbourne Bioinformatics.

Project description

This project aims to reveal how tumours progress to lethal metastatic stages, and to build a detailed view of tumour cells, through integrated analysis of genomic and epigenomic variants in a patient cohort. Part of the work from this project also contributes to the international Pan-Prostate Cancer Group collaboration.

Project collaborators

Prof Christopher Hovens and A/Prof Niall Corcoran (project lead), Dr Ken Chow (sample information curation), Royal Melbourne Hospital

Prof Tony Papenfuss, Ms Jocelyn Penington, Dr Justin Bedo (analysis), WEHI

A/Prof Bernie Pope, Dr Chol-hee Jung, A/Prof Danny Park, Bioinformaticians and Mr Edmund Lau, Data analyst and manager, Melbourne Bioinformatics

Grants

This project is supported by an Australian Prostate Cancer Research grant awarded to Prof Christopher Hovens.

Engineering microbial symbionts that increase coral climate resilience

2019 update:

This work progressed well in 2019, with Gayle Philip and Dieter Bulach sharing their expertise with Prof van Oppen’s team.

Along with conducting her own analysis of high-throughput data generated by the lab, Gayle has been upskilling members of the lab to be able to perform their own analyses. This has included implementing systems for storage of data in MediaFlux, communication in the lab through MS Teams and teaching lab members how to access the University of Melbourne’s High Performance Computing system (Spartan). Through the lab’s association with the Environmental Microbiology Research Initiative (EMRI), Gayle has delivered workshops for EMRI including: Galaxy and Data Formatting, Nectar and Spartan HPC and Introduction to Unix.

Project description

This research focuses on microbial symbiosis in corals, adaptation/acclimatisation to climate change, and connectivity of coral reefs. It is particularly focussed on ‘assisted evolution’, where mechanisms of adaptation and acclimatisation in corals and genetic manipulations to enhance stress tolerance and fitness of corals in a changing environment are explored.

Read more about Prof van Oppen’s work here.

Project collaborators

Prof Madeleine van Oppen, Chair, Marine Biology, University of Melbourne (School of BioSciences) and Senior Principal Research Scientist at the Australian Institute of Marine Science (AIMS)

The van Oppen lab team

Dr Gayle Philip, Melbourne Bioinformatics

Dr Dieter Bulach, Melbourne Bioinformatics

Grants

This project is supported by an ARC Australian Laureate Fellowship (2019-23) awarded to Prof van Oppen.

Genetics of Colorectal Cancer (CRC)

2019 update:

This project continued to progress well in 2019, with Khalid Mahmood sharing his expertise with the group.

Significant collaborations have developed across several projects in the laboratory. The focus of these collaborations has been to use genomic and associated clinical data to characterise CRCs in order to improve screening and diagnostic strategies for patients. Key tasks have included deploying state-of-the-art bioinformatics methods to analyse germline and tumour sequencing data, both to characterise different subtypes of CRC and to identify new variants and genes that predispose families to a higher risk of developing CRC. Work from these collaborations has resulted in several publications now in preparation or under review. The work has also involved supervising several honours students.

Publications

Tumor mutational signatures in sebaceous skin lesions from individuals with Lynch syndrome, Georgeson et al, Molecular Genetics and Genomic Medicine.

sEst: Accurate Sex-Estimation and Abnormality Detection in Methylation Microarray Data, Jung et al, International Journal of Molecular Sciences.

Presentations

Seminar, University of Melbourne Centre for Cancer Research

Oral presentation, International Conference InSiGHT (International Society for Gastrointestinal Hereditary Tumours).

Project description

The focus of the Colorectal Oncogenomics Group (COG), led by Assoc Prof Daniel Buchanan, includes the identification and investigation of clinically and biologically relevant subtypes of colorectal cancer (CRC) in both familial and non-familial settings. The analysis draws on a wide range of multidisciplinary techniques, including computational biology, epigenetics and genomics, to analyse tumour and pre-malignant lesions in relation to their histopathological features. This very successful collaboration covers a range of colorectal cancer projects, including those forming part of the University-hosted NHMRC Centre for Research Excellence in Optimising Screening for Colorectal Cancer, whose vision is to create and implement a personalised approach to colorectal cancer screening that reduces the number of new cases of, and deaths from, this common disease.

Project collaborators

Assoc Prof Daniel Buchanan, University of Melbourne

Assoc Prof Bernie Pope, Melbourne Bioinformatics

Prof Ingrid Winship, Royal Melbourne Hospital & University of Melbourne

Prof Mark Jenkins, University of Melbourne Centre for Cancer Research

Dr Khalid Mahmood, Melbourne Bioinformatics

Grants

This project is supported by significant grants from NHMRC, NIH and Cancer Australia.

Identification and function of genes that increase risk for endometriosis

2019 update:

This project made very good progress in 2019, with Jessica Chung providing the expertise from our team to enable:

  •  development of a method to normalise cycle-stage effects in endometrium expression data
  •  development of an interactive R Shiny application in which the research group can explore microarray and RNA-seq data with their own parameters
  •  analysis of endometriosis severity and BMI, lipidomics data, uterine receptivity, and clinical factors that influence repeat surgery.

Project description

Endometriosis is a disorder that affects 5 – 10% of reproductive age women in Australia, causing severe pain and infertility. This project aims to use genomic data to identify candidate genes that increase the risk of endometriosis. We are also investigating mechanisms that cause reduced endometrial receptivity, the association between BMI and endometriosis, and clinical indicators that can predict repeat surgery for endometriosis.

Project collaborators

Prof Peter Rogers, Professor of Women’s Health Research, Obstetrics and Gynaecology, Royal Women’s Hospital/Mercy
Dr Sarah Carson, Research Fellow, Obstetrics and Gynaecology, Royal Women’s Hospital/Mercy
Dr Wan Tinn Teh, Clinician, Royal Women’s Hospital & Melbourne IVF
Ms Jessica Chung, Melbourne Bioinformatics

Grant

NHMRC: Identification And Function Of Genes That Increase Risk For Endometriosis (Grant number: 1105321, 2016-2019)

Using genetic testing to solve brothers’ health mystery

It’s one thing to identify a genetic disorder, but another to successfully treat it. In great news this month, Melbourne Genomics shared the story of two brothers whose genetic disorder was identified and treated, here in Melbourne. Thanks to all the teams involved, including our own team working on the genomic data analysis pipelines.

Hi-Plex 2 just released: a simple and robust approach to targeted sequencing-based genetic screening

Hi-Plex was developed by our Molecular Biologist, Assoc Prof Daniel Park and Computer Scientist, Assoc Prof Bernard Pope, co-leads of our Human Genomics Group at Melbourne Bioinformatics, to simplify processes and reduce costs on projects needing targeted sequencing of panels of genes across large numbers of specimens. It brings greater efficiency and accuracy to all such research projects – big and small.

Go here for background to the original Hi-Plex. 

Hi-Plex 2, published in July 2019, is suitable for an extensive range of clinical and research applications and is complemented by software for primer design and variant calling. It remains a PCR-based target-enrichment system, unrivalled in terms of simplicity, accuracy and cost.

What improvements have been made?

By ironing out problems encountered when working with larger targets, Hi-Plex 2 more effectively enables the robust construction of small-to-medium panel-size libraries, while maintaining the low cost, simplicity and accuracy of the Hi-Plex platform. Hi-Plex 2 substantially reduces off-target amplification, enabling library construction for small-to-medium design panels that was not possible with the previous Hi-Plex chemistry.

Contact the Hi-Plex team for information, or for collaboration enquiries covering technology transfer, reagent design, methods and data analysis, including bespoke analysis pipelines.

GenoVic now ready for use in clinical genomic testing

25 February 2019

Following our July 2018 update, GenoVic, Victoria’s shared clinical system for genomics, has reached a further milestone. Last week Melbourne Genomics announced that GenoVic is now ready for use in clinical genomic testing.

Victorian Clinical Genetics Services (VCGS) is the first laboratory to agree to use the GenoVic system as its primary tool for genomic interpretation. The system will support VCGS’ medical scientists to identify and report on the exact changes in a patient’s DNA driving their condition or targetable for treatment.

Further agreements on implementing GenoVic are in process with Alliance members Monash Health, AGRF and The Royal Melbourne Hospital.

Get the full story at the Melbourne Genomics website.

Congratulations to Anthony Marty and his team on their contribution to the success of this project.

Working to embed genomics into everyday healthcare

Technology Manager, Melbourne Genomics & Software Engineer, Melbourne Bioinformatics, Mr Anthony Marty

Members of the Melbourne Genomics team recently travelled to the US to share experiences with colleagues. Our own Anthony Marty, as a member of the team building the Alliance’s shared clinical data system for genomics, co-presented at the DNAnexus Connect conference in San Francisco on ‘How DNAnexus is part of a system to embed genomics into everyday healthcare.’

Read the full story on Melbourne Genomics website.

GOBLET launches Critical Guides for bioinformatics training on F1000’s Bioinformatics Education & Training Collection

Since its foundation in 2012, the Global Organisation for Bioinformatics Learning, Education & Training (GOBLET) has built its activity through the support of its global membership and energetic and committed elected officers. Its vision is to unite, inspire and equip bioinformatics trainers worldwide through cultivating the global bioinformatics trainer community, setting standards and providing high-quality resources to support learning, education and training. Support for bioinformatics training is coordinated through various Committees: Learning, Education and Training; Fundraising; Standards; Outreach & PR; Technical.

GOBLET members now act as Community Advisors to a recent F1000 initiative, the F1000 Bioinformatics Education and Training Collection. As part of that Collection, a new Introduction to Bioinformatics series containing a set of Critical Guides was published in September 2018. The first Guides in the series cover Unix, BLAST, UniProtKB, the UniProtKB flat-file format, InterPro and PDB.

EMBL-ABR is a GOLD member of GOBLET and Sonika Tyagi from EMBL-ABR’s Monash Bioinformatics Node currently sits on the Standards Committee. As a Node of EMBL-ABR, Melbourne Bioinformatics has access to all GOBLET activities and resources.

Read more about GOBLET’s first five years in its 2017 report, available as a full report and as an executive summary.

For all GOBLET questions, the Executive Board may be reached via info@mygoblet.org.

____________________________________________________________________________________

Content provided on behalf of GOBLET’s Executive Board by Dr Celia W.G. van Gelder, Programme Manager DTL Learning, Training Coordinator ELIXIR-NL, ELIXIR Training Platform Leader

Working towards an Australian Bioinformatics Commons

This week the project convenor (Director, Melbourne Bioinformatics and EMBL-ABR) updated the EMBL-ABR International Scientific Advisory Group, which acts as a consultant to this project, on progress to date. A summary follows:

1. Phase 3a of the Australian Biosciences Data Capability project (now the Australian Bioinformatics Commons) was an extremely useful use-case activity to determine the technical and implementation details for a national bioinformatics infrastructure investment. It confirmed that there are a number of bioinformatics services that would provide genuine value across the national researcher community; additionally, a national training program in concert with national infrastructure resources and closely aligned with international programs would provide broad benefit. The full report is available here.

2. Earlier this year a very significant multi-year NCRIS investment in both Bioplatforms Australia (BPA) (~$110m over 5 years) and the Australian Research Data Commons (ARDC) (~$200m over 5 years) was announced. These investments are so significant that both bodies are now undertaking strategic review processes, intended to drive multi-year infrastructure programs starting in July 2019.

3. As part of BPA’s strategic review, and funded and endorsed by BPA, this project will now build on this national consultation and planning work done to date to propose a detailed infrastructure and expertise investment strategy. It is expected that the proposal will be comprehensive and contain alternative models for how such infrastructure might be funded, delivered and managed. It will be very important that this project continues to consult closely with the ARDC during this planning period to make sure ambitions align.

Each member of the national Reference Group will now be contacted once more, to help work towards the next phase of this activity, and to assemble some smaller community-aligned groups from the Reference Group to explore specific objectives and plans.

Galaxy Australia August 2018 update

The ongoing evolution of services behind the GVL, including Galaxy Australia, delivered a number of solid outcomes for our teams at Melbourne Bioinformatics and QCIF from May to August.

Galaxy Community Conference

Key Galaxy developer Simon Gladman and UQ’s Derek Benson attended the Galaxy Community Conference held in Portland, Oregon from 25-30 June 2018. Simon gave a presentation on Galaxy Australia and how we are contributing to the global community efforts for the project. He was also an invited panel member at five other sessions. We are very pleased to see Galaxy Australia is playing a key role in the development of this exciting, active community and look forward to continuing to do so.

Winter School

QFAB’s Gareth Price presented on Galaxy Australia and the GVL at the University of Queensland (UQ)-hosted Winter School in Mathematical and Computational Biology, held in early July. In addition, at the Australian Society for Microbiology’s annual conference, Gareth, supported by Scott Beatson (conference co-organiser), presented a 1.5-hour workshop on Galaxy focused on resources for microbiologists.

Rolling out national Galaxy Australia training

Throughout 2018 the Galaxy Australia project team is working with the EMBL-ABR Hub and Nodes to roll out a national bioinformatics training program. Following a successful two-day facilitator training workshop held in Melbourne in July, four workshops are planned for introducing Galaxy Australia capabilities to researchers, with demonstrations based around different themes: genome assembly, variant detection, RNA-seq and metagenomics.

The first of these workshops (on Wednesday the 22nd) was led by Dr Anna Syme from Melbourne Bioinformatics, who is coordinating training for Galaxy Australia, with trained facilitators on hand at ten venues across Australia. These facilitators supported their local participants during the live online training event. With over 100 registrants for the first workshop, we are confident this hybrid delivery model is helping to meet the growing demand for bioinformatics training in Australia, where we are always challenged by the tyranny of distance.

The future for the GVL?

Suggestions for the GVL’s future development include a wish list of new tools, such as those for PacBio, NanoString and Nanopore analysis. The project team is now considering including non-genomic tools in Galaxy Australia (metabolomics, proteomics, etc.) to make it more of an extended -omics platform. Given the enthusiasm for what the Galaxy platform is delivering among its growing global community of developers and users, this may well be the next phase in the GVL’s development.

Release 1 of GenoVic: Melbourne Genomics’ shared clinical system for genomics

4 July 2018

Melbourne Genomics is excited to have completed delivery of Release 1 of GenoVic, its shared clinical system for genomics. This marks the conclusion of the first phase, which has:

  • established and implemented pipelines built on DNAnexus
  • selected a curation tool and implemented this as a shared system
  • integrated all these services.

The next phase of GenoVic (Release 2) will involve working with the laboratories to integrate their systems with GenoVic, enabling clinical tests to be operational in the system.

GenoVic integration lead and Melbourne Bioinformatics software engineer, Anthony Marty said the completion of Release 1 was a significant milestone for the Alliance: 

The system we have established now successfully runs singleton germline samples from end-to-end (that is, from sequencing, through curation and to production of a clinical report). We are just weeks away from having tumour/normal and trio samples supported within the system, as well.

It has taken a tremendous amount of work from everyone across the Alliance to get to this point. I am now looking forward to working with the laboratories to implement their clinical workflows.

Follow all the news from the Melbourne Genomics website.