Members of the Melbourne Genomics team recently travelled to the US to share experiences with colleagues. Our own Anthony Marty, as a member of the team building the Alliance’s shared clinical data system for genomics, co-presented at the DNAnexus Connect conference in San Francisco on ‘How DNAnexus is part of a system to embed genomics into everyday healthcare.’
Since its foundation in 2012, the Global Organisation for Bioinformatics Learning, Education & Training (GOBLET) has built its activity through the support of its global membership and energetic and committed elected officers. Its vision is to unite, inspire and equip bioinformatics trainers worldwide through cultivating the global bioinformatics trainer community, setting standards and providing high-quality resources to support learning, education and training. Support for bioinformatics training is coordinated through various Committees: Learning, Education and Training; Fundraising; Standards; Outreach & PR; Technical.
GOBLET members now act as Community Advisors to the recent f1000 initiative: the f1000 Bioinformatics Education and Training Collection. As part of that Collection, in September 2018, a new Introduction to Bioinformatics series, containing a set of Critical Guides, was published. The first Guides in the series cover Unix, BLAST, UniProtKB, UniProtKB Flat-file Format, InterPro, and PDB.
EMBL-ABR is a GOLD member of GOBLET and Sonika Tyagi from EMBL-ABR’s Monash Bioinformatics Node currently sits on the Standards Committee. As a Node of EMBL-ABR, Melbourne Bioinformatics has access to all GOBLET activities and resources.
For all GOBLET questions, their Executive Board may be reached via firstname.lastname@example.org.
Content provided on behalf of GOBLET’s Executive Board by Dr Celia W.G. van Gelder, Programme Manager DTL Learning, Training Coordinator ELIXIR-NL, ELIXIR Training Platform Leader
This week project convenor and Director, Melbourne Bioinformatics and EMBL-ABR, updated the EMBL-ABR International Scientific Advisory Group, as consultants to this project, on progress to date. A summary follows:
1. Phase 3a of the Australian Biosciences Data Capability project (now the Australian Bioinformatics Commons) was an extremely useful use-case activity to determine the technical and implementation details for a national bioinformatics infrastructure investment. It confirmed that there are a number of bioinformatics services that would provide genuine value across the national researcher community; additionally, a national training program in concert with national infrastructure resources and closely aligned with international programs would provide broad benefit. The full report is available here.
2. Earlier this year a very significant multi-year NCRIS investment in both Bioplatforms Australia (BPA) (~$110m over 5 years) and the Australian Research Data Commons (ARDC) (~$200m over 5 years) was announced. These investments are so significant that both bodies are now undertaking strategic review processes, each intended to drive a multi-year infrastructure program starting in July 2019.
3. As part of BPA’s strategic review, and funded and endorsed by BPA, this project will now build on this national consultation and planning work done to date to propose a detailed infrastructure and expertise investment strategy. It is expected that the proposal will be comprehensive and contain alternative models for how such infrastructure might be funded, delivered and managed. It will be very important that this project continues to consult closely with the ARDC during this planning period to make sure ambitions align.
The ongoing evolution of services behind the GVL, including Galaxy Australia, saw a number of solid outcomes for our teams at Melbourne Bioinformatics and QCIF throughout May to August.
Galaxy Community Conference
Key Galaxy developer Simon Gladman and UQ’s Derek Benson attended the Galaxy Community Conference held in Portland, Oregon from 25-30 June 2018. Simon gave a presentation on Galaxy Australia and how we are contributing to the global community efforts for the project. He was also an invited panel member at five other sessions. We are very pleased to see Galaxy Australia is playing a key role in the development of this exciting, active community and look forward to continuing to do so.
QFAB’s Gareth Price presented on Galaxy Australia and the GVL at the Winter School in Mathematical and Computational Biology, hosted by the University of Queensland (UQ) in early July. Alongside this, at the Australian Society of Microbiology’s annual conference, Gareth, supported by Scott Beatson (conference co-organiser), presented a 1.5-hour workshop on Galaxy focussed on resources for microbiologists.
Rolling out national Galaxy Australia training
Throughout 2018 the Galaxy Australia project team is working with the EMBL-ABR Hub and Nodes to roll out a national bioinformatics training program. Following a successful two-day facilitator training workshop held in Melbourne in July, four workshops are planned for introducing Galaxy Australia capabilities to researchers, with demonstrations based around different themes: genome assembly, variant detection, RNA-seq and metagenomics.
The first of these workshops (on Wednesday the 22nd) was led by Dr Anna Syme from Melbourne Bioinformatics, who is coordinating training for Galaxy Australia, with trained facilitators on hand at ten venues across Australia. These facilitators support their local participants during the live online training event. With over 100 registrants for the first workshop, we are confident this hybrid delivery model is helping to meet the growing demand for bioinformatics training in Australia, where we are always challenged by the tyranny of distance.
The future for the GVL?
Suggestions for how the GVL may develop further include a wish list of new tools, such as PacBio, Nanostring and Nanopore analysis. The project team is now considering the possibility of including non-genomic tools in Galaxy Australia (metabolomics, proteomics etc) to make it more of an extended –omics platform. Given the enthusiasm for what the Galaxy platform is delivering amongst the growing global community of developers and users, this may be the next phase in the GVL’s development.
4 July 2018
Melbourne Genomics is excited to have completed delivery of Release 1 of GenoVic: its shared clinical system for genomics. This point marks the conclusion of the first phase, which has:
- established and implemented pipelines built on DNAnexus
- selected a curation tool and implemented this as a shared system
- integrated all these services.
The next phase of GenoVic (Release 2) will involve working with the laboratories to integrate their systems with GenoVic, enabling clinical tests to be operational in the system.
GenoVic integration lead and Melbourne Bioinformatics software engineer, Anthony Marty, said the completion of Release 1 was a significant milestone for the Alliance:
The system we have established now successfully runs singleton germline samples from end-to-end (that is, from sequencing, through curation and to production of a clinical report). We are just weeks away from having tumour/normal and trio samples supported within the system, as well.
It has taken a tremendous amount of work from everyone across the Alliance to get to this point. I am now looking forward to working with the laboratories to implement their clinical workflows.
Follow all the news from the Melbourne Genomics website.
The Australian-made Genomics Virtual Laboratory keeps on producing outcomes for Melbourne Bioinformatics. Yesterday (10 May 2018) co-authors Enis Afgan (ex-VLSCI, now Johns Hopkins University), Andrew Lonie (Melbourne Bioinformatics), James Taylor (Johns Hopkins University) and Nuwan Goonasekera (Melbourne Bioinformatics) submitted this paper to Cornell University’s arXiv, which outlines how to launch complex applications (typical for bioinformatics) across various cloud providers:
CloudLaunch: Discover and Deploy Cloud Applications
Cloud computing is a common platform for delivering software to end users. However, the process of making complex-to-deploy applications available across different cloud providers requires isolated and uncoordinated application-specific solutions, often locking-in developers to a particular cloud provider. Here, we present the CloudLaunch application as a uniform platform for discovering and deploying applications for different cloud providers. CloudLaunch allows arbitrary applications to be added to a catalog with each application having its own customisable user interface and control over the launch process, while preserving cloud-agnosticism so that authors can easily make their applications available on multiple clouds with minimal effort. It then provides a uniform interface for launching available applications by end users across different cloud providers. Architecture details are presented along with examples of different deployable applications that highlight architectural features.
Link to paper here: https://arxiv.org/pdf/1805.04005.pdf
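The idea in the abstract, a catalogue of applications that can be launched through one interface on different clouds, can be illustrated with a minimal sketch. This is not CloudLaunch's actual API: every class, function and provider name below (Application, Catalog, cloud_a, cloud_b) is hypothetical, and the "launch" simply returns a string rather than starting anything.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Application:
    """A catalogue entry: a name plus default launch settings (hypothetical)."""
    name: str
    default_config: Dict[str, str] = field(default_factory=dict)


# A provider is modelled as a function that receives an application and its
# final settings and returns a description of the deployment.
Provider = Callable[[Application, Dict[str, str]], str]


def launch_on_cloud_a(app: Application, config: Dict[str, str]) -> str:
    # Stand-in for provider-specific launch logic.
    return f"cloud_a:{app.name}:{config['size']}"


def launch_on_cloud_b(app: Application, config: Dict[str, str]) -> str:
    return f"cloud_b:{app.name}:{config['size']}"


class Catalog:
    """Keeps applications and providers separate so either can be added alone."""

    def __init__(self) -> None:
        self._apps: Dict[str, Application] = {}
        self._providers: Dict[str, Provider] = {}

    def add_application(self, app: Application) -> None:
        self._apps[app.name] = app

    def add_provider(self, name: str, provider: Provider) -> None:
        self._providers[name] = provider

    def launch(self, app_name: str, provider_name: str, **overrides: str) -> str:
        # One entry point for every application and provider combination.
        app = self._apps[app_name]
        config = {**app.default_config, **overrides}
        return self._providers[provider_name](app, config)


catalog = Catalog()
catalog.add_application(Application("genomics-workbench", {"size": "medium"}))
catalog.add_provider("cloud_a", launch_on_cloud_a)
catalog.add_provider("cloud_b", launch_on_cloud_b)
```

Calling catalog.launch("genomics-workbench", "cloud_b") uses the application's defaults, while catalog.launch("genomics-workbench", "cloud_a", size="large") overrides one setting. The application entry itself never mentions a specific cloud, which is the cloud-agnostic property the abstract describes.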
For the second time in two years Melbourne Bioinformatics has hosted visiting scholar, Ivo F.A.C. Fokkema, from Leiden University Medical Center in the Netherlands. Ivo leaves us this week, having spent the past three months progressing his LOVD database development project.
LOVD is an online platform for storing and sharing genetic variation, as well as software for analysing whole-exome sequencing data. Developed by Ivo and used by the Melbourne Genomics Health Alliance, the platform has potential to be used within the Australian Genomics Health Alliance for sharing of all genetic findings by the Alliance’s members, as well as for the further development of the whole-exome sequencing analysis platform. Ivo’s visit has continued to develop relationships and interest in the use of the platform for this work.
‘I am very grateful to Melbourne Bioinformatics for once again hosting me here; they provide an excellent network of expertise and a great environment to work in,’ says Ivo.
We look forward to hearing how this project evolves, with Ivo continuing to progress it from Leiden.
Life science research is and will increasingly be shaped by infrastructure that supports it. At the beginning of Big Data biology, this meant funding sequencers and computers and while we still need those, we also need to become smarter. Increases in our ability to solve the big problems in biology have come as much from scaling people (through training, sharing of practices, and collaboration) as they have from cheaper sequencing or faster processors.
Jason Williams, Chair, International Science Advisory Group
In mid-March Melbourne Bioinformatics’ resident Galaxy guru, Simon Gladman, attended the ELIXIR Galaxy community meeting in Freiburg, Germany for the official launch of the https://usegalaxy.eu/ server and to announce the upcoming Galaxy Australia server – https://usegalaxy.org.au.
With NCRIS funding, we are partnering with QCIF to extend and update the service model for Galaxy across Australia so all Australian researchers will soon have an increased number of tools, reference genome choice and user support at hand.
This is being made possible through the expansion and re-launch of the existing Galaxy-QLD service instance of the open, web-based Galaxy platform for computational biology research, now to be known as Galaxy Australia.
Galaxy Australia will enable accessible, reproducible, and transparent research, and is a major feature of our Australian-made Genomics Virtual Laboratory.
Led by our software expert Anthony Marty, Melbourne Bioinformatics’ role in the Australian Genomics Health Alliance has extended to providing a technical assessment of several well-developed, prototyped tools which enable sharing of curated genomic variant data. The preferred tool will be adopted across Australian pathology laboratories for use in research and clinical diagnoses. When known and carefully curated genomic variants indicative of clinical significance occur in conjunction with known disease/s, specific clinical information can be inferred and a more precise management of the disease applied. This might mean a different drug regimen or perhaps a more vigilant monitoring of a cancer. These new technologies are transforming our treatment of disease.
And as our understanding of the human genome slowly emerges from research laboratories, and bioinformaticians worldwide refine their analysis techniques, disagreement around the interpretation of this genetic information is still likely. So ensuring that within this process there is an in-built mechanism to resolve any classification conflicts is also a difficult part of this task.
Tools for precision medicine draw upon complex information in published and curated genomic databases being built as part of a global effort. Collaborating to avoid duplication is essential, and researchers and laboratories around the world, including our own, are engaged in this effort through the non-profit Global Alliance for Genomics and Health (GA4GH), which released its Strategic Roadmap in February 2018.
The Roadmap lists a series of projects laying the groundwork for this real-world genomic data sharing across the international genomic data community by 2022. Important frameworks and standards for the sharing of genomic and health-related data will enable this to be done responsibly, voluntarily, and securely.
The Australian Genomics Health Alliance features prominently in many of the key GA4GH projects, and through our engagement with them we are pleased to be playing a small part.
The GA4GH Strategic Roadmap presents standards and frameworks planned for development under GA4GH Connect, a five-year Strategic Plan aimed at aligning with the key needs of the genomic data community. The Roadmap will be updated annually with new deliverables and timelines.
With growing antibiotic resistance spreading through our communities, finding new ways to stop illness and death from methicillin-resistant Staphylococcus aureus (MRSA or golden staph) has become a significant challenge for health systems the world over. Understanding MRSA’s vulnerabilities, through knowledge of its genome, offers new technologies for researchers, and our microbial genomics experts are contributing to this work here in Melbourne.
We’ve started work with colleagues both at the Peter Doherty Institute (PDI) and Monash University on projects funded through the NHMRC and the Wellcome Trust. These projects confront the problem on several levels.
Announced in December 2017, Torsten Seemann is Chief Investigator on two NHMRC Project grants, led by Professor Tim Stinear, PDI. The first grant ($784,451) is investigating ways to modify a very well-studied, specific regulation gene in MRSA to find where it’s vulnerable to attack by antibiotics. This extends a decade of painstaking, detailed lab work and associated genomics and bioinformatics analysis which has built up our understanding of how this gene system works.
The second grant ($772,710) is investigating invasive staph, which is a particular threat to people living with compromised immune systems. It’s focussed on how and why golden staph spreads throughout hospitals and the community, looking at how such organisms behave in complex environments. This complements the work to understand the organisms’ basic biology as being targeted in the first project.
Professor Ben Howden, PDI, is leading an NHMRC Partnership Grant ($1,427,000) to work with the Victorian Government Department of Health and Human Services and sequencing company Illumina Australia Pty Ltd to develop microbial genomics for real-time tracking of communicable diseases and earlier detection of outbreaks. Our team will lead the bioinformatics and data analysis component of this project, which is studying the entire life cycle of a public health outbreak. The aim is to see how microbial genomics technologies can be incorporated to improve the timeliness of our responses and the outcomes of public health bacterial management across hospitals, communities and food safety.
Finally, Dr Dieter Bulach is an Associate Investigator, University of Melbourne, on a Wellcome Trust “Our Planet, Our Health” Project led by Professor Rebekah Brown from Monash University. Professor Jodie McVernon, Professor and Director of Doherty Epidemiology, Victorian Infectious Diseases Reference Laboratory, will also be providing modelling / transmission analysis. We are very excited by this project as it will provide access to all the resources and on-site training available within the prestigious Wellcome Sanger Institute. Link here to the full story about this significant international project.
Feel free to contact our team leader and bacterial bioinformatics expert, A/Prof Torsten Seemann.
Last month the GVL was trialled by colleagues in New Zealand (NZ) who were interested in its use for both research and training in bioinformatics. Aleksandra Pawlik, Research Community Manager, New Zealand eScience Infrastructure, wrote afterwards:
We’re ending the first of the scheduled training workshops and indeed it was a great idea. GVL removes all the headache of setting things up but is also an infrastructure that (hopefully) will become available and prevalent among many researchers. I am saying that as from my experience in teaching computational skills to researchers, I know that it is essential to teach people tools that they can carry on using. Therefore usually Virtual Machines or ad-hoc cloud set ups are not ideal. They are available at the workshop but then researchers have to use whatever their organisations offer them.
This opportunity arose from an information exchange our Nectar colleagues were having with their NZ counterparts who are responsible for meeting the infrastructure needs of the New Zealand Government’s new national genomics initiative (Genomics Aotearoa, NZ$35M over 7 years). It is being hosted by the University of Otago, and involves a number of NZ universities and Crown Research Institutes. Genomics Aotearoa aims to put in place key genomics and bioinformatics infrastructure to underpin research exemplars across three themes: Health, Environment and Primary Production. An important component of this work involves the provision of a national genomics computing platform and for that they were keen to assess the GVL environment for genomics training and analysis activities.
Thanks to all involved in making this trial a success:
Assoc Prof Michael Black, Department of Biochemistry, University of Otago
Dr Michelle Barker, Deputy Director, Research Software Infrastructure, Nectar
Prof Glenn Moloney, Director, Nectar
Dr Paul Coddington, Deputy Director, Research Platforms, Nectar
Dr Aleksandra Pawlik, Research Community Manager, New Zealand eScience Infrastructure
Assoc Prof Andrew Lonie, Director, Melbourne Bioinformatics & EMBL-ABR
Mr Simon Gladman, GVL Lead, Melbourne Bioinformatics
Mr Nick Jones, Director, New Zealand eScience Infrastructure
Dr Elizabeth Permina, Bioinformatician, Otago Genomics and Bioinformatics Facility, Health Sciences, University of Otago
Prof Peter Dearden, Director, Genetics Otago
Prof Cris Print, School of Medical Sciences, University of Auckland and co-lead bioinformatics, Genetics Otago.
We look forward to greater collaboration in the development of shared training resources for both our communities.
2 November 2017
A Victoria University report published this week, which set out to measure the return on investment of Australia’s Virtual Laboratories (VLs), has found that they are generating a return on investment of up to 138 times their cost. VLs provide digital interfaces, tools and data to online research communities.
Estimating the value and impact of Nectar Virtual Laboratories, written by the Victoria Institute of Strategic Economic Studies for the National eResearch Collaboration Tools and Resources project (Nectar), studied three Nectar-supported VLs across different disciplines, including the Genomics Virtual Laboratory hosted at Melbourne Bioinformatics.
Five methods of value measurement were used, including the impact the VLs have on research and how much users would be willing to pay for the service if it did not already exist.
The return on investment (ROI) varies depending on the metric and the associated method of calculation; however, the report found that ROI is at least double the investment for every measure of each of the VLs studied, indicating the services have a significant economic and user impact. By one measure the value of the VL was over 100 times the cost of investment.
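As a rough illustration of how a figure such as ‘138 times the cost’ is expressed, the sketch below divides an estimated value by the cost of investment for several measurement methods. The dollar amounts and method names are invented placeholders, not figures from the report, and the report's own calculation methods may differ.

```python
from typing import Dict


def value_multiple(estimated_value: float, cost: float) -> float:
    """Return estimated value expressed as a multiple of the investment cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return estimated_value / cost


# Placeholder inputs for illustration only (not taken from the report).
investment_cost = 1_000_000.0
estimated_values: Dict[str, float] = {
    "hypothetical_method_low": 2_000_000.0,
    "hypothetical_method_high": 138_000_000.0,
}

multiples = {
    method: value_multiple(value, investment_cost)
    for method, value in estimated_values.items()
}
```

With these placeholder inputs the multiples range from 2 to 138, which mirrors the spread the article describes: the result depends heavily on which measure of value is chosen.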
25 October 2017
Thanks to the global efforts of over 250 contributors, including our own Simon Gladman, bioinformaticians and life scientists now have access to ‘Bioconda’, a software-package building and management system designed for bioinformatics. This work is now documented at BioRxiv.
A common problem in computing and data science especially – known as ‘dependency hell’ – occurs when you try to install software you want to run and it’s not compatible with your operating systems, versions, system set-ups etc. This creates an environment where the compilation requirements of the underlying systems are often competing with one another. In more mature fields of computer programming, packaging systems like Conda have been developed to overcome this problem: someone makes their software available using a ‘Conda recipe’ which describes the software, where to find it, what dependencies it needs both to build and run it and then some basic scripting to install it. The ‘recipe’ is then added to the Conda repository system where it is automatically ‘built’ into installable tool packages for various operating systems and hardware and then stored in a fully-supported, global repository.
Bioconda extends Conda into the life sciences and, in addition to making bioinformatics software installation much easier, improves analysis reproducibility by allowing users to define isolated environments with defined software versions, all of which are easily installed and managed without the need for administrative privileges.
It improves on other packaging systems by having an option to install tools in their own sandboxed environment so they don’t interfere with any other installed software. And every tool put into the repository automatically has a Docker container built for it.
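To make the idea of an isolated environment with defined software versions concrete, here is a minimal sketch that writes out the text of a conda environment file and the command that would create the environment from it. The helper names and the environment name are hypothetical, and the version numbers are placeholders rather than real releases; the script only builds strings and does not call conda.

```python
from typing import Dict, List


def environment_yaml(name: str, pinned: Dict[str, str]) -> str:
    """Build the text of a conda environment file with pinned versions."""
    lines = [
        f"name: {name}",
        "channels:",
        "  - conda-forge",
        "  - bioconda",
        "dependencies:",
    ]
    # Sort for a stable, reviewable file.
    for package, version in sorted(pinned.items()):
        lines.append(f"  - {package}={version}")
    return "\n".join(lines) + "\n"


def create_command(env_file: str) -> List[str]:
    """Return the conda command that creates an environment from a file."""
    return ["conda", "env", "create", "--file", env_file]


# Placeholder versions: substitute versions that actually exist on the channel.
spec = environment_yaml("microbial-demo", {"snippy": "1.0", "prokka": "1.0"})
command = create_command("environment.yml")
```

Saving spec as environment.yml and running the command would give each project its own environment, so tools pinned for one analysis do not interfere with those installed for another.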
Simon Gladman says,
The Bioconda project is very well organised with contributions to the repository via pull request and code review before merging. I’ve added roughly 30 packages to the Conda ecosystem (out of ~2500) since I started working with it, including our Microbial Genomics group’s most popular ones like Velvet Optimiser, Prokka and Snippy. I’ve also added tools I use a lot like Roary and Gubbins (Sanger Pathogens group). To progress the project we have held hackathons all over the world, the last one at the 2017 Galaxy conference in Montpellier.
About 2 years ago, the Galaxy project decided to experiment with using Conda and Bioconda as their preferred method of tool installation and they’ve now formally adopted it as standard. The latest version of Australia’s Genomics Virtual Laboratory (GVL) uses Bioconda to handle tool installations for Galaxy in the GVL and we’ve started working on ways to supply command line versions of the tools also.
An alternative packaging system Torsten Seemann contributes to is Homebrew Science. It pre-dates Bioconda and inspired many of the package formulae now employed in Bioconda.
January 2018 workshop
We have invited two Bioconda and Galaxy experts, Saskia Hiltemann (Erasmus University, The Netherlands) and Eric Rasche (Freiburg University, Germany), to run a Bioconda/Galaxy tool wrapping tutorial and workshop to help us build Australia’s capability in, and contributions to, this great community project.
Register your interest in this workshop with Christina Hall.
Bioconda consists of:
- a repository of recipes hosted on GitHub
- a build system that turns these recipes into conda packages
- a repository of >2700 bioinformatics and other packages ready to use with ‘conda install’
- over 250 contributors that add, modify, update and maintain the recipes
Follow the project on twitter: #bioconda
Watch the 30-minute webinar from ELIXIR on Bioconda and Biocontainers by Björn Grüning (ELIXIR Germany).
Install your software using the conda system: after installing a conda system such as Miniconda, try ‘conda install <bioinformatics tool>’.
For Melbourne Genomics, work is well underway towards the delivery of the first components of Victoria’s clinical system for genomics (‘GenoVic’). The work plan for this year sees the delivery of shared analysis and curation tools, data governance and clinical tools – paving the way for further elements of the system in 2018 and 2019.
Earlier this year Melbourne Bioinformatics was pleased to host and be part of the team which developed the benchmarks and assessed two world-class curation tools to enable selection of the best tool for Victoria. This cutting-edge software will provide a streamlined, collaborative and easy-to-use way to interpret, share and store genomic information for clinical and research purposes.
We were very impressed with the team’s thoroughness and dedication to the task. Negotiations are now well underway with the selected vendor and you can read the full story at Melbourne Genomics.