The Portable Pipelines Project started the development of Janis in late 2018 with the goal of delivering bioinformatics workflows that are portable and reproducible across many compute environments. Janis is a Python framework that provides a simple Application Programming Interface to build and run workflows that adhere to current standards of workflow specifications. Over 2020, […]
Richard Lupat, Bioinformatician, PeterMac
|Walter and Eliza Hall Institute of Medical Research|COLLABORATOR|
|Peter MacCallum Cancer Centre|LEAD INSTITUTE|
|University of Melbourne|COLLABORATOR|
PeterMac / WEHI / Melbourne Bioinformatics (University of Melbourne) co-funding
In September 2018, talented software engineer Michael Franklin joined us to work on a new collaborative project between Peter MacCallum Cancer Centre (Richard Lupat and Jason Li), Melbourne Bioinformatics (Daniel Park and Bernard Pope) and WEHI (Evan Thomas). The Portable Pipelines Project (PPP) anticipates the complex needs of cancer researchers as whole genome sequencing (WGS) becomes routine in research and part of the standard clinical evaluation of disease. By the conclusion of the 18-month project, Precinct researchers are expected to have access to a modular, portable bioinformatics pipeline system capable of meeting current and future needs of WGS analysis.
Bench scientists working in cancer genomics are increasingly reliant upon data scientists. Data scientists stay on top of the choice of best-practice analysis tools, can problem-solve how to maximise access to scarce compute resources, and bring the data management and curation discipline that is increasingly required for sharing data for publication, collaboration and reproducibility.
This project aims to produce a modular, robust and portable pipeline written in a common workflow development language, so that it can run on multiple computing platforms (from local HPC to cloud) without the workflow having to be rewritten for each hardware system.
Along the way, it is envisaged that the team will also learn a great deal more about working with containers, workflow development languages, different engines such as Toil and Cromwell, and various compute platforms, and will gain the skills required to change pipelines as needed for research problems.
Although it is being developed for whole genome sequencing in cancer research, the software should, if successful, be extensible to other domains in the long term. The team would then have developed an extremely useful tool that will be in demand by research teams worldwide.
Mr Richard Lupat, Bioinformatician, Bioinformatics Consulting Core, PeterMac
Dr Jason Li, Manager, Bioinformatics Consulting Core, PeterMac
Assoc Prof Bernard Pope, Senior Bioinformatician, Melbourne Bioinformatics
Assoc Prof Daniel Park, Platform Lead, Melbourne Bioinformatics
Mr Evan Thomas, HPC Platform, WEHI
Mr Michael Franklin, Research Software Engineer, Melbourne Bioinformatics