The following testimonials have been freely provided by researchers working in Terra or its earlier incarnation, FireCloud. Read on to see how specific features of the platform enabled their work.
The best of both worlds: testing locally, running on the Cloud
Jessica Hekman, DVM, PhD
Broad Institute of MIT and Harvard
Our lab is moving to Terra for a project that involves calling variants in large datasets (thousands of whole genomes). It's going to require a lot of parallel compute for a short amount of time, so it makes a lot of sense to do it on the cloud. However, we still wanted to do the pipeline development work on our local infrastructure, since it's free and allows us to iterate very quickly. Fortunately, Cromwell, the workflow management system built into Terra, can also be run locally as a standalone tool. So we can write our pipelines in WDL, use Cromwell on premises to test the pipeline on small numbers of samples, and then move the result onto Terra for running at large scale. The only change we need to make is to adjust the inputs to point to where the data lives in cloud storage. This allows us to develop without having to worry about paying for compute when we make the inevitable development mistakes, and to move onto Terra only when we are confident that our pipeline is robust. The combination of Cromwell and Terra lets us take advantage of the best of both worlds: fast and free development on premises and large-scale execution on the cloud. On top of that, we can take advantage of the WDL pipelines available from other labs, and when we publish our results we'll be able to share our methods with others very easily.
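As a rough illustration of that last step (pointing the inputs at cloud storage), the sketch below rewrites local file paths in a WDL inputs mapping to cloud storage paths. It is only a minimal example under stated assumptions, not the lab's actual tooling: the workflow name `CallVariants`, the input names, the local directory `/data/project/`, and the bucket `gs://example-bucket/project/` are all hypothetical placeholders.

```python
import json


def retarget_inputs(inputs, local_prefix, cloud_prefix):
    """Return a copy of an inputs mapping with local path prefixes swapped for cloud ones."""

    def convert(value):
        # Strings that start with the local prefix are treated as file paths.
        if isinstance(value, str) and value.startswith(local_prefix):
            return cloud_prefix + value[len(local_prefix):]
        # Recurse into arrays and nested maps of inputs.
        if isinstance(value, list):
            return [convert(item) for item in value]
        if isinstance(value, dict):
            return {key: convert(item) for key, item in value.items()}
        # Numbers, booleans, and unrelated strings pass through unchanged.
        return value

    return {key: convert(value) for key, value in inputs.items()}


# Hypothetical inputs used for local testing on a few samples.
local_inputs = {
    "CallVariants.reference_fasta": "/data/project/ref/genome.fasta",
    "CallVariants.sample_bams": [
        "/data/project/bams/s1.bam",
        "/data/project/bams/s2.bam",
    ],
    "CallVariants.threads": 4,
}

# Hypothetical bucket layout that mirrors the local directory tree.
cloud_inputs = retarget_inputs(
    local_inputs, "/data/project/", "gs://example-bucket/project/"
)
print(json.dumps(cloud_inputs, indent=2))
```

The function returns a new mapping rather than editing the original, so the local inputs stay available for further on-premises test runs.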
Recording and running reproducible analyses
Biomedical Software Engineer
Icahn School of Medicine at Mount Sinai
Terra is a great platform for recording and running reproducible analyses. I often run into a situation where I've read a paper and I can't figure out where the data is. If I somehow find the data, the preprocessing and filtering methods applied to the data are opaque. Let's say I have miraculously scraped together the data and all of the code applied to this data set: will I be able to run this code on my machine? There is a very slim chance I have the right versions of software and all of its expected dependencies. What a headache! So I think it's really important that we all take responsibility for sharing our work in a way that supports computational reproducibility. I like that Terra lets me maintain my data in one centralized location, record my code and workflows with Jupyter notebooks and WDL, and resolve software compatibility issues with Docker. With these tools, I can easily run an analysis from start to finish, over and over again, and I can enable others to do the same without any additional effort.
Answering the challenges of big data analysis and collaborations
Clinical and Translational Epidemiology Unit, Mongan Institute
Massachusetts General Hospital
I have been involved in efforts to link rare non-coding variants to complex diseases since 2011. In the early days, we performed our analysis with whole genome sequence data on our local cluster. The Precision Medicine Initiative made large-scale WGS in epidemiological cohorts available and it became clear that retrieving, storing and analyzing these massive data files would be problematic. Another challenge was the collaborative model that our funding agencies were encouraging — we have collaborators at many other institutions with whom we wanted to share resources, code and analysis results.
We found that FireCloud provided a good solution to our challenges. We had to learn a new iterative style of workflow development (involving Cromwell, WDL and Docker in addition to FireCloud itself), but as soon as we had our development cycle in place, we were able to develop and deploy our analysis workflows with the engagement and assistance of our collaborators. The platform provides a model for our work to be open-source, with excellent tools to manage user access and cloud computing costs. The development team has been extremely responsive to the needs of the research community, delivering a series of enhancements that have enabled us to perform more sophisticated analyses. I'm excited to start taking advantage of the further improvements in Terra.
Managing and sharing analysis workflows
Effective collaborating, and working towards fully reproducible science
Democratizing computational bio, enabling reproducible research
Matthieu J. Miossec, PhD
Centre for Bioinformatics and Integrative Biology
Universidad Andrés Bello
I was first introduced to FireCloud in 2018 when my workshop proposal for a conference was merged with one from the Broad Institute’s GATK team. For the purposes of the workshop, we reproduced one of the studies I worked on as a doctoral student and research associate (see Page et al., 2018), which was an opportunity for me to discover the platform in detail. The cloud-computing aspect of FireCloud is tremendously useful in itself, particularly for laboratories that don’t always have direct access to high-performance computer clusters, but it is only the tip of the iceberg.
FireCloud’s method repository was tremendously helpful in building up the pipeline in a short amount of time. The repository contains both well-documented featured methods created by the GATK team and public methods contributed by other FireCloud users. In both collections we found methods that corresponded closely to what my previous team had implemented on local machines, so we were able to clone these instead of starting over from scratch. Crucially, once they were cloned, we could make all the small tweaks necessary for the methods to fit our specifications. This mass sharing of both methods and entire workspaces is surely the future of bioinformatics at a time when reproducibility is strongly needed. I will certainly continue working with FireCloud and Terra going forward.
See the project description and check out the Terra workspace here:
Enabling population-scale polygenic association studies
Denis Bauer, PhD
The VariantSpark team
Our first notebook on the Terra platform showcases our machine learning software, VariantSpark, which uses Apache Spark and takes advantage of Terra’s support for custom environment configurations. We are excited about enabling the global Terra research community to perform population-scale polygenic association analyses.
Check out the Terra workspace here: