In this post, Amanda Kedaigle, a computational scientist from the Stanley Center for Psychiatric Research, describes her experience with the Single Cell Portal (SCP), a data visualization portal built on top of Terra.
As a computational biologist, I constantly hear two words used to describe my work: "collaborative" and "interdisciplinary." What that really means is that I'm constantly interacting with researchers from different fields, who have very different skill sets. That can create some interesting (read: hard) challenges when it comes to communicating results and exchanging insights. In this guest blog post, I'd like to share how Single Cell Portal (SCP) has made it much easier for my colleagues and me to collaborate on data analysis and interpretation across the wet lab/dry lab divide.
For context, I work with several groups at the Broad studying a novel 3-dimensional model of human brain development, called brain organoids, at the single-cell level. Each of us has very different expertise: the stem cell experts and neurobiologists of Paola Arlotta’s group grow organoids; the technology, sequencing, and tissue-handling experts of Joshua Levin’s and Aviv Regev’s groups apply exciting new assays to them, such as single-cell RNA-seq and ATAC-seq; and the resulting data eventually makes its way to my officemates and me to dig out the results. Our job is then to feed those results back to the biologists to inform their next round of experimentation.
My daily work involves analyzing the data, applying relevant metadata labels such as cell type assignments for each of the cells, and producing visualizations like t-SNE or UMAP plots. Technically, I could do all of it on my own computer or on the Broad’s cluster, but that produces data files that our biologist friends might not know how to use, and fixed images that rarely give them the perspective they're looking for.
That's where Single Cell Portal comes in. It's built for visualizing single-cell data, so all you need to do is upload the data (which stays private by default), and right off the bat you have access to a set of standard visualizations. The plots are interactive, so you can color or categorize the cells based on any of the metadata labels, and easily switch between t-SNE plots, bar charts, or violin plots. My favorite part might be the Gene Search tool, where you can enter any gene you’re interested in and plot its expression across clusters or datasets.
Like I said, I could do all of that for myself on my laptop. What makes SCP so valuable to me is that I can share the study workspace with my collaborators, some of whom have never installed R or heard of the command line, and they can point-and-click to explore the data without my help. Then we can discuss the results and refine the analysis together. For example, the developmental neurologists might look at the expression plots of the marker genes they know about and give me feedback on how well I’ve assigned cells to their cell types. I can then quickly update the SCP study based on their comments and send it back to them for another round of review.
This ability to share our organoid studies goes beyond just direct collaborators. At one point, an Arlotta lab member asked me if we had a particular rare cell type represented in the organoids – a cell type I’d never even heard of. Rather than set up a series of meetings to figure it out, I simply added him to our study; he looked at the data himself and found the cells he wanted.
Even cooler, we were able to use this as part of the review and publication process for a recent paper: once we had the manuscript ready, we shared the SCP study with reviewers so they could explore the data themselves. When the paper was published, we easily switched the study from private to public so that anyone who reads the paper can explore the data for themselves as well. I’ve already received emails from a researcher at Johns Hopkins who’s using our data as a resource and found downloading the data from the SCP much more user-friendly than relying on the Gene Expression Omnibus (which was built with bulk expression data in mind).
Just in case I made it sound too perfect to be true, I'll note that SCP is still under active development and does have a few quirks and flaws. The first time around, it took me a few attempts to wrangle my data into the exact file formats required by SCP (I would love it if the Portal could automatically parse output formats from popular single-cell analysis packages like Seurat and Scanpy), and I ran into a couple of issues uploading the files. On a different note, by default the system sends email notices about every little change. That sounds like a great feature until you have to do some iterative refinement after sharing a study with collaborators. There were a couple of days where I kept noticing errors in cell labels and re-uploading new files, which led to my poor co-workers receiving what seemed like hundreds of emails, until I realized I could customize notification settings to avoid that problem. Finally, some of my less computationally savvy collaborators struggled a little with the interactive plotting features until they learned to use them effectively. Perhaps the interface could be made more intuitive for novices, or the problem could be solved through some targeted onboarding materials.
Overall, though, setting up my first big study was pretty painless. The SCP team was a huge help; special shout-out to Vicky Horst for holding my hand through the learning process. It's been amazing to experience firsthand how emerging technology can help us not only to improve assays and analysis of biological data, but also to bridge the gap and facilitate hand-offs between data producers and data analysts, and ultimately speed up and amplify the discovery process.