Simplify your analysis in the cloud with a Terra workspace
At the heart of working on Terra is a shareable workspace. It's like a computational sandbox with everything you need to complete your project: data, analysis tools, documentation, and provenance.
- Link to data in the cloud for analysis, instead of downloading and storing it yourself
- Keep data organized with in-app tables, no matter where in the cloud the data are stored, whether you're analyzing a hundred files or a hundred thousand.
- Boost your statistics by combining data from different sources
- Visualize and analyze data of any size in real time with interactive Jupyter notebooks, RStudio and Galaxy
- Find and run bulk analysis tools (workflows) even if you're not a programming expert
- Make your results reproducible with publicly vetted analysis tools and options to standardize your virtual computational environment
- Share analysis results and collaborate while keeping control with built-in security
Workspace functions at a glance
Read on to learn all the ways a workspace can help you stay on track by keeping all the pieces you need for your analysis in one place.
Documentation in the Dashboard
The landing page is your project overview - what questions you’re trying to answer, what kind of data and analysis tools you'll use, etc. Good documentation makes your analysis easy to share (including with your future self).
Workspace information includes workspace owners (these can be changed as needed) and Authorization Domain information (used to protect access to controlled data).
Store data in the workspace bucket
Each workspace has an associated Google bucket for storing:
- Your own data (uploaded from a local system)
- Workflow outputs (stored by default in the workspace bucket)
- Notebook files
Note that data generated by an interactive analysis in a notebook are stored on the application's virtual machine, not in the workspace bucket. To keep these data safe, you need to copy them to the workspace bucket explicitly. Learn more about that process here.
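One way to make that copy from inside a notebook is sketched below. It assumes the notebook environment exposes the bucket path in a `WORKSPACE_BUCKET` environment variable and has the `gsutil` command-line tool installed. The `notebook-outputs` folder and the function names are hypothetical, chosen only for this example.

```python
import os
import subprocess


def build_copy_command(local_path, bucket, dest_folder="notebook-outputs"):
    """Return the gsutil command that copies local_path into the bucket.

    dest_folder is a hypothetical folder name chosen for this example.
    """
    bucket = bucket.rstrip("/")
    file_name = os.path.basename(local_path)
    destination = f"{bucket}/{dest_folder}/{file_name}"
    return ["gsutil", "cp", local_path, destination]


def copy_to_workspace_bucket(local_path):
    """Copy a file from the notebook's disk to the workspace bucket."""
    # Assumption: the environment sets WORKSPACE_BUCKET to a gs:// path.
    bucket = os.environ["WORKSPACE_BUCKET"]
    command = build_copy_command(local_path, bucket)
    # check=True raises an error if gsutil reports a failure.
    subprocess.run(command, check=True)
    return command[-1]
```

Calling `copy_to_workspace_bucket("results.csv")` at the end of a notebook session would then place the file under the hypothetical `notebook-outputs` folder of the workspace bucket.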
To access your workspace bucket, click on the link at the bottom right in the dashboard:
Manage and organize data in the Data page
You can track project data in workspace tables. They're like spreadsheets built right into the workspace.
- Combine data from different studies or across datasets into one table to create a more robust dataset to analyze
- Connect data across tables with Universally Unique Identification numbers (UUIDs) or subject IDs (left column of bottom screenshot)
Genomic data - The sample table includes links to wherever large data files are in the cloud. UUIDs identify the sample data files. In this example, the collaborator ID ties a participant's genomic data to the phenotypic data (in a separate table).
Phenotypic data - The subject table can include complete medical, population or lab data. In this example, the subject ID connects a participant's phenotypic data with genomic data in a separate table.
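The link between the two tables can be illustrated with a short sketch. The rows, column names, and file paths below are hypothetical stand-ins for a sample table and a subject table. The example only shows the idea of matching a sample's collaborator ID to a subject ID, not how the workspace stores tables internally.

```python
# Hypothetical rows standing in for two workspace tables.
sample_table = [
    {"sample_id": "uuid-0001", "collaborator_id": "SUBJ-A",
     "cram_path": "gs://example-bucket/a.cram"},
    {"sample_id": "uuid-0002", "collaborator_id": "SUBJ-B",
     "cram_path": "gs://example-bucket/b.cram"},
    {"sample_id": "uuid-0003", "collaborator_id": "SUBJ-Z",
     "cram_path": "gs://example-bucket/z.cram"},
]

subject_table = [
    {"subject_id": "SUBJ-A", "age": 54, "sex": "F"},
    {"subject_id": "SUBJ-B", "age": 61, "sex": "M"},
]


def join_samples_to_subjects(samples, subjects):
    """Attach each subject's phenotypic data to that subject's samples."""
    # Index subjects by ID so each lookup is a single dictionary access.
    subjects_by_id = {row["subject_id"]: row for row in subjects}
    joined = []
    for sample in samples:
        subject = subjects_by_id.get(sample["collaborator_id"])
        if subject is None:
            continue  # no phenotypic record for this sample, so skip it
        joined.append({**sample, **subject})
    return joined


joined_rows = join_samples_to_subjects(sample_table, subject_table)
```

Each row in `joined_rows` carries both the link to the genomic file and the participant's phenotypic fields, which is the combination an analysis typically needs.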
Workspace Data table - This table contains workspace-level files required to analyze any sample. Examples include Docker or reference files:
Analyze data in real time with Cloud Environment applications
Customize your virtual application Cloud Environment
Interactive analyses run in a virtual application, and you can customize the environment and compute power of the virtual application for your notebook. Terra includes several built-in environments that come pre-loaded with popular packages such as Bioconductor and Hail. Alternatively, use a custom Docker environment to control exactly the packages and libraries for your analysis.
Specifying the compute power and disk size for the virtual machine (or cluster) lets you interact with data of any size. You can document the options you use to allow others to reproduce the analysis.
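As one possible way to document those options, a notebook could record its own environment in a small JSON summary. The sketch below uses only the Python standard library. The function name, the `machine_notes` field, and the example values are assumptions made for illustration, not a Terra feature.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages, machine_notes=None):
    """Collect interpreter, OS, and package versions into a dictionary."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
        # Free-form notes, e.g. the CPU, memory, and disk settings you chose.
        "machine_notes": machine_notes or {},
    }


def environment_as_json(packages, machine_notes=None):
    """Return the environment summary as a JSON string for saving or sharing."""
    return json.dumps(describe_environment(packages, machine_notes), indent=2)
```

For example, `environment_as_json(["pip"], {"cpus": 4, "disk_gb": 50})` produces a summary that can be saved next to the notebook so collaborators can recreate a matching environment.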
Streamline bulk pipeline analysis with Workflows
You can collect, configure (set up) and run workflows for bulk analyses in the Workflows tab. These are the sorts of repetitive analyses that can be automated, such as aligning sequencer reads or calling variants. Workflows can be set up to take input directly from a workspace table and write output metadata back to the table. Configuring this way helps keep data organized as you run your analysis.
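The idea of reading workflow inputs from a table can be sketched as follows. The example assumes the convention in which an input written as `this.<column>` is read from the current table row and `workspace.<key>` is read from the Workspace Data table. The input names, column names, and resolver functions are hypothetical, and this illustrates the concept rather than Terra's implementation.

```python
def resolve_input(expression, row, workspace_data):
    """Turn one input expression into a concrete value."""
    if expression.startswith("this."):
        # Value comes from a column of the table row being processed.
        return row[expression[len("this."):]]
    if expression.startswith("workspace."):
        # Value comes from workspace-level data shared by every row.
        return workspace_data[expression[len("workspace."):]]
    return expression  # anything else is treated as a literal value


def resolve_inputs(config, row, workspace_data):
    """Resolve every input in a workflow configuration for one row."""
    return {
        input_name: resolve_input(expression, row, workspace_data)
        for input_name, expression in config.items()
    }


# Hypothetical configuration, table row, and workspace data.
config = {
    "align.reads": "this.fastq_path",
    "align.reference": "workspace.ref_fasta",
    "align.threads": "4",
}
row = {"sample_id": "uuid-0001", "fastq_path": "gs://example-bucket/a.fastq"}
workspace_data = {"ref_fasta": "gs://example-bucket/ref.fasta"}

resolved = resolve_inputs(config, row, workspace_data)
```

Running the same configuration over every row of a table is what lets one setup drive a bulk analysis; writing outputs back would work the same way in reverse, storing each result under a column of the row that produced it.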
Monitor and Troubleshoot in the Job History page
Check on the status of workflow submissions here. The Job History maintains a record of every workflow submitted in the workspace. You can troubleshoot by selecting the workflow name in the "Submission" column:
This opens the Submission details page, where you can access further details by clicking on the icons at right:
To learn more about troubleshooting workflows, see Troubleshooting Workflows: Tips and Tricks.
Collaborate in a shared workspace
Making it all work together
Even with all the data in the world, you can’t make discoveries if you can’t store it, organize it, analyze it, and share your results. Like a construction site with all the building materials and tools you need close at hand and well organized, the workspace brings together the data, tools, and cloud resources you need so you can focus on science.