The workspace: your dedicated project space on Terra
Simplify your analysis in the cloud by keeping everything in a workspace
At the heart of working on Terra is a shareable computational workspace. It's like a computational sandbox with everything you need to complete your project.
- Link to data in the cloud for analysis, instead of downloading and storing it yourself
- Keep data organized with in-app tables, no matter where in the cloud the data are stored or whether you're analyzing a hundred files or a hundred thousand
- Boost your statistics by combining data from different sources
- Visualize and analyze data of any size in real time with interactive Jupyter notebooks, RStudio and Galaxy
- Find and run bulk analysis tools (workflows) even if you're not a programming expert
- Make your results reproducible with publicly vetted analysis tools and options to standardize your virtual computational environment
- Share analysis results and collaborate while keeping control with built-in security
Workspace functions at a glance
Each section below explains how a workspace helps keep your project on track by keeping all the pieces together.
Documentation in the Dashboard
The landing page is your project overview - what questions you’re trying to answer, what kind of data and analysis tools you'll use, etc. Good documentation makes your analysis easy to share (including with your future self).
Workspace information includes workspace owners (these can be changed as needed) and Authorization Domain information (used to protect access to controlled data).
Store data in the workspace bucket
Each workspace has an associated Google bucket for storing
- Your own data (uploaded from a local system)
- Workflow outputs (stored by default in the workspace bucket)
- Notebook files (e.g., my_notebook.ipynb)
Note that data generated by an interactive analysis in a notebook is stored in the virtual application machine and not in the workspace bucket. To keep this data safe, you will need to explicitly copy the data to the workspace bucket. Learn more about that process here.
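As a minimal sketch of that copy step, the Python snippet below builds a `gsutil cp` command that copies a file from the notebook's disk to the workspace bucket. It assumes the notebook environment exposes the bucket URL in a `WORKSPACE_BUCKET` environment variable and has `gsutil` installed. The file name `results.csv`, the `notebook-outputs` folder, and the bucket `gs://example-bucket` are hypothetical.

```python
import os
import subprocess


def build_copy_command(local_path, bucket_url, dest_folder="notebook-outputs"):
    """Return the gsutil command that copies local_path into the bucket."""
    file_name = os.path.basename(local_path)
    destination = f"{bucket_url.rstrip('/')}/{dest_folder}/{file_name}"
    return ["gsutil", "cp", local_path, destination]


def copy_to_workspace_bucket(local_path):
    """Copy a file to the workspace bucket (needs gsutil and bucket access)."""
    # Assumption: the environment sets WORKSPACE_BUCKET to the bucket URL.
    bucket_url = os.environ["WORKSPACE_BUCKET"]
    subprocess.run(build_copy_command(local_path, bucket_url), check=True)


# Hypothetical bucket URL, used only to show the command that would run.
print(build_copy_command("results.csv", "gs://example-bucket"))
```

Separating the command construction from the call that runs it makes it easy to inspect the destination path before anything is copied.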
To access your workspace bucket, click on the link at the bottom right in the dashboard:
Manage and organize data in the Data page
Keep track of project data in workspace tables. They're like spreadsheets built right into the workspace.
- Combine data from different studies or across datasets into one table to create a more robust dataset to analyze
- Connect data across tables with universally unique identifiers (UUIDs) or subject IDs (left column of bottom screenshot)
Genomic data - The sample table includes links to wherever large data files are in the cloud. UUIDs identify the sample data files. In this example, the collaborator ID ties a participant's genomic data to the phenotypic data in a separate table.
Phenotypic data - The subject table can include complete medical, population or lab data. In this example, the subject ID connects a participant's phenotypic data with genomic data in a separate table.
Workspace Data table - This table contains workspace-level files required to analyze any sample. Examples include Docker images and reference files.
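The way a shared ID links the sample and subject tables can be sketched in a few lines of Python. This is only an illustration of the idea, not how Terra stores tables: the rows, column names (`collaborator_id`, `subject_id`, `cram_path`), and bucket paths below are all hypothetical.

```python
# Hypothetical rows standing in for a sample table and a subject table.
sample_table = [
    {"sample_id": "sample-001", "collaborator_id": "subj-A",
     "cram_path": "gs://example-bucket/sample-001.cram"},
    {"sample_id": "sample-002", "collaborator_id": "subj-B",
     "cram_path": "gs://example-bucket/sample-002.cram"},
    {"sample_id": "sample-003", "collaborator_id": "subj-Z",
     "cram_path": "gs://example-bucket/sample-003.cram"},
]
subject_table = [
    {"subject_id": "subj-A", "age": 54, "sex": "F"},
    {"subject_id": "subj-B", "age": 61, "sex": "M"},
]


def join_samples_to_subjects(samples, subjects):
    """Attach each subject's phenotypic data to that subject's samples."""
    subjects_by_id = {row["subject_id"]: row for row in subjects}
    joined = []
    for sample in samples:
        subject = subjects_by_id.get(sample["collaborator_id"])
        if subject is None:
            continue  # no phenotypic record for this sample; skip it
        joined.append({**sample, **subject})
    return joined


for row in join_samples_to_subjects(sample_table, subject_table):
    print(row["sample_id"], row["subject_id"], row["age"])
```

Only the links to the large files travel with each row, so the join stays cheap regardless of file size.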
Analyze and visualize data in real time in a Notebook
Launch an in-app Jupyter Notebook to interact with and visualize the data. Code can be Python or R. Notebooks include documentation to help organize and communicate the analysis steps.
Customize your virtual application
Notebooks run on a virtual application, and you can customize the environment and compute power of the virtual application for your notebook. Terra includes several built-in environments with popular packages such as Bioconductor and Hail. Alternatively, to control exactly the packages and libraries for your analysis, choose a custom Docker environment. Specifying the compute power in the virtual machine (or cluster) lets you interact with data of any size. You can document the options you use to allow others to reproduce the analysis.
Streamline bulk pipeline analysis with Workflows
You can collect, configure (set up) and run workflows for bulk analyses in the Workflows tab. These are the sorts of repetitive analyses that can be automated, such as aligning sequencer reads or calling variants. Workflows can be set up to take input directly from a workspace table and write output metadata back to the table. Configuring this way helps keep data organized as you run your analysis.
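To make the table-in, table-out idea concrete, here is a toy Python sketch. It assumes workflow inputs are configured with expressions such as `this.<column>` (a value from the current table row) and `workspace.<key>` (a value from the Workspace Data table), and that outputs are mapped back to a row column the same way. The resolver, the `fake_align` function, and all column names and paths are hypothetical and are not Terra's implementation.

```python
def resolve_input(expression, row, workspace_data):
    """Look up an input expression in the table row or the workspace data."""
    if expression.startswith("this."):
        return row[expression[len("this."):]]
    if expression.startswith("workspace."):
        return workspace_data[expression[len("workspace."):]]
    return expression  # anything else is treated as a literal value


def run_on_row(row, workspace_data, input_config, output_config, workflow):
    """Run workflow on one row and return a copy of the row with outputs added."""
    inputs = {name: resolve_input(expr, row, workspace_data)
              for name, expr in input_config.items()}
    outputs = workflow(**inputs)
    updated = dict(row)
    for output_name, expr in output_config.items():
        updated[expr[len("this."):]] = outputs[output_name]
    return updated


def fake_align(reads, reference):
    """Stand-in for a real workflow; it only derives an output path."""
    return {"aligned_bam": reads.replace(".fastq", ".bam")}


row = {"sample_id": "sample-001", "reads": "gs://example-bucket/sample-001.fastq"}
workspace_data = {"reference_fasta": "gs://example-bucket/ref.fasta"}
input_config = {"reads": "this.reads", "reference": "workspace.reference_fasta"}
output_config = {"aligned_bam": "this.aligned_bam"}

updated_row = run_on_row(row, workspace_data, input_config, output_config, fake_align)
print(updated_row["aligned_bam"])
```

Because the output path lands in a new column of the same row, each sample's results stay next to its inputs.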
Not a coding expert?
Browse and import published workflows in Dockstore or the Broad Methods Repository by selecting the "Find a Workflow" option:
Monitor and Troubleshoot in the Job History page
Check on the status of workflow submissions here. The Job History maintains a record of every workflow submitted in the workspace. You can troubleshoot by selecting the workflow name in the "Submission" column:
This opens the submission details page, where you can access further details by clicking on the icons at right:
To learn more about troubleshooting workflows, see this article.
Collaborate in a shared workspace
Click on the three vertical dots at the top right to share the workspace with other people working in Terra. You'll control how much access to give collaborators, and you can change it at any time.
Making it all work together
Even with all the data in the world, you can't make discoveries if you can't store it, organize it, analyze it, and share your results. Like a construction site with all the building materials and tools you need close at hand and well organized, the workspace brings together the data, tools, and cloud resources you need so you can focus on science.