Terra's cloud-based model for data is built on the idea that you shouldn't need to copy or store data on your own local machine, or even in your own Google bucket, to do your analysis. Large data sets already in the cloud include both public-access sets (e.g., 1000 Genomes) and restricted-access sets (e.g., UK Biobank).
Storing data in the cloud
One important factor is where the data are stored. They can be in your workspace Google bucket, in an external bucket, or in BigQuery (for tabular data such as phenotype data). If your data are already in the cloud, great! You can skip down to see how to use the Terra interface to help organize and manage data for analysis. If you need to upload data to a Google bucket, see this article.
How to access, organize and manage data in Terra
The data table
You can use the in-app workspace data table to organize and keep track of data in the cloud. It's like a giant, expandable spreadsheet that coordinates participants, participant IDs, phenotypes, metadata for samples, and more. Its flexible design helps you keep as much metadata as you need in one place, which helps with collaboration and makes your work more reproducible. If you configure workflows and notebooks to write output metadata to the table, intermediate and other output files are associated with the input files by default, no matter where the files are physically stored. Though they take a bit of setup time in the beginning, data tables can be enormously useful, especially as the amount of data grows. Imagine keeping track of hundreds or thousands of participants and their data as easily as one or two!
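To make the spreadsheet analogy concrete, here is a minimal Python sketch of a small table in which each row ties a sample ID to its participant, a phenotype, and the cloud location of a file. It is an illustration under assumptions, not Terra's own tooling: the helper `build_load_file`, the bucket `gs://example-bucket`, and all IDs and column names are hypothetical, and the `entity:<table>_id` header reflects the tab-separated load-file convention as we understand it, so check the article on populating the data table before relying on it.

```python
import csv
import io


def build_load_file(table_name, rows):
    """Return tab-separated text for a table; rows are dicts sharing the same keys.

    Hypothetical helper for illustration only.
    """
    id_column = f"{table_name}_id"
    # Non-ID columns keep the order in which they appear in the first row.
    other_columns = [key for key in rows[0] if key != id_column]

    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Assumed convention: the first header cell is "entity:<table>_id".
    writer.writerow([f"entity:{id_column}"] + other_columns)
    for row in rows:
        writer.writerow([row[id_column]] + [row[col] for col in other_columns])
    return buffer.getvalue()


# Hypothetical rows: the files stay in the cloud; the table only stores their paths.
samples = [
    {
        "sample_id": "sample_001",
        "participant": "participant_A",
        "phenotype": "case",
        "cram": "gs://example-bucket/sample_001.cram",
    },
    {
        "sample_id": "sample_002",
        "participant": "participant_B",
        "phenotype": "control",
        "cram": "gs://example-bucket/sample_002.cram",
    },
]

tsv_text = build_load_file("sample", samples)
print(tsv_text)
```

The point of the sketch is that the table holds references (the `gs://` paths) and metadata rather than the data themselves, which is why the same table can track files regardless of which bucket they live in.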
For more details about populating the workspace data table, see this article.
For practice updating a data table and using it to run workflows, see the Terra Quickstart workspace (Part 2).
For practice browsing data in the Data Library and bringing it into a notebook for analysis, see Part 1 of the Terra Quickstart workspace.