An overview of the data submission process to help AnVIL Data Submitters (GCP) get started staging and uploading data to the Terra Data Repository (TDR).
Prerequisites
This document assumes you have already registered your study data with AnVIL and defined the data model for your dataset. For new projects that have not yet been approved, data submitters should first complete the AnVIL Onboarding Application.
Process overview and requirements
For additional data submission support, reach out to the AnVIL Support team at anvil-data@broadinstitute.org.
AnVIL provides you with a submission workspace where you will stage data for ingestion: large data files (such as omics and image files) and a TSV file for each dataset table.
As the data submitter, you’re expected to abide by the following guidelines:
- Only upload data from the current approved data submission.
- You must have prior approval from the AnVIL program to run any compute or analysis in AnVIL-owned workspaces, including the submission workspace. Note that cloning the submission workspace is not allowed, as the clones may not have the same enhanced monitoring and logging required for controlled-access data in a workspace.
- Don’t copy or move primary data from this workspace without prior approval from the AnVIL program.
Ready to submit data to AnVIL? Follow the step-by-step instructions in the links below:
- Step 1 - Register Study/Obtain Approvals
- Step 2 - Set Up Data Model
- Step 3 - Prepare data for submission to AnVIL
- Step 4 - Stage data in the staging workspace
- Step 5 - Ingest and validate data
Next steps: Accessing the data
Once the data is ingested, you'll be able to access it from TDR (for example, to make updates) and via the AnVIL Data Explorer for analysis. You will also retain read-only access to the data in the submission workspace. Requester Pays is enabled on the workspace bucket, so data access charges are billed to your own Google billing project.
For more information on finding and using AnVIL data, see Terra Support articles for AnVIL researchers.
Additional data model resources
- Set up a Data Model in the AnVIL portal (the tables that hold your data)
- Managing data with tables
- Overview: Entity types and the standard genomic model