Once you've imported a Snapshot into your workspace, you're ready to run workflows on it!
Overview: Running workflows on TDR snapshots
Here are some important things to remember when running a workflow on a TDR snapshot.
- Cloning a workspace does not clone its snapshots. You'll need to import snapshots into each new workspace you create.
- You can have multiple snapshots in the same workspace.
- The WDL will run on the entire snapshot. There's currently no way to select specific rows within a snapshot. This capability is coming soon! Until then, you can control what data you analyze by creating snapshots that contain only the data you want to run on.
- Running a WDL on a snapshot will not write outputs back to a table. If you need this functionality, you can use the workflow launcher (WFL). Instructions for this are coming soon. For now, you can access outputs by going into the Google Cloud console.
- Workflow submissions don't record which snapshot supplied the input data. You won't be able to tell which snapshot a given run used unless you add your own note to the submission (either choose descriptive naming terminology or use the built-in comment functionality for each submission).
- You can have a mixed-input workflow setup (configuration), where the root entity is a snapshot table and workspace variables are specified as input attributes in the configuration form.
1: How to run a workflow on your snapshot
1.1. Import workflow
Import the workflow you want to run on your snapshot into your workspace, then click the workflow to open its configuration form.
To learn more, see Finding the workflow (method) you need (and its JSON) in the Methods Repository or How to import a workflow and its parameter file from Dockstore into Terra.
1.2. Select the data in the snapshot
First, choose which snapshot you want to run this analysis on (Select data).
Then select the table in the snapshot that you want to run on (this is your root entity type).
1.3. Specify the inputs
You can use the usual format “this.{COLUMN_NAME}” or “workspace.{ATTRIBUTE}”. Terra should provide hints when you’re typing in the attributes column.
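For example, with a snapshot table named sample as the root entity, a mixed configuration could combine snapshot columns and workspace attributes in the inputs form. All of the attribute and column names below are made up for illustration:

```
bam_file      this.bam_path         # a column in the snapshot's sample table
sample_name   this.sample_id        # a column in the snapshot's sample table
ref_fasta     workspace.ref_fasta   # a workspace-level attribute
```

This mirrors the mixed-input setup described in the overview above.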
1.4. Specify outputs
Your results cannot currently be written back to the data table; this feature is still under active development. For now, outputs are written to the workspace bucket, so you'll need to put bucket file paths here. You could also write the results to the workspace data if you wish. Alternatively, you can leave the output fields blank; you'll still be able to find the bucket locations where the workflow stored its output files through the Job Manager.
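If you do fill in an output attribute, a full bucket file path works. The bucket name and file name below are purely illustrative; substitute your own workspace bucket:

```
output_vcf    "gs://fc-1a2b3c4d/results/sample1.vcf"
```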
1.5. Run the workflow
Click "Save" to confirm the input and output attributes. If all required fields are properly filled in, you'll be able to click "Run Analysis".
1.6. Look at the results in the bucket
Go to your Job History tab and find the workflow run you just completed. Click on it (under Submissions) to see the job, then click the Execution Directory icon to the right of the workflow run.
This will take you to the execution directory in the Google Cloud console. Your results should be in the stdout file.
When you click on stdout, you'll see a details page with different options for accessing the file. You can use the gsutil URI to transfer the file to a desired location, or use the Authenticated URL to open the file in your browser.
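As a sketch of the gsutil route, suppose the details page shows a URI like the hypothetical one below (the real path depends on your workspace bucket, submission ID, and workflow name):

```shell
# All identifiers here are hypothetical -- copy the actual gsutil URI
# from the stdout details page in the Google Cloud console.
BUCKET="gs://fc-1a2b3c4d"
SUBMISSION_ID="4e8d9f00-1111-2222-3333-444455556666"
STDOUT_URI="${BUCKET}/submissions/${SUBMISSION_ID}/my_workflow/stdout"
echo "${STDOUT_URI}"

# Download the log locally (requires gsutil and access to the workspace bucket):
# gsutil cp "${STDOUT_URI}" .
```

The same pattern works for any other output file listed in the execution directory.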
2: How to use the Workflow Launcher to write outputs to a workspace
Coming soon!