Using Snapshots with Workflows

Anton Kovalsky

Once you've imported a Snapshot into your workspace, you're ready to run workflows on it!

Some important things to note:

  • Cloning a workspace does not clone its Snapshots. You'll need to import your Snapshots into each new workspace you create.
  • You can have multiple Snapshots in the same workspace.
  • If you run a WDL on a Snapshot, it will run on the entire Snapshot; there's currently no way to select specific rows within a Snapshot. This capability is coming soon, but for the time being you can control what data is used by creating Snapshots that contain only the data you want to run on.
  • Running a WDL on a Snapshot will not yet write outputs back to a table. If you need this, use something called the workflow launcher (WFL). Instructions for this are coming soon; for now, you can access the outputs by going into the Google Cloud console.
  • Workflows don't distinguish between input data sources, so you won't be able to tell which Snapshot was used for any given run unless you record it yourself (either by choosing appropriate naming terminology, or by using the built-in comment functionality for each submission).
  • You can use a mixed input configuration: set the root entity to a Snapshot table and still configure workspace attributes as inputs as well.


1. Running a Workflow on your Snapshot


1.1. Import workflow

Import a Workflow that you want to run on your Snapshot. 
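To make the steps below concrete, here is a minimal sketch of what such a WDL workflow might look like. The workflow, task, and input names here are hypothetical; the only assumption is that the workflow takes one file input per row of the Snapshot table, supplied via the input configuration described in the next steps.

```wdl
version 1.0

workflow SnapshotExample {
  input {
    File sample_file   # supplied per row via this.<column> in the Terra UI
  }
  call CountLines { input: f = sample_file }
}

task CountLines {
  input { File f }
  command <<< wc -l < ~{f} >>>
  output { Int n = read_int(stdout()) }
  runtime { docker: "ubuntu:20.04" }
}
```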


1.2. Select the Snapshot

Select a root entity type. First, choose which Snapshot you want to run this analysis on.

Then select the table in the Snapshot that you want to run on (this table is your root entity type).


1.3. Select the specific inputs

Select the inputs. Here you can use the usual “this.{COLUMN_NAME}” or “workspace.{ATTRIBUTE}”. The UI should provide hints when you’re typing.
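For instance, the input fields for a hypothetical workflow might be filled in as follows — the workflow, column, and attribute names below are made up for illustration, and this also shows the mixed configuration noted above (a Snapshot column alongside a workspace attribute):

```
MyWorkflow.input_bam         this.bam_file          (a column in the Snapshot table)
MyWorkflow.reference_fasta   workspace.ref_fasta    (a workspace attribute)
```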


1.4. Select outputs 

Your results cannot currently be written back to the data model (this feature is still under active development). Instead, you can enter bucket file paths here so that outputs are written to the workspace bucket, or write the results to workspace data if you wish. You can also leave the output fields blank; you'll still be able to access the bucket locations where the workflow dropped its outputs through the Job Manager.
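If you do choose to supply bucket file paths, an output field entry might look like the following (the workflow name, output name, bucket, and path are hypothetical):

```
MyWorkflow.output_vcf    gs://my-workspace-bucket/results/output.vcf
```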


1.5. Run your workflow

Click "Save" to confirm your input and output attributes. If all required fields are properly filled in, you'll be able to click "Run Analysis".


1.6. Look at the results in the bucket 

To do this, go to your Job History tab, find the workflow run you just completed, and click on it to see the job. Then click the "Execution Directory" icon to the right of the workflow run:




This will take you to the bucket directory in the Google Cloud console. Your results should be in stdout. When you click on stdout, you'll see a page of details with different options for accessing the file. You can use the gsutil URI to transfer the file to a desired location, or use the Authenticated URL to open the file in the browser.
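For example, once you have the gsutil URI for the stdout file, you could copy it to your local machine or to another bucket. The bucket name and path segments below are hypothetical placeholders; copy the actual URI from the file's details page:

```shell
# Copy the workflow's stdout file from the workspace bucket to the current directory
gsutil cp gs://<workspace-bucket>/<submission-id>/<workflow-name>/<workflow-id>/call-<task-name>/stdout .

# Or copy it to another bucket location
gsutil cp gs://<workspace-bucket>/<submission-id>/<workflow-name>/<workflow-id>/call-<task-name>/stdout gs://<another-bucket>/results/stdout.txt
```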


2. Using the WFL to write outputs to a workspace

Instructions for using the workflow launcher (WFL) are coming soon.

