Managing data and automating workflows with the FISS API

Allie Hajian

FireCloud Service Selector (FISS) is a utility module that lets you make API (Application Programming Interface) calls from a notebook to a workspace. Scripting with FISS is much like scripting on a local machine, but with Terra's built-in security and cloud integration. This article covers the basics of scripting on top of the Terra and Google Cloud environments using FISS to access Terra's APIs.

Why use scripting/FISS APIs?

Why would you want to use Terra through an API when you can use Terra’s interface instead? It comes down to flexibility, automation and scalability, and personal preference. You can use FISS commands from a Jupyter notebook or from the command line.

  1. Flexible data access
    Interacting directly with back-end Application Programming Interfaces (APIs) gives greater flexibility when manipulating data and setting up data tables. It lets you scale up your analysis by automating how you configure and run workflows. To do this, you can use FISS.
  2. Automation and scalability
    Maybe you’re hoping to streamline your analysis while avoiding human errors. Using the APIs that power Terra programmatically means you can automate much of the setup process, which lets you standardize and scale up your work.

  3. Personal preference
    There may be times when you want to upload data and run workflows without using Terra's graphical interface. If you are more comfortable with scripting than with clicking buttons, FISS lets you work that way.


Tasks that are easier - or even only possible - using FISS

  Accessing data from a workspace table in a notebook is only possible with FISS

Accessing controlled data in an interactive analysis (i.e. notebook) is only possible with FISS

Collecting data references from multiple tables into a single table, for example, is easier with FISS. Using the standard interface for this involves three manual steps: downloading the TSV file, editing it by hand, and uploading it back to the workspace.

What is FISS?


FireCloud Service Selector (FISS) details

- Python client API and command-line interface to the FireCloud/Terra API
- Callable from R

Two library levels
- Low-level API: corresponds closely to the FireCloud/Terra API (calling the low-level/API layer is easy from Python/R using standard function call syntax)
- Higher-level: provides additional support for chunking/batching for scalability, etc
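
As a rough illustration of the low-level layer, the sketch below lists the data tables in a workspace by calling `firecloud.api.list_entity_types`. The helper name `list_table_names` and its `api` argument are illustrative and not part of FISS; the `api` argument only exists so that a stand-in object can be passed when trying the logic without credentials. It assumes the response JSON is a dictionary keyed by table name.

```python
def list_table_names(namespace, workspace, api=None):
    """Return the sorted names of the data tables (entity types) in a workspace."""
    if api is None:
        # Imported lazily so the sketch can be read and tried without FISS installed.
        from firecloud import api
    response = api.list_entity_types(namespace, workspace)
    if response.status_code >= 400:
        raise RuntimeError(
            f"list_entity_types failed: {response.status_code} {response.text}"
        )
    # Assumption: the JSON body is a dict keyed by entity type, e.g. {"sample": {...}}.
    return sorted(response.json().keys())
```

Inside a Terra notebook, the billing project and workspace name are typically available as the `WORKSPACE_NAMESPACE` and `WORKSPACE_NAME` environment variables, so a call might look like `list_table_names(os.environ["WORKSPACE_NAMESPACE"], os.environ["WORKSPACE_NAME"])`.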

Pre-installed in Terra cloud environment
- Manually install on other systems using `pip install firecloud`

Technical details
- Refer to the FISS code and FireCloud/Terra API (Swagger) for more details
- Data table operations listed in the Swagger `Entities` section

 

High-level FISS functions from the command-line

Calling the high-level FISS layer requires creating a “parameter object” (e.g., a Python named tuple), so it is often easier to “shell out” (!) from the notebook to the FISS command line than to call the high-level functions directly.

Use command-line interface for high-level FISS functions
  - List available commands: `fissfc -l`
  - Display command information: `fissfc --help`
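
For scripts that run outside a notebook cell, the same shell-out can be done from plain Python. The sketch below is one possible wrapper, assuming `fissfc` is on the `PATH`; the helper names `fissfc_command` and `run_fissfc` are made up for this example.

```python
import shutil
import subprocess


def fissfc_command(*args):
    """Build the argument list for a fissfc call, e.g. fissfc_command("-l")."""
    return ["fissfc", *args]


def run_fissfc(*args):
    """Run fissfc with the given arguments and return its standard output."""
    if shutil.which("fissfc") is None:
        raise FileNotFoundError("fissfc not found on PATH; try `pip install firecloud`")
    result = subprocess.run(
        fissfc_command(*args), capture_output=True, text=True, check=True
    )
    return result.stdout
```

In a notebook cell the equivalent is simply `!fissfc -l`; the wrapper is mainly useful when you want to capture the output in a variable or fail loudly on errors.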

FISS config file available
  - Default values for commonly used parameters 
  - Ex: billing project, workspace, etc.

Useful subcommands for working with data tables

Scalable/chunked upload of TSV file: entity_import

Copy entities from one workspace to another: entity_copy 

Delete specific entity(s)/row(s) in a data table: entity_delete

List all the entities/rows by name and id in a workspace: entity_list

Return data table entities/rows in TSV format (limited scalability): entity_tsv

Return the names of the entity types/tables in a workspace: entity_types 
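
To make the idea of a chunked upload concrete, here is a minimal sketch of splitting a load file into smaller TSV pieces that each repeat the header row. It illustrates the general technique only and is not FISS's own implementation; the function name `chunk_tsv` and the default chunk size are assumptions chosen for the example.

```python
def chunk_tsv(tsv_text, chunk_size=500):
    """Yield TSV strings that each repeat the header and hold at most chunk_size rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Drop blank lines so a trailing newline does not create an empty row.
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    if not lines:
        return
    header, rows = lines[0], lines[1:]
    for start in range(0, len(rows), chunk_size):
        yield "\n".join([header, *rows[start:start + chunk_size]]) + "\n"
```

Each piece is a valid load file on its own, so a failed request only needs that one piece to be retried rather than the whole table.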

Use case: Managing data with the FISS API - beyond the UI

Managing a small collection of samples by manually creating a workspace table to track them and their metadata is fairly straightforward: you can do it with a spreadsheet and the "Upload TSV" feature of the Terra UI. However, large projects can produce data on hundreds, thousands, or even hundreds of thousands of samples. Uploading this data to a Google bucket, or referencing the files in your workspace data model, takes significant time, and creating a workspace, uploading data, and tracking potentially hundreds of fields of extra metadata by hand is not feasible.

Instead, when projects grow, the best approach is to script data management: both uploading data to your workspace bucket and creating a workspace data table to track the file locations along with all their extra metadata. This lets you work with workspaces that contain thousands of samples (or many more) while minimizing errors and saving time.
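
As one possible starting point for scripted data management, the sketch below builds a Terra load file (a TSV whose first column header is `entity:<table>_id`) from a list of Python dictionaries. The function name `build_load_tsv`, the `id_key` convention, and the bucket paths in the usage example are all hypothetical.

```python
import csv
import io


def build_load_tsv(entity_type, records, id_key="id"):
    """Return TSV text for a data table, one row per record.

    records is a list of dicts; the value under id_key becomes the row ID and
    every other key becomes a column. Missing values are left empty.
    """
    columns = sorted({key for record in records for key in record if key != id_key})
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([f"entity:{entity_type}_id", *columns])
    for record in records:
        writer.writerow([record[id_key], *(record.get(col, "") for col in columns)])
    return buffer.getvalue()


# Hypothetical records pointing at files already copied to a bucket.
example = build_load_tsv(
    "sample",
    [
        {"id": "s1", "bam": "gs://example-bucket/s1.bam", "tissue": "blood"},
        {"id": "s2", "bam": "gs://example-bucket/s2.bam"},
    ],
)
```

The resulting text can be saved to a file and loaded with the "Upload TSV" feature or with the `entity_import` subcommand.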


Be careful when managing controlled-access data with FISS

  It's important to remember that you are ultimately responsible for abiding by the data use agreements for data you are authorized to use - whether you are using FISS or the UI to manipulate that data. You should never copy controlled-access data to a workspace where it can be accessed by someone not authorized to access it.

Use case: Automating workflows with the FISS API

Manually setting up and running a workflow on five samples is trivial, but setting up that same workflow by hand for 100,000 whole genome samples is not realistic. Scripting the process makes the task tractable and saves you time and effort, especially when you want to repeat the same analysis over and over again. You can use this approach to build command-line scripts in Python, or Jupyter Notebooks, that help with your day-to-day work.

With the FISS software library you can automate the same steps (setting up a workspace, moving and managing data, and performing an analysis) through Python scripts that you run on your computer or in a Jupyter Notebook. The library lets you perform many of the same actions you can do in the graphical Terra interface, and scripting them is what makes automation and scaling possible.
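
A minimal sketch of what launching and monitoring a workflow might look like with the low-level layer follows. It relies on `firecloud.api.create_submission` and `firecloud.api.get_submission`; the wrapper `launch_and_wait`, its polling defaults, and the set of terminal statuses are assumptions made for this example, and the response fields (`submissionId`, `status`) are taken to match what the Terra API returns.

```python
import time

# Assumption: a submission stops changing once it reaches one of these states.
TERMINAL_STATUSES = {"Done", "Aborted"}


def launch_and_wait(namespace, workspace, config_namespace, config_name,
                    entity, entity_type, api=None, poll_seconds=60, max_polls=1440):
    """Submit a workflow configuration and poll until the submission finishes."""
    if api is None:
        # Imported lazily so the sketch can be tried with a stand-in object.
        from firecloud import api
    response = api.create_submission(
        namespace, workspace, config_namespace, config_name,
        entity=entity, etype=entity_type,
    )
    if response.status_code >= 400:
        raise RuntimeError(f"submission failed: {response.status_code} {response.text}")
    submission_id = response.json()["submissionId"]

    for _ in range(max_polls):
        status_response = api.get_submission(namespace, workspace, submission_id)
        if status_response.status_code >= 400:
            raise RuntimeError(f"status check failed: {status_response.status_code}")
        status = status_response.json()["status"]
        if status in TERMINAL_STATUSES:
            return submission_id, status
        time.sleep(poll_seconds)
    raise TimeoutError(f"submission {submission_id} did not finish in time")
```

A submission that reaches `Done` can still contain failed workflows, so a real script would also inspect the per-workflow statuses in the submission details before treating the run as successful.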

FISS API tutorial workspace

Click here to go to the FISS tutorial workspace.

Want to know when the tutorial will be updated? Click the "Follow" button at the top right of the article.

The tutorial workspace will include use cases with sample code for times when you might want to go beyond Terra's graphical interface. Notebooks within the workspace will highlight different examples of when you might find scripting particularly helpful.

Notebook #1: Create a workspace and add data in a Terra workspace with FISS

A Python script in a notebook will help you:
- Create the workspace
- Upload data to your workspace Google bucket
- Upload data pointers and metadata to the workspace
- Use the data model in a Python script or notebook
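
To give a feel for the last step, the sketch below reads one data table into a list of plain dictionaries using `firecloud.api.get_entities`. The helper name `table_rows` is made up for this example, and it assumes each returned entity has a `name` and an `attributes` dictionary.

```python
def table_rows(namespace, workspace, entity_type, api=None):
    """Return the rows of one data table as a list of flat dictionaries."""
    if api is None:
        # Imported lazily so the sketch can be tried with a stand-in object.
        from firecloud import api
    response = api.get_entities(namespace, workspace, entity_type)
    if response.status_code >= 400:
        raise RuntimeError(f"get_entities failed: {response.status_code} {response.text}")
    # Assumption: each entity looks like {"name": ..., "attributes": {...}}.
    return [
        {f"{entity_type}_id": entity["name"], **entity["attributes"]}
        for entity in response.json()
    ]
```

The returned list can be passed straight to `pandas.DataFrame(...)` if you prefer to work with the table as a data frame.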

Notebook #2: Automating and scaling workflows with the FISS API

A Python script in a notebook will help you:
- Import and configure a workflow
- Launch and monitor the workflow analysis
- Debug your analysis 

Resources: Using the FISS library

See example notebooks in the BioData Catalyst Collection workspace:
- Intro to FISS API in Python
- Intro to FISS API in R

Use of the `model = flexible` parameter is required for Gen3 data in Terra! Otherwise, you are constrained to the `sample` and `participant` data tables.

[Screenshot: the flexible model parameter used in a notebook]
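
In code, the flexible model is selected by passing `model="flexible"` to the upload call. The sketch below wraps `firecloud.api.upload_entities`, which accepts the TSV content as a string; the wrapper name `upload_flexible` and the `subject` table in the usage note are hypothetical.

```python
def upload_flexible(namespace, workspace, tsv_text, api=None):
    """Upload TSV text to a workspace using the flexible data model."""
    if api is None:
        # Imported lazily so the sketch can be tried with a stand-in object.
        from firecloud import api
    response = api.upload_entities(namespace, workspace, tsv_text, model="flexible")
    if response.status_code >= 400:
        raise RuntimeError(f"upload failed: {response.status_code} {response.text}")
    return response
```

For example, a load file whose header starts with `entity:subject_id` names a table other than `sample` or `participant`, which is the situation the flexible model is meant for.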

Comments

4 comments

  • Priyanka Srivastava (edited)

    Allie Hajian, when can we expect the FISS API tutorial to be out? I'm keen to know how authorization would work if we try to call the APIs from an external app. I couldn't find that mentioned anywhere.

  • Allie Hajian

    Hi Priyanka Srivastava! User Ed is working on the FISS API tutorials, though I am not sure they will answer your specific question. I have submitted a ticket to Frontline on your behalf and you should hear from them soon!

  • Samantha (she/her)

    Hi Priyanka Srivastava,

    To authorize your account when calling the FISS APIs from external apps, you will just need to run `gcloud auth login --update-adc`. Please let me know if you have any other questions.

    Best,

    Samantha

  • Priyanka Srivastava (edited)

    Thanks Allie and Samantha, 

    I have a couple of further questions.
    I want to be able to invoke the FireCloud APIs from an external app to import the TSV in order to populate the data tables, and also invoke some other APIs like getWorkspaces. How can I do that?
    I want to confirm my understanding from this doc:
    Is using FISS the only way to make a call to the FireCloud APIs, or could we invoke them directly through our Java app? If yes, how will the user be authorized?
    Does FISS only allow invoking the APIs through the notebooks in the Terra workspace?
