FireCloud Service Selector (FISS) is a utility command module that allows API (Application Programming Interface) calls from the notebook to the workspace. Scripting with FISS is much like running on a local machine, but with Terra's built-in security and cloud integration. This article covers the basics of scripting on top of the Terra and Google Cloud environments using FISS to access Terra's APIs.
Why use scripting/APIs?
Why would you want to use Terra through an API when you can work directly in Terra? It comes down to flexibility, automation and scalability, and personal preference. You can use FISS commands from a Jupyter Notebook or from the command line.
Flexible data access
Interacting directly with back-end Application Programming Interfaces (APIs) gives greater flexibility when manipulating data and setting up data tables. It lets you scale up your analysis by automating how you configure and run workflows.
Automation and scalability
Maybe you hope to streamline your analysis while avoiding human errors. Using the APIs that power Terra programmatically means you can automate much of the setup process, which lets you standardize and scale up your work.
Personal preference
Sometimes you may want to upload data and run workflows without using Terra's graphical interface. Maybe you are more comfortable with scripting than clicking buttons. If you just prefer scripting, you can do that as well.
Tasks that are easier - or even only possible - using FISS
- Accessing data from a workspace table in a notebook is only possible with FISS.
- Accessing controlled data in an interactive analysis (i.e., notebook) is only possible with FISS.
- Collecting data references from multiple tables into a single table is easier with FISS. Using the standard interface for this involves three manual steps: downloading the TSV file, editing it by hand, and uploading it back to the workspace.
What is FISS?
FISS stands for (FI)reCloud (S)ervice (S)elector. It is a Python-based API and command-line interface to the FireCloud/Terra APIs. It is also callable from R.
Two library levels
- Low-level API: corresponds closely to the FireCloud/Terra API (calling the low-level/API layer is easy from Python/R using standard function call syntax)
- Higher-level: provides additional support for chunking/batching for scalability, etc.
Preinstalled in Terra cloud environment
- Manually install on other systems using `pip install firecloud`
Technical details
- Refer to the FISS code and FireCloud/Terra API (Swagger) for more details
- Data table operations listed in the Swagger `Entities` section
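As a minimal sketch of what a low-level call can look like, the helper below lists the data table names in a workspace. It assumes that `firecloud.api.list_entity_types(namespace, workspace)` returns a `requests`-style response whose JSON body is keyed by entity type; check the Swagger `Entities` section to confirm this for your version. The helper name `list_table_names` is hypothetical and not part of FISS.

```python
def list_table_names(api, namespace, workspace):
    """Return the sorted names of the data tables (entity types) in a workspace.

    `api` is expected to be the low-level FISS module (firecloud.api). It is
    passed in explicitly so the helper can also be tried with a stand-in object.
    """
    response = api.list_entity_types(namespace, workspace)
    # Low-level calls hand back the raw HTTP response, so check it before parsing.
    response.raise_for_status()
    # Assumption: the JSON body maps each entity type name to its metadata.
    return sorted(response.json().keys())
```

In a Terra notebook this might be called with `from firecloud import api as fapi` followed by `list_table_names(fapi, "my-billing-project", "my-workspace")`, where both names are placeholders for your own billing project and workspace.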
High-level FISS functions from the command-line
Calling the high-level FISS layer requires creating a “parameter object” (e.g., a Python named tuple), so it is often easier to “shell out” (!) from the notebook to the FISS command line than to call the high-level functions directly.
Use command-line interface for high-level FISS functions
- List available commands:
fissfc -l
- Display command information:
fissfc --help
FISS config file available
- Default values for commonly used parameters
- Ex: billing project, workspace, etc.
Useful subcommands for working with data tables
Scalable/chunked upload of TSV file: entity_import
Copy entities from one workspace to another: entity_copy
Delete specific entity(s)/row(s) in a data table: entity_delete
List all the entities/rows by name and id in a workspace: entity_list
Return data table entities/rows in TSV format (limited scalability): entity_tsv
Return the names of the entity types/tables in a workspace: entity_types
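The subcommands above can also be driven from Python instead of the notebook's `!` syntax. The sketch below builds the argument list for a `fissfc` call and runs it with `subprocess`. The flags `-p` (billing project), `-w` (workspace), and `-t` (entity type) are assumptions about the command-line interface; confirm them with the help output before relying on them. The function names are hypothetical.

```python
import subprocess


def build_fissfc_command(subcommand, project, workspace, entity_type=None):
    """Build the argument list for a fissfc data-table subcommand."""
    # Assumption: -p and -w select the billing project and workspace.
    command = ["fissfc", subcommand, "-p", project, "-w", workspace]
    if entity_type is not None:
        # Assumption: -t names the table for subcommands such as entity_tsv.
        command += ["-t", entity_type]
    return command


def run_fissfc(command):
    """Run a fissfc command and return its standard output as text."""
    # check=True raises CalledProcessError if fissfc exits with an error.
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout
```

For example, `run_fissfc(build_fissfc_command("entity_types", "my-billing-project", "my-workspace"))` would list the tables in a workspace, with both names standing in for your own. Passing the arguments as a list avoids shell quoting problems when names contain spaces.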
Use case: Managing data with the FISS API - beyond the UI
Managing a small collection of samples by manually creating a workspace table to track them and their metadata is fairly straightforward. It can be accomplished using a spreadsheet and the "Upload TSV" feature of the Terra UI. However, large projects can produce data on hundreds, thousands, even hundreds of thousands of samples. The time it takes to upload this data to a Google bucket or reference the files in your workspace data model is significant, and creating a workspace, uploading data, and tracking potentially hundreds of fields of extra metadata by hand is infeasible.
Instead, when projects grow, the best approach is to script data management - both uploading data to your workspace bucket and creating a workspace data table to track the file locations along with all their extra metadata. This allows you to deal with workspaces that contain thousands of samples (or many more) while minimizing errors and saving time.
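One way to script the data table step is to generate the TSV load file from records you already hold in Python. The sketch below is illustrative only: the sample IDs, bucket paths, and column names are made up, and `build_entity_tsv` is a hypothetical helper. It relies on the Terra convention that the first column header is `entity:<table>_id`.

```python
import csv
import io


def build_entity_tsv(entity_type, rows, columns):
    """Render rows as a Terra load file.

    The first column header is entity:<entity_type>_id, followed by one
    column per attribute named in `columns`.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([f"entity:{entity_type}_id"] + list(columns))
    for row in rows:
        # Missing attributes become empty cells instead of raising KeyError.
        writer.writerow([row["id"]] + [row.get(column, "") for column in columns])
    return buffer.getvalue()


# Made-up records: each one tracks a file location plus extra metadata.
samples = [
    {"id": "sample_001", "cram": "gs://example-bucket/sample_001.cram", "tissue": "blood"},
    {"id": "sample_002", "cram": "gs://example-bucket/sample_002.cram"},
]
tsv_text = build_entity_tsv("sample", samples, ["cram", "tissue"])
```

The resulting text can be written to a file and loaded with the "Upload TSV" feature or the `entity_import` subcommand. Generating the file from code keeps the header and column order consistent as the number of samples grows.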
Be careful when managing controlled-access data with FISS
It's important to remember that you are ultimately responsible for abiding by the data use agreements for data you are authorized to use - whether you are using FISS or Terra to manipulate that data. Never copy controlled-access data to a workspace where it can be accessed by someone not authorized to access it.