Data entity type for MuTect

Post author
Sehyun Oh

Hi! I'm quite confused about which data entity I should use for my workflow.

I'm trying to run MuTect_v1 in both tumor/normal paired mode for SNV detection and artifact-detection mode for PoN construction. I initially thought of using 'pair', but I couldn't access the bam and bai files, only this.case_sample and this.control_sample. Could someone help me with this? Thanks!

Comments

11 comments

  • Comment author
    Sushma Chaluvadi

    Hi Sehyun,

    Is there a specific reason that you are using this particular version of Mutect? It is an older version of the Mutect pipeline, and Mutect2 is now the most current!

    This Featured Workspace demonstrates somatic SNV calling with Mutect2. In this workspace, you will see the specific 2-Mutect2-GATK4 Tool that uses a pair entity. 

    As for the syntax, this.case_sample and this.control_sample point to the case and control samples, respectively, in the sample entity table. The Tool's input configuration then specifies which columns of the sample table (the .bam or .bai files) to use.

    I would suggest using the most updated version of Tools as they follow our Best Practices!

    0
  • Comment author
    Sehyun Oh

    Hi Sushma,

    I'm using a specific version of MuTect (v.1.7.1), because my tool for the downstream analysis is optimized for this version of MuTect.

    I tried MuTect2 from the 'Featured Workspace' to understand the input entity, but I'm still lost. If I use a pair entity, the attributes I can select do not include .bam or .bai. When I tried to run with just this.case_sample as an input attribute, I got the error below:

    Mutect2.tumor_reads_index - Attribute expression returned a reference to an entity.
    Mutect2.tumor_reads - Attribute expression returned a reference to an entity.

    FYI, because of this bug (https://support.terra.bio/hc/en-us/community/posts/360042965272-Workspace-Reference-Data-identifier-part-expected-), I used the gs://path/to/file directly, instead of workspace attributes. I don't think the above error is related to this anyway.

    0
  • Comment author
    Sushma Chaluvadi

    Sehyun,

     

    The pair table contains a sample_id in the case_sample column that points to the case sample in the sample table, and a sample_id in the control_sample column that points to the control sample. In the Featured Workspace, you will see that in the pair table the case_sample is SM-74P4M and the control_sample is SM-74NEG. When you select "pair" in the Tool, this.case_sample and this.control_sample look in the pair table, in the columns case_sample and control_sample, and take the sample IDs. The Tool then goes to the sample table, looks at the rows SM-74P4M and SM-74NEG, and reads the .bam and .bai columns. By choosing "pair", you are not telling the Tool to look directly at the sample table where you have listed your .bam or .bai files.
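
To make the two-step lookup above concrete, here is a minimal Python sketch. It only illustrates the idea and is not how Terra implements it: the `resolve` helper, the dictionary layout, the pair ID, the `bam`/`bai` column names, and the gs:// paths are all hypothetical. Only the two sample IDs come from the Featured Workspace example.

```python
# Toy model of the pair -> sample lookup described above.
# NOTE: the table layout, pair ID, column names ("bam", "bai"), and gs:// paths
# are hypothetical; this is an illustration, not Terra's implementation.

SAMPLE_TABLE = {
    "SM-74P4M": {"bam": "gs://example-bucket/SM-74P4M.bam",
                 "bai": "gs://example-bucket/SM-74P4M.bai"},
    "SM-74NEG": {"bam": "gs://example-bucket/SM-74NEG.bam",
                 "bai": "gs://example-bucket/SM-74NEG.bai"},
}

PAIR_TABLE = {
    "example-pair": {"case_sample": "SM-74P4M", "control_sample": "SM-74NEG"},
}


def resolve(expression, pair_id, pairs=PAIR_TABLE, samples=SAMPLE_TABLE):
    """Resolve 'this.<pair column>[.<sample column>]' for one pair row."""
    parts = expression.split(".")
    if len(parts) not in (2, 3) or parts[0] != "this":
        raise ValueError(f"unsupported expression: {expression!r}")

    # Step 1: the pair row yields a sample ID.
    sample_id = pairs[pair_id][parts[1]]
    if len(parts) == 2:
        # The expression stops at the sample itself, not at a file.
        return ("entity", sample_id)

    # Step 2: the sample row yields the value in the named column.
    return ("value", samples[sample_id][parts[2]])


if __name__ == "__main__":
    print(resolve("this.case_sample", "example-pair"))
    print(resolve("this.case_sample.bam", "example-pair"))
```

In this sketch, an expression that stops at `this.case_sample` yields a sample entity rather than a file path, which is the situation the "Attribute expression returned a reference to an entity" message describes. Adding a column name after the sample reference yields the file path.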

     

    I think it may be easier to help troubleshoot if you can share your workspace with the GROUP_FireCloud-Support@firecloud.org user so that I can take a look at the setup of your Data Model and Tool. I suspect that the data was uploaded to the Data Model before a new feature was introduced to Terra that explicitly links the participant, sample, and pair tables together so that the Tool knows how to read across the tables in sequence. Can you also try uploading your data tables again?
    0
  • Comment author
    Sehyun Oh

    I'm not sure whether I can share the workspace, since it includes a controlled dataset. ;(

    Actually, after re-loading the data tables I no longer got the above error, but the run failed with this error:

    Failed to evaluate 'Mutect2.tumor_reads_size' (reason 1 of 1): Evaluating ceil(size(tumor_reads, "GB") + size(tumor_reads_index, "GB")) failed: java.lang.IllegalArgumentException: Could not build the path "OV-04-1331-TP". It may refer to a filesystem not supported by this instance of Cromwell. Supported filesystems are: Google Cloud Storage, DRS. Failures: Google Cloud Storage: Path "OV-04-1331-TP" does not have a gcs scheme (IllegalArgumentException) DRS: OV-04-1331-TP does not have a dos scheme. (IllegalArgumentException) Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems

    I'm using the `mutect2_nio` tool (https://dockstore.org/workflows/github.com/gatk-workflows/gatk4-somatic-snvs-indels/mutect2_nio:2.4.0?tab=info). Here are the inputs.json and data tables.

    gs://terra-cnvworkflow/inputs.json
    gs://terra-cnvworkflow/pair.tsv
    gs://terra-cnvworkflow/participant.tsv
    gs://terra-cnvworkflow/sample.tsv
    0
  • Comment author
    Sushma Chaluvadi

    Hi Sehyun,

    In your attributes for the mutect2 Tool, where it says tumor_reads, can you enter this.case_sample.WGS_bam_path? This format tells the tool to look for the bam file associated with the case_sample listed. Similarly, the tumor_reads_index attribute should say this.case_sample.WGS_bai_path. I used the .tsv files that you shared to get the column headers WGS_bam_path and WGS_bai_path, but essentially you would use whichever column header your bam files are listed under!
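
As a rough illustration of the expression format above, the following Python sketch builds the attribute strings from whatever column headers the sample table uses. The helper name `pair_input_expressions` is hypothetical, and only `Mutect2.tumor_reads` and `Mutect2.tumor_reads_index` appear in this thread; the `normal_reads` input names are an assumption and should be checked against the workflow you are actually running.

```python
# Build attribute expressions for a run that uses "pair" as the root entity.
# NOTE: this helper is hypothetical. The normal_* input names are assumed for
# illustration; only the tumor_* names are quoted in this thread.

def pair_input_expressions(bam_column, bai_column, workflow="Mutect2"):
    """Map workflow inputs to 'this.<pair column>.<sample column>' expressions."""
    return {
        f"{workflow}.tumor_reads": f"this.case_sample.{bam_column}",
        f"{workflow}.tumor_reads_index": f"this.case_sample.{bai_column}",
        f"{workflow}.normal_reads": f"this.control_sample.{bam_column}",
        f"{workflow}.normal_reads_index": f"this.control_sample.{bai_column}",
    }


if __name__ == "__main__":
    # Column headers must match the sample table exactly.
    expressions = pair_input_expressions("WGS_bam_path", "WGS_bai_path")
    for input_name, expression in expressions.items():
        print(f"{input_name} = {expression}")
```

The column name is the only part that changes from workspace to workspace, which is why it is a parameter here: a header that does not exist in the sample table gives the expression nothing to resolve to.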

    Please let me know if this works!

    0
  • Comment author
    Sehyun Oh

    Hi Sushma,

    I tried this.case_sample.WXS_bam_path and this.case_sample.WXS_bai_path for the tumor_reads and tumor_reads_index attributes, respectively. It failed right away with the error below.

    M1_only.M1.normal_bai - Expected single value for workflow input, but evaluated result set was empty
    M1_only.M1.normal_bam - Expected single value for workflow input, but evaluated result set was empty

    Any other suggestions?

    - Sehyun
    0
  • Comment author
    Sushma Chaluvadi

    Would you be able to share screenshots of the Data Tables and the Tool page? You don't have to share the workspace, since it has controlled data. I would like to know what your Tool setup looks like. Are you choosing Pair as the "Process multiple workflows from" choice?

    Additionally, the name M1_only in your error message does not appear in the Tool that you linked above (`mutect2_nio`), which does not contain any task named M1_only. Can you confirm which tool you are using?

    0
  • Comment author
    Sehyun Oh

    Sorry for posting the wrong output. This is what I got from the mutect2_nio tool.

    Mutect2.tumor_reads_index - Expected single value for workflow input, but evaluated result set was empty
    Mutect2.tumor_reads - Expected single value for workflow input, but evaluated result set was empty

    And yes, I chose Pair as the "Process multiple workflows from" choice. Here are the screenshots of 1) selecting pair (I picked only one sample for the test), 2) the input data, and 3) the failure message.

    0
  • Comment author
    Sushma Chaluvadi

    Hi Sehyun,

    The screenshots all show that the setup looks okay. I replicated this analysis in my workspace (with the same Tool etc.) and I was able to run it successfully. This indicates to me that the Pair table is not linked with the Sample table.

    When you uploaded your Data Tables, did you see in the pop-up window a checkbox that says "Create participant, sample, pair, associations.."?

    When I uploaded my pair table, for example, after my participant and sample tables, the pop-up showed the checkbox. Did you see and check that box before hitting UPLOAD? Note: I checked the box each time: when I uploaded participant.tsv, again when I uploaded sample.tsv, and a third time when I uploaded pair.tsv.

    This step is required because it tells Terra to connect these three tables together. I know you mentioned above that you re-uploaded your tables but just wanted to double check that you checked the box.
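
Before re-uploading, it may also help to confirm locally that every sample ID referenced in pair.tsv actually exists in sample.tsv. The Python sketch below is one way to do that, under stated assumptions: the first column of each TSV holds the entity ID (for example a header such as `entity:sample_id`), and pair.tsv has `case_sample` and `control_sample` columns as described in this thread. The function name and the example IDs are hypothetical. This check does not replace the checkbox step above; it only rules out mismatched IDs as a second cause of an empty result.

```python
import csv
import io

# Local sanity check for pair.tsv against sample.tsv.
# ASSUMPTIONS: the first column of each file is the entity ID, and pair.tsv has
# "case_sample" and "control_sample" columns. Example IDs below are made up.


def unmatched_pair_samples(pair_tsv, sample_tsv):
    """Return (pair_id, column, sample_id) for references missing from sample_tsv."""
    sample_rows = list(csv.reader(io.StringIO(sample_tsv), delimiter="\t"))
    # Skip the header row; keep the ID from the first column of each data row.
    sample_ids = {row[0] for row in sample_rows[1:] if row}

    pair_reader = csv.DictReader(io.StringIO(pair_tsv), delimiter="\t")
    pair_id_column = pair_reader.fieldnames[0]

    problems = []
    for row in pair_reader:
        for column in ("case_sample", "control_sample"):
            if row[column] not in sample_ids:
                problems.append((row[pair_id_column], column, row[column]))
    return problems


if __name__ == "__main__":
    example_samples = (
        "entity:sample_id\tparticipant\tWXS_bam_path\n"
        "tumor-1\tp1\tgs://example-bucket/tumor-1.bam\n"
    )
    example_pairs = (
        "entity:pair_id\tcase_sample\tcontrol_sample\tparticipant\n"
        "pair-1\ttumor-1\tnormal-1\tp1\n"
    )
    for problem in unmatched_pair_samples(example_pairs, example_samples):
        print("missing from sample table:", problem)
```

To run it on real files, read each file's text with `open(path, newline="").read()` and pass the two strings in. An empty list means every case and control sample ID was found in the sample table.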

    Thanks!

    1
  • Comment author
    Sehyun Oh
    • Edited

    Hi Sushma,

    This is THE answer! ;) I don't recall exactly how I uploaded the data tables before, but apparently they weren't linked properly. I re-uploaded them, making sure to complete the linking step each time, and now the tool properly recognizes this.case_sample.WXS_bam_path and this.case_sample.WXS_bai_path. Thanks a lot for your help!

    - Sehyun

     

    0
  • Comment author
    Sushma Chaluvadi

    Awesome! Glad we were able to get to the solution :)

    0
