Broken pipe error

Post author
Cora Ricker

Hello,

I am trying to run the two lines of code below in a Python script and am getting a broken pipe error on Terra:

command =  'export NHOME=/netMHCpan-4.1; export NETMHCpan=/netMHCpan-4.1/Linux_x86_64; /netMHCpan-4.1/Linux_x86_64/bin/netMHCpan -a '+hlastring+' -f '+pepfile+' -inptype 0 -l ' + str(length) + '-s -xls -xlsfile '+outpath+'/NETMHCpan_out_' + str(length) + varianttype + '.xls -allname /netMHCpan-4.1/Linux_x86_64/data/allelenames -hlapseudo /netMHCpan-4.1/Linux_x86_64/data/MHC_pseudo.dat -t 500 -version /netMHCpan-4.1/Linux_x86_64/data/version -tdir /netMHCpan-4.1/scratch/XXXXXX -rdir /netMHCpan-4.1/Linux_x86_64/ -thrfmt /netMHCpan-4.1/Linux_x86_64/data/threshold/%s.thr.%s > '+outpath+'/netMHCpanoutlen_' + str(length) + varianttype +'.txt'
subprocess.call(command, shell=True)

 

The script runs successfully locally. I have tried increasing memory and disk space, but that has not helped. I have also tried adding /cromwell_root/ to all paths that point to a file or folder, but have not had success there either.
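One way I could surface the underlying error (a minimal debugging sketch on my end, not part of the original script; it reuses the command string above and assumes Python 3.7+) is to capture the call's exit status and stderr so they show up in the task log:

import subprocess

# Debugging sketch: run the same shell command, but capture the exit status
# and stderr instead of discarding them, so the real failure is visible.
result = subprocess.run(command, shell=True, capture_output=True, text=True)
print('netMHCpan exit code:', result.returncode)
if result.stderr:
    print('netMHCpan stderr:', result.stderr)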

Do you have any idea where this error is coming from?  I'm an affiliate member of the Broad and was also wondering if you might have office hours that I could join?

 

 

Comments

7 comments

  • Comment author
    Jason Cerrato

    Hey Cora,

    Thanks for writing in. We don't currently offer office hours, but we are happy to dig into this as soon as we can.

    Can you share the workspace where you are seeing this issue with GROUP_FireCloud-Support@firecloud.org? The Share option is in the three-dots menu at the top right of your workspace.

    1. Add GROUP_FireCloud-Support@firecloud.org to the User email field and press enter on your keyboard.
    2. Click Save.

    Please provide us with:

    1. A link to your workspace
    2. The relevant submission ID
    3. The relevant workflow ID

    Many thanks,

    Jason

  • Comment author
    Cora Ricker

    Thank you, Jason!

     

    Workspace: https://portal.firecloud.org/#workspaces/vanallen-firecloud-nih/VanAllen-Schoenfeld_16-284_SCC_copy

    Submission ID: 23d8e22d-50d6-4fac-aebb-bec4f7530739

    Workflow ID: e63e7ada-473b-4481-afa9-30b9c9daf457

     

    Please let me know if there's anything else you need.

     

     

  • Comment author
    Jason Cerrato

    Hi Cora,

    Thanks for sharing that workspace. I was able to take a look at your job and noticed something interesting.

    Your runNetMHCpan task is the one that's showing the error messages you reported, but the task itself is showing as successfully completed.

    Is this task perhaps not catching that the script isn't working as expected? Is there a way I can see the script itself?

    Kind regards,

    Jason

  • Comment author
    Cora Ricker

    Hi Jason, 

    The script will create a dummy file that's empty when something isn't working, so the task will still succeed with the dummy file as its output. The next task, mutPostProcessing, fails because it tries to use the empty file from runNetMHCpan.

    This is the whole script:

    # ----------------------------------------------------------------------------------------------- #
    # Import necessary packages

    #!/usr/bin/python
    import sys
    import numpy as np
    import subprocess
    import os

    # ----------------------------------------------------------------------------------------------- #


    # ----------------------------------------------------------------------------------------------- #
    # Function: runNetMHCIpan
    # Inputs: FASTA file of peptide sequences, patient HLA alleles (these are automatically given
    # by Polysolver and come in a .txt file that needs to be pre-processed into the correct format for
    # netMHCpan), peptide length, outpath
    # Returns: None (netMHCpan will automatically write output to a .xls file)
    # Summary: Pre-processes patient HLA alleles, runs netMHCIpan.
    def runNetMHCIpan(pepfile, hlafile, length, outpath):
        # Determine whether we're dealing with a snv or indel file (for naming the outfile)
        varianttype = ''
        if pepfile.split('_')[1].split('.')[0] == 'snv':
            varianttype = 'SNV'
        if pepfile.split('_')[1].split('.')[0] == 'indel':
            varianttype = 'InDel'
        # Read in HLA alleles file and process
        with open(hlafile) as f:
            hlalines = f.read().splitlines()
        hlaalleles = []
        # Determine which input format the hla allele file is in
        if len(hlalines[0].split('\t')) <= 1: # In already pre-processed format
            hlaalleles = hlalines
        else: # Polysolver output file
            for line in hlalines:
                split = line.split('\t')
                # Reformat each allele (2 for each type of HLA A, B, and C)
                for i in range(1, 3):
                    currallele = 'HLA-'
                    allele = split[i]
                    components = allele.split('_')
                    currallele += components[1].upper() + components[2] + ':' + components[3]
                    hlaalleles.append(currallele)
        hlaalleles = list(set(hlaalleles)) # Remove duplicate alleles if there are any
        hlastring = ','.join(hlaalleles)
        # Run netMHCI pan
        command = 'export NHOME=/netMHCpan-4.1; export NETMHCpan=/netMHCpan-4.1/Linux_x86_64; /netMHCpan-4.1/Linux_x86_64/bin/netMHCpan -a '+hlastring+' -f '+pepfile+' -inptype 0 -l ' + str(length) + '-s -xls -xlsfile '+outpath+'/NETMHCpan_out_' + str(length) + varianttype + '.xls -allname /netMHCpan-4.1/Linux_x86_64/data/allelenames -hlapseudo /netMHCpan-4.1/Linux_x86_64/data/MHC_pseudo.dat -t 500 -version /netMHCpan-4.1/Linux_x86_64/data/version -tdir /netMHCpan-4.1/scratch/XXXXXX -rdir /netMHCpan-4.1/Linux_x86_64/ -thrfmt /netMHCpan-4.1/Linux_x86_64/data/threshold/%s.thr.%s > '+outpath+'/netMHCpanoutlen_' + str(length) + varianttype +'.txt'
        subprocess.call(command, shell=True)

        # Catch case where peptide file was empty (create dummy file)
        dummyfile = outpath+'/NETMHCpan_out_'+str(length)+varianttype+'.xls'
        open(dummyfile, 'a').close()

        return

    # ----------------------------------------------------------------------------------------------- #


    # ----------------------------------------------------------------------------------------------- #
    # Function: runNetMHCIIpan
    # Inputs: FASTA file of peptide sequences, patient HLA alleles (these are automatically given
    # by Polysolver and come in a .txt file that needs to be pre-processed into the correct format for
    # netMHCIIpan), peptide length, outpath
    # Returns: None (netMHCIIpan will automatically write output to a .xls file)
    # Summary: Pre-processes patient HLA alleles, runs netMHCIIpan
    def runNetMHCIIpan(pepfile, hlafile, length, outpath):
        # Determine whether we're dealing with a snv or indel file (for naming the outfile)
        varianttype = ''
        if pepfile.split('_')[1].split('.')[0] == 'snv':
            varianttype = 'SNV'
        if pepfile.split('_')[1].split('.')[0] == 'indel':
            varianttype = 'InDel'
        # Read in HLA alleles file and process
        with open(hlafile) as f:
            hlalines = f.read().splitlines()
        hlaalleles = []
        # Determine which input format the hla allele file is in
        if len(hlalines[0].split('\t')) <= 1: # In already pre-processed format
            hlaalleles = hlalines
        else: # PHLAT output file
            # DQA1
            DQA1a = hlalines[4].split('\t')[1].split('*')[1][0:5]
            DQA1a = DQA1a.split(':')[0]+DQA1a.split(':')[1]
            DQA1b = hlalines[4].split('\t')[2].split('*')[1][0:5]
            DQA1b = DQA1b.split(':')[0]+DQA1b.split(':')[1]
            # DQB1
            DQB1a = hlalines[5].split('\t')[1].split('*')[1][0:5]
            DQB1a = DQB1a.split(':')[0]+DQB1a.split(':')[1]
            DQB1b = hlalines[5].split('\t')[2].split('*')[1][0:5]
            DQB1b = DQB1b.split(':')[0]+DQB1b.split(':')[1]
            # Concatenate four DQ isoforms to be in correct format
            DQA1B1a = 'HLA-DQA1'+DQA1a+'-DQB1'+DQB1a
            DQA1aB1b = 'HLA-DQA1'+DQA1a+'-DQB1'+DQB1b
            DQA1bB1a = 'HLA-DQA1'+DQA1b+'-DQB1'+DQB1a
            DQA1B1b = 'HLA-DQA1'+DQA1b+'-DQB1'+DQB1b
            # DRB1
            DRB1a = hlalines[6].split('\t')[1].split('*')[1][0:5]
            DRB1a = DRB1a.split(':')[0]+DRB1a.split(':')[1]
            DRB1b = hlalines[6].split('\t')[2].split('*')[1][0:5]
            DRB1b = DRB1b.split(':')[0]+DRB1b.split(':')[1]
            # Format DRB1 alleles
            DRB1a = 'DRB1_'+DRB1a
            DRB1b = 'DRB1_'+DRB1b
            # Add alleles to list
            hlaalleles.append(DQA1B1a)
            hlaalleles.append(DQA1aB1b)
            hlaalleles.append(DQA1bB1a)
            hlaalleles.append(DQA1B1b)
            hlaalleles.append(DRB1a)
            hlaalleles.append(DRB1b)
        hlaalleles = list(set(hlaalleles)) # Remove duplicate alleles if there are any
        hlastring = ','.join(hlaalleles)

        # Run netMHCIIpan if file is not empty
        if os.path.getsize(pepfile) > 1:
            command = 'export NHOME=/netMHCIIpan-4.0; export NETMHCpan=/netMHCIIpan-4.0/Linux_x86_64; /netMHCIIpan-4.0/netMHCIIpan -a '+hlastring+' -f '+pepfile+' -inptype 0 -length '+str(length)+' -fast -filter 1 -affF 500 -rankF 2.0 -s -xls -xlsfile '+outpath+'/NETMHCIIpan_out_'+str(length)+varianttype+'.xls rdir /netMHCIIpan-4.0/Linux_x86_64/ > '+outpath+'/netMHCIIpanoutlen_'+str(length)+varianttype+'.txt'
            subprocess.call(command, shell=True)
        # Catch case where peptide file was empty (create dummy file)
        dummyfile = outpath+'/NETMHCIIpan_out_'+str(length)+varianttype+'.xls'
        open(dummyfile, 'a').close()

        return

    # ----------------------------------------------------------------------------------------------- #


    # ----------------------------------------------------------------------------------------------- #
    # Main function
    def main():
        # Check to make sure we have the right number of inputs
        if len(sys.argv) != 6:
            print ('Error: incorrect number of inputs.')
            print ('Please input FASTA file(s), a HLAalleles.txt file, the peptide length(s), a netMHCpan version, and an outpath.')
            sys.exit()
        # Parse inputs
        fastas = sys.argv[1]
        alleles = sys.argv[2]
        peplengths = sys.argv[3]
        versionchoice = sys.argv[4]
        outpath = sys.argv[5]
        # Split FASTA files and peptide lengths
        fastalist = fastas.split(',')
        lengthslist = peplengths.split(',')
        if len(fastalist) != len(lengthslist):
            print ('Error: Please make sure your peptide lengths correspond to the fasta files and are in the same order.')
            sys.exit()
        # Run whichever netMHC version is desired
        if versionchoice == '1':
            for i in range(0, len(fastalist)):
                runNetMHCIpan(fastalist[i], alleles, lengthslist[i], outpath)
        else:
            for i in range(0, len(fastalist)):
                runNetMHCIIpan(fastalist[i], alleles, lengthslist[i], outpath)
        return

    if __name__ == '__main__':
        main()

    # ----------------------------------------------------------------------------------------------- #
  • Comment author
    Jason Cerrato

    Hi Cora,

    Thanks for sharing that. I'll take a closer look at the script to see if I can identify what's going on.

    As far as the WDL is concerned, if you want to make sure it fails when a blank file is produced, perhaps you could check the contents of the script's output file and fail the workflow when that file is empty?
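    For example, a minimal sketch of such a check (hypothetical, not from your script; it could be called right after the dummy-file step in runNetMHCIpan, where outpath, length, and varianttype are in scope) might look like:

    import os
    import sys

    def fail_if_empty(path):
        # Hypothetical helper: exit non-zero if the given output file is empty,
        # so Cromwell marks the task as failed instead of succeeded.
        if os.path.getsize(path) == 0:
            sys.exit('Empty output file: ' + path)

    # Example call inside runNetMHCIpan:
    # fail_if_empty(outpath + '/NETMHCpan_out_' + str(length) + varianttype + '.xls')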

    Kind regards,

    Jason

  • Comment author
    Cora Ricker

    Hi Jason,

    Thank you for the recommendation - that helped me figure out the problem! I still get the broken pipe error, but I'm able to run the tasks successfully. 

    My original assumption was wrong; it wasn't failing because of a blank file. The mutPostProcessing task was failing because it looks for files labeled 'InDel' or 'SNV', but runNetMHCpan was writing its output files without those labels. This was because my script was not extracting the variant type from the pepfile name (e.g. len9pep_FASTA_indel.txt). I changed my code to the following and it worked:

    varianttype = ''
    if pepfile.split('_FASTA_')[1].split('.')[0] == 'snv':
        varianttype = 'SNV'
    if pepfile.split('_FASTA_')[1].split('.')[0] == 'indel':
        varianttype = 'InDel'
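
    As a quick sanity check, here is what the two parsing approaches return for the example filename above (just an illustration):

    pepfile = 'len9pep_FASTA_indel.txt'
    # Old parsing: picks out 'FASTA', which never matches 'snv' or 'indel',
    # so varianttype stayed empty and the output files were unlabeled.
    print(pepfile.split('_')[1].split('.')[0])         # FASTA
    # New parsing: picks out 'indel', so varianttype is set to 'InDel'.
    print(pepfile.split('_FASTA_')[1].split('.')[0])   # indel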

    Thank you again!

    Best,

    Cora

     

     

  • Comment author
    Jason Cerrato

    Hi Cora,

    Ah, I see! I'm glad to hear the recommendation was helpful and that you were able to find the true cause of the issue.

    If we can help with anything else, please let us know!

    Kind regards,

    Jason

