
Task run request has exceeded the maximum PAPI request size

Comments

5 comments

  • Sushma Chaluvadi

    Hello Walt,

    Does this task perhaps require a localization step in which it has to read in a long list of files?

  • Walt Shands

    Yes, the error shows up between two tasks. I received a suggestion to localize the files with gsutil cp, but gsutil cp needs a Google bucket path like 'gs://', which I wouldn't have, and it also wouldn't work when running the workflow locally with Cromwell. So maybe the best thing to do is to tar-gzip the output of the previous task, use the tarball as the input to the following task, and un-tar it there. Would that get around the PAPI error? The input would then be a single file instead of thousands, and it would be smaller, but I don't really understand whether the PAPI error is about the number of files or their total size; large CRAM files seem to be localized with no problem.
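
    Something like this is what I have in mind, as a minimal WDL sketch (the task names are placeholders and the real tool commands are elided):

    ```wdl
    version 1.0

    task produce_shards {
      command <<<
        mkdir shards
        # ...the real tool writes its many output files into shards/ here...
        # bundle everything so the task delocalizes one file instead of thousands
        tar -czf shards.tar.gz -C shards .
      >>>
      output {
        File shard_bundle = "shards.tar.gz"
      }
    }

    task consume_shards {
      input {
        File shard_bundle
      }
      command <<<
        mkdir shards
        tar -xzf ~{shard_bundle} -C shards
        # ...run the downstream tool over shards/* here...
      >>>
    }
    ```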

  • Sushma Chaluvadi

    Hi Walt,

    I think the error is saying that the actual command is too long, since it looks like the input is thousands of files/filenames. This assumes that all of these inputs are read in at a single time, perhaps as an array, rather than one at a time.

    You mentioned that large CRAM files are localized with no problem, but are there as many CRAM files (on the order of thousands) as there are inputs to the failing task? My guess is that if you modified the workflow so that only a handful of inputs were passed instead of the thousands, it would work; or, as you suggested, you could create a single tar.gz.
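
    To make that concrete, the failing shape is presumably something like the sketch below (hypothetical task and variable names): each element of an Array[File] input contributes its own localization entry to the PAPI request, so thousands of elements make the request itself very large.

    ```wdl
    version 1.0

    task downstream {
      input {
        # on the order of 1000s of elements: each one adds a localization
        # entry to the PAPI request, which is what exceeds the size limit
        Array[File] shards
      }
      command <<<
        my_tool --inputs ~{sep=" " shards}
      >>>
    }
    ```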

    Sushma

  • Walt Shands

    Thanks. There is an Array[File] input that contains around 1800 files. I can create a tar.gz file as the input instead, so there is only one file input. I hope that might solve the problem.

  • Sushma Chaluvadi

    Hi Walt,

    Another option is to use a file of file names (FoFn). All this means is writing the paths of all the thousands of inputs to a text file, which is output from the first task and passed as input to the second task; the second task then reads the file paths from the FoFn. I'm not sure which would be easier in your case, but it's another option!
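
    A minimal sketch of the FoFn idea (again with hypothetical names; note this assumes the second task can actually reach the listed paths at runtime, e.g. via a shared filesystem when running Cromwell locally, or by staging the files itself inside the command on the cloud):

    ```wdl
    version 1.0

    task list_shards {
      command <<<
        mkdir shards
        # ...the real tool writes its many output files into shards/ here...
        # record the paths instead of declaring thousands of File outputs
        find "$PWD"/shards -type f > shard_paths.txt
      >>>
      output {
        File shard_fofn = "shard_paths.txt"
      }
    }

    task use_shards {
      input {
        File shard_fofn
      }
      command <<<
        # read each path from the FoFn rather than taking an Array[File] input
        while read -r shard; do
          echo "processing ${shard}"  # ...real per-file work goes here...
        done < ~{shard_fofn}
      >>>
    }
    ```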
