Failed run at container starting step

Post author
dplichta

Hi,

I got the following error, which I haven't seen before, for a few shards of my pipeline in the rjxmicrobiome workspace (method Micropilot_v2_mOTUS_v2).

Any ideas?

message: Task workflowMotus.motus:10:1 failed. The job was stopped before the command finished. PAPI error code 2. Execution failed: action 13: starting container: running container: running ["docker" "run" "-d" "--env-file" "/tmp/environment419495526" "--network" "pipelines" "-v" "/mnt/disks/local-disk:/cromwell_root:rslave" "-v" "/var/lib/pipelines/google:/google:ro,rslave" "google/cloud-sdk:slim" "/bin/sh" "-c" "python -c 'import base64; print(base64.b64decode(\"IyEvYmluL2Jhc2gKCmZvciBpIGluICQoc2VxIDMpOyBkbwogICgKICAgIHJtIC1mICRIT01FLy5jb25maWcvZ2Nsb3VkL2djZSAmJiBnc3V0aWwgIGNwIGdzOi8vZmMtc2VjdXJlLTgwMmJmODgwLTE2YjEtNGExMC1hZDg5LWRhOThmNzk5MTliOC8wMDIwMzM1MS1kZmY0LTQzYWUtOTY2Ny00MGVkNzA1NGU4ZDMvd29ya2Zsb3dNb3R1cy82NDRhMjI0My1iOGYwLTQxYTMtOTE5YS0xNTYxODczMzc1YzkvY2FsbC1xY1F1YWxpdHlIdW1hbi9zaGFyZC0xMC9hdHRlbXB0LTIvQkktMTYtMDI1Ny5hZGFwdGVyVHJpbW1lZC4xX2tuZWFkZGF0YV9wYWlyZWRfMS5mYXN0cS5neiAvY3JvbXdlbGxfcm9vdC9mYy1zZWN1cmUtODAyYmY4ODAtMTZiMS00YTEwLWFkODktZGE5OGY3OTkxOWI4LzAwMjAzMzUxLWRmZjQtNDNhZS05NjY3LTQwZWQ3MDU0ZThkMy93b3JrZmxvd01vdHVzLzY0NGEyMjQzLWI4ZjAtNDFhMy05MTlhLTE1NjE4NzMzNzVjOS9jYWxsLXFjUXVhbGl0eUh1bWFuL3NoYXJkLTEwL2F0dGVtcHQtMi9CSS0xNi0wMjU3LmFkYXB0ZXJUcmltbWVkLjFfa25lYWRkYXRhX3BhaXJlZF8xLmZhc3RxLmd6ID4gZ3N1dGlsX291dHB1dC50eHQgMj4mMQojIFJlY29yZCB0aGUgZXhpdCBjb2RlIG9mIHRoZSBnc3V0aWwgY29tbWFuZCB3aXRob3V0IHByb2plY3QgZmxhZwpSQ19HU1VUSUw9JD8KaWYgWyAiJFJDX0dTVVRJTCIgIT0gIjAiIF07IHRoZW4KICBwcmludGYgJyVzICVzXG4nICIkKGRhdGUgLXUgJyslWS8lbS8lZCAlSDolTTolUycpIiBybVwgLWZcIFwkSE9NRS8uY29uZmlnL2djbG91ZC9nY2VcIFwmXCZcIGdzdXRpbFwgXCBjcFwgZ3M6Ly9mYy1zZWN1cmUtODAyYmY4ODAtMTZiMS00YTEwLWFkODktZGE5OGY3OTkxOWI4LzAwMjAzMzUxLWRmZjQtNDNhZS05NjY3LTQwZWQ3MDU0ZThkMy93b3JrZmxvd01vdHVzLzY0NGEyMjQzLWI4ZjAtNDFhMy05MTlhLTE1NjE4NzMzNzVjOS9jYWxsLXFjUXVhbGl0eUh1bWFuL3NoYXJkLTEwL2F0dGVtcHQtMi9CSS0xNi0wMjU3LmFkYXB0ZXJUcmltbWVkLjFfa25lYWRkYXRhX3BhaXJlZF8xLmZhc3RxLmd6XCAvY3JvbXdlbGxfcm9vdC9mYy1zZWN1cmUtODAyYmY4ODAtMTZiMS00YTEwLWFkODktZGE5OGY3OTkxOWI4LzAwMjAzMzUxLWRmZjQtNDNhZS05NjY3LTQwZWQ3MDU0ZThkMy93b3JrZmxvd01vdHVzLzY0NGEyMjQzLWI4ZjAtNDFhMy05MTlhLTE1NjE4NzMzNzVjOS9jYWxsLXFjUXVhbGl0eUh1bWFuL3NoYXJkLTEwL2F0dGVtcHQtMi9CSS0xNi0wMjU3LmFkYXB0ZXJUcmltbWVkLjFfa25lYWRkYXRhX3BhaXJlZF8xLmZhc3RxLmd6XCBmYWlsZWQKICAjIFByaW50IHRoZSByZWFzb24gb2YgdGhlIGZhaWx1cmUKICBjYXQgZ3N1dGlsX291dHB1dC50eHQKCiAgIyBDaGVjayBpZiBpdCBtYXRjaGVzIHRoZSBCdWNrZXRJc1JlcXVlc3RlclBheXNFcnJvck1lc3NhZ2UKICBpZiBncmVwIC1xICJCdWNrZXQgaXMgcmVxdWVzdGVyIHBheXMgYnVja2V0IGJ1dCBubyB1c2VyIHByb2plY3QgcHJvdmlkZWQuIiBnc3V0aWxfb3V0cHV0LnR4dDsgdGhlbgogICAgcHJpbnRmICclcyAlc1xuJyAiJChkYXRlIC11ICcrJVkvJW0vJWQgJUg6JU06JVMnKSIgUmV0cnlpbmdcIHdpdGhcIHVzZXJcIHByb2plY3QKICAgIHJtIC1mICRIT01FLy5jb25maWcvZ2Nsb3VkL2djZSAmJiBnc3V0aWwgLXUgcmp4bWljcm9iaW9tZSBjcCBnczovL2ZjLXNlY3VyZS04MDJiZjg4MC0xNmIxLTRhMTAtYWQ4OS1kYTk4Zjc5OTE5YjgvMDAyMDMzNTEtZGZmNC00M2FlLTk2NjctNDBlZDcwNTRlOGQzL3dvcmtmbG93TW90dXMvNjQ0YTIyNDMtYjhmMC00MWEzLTkxOWEtMTU2MTg3MzM3NWM5L2NhbGwtcWNRdWFsaXR5SHVtYW4vc2hhcmQtMTAvYXR0ZW1wdC0yL0JJLTE2LTAyNTcuYWRhcHRlclRyaW1tZWQuMV9rbmVhZGRhdGFfcGFpcmVkXzEuZmFzdHEuZ3ogL2Nyb213ZWxsX3Jvb3QvZmMtc2VjdXJlLTgwMmJmODgwLTE2YjEtNGExMC1hZDg5LWRhOThmNzk5MTliOC8wMDIwMzM1MS1kZmY0LTQzYWUtOTY2Ny00MGVkNzA1NGU4ZDMvd29ya2Zsb3dNb3R1cy82NDRhMjI0My1iOGYwLTQxYTMtOTE5YS0xNTYxODczMzc1YzkvY2FsbC1xY1F1YWxpdHlIdW1hbi9zaGFyZC0xMC9hdHRlbXB0LTIvQkktMTYtMDI1Ny5hZGFwdGVyVHJpbW1lZC4xX2tuZWFkZGF0YV9wYWlyZWRfMS5mYXN0cS5negogIGVsc2UKICAgIGV4aXQgIiRSQ19HU1VUSUwiCiAgZmkKZWxzZQogIGV4aXQgMApmaQogICkKICBSQz0kPwogIGlmIFsgIiRSQyIgPSAiMCIgXTsgdGhlbgogICAgYnJlYWsKICBmaQogIGlmIFsgJGkgLWx0IDMgXTsgdGhlbgogICAgcHJpbnRmICclcyAlc1xuJyAiJChkYXRlIC11ICcrJVkvJW0vJWQgJUg6JU06JVMnKSIgV2FpdGluZ1wgNVwgc2Vjb25kc1wgYW5kXCByZXRyeWluZwogICAgc2xlZXAgNQogIGZpCmRvbmUKZXhpdCAiJFJDIg==\"));' > /tmp/186edd92-1537-4b7d-9210-1ee73c34502b.sh && chmod u+x /tmp/186edd92-1537-4b7d-9210-1ee73c34502b.sh && sh /tmp/186edd92-1537-4b7d-9210-1ee73c34502b.sh"]: exit status 125 (standard error: "docker: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:258: applying cgroup configuration for process caused \\\"failed to write 7071 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/f901fdd0393e9c46d91c97d5dab10713ff6ca3877c3f1c3417b88f1a88946e69/cgroup.procs: invalid argument\\\"\".\n")
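The command in this error is hidden inside a base64-encoded shell script (the `base64.b64decode(\"...\")` argument), so the log alone does not show what the container was supposed to run. Below is a minimal Python sketch for decoding such a payload; `decode_papi_script` is a hypothetical helper name, and the regular expression assumes the payload is quoted exactly as in this log. Decoding the full payload here shows a script that copies an input file into the VM with `gsutil cp`, retrying up to three times. The `exit status 125` and the cgroup error show that Docker failed while starting the container, so that script never actually ran.

```python
import base64
import re

# Matches the payload of `base64.b64decode(\"...\")` as it appears in a PAPI
# error message. Whitespace is allowed inside the payload so that a log copied
# with line wraps still matches; it is stripped before decoding.
_PAYLOAD = re.compile(r'b64decode\(\\?"([A-Za-z0-9+/=\s]+)\\?"\)')


def decode_papi_script(message: str) -> str:
    """Return the shell script embedded in a PAPI 'starting container' error."""
    match = _PAYLOAD.search(message)
    if match is None:
        raise ValueError("no base64.b64decode(...) payload found in message")
    payload = "".join(match.group(1).split())  # drop line-wrap whitespace
    return base64.b64decode(payload).decode("utf-8")


if __name__ == "__main__":
    # Opening of the payload from the error above, wrapped across two lines.
    excerpt = (
        r'print(base64.b64decode(\"IyEvYmluL2Jhc2gKCmZvciBp' "\n"
        r'IGluICQoc2VxIDMpOyBkbwog\"));'
    )
    print(decode_papi_script(excerpt))
```

Run on this excerpt, it prints the first lines of the script, `#!/bin/bash` followed by `for i in $(seq 3); do`. Pasting the whole `message:` text instead decodes the complete localization script.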

Comments


  • Comment author
    Sushma Chaluvadi

    Hi Damian,

    We will look into it and get back to you as soon as possible!

  • Comment author
    dplichta

    Hi Sushma,

    Thank you. One more piece of information about this run: two other shards were running far too long (10 h instead of about 1 h), and on inspection they had not progressed beyond the container-initialization step. So four shards in total behaved oddly (the two failures above and two stuck shards). I aborted that run, and after re-running it everything is fine. I would still like to know what the glitch was.

    Damian 

  • Comment author
    Sushma Chaluvadi

    Great, thank you for the additional information! Would you also be able to confirm the date of this submission?

  • Comment author
    dplichta
    rjxmicrobiome/Micropilot_v2_mOTUS_v2 May 22, 2019, 9:57 AM
  • Comment author
    Sushma Chaluvadi

    Damian,

    The team reports that PAPIv2 sometimes fails to detect preemptions correctly, so Cromwell, the execution engine, treats a preempted task as failed when it was in fact preempted. For now, the workaround is to add a `maxRetries` attribute to the task's `runtime` section so that Cromwell re-runs a failed task automatically, while we work with the Google team on a permanent fix.


    Thanks.

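The `maxRetries` workaround in the last reply helps because Cromwell re-runs a failed task up to the given number of extra times, so a spurious failure like the ones in this thread costs one retry rather than a full re-run. In WDL the setting is a single runtime attribute, for example `maxRetries: 1` inside a task's `runtime { ... }` block. The Python sketch below models only that retry policy, under stated assumptions: `run_with_max_retries`, `TransientError` and `flaky_shard` are hypothetical names, not Cromwell code, and which failures are actually retried is decided by Cromwell itself.

```python
from typing import Callable, Optional, Tuple


class TransientError(Exception):
    """Hypothetical stand-in for a spurious failure, e.g. a preemption that
    PAPIv2 reported as an ordinary task failure."""


def run_with_max_retries(
    attempt: Callable[[int], str], max_retries: int
) -> Tuple[str, int]:
    """Call `attempt` once, then re-run it up to `max_retries` more times if it
    raises TransientError. Returns (result, attempts_used); re-raises the last
    error if every attempt fails."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    last_error: Optional[TransientError] = None
    # One initial attempt plus up to `max_retries` retries.
    for attempt_number in range(1, max_retries + 2):
        try:
            return attempt(attempt_number), attempt_number
        except TransientError as err:
            last_error = err
    assert last_error is not None
    raise last_error


def flaky_shard(attempt_number: int) -> str:
    """Hypothetical shard: the first attempt fails at container start-up,
    the second one succeeds."""
    if attempt_number == 1:
        raise TransientError("starting container: exit status 125")
    return "shard done"


if __name__ == "__main__":
    print(run_with_max_retries(flaky_shard, max_retries=1))  # ('shard done', 2)
    try:
        run_with_max_retries(flaky_shard, max_retries=0)
    except TransientError as err:
        print("failed without retries:", err)
```

With `max_retries=1` the flaky shard succeeds on its second attempt; with `max_retries=0` its first failure is final, which is roughly the situation the failed shards in this run were in.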
