Terra hybrid workshop test 3/22/2022 - 3/23/2022 and 4/5/2022 - 4/6/2022
Hello! Please post any questions you have on 3/22, 3/23, 4/5, or 4/6 during our Terra workshop testing days. We will have teaching assistants watching this forum post on those days.
Comments
The wording in this question is a little confusing.
"What is the name of the data table in the Tables section of the Data Tables Quickstart workspace data page?"
I think using "...workspace's Data tab" or something similar might be clearer!
Sushma Chaluvadi - Thanks for that! We struggle with the right wording...
In Course 2, I noticed that there isn't any learning content in section #1 ("Costs and Billing"), just Quiz #1. I'm assuming the content in Quiz #1 will be covered in section #2 ("Securely Sharing Resources"), just a formatting note for next time :)
Ah, thanks Leyla Tarhan! That's definitely a formatting/content error we can fix.
Can one of the facilitators please join me in the Google meet for a quick question?
Just double-checking: when we get to the section where we clone the Data Tables Quickstart Workspace, should we clone it to the 'workshop-test' billing project? Thanks!
Yes, Leyla Tarhan! That's right.
Leyla Tarhan yes that's correct
Quick note that one of the quiz questions for Quiz #4 on the Introduction to Workflows course was a bit confusing to me: the question was "When a workflow finishes successfully, your Job History page will:" and the correct answer was "Have a green arrow in the submission row" (I think it's actually a green check mark, not an arrow)
Here's how long it took me to finish course 1 and 2:
Leyla Tarhan and Michael de la Maza thank you both for the useful feedback!
https://leanpub.com/courses/terra/intrototerra/quizzes/quiz2
Quiz 2, Question 4: What is the name of the table I should use for storing paths to Docker containers that I will reference across multiple workflows?
Maybe it is just me, but I can't find where this content was covered in the study unit.
Maria, thank you for the feedback! We will work to update the quiz to make this question/answer more obvious. The answer is the Workspace Data table, and it was mentioned very briefly (too briefly) in the video overview.
https://leanpub.com/courses/terra/datatables/quizzes/quiz1
Quiz 1, Question: What is the Google bucket URL for the r1_FASTQ file of the neurons2k_lane1 specimen listed in the specimen table? HINT: It starts with the “gs://” prefix.
I went to GCP and checked the bucket URL ending with "neuron6k_mouse_chr19_genic", i.e. the bucket URL is gs://hca-dcp-sc-pipelines-test-data/smallDatasets/chemistry_X10_V2/neuron6k_mouse_chr19_genic
but this answer was marked incorrect. It is hard to see what the correct answer is because the answer field is not scrollable, but it looks like the URL in it points to an actual object in the bucket.
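To illustrate the distinction at play here, a minimal sketch (not from the course material) that splits a gs:// URL into its bucket name and object path. The function name `split_gs_url` is hypothetical. The URL is the one quoted above, whose path appears to name a folder-like prefix rather than a single file, which may be why it was not accepted.

```python
def split_gs_url(url: str) -> tuple:
    """Split a gs:// URL into (bucket, object_path)."""
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError(f"not a gs:// URL: {url}")
    # Everything up to the first "/" is the bucket; the rest is the object path.
    bucket, _, object_path = url[len(prefix):].partition("/")
    return bucket, object_path


bucket, path = split_gs_url(
    "gs://hca-dcp-sc-pipelines-test-data/smallDatasets/"
    "chemistry_X10_V2/neuron6k_mouse_chr19_genic"
)
print(bucket)  # the bucket name
print(path)    # the path inside the bucket (here a prefix, not a file)
```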
I'm confused by "Your first workflows on Terra". The hands-on exercise has you running a preconfigured workflow (I assume using the human/mouse set from the data tables section) but it then links to Workflows Quickstart Part 1 - Run pre-configured workflow – Terra Support, which lists different preconfigured workflows than what I see in my workflows section. I don't have part1_cram_to_bam_workflow; I have 1-single-input-workflow and 2-sets-as-input-workflow, so the instructions don't seem to match up.
Maria Yazykova thanks for pointing this out, we'll try to reword the question so it's clearer where to find the answer. The answer is actually found in the dialogue box when you click on the link of the file in the data table:
Anton Kovalsky I literally clicked the copy button next to that textbox, deleted the terminal commands, and still got it wrong 🤷♀️
Adina Shanholtz Oh I get it! You're not running the workflow in the Data Tables Quickstart for the workflows hands-on. It should be from the Workflows Quickstart. I see that we need to make it clearer what workspace you're expected to be working in.
Just a suggestion for the section here:
https://support.terra.bio/hc/en-us/articles/4417345161627#h_01F2HX4AWZVKQ5C1WKRW1JW7N0
"Add additional data (rows) to a table"
The current documentation says to delete the entire data table from Terra and then upload a modified version. This seems a bit extreme given that most use cases will probably involve adding a few new rows to the existing data. I'd suggest instructing users to upload a table file containing just the few new rows needed, which will get added to the existing data table on Terra. Then, later in the documentation, the delete/replace option could be presented as a more extreme alternative.
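A minimal sketch of what such a "new rows only" file could look like, assuming the usual Terra load-file layout in which the first column header is `entity:<table>_id`. The helper `rows_to_tsv`, the table name `specimen`, and the row values are all hypothetical, and whether the upload merges into the existing table should be confirmed against the Terra docs.

```python
import csv
import io


def rows_to_tsv(table: str, rows: list) -> str:
    """Build a tab-separated load file containing only the given rows."""
    # Every row dict carries an "id" plus its other columns.
    columns = [c for c in rows[0] if c != "id"]
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([f"entity:{table}_id"] + columns)
    for row in rows:
        writer.writerow([row["id"]] + [row[c] for c in columns])
    return buf.getvalue()


# Hypothetical new row to add to an existing "specimen" table.
new_rows = [{"id": "specimen_new_1", "participant": "participant_1"}]
tsv = rows_to_tsv("specimen", new_rows)
print(tsv)
```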
I'm curious about the reason why 'C' below might not be a requirement? Can you actually run a workflow in Terra if you're not on a billing project, or am I misreading the statement somehow?
There's a potential issue with the workflow section:
https://leanpub.com/courses/terra/pipelining/read/2#leanpub-auto-what-is-a-workflow
in that it directs to:
https://support.terra.bio/hc/en-us/articles/360034701991-Pipelining-with-workflows
which has at the bottom instructions to clone a workspace and run through parts 1-3 of the workflows quickstart exercises.
But, then in the next section on leanpub: https://leanpub.com/courses/terra/pipelining/read/4#leanpub-auto-your-first-workflows-on-terra
it states "This workspace tutorial has 3 parts. For this exercise, you’ll work through Part 1 only."
so it might be worth noting in the first Leanpub section that readers should not yet do the exercises described in the Terra docs.
For this question:
can you please explain where the acceptable this.cram_file is derived from?
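As general background rather than an answer from the course authors: in a Terra workflow configuration, `this.` refers to the current row of the data table the workflow is run on, and the part after the dot is a column name in that table. A rough sketch of that lookup, with a hypothetical `resolve` helper and made-up row values:

```python
def resolve(expression: str, row: dict) -> str:
    """Look up a 'this.<column>' expression in one table row."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"expected an expression starting with {prefix!r}")
    column = expression[len(prefix):]
    if column not in row:
        # A typo in the column name would end up here.
        raise KeyError(f"no column named {column!r} in this table")
    return row[column]


# Hypothetical row; the bucket and file names are placeholders.
row = {"sample_id": "sample_1", "cram_file": "gs://example-bucket/sample_1.cram"}
print(resolve("this.cram_file", row))
```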
The sample set execution workflow
https://support.terra.bio/hc/en-us/articles/360053601712-Workflows-Quickstart-Part-3-Run-back-to-back-analysis-pipeline-
might be a bit confusing in that it's taking an aligned bam file and generating multiple unaligned bam files (uBAMs) instead of a single uBAM. It indicates this is because of how the pipeline has sharded the data, which is technically true, but the real reason is that there are multiple read groups in the aligned bam and separate uBAMs are being generated for each read group. This is because of the option specified here:
https://dockstore.org/api/ga4gh/trs/v2/tools/%23workflow%2Fgithub.com%2Fgatk-workflows%2Fseq-format-conversion%2FBAM-to-Unmapped-BAM/versions/3.0.0/plain-WDL/descriptor//bam-to-unmapped-bams.wdl
under the RevertSam task of the workflow as
It might suffice to use a simpler example (a uBAM conversion workflow) that generates a single uBAM from a single aligned BAM, and/or to state that the workflow generates a separate uBAM per read group (though the latter requires a deeper understanding of the input data).
The 'Pipelining with workflows on Terra' section would benefit from including under 'Next steps' a link to a Leanpub course or tutorial on developing new workflows using WDL and Docker, or, more simply, a comment on what's needed to explore that aspect further, beyond what's available in Dockstore or the Broad tool library.
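To make the read-group point concrete, here is a small sketch that lists the read group IDs declared in a SAM/BAM header (`@RG` lines). If the conversion writes one output per read group (Picard's RevertSam has an OUTPUT_BY_READGROUP setting, though whether that is the option meant above is an assumption), the number of IDs would match the number of uBAMs. The helper name and the header text are hypothetical.

```python
from typing import List


def read_group_ids(sam_header: str) -> List[str]:
    """Return the ID of every @RG line in a SAM header."""
    ids = []
    for line in sam_header.splitlines():
        if not line.startswith("@RG\t"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            if tag == "ID":
                ids.append(value)
    return ids


# Made-up header with two read groups for one sample.
header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:rg1\tSM:sampleA",
    "@RG\tID:rg2\tSM:sampleA",
])
ids = read_group_ids(header)
print(len(ids), "read groups:", ids)
```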
Brian Haas Thanks for all this great feedback! I will take note and update the relevant docs/courses. Would it be OK if I followed up with you about the Workflows Quickstart workflow? I am not as savvy on the actual science, so might not have used the best examples. Especially if it trips up people who actually do know the science!!
sure thing, Allie Cliffe
any time
I keep getting this error on the BAM to unmapped BAM workflow. What am I missing?
Ron Paulsen I don't see the error... Can you copy paste it into this comment?
If your workflow failed right away (before it even got into the queue), it will be something with the input - a typo in the name or something.
I posted a screenshot of the error. It is visible in my view... I will try again.