Way of knowing whether output in the data model is from the latest version/run
We sometimes rerun workflows several times as a pipeline matures. Columns in the data model are then partly filled in from earlier runs, and each new run overwrites those columns only for the rows where the workflow succeeded. Since some runs inevitably fail, it is no longer clear which rows in the data model were produced by the up-to-date pipeline and which still hold results from an older version. This request comes from some of our users, who have resorted to writing scripts that scrape the timestamps on output files to check whether they were produced by the most recent pipeline. Another option would be to add a task to the WDL that takes the outputs of all other tasks as input and records a timestamp, which is then written to the data model as another "output" column. But I think this would be better as a built-in Terra feature than having to write the same boilerplate "timestamp" task into all of our WDLs.
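To make the timestamp-column workaround concrete, here is a rough sketch of the check a user would run once such a column exists. It is only an illustration under assumptions: the column names `sample_id` and `pipeline_timestamp`, the function names `parse_utc` and `stale_rows`, and the idea that rows arrive as plain dictionaries (for example from an exported table) are all hypothetical, and nothing here calls a Terra API.

```python
from datetime import datetime, timezone


def parse_utc(stamp):
    """Parse a timestamp like '2021-03-01T12:00:00Z' as a UTC datetime."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def stale_rows(rows, release_time,
               id_column="sample_id",                 # hypothetical column name
               timestamp_column="pipeline_timestamp"):  # hypothetical column name
    """Return the IDs of rows not produced by the current pipeline version.

    A row counts as stale if its timestamp is missing or empty (the workflow
    never wrote one) or if it is older than the release time of the current
    pipeline version.
    """
    cutoff = parse_utc(release_time)
    stale = []
    for row in rows:
        stamp = row.get(timestamp_column)
        if not stamp or parse_utc(stamp) < cutoff:
            stale.append(row[id_column])
    return stale


if __name__ == "__main__":
    # Made-up rows purely for illustration.
    example_rows = [
        {"sample_id": "s1", "pipeline_timestamp": "2021-03-02T00:00:00Z"},
        {"sample_id": "s2", "pipeline_timestamp": "2021-01-15T08:30:00Z"},
        {"sample_id": "s3", "pipeline_timestamp": ""},
    ]
    print(stale_rows(example_rows, "2021-03-01T00:00:00Z"))  # prints ['s2', 's3']
```

This still depends on every WDL writing the timestamp column itself, which is the boilerplate a built-in feature would remove.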