Service Incident - April 27, 2026 (Terra Data Repository)

Jason Cerrato

Summary

Terra Data Repository (TDR) is experiencing degraded performance and instability. The issue began around Thursday, April 23, 2026, and has continued through the weekend and into today. It appears to be driven by an extremely high volume of DRS resolution requests, which has overwhelmed TDR’s capacity. See the Timeline section for troubleshooting and resolution updates, and the Impact section for how the incident affects use of the system.

Timeline

April 29, 2026 11:49 AM ET - Fix Released, Continuing to Monitor

We have released a fix that successfully resolves DRS URLs with NIH RAS Passports. We are asking users to test workflows using this data at a small scale as we monitor the system. 

All other workflow users should see improved performance in the workflow queue. 

April 28, 2026 11:24 AM ET - Issue Identified; Workflow Failures Continue

Terra is currently experiencing issues accessing NIH data through DRS URL requests that rely on NIH Researcher Auth Service (RAS) Passports. Because workflow infrastructure is shared, the high volume of DRS failures is impacting the workflow queue for all Terra users.

Please refrain from submitting workflows that process NIH data via DRS until further notice.

April 27, 2026 12:54 PM ET - Mitigation Deployed; Investigation Ongoing
Engineers have released a mitigation. Early signs are encouraging, but the team continues to monitor. The team is actively working on a potential long-term solution.

April 27, 2026 11:02 AM ET - Issue Identified
Engineers reported that TDR had been struggling over the weekend. The current working theory is that the instability is driven by high load, specifically a large number of DRS resolutions. Monitoring data suggests the issue may have begun as early as Thursday, April 23.

Impact

During this incident, users may have experienced the following:

  • TDR actions (such as data access and DRS-based file resolution) taking longer than usual or failing entirely
  • 503 errors when attempting to interact with TDR
  • Downstream workflows dependent on TDR stalling or failing
  • Failures accessing AnVIL data via DRS resolution

This incident may also be affecting workflows as a side effect. See Service Incident - April 25, 2026 (Workflows) for details.
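
For client scripts that call TDR directly, transient failures such as the 503 errors listed above are commonly handled by retrying with exponential backoff and jitter, which also avoids adding further load to a struggling service. The following is a minimal Python sketch under stated assumptions, not an official Terra or TDR client: `call_with_backoff` and `TransientError` are hypothetical names introduced here, and the caller supplies the function that makes the request and raises `TransientError` when it sees a retryable response.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type: raised by the caller's function for
    retryable failures, such as an HTTP 503 response."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff.

    The wait before retry k (0-based) is drawn uniformly from
    [0, min(max_delay, base_delay * 2**k)] ("full jitter"), so that many
    clients retrying at once do not hit the service in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            # Give up and re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            upper_bound = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, upper_bound))
```

The `sleep` parameter exists only so the wait can be replaced in tests. In a real script, `func` would wrap whatever request the script already makes and raise `TransientError` when the response status is 503.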

For more information

Please follow this article to get the most up-to-date information on this incident. If you would like to be notified of all service incidents or upcoming scheduled maintenance, click Follow on this page.
