temporary outages

Incident Report for TrustSource

Resolved

We confirm that the issue has been resolved. We observed no further unusual changes in cache file size, even while putting extraordinary load on the system. We are returning to normal operations.

Posted Sep 22, 2023 - 19:14 CEST

Monitoring

We found the cause. A cache file has been growing out of control due to a configuration mismatch. Under heavy load this mismatch leads to a backlog that disrupts the internal DB logic, which increases the load on the cache file and in turn widens the gap. This self-reinforcing mechanism led to the observed behaviour. Restarting the server emptied the cache file, which explains why the occupied disk space appeared to vary.
We reconfigured the systems to ensure consistent behaviour even under heavy load. We will keep monitoring the service for a while.
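
For readers who want to watch for a similar runaway file on their own systems, the following is a minimal sketch of a growth check between two size samples. It is not the monitoring used by TrustSource; the function names, the sampling approach, and any threshold passed in are assumptions made purely for illustration.

```python
import os


def sample_size(path: str) -> int:
    """Return the current size of the file at `path` in bytes."""
    return os.path.getsize(path)


def growth_exceeded(previous_size: int, current_size: int, max_growth_bytes: int) -> bool:
    """Return True if the file grew by more than `max_growth_bytes` between two samples.

    A shrinking file (for example after a restart empties the cache)
    never counts as exceeded.
    """
    return (current_size - previous_size) > max_growth_bytes
```

A caller would take one sample per interval with `sample_size`, compare consecutive samples with `growth_exceeded`, and raise an alert when it returns True. A suitable threshold depends on the normal write rate of the system and is left open here.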

Posted Sep 22, 2023 - 16:39 CEST

Investigating

We are experiencing operational issues due to unreliable disk space. DB server instances are failing because storage suddenly runs out: 100 GB is suddenly occupied, then freed again.
We are investigating the issue. Our service operator has been alerted and is investigating as well.

Posted Sep 22, 2023 - 15:46 CEST

This incident affected: TrustSource Services (Core Service, API).