System Performance is down

Incident Report for TrustSource

Postmortem

An essential, required database upgrade led to performance issues with our LegalCheck service for some scans. This did not affect the majority of scans, only some large and very large ones. Because of functional limitations of the older DB version and our forward strategy, we decided not to roll back to the prior settings and chose to move forward instead.

Because only a limited number of solutions were affected, we resolved the issue by introducing a new caching strategy in combination with an update to our LegalCheck service that was already planned for the longer term. This new implementation has been tested intensively and was released today. The new solution will not only resolve the performance issues experienced with large and very large scans (e.g. several thousand components) but also lay the foundation for new features and an extended license base.

Posted Sep 26, 2024 - 17:05 CEST

Resolved

We have provided an upgraded implementation that overcomes the structural performance issues for large scans.
Posted Sep 26, 2024 - 16:41 CEST

Update

We were able to remove the limitations of LegalCheck, so it can be used again. However, there may still be conditions under which results will not be available in time. If you encounter such a situation, please re-process or re-analyse the scan.
We are working on a short-term fix to reduce the impact of this issue.
Posted Jul 08, 2024 - 15:42 CEST

Update

We are continuing to monitor for any further issues.
Posted Jul 08, 2024 - 12:28 CEST

Monitoring

During the repair we ran into issues updating our infrastructure due to limitations of our provider. The changes require re-deploying some stacks containing NICs. Replacing the NICs took hours instead of seconds, which led to unforeseen race conditions. The resulting complications are understood and will be addressed.
As a result, the LegalCheck service will be affected for a while and analysis performance will be limited.
Posted Jul 08, 2024 - 11:30 CEST

Update

We are continuing to investigate this issue.
Posted Jul 08, 2024 - 11:04 CEST

Update

We were able to identify the causes and have rebuilt the lost indexes. Performance should have been back to normal since 09:13 CET. We will keep monitoring the behaviour and will conduct a root cause analysis.
However, our LegalCheck service still has issues due to problems re-creating the index, so very complex requests may not receive answers fast enough. We recommend reprocessing scans uploaded between Jul 07, 2024, 23:25 and now at a later time, once the LegalCheck service has been fully restored.
Posted Jul 08, 2024 - 09:54 CEST

Update

We are continuing to investigate this issue.
Posted Jul 08, 2024 - 01:58 CEST

Investigating

Our monitoring detected a significant performance decrease in all DB-related services. We are investigating the issue together with our service provider.
Posted Jul 08, 2024 - 01:58 CEST
This incident affected: TrustSource Services (Core Service, LegalCheck Service, API).