Post-Mortem: Authentication Service Disruption
Date: January 29, 2026
Duration: 53 minutes (18:15 – 19:08 CET)
Executive Summary
On Thursday, Jan 29, Trengo experienced a service disruption that prevented users from logging into the platform. The issue was traced to a failure in our internal authentication token renewal process. A fix was deployed at 18:45, and full service was restored by 19:08.
What Happened?
The disruption began at 18:15 when our system-level bearer token, used to communicate with our authentication provider (Stytch), expired. While our automated cron job had successfully requested a new token, a caching error caused the system to store the new token in an incorrect location within our backend cache. As a result, even though the authentication provider was operational, Trengo's backend continued attempting to use the expired token, leading to failed login attempts for all users.
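To make the failure mode concrete, here is a minimal sketch of a write/read cache-key mismatch of the kind described above. All names (cache keys, function names) are illustrative assumptions, not Trengo's actual code: the refresh job writes the new token under one key while the auth client keeps reading from another, so reads continue to return the stale token.

```python
# Hypothetical illustration of the incident's failure mode.
# Key names and functions are assumptions for the sketch, not real internals.

cache = {}

READ_KEY = "auth:stytch:bearer_token"   # key the backend reads tokens from
WRITE_KEY = "auth:stytch:bearer-token"  # subtly different key the refresh job wrote to

def refresh_token(cache, new_token):
    # Bug: the fresh token lands under the wrong cache key.
    cache[WRITE_KEY] = new_token

def get_token(cache, stale_default):
    # Reads never see the fresh token, so the stale one keeps being used.
    return cache.get(READ_KEY, stale_default)

refresh_token(cache, "fresh-token")
print(get_token(cache, "expired-token"))  # → expired-token
```

Even though the refresh itself succeeded, every authentication call continued to present the expired token, which matches the observed symptoms.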
Timeline of Events (CET)
18:15: Initial reports of user logout and login failures.
18:21: Triage initiated. Verified that external providers and recent deployments were stable.
18:35: Root cause identified: the 60-day system bearer token was refreshed but misdirected in the cache.
18:45: Fix deployed to refresh the token using more robust pathing logic.
19:08: System-wide recovery confirmed; all users able to log in.
Corrective Actions
To prevent a recurrence, we are implementing the following:
Cache Validation: Automated verification has been added to ensure refreshed tokens are stored under, and retrievable from, the correct cache keys.
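The cache-validation step above can be sketched as a write-then-read-back check: after storing a refreshed token, immediately fetch it from the same key and fail loudly on any mismatch. The key name, function name, and error handling here are assumptions for illustration, not Trengo's implementation; a production version would alert on-call and retry rather than simply raising.

```python
# Illustrative sketch of post-refresh cache validation.
# TOKEN_KEY and store_and_verify are hypothetical names.

TOKEN_KEY = "auth:stytch:bearer_token"

def store_and_verify(cache, token):
    cache[TOKEN_KEY] = token
    stored = cache.get(TOKEN_KEY)
    if stored != token:
        # In production: page on-call and retry instead of raising.
        raise RuntimeError("token read-back mismatch after refresh")
    return stored

cache = {}
store_and_verify(cache, "fresh-token")
print(cache[TOKEN_KEY])  # → fresh-token
```

Because the verification reads through the same key the backend uses, a misdirected write like the one in this incident would be caught at refresh time instead of at the token's next use.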