Healthcheck Assertion Failed / NeetoCal Application
Outage
Opened: Apr 1, 2025, 9:02 AM UTC
Duration: 42min 25sec
- Opened: Apr 1, 2025, 9:02 AM UTC
A large number of requests are timing out. We have placed the app in maintenance mode.
- Investigating: Apr 1, 2025, 9:10 AM UTC
We noticed slow PostgreSQL responses. Earlier in the day we had deployed a patch to the NeetoCal PostgreSQL addon aimed at fixing memory leaks during database backups. This patch may be the cause of the slowdown, so we are rolling it back.
- Identified: Apr 1, 2025, 9:25 AM UTC
Despite the rollback, PostgreSQL performance remained poor. Further investigation revealed that disk usage had surged, particularly in the PGDATA/pg_wal directory, where WAL files accumulate. These files are normally archived after a checkpoint, but archiving was not occurring, leading to disk bloat. Disk I/O metrics confirmed that we were consistently hitting the maximum throughput allowed by our gp2 disk type. This bottleneck explained both the PostgreSQL slowness and the WAL backlog.
- Resolved: Apr 1, 2025, 9:44 AM UTC
NeetoCal’s PostgreSQL addon experienced a performance degradation because it exceeded its disk I/O limits. The root cause was linked to unusually heavy autovacuum activity and a buildup of WAL (write-ahead log) files. The issue has been resolved temporarily, and several follow-up actions are in place to prevent recurrence. Detailed Report: https://gist.github.com/unnitallman/6ce1789bee1e896eef4471d0d13f2486
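
For readers who want to watch for the same WAL backlog described in the Identified update, here is a minimal sketch and not the tooling used during this incident. It assumes the standard PostgreSQL layout, where WAL segments in pg_wal have 24-character hexadecimal names and segments still waiting to be archived have a matching `.ready` file under pg_wal/archive_status. The function name `summarize_pg_wal` is hypothetical and chosen only for illustration.

```python
import os
import re

# WAL segment file names are 24 uppercase hexadecimal characters.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def summarize_pg_wal(wal_dir):
    """Count WAL segments, their total size, and segments awaiting archiving.

    Hypothetical helper: wal_dir is expected to be the pg_wal directory
    inside PGDATA.
    """
    segments = 0
    total_bytes = 0
    for entry in os.scandir(wal_dir):
        # Skip subdirectories (such as archive_status) and non-segment files.
        if entry.is_file() and WAL_SEGMENT_RE.match(entry.name):
            segments += 1
            total_bytes += entry.stat().st_size

    # A "<segment>.ready" marker means the segment has not been archived yet.
    status_dir = os.path.join(wal_dir, "archive_status")
    pending = 0
    if os.path.isdir(status_dir):
        pending = sum(
            1 for name in os.listdir(status_dir) if name.endswith(".ready")
        )

    return {
        "segments": segments,
        "total_bytes": total_bytes,
        "pending_archive": pending,
    }


if __name__ == "__main__":
    # Assumes the PGDATA environment variable points at the data directory.
    print(summarize_pg_wal(os.path.join(os.environ["PGDATA"], "pg_wal")))
```

A steadily growing `pending_archive` count alongside a growing `total_bytes` would be consistent with the situation above, where archiving had stalled and WAL files accumulated on disk. Alert thresholds depend on the deployment and are left out deliberately.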