Video processing getting stuck randomly
Outage
Opened: Dec 5, 2024, 5:17 AM UTC
Duration: 4hrs 35min 58sec
- Opened Dec 5, 2024, 5:17 AM UTC
We are currently investigating this issue.
- Resolved Dec 5, 2024, 9:52 AM UTC
Our AWS Lambda function was crashing because it hit its memory limit. We have increased the memory limit and manually processed the affected videos.
- Resolved Dec 13, 2024, 2:54 PM UTC
What happened:
- A 2-hour recording was uploaded, with a size of 1.6 GB.
- Our AWS Lambda function was configured with 3 GB of RAM and 10 GB of ephemeral storage.
- Because of the size of the recording and the processing it required, the Lambda function ran out of memory and crashed.
- The crash caused transcript generation to fail for this particular recording.

Why other recordings were also not processed:
- Contrary to our understanding, AWS Lambda can reuse the same execution environment for subsequent invocations.
- To process a WEBM file stored on S3, we download it into the /tmp directory of the Lambda environment. Temporary files generated by subsequent operations on the file are also stored in the same /tmp directory. We manually clear the /tmp directory after every execution, but because of the crash, /tmp did not get cleared. As a result, a few of the subsequent invocations also crashed once storage crossed the 10 GB limit.

Action items related to the above issue:
- We manually processed the files for which transcoding had failed - https://github.com/bigbinary/neeto-record-web/blob/main/docs/ops/transcoding-files-manually.md
- We bumped the RAM up to 7 GB. Ephemeral storage was already set to its maximum of 10 GB.

Other action items identified:
- https://github.com/bigbinary/neeto-record-web/issues/2658 - Transcript generation was taking a long time for longer videos. We identified a fix and made transcript generation faster. We also bumped the AWS Lambda timeout from 5 to 15 minutes, ensuring that larger files won't time out.
- https://github.com/bigbinary/neeto-record-web/issues/2657 - We created a cron job that runs every hour to process unprocessed files.
- https://github.com/bigbinary/neeto-record-web/issues/2655 - We updated the code in AWS Lambda to clear the /tmp directory at the start of the execution as well.
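The /tmp-clearing fix described above can be sketched roughly as follows. This is a minimal illustration, not the actual neeto-record-web code; the `handler` and `process_recording` names are hypothetical placeholders.

```python
import os
import shutil

# Lambda mounts its ephemeral storage at /tmp; it survives across
# invocations when the execution environment is reused.
TMP_DIR = "/tmp"

def clear_tmp(tmp_dir=TMP_DIR):
    """Delete everything under tmp_dir so a reused execution
    environment starts with empty ephemeral storage."""
    for name in os.listdir(tmp_dir):
        path = os.path.join(tmp_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path, ignore_errors=True)
        else:
            os.remove(path)

def process_recording(event):
    # Placeholder: download the WEBM from S3 into /tmp, transcode,
    # and generate the transcript.
    pass

def handler(event, context):
    # Clear /tmp at the START of the invocation as well, so leftovers
    # from a crashed previous invocation cannot push storage past the
    # 10 GB ephemeral-storage limit.
    clear_tmp()
    try:
        process_recording(event)
    finally:
        clear_tmp()  # the pre-existing post-run cleanup
```

Clearing at both the start and the end means a crash between the two cleanups can no longer poison later invocations that land in the same environment.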
- https://github.com/bigbinary/neeto-record-web/issues/2659 - We set up an alarm to report crashes in AWS Lambda.
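An alarm of the kind described in issue 2659 can be built on CloudWatch's per-function `Errors` metric. The sketch below only assembles the parameters that would be passed to boto3's `put_metric_alarm`; the alarm name, function name, and SNS topic ARN are placeholders, not values from the actual setup.

```python
def lambda_error_alarm_params(function_name, sns_topic_arn):
    """Build CloudWatch put_metric_alarm parameters that fire whenever
    the given Lambda function reports any Errors in a 5-minute window."""
    return {
        "AlarmName": f"{function_name}-errors",        # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,                                  # 5-minute evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",   # any error at all alarms
        "TreatMissingData": "notBreaching",             # no invocations != failure
        "AlarmActions": [sns_topic_arn],
    }

# Usage (requires boto3 and AWS credentials):
#   cloudwatch = boto3.client("cloudwatch")
#   cloudwatch.put_metric_alarm(
#       **lambda_error_alarm_params("transcode-recording", "arn:aws:sns:...")
#   )
```

`TreatMissingData: notBreaching` keeps the alarm quiet during idle periods when the function is not invoked at all.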