Preventing H12 Errors (Request Timeouts)
Last updated October 16, 2024
H12 Request Timeout errors occur when an HTTP request takes longer than 30 seconds to complete. These errors are often caused by:
- Long-running requests, such as expensive queries or a slow external API call.
- Insufficient concurrency resulting in high request queue times during spikes in traffic.
This article contains steps to take to minimize the number of H12 errors in your application.
If your app exhibits a high volume of H12 errors, see Addressing H12 Errors (Request Timeouts) for immediate remediation steps.
If requests must run for longer than 30 seconds, Heroku supports features such as long-polling, streaming responses, and WebSockets.
Add a Timeout to the Webserver
The Heroku router drops a long-running request after 30 seconds, but the dyno behind it continues processing the request until completion. Adding a timeout ensures that the dyno itself drops the long-running request, creating capacity for other requests.
Set a timeout in the app between 10 and 20 seconds, as low as possible without adversely affecting your users. Here are suggestions for specific languages:
- Node.js: Install the Node.js timeout module. It raises a Response timeout exception. (A sketch follows this list.)
- PHP: Set the max_execution_time option in php.ini to force PHP to stop executing after a period of time. The default value is 30 seconds.
- Python: When using Gunicorn, lower the timeout from its default of 30 seconds.
- Ruby: Use the rack-timeout gem. After crossing the timeout threshold, the gem raises a Rack::TimeoutError exception. See Request Timeout in Ruby (MRI) for more tips.
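For example, in a Node.js Express app, a minimal sketch using the connect-timeout middleware, one such timeout module (assumes Express and connect-timeout are installed; the 15-second value is illustrative):

const express = require('express');
const timeout = require('connect-timeout');

const app = express();

// Drop requests that run longer than 15 seconds, well under the 30-second router limit.
app.use(timeout('15s'));

// Stop the middleware chain if the request has already timed out.
app.use((req, res, next) => {
  if (!req.timedout) next();
});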
Debug Request Timeout Transactions
The Heroku router timeout value isn’t configurable. To minimize H12 errors, review and fix slow code and use background jobs.
Identify Slowness in the App Code and External Calls
While the Heroku Dashboard shows some metrics, it doesn’t provide the details required to diagnose which parts of requests take the longest to complete. Use an add-on like New Relic or another performance monitoring add-on to trace slow transactions.
Refer to the monitoring tool’s documentation to learn how to:
- Trace transactions in the 99th percentile and for max response times. Aim for response times under 500 ms.
- Identify the transactions that are both time-consuming and have high throughput. These transactions likely have the highest impact on the app’s overall performance.
- Identify the transactions with the slowest average response times. Even if there’s low throughput, slow transactions can still affect the overall performance. Slow transactions can hold locks in the database, for example, preventing other queries from running.
- Follow the trace and timings for these transactions. Some monitoring tools automatically capture sample transaction traces or offer the ability to designate critical transactions to monitor. Transaction traces identify how long code takes to perform a task, interact with the database, call external APIs, etc.
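If an APM add-on isn't wired up yet, a rough, hypothetical Express middleware like the one below can log per-request timings as a starting point. It's a minimal sketch, not a substitute for transaction tracing:

// Log method, path, status, and duration for every request.
app.use((req, res, next) => {
  const start = process.hrtime.bigint();
  res.on('finish', () => {
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(`${req.method} ${req.originalUrl} ${res.statusCode} ${ms.toFixed(1)}ms`);
  });
  next();
});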
After you’ve identified which part of your slow transactions is causing the bottleneck, take steps to mitigate that slowness. For example:
- Move slow tasks, such as file uploads or computationally heavy work, to background jobs.
- Move slow external calls to background jobs. If you can’t move an external call to the background, plan for the failure case. You can specify a timeout on HTTP requests in most languages (see the sketch after this list).
- Address your slow queries. See Identify Slowness in the Database for more info.
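For example, a minimal sketch of an outbound call with an explicit timeout in Node.js (assumes Node 18+ with built-in fetch; the URL, function name, and 5-second value are illustrative):

// Fail fast instead of letting a slow external API consume the full 30-second window.
async function fetchOrderStatus(orderId) {
  const res = await fetch(`https://api.example.com/orders/${orderId}`, {
    signal: AbortSignal.timeout(5000) // abort the request after 5 seconds
  });
  if (!res.ok) throw new Error(`Upstream returned ${res.status}`);
  return res.json();
}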
Investigate Node.js Responses
Node.js doesn’t implicitly handle requests by sending responses. Instead, developers must create the handling logic separately. As a result, a common oversight in Node.js applications is code paths with missing response logic.
- Add logs along every branch of the failing route, such as each if statement and function call. Include the x-request-id header for added logging filterability (a sketch follows this list).
- Generate a request to that route and search for that request in the logs. What is the last successful step?
- Find which path inside the application doesn’t generate a proper response.
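A minimal sketch of this kind of branch logging in an Express route (the route, session check, and log messages are illustrative):

app.get('/app/:id', (req, res) => {
  const requestId = req.headers['x-request-id'];
  console.log(`${requestId} GET /app/${req.params.id}: start`);

  if (!req.session.user) {
    console.log(`${requestId} no session user, responding 401`);
    return res.status(401).end();
  }

  console.log(`${requestId} user found, sending response`);
  res.json({ id: req.params.id });
});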
With Express, use the chain-of-responsibility pattern instead of using a single function with many branches to minimize these logical errors.
For example, instead of:
app.get('/app/:id', doEverythingInOneBigFunctionWithAsyncBranches);
Use the chain-of-responsibility pattern:
app.get('/app/:id', checkUserAuth, findApp, renderView, sendJSON);
function checkUserAuth(req, res, next) {
if (req.session.user) return next();
return next(new NotAuthorizedError());
}
The above pattern adds the benefit of unified error handling through the error middleware stack.
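For example, a minimal error-handling middleware registered after the routes could map the NotAuthorizedError above to a response (the status codes and messages are illustrative):

// Express treats middleware with four arguments as an error handler; keep the unused next parameter.
app.use((err, req, res, next) => {
  if (err instanceof NotAuthorizedError) {
    return res.status(401).json({ error: 'Not authorized' });
  }
  console.error(err);
  res.status(500).json({ error: 'Internal server error' });
});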
Identify Slowness in the Database
View expensive queries within a database at data.heroku.com. Select the database from the list and navigate to its Diagnose tab. Search through the logs to find the exact parameters passed to these queries during the investigation.
The pg:outliers command from the Heroku pg-extras plug-in helps find slow queries. Run that command to find queries that have a high proportion of execution time. Adding missing indexes is often a quick fix for many slow queries. Review the Expensive Queries section for further instructions for identifying slow queries in Heroku Postgres and other tips for optimizing them.
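For example, with the pg-extras plug-in installed (replace example-app with your app's name):

heroku pg:outliers -a example-app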
Review the Monitoring Heroku Postgres section to identify and address potential database performance issues such as database locks and bloat.
Optimize Web Concurrency
Concurrency is the handling of numerous requests at the same time. Forking into multiple subprocesses or using multiple threads increases concurrency. In general, more subprocesses mean more memory use, while more threads mean more load on the dyno.
Gather Baseline Metrics
Determining the correct web concurrency and the number and type of dynos is a non-trivial task that requires continual iteration. Make observations about dyno load and memory to understand dyno resource utilization levels and whether they’re above or below our recommendations. In the Heroku Dashboard, under the Metrics tab (or, if connected, in the application’s monitoring tool), note dyno load and memory usage over the last seven days. Compare these metrics to the suggested guidelines:
- See What is an acceptable amount of dyno load? Consider the recommendations for the application’s dyno type.
- Keep max memory usage under 85% of the memory quota. Review the memory limits for the application’s dyno type.
Optimize Resource Usage
If the application’s baseline metrics reveal high load issues, move CPU-heavy tasks into background jobs.
Review the memory baseline metrics. Total memory usage includes RAM and swap memory. If swap is in use, or R14 or R15 memory-related errors are logged, check for memory leakage or bloat. It is common to see some swap, ~50 MB on Common Runtime. Private Spaces dynos don’t swap memory but will instead restart when memory is exhausted. See R14 - Memory Quota Exceeded in Ruby (MRI) and Troubleshooting Node.js Memory Use for specific guidance for investigating memory usage in those languages.
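As a quick check in a Node.js app, a hypothetical sketch that periodically logs resident memory can help confirm whether usage grows steadily over time:

// Log resident set size every 60 seconds; steady growth between dyno restarts suggests a leak or bloat.
setInterval(() => {
  const rssMb = process.memoryUsage().rss / 1024 / 1024;
  console.log(`memory rss=${rssMb.toFixed(1)}MB`);
}, 60000);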
Adjust and Test Concurrency Settings
Always adjust concurrency settings in a staging environment first to test them before applying them to production.
After optimizing slow transactions and resource usage, adjust concurrency settings. The ideal concurrency level blends desired response times while remaining within the recommended guidelines for dyno load and under the 85% memory quota. Adjust concurrency by changing the number of threads, processes, and dynos.
- Increase concurrency by raising the number of threads. However, more threads result in higher dyno loads.
- Increase concurrency by raising the number of processes. However, more processes consume more memory.
- If you can’t increase concurrency by adding more threads and processes due to dyno load and memory limits:
  - Increase the number of running dynos.
  - Upgrade the dyno type. A higher-capacity dyno offers additional resources. Increase the number of threads and processes from there.
- Decrease concurrency by reversing these steps: reduce threads, processes, or dynos.
While every application is unique, here is more info and suggested default concurrency settings for some languages:
- Ruby: Use the Puma web server, with multiple processes and threads. Each process or thread requires a different connection to the database. In Rails, ActiveRecord provides a connection pool that can hold several connections at a time. See Concurrency and Database Connections in Ruby with ActiveRecord for more info.
- Node: Node is single-threaded but can fork multiple processes to maximize available resources. See Optimizing Node.js Application Concurrency for specific tips (a cluster sketch follows this list).
- Python: Gunicorn forks multiple system worker processes within each dyno. See Optimizing Python Application Concurrency for specific tips.
- PHP: PHP-FPM spawns multiple child processes.
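For example, a minimal Node.js sketch that forks one worker per process allowed by the WEB_CONCURRENCY environment variable (assumes Node 16+ for cluster.isPrimary; the fallback of 2 and the ./server module are illustrative):

const cluster = require('cluster');

// WEB_CONCURRENCY controls how many worker processes each dyno forks.
const workers = parseInt(process.env.WEB_CONCURRENCY, 10) || 2;

if (cluster.isPrimary) {
  for (let i = 0; i < workers; i++) cluster.fork();
  // Replace any worker that exits so capacity stays constant.
  cluster.on('exit', () => cluster.fork());
} else {
  // Each worker runs its own copy of the HTTP server.
  require('./server');
}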
Always test changes to concurrency settings in a staging environment before applying them to production. Check response times, dyno load, and memory usage during testing to help adjust concurrency. Test using different traffic patterns to determine how many dynos meet high, low, and average traffic demands. See the Load Testing article for further guidance.
Enable or Adjust Autoscaling
Autoscaling allows the number of web dynos to adjust automatically. Enabling autoscaling when response times reach a threshold helps prevent H12 errors since it increases web concurrency by adding more dynos. Choose the minimum and the maximum number of dynos to run based on previous load tests against different traffic patterns. Always test changes to autoscaling settings in a staging environment before applying them to production. See Autoscaling for more info.
Periodic Review
Schedule a quarterly or annual review to go through the steps in this article. Adjust resources appropriately during these reviews based on how code and traffic patterns change.