
    How to fix the 500 Internal Server Error in Nginx


    A 500 Internal Server Error from Nginx is the HTTP status equivalent of "something went wrong and I am not going to tell you what" — the server caught an unhandled error somewhere in the request path and returned the most generic 5xx response it has. The frustrating part is that a 500 internal server error in Nginx is almost never Nginx's fault. The response usually comes from an application upstream (PHP-FPM, Node, Python, Go, Ruby) that crashed, returned malformed output, or never responded at all — and Nginx is just the messenger.

    Fixing a 500 fast comes down to telling the difference between an upstream-generated 500 and an Nginx-generated 500, then reading the right log to confirm which is which. This guide walks through both, in the order you should rule them out, with the exact commands and log lines that distinguish them.


    What 500 actually means (and how it differs from 502 / 503 / 504)

    The 5xx family looks similar but the causes are completely different — using the wrong fix for the wrong status burns hours.

    Status | Meaning in practice | Most common cause
    500 Internal Server Error | The server caught an unhandled error while processing the request | Upstream application threw an exception or returned malformed output
    501 Not Implemented | The server does not support the HTTP method | Almost never from Nginx in the wild — usually a custom app
    502 Bad Gateway | Nginx tried to reach the upstream and got an invalid response (or none at all) | Upstream is down or unreachable; socket/port wrong
    503 Service Unavailable | The server is intentionally refusing requests right now | Maintenance mode; rate limiting (limit_req); explicit return 503
    504 Gateway Timeout | Nginx reached the upstream but it didn't reply in time | Upstream is slow (long DB query, deadlock); proxy_read_timeout too low

    The mental model:

    • 500 — Nginx talked to the upstream and got a response that constituted an error.
    • 502 — Nginx couldn't talk to the upstream at all.
    • 504 — Nginx talked to the upstream but the upstream never finished replying.

    This article is specifically about 500. If you are seeing 502 or 504 instead, the diagnosis path is different — start with the upstream's process state (502) or its slow-query / deadlock behavior (504), not with Nginx config.
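
    If you are not sure which status you are actually getting (browsers and frameworks sometimes dress errors up), a quick check from the shell, with the URL as a placeholder:

    curl -s -o /dev/null -w 'status=%{http_code} time=%{time_total}s\n' https://example.com/failing/path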


    Step 1 — Decide whether the 500 is from Nginx or from upstream

    This is the single most useful triage step. The answer changes which logs you read and which fixes you try.

    The error log tells you immediately

    sudo tail -f /var/log/nginx/error.log
    

    Reproduce the request and look at the error log line. The pattern names the source:

    Error log signature | Source | Likely cause
    FastCGI sent in stderr: "PHP message: PHP Fatal error: ..." | PHP-FPM upstream | App threw a fatal — see Step 2
    upstream prematurely closed connection while reading response header | Upstream crashed mid-response | Worker OOM, segfault — see Step 3
    upstream sent invalid header while reading response header | Upstream returned malformed HTTP | Bug in app's response code — see Step 3
    upstream sent too big header while reading response header | Upstream sent oversized headers | *_buffer_size too small — see Step 4
    rewrite or internal redirection cycle | Pure Nginx | Rewrite rule recursion — see Step 5
    worker_connections are not enough | Pure Nginx | Capacity — see Step 6
    open() "..." failed (24: Too many open files) | Pure Nginx (or upstream) | File descriptor limit — see Step 6
    SSL_do_handshake() failed against an upstream | Pure Nginx | Misconfigured proxy_ssl_*
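
    To scan a recent window of the error log for all of these signatures at once, a grep along these lines works (default Debian/Ubuntu log path; adjust to yours):

    sudo grep -E 'FastCGI sent in stderr|prematurely closed|sent invalid header|sent too big header|redirection cycle|worker_connections are not enough|Too many open files|SSL_do_handshake' /var/log/nginx/error.log | tail -n 20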

    If the error log is empty when you reproduce, raise the level temporarily:

    error_log /var/log/nginx/error.log debug;
    

    Reload (sudo nginx -s reload), reproduce, revert. debug is verbose; don't leave it on.

    Confirm with the access log

    A 500 in the access log paired with a non-zero upstream_response_time proves Nginx reached the upstream — the upstream returned the 500 (or a malformed response Nginx translated to 500). Add this format if you don't already have it:

    log_format upstream '$remote_addr "$request" $status '
                        'urt=$upstream_response_time uct=$upstream_connect_time '
                        'us=$upstream_status ucs=$upstream_cache_status';
    access_log /var/log/nginx/access.log upstream;
    

    Look at the new fields:

    • $upstream_status = 500 — the upstream itself returned 500. Fix the app.
    • $upstream_status = - — Nginx never reached the upstream. The 500 is Nginx-generated (often actually a 502 in disguise — check the error log).
    • $upstream_response_time = 0.001 and $upstream_status = 500 — the app returned 500 instantly; almost always a startup/config error in the app.
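
    With that format in place, pulling the recent 500s together with their upstream fields is a one-liner; the pattern below matches the upstream log_format shown above:

    grep ' 500 urt=' /var/log/nginx/access.log | tail -n 10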

    Step 2 — Upstream application errors (the 90% case)

    For most stacks, the 500 you are looking at is your application throwing an unhandled exception. Nginx's error log captures the upstream's stderr, so for PHP-FPM you can read PHP fatals directly:

    FastCGI sent in stderr: "PHP message: PHP Fatal error:
    Uncaught Error: Call to undefined function mysqli_connect()
    in /var/www/site/db.php:7" while reading response header from upstream
    

    That is your stack trace. The fix is in the application, not in Nginx.

    Where each stack writes its real logs

    Don't stop at Nginx's view of the error — open the application's own log too. It's almost always more detailed.

    Stack | Default log location
    PHP-FPM | /var/log/php<version>-fpm.log (FPM master) plus per-pool php_admin_value[error_log] if set
    WordPress | wp-content/debug.log (with WP_DEBUG_LOG = true in wp-config.php)
    Laravel | storage/logs/laravel.log
    Symfony | var/log/<env>.log
    Node.js (PM2) | ~/.pm2/logs/<app>-error.log and <app>-out.log
    Node.js (systemd) | journalctl -u <service>
    Python (gunicorn) | wherever --error-logfile and --access-logfile point
    Python (uwsgi) | the logto directive in the uwsgi config
    Ruby (Puma + Rails) | log/production.log plus Puma's own log
    Go | journalctl -u <service> (typical)
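
    It often pays to follow Nginx's error log and the application's own log in one terminal while reproducing; tail accepts multiple files. Substitute your PHP version for 8.x:

    sudo tail -f /var/log/nginx/error.log /var/log/php8.x-fpm.log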

    For PHP specifically, raise the visibility temporarily so 500s become traceable:

    ; /etc/php/8.x/fpm/php.ini
    log_errors = On
    error_log = /var/log/php_errors.log
    display_errors = Off       ; never On in production — leaks paths to clients
    

    Restart PHP-FPM (sudo systemctl restart php8.x-fpm) and reproduce.

    Common upstream causes of a 500

    1. Missing extension or dependency — Call to undefined function mysqli_connect() (mysqli not installed), Class "Redis" not found (php-redis missing), ImportError: No module named X (Python venv not active).
    2. Database connection failure — wrong credentials, DB host unreachable, max_connections hit. Check the DB's own log alongside the app log.
    3. Permission denied on a writable path — sessions, cache, uploads. The app needs to write to /var/www/site/storage or similar; check that ownership matches the worker user (a quick check follows this list).
    4. Out-of-memory — Allowed memory size of N bytes exhausted (PHP memory_limit); for Node/Python, the worker may be SIGKILL'd by the OOM killer (dmesg | grep -i 'killed process').
    5. A bug in the code — the boring 90%. Read the stack trace; the fix is wherever it points.
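
    For the permission case in the list above, a quick check is to test writability as the worker user (www-data here, nginx on RHEL-family systems; the path is the example from item 3):

    sudo -u www-data test -w /var/www/site/storage && echo writable || echo NOT writable
    ls -ld /var/www/site/storage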

    Step 3 — Upstream crashed or returned a malformed response

    A subset of 500s look like this in the error log:

    upstream prematurely closed connection while reading response header from upstream
    

    That means the upstream worker died during the response. Distinct from a 502 (where the upstream never accepted the connection) — the connection was made, then dropped mid-flight. Causes:

    • Upstream worker segfaulted. dmesg | grep -i segfault | tail and journalctl -u php8.x-fpm will show it. Often a bad PHP extension or a C extension version mismatch.
    • OOM killer. dmesg -T | grep -i 'killed process' | tail — kernel killed the worker because the host ran out of memory. Increase host memory, lower PHP-FPM pm.max_children, or set memory_limit lower so the worker bails politely instead of being killed.
    • Worker timeout. PHP-FPM kills its own children after request_terminate_timeout. The fix is to either find the slow code path or raise the timeout for that endpoint specifically, not globally (a sketch follows this list).
    • Upstream wrote to a closed connection. The app finished and exited before flushing — usually a framework bug. Update the framework version.
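
    One way to raise a timeout for a single endpoint rather than globally, as the list above suggests, is to route just that endpoint to a dedicated PHP-FPM pool with a longer request_terminate_timeout. A sketch; the pool name, socket path, and endpoint are purely illustrative:

    ; /etc/php/8.x/fpm/pool.d/slow.conf: a second pool just for the slow endpoint
    [slow]
    user = www-data
    group = www-data
    listen = /run/php/php8.x-fpm-slow.sock
    pm = dynamic
    pm.max_children = 4
    pm.start_servers = 1
    pm.min_spare_servers = 1
    pm.max_spare_servers = 2
    request_terminate_timeout = 300

    # Nginx side: send only the slow endpoint to the slow pool
    location = /reports/export.php {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php/php8.x-fpm-slow.sock;
        fastcgi_read_timeout 300s;
    }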

    The other "malformed" variant:

    upstream sent invalid header while reading response header
    

    The application sent something Nginx can't parse as HTTP. Common cause: PHP header() called after output has already been sent (a stray echo or BOM at the start of a file). Fix: enable output_buffering = On in php.ini so headers can be set after some output, or hunt down the early output (var_dump left in code, UTF-8 BOM in an included file).
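
    A stray UTF-8 BOM is easy to hunt down with grep; the web root is a placeholder:

    # list PHP files containing the UTF-8 BOM byte sequence
    grep -rl $'\xef\xbb\xbf' /var/www/site --include='*.php'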


    Step 4 — Upstream response too large for buffers

    Specific error log line:

    upstream sent too big header while reading response header from upstream
    

    Nginx buffers the response headers and the start of the body in fixed-size buffers. When the upstream sends headers larger than the buffer (giant cookies, fat Set-Cookie chains, oversized JWTs), Nginx returns 502 — but 502 is sometimes mis-presented as 500 by intermediate proxies, or you may see a 500 if error_page 502 = 500; is in play.

    Increase the relevant buffers:

    # For proxied upstreams (proxy_pass):
    proxy_buffer_size       16k;
    proxy_buffers           8 16k;
    proxy_busy_buffers_size 32k;
    
    # For FastCGI / PHP-FPM:
    fastcgi_buffer_size     16k;
    fastcgi_buffers         8 16k;
    fastcgi_busy_buffers_size 32k;
    

    Reload (sudo nginx -s reload). If the headers are that large, also figure out why and trim them — fat headers cause problems further down the chain (CDN, browser).


    Step 5 — Nginx-internal: rewrite loops and bad config

    A pure-Nginx 500 — no upstream involvement — is rare but distinctive. The error log line names it:

    [error] ... rewrite or internal redirection cycle while internally redirecting to "/index.php"
    

    This is a try_files or rewrite rule pointing at itself. Classic broken WordPress config:

    # WRONG — falls through to itself
    location / {
        try_files $uri $uri/ /index.php?$args;
    }
    
    location ~ \.php$ {
        try_files $uri =404;
        fastcgi_pass unix:/run/php/php8.x-fpm.sock;
        # ... fastcgi_params include
    }
    

    If the FastCGI block somehow fails to handle /index.php (wrong socket, missing fastcgi_params), Nginx falls back to its own location / which redirects to /index.php?$args, which loops.

    Fix: make sure the PHP location block actually matches and dispatches. The error log will name the file Nginx is looping on; trace why that location block didn't catch it.
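
    For reference, a minimal known-good shape for the PHP block; the socket path is the usual Debian/Ubuntu default, so verify yours before copying:

    location ~ \.php$ {
        try_files $uri =404;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php/php8.x-fpm.sock;
    }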

    Other Nginx-internal 500 causes:

    • server_name-less default server returning 500 — a default_server block with no body.
    • Bad error_page recursion — error_page 500 = /500.html; where /500.html itself errors. Use error_page 500 = @fallback; with a tiny named location returning a static body (sketch after this list).
    • if directive misuse — Nginx's if is famously sharp-edged; nesting it or using it in location blocks can produce 500s in edge cases. The general rule is "if is evil" — restructure with map or try_files instead.
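
    A sketch of the named-location fallback from the list above; the error-page directory is illustrative, and the point is that nothing inside @fallback can error again:

    error_page 500 502 503 504 = @fallback;

    location @fallback {
        # static file only: no upstream, no rewrites, nothing that can recurse
        root /var/www/error-pages;
        try_files /500.html =500;
    }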

    Always run sudo nginx -t after a config change. If it passes but you still get 500, run sudo nginx -T | less to see the fully merged config — the answer is usually a duplicate or shadowed directive somewhere.


    Step 6 — Resource exhaustion

    Under load, a 500 can mean Nginx itself ran out of room. Two common triggers:

    worker_connections are not enough

    [alert] ... 4096 worker_connections are not enough
    

    Each Nginx worker has a connection budget shared by client connections and upstream connections. Raise it:

    events {
        worker_connections 10240;   # was 4096
    }
    

    Plus the matching system limit:

    # As root or via /etc/security/limits.d/
    ulimit -n 65535
    

    Or persistently in the systemd unit drop-in:

    # /etc/systemd/system/nginx.service.d/limits.conf
    [Service]
    LimitNOFILE=65535
    

    sudo systemctl daemon-reload && sudo systemctl restart nginx.

    Too many open files

    [crit] ... open() "/var/www/..." failed (24: Too many open files)
    

    The Nginx process has hit its file-descriptor ceiling. Same fix — raise LimitNOFILE for the systemd unit and worker_rlimit_nofile:

    worker_rlimit_nofile 65535;
    

    If only a few endpoints exhaust descriptors, the actual root cause is usually a runaway connection or a leak in the upstream — fix that, don't just keep raising the limit.
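
    To see how close the process is to its ceiling, assuming the pid file is at the usual /run/nginx.pid:

    pid=$(cat /run/nginx.pid)
    sudo ls /proc/$pid/fd | wc -l              # descriptors currently in use
    sudo grep 'open files' /proc/$pid/limits   # the current ceiling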


    Step 7 — Cache, temp, and write directories

    Nginx writes to several directories at runtime: client_body_temp_path, proxy_temp_path, fastcgi_temp_path, proxy_cache_path, etc. If any of those is missing or unwritable by the worker user, you can get 500s on requests that hit them:

    [crit] ... open() "/var/lib/nginx/tmp/proxy/..." failed (13: Permission denied)
    

    Fix:

    sudo chown -R www-data:www-data /var/lib/nginx
    sudo chmod -R u+rwX,g+rX /var/lib/nginx
    

    Or, on RHEL/CentOS/Rocky/Alma where Nginx runs as nginx:

    sudo chown -R nginx:nginx /var/lib/nginx
    

    If proxy_cache_path was added or moved recently, make sure the directory exists and SELinux context is correct:

    sudo mkdir -p /var/cache/nginx/proxy
    sudo chown nginx:nginx /var/cache/nginx/proxy
    sudo restorecon -Rv /var/cache/nginx        # SELinux only
    

    Operational tips

    • Add the upstream fields to your access log permanently. $upstream_status, $upstream_response_time, and $upstream_connect_time are the difference between "the app is slow" and "the app is broken" being a five-second decision instead of a five-minute one.
    • Surface a request ID. Add $request_id to the log format and to a custom error page (error_page 500 /500.html; with $request_id rendered into it). When a user reports "I got a 500", you can find the exact log line in seconds (a sketch follows this list).
    • Don't error_page 500 = 200 to make 500s look like success. Surprisingly common in panic. It hides outages from monitoring and breaks API consumers that branch on status.
    • display_errors = Off in PHP, always. Stack traces in HTTP responses leak file paths, library versions, and sometimes credentials. Log them server-side; show a generic 500 page to the client.
    • Set proxy_intercept_errors on; deliberately. It lets Nginx replace upstream 5xx with its own error_page content — useful for branding, dangerous because it can mask the upstream's own response (including useful headers).
    • A 500 immediately after a deploy is almost always config or dependency. Compare to the previous deploy: missing extension, changed env var, wrong file owner, fresh permission denied on a cache directory. The diff is small; don't go on a wild hunt.
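
    A sketch of the request-ID tip from the list above. sub_filter requires the standard sub module (compiled into most distro packages), and the __REQUEST_ID__ placeholder in the static page is illustrative:

    # tag every response and every log line with the same ID
    add_header X-Request-ID $request_id always;

    log_format with_id '$remote_addr "$request" $status rid=$request_id '
                       'urt=$upstream_response_time';
    access_log /var/log/nginx/access.log with_id;

    error_page 500 /500.html;
    location = /500.html {
        internal;
        # render the ID into a __REQUEST_ID__ placeholder in the static page
        sub_filter '__REQUEST_ID__' $request_id;
        sub_filter_once on;
    }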

    Catch 500s before users do

    The painful failure mode for a 500 is not "I saw it during testing" — it's "an entire endpoint has been returning 500 for forty minutes and the first report came from a customer". HTTP monitoring exists to close that gap.

    Wire status checks into your monitoring on the URLs that matter — not just /, which usually keeps serving cached HTML even when half the API is broken:

    • For each high-value path (/, /login, /api/health, key product pages), check that the status is 200 and (for HTML) that an expected keyword is present in the body. This catches a "soft 500" where the server returns 200 with an error page, which is more common than people think.
    • For API endpoints, parse the JSON and assert on a field, not just on the status code. A 200 with {"error": "internal"} is functionally a 500.
    • Alert on the first failure from one location and the first failure from any two locations simultaneously — different thresholds for different failure shapes.
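
    The JSON assertion in manual form; the endpoint and field are illustrative, and jq is assumed to be installed:

    # exits non-zero (and prints FAILING) unless the health field says ok
    curl -fsS https://example.com/api/health | jq -e '.status == "ok"' >/dev/null \
      && echo healthy || echo FAILING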

    Xitoring's website monitoring runs HTTP/HTTPS checks from multiple regions, supports keyword matching and JSON assertions, and pages on the first failure. Pair it with the Nginx integration and server monitoring on the same host so a 500 spike is visible and you can see immediately whether the host's CPU, memory, or PHP-FPM process count moved at the same moment. That correlation usually answers "is this the app, the web server, or the host?" before anyone has to SSH in.

    If you also operate the Nginx process itself, the 403 article in this series covers the permissions side of 5xx-adjacent failures: How to fix the 403 Forbidden error in Nginx.


    Summary

    For a 500 Internal Server Error in Nginx, work through this order:

    1. Read /var/log/nginx/error.log. The first line tells you whether the 500 is upstream or Nginx-internal.
    2. Decide upstream vs internal. Add $upstream_status + $upstream_response_time to the access log if not already there. Non-zero upstream response time + $upstream_status = 500 ⇒ fix the app, not Nginx.
    3. Read the app's own log. PHP-FPM, WordPress, Laravel, gunicorn, PM2 — they all have their own log files with the actual stack trace. Nginx only echoes a fragment.
    4. Check for upstream crashes. dmesg | grep -iE 'segfault|killed process' and journalctl -u <upstream-service> for OOM kills, segfaults, and request timeouts.
    5. Check buffers if the error mentions "too big header". Increase proxy_buffer_size / fastcgi_buffer_size.
    6. For Nginx-internal 500s, look for rewrite loops, bad error_page, or if blocks doing too much. sudo nginx -T to see the merged config.
    7. Check resource limits if the error mentions worker_connections or Too many open files — raise worker_connections, worker_rlimit_nofile, and the systemd LimitNOFILE.
    8. Verify cache/temp/write directories are writable by the worker user; restore SELinux context if recently moved.

    A 500 is almost always a symptom of an unhandled error one layer below Nginx. The discipline that pays off is reading the right log first instead of editing nginx.conf and hoping. With upstream fields in the access log and the application's own log open in another window, most 500s become a five-minute fix instead of a forty-minute hunt — and with continuous HTTP monitoring on the URLs that matter, the next outage is caught before the support inbox notices.