# Production Error Debugging Guide

> **CRITICAL: Never run Symfony console commands as `root` on production!**
>
> Running `sudo php bin/console cache:clear --env=prod` creates cache files owned by `root`.
> Apache runs as `www-data` and **cannot write** to those files, so every request
> fails with *"Unable to write in the cache directory"*. This takes the site down immediately:
> when it happened to us, load average spiked to 60+ on a 2-core VM with 72 stuck Apache workers.
>
> **ALSO: Never use `cache:clear` — use `cache:warmup` instead!**
>
> `cache:clear` deletes `var/cache/prod/` before rebuilding. During that gap, Doctrine proxy
> class hashes change, which invalidates all serialized security tokens in user sessions →
> **every logged-in user gets logged out**. `cache:warmup` writes new cache files alongside
> the old ones, so sessions stay valid.
>
> **Always use:**
> ```bash
> sudo -u www-data php bin/console cache:warmup --env=prod
> ```
>
> **Never use:**
> ```bash
> # sudo -u www-data php bin/console cache:clear --env=prod   ← CAUSES MASS LOGOUT
> # sudo php bin/console cache:clear --env=prod               ← CAUSES MASS LOGOUT + PERMISSION BREAK
> ```
>
> **If you already broke permissions**, fix with:
> ```bash
> sudo chown -R www-data:www-data /var/www/html/academia_v2/var/cache /var/www/html/academia_v2/var/log
> sudo chmod -R 775 /var/www/html/academia_v2/var/cache /var/www/html/academia_v2/var/log
> sudo systemctl restart apache2
> ```
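
Before restarting Apache, it can help to confirm that nothing under `var/cache` or `var/log` is still owned by another user. The following is a minimal sketch, assuming a `find` that supports `-user` (GNU, BSD, and BusyBox do); the helper name `find_foreign_owned` is illustrative and not part of the project.

```bash
# List every file or directory under the given paths that is NOT owned by
# the expected user. Prints nothing when ownership is consistent.
# Usage: find_foreign_owned <user-or-uid> <path>...
find_foreign_owned() {
  expected_user="$1"
  shift
  find "$@" ! -user "$expected_user" -print
}
```

On the server this would be called as `find_foreign_owned www-data /var/www/html/academia_v2/var/cache /var/www/html/academia_v2/var/log`. Any output means the `chown` above still needs to run.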

---

## CRITICAL: Never use `doctrine:schema:update --force` on Production

> **This command DELETES tables that don't have corresponding Doctrine entities!**
>
> We discovered this on 2026-03-08 when the deploy workflow had a fallback that ran
> `doctrine:schema:update --force` if migrations failed. It deleted `playground_tier_config`
> and `app_setting` tables because they were created via raw SQL migrations without
> backing Doctrine entities.
>
> **What happened:**
> 1. Migration failed (connectivity or other transient issue)
> 2. Deploy workflow ran `doctrine:schema:update --force` as fallback
> 3. Command compared DB schema to entity mappings
> 4. Tables without entities → **DROPPED**
>
> **Tables affected:**
> - `playground_tier_config` — tier pricing configuration
> - `app_setting` — key-value feature settings
>
> **Fix applied:**
> - Removed the destructive fallback from `.github/workflows/deploy-prod.yml`
> - If migrations fail, deploy now logs a warning but does NOT run schema:update
>
> **Never use:**
> ```bash
> # php bin/console doctrine:schema:update --force --env=prod   ← DESTROYS UNMAPPED TABLES
> ```
>
> **Always use migrations:**
> ```bash
> sudo -u www-data php bin/console doctrine:migrations:migrate --no-interaction --env=prod
> ```
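
To see which tables are at risk without touching the database, the schema tool's `--dump-sql` option prints the statements instead of executing them. The sketch below filters that output for `DROP TABLE` statements; `list_dropped_tables` is a hypothetical helper name, and the pattern only covers simple unquoted or backtick-quoted table names.

```bash
# Read SQL on stdin and print the name of every table that a
# DROP TABLE statement would remove, one per line, de-duplicated.
list_dropped_tables() {
  grep -oiE 'DROP[[:space:]]+TABLE[[:space:]]+[`"]?[A-Za-z0-9_]+' \
    | awk '{ print $NF }' \
    | tr -d '`"' \
    | sort -u
}
```

A read-only preview on the server would then look like `sudo -u www-data php bin/console doctrine:schema:update --dump-sql --env=prod | list_dropped_tables`. An empty result means no unmapped tables would be dropped.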

---

## Server Access

### SSH Connection (from Windows PowerShell)

```powershell
cd C:\Users\shadisaleh\Documents\linux
ssh -i shamramain_user.pem azureuser@20.241.4.71
```

### Server Details

| Resource          | Host / Address              | Notes                                                            |
|-------------------|-----------------------------|------------------------------------------------------------------|
| **App Server**    | `20.241.4.71`               | Azure VM, SSH via PEM key                                        |
| **Elasticsearch** | `20.106.250.185:9200`       | Hostname: `shamraindex`, HTTPS + Basic Auth                      |
| **MySQL**         | `20.236.64.82:3306`         | Database: `academia_v2_prod2`                                    |
| **App Path**      | `/var/www/html/academia_v2` | Symfony project root                                             |
| **Domain**        | `shamra-academia.com`       |                                                                  |
| **Flask App**     | `127.0.0.1:5000`            | Gunicorn/Flask, systemd: `flaskapp.service` (currently disabled) |

### Flask App (academia_scripts)

- **Path**: `/home/azureuser/aous/academia_scripts/FlaskWebApp/`
- **Service**: `flaskapp.service` — currently **stopped & disabled**
- **Manage**:
  ```bash
  sudo systemctl start flaskapp.service    # start
  sudo systemctl stop flaskapp.service     # stop
  sudo systemctl enable flaskapp.service   # auto-start on boot
  sudo systemctl disable flaskapp.service  # prevent auto-start
  ```

---

## Log Files

### Location

```
/var/www/html/academia_v2/var/log/prod.log
```

### Quick Log Overview

```bash
# Count entries by severity
grep -c "\.CRITICAL:" /var/www/html/academia_v2/var/log/prod.log
grep -c "\.ERROR:" /var/www/html/academia_v2/var/log/prod.log
grep -c "\.WARNING:" /var/www/html/academia_v2/var/log/prod.log
```
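
The three `grep -c` calls each read the whole file. As an alternative, the sketch below counts every level in a single pass with `awk`. The helper name `count_levels` is illustrative, and it assumes the standard Monolog level names and the `CHANNEL.LEVEL:` layout used in this log.

```bash
# Print "<count> <LEVEL>" for every log level found in the file,
# highest count first. Reads the file once.
# Usage: count_levels <log-file>
count_levels() {
  awk '
    match($0, /\.(DEBUG|INFO|NOTICE|WARNING|ERROR|CRITICAL|ALERT|EMERGENCY):/) {
      # RSTART points at the dot; strip the dot and the trailing colon.
      level = substr($0, RSTART + 1, RLENGTH - 2)
      counts[level]++
    }
    END { for (level in counts) print counts[level], level }
  ' "$1" | sort -rn
}
```

Usage: `count_levels /var/www/html/academia_v2/var/log/prod.log`.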

### Get Unique Error Types with Counts (Most Useful)

```bash
# Top CRITICAL errors by unique message.
# The sed pattern is anchored to the leading [timestamp]; a greedy 's/\[.*\] //'
# would also swallow the message on lines that end in "[] []".
grep "\.CRITICAL:" /var/www/html/academia_v2/var/log/prod.log \
  | sed 's/^\[[^]]*\] //' \
  | cut -d'{' -f1 \
  | sort | uniq -c | sort -rn | head -20

# Top ERROR entries by unique message
grep "\.ERROR:" /var/www/html/academia_v2/var/log/prod.log \
  | sed 's/^\[[^]]*\] //' \
  | cut -d'{' -f1 \
  | sort | uniq -c | sort -rn | head -20
```
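
Messages that embed IDs or counts show up as separate rows in these counts. The sketch below collapses every run of digits to `N` before counting, so variants of one message group together. `top_messages` is a hypothetical helper name, and the normalization is deliberately crude: it also rewrites digits in names such as `UTF-8`.

```bash
# Print the most frequent messages for one level, with digit runs
# replaced by "N" so that messages differing only in IDs are merged.
# Usage: top_messages <log-file> <LEVEL> <max-rows>
top_messages() {
  grep -F ".$2:" "$1" \
    | sed -e 's/^\[[^]]*\] //' \
          -e 's/ {.*//' \
          -e 's/ \[\] \[\]$//' \
          -e 's/[0-9][0-9]*/N/g' \
    | sort | uniq -c | sort -rn | head -n "$3"
}
```

Usage: `top_messages /var/www/html/academia_v2/var/log/prod.log CRITICAL 20`.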

### Filter by Date Range

```bash
# Errors from a specific date
grep "2026-02-19" /var/www/html/academia_v2/var/log/prod.log | grep "\.ERROR:" | tail -20

# Errors logged so far in the current clock hour (assumes log timestamps are in UTC)
grep "$(date -u +%Y-%m-%dT%H)" /var/www/html/academia_v2/var/log/prod.log | grep -c "\.ERROR:"
```
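
The prefix-based greps above cover one day or one clock hour. For an arbitrary window, ISO 8601 timestamps can be compared as plain strings. The sketch below relies on that. `errors_between` is an illustrative name, and it assumes every line starts with `[TIMESTAMP]` and that all timestamps share the same timezone offset.

```bash
# Print ERROR lines whose timestamp falls in [start, end).
# start and end are ISO 8601 prefixes, e.g. 2026-02-19T02 or 2026-02-19T02:30.
# Usage: errors_between <log-file> <start> <end>
errors_between() {
  awk -v start="$2" -v end="$3" '
    /\.ERROR:/ {
      ts = substr($1, 2)     # first field without the leading "["
      sub(/\]$/, "", ts)     # drop the trailing "]"
      if (ts >= start && ts < end) print
    }
  ' "$1"
}
```

Usage: `errors_between /var/www/html/academia_v2/var/log/prod.log 2026-02-19T01:30 2026-02-19T03:00 | tail -20`.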

### Filter by Specific Error

```bash
# Search for a specific error message
grep "Malformed UTF-8" /var/www/html/academia_v2/var/log/prod.log | wc -l

# Get full context of an error (with stack trace)
grep -A 5 "CRITICAL" /var/www/html/academia_v2/var/log/prod.log | head -50
```

### Tail Live Errors

```bash
sudo tail -f /var/www/html/academia_v2/var/log/prod.log | grep --line-buffered "CRITICAL\|ERROR"
```

---

## Log Format Reference

Symfony Monolog writes entries in this format:
```
[TIMESTAMP] CHANNEL.LEVEL: MESSAGE {CONTEXT} {EXTRA}
```

### Channels

| Channel.Level      | Source                                    |
|--------------------|-------------------------------------------|
| `app.ERROR`        | Application code (`$logger->error(...)`)  |
| `app.CRITICAL`     | Application critical errors               |
| `request.ERROR`    | Symfony HTTP kernel (404s, routing, etc.) |
| `doctrine.WARNING` | Doctrine ORM warnings                     |
| `security.INFO`    | Authentication / authorization events     |
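
To see which channel and level combinations actually occur in a log, the prefix of each line can be extracted and counted. This is a sketch assuming the `[TIMESTAMP] CHANNEL.LEVEL: MESSAGE` layout shown above; `count_by_channel` is a hypothetical helper name.

```bash
# Print "<count> <channel>.<LEVEL>" for every combination in the file,
# most frequent first. Lines that do not match the layout are skipped.
# Usage: count_by_channel <log-file>
count_by_channel() {
  sed -n 's/^\[[^]]*\] \([A-Za-z0-9_]*\)\.\([A-Z]*\): .*/\1.\2/p' "$1" \
    | sort | uniq -c | sort -rn
}
```

Usage: `count_by_channel /var/www/html/academia_v2/var/log/prod.log`.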

### Key Insight: Errors Without Stack Traces

When you see errors like:
```
[2026-02-19T01:54:39] app.ERROR: Malformed UTF-8 characters, possibly incorrectly encoded [] []
```
The empty `[] []` at the end means the context and extra fields are both empty, so **no stack trace was logged**. This happens when code catches an exception and logs only the message:
```php
catch (\Exception $e) {
    $logger->error($e->getMessage());  // No stack trace!
}
```

**To find the source**: Search the codebase for `$logger->error($e->getMessage())` or similar catch-and-log patterns. The error message itself is the clue.
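
That search can be scripted with a recursive `grep`. The sketch below looks for logger calls whose only argument is an exception's `getMessage()`. `find_message_only_logging` is an illustrative name, the pattern covers only the `error`, `critical`, and `warning` methods, and it assumes a `grep` that supports `-r` and `--include` (GNU and BSD do).

```bash
# List PHP lines that log only $e->getMessage() with no context array,
# printed as file:line:code.
# Usage: find_message_only_logging <source-dir>
find_message_only_logging() {
  grep -rnE --include='*.php' \
    -e '->(error|critical|warning)\(\$[A-Za-z_]+->getMessage\(\)\)' \
    "$1"
}
```

Usage from the project root: `find_message_only_logging src/`.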

---

## Common Error Categories & How to Fix

### 1. Dependency Injection / Service Not Found (CRITICAL)

**Pattern**: `Cannot autowire ... argument "$service"` or `You have requested a non-existent service`

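**Cause**: Controller or service uses `$this->get()` or `$this->container->get()`. This no longer works in Symfony 7.x: the `get()` shortcut was removed from `AbstractController` in Symfony 6.0, and private services cannot be fetched from the container directly.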

**Fix**: Inject services via constructor or method parameters:
```php
// Before (broken)
$service = $this->container->get(MyService::class);

// After (correct)
public function myAction(MyService $service): Response
```

### 2. Entity Not Resolved from Route Parameter (CRITICAL)

**Pattern**: `Could not resolve argument ... no value was provided`

**Cause**: Doctrine can't automatically map a route `{slug}` parameter to an entity when the property name doesn't match.

**Fix**: Add `#[MapEntity]` attribute:
```php
use Symfony\Bridge\Doctrine\Attribute\MapEntity;

public function show(#[MapEntity(mapping: ['slug' => 'slug'])] University $publisher): Response
```

### 3. setParameters Expects ArrayCollection (CRITICAL)

**Pattern**: `setParameters() ... must be ... ArrayCollection, array given`

**Cause**: Doctrine `setParameters()` changed to require `ArrayCollection` instead of plain arrays.

**Fix**:
```php
use Doctrine\Common\Collections\ArrayCollection;

// Before
->setParameters(['param' => $value])

// After
->setParameters(new ArrayCollection(['param' => $value]))
```

### 4. Validator Constraint Syntax (CRITICAL)

**Pattern**: `Since symfony/validator 7.1: Passing an array of options ... is deprecated`

**Cause**: Symfony 7.x validator constraints deprecate the options array in favor of named arguments.

**Fix**:
```php
// Before
new NotBlank(['message' => 'Please enter...'])

// After  
new NotBlank(message: 'Please enter...')
```

### 5. Null Entity Access (CRITICAL)

**Pattern**: `Call to a member function getXxx() on null`

**Cause**: Database query returned null but code assumes entity exists.

**Fix**: Add null guard:
```php
$entity = $repository->find($id);
if (!$entity) {
    throw $this->createNotFoundException();
}
```

### 6. Malformed UTF-8 Characters (ERROR)

**Pattern**: `Malformed UTF-8 characters, possibly incorrectly encoded`

**Cause**: Database connection not using utf8mb4 charset, or data stored with mixed encodings.

**Fix** (multi-layered):
1. **DATABASE_URL** in `.env.local`: add `&charset=utf8mb4` to connection string
2. **doctrine.yaml**: set `charset: utf8mb4` and `default_table_options`
3. **Code-level**: use `mb_convert_encoding($string, 'UTF-8', 'UTF-8')` to sanitize input
4. **JSON encoding**: use `JSON_INVALID_UTF8_SUBSTITUTE` flag

### 7. Too Many MySQL Connections (CRITICAL)

**Pattern**: `SQLSTATE[HY000] [1040] Too many connections`

**Cause**: MySQL `max_connections` limit exceeded.

**Fix** (infrastructure):
```sql
-- Check current limit
SHOW VARIABLES LIKE 'max_connections';
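-- Increase if needed (takes effect immediately, but is lost on a MySQL restart
-- unless the value is also set in the server configuration)
SET GLOBAL max_connections = 200;
```
Also check for connection leaks: long-running processes (workers, console commands) that never close their Doctrine connection, for example with `$connection->close()`.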

### 8. Route Not Found / 404s (ERROR)

**Pattern**: `No route found for "GET /wp-login.php"`

**Cause**: Bot/scanner traffic hitting non-existent WordPress endpoints. Not a real bug.

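**Fix**: Block at the web-server level. Production runs Apache, where the equivalent rules live in `/etc/apache2/sites-enabled/academia2.conf` (see "Bot 404 Noise Suppression"). The snippet below is the nginx form, for setups where nginx sits in front of the app: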
```nginx
location ~* ^/(wp-login|wp-admin|wp-includes|xmlrpc) {
    return 444;
}
```

---

## Post-Fix Operations

### Warm Cache (safe — no logouts)

```bash
# On the prod server — ALWAYS use cache:warmup, NEVER cache:clear
cd /var/www/html/academia_v2
sudo -u www-data php bin/console cache:warmup --env=prod
```

> **WARNING**: Do NOT use `cache:clear` on production. It deletes the cache directory,
> which changes Doctrine proxy class hashes and invalidates all user sessions → mass logout.
> The deploy workflow (`deploy-prod.yml`) already uses `cache:warmup` for this reason.
> There is no need to run any manual cache command after a deploy.

### Fix Cache Permissions (if needed)

```bash
sudo chown -R www-data:www-data /var/www/html/academia_v2/var/cache
sudo chmod -R 775 /var/www/html/academia_v2/var/cache
```

### Verify Fix is Working

```bash
# Truncate the log to start fresh monitoring
sudo truncate -s 0 /var/www/html/academia_v2/var/log/prod.log

# Wait some time, then check for the specific error again
sleep 300
grep -c "YOUR_ERROR_MESSAGE" /var/www/html/academia_v2/var/log/prod.log
```
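
Truncating the log discards history that may be needed later. A non-destructive alternative is to remember the current line count and inspect only the lines written after it. The sketch below assumes the log is not rotated or truncated between the two calls; `mark_log` and `count_since` are hypothetical helper names.

```bash
# Print the current number of lines in the log (the "mark").
# Usage: mark_log <log-file>
mark_log() {
  wc -l < "$1" | tr -d ' '
}

# Count lines containing a fixed string that were appended after the mark.
# Prints 0 when there are none.
# Usage: count_since <log-file> <mark> <fixed-string>
count_since() {
  tail -n "+$(( $2 + 1 ))" "$1" | grep -cF -- "$3" || true
}
```

Usage: run `mark=$(mark_log /var/www/html/academia_v2/var/log/prod.log)` right after the fix, wait, then run `count_since /var/www/html/academia_v2/var/log/prod.log "$mark" "YOUR_ERROR_MESSAGE"`.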

---

## Running SSH Commands from Windows PowerShell

### Escaping Gotchas

PowerShell has issues with special characters in SSH commands:

| Character | Problem                     | Workaround                          |
|-----------|-----------------------------|-------------------------------------|
| `&`       | PowerShell treats as operator | Use `\x26` hex escape in sed       |
| `$`       | Variable interpolation       | Use single quotes, or escape as `` `$ `` (PowerShell's escape character is the backtick) |
| `\r`      | Windows CRLF in piped scripts | Avoid piping files; use inline cmds |

**Example — sed with ampersand**:
```powershell
ssh -i shamramain_user.pem azureuser@20.241.4.71 "sudo sed -i 's/serverVersion=8.0/serverVersion=8.0\x26charset=utf8mb4/' /var/www/html/academia_v2/.env.local"
```

### Reading Remote Config

```powershell
# Check .env.local
ssh -i shamramain_user.pem azureuser@20.241.4.71 "sudo grep DATABASE_URL /var/www/html/academia_v2/.env.local"

# Check a Symfony config file
ssh -i shamramain_user.pem azureuser@20.241.4.71 "cat /var/www/html/academia_v2/config/packages/doctrine.yaml"
```

---

## Key Config Files

| File                              | Purpose                              |
|-----------------------------------|--------------------------------------|
| `.env.local`                      | Production env vars (DATABASE_URL)   |
| `config/packages/doctrine.yaml`   | Doctrine DBAL & ORM config           |
| `config/packages/monolog.yaml`    | Logging configuration                |
| `config/services.yaml`            | Service container definitions        |
| `config/routes.yaml`              | Route definitions                    |

---

## Bot 404 Noise Suppression

Bots and scanners generate thousands of 404 errors that drown out real issues. Before suppression they made up 83% of all ERROR entries.

**Two layers of protection:**

1. **Apache rules** (in `/etc/apache2/sites-enabled/academia2.conf`) — blocks WordPress scanners (`/wp-login`, `/wp-admin`, `/wp-content`, `/xmlrpc.php`), security probes (`/.env`, `/.git`, `/phpmyadmin`), and browser probes (`/.well-known/*`) before PHP runs. Zero overhead.

2. **Symfony `Bot404Listener`** (`src/EventListener/Bot404Listener.php`) — catches remaining junk 404s, returns plain-text 404, logs at DEBUG (invisible in prod.log). Has two arrays to customize: `BLOCKED_PREFIXES` and `BLOCKED_EXTENSIONS`.

**Design choice:** Static asset extensions (`.css`, `.js`, `.jpg`, fonts) are NOT suppressed — those 404s stay visible so broken asset references on our own pages are caught.
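
To check whether the suppression still holds, the share of route-not-found entries among all ERROR lines can be recomputed from the log. This is a rough sketch that treats every ERROR line containing `No route found` as 404 noise; `not_found_share` is an illustrative name.

```bash
# Print the integer percentage of ERROR lines that are "No route found"
# entries. Prints 0 when the log contains no ERROR lines.
# Usage: not_found_share <log-file>
not_found_share() {
  awk '
    /\.ERROR:/ {
      total++
      if (index($0, "No route found")) hits++
    }
    END {
      if (total == 0) { print "0"; exit }
      printf "%d\n", (hits * 100) / total
    }
  ' "$1"
}
```

Usage: `not_found_share /var/www/html/academia_v2/var/log/prod.log`. A value creeping back up suggests new probe paths that the Apache rules or `BLOCKED_PREFIXES` do not cover yet.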

---

## Debugging Workflow Summary

```
1. SSH into server
2. Check log severity counts (CRITICAL → ERROR → WARNING)
3. Get unique error messages with counts (sort by frequency)
4. For each error:
   a. Read the full log line + stack trace
   b. Identify the channel (app vs request vs doctrine)
   c. Find the source file/line in the codebase
   d. Apply the fix locally
   e. Push & deploy (or hotfix on server)
   f. Warm cache with cache:warmup if hotfixing on the server (never cache:clear; deploys warm it automatically)
   g. Monitor logs to confirm fix
5. Move to next error
```

---

## Error Audit Log

### 2026-03-05 — Full Log Audit (logs reset after)

**Totals before reset:** 15,016 CRITICAL / 81,202 ERROR (accumulated over weeks)

#### CRITICAL errors (by frequency)

| Count | Error | Status |
|-------|-------|--------|
| 6,616 | `Unknown "string" filter` in `all.html.twig:80` | **Fixed previously** — 0 today |
| 2,084 | `CommentBundle/Resources/views` directory does not exist | **Fixed previously** — 0 today |
| 1,836 | `Property App\Entity\User::$follower does not exist` (ReflectionException) | **Fixed previously** — 0 today |
| 976 | `ResearchController::showAction` argument could not be resolved (`$profilePostRepository`, `$readHistoryService`, `$relatedResearchService`, `$trendingResearchService`) | **Stale cache** — resolved by cache rebuild at 19:37 UTC, 0 errors after |
| 816 | YAML `ParseException` — colon in unquoted mapping value in translations | **Fixed previously** — 0 today |
| 522 | `No registered paths for namespace "Comment"` in `show.html.twig:825` | **Fixed previously** — 0 today |
| 308 | `Too many connections` (MySQL) | 0 today — intermittent under load |
| 212 | `setParameters()` expects `ArrayCollection`, array given (`ResearchRepository:503`) | Stale — needs fix if recurring |
| 166 | `Expr::quoteLiteral()` — `Publisher` entity passed instead of string | Stale — needs fix if recurring |
| 90 | `TrendingResearchService::__construct()` — wrong arg type (logger vs cache) | **Stale cache issue** — 0 today |
| 88 | `fos_user_security_login` route not found | Stale — legacy route reference |
| 48 | `social_feed` route not found | Stale — legacy route reference |
| 44 | `Unknown "string" filter` in `all.html.twig:96` | **Fixed previously** — 0 today |
| 42 | `User::setUsername()` — null given | Edge case in profile update |
| 36 | Failed opening cache container file | **Stale cache** — resolved |
| 34 | `CommunityMember::getDeleted()` undefined method | Stale code reference |
| **2** | **`User::setStudyField()` — null given** (`SettingsController.php:196`) | **ACTIVE BUG — needs fix** |

#### ERROR errors (by frequency)

| Count | Error | Status |
|-------|-------|--------|
| 5,898 | `RelatedResearch MLT failed: Malformed UTF-8` | **Known/ongoing** — bad encoding in ES data, not critical |
| 2,998 | Empty error messages `[] []` | Noise from bare `$logger->error()` calls |
| 1,432 | `/favicon.ico/` route not found | Bot/browser noise |
| 776 | `/uploads/publishers/logo79ed5...jpeg` not found | **Missing publisher logo file** — referenced from research show pages |
| ~2,000 | Various font/static asset 404s (`.woff`, `.woff2`, `.ttf`) | Static assets served with trailing slash — Apache misconfiguration or relative path issue |
| 414 | Cache file `Failed to open stream` (today only) | **Stale cache** — resolved by rebuild |
| 286 | `show.notFound.university` | Users/bots hitting deleted university pages |
| 240 | `getBot404ListenerService.php` missing | **Stale cache** — resolved |
| 176 | `RelatedResearch MLT failed` (no message) | Same as UTF-8 issue above |
| 128 | `/ip/` route not found | Bot probe |
| 118 | `/apple-touch-icon.png` not found | Browser probe — add file or Apache redirect |
| 106 | `RelatedResearch MLT failed: Syntax error` | Empty query hitting ES |

#### Active issues needing fix

1. **`setStudyField()` null bug** — `SettingsController.php:196` passes null to `User::setStudyField()` which has a non-nullable `Field` type hint. Fix: make param nullable or add null guard in controller.
2. **Missing publisher logo** — `logo79ed5af8a95659f386b3feb06a340589.jpeg` referenced but doesn't exist in `/uploads/publishers/`. Either restore the file or fix the DB record.
3. **`ResearchRepository:503` ArrayCollection** — `setParameters()` still using plain array. Fix: wrap in `new ArrayCollection()`.

#### Resolved by cache rebuild

The deploy on 2026-03-05 left a stale cache (`Container5isxXG5`) that caused ~1,400 errors. Cache was rebuilt at 19:37 UTC and all argument-resolution and container-file errors stopped immediately.

**Logs were reset (truncated) after this audit.**
