# CLAUDE.md — Session Reference

> Read this at the start of every session. It contains critical context about the project.

---

## Project Overview

**Shamra Academia** / **شمرا أكاديميا** (`shamra-academia.com`) — Arabic-first academic research platform with AI-powered tools (writing assistant, translation, OCR, research planning). Symfony 7+ / PHP 8.2+ / Doctrine ORM / MariaDB.

### Branding

- **English**: Shamra Academia
- **Arabic**: شمرا أكاديميا
- Always use the full brand name in user-facing text (emails, UI, subjects, meta tags).
- Domain: `shamra-academia.com`

---

## Server & Infrastructure

| Resource | Details |
|---|---|
| **App Server** | Azure VM `20.241.4.71`, SSH via PEM key |
| **SSH Command** | `cd C:\Users\shadisaleh\Documents\linux` then `ssh -i shamramain_user.pem azureuser@20.241.4.71` |
| **MySQL Server** | Separate VM `20.236.64.82:3306`, DB: `academia_v2_prod2` |
| **MySQL SSH** | `cd C:\Users\shadisaleh\Documents\linux` then `ssh -i ubuntudev1keypair.pem ubuntu@20.236.64.82` |
| **App Path (prod)** | `/var/www/html/academia_v2` |
| **Domain** | `shamra-academia.com` |
| **Web Server** | Apache, runs as `www-data` |
| **Messenger Worker** | `shamra-messenger-worker.service` (single worker, consumes `async` transport via Doctrine) |
| **Pandoc** | `/usr/bin/pandoc` (for DOCX/PDF conversion) |
| **Elasticsearch** | `https://shamraindex:9200` (VM `20.106.250.185`), user: `elastic`, cluster: `shamra-academia` |
| **Redis** | `localhost:6379` on App Server, used for Symfony cache (search results, homepage data) |

### Elasticsearch

- Single-node cluster on dedicated ES VM (`20.106.250.185:9200`, hostname `shamraindex`)
- Auth: `elastic` (password stored on ES server in `/root/es_elastic_password.txt`)
- TLS: HTTPS enabled; cert CN/SAN uses `shamraindex` (prefer hostname over raw IP)
- Indices: `arabic_research`, `english_research`, `ai_study`, `user_reference_chunks`
- Health check: `curl -s -u elastic:<ES_PASSWORD> 'https://shamraindex:9200/_cluster/health?pretty'`
- Node stats: `curl -s -u elastic:<ES_PASSWORD> 'https://shamraindex:9200/_cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m,disk.used_percent'`
- **Query timeout**: 5s (configured in `Elasticsearch.php`), connection timeout: 2s, request timeout: 10s
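
For scripted checks it can help to reduce the health response to its `status` value. The helper below is a minimal sketch: `es_health_status` is a hypothetical name, and the sample JSON only imitates the shape of a `_cluster/health` response.

```bash
# Extract the "status" value (green / yellow / red) from cluster health JSON on stdin.
# Handles both the ?pretty layout and compact single-line JSON.
es_health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Illustrative sample input, not captured from the real cluster:
printf '{\n  "cluster_name" : "shamra-academia",\n  "status" : "green",\n  "timed_out" : false\n}\n' \
  | es_health_status
```

Against the real cluster, pipe the health check `curl` command above into `es_health_status`.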

### Redis (Cache Layer)

- Installed on App Server (`localhost:6379`)
- Used for Symfony app cache (search results, homepage data, user queries)
- Config: `config/packages/cache.yaml` → `cache.adapter.redis`
- PHP extension: `php8.4-redis`

```bash
# Check Redis status
redis-cli ping                          # Should return PONG
systemctl is-active redis-server        # Should return active

# Monitor Redis in real-time
redis-cli monitor

# Check cache stats
redis-cli info stats | grep -E "keyspace_hits|keyspace_misses"

# Clear ALL keys in every Redis database on this instance, not only the Symfony cache (use carefully!)
redis-cli FLUSHALL
```
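
The raw hit and miss counters are easier to read as a ratio. A minimal sketch, assuming the standard `keyspace_hits` / `keyspace_misses` fields from `redis-cli info stats`; `redis_hit_ratio` is a hypothetical helper name.

```bash
# Compute the cache hit ratio from `redis-cli info stats` output on stdin.
# Prints a percentage with one decimal, or "n/a" if there were no lookups yet.
redis_hit_ratio() {
  # INFO output uses CRLF line endings, so strip the carriage returns first.
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total == 0) { print "n/a" }
      else            { printf "%.1f%%\n", 100 * hits / total }
    }'
}

# Usage on the App Server:
#   redis-cli info stats | redis_hit_ratio
```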

**Cache TTLs** (in `HomepageController.php`):
- Search results: 24 hours (`PT24H`)
- Homepage tabs: 30 minutes (`PT30M`)
- Static data (fields, publishers): 6 hours (`PT6H`)

### Git Remotes

- `origin` → `git@github.com:shadisaleh/academia.git` (SSH — push from WSL, not PowerShell)

### Push & Deploy

```bash
# From WSL (PowerShell SSH key doesn't work for git push):
wsl -d Ubuntu -- bash -c "cd /var/www/academia_v2 && git add -A && git commit -m 'message' && git push origin main"
```

Pushing to `main` automatically triggers the `deploy-prod` GitHub Action.

**Tip**: If git shows a pager during commit, prefix with `GIT_PAGER=cat`:
```bash
wsl -d Ubuntu -- bash -c "cd /var/www/academia_v2 && GIT_PAGER=cat git commit -m 'message'"
```

---

## CRITICAL Rules

### 1. Cache rebuild strategy — `rm -rf` + `cache:warmup`

The deploy workflow deletes `var/cache/prod/` then runs `cache:warmup` so that
config changes (e.g. session save_path) are always picked up. This does NOT cause
mass logouts because:

- Sessions live in `var/sessions/` (not in the cache dir)
- The `User` entity has `__serialize()`/`__unserialize()` producing a stable
  format that doesn't depend on the container hash

Never run `cache:clear` directly — it does the same delete + rebuild but also
runs extra cleanup that can interfere with a live site. The deploy approach
(`rm -rf var/cache/prod/ && cache:warmup`) is safer.

```bash
# CORRECT — deploy workflow does this:
sudo rm -rf var/cache/prod/
sudo -u www-data php bin/console cache:warmup --env=prod

# WRONG — don't use cache:clear on production:
# sudo -u www-data php bin/console cache:clear --env=prod   ← NEVER DO THIS
```

### 2. Sessions are stored in `var/sessions`, NOT `/tmp`

Apache runs with systemd's `PrivateTmp=yes`. This means `/tmp` is an **isolated private directory** that gets **destroyed on every `systemctl restart apache2`**. If sessions were stored in `/tmp`, every deploy would log out all users.

Sessions are configured in `config/packages/framework.yaml`:
```yaml
session:
    save_path: '%kernel.project_dir%/var/sessions'
```

NEVER change `save_path` back to `/tmp`. The `var/sessions/` directory is created by the deploy workflow and owned by `www-data`.

### 3. Never run Symfony commands as root on prod

```bash
# ALWAYS use:
sudo -u www-data php bin/console cache:warmup --env=prod

# If you broke it:
sudo chown -R www-data:www-data /var/www/html/academia_v2/var/cache /var/www/html/academia_v2/var/log
sudo chmod -R 775 /var/www/html/academia_v2/var/cache /var/www/html/academia_v2/var/log
sudo systemctl restart apache2
```
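
To see whether a root-run command left files behind before reaching for `chown -R`, list anything not owned by the expected user. A minimal sketch: `find_wrong_owner` is a hypothetical helper, and the demo runs against a temp directory so it works on any machine.

```bash
# Print every path under the given directories that is NOT owned by the given user.
# Usage: find_wrong_owner <user> <dir> [<dir> ...]
find_wrong_owner() {
  local owner=$1
  shift
  find "$@" ! -user "$owner" -print
}

# Local demo: files we just created belong to the current user, so nothing is listed.
demo_dir=$(mktemp -d)
touch "$demo_dir/file"
find_wrong_owner "$(id -un)" "$demo_dir"
rm -rf "$demo_dir"
```

On prod the equivalent call would be `find_wrong_owner www-data /var/www/html/academia_v2/var/cache /var/www/html/academia_v2/var/log`. Any output means ownership needs repairing.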

### 4. PowerShell + SSH escaping is painful

- `$` gets interpolated → use single quotes or base64-encode scripts
- Parentheses in SQL break bash → write SQL to a file first, or use base64+PHP approach
- For complex commands, write a heredoc script on the server first
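
The base64 approach works because the encoded payload contains only letters, digits, `+`, `/`, and `=`, none of which PowerShell or bash will interpolate. A minimal sketch of the round trip, run locally with no SSH involved; `encode_script` and `run_encoded` are hypothetical helper names.

```bash
# Encode a script from stdin as single-line base64 so that $, quotes, and
# parentheses survive the PowerShell -> ssh -> bash hops untouched.
encode_script() {
  base64 | tr -d '\n'
}

# Decode a payload and execute it with bash; this is the part that would run on the server.
run_encoded() {
  printf '%s' "$1" | base64 -d | bash
}

# Local round trip: the script contains $, double quotes, and parentheses.
payload=$(printf '%s\n' 'echo "cost is \$5 (five)"' | encode_script)
run_encoded "$payload"
```

Over SSH, the remote command would be the payload piped through `base64 -d | bash` on the server, with only the base64 string crossing the shell boundaries.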

### 5. User table is `fos_user`, not `user`

FOSUserBundle legacy. The entity is `App\Entity\User` but the table name is `fos_user`.

### 6. PHP linter version mismatch

VS Code PHP linter may use an older PHP version than the server (8.2+). Avoid these PHP 8.0 features:
- Constructor property promotion (`public function __construct(private X $x)`)
- `match` expressions
- `static` return types
- Nullsafe operator (`?->`)

Use traditional patterns instead to avoid false linter errors.

### 7. Translations are YAML files

- App translations: `translations/` (e.g., `Ocr.ar.yml`, `Ocr.en.yml`)
- Bundle translations: `src/syndex/AcademicBundle/Resources/translations/`
- Header/nav labels: `translations/UserBundle.ar.yml` / `UserBundle.en.yml`

### 8. NEVER run `doctrine:schema:update --force` on production

This command compares DB schema to Doctrine entity mappings and **DROPS any tables that don't have corresponding entities**. Tables like `playground_tier_config` and `app_setting` (created via raw SQL migrations) will be destroyed.

```bash
# NEVER DO THIS:
# php bin/console doctrine:schema:update --force --env=prod   ← DESTROYS UNMAPPED TABLES

# CORRECT — always use migrations:
sudo -u www-data php bin/console doctrine:migrations:migrate --no-interaction --env=prod
```

If migrations fail, fix the migration — don't use schema:update as a fallback.

---

## Key Architecture

### Symfony Messenger

- Transport: Doctrine (`async`)
- Worker: `shamra-messenger-worker.service` (single worker handles ALL async messages)
- Config: `config/packages/messenger.yaml`
- Messages route to `async` transport

### Credit System (PlaygroundSubscription)

| Tier | Price | Credits |
|------|-------|---------|
| trial | $0 | 100 |
| starter | $9 | 500 |
| researcher | $19 | 1,500 |
| professional | $39 | 4,000 |
| institution | $99 | 15,000 |

- Entity: `src/Entity/PlaygroundSubscription.php`
- Tier config table: `playground_tier_config` (raw SQL, no entity, collation: `utf8mb4_unicode_ci`)
- OCR costs: 2 credits/page, minimum 5 credits

**Important**: `playground_tier_config` was created manually on prod (no migration). Collation must be `utf8mb4_unicode_ci` to match other tables — `utf8mb4_0900_ai_ci` causes collation mismatch errors.
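
The OCR pricing rule can be sketched as follows. This assumes "minimum 5 credits" is a per-job floor applied after the per-page rate; `ocr_credit_cost` is a hypothetical helper, not the code in `UsageMonitorService`.

```bash
# Credits charged for an OCR job: 2 per page, but never less than 5 per job.
# (Assumption: the minimum is a floor on the job total.)
ocr_credit_cost() {
  local pages=$1
  local cost=$(( pages * 2 ))
  if [ "$cost" -lt 5 ]; then
    cost=5
  fi
  echo "$cost"
}

ocr_credit_cost 1    # below the floor
ocr_credit_cost 10   # 2 credits per page
```

Under this reading, a 100-credit trial covers one 50-page document or up to twenty small jobs of one or two pages each.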

### Dual Subscription System (CRITICAL)

Two parallel subscription systems exist:

| System | Table | Purpose |
|--------|-------|---------|
| `PlaygroundSubscription` | `playground_subscription` | AI credits (OCR, writing assistant) |
| `AcademicSubscription` | `academic_subscription` | PDF downloads |

**AcademicSubscription** requires THREE fields for `isActive()` to return true:
- `subscribed = 1`
- `status = 'active'` ← **Often missing! Causes redirect loops**
- `subscribedTill > NOW()`

**Common Bug**: User sees "Download PDF" button but gets redirect loop because `status` is NULL.

**Fix**: `UPDATE academic_subscription SET status = 'active' WHERE user_id = X;`
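
When debugging a redirect loop, it helps to name which of the three conditions fails. The helper below only mirrors the conditions listed above; it is not the `SubscribeService` code, and its name and argument order are hypothetical.

```bash
# Report the first failing condition for an AcademicSubscription row.
# Args: subscribed flag, status ("" stands for NULL), subscribedTill and "now" as Unix timestamps.
academic_sub_problem() {
  local subscribed=$1 status=$2 till=$3 now=$4
  if [ "$subscribed" != "1" ]; then echo "subscribed is not 1"; return 1; fi
  if [ "$status" != "active" ]; then echo "status is not 'active'"; return 1; fi
  if [ "$till" -le "$now" ]; then echo "subscribedTill is in the past"; return 1; fi
  echo "active"
}

# The common bug: subscribed and not expired, but status is NULL.
academic_sub_problem 1 "" 2000000000 1700000000 || true
```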

**Admin Grant**: Dashboard "💳 Subs" tab creates BOTH subscription types for paid tiers.

- Entity: `src/syndex/AcademicBundle/Entity/AcademicSubscription.php`
- Service: `src/syndex/AcademicBundle/Service/SubscribeService.php` (`isActive()`, `isSubscribed()`)
- Download action: `HomepageController::documentDownloadAction()` uses `isActive()`

### OCR Service (built Feb 2026)

- **Controller**: `src/Controller/OcrController.php` — 7 routes under `/ocr`
- **Entity**: `src/Entity/OcrJob.php` — table `ocr_job`
- **Service**: `src/Service/OcrJobService.php` — job lifecycle
- **Message**: `src/Message/ProcessOcrJob.php` + `src/MessageHandler/ProcessOcrJobHandler.php`
- **Mistral OCR**: `src/Service/Playground/MistralOcrService.php` (`extractWithImages()`)
- **Template**: `templates/ocr/index.html.twig`
- **Translations**: `translations/Ocr.ar.yml`, `translations/Ocr.en.yml`
- **Email templates**: `templates/emails/ocr_complete.html.twig`, `ocr_failed.html.twig`
- **Uploads dir (prod)**: `/var/www/html/academia_v2/public/uploads/ocr/` (owned by www-data)

### Admin Dashboard

- **URL**: `/jim19ud83/playground/dashboard`
- **Controller**: `src/Controller/PlaygroundAdminController.php`
- **Template**: `templates/admin/playground/dashboard.html.twig`
- Includes OCR stats tab with API at `/jim19ud83/playground/api/ocr-stats`

### Key Services

| Service | Purpose |
|---|---|
| `UsageMonitorService` | Credit tracking, deductions, dashboard stats |
| `MistralOcrService` | Mistral Document AI OCR calls |
| `OcrJobService` | OCR job lifecycle (create, download, delete) |

### Search / Elasticsearch

- **Filter route**: `/filter?title=...&type=1` → `HomepageController::filterDataAction`
- **Controller**: `src/syndex/AcademicBundle/Controller/HomepageController.php` (line ~1425)
- **ES query classes**: `src/syndex/AcademicBundle/Service/Elasticsearch/`
  - `QueryBuilder.php` — builds full query JSON
  - `SimpleQuery/SimpleQuery.php` — single field match (accepts string or array with `query`/`operator`/`boost`)
  - `BooleanQuery/BooleanGroupQuery.php`, `Must.php`, `Should.php` — bool query composition
- **Index mapping**: `english_full_title` (text), `arabic_full_title` (text, arabic analyzer), `english_abstract`, `content`, `tag`
- **Entity mapper**: `src/syndex/AcademicBundle/Module/ElasticMapper.php`
- **`type=null/0`** → Arabic index; **`type=1`** → English index
- **Sort**: defaults to relevance when title search is active; user can toggle date/relevance via `sortOrder` param
- **Caching**: Results cached in `DatabaseProxy` with key `filterEnglishDataResults::{page}::{sortBy}::{title}::{...}`
- **Search operators**: `operator: and` requires all terms; field boosting via `boost` param; phrase boost via raw `match_phrase` added to `should`
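
The operator, boost, and phrase-boost pieces combine into an Elasticsearch `bool` query roughly like the one printed below. This is a sketch of the query shape, not the output of `QueryBuilder.php`: `build_title_query` is a hypothetical helper, the boost value `2` is illustrative, and the title is assumed to contain no double quotes or backslashes.

```bash
# Print a bool query: all terms required on the title field (must),
# plus an exact-phrase match that only raises the score (should).
build_title_query() {
  local field=$1 title=$2
  # Refuse blank input; empty queries are what produce the
  # "[match_phrase] requires query value" parse error.
  case "$title" in
    *[![:space:]]*) ;;
    *) echo "empty title, not querying" >&2; return 1 ;;
  esac
  cat <<JSON
{
  "query": {
    "bool": {
      "must": [
        { "match": { "$field": { "query": "$title", "operator": "and", "boost": 2 } } }
      ],
      "should": [
        { "match_phrase": { "$field": { "query": "$title" } } }
      ]
    }
  }
}
JSON
}

build_title_query english_full_title "deep learning"
```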

### Search CTR Tracking (built Mar 2026)

- **Entities**: `src/Entity/SearchQueryLog.php` (table `search_query_log`), `src/Entity/SearchClickLog.php` (table `search_click_log`)
- **Repositories**: `src/Repository/SearchQueryLogRepository.php`, `src/Repository/SearchClickLogRepository.php`
- **Click tracking API**: `src/Controller/SearchTrackingController.php` — `POST /api/search-click`
- **Logging**: Title-only searches on page 1 are logged in `filterDataAction` (HomepageController)
- **JS tracking**: Click tracking via Beacon API in `all.html.twig`, search query ID propagated via AJAX in `filters.js`
- **Admin dashboard**: "Search CTR" tab in admin dashboard — stats, trend chart, top queries, queries without clicks, zero-result queries, position distribution
- **Admin API routes**: `/jim19ud83/playground/api/search-ctr-stats`, `search-ctr-trend`, `search-ctr-top-queries`, `search-ctr-no-clicks`, `search-ctr-zero-results`, `search-ctr-positions`

---

## File Locations Quick Reference

| What | Where |
|---|---|
| Routes | `config/routes.yaml` + attribute-based in controllers |
| Services | `config/services.yaml` |
| Messenger config | `config/packages/messenger.yaml` |
| Entities | `src/Entity/` |
| Controllers | `src/Controller/` |
| Templates | `templates/` |
| Migrations | `migrations/` |
| Header/nav | `templates/header.html.twig` |
| Base layout | `templates/base.html.twig` |
| Bot 404 filter | `src/EventListener/Bot404Listener.php` |
| Debug guide | `DEBUG_ERRORS.md` |

---

## Debugging Production

```bash
# SSH in (these two lines run in local PowerShell; the rest runs on the server)
cd C:\Users\shadisaleh\Documents\linux
ssh -i shamramain_user.pem azureuser@20.241.4.71

# Count errors
grep -c "\.CRITICAL:" /var/www/html/academia_v2/var/log/prod.log
grep -c "\.ERROR:" /var/www/html/academia_v2/var/log/prod.log

# Top unique errors (strip only the leading [timestamp]; a greedy match would also eat "[object] " in the context)
grep "\.CRITICAL:" /var/www/html/academia_v2/var/log/prod.log \
  | sed 's/^\[[^]]*\] //' | cut -d'{' -f1 | sort | uniq -c | sort -rn | head -20

# Recent errors
tail -30 /var/www/html/academia_v2/var/log/prod.log

# Live tail
sudo tail -f /var/www/html/academia_v2/var/log/prod.log | grep --line-buffered "CRITICAL\|ERROR"
```

### Hotfix directly on server (emergency only)

```bash
sudo sed -i 's/OLD_STRING/NEW_STRING/' /var/www/html/academia_v2/path/to/file
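# Rebuild the cache the same way the deploy workflow does (CRITICAL rule 1: no cache:clear on prod)
sudo rm -rf /var/www/html/academia_v2/var/cache/prod/
sudo -u www-data php /var/www/html/academia_v2/bin/console cache:warmup --env=prod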
```

---

## Common Patterns

### Running SQL on prod

```bash
sudo -u www-data php /var/www/html/academia_v2/bin/console dbal:run-sql --env=prod 'SELECT ...'
```

For INSERT/UPDATE with quotes, write a bash heredoc script on the server or use base64-encoded PHP.

### Creating a migration

```bash
# Local
php bin/console doctrine:migrations:diff
# Or create manually in migrations/VersionYYYYMMDDHHMMSS.php
```
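
For a hand-written migration, the class name and file name must carry the same timestamp. A minimal sketch that produces a path in the `VersionYYYYMMDDHHMMSS.php` form; `new_migration_filename` is a hypothetical helper.

```bash
# Print a migration path using the current local time as the 14-digit version.
new_migration_filename() {
  printf 'migrations/Version%s.php\n' "$(date +%Y%m%d%H%M%S)"
}

new_migration_filename
```

Alternatively, `php bin/console doctrine:migrations:generate` creates an empty migration class with the name already filled in.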

### Checking messenger worker

```bash
sudo systemctl status shamra-messenger-worker
sudo journalctl -u shamra-messenger-worker -f
```

---

## Known Issues / Recurring Errors

- **Malformed UTF-8**: Ongoing, comes from mixed-encoding data in DB. Not critical.
- **Elasticsearch parse errors**: `[match_phrase] requires query value` — empty search queries hitting ES.
- **Bot 404s**: Suppressed by `Bot404Listener` + Apache rules, but some still appear.
- **`/img/shamra-white.png` 404**: Static asset referenced somewhere but doesn't exist at that path.
- **Collation mismatches**: New tables default to `utf8mb4_0900_ai_ci` (MySQL 8 default) but existing tables use `utf8mb4_unicode_ci`. Always specify `COLLATE utf8mb4_unicode_ci` when creating tables.
- **`/${base}docx` 404**: Unresolved `${base}` placeholder in a download URL (the `${...}` syntax suggests a JavaScript template literal rather than a Twig variable). Investigate if recurring.
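
To avoid the collation mismatch, and the shell-escaping problems from CRITICAL rule 4, table DDL can be written to a file with the collation stated explicitly. A minimal sketch: `example_setting` and its columns are hypothetical, used only to show where the `COLLATE` clause goes.

```bash
# Print CREATE TABLE DDL with the collation pinned to match existing tables.
# The quoted heredoc delimiter keeps the shell from touching parentheses or $.
print_create_table_sql() {
  cat <<'SQL'
CREATE TABLE example_setting (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(190) NOT NULL
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
SQL
}

print_create_table_sql
```

Redirect the output to a file on the server (for example `print_create_table_sql > /tmp/create_example_setting.sql`) and run it from there, or place the same statement in a migration.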
