# Site Performance Skill

> Quick reference for diagnosing and optimizing Shamra Academia performance.

---

## Performance Diagnosis Skills

Use this skill as a diagnosis playbook, not just a command list. The goal is to identify which layer is actually failing before making changes.

### 1. Establish the symptom first
- Confirm whether the problem is load, latency, errors, or a single broken feature.
- Capture the timeframe: now, last hour, or all day.
- Check whether the issue affects all pages or only one path such as search, article pages, OCR, or admin.

### 2. Separate infrastructure from application issues
- Start with server load, memory, disk, and Apache connection counts.
- Then check Symfony `prod.log` for `ERROR`, `CRITICAL`, and repeated signatures.
- If load is healthy but errors are high, the root cause is usually an upstream dependency, bad runtime config, or a failing code path rather than VM exhaustion.

### 3. Diagnose dependencies explicitly
- For MySQL issues: verify slow queries, running queries, and connection count.
- For Elasticsearch issues: verify TCP reachability, authenticated cluster health, node stats, and whether the app is using the correct endpoint.
- For Messenger/background tasks: verify worker status and whether async backlog is contributing to latency.

### 4. Trace runtime configuration, not just repository config
- A correct value in `.env` does not guarantee production is using it.
- Check runtime sources in this order: compiled env (`.env.local.php`), prod-only env files, service-level environment, then cached container behavior.
- When symptoms contradict the repo config, assume runtime drift until proven otherwise.

### 5. Verify the fix at the same layer as the failure
- If the failure was runtime config, validate the effective runtime value after cache warmup.
- If the failure was infrastructure, re-check health from the app server, not from local development.
- Always compare error rate before and after the change for at least a few minutes.

### Fast Diagnosis Checklist
- Is the VM overloaded?
- Are app errors concentrated on one signature?
- Is the failing service reachable from the app server?
- Is the app using the intended runtime host/credentials?
- Did the error rate drop after the change?

---

## IMPORTANT: PowerShell SSH Command Escaping

When running SSH commands from Windows PowerShell, special characters cause issues:

### What DOESN'T work from PowerShell:
- `$` gets interpolated (use simple commands or avoid `$NF`, `$1` etc.)
- Complex `awk` scripts with `{}` and parentheses
- Nested quotes
- `sed` with regex groups `\(\)`

### Solutions:

**Option 1: Use simple grep/tail commands (RECOMMENDED)**
```powershell
# This works - simple commands
ssh -i C:\Users\shadisaleh\Documents\linux\shamramain_user.pem azureuser@20.241.4.71 "uptime; free -h"
```

**Option 2: SSH in interactively first**
```powershell
# SSH in, then run complex commands
ssh -i C:\Users\shadisaleh\Documents\linux\shamramain_user.pem azureuser@20.241.4.71
# Now you're on the server - run any bash command
```

**Option 3: Create a script on the server**
```bash
# On server, create /usr/local/bin/check_slow.sh with complex commands
# Then from PowerShell just call: ssh ... "bash /usr/local/bin/check_slow.sh"
```
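A minimal sketch of such a helper, written as a function so it can be exercised against any log file (the `check_slow.sh` name and prod log path above are this doc's conventions; the route-extraction pattern mirrors the `grep -oP` one-liners in section 3):

```shell
# check_slow.sh - summarize "Slow request" entries from the Symfony prod log.
# Takes the log path as an argument so it can be tested against any file.
check_slow() {
  log="$1"
  printf 'slow requests: %s\n' "$(grep -c 'Slow request' "$log")"
  # Top routes among slow requests (route field as written by the subscriber)
  grep 'Slow request' "$log" \
    | grep -oP '"route":"[^"]+"' \
    | sort | uniq -c | sort -rn | head -10
}
```

Save it with a final line like `check_slow /var/www/html/academia_v2/var/log/prod.log`, then call it from PowerShell as shown above.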

---

## 1. Server Health Check (Quick Overview)

```bash
# SSH to app server
ssh -i shamramain_user.pem azureuser@20.241.4.71

# One-liner server health
uptime; free -h; df -h / | tail -1

# Detailed check
echo "=== Load ===" && uptime
echo "=== Memory ===" && free -h
echo "=== Disk ===" && df -h /
echo "=== Apache connections ===" && ss -tnp | grep -c ':80\|:443'
echo "=== Top CPU processes ===" && ps aux --sort=-%cpu | head -8
echo "=== Top Memory processes ===" && ps aux --sort=-%mem | head -8
```

### Interpreting Load Average
- Load below the core count (`nproc`) = healthy
- Load up to ~2× the core count = moderate, monitor
- Load above 2× the core count = investigate immediately
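Those thresholds can be checked in one shot; a sketch (the cutoffs are this section's rule of thumb, parameterized on the core count):

```shell
# Classify the 1-minute load average against the core count.
load_status() {
  load="$1"; cores="$2"
  awk -v l="$load" -v c="$cores" 'BEGIN {
    if (l < c)          print "healthy"
    else if (l < 2 * c) print "moderate - monitor"
    else                print "investigate immediately"
  }'
}

# Live usage on the server:
# load_status "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)"
```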

---

## 2. Log Analysis

### Recent Errors
```bash
# Count errors in last hour
grep "$(date +'%Y-%m-%dT%H')" /var/www/html/academia_v2/var/log/prod.log | grep -c "CRITICAL\|ERROR"

# Unique error types (top 20)
grep "\.CRITICAL:\|\.ERROR:" /var/www/html/academia_v2/var/log/prod.log \
  | sed 's/\[.*\] //' | cut -d'{' -f1 | sort | uniq -c | sort -rn | head -20

# Live tail for real-time monitoring
sudo tail -f /var/www/html/academia_v2/var/log/prod.log | grep --line-buffered "CRITICAL\|ERROR"

# Errors with full context
grep -B2 -A5 "CRITICAL" /var/www/html/academia_v2/var/log/prod.log | tail -50
```
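To see what the signature pipeline above produces, here is the same `sed | cut | sort | uniq -c` chain run over two fabricated monolog-style lines:

```shell
# Two fabricated prod.log-style lines collapse into one signature with count 2:
# the sed strips the [timestamp], the cut drops the JSON context after '{'.
printf '%s\n' \
  '[2026-03-14T10:00:01] request.CRITICAL: Uncaught PHP Exception RuntimeException {"exception":"..."}' \
  '[2026-03-14T10:00:09] request.CRITICAL: Uncaught PHP Exception RuntimeException {"exception":"..."}' \
  | sed 's/\[.*\] //' | cut -d'{' -f1 | sort | uniq -c | sort -rn
```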

### Apache Logs
```bash
# Peek at the access log format (large file — use sudo)
sudo tail -n 10000 /var/log/apache2/academia_v2_access.log | head -5

# Recent 500 errors
sudo grep " 500 " /var/log/apache2/academia_v2_access.log | tail -20

# Slow requests (> 5 seconds; requires a LogFormat with %D as the last column — value is microseconds)
sudo awk '$NF > 5000000 {print $7, $NF/1000000 "s"}' /var/log/apache2/academia_v2_access.log | tail -20

# Requests per minute (last ~20k lines)
sudo tail -n 20000 /var/log/apache2/academia_v2_access.log | awk '{print $4}' | cut -d: -f1,2,3 | sort | uniq -c | tail -10

# Top 20 IPs (last 10k lines)
sudo tail -n 10000 /var/log/apache2/academia_v2_access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -20

# Top URLs being hit (last 5k lines)
sudo tail -n 5000 /var/log/apache2/academia_v2_access.log | awk '{print $7}' | cut -d'?' -f1 | sort | uniq -c | sort -rn | head -20

# Top User-Agents — reveals bots/crawlers (last 5k lines)
sudo tail -n 5000 /var/log/apache2/academia_v2_access.log | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn | head -15

# How many hits are from meta-webindexer today
sudo grep -c "meta-webindexer" /var/log/apache2/academia_v2_access.log

# How many hits are from OAI-SearchBot today
sudo grep -c "OAI-SearchBot" /var/log/apache2/academia_v2_access.log
```
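The field positions used throughout these one-liners ($1 client IP, $4 timestamp, $7 request path, $9 status, and field 6 of `-F'"'` for the User-Agent) come from Apache's combined log format; a fabricated line shows the mapping:

```shell
# One fabricated combined-format access log entry.
LINE='203.0.113.9 - - [14/Mar/2026:10:15:00 +0000] "GET /filter?title=test HTTP/1.1" 200 5120 "-" "meta-webindexer/1.1"'

# Whitespace-split fields: IP, path, status
echo "$LINE" | awk '{print "ip:", $1, "url:", $7, "status:", $9}'
# Quote-split fields: field 6 is the User-Agent
echo "$LINE" | awk -F'"' '{print "agent:", $6}'
```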

---

## 3. Slow Endpoint Detection

### Method 1: Apache Access Log Analysis
```bash
# Endpoints sorted by average response time (requires custom log format with %D)
sudo awk '{
  url=$7; time=$NF/1000000  # Convert microseconds to seconds
  count[url]++
  total[url]+=time
} END {
  for (url in count) 
    if (count[url] > 5) 
      printf "%6.2fs avg (%4d hits) %s\n", total[url]/count[url], count[url], url
}' /var/log/apache2/academia_v2_access.log | sort -rn | head -30
```
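Run against a few fabricated `%D`-suffixed lines, the averaging logic behaves like this (the `count > 5` noise filter is dropped so the tiny sample survives):

```shell
# Three fabricated combined+%D lines: /filter averages 3.00s over 2 hits,
# /research/1 averages 1.00s over 1 hit.
printf '%s\n' \
  'h - - [14/Mar/2026:10:00:00 +0000] "GET /filter HTTP/1.1" 200 1 "-" "ua" 2000000' \
  'h - - [14/Mar/2026:10:00:01 +0000] "GET /filter HTTP/1.1" 200 1 "-" "ua" 4000000' \
  'h - - [14/Mar/2026:10:00:02 +0000] "GET /research/1 HTTP/1.1" 200 1 "-" "ua" 1000000' \
  | awk '{ url=$7; t=$NF/1000000; count[url]++; total[url]+=t }
         END { for (u in count) printf "%.2fs avg (%d hits) %s\n", total[u]/count[u], count[u], u }' \
  | sort -rn
```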

### Method 2: Symfony Profiler (dev only)
Enable in `config/packages/dev/web_profiler.yaml` and browse to `/_profiler`.

### Method 3: RequestTimingSubscriber (IMPLEMENTED ✅)
**Location:** `src/EventSubscriber/RequestTimingSubscriber.php`

Automatically logs all requests taking > 2 seconds.

#### Commands that work from PowerShell SSH:
```powershell
# View recent slow requests (WORKS)
ssh -i C:\Users\shadisaleh\Documents\linux\shamramain_user.pem azureuser@20.241.4.71 "grep 'Slow request' /var/www/html/academia_v2/var/log/prod.log | tail -20"

# Count total slow requests in recent logs (WORKS)
ssh -i C:\Users\shadisaleh\Documents\linux\shamramain_user.pem azureuser@20.241.4.71 "tail -5000 /var/www/html/academia_v2/var/log/prod.log | grep 'Slow request' | wc -l"

# Count by route type (WORKS)
ssh -i C:\Users\shadisaleh\Documents\linux\shamramain_user.pem azureuser@20.241.4.71 "tail -5000 /var/www/html/academia_v2/var/log/prod.log | grep 'shamra_academia_filter' | grep 'Slow' | wc -l"
ssh -i C:\Users\shadisaleh\Documents\linux\shamramain_user.pem azureuser@20.241.4.71 "tail -5000 /var/www/html/academia_v2/var/log/prod.log | grep 'shamra_academia_research_show' | grep 'Slow' | wc -l"

# Get slowest requests (duration values) (WORKS)
ssh -i C:\Users\shadisaleh\Documents\linux\shamramain_user.pem azureuser@20.241.4.71 "tail -2000 /var/www/html/academia_v2/var/log/prod.log | grep 'shamra_academia_filter' | grep 'Slow' | grep -oP 'duration_s.:[0-9.]+' | cut -d: -f2 | sort -rn | head -10"
```

#### Commands to run AFTER SSH-ing in:
```bash
# SSH in first
ssh -i shamramain_user.pem azureuser@20.241.4.71

# Then run these on the server:
grep "Slow request" /var/www/html/academia_v2/var/log/prod.log | tail -20
grep "Slow request" /var/www/html/academia_v2/var/log/prod.log | grep "$(date +'%Y-%m-%d')"
grep "Slow request" /var/www/html/academia_v2/var/log/prod.log | grep -oP '"route":"[^"]+"' | sort | uniq -c | sort -rn | head -20
```

Log format includes: route, method, uri, duration_ms, duration_s, ip
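The `duration_s` extraction used above can be sanity-checked offline against fabricated subscriber lines:

```shell
# Two fabricated RequestTimingSubscriber entries; the grep -oP pattern pulls
# the duration_s values, which then sort descending.
printf '%s\n' \
  'app.WARNING: Slow request {"route":"shamra_academia_filter","duration_ms":3200,"duration_s":3.2,"ip":"203.0.113.9"}' \
  'app.WARNING: Slow request {"route":"shamra_academia_filter","duration_ms":2100,"duration_s":2.1,"ip":"203.0.113.9"}' \
  | grep -oP 'duration_s.:[0-9.]+' | cut -d: -f2 | sort -rn
```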

---

## 4. Database Performance

### Connect to MySQL Server
```bash
# SSH to MySQL server
ssh -i ubuntudev1keypair.pem ubuntu@20.236.64.82

# Or from app server via Symfony
sudo -u www-data php /var/www/html/academia_v2/bin/console dbal:run-sql --env=prod 'SHOW STATUS LIKE "Threads_connected"'
```

### Slow Query Analysis
```bash
# On MySQL server (20.236.64.82)
# Check if slow query log is enabled
mysql -u root -p -e "SHOW VARIABLES LIKE 'slow_query%'; SHOW VARIABLES LIKE 'long_query_time';"

# Enable slow query log (temporary)
mysql -u root -p -e "SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 2;"

# View slow queries
sudo tail -100 /var/lib/mysql/mysql-slow.log
```

### Current Query Analysis
```bash
# Show running queries
mysql -u root -p -e "SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE, LEFT(INFO, 80) as QUERY FROM information_schema.PROCESSLIST WHERE COMMAND != 'Sleep' ORDER BY TIME DESC;"

# Show queries running > 5 seconds
mysql -u root -p -e "SELECT * FROM information_schema.PROCESSLIST WHERE TIME > 5 AND COMMAND != 'Sleep';"

# Kill a stuck query
mysql -u root -p -e "KILL QUERY <process_id>;"
```

### Table Statistics
```bash
# Large tables
mysql -u root -p academia_v2_prod2 -e "
SELECT table_name, 
       ROUND(data_length/1024/1024, 2) as data_mb,
       ROUND(index_length/1024/1024, 2) as index_mb,
       table_rows
FROM information_schema.tables 
WHERE table_schema = 'academia_v2_prod2' 
ORDER BY data_length DESC 
LIMIT 20;"

# Missing indexes (tables without indexes)
mysql -u root -p academia_v2_prod2 -e "
SELECT t.TABLE_NAME
FROM information_schema.TABLES t
LEFT JOIN information_schema.STATISTICS s ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
WHERE t.TABLE_SCHEMA = 'academia_v2_prod2' AND s.INDEX_NAME IS NULL;"
```

### Query Explain
```bash
# Analyze a slow query
mysql -u root -p academia_v2_prod2 -e "EXPLAIN SELECT ... YOUR QUERY HERE ...;"

# Via Symfony
sudo -u www-data php /var/www/html/academia_v2/bin/console dbal:run-sql --env=prod 'EXPLAIN SELECT id FROM research WHERE ...'
```

---

## 5. Elasticsearch Performance

### Health Check
```bash
# On app server
curl -s -u elastic:<ES_PASSWORD> 'https://shamraindex:9200/_cluster/health?pretty'

# Node stats
curl -s -u elastic:<ES_PASSWORD> 'https://shamraindex:9200/_cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m,disk.used_percent'

# Index sizes
curl -s -u elastic:<ES_PASSWORD> 'https://shamraindex:9200/_cat/indices?v&h=index,docs.count,store.size,pri.store.size'

# Slow queries (if slow log enabled)
curl -s -u elastic:<ES_PASSWORD> 'https://shamraindex:9200/_cat/indices?v' | grep slowlog
```

### Runtime Host Diagnosis

Use this when logs show `No alive nodes found in your cluster` or when the expected cluster is healthy but the app still fails.

#### Diagnosis goals
- Confirm the endpoint the app should use.
- Confirm the endpoint the app is actually using at runtime.
- Confirm whether the problem is cluster health, network reachability, or stale runtime configuration.

#### Recommended production sequence
```bash
# 1) Test the intended endpoint from the app server
curl -sS --max-time 10 'https://elastic:<ES_PASSWORD>@shamraindex:9200/_cluster/health?pretty'

# 2) Confirm DNS and TCP reachability from the app server
getent hosts shamraindex
timeout 5 bash -c '</dev/tcp/shamraindex/9200' && echo OPEN || echo CLOSED

# 3) Inspect effective runtime ELASTIC_HOST from compiled env if present (run from the project root)
cd /var/www/html/academia_v2
php -r '$e=@include ".env.local.php"; if (is_array($e) && isset($e["ELASTIC_HOST"])) echo $e["ELASTIC_HOST"], PHP_EOL;'

# 4) Fallback to prod env files if compiled env is absent
grep -E '^ELASTIC_HOST=' .env.prod.local .env 2>/dev/null
```

#### Interpretation
- If `shamraindex` is green but the app still throws `No alive nodes`, the app is likely using a stale or wrong `ELASTIC_HOST`.
- If TCP to `shamraindex:9200` is closed, this is network or service availability, not Symfony config.
- If runtime `ELASTIC_HOST` differs from the intended endpoint, fix the runtime source and warm cache safely.
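The interpretation above reduces to a small decision table; a sketch that takes the three check results as hand-entered inputs (the function name and verdict strings are illustrative):

```shell
# Verdict from the three checks: cluster health, TCP reachability, and
# whether the runtime ELASTIC_HOST matches the intended endpoint.
es_verdict() {
  health="$1"       # green|yellow|red|unknown
  tcp="$2"          # OPEN|CLOSED
  hosts_match="$3"  # yes|no
  if [ "$tcp" = "CLOSED" ]; then
    echo "network or service availability - not Symfony config"
  elif [ "$hosts_match" = "no" ]; then
    echo "stale runtime ELASTIC_HOST - fix runtime source and warm cache"
  elif [ "$health" = "green" ]; then
    echo "config and cluster look fine - look elsewhere (app code, auth)"
  else
    echo "cluster unhealthy - investigate Elasticsearch itself"
  fi
}
```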

#### Safe remediation reminder
```bash
# Correct production cache rebuild approach
sudo rm -rf /var/www/html/academia_v2/var/cache/prod/
sudo -u www-data php /var/www/html/academia_v2/bin/console cache:warmup --env=prod
```

Do not use `cache:clear` on production.

### Query Profiling
```bash
# Profile a search query
curl -s -u elastic:<ES_PASSWORD> -X POST 'https://shamraindex:9200/arabic_research/_search?pretty' \
  -H 'Content-Type: application/json' \
  -d '{
    "profile": true,
    "query": { "match": { "arabic_full_title": "الذكاء الاصطناعي" } }
  }'
```

---

## 6. Caching Analysis

### Symfony Cache Status
```bash
# Cache directory size
du -sh /var/www/html/academia_v2/var/cache/prod/

# List cache pools
sudo -u www-data php /var/www/html/academia_v2/bin/console cache:pool:list --env=prod

# Clear specific pool
sudo -u www-data php /var/www/html/academia_v2/bin/console cache:pool:clear cache.app --env=prod
```

### OPcache Status
```bash
# Check OPcache stats (note: the CLI SAPI often has opcache disabled, so this may print false;
# the web SAPI's status is what matters)
echo '<?php var_dump(opcache_get_status());' | sudo -u www-data php

# Or check via phpinfo
php -i | grep -i opcache
```
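A useful derived number is the hit rate; given the `hits` and `misses` counters from `opcache_get_status()['opcache_statistics']`, it is a one-line calculation:

```shell
# OPcache hit rate in percent from the hits/misses counters.
opcache_hit_rate() {
  awk -v h="$1" -v m="$2" 'BEGIN { printf "%.1f\n", 100 * h / (h + m) }'
}
# Example: opcache_hit_rate 980000 20000  prints 98.0
```

Anything much below ~99% on a warmed-up production box suggests the cache is too small or scripts are being invalidated.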

### Redis/Memcached (if used)
```bash
# Check Redis connection
redis-cli ping
redis-cli info memory
redis-cli info stats
```

---

## 7. Security & Penetration Testing

### Basic Security Checks
```bash
# Check open ports
sudo ss -tlnp

# Check failed SSH attempts
sudo grep "Failed password" /var/log/auth.log | tail -20

# Check unusual processes (skip kernel threads, whose command column is [bracketed])
ps aux | awk '$11 !~ /^\[/ {print $11}' | sort | uniq -c | sort -rn | head -20

# Check cron jobs
sudo crontab -l
crontab -l
ls -la /etc/cron.d/
```

### Web Application Testing
```bash
# Check for common vulnerabilities with nikto (install first: apt install nikto)
nikto -h https://shamra-academia.com -ssl

# Check SSL configuration
curl -sI https://shamra-academia.com | head -20
openssl s_client -connect shamra-academia.com:443 -brief

# Check security headers
curl -sI https://shamra-academia.com | grep -iE "x-frame|x-xss|x-content|strict-transport|content-security"
```

### SQL Injection Test Points
Check these endpoints manually with tools like sqlmap (use responsibly):
- `/filter?title=` - search endpoint
- `/login` - authentication
- Any `?id=` parameters

### Rate Limiting Check
```bash
# Test rate limits
for i in {1..20}; do curl -s -o /dev/null -w "%{http_code}\n" "https://shamra-academia.com/filter?title=test"; done
```
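What matters in the loop's output is the mix of status codes; tallying them (demoed here on canned codes) shows whether a limiter kicked in, i.e. 200s giving way to 429 or 503:

```shell
# Tally HTTP status codes. A working rate limiter shows a run of 200s
# followed by 429s once the limit is hit.
printf '%s\n' 200 200 200 429 429 | sort | uniq -c | sort -rn
```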

---

## 8. Load Testing

### Apache Benchmark (ab)
```bash
# Simple load test (100 requests, 10 concurrent)
ab -n 100 -c 10 https://shamra-academia.com/

# With custom headers
ab -n 100 -c 10 -H "Accept-Language: ar" https://shamra-academia.com/
```

### Siege (more realistic)
```bash
# Install: apt install siege

# 60-second test with 25 concurrent users
siege -c 25 -t 60S https://shamra-academia.com/

# Test multiple URLs from file
siege -c 25 -t 60S -f urls.txt
```

### Locust (Python-based, scriptable)
```python
# locustfile.py
from locust import HttpUser, task, between

class ShamraUser(HttpUser):
    wait_time = between(1, 3)
    
    @task(3)
    def homepage(self):
        self.client.get("/")
    
    @task(2)
    def search(self):
        self.client.get("/filter?title=علم&type=0")
    
    @task(1)
    def research_page(self):
        self.client.get("/research/12345")

# Run: locust -f locustfile.py --host=https://shamra-academia.com
```

---

## 9. Quick Optimization Checklist

### Frontend
- [ ] Enable gzip/brotli compression (check: `curl -sI -H "Accept-Encoding: gzip" URL | grep -i encoding`)
- [ ] Set proper cache headers for static assets
- [ ] Minify CSS/JS
- [ ] Optimize images (WebP format)
- [ ] Lazy load images below fold

### Backend
- [ ] Enable OPcache with proper settings
- [ ] Use Doctrine query cache
- [ ] Add database indexes for frequent queries
- [ ] Implement HTTP caching (ETags, Cache-Control)
- [ ] Use async processing for heavy tasks (Messenger)

### Database
- [ ] Enable slow query log
- [ ] Add composite indexes for common WHERE+ORDER BY
- [ ] Use EXPLAIN on slow queries
- [ ] Consider read replicas for heavy reads
- [ ] Optimize table structures (proper column types)

### Infrastructure
- [ ] CDN for static assets
- [ ] Redis for session/cache storage
- [ ] Connection pooling
- [ ] Horizontal scaling (load balancer)

---

## 10. Monitoring Setup

### Health Endpoints (IMPLEMENTED ✅)
**Location:** `src/Controller/HealthController.php`

| Endpoint | Purpose | Use Case |
|----------|---------|----------|
| `GET /health` | Quick check (app running) | High-frequency monitoring (30s-1min) |
| `GET /health/full` | Deep check (app + DB + cache) | Less frequent checks (5min) |

```bash
# Quick health check
curl -s https://shamra-academia.com/health | jq

# Full health check (includes database)
curl -s https://shamra-academia.com/health/full | jq

# Use with Uptime Robot, Pingdom, AWS Route53 Health Checks, etc.
```

**Response format:**
```json
{
  "status": "ok",
  "checks": { "app": true, "database": true, "cache": true },
  "response_time_ms": 45.2,
  "timestamp": 1741531200
}
```

### Recommended External Tools
1. **Prometheus + Grafana** - Metrics visualization
2. **New Relic / Datadog** - APM (Application Performance Monitoring)
3. **Sentry** - Error tracking
4. **Uptime Robot** - Availability monitoring (use `/health` endpoint)

### Quick DIY Monitoring Script
```bash
#!/bin/bash
# /usr/local/bin/health_check.sh

LOG_FILE="/var/log/health_check.log"
ALERT_EMAIL="admin@shamra-academia.com"  # reserved for future mail alerts

# Check load (awk comparison avoids a bc dependency)
LOAD=$(uptime | awk -F'load average:' '{print $2}' | cut -d, -f1 | tr -d ' ')
if awk -v l="$LOAD" 'BEGIN { exit !(l > 10) }'; then
    echo "$(date) HIGH LOAD: $LOAD" >> "$LOG_FILE"
fi

# Check disk
DISK=$(df -h / | awk 'NR==2 {print $5}' | tr -d '%')
if [ "$DISK" -gt 85 ]; then
    echo "$(date) HIGH DISK: ${DISK}%" >> "$LOG_FILE"
fi

# Check Apache
if ! systemctl is-active --quiet apache2; then
    echo "$(date) APACHE DOWN" >> "$LOG_FILE"
    systemctl restart apache2
fi

# Check Messenger worker
if ! systemctl is-active --quiet shamra-messenger-worker; then
    echo "$(date) MESSENGER WORKER DOWN" >> "$LOG_FILE"
    systemctl restart shamra-messenger-worker
fi
```

Add to cron: `*/5 * * * * /usr/local/bin/health_check.sh`

---

## 11. Emergency Response

### High Load
```bash
# 1. Identify cause
ps aux --sort=-%cpu | head -10
sudo tail -n 5000 /var/log/apache2/academia_v2_access.log | awk '{print $7}' | cut -d'?' -f1 | sort | uniq -c | sort -rn | head -10

# 2a. Check for DDoS / single abusive IP
sudo tail -n 10000 /var/log/apache2/academia_v2_access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10

# 2b. Check for crawler flood (top user agents)
sudo tail -n 5000 /var/log/apache2/academia_v2_access.log | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn | head -10

# 2c. Check requests/minute trend
sudo tail -n 30000 /var/log/apache2/academia_v2_access.log | awk '{print $4}' | cut -d: -f1,2,3 | sort | uniq -c | tail -10

# 3. Temporary mitigations
# Block abusive IP via iptables
sudo iptables -A INPUT -s <IP> -j DROP

# For crawler floods: ensure Cache-Control: public s-maxage is set on /filter
# (already implemented Mar 2026 — Cloudflare should absorb repeat hits after warm-up)

# Reload Apache gracefully (does NOT log users out — sessions in var/sessions/)
sudo systemctl reload apache2

# Emergency cache rebuild
sudo rm -rf /var/www/html/academia_v2/var/cache/prod/
sudo -u www-data php /var/www/html/academia_v2/bin/console cache:warmup --env=prod
```

### Database Deadlock
```bash
# Show locked processes
mysql -u root -p -e "SHOW ENGINE INNODB STATUS\G" | grep -A30 "LATEST DETECTED DEADLOCK"

# Kill blocking queries
mysql -u root -p -e "SELECT * FROM information_schema.PROCESSLIST WHERE TIME > 30" 
mysql -u root -p -e "KILL <id>"
```

---

## 12. Known Bottlenecks (Updated March 2026)

### Primary Bottleneck: `/filter` Search Endpoint — Bot/Crawler Flood

**Finding (Mar 14 2026):** The `/filter` endpoint attracts very heavy crawler traffic that saturates the 2-core VM. The slowness is not caused by slow PHP/DB/ES queries alone — it is caused by the *volume* of crawlers all hitting PHP simultaneously.

| Metric | Value |
|--------|-------|
| Route | `shamra_academia_filter` |
| Controller | `HomepageController::filterDataAction` |
| Typical request time | 1–4 seconds |
| Sustained traffic rate | **700–950 requests/minute** (all through Cloudflare CDN) |
| Server load avg | 13–17 on a 2-core VM (healthy max: ~2.0) |
| CPU in user space | ~84%, 0% idle |
| Apache prefork workers | 66+ simultaneous PHP processes |

**Top offending crawlers (from access log):**

| Crawler | Agent substring | Impact |
|---------|----------------|--------|
| Meta web indexer | `meta-webindexer/1.1` | ~16% of requests |
| OpenAI SearchBot | `OAI-SearchBot/1.3` | ~6% of requests |
| Googlebot | `Googlebot/2.1` | Ongoing |
| Bingbot | `bingbot/2.0` | Ongoing |
| Amazon SearchBot | `Amzn-SearchBot` | Ongoing |

**Why the 2-core Apache prefork setup gets overwhelmed:**  
Each PHP process ties up a worker for 1–4 seconds. At 700+ req/min with 66 workers, requests queue faster than they can be served → load average climbs to 13–17.
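The back-of-envelope behind this: a pool of W workers with an average service time of S seconds sustains at most W/S requests per second. A quick calc with the table's numbers (4 s is the slow end of the observed 1–4 s range):

```shell
# Max sustainable requests/minute for a worker pool: workers / service_time * 60.
capacity_per_min() {
  awk -v w="$1" -v s="$2" 'BEGIN { printf "%.0f\n", w / s * 60 }'
}

capacity_per_min 66 4   # 66 prefork workers at ~4s/request -> 990
```

At ~990 req/min of ceiling against an observed 700–950 req/min, there is essentially no headroom, so any burst makes the queue (and load average) climb.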

**Fix implemented (Mar 14 2026) ✅**

1. **`CDN-Cache-Control` header** — Anonymous `/filter` responses now return:
   ```
   CDN-Cache-Control: public, max-age=86400, stale-while-revalidate=300
   ```
   **Why `CDN-Cache-Control` and not `Cache-Control`:** Symfony automatically injects `private`
   into `Cache-Control` whenever the session is touched (auth check, CSRF, etc.), which causes
   Cloudflare to return `cf-cache-status: DYNAMIC` and skip caching entirely. `CDN-Cache-Control`
   is a Cloudflare-specific header that takes priority over `Cache-Control` for CDN decisions
   and is stripped before the response reaches the browser — Symfony never interferes with it.

   **Verified working:** `cf-cache-status: HIT` confirmed on repeated filter requests.

2. **Logged-in users** are unaffected — their responses remain `Cache-Control: private, no-cache, no-store`.
   The `CDN-Cache-Control` header is only set for anonymous (unauthenticated) requests.

3. **Cache TTL:** 24 hours (`max-age=86400`), `stale-while-revalidate=300` so Cloudflare
   refreshes in the background without users seeing a miss.

> ⚠️ **Do NOT add `noindex` to filter pages.** These pages rank in Google and drive real traffic
> (e.g. `/filter?tags=soil+phosphorus`). CDN caching is the correct way to reduce origin load.

**Limitation — unique URL problem:**  
Crawlers explore the full tag space, hitting ~1,400 unique URLs per 2,000 requests. The cache
only helps for URLs that have been visited before (cache warm). First-pass crawling still hits
origin. Load will trend down as the cache warms over hours/days but won't drop to zero from
CDN caching alone.
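Verification reduces to grepping `cf-cache-status` out of `curl -sI` output; demoed here on a canned header block so it runs offline (the live command is in the comment):

```shell
# Live check would be:
#   curl -sI 'https://shamra-academia.com/filter?title=test' | grep -i cf-cache-status
# Canned headers stand in for the real response:
HEADERS='HTTP/2 200
cache-control: max-age=14400
cf-cache-status: HIT
server: cloudflare'

printf '%s\n' "$HEADERS" | grep -i '^cf-cache-status' | awk '{print $2}'
```

Expect `MISS` on a cold URL and `HIT` on repeats once the edge is warm; `DYNAMIC` means Cloudflare skipped caching entirely.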

**Additional protections deployed (Mar 14 2026):**
- ✅ Cloudflare WAF Rule: Challenges Chrome/124 + Vietnam + /filter traffic (systematic scraper)
- ✅ Bot Fight Mode: Auto-challenges generic automated traffic
- ✅ Block AI Training Bots: GPTBot, OAI-SearchBot, CCBot, etc. blocked at Cloudflare edge

**Result:** Load dropped from 13–17 to ~4 within 3 hours of deploying these protections.

### Secondary Bottleneck: Research Show Page

| Metric | Value |
|--------|-------|
| Route | `shamra_academia_research_show` |
| Typical slow range | 2–8 seconds |
| % of slow requests | ~20% |

---

## 13. Potential Improvements

### High Priority (Filter Endpoint)

1. **✅ DONE — `CDN-Cache-Control` for anonymous filter responses** (Mar 14 2026)
   - `CDN-Cache-Control: public, max-age=86400` — Cloudflare caches `/filter?...` for 24 hours
   - Must use `CDN-Cache-Control`, NOT `Cache-Control` — Symfony injects `private` into
     `Cache-Control` via session handling, which causes `cf-cache-status: DYNAMIC`
   - Logged-in user responses remain private/uncached
   - **Verified:** `cf-cache-status: HIT` confirmed
   - ⚠️ Do NOT add `noindex` to `/filter` pages — they rank in Google and bring real traffic

2. **✅ DONE — Cloudflare WAF Rule: Block Vietnam scraper** (Mar 14 2026)
   - Expression: `(http.user_agent contains "Chrome/124.0.0.0" and ip.geoip.country eq "VN" and http.request.uri.path contains "/filter")`
   - Action: **Managed Challenge** (CAPTCHA)
   - Targets systematic scraper using outdated Chrome UA from Vietnam hitting /filter
   - Real users can pass the challenge; bots fail

3. **✅ DONE — Cloudflare Bot Fight Mode enabled** (Mar 14 2026)
   - Security → Bots → Bot Fight Mode: ON
   - Auto-detects and challenges automated traffic without manual rules

4. **✅ DONE — Cloudflare Block AI Training Bots** (Mar 14 2026)
   - Security → Bots → Block AI Bots: ON (all pages)
   - Blocks: GPTBot, OAI-SearchBot, CCBot, Google-Extended, Bytespider, etc.
   - These bots consume CPU but provide no SEO value

5. **`robots.txt` Crawl-delay for Meta crawler (OPTIONAL)**
   - Meta web-indexer is still allowed (needed for Facebook/Instagram link previews)
   - If it's still causing load, add to `public/robots.txt`:
     ```
     User-agent: meta-webindexer
     Crawl-delay: 30
     ```
   - Note: Compliance is not guaranteed but Meta generally respects it

6. **Cloudflare Cache Rule (OPTIONAL — belt-and-suspenders)**
   - Forces caching on `/filter*` at the Cloudflare level regardless of response headers
   - Dashboard → Rules → Cache Rules: URI path contains `/filter`, Edge TTL override 24h,
     bypass cookie `PHPSESSID` for logged-in users
   - More reliable than `CDN-Cache-Control` as it can't be overridden by PHP

7. **Profile Elasticsearch queries**
   ```bash
   curl -X POST 'https://shamraindex:9200/arabic_research/_search?pretty' \
     -H 'Content-Type: application/json' \
     -d '{"profile": true, "query": {...}}'
   ```

8. **Switch Apache prefork → event MPM + PHP-FPM**
   - Prefork spawns one process per connection; event MPM reuses workers
   - On a 2-core server this alone could handle 3–4× the current concurrency

### Medium Priority

9. **Enable MySQL slow query log**
   ```sql
   SET GLOBAL slow_query_log = 'ON';
   SET GLOBAL long_query_time = 2;
   ```

10. **Add database indexes** for common filter combinations

11. **Add swap space** — currently 0 swap on prod, no memory safety valve

### Lower Priority

12. **CDN for static assets** (already partial via Cloudflare)

13. **Connection pooling** for MySQL

14. **Read replicas** for heavy read operations

---

## Quick Commands Reference

### From PowerShell (via SSH)
```powershell
# Server health
ssh -i C:\Users\shadisaleh\Documents\linux\shamramain_user.pem azureuser@20.241.4.71 "uptime; free -h; df -h /"

# Recent errors
ssh -i C:\Users\shadisaleh\Documents\linux\shamramain_user.pem azureuser@20.241.4.71 "tail -50 /var/www/html/academia_v2/var/log/prod.log | grep -E 'CRITICAL|ERROR'"

# Slow requests
ssh -i C:\Users\shadisaleh\Documents\linux\shamramain_user.pem azureuser@20.241.4.71 "grep 'Slow request' /var/www/html/academia_v2/var/log/prod.log | tail -20"

# Count slow by route
ssh -i C:\Users\shadisaleh\Documents\linux\shamramain_user.pem azureuser@20.241.4.71 "tail -5000 /var/www/html/academia_v2/var/log/prod.log | grep 'shamra_academia_filter' | grep 'Slow' | wc -l"

# Health check
curl -s https://shamra-academia.com/health/full
```

### On Server (after SSH-ing in)
| What | Command |
|------|---------|
| Server load | `uptime` |
| Memory | `free -h` |
| Disk | `df -h` |
| Recent errors | `grep "CRITICAL\|ERROR" /var/www/html/academia_v2/var/log/prod.log \| tail -20` |
| Slow requests | `grep "Slow request" /var/www/html/academia_v2/var/log/prod.log \| tail -20` |
| Slow by route | `grep "Slow request" prod.log \| grep -oP '"route":"[^"]+"' \| sort \| uniq -c \| sort -rn` |
| Health check | `curl -s https://shamra-academia.com/health/full \| jq` |
| Apache status | `systemctl status apache2` |
| Messenger status | `systemctl status shamra-messenger-worker` |
| Cache warmup | `sudo -u www-data php bin/console cache:warmup --env=prod` |
| ES health | `curl -u elastic:PASS 'https://shamraindex:9200/_cluster/health?pretty'` |
