# AI Text Detection — Implementation

> Detect AI-generated content in academic texts. Deployed as a standalone feature at `/arabic-ai-detection` with Arabic corpus baseline integration.

**Status**: Phase 0 + Phase 1 + Phase 1B + Phase 1C (Analytics, Feedback, SEO) — COMPLETE and deployed to production.
**Commits**: `8fae040f` (backend, 10 files), `011d1cd1` (frontend, 7 files), `ead1f1bd` (access control, 11 files), `ce1c3b37` (bugfix), `d8f60c34` (admin analytics tab), `c170d593` (SEO optimization), `9f4a010d` (feedback widget + CSAT/NPS), `0f223840` (feature settings DB persistence), `7dcada7a` (feedback nudge trigger fix)

---

## 1. Product Overview

### What It Does
Users visit `/arabic-ai-detection` → paste text → the system analyzes it and returns:
- **AI probability score** (0–100%) displayed as an animated SVG ring gauge
- **Verdict**: Human / Likely Human / Mixed / Likely AI / AI Generated (5-level scale)
- **Confidence level**: Low / Medium / High
- **Sentence-level highlighting**: color-coded sentences showing which parts are likely AI vs. human
- **Baseline comparison table**: submitted text metrics vs. human/AI corpus baselines
- **Brief explanation** of why the text was flagged

### Target Users
- Researchers verifying originality of submissions
- Students self-checking before submission
- Journal editors / reviewers screening papers
- Academic institutions

### Credit Cost & Limits
- **3 credits** per detection (`ai_detection` in `UsageMonitorService`)
- Word limits: **50 minimum**, **10,000 maximum** (per request)
- Word counting uses Unicode-aware regex: `preg_match_all('/[\p{Arabic}\w]+/u', $text)`
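
The limits above can be sketched as a small validation helper. This is a minimal illustration built around the regex quoted in the list; the constant names, function names, and error codes are assumptions, not the actual controller code.

```php
<?php
declare(strict_types=1);

// Limits taken from this document; the constant and function names are illustrative.
const MIN_WORDS = 50;
const MAX_WORDS = 10000;

/** Count words with the Unicode-aware pattern quoted above. */
function countWords(string $text): int
{
    $count = preg_match_all('/[\p{Arabic}\w]+/u', $text);
    // preg_match_all() returns false on failure (for example invalid UTF-8); treat that as 0 words.
    return $count === false ? 0 : $count;
}

/** Return null when the text is within limits, otherwise a short (hypothetical) error code. */
function validateWordCount(string $text): ?string
{
    $words = countWords($text);
    if ($words < MIN_WORDS) {
        return 'too_short';
    }
    if ($words > MAX_WORDS) {
        return 'too_long';
    }
    return null;
}
```

A caller would typically map a non-null result to the 400 response described in the API section.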

### Access Control (4-State System)

| State | User Type | Condition | Behavior |
|-------|-----------|-----------|----------|
| 1 | Visitor (not logged in) | — | UI disabled, login prompt shown |
| 2 | Logged in, subscribed | Any tier | Full access |
| 3 | Logged in, not subscribed | `ai_detection_free_access = true` | Free access |
| 4 | Logged in, not subscribed | `ai_detection_free_access = false` | Subscribe prompt shown |

Admin toggle: `app_setting` DB table → key `ai_detection_free_access` (boolean string `'true'`/`'false'`).
Managed via Settings tab in admin dashboard (`/jim19ud83/playground/dashboard`).
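
As a rough sketch, the four states in the table reduce to a small decision function. The function names and parameters below are hypothetical and only mirror the table's conditions; the admin bypass described later for the API route is left out for brevity.

```php
<?php
declare(strict_types=1);

/**
 * Resolve which of the four access states applies.
 * Hypothetical helper: the inputs mirror the table's conditions, not real method signatures.
 */
function resolveAccessState(bool $loggedIn, bool $subscribed, string $freeAccessSetting): int
{
    if (!$loggedIn) {
        return 1; // visitor: UI disabled, login prompt
    }
    if ($subscribed) {
        return 2; // any subscription tier: full access
    }
    // The setting is stored as the string 'true' or 'false', so compare strictly.
    return $freeAccessSetting === 'true' ? 3 : 4;
}

/** Only states 2 and 3 may run a detection. */
function canAnalyze(int $state): bool
{
    return $state === 2 || $state === 3;
}
```

Comparing against the literal string `'true'` avoids PHP's loose truthiness, where the non-empty string `'false'` would otherwise evaluate as true.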

> **Note**: Previously stored in `config/ai_prompts/feature_settings.json`, which was tracked in git and reset on every deploy. Migrated to DB table `app_setting` in commit `0f223840` to persist across deployments.

---

## 1B. Analytics & Tracking

### Page Visit Tracking

Every page load of `/arabic-ai-detection` logs a row to `ai_detection_visit`:

| Column | Type | Description |
|--------|------|-------------|
| `id` | INT AUTO_INCREMENT | Primary key |
| `user_id` | INT (nullable FK → fos_user) | Logged-in user, or NULL for visitors |
| `visited_at` | DATETIME | Timestamp of visit |

Indexes: `idx_adv_visited_at`, `idx_adv_user_visited`.

Logging is fire-and-forget (wrapped in try/catch — never breaks the page).
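
A minimal sketch of the fire-and-forget pattern follows. The `$insert` callable stands in for the real SQL insert, which this document does not show, and the function name is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Run an insert callback and swallow any failure so the page still renders.
 * $insert is a placeholder for the actual database write.
 */
function logVisitSafely(callable $insert, ?int $userId): bool
{
    try {
        $insert($userId, new DateTimeImmutable());
        return true;
    } catch (Throwable $e) {
        // Record the failure for operators, but never rethrow to the controller.
        error_log('ai_detection_visit insert failed: ' . $e->getMessage());
        return false;
    }
}
```

Catching `Throwable` rather than `Exception` also covers PHP `Error` instances, so even a type error inside the insert cannot break the page.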

### Admin Analytics Tab

The **AI Detection** tab in the admin dashboard (`/jim19ud83/playground/dashboard`) displays:
- **Visits chart**: Daily visit counts (Chart.js bar chart)
- **Unique users chart**: Daily unique visitors (line chart)
- **Top users table**: Most frequent visitors with visit count
- **Summary cards**: Total visits, unique users, visits today, unique today

API: `GET /jim19ud83/playground/api/ai-detection-stats?days=N` returns `{ summary, daily[], top_users[], feedback{} }`.

Added in commit `d8f60c34`.

---

## 1C. User Feedback (CSAT / NPS)

### Feedback Widget

After analysis completes, logged-in users see a **sticky nudge pill** at the bottom of the page that slides up after a short delay:
- **2 seconds** if user has never submitted feedback
- **5 seconds** if user has previously submitted (allows re-feedback)

Clicking the nudge expands a **slide-up feedback panel** with:
1. **Star rating** (1–5, CSAT) — required
2. **NPS score** (1–10, "Would you recommend this?") — required
3. **Free-text comment** (optional, max 2,000 chars)
4. **Submit button** → `POST /api/ai-detection/feedback`

The nudge is triggered by a `MutationObserver` watching for the results div gaining the CSS class `active`. Only shown to logged-in users (`{% if app.user %}`).

### Feedback Database Table

`ai_detection_feedback` (migration `Version20260224150000.php`):

| Column | Type | Description |
|--------|------|-------------|
| `id` | INT AUTO_INCREMENT | Primary key |
| `user_id` | INT (nullable FK → fos_user, ON DELETE SET NULL) | Submitting user |
| `rating` | TINYINT | Star rating 1–5 (CSAT) |
| `recommend` | TINYINT | NPS score 1–10 |
| `comment` | TEXT (nullable) | Optional free-text feedback |
| `created_at` | DATETIME | Submission timestamp |

Indexes: `idx_adf_created`, `idx_adf_user`.

### Feedback API Endpoints

| Method | Route | Controller | Description |
|--------|-------|------------|-------------|
| POST | `/api/ai-detection/feedback` | `AiDetectionController::submitFeedback` | Submit stars + NPS + comment. Requires login. |
| GET | `/api/ai-detection/feedback-status` | `AiDetectionController::feedbackStatus` | Returns `{ has_feedback, feedback_count }` for current user |
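
The constraints on the feedback payload (stars 1–5, NPS 1–10, comment up to 2,000 characters) could be checked along these lines. This is a sketch under assumptions: the field names follow the `ai_detection_feedback` columns, while the function name and error wording are invented for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Validate a decoded feedback payload.
 * Returns a list of error strings; an empty list means the payload is acceptable.
 */
function validateFeedback(array $payload): array
{
    $errors = [];
    $rating = $payload['rating'] ?? null;
    $recommend = $payload['recommend'] ?? null;
    $comment = $payload['comment'] ?? null;

    if (!is_int($rating) || $rating < 1 || $rating > 5) {
        $errors[] = 'rating must be an integer from 1 to 5';
    }
    if (!is_int($recommend) || $recommend < 1 || $recommend > 10) {
        $errors[] = 'recommend must be an integer from 1 to 10';
    }
    // The comment is optional; count characters, not bytes, because Arabic is multi-byte in UTF-8.
    if ($comment !== null && (!is_string($comment) || mb_strlen($comment, 'UTF-8') > 2000)) {
        $errors[] = 'comment must be a string of at most 2,000 characters';
    }
    return $errors;
}
```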

### Admin Dashboard — CSAT Section

The AI Detection tab includes a **Customer Feedback (CSAT)** section:
- **4 summary cards**: Total Responses, Avg Rating (stars), CSAT Score (% rating ≥ 4), NPS Score
- **NPS breakdown bar**: Colored bar showing Promoters (9-10) / Passives (7-8) / Detractors (1-6) percentages
- **CSAT Trend chart**: Dual-axis chart — bar for daily response count, line for daily average rating
- **Recent Feedback table**: Last 50 entries showing user, stars, NPS badge (color-coded), comment, date

Feedback data is returned by the same `api/ai-detection-stats` endpoint under the `feedback` key:
```json
{
  "feedback": {
    "summary": { "total", "avg_rating", "csat_score", "nps_score", "promoters", "passives", "detractors" },
    "daily": [{ "date", "count", "avg_rating" }],
    "entries": [{ "user_name", "email", "rating", "recommend", "comment", "created_at" }]
  }
}
```
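
For reference, the `summary` figures could be derived from raw feedback rows roughly as follows. This sketch assumes the conventional NPS formula (percentage of promoters minus percentage of detractors) applied to the 1–10 scale and buckets described above; the rounding and the choice to return counts rather than percentages are assumptions, and the production query may differ.

```php
<?php
declare(strict_types=1);

/**
 * Summarise feedback rows into CSAT and NPS figures.
 * Each row is ['rating' => 1..5, 'recommend' => 1..10].
 */
function summariseFeedback(array $rows): array
{
    $summary = [
        'total' => count($rows), 'avg_rating' => 0.0, 'csat_score' => 0.0, 'nps_score' => 0.0,
        'promoters' => 0, 'passives' => 0, 'detractors' => 0,
    ];
    if ($summary['total'] === 0) {
        return $summary;
    }

    $ratingSum = 0;
    $satisfied = 0;
    foreach ($rows as $row) {
        $ratingSum += $row['rating'];
        if ($row['rating'] >= 4) {
            $satisfied++; // CSAT counts ratings of 4 or 5
        }
        if ($row['recommend'] >= 9) {
            $summary['promoters']++;
        } elseif ($row['recommend'] >= 7) {
            $summary['passives']++;
        } else {
            $summary['detractors']++;
        }
    }

    $total = $summary['total'];
    $summary['avg_rating'] = round($ratingSum / $total, 1);
    $summary['csat_score'] = round(100 * $satisfied / $total, 1);
    $summary['nps_score'] = round(100 * ($summary['promoters'] - $summary['detractors']) / $total, 1);
    return $summary;
}
```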

Added in commit `9f4a010d`.

---

## 1D. SEO Optimization

Targeting Arabic search keywords:
- **كشف الذكاء الاصطناعي** (AI detection)
- **فحص نسبة الذكاء الاصطناعي في النص** (check AI percentage in text)
- **نسبة الاقتباس من الذكاء الاصطناعي** (AI quotation ratio)

### Changes Made (commit `c170d593`):
- `page_title`: `"كشف الذكاء الاصطناعي | فحص نسبة الذكاء الاصطناعي في النص"`
- `page_description`: Includes all 3 target phrases + value proposition (9,000+ Arabic papers baseline)
- `metaKeywords`: Updated with target keywords
- Navbar label: Changed from "كشف AI" to "كشف الذكاء الاصطناعي" (`translations/UserBundle.ar.yml`)

---

## 1E. Feature Settings Persistence

The admin toggle for `ai_detection_free_access` was originally stored in `config/ai_prompts/feature_settings.json`. Because this file was tracked in git, every deploy reset the admin's setting to `false`.

**Fix (commit `0f223840`)**: Migrated to database table `app_setting`:

| Column | Type | Description |
|--------|------|-------------|
| `setting_key` | VARCHAR(128) PRIMARY KEY | Setting identifier |
| `setting_value` | TEXT | Setting value |
| `updated_at` | DATETIME | Last update timestamp |

Migration (`Version20260224180000.php`) creates the table and seeds `ai_detection_free_access = 'true'`.

All three controllers now read from and write to the DB instead of the JSON file:
- `AiDetectionController::getFeatureSetting()` — reads from DB
- `PlaygroundAdminController::readFeatureSettings()` / `apiSaveFeatureSettings()` — reads/writes DB
- `PlaygroundAPIController::checkAiDetectionAccess()` — reads from DB

---

## 2. Detection Approach

### Hybrid: Statistical Analysis + LLM Classification

The detection pipeline combines two layers:

**Layer 1 — Arabic Text Analyzer** (`ArabicTextAnalyzer.php`, 503 lines):
Pure PHP statistical analysis that computes 9 metrics from the submitted text, then compares 7 of them against pre-computed baselines from our Arabic corpus.

**Layer 2 — LLM Classification** (`AiDetectionService.php`, 269 lines):
Azure OpenAI (main GPT model, non-streaming) receives the text + statistical comparison as structured context, and returns a scored judgment with sentence-level analysis.

### Arabic Corpus Baseline

**This is our competitive advantage.** We built baselines from real data:

**Human Baseline** — computed from **8,730 pre-2022 Arabic academic papers** (pre-ChatGPT era) via the `content` field in the `arabic_research` Elasticsearch index. 740 documents were skipped due to short content (< 200 words).

**AI Baseline** — computed from **100 GPT-4o-mini samples** generated by rewording randomly selected abstracts from the same corpus.

#### Baseline Results (Production)

| Metric | Human (8,730 docs) | AI (100 samples) | Key Insight |
|--------|---------------------|-------------------|-------------|
| Sentence length mean | 43.96 words | 23.14 words | AI writes much shorter sentences |
| Sentence length std dev | 10.50 | 5.41 | Humans vary more in absolute terms |
| Connector density / 100 sentences | 1.38 | 42.96 | **AI uses 31× more connectors** |
| First-person markers / 100 sentences | 5.85 | 0.00 | No first-person markers in the AI samples |
| Passive constructions / 100 sentences | 29.33 | 77.86 | AI strongly prefers passive voice |
| Burstiness index | 0.2245 | 0.2917 | Surprisingly similar |
| Vocabulary richness (TTR, 100-word windows) | 0.8228 | 0.8376 | Similar |

**Key finding**: Connector density is the strongest signal: 1.38 (human) vs. 42.96 (AI) per 100 sentences. AI text uses Arabic connector phrases such as بالإضافة إلى ذلك ("in addition to that") and علاوة على ذلك ("moreover") roughly 31× more often.

#### Why This Approach Is Unique for Arabic

Most AI detection tools are English-first, and tools such as GPTZero and Originality.ai handle Arabic poorly. By building our own Arabic baseline from **real Arabic academic papers**, we have:

1. **Domain-specific calibration**: Baseline from actual Arabic academic research, not generic web text
2. **Dialect awareness**: Real Syrian/Levantine academic writing patterns differ from AI-generated MSA
3. **No external baseline dependency**: Baselines live on our server as JSON files
4. **Continuous improvement**: Re-run commands as corpus grows or AI models change
5. **Competitive advantage**: No other Arabic platform has this

---

## 3. Architecture

### Component Diagram

```
┌─────────────────────────────────────────────────────────────┐
│           Standalone Page: /arabic-ai-detection             │
│  AiDetectionController → ai_detection/index.html.twig      │
│  (~1,130 lines — inline CSS + JS, SVG score ring, feedback) │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  4-state access check (visitor/subscribed/free/deny)  │  │
│  │  ↓                                                    │  │
│  │  Text Input (RTL textarea, word counter)              │  │
│  │  Language selector: Auto / Arabic / English           │  │
│  │  ↓                                                    │  │
│  │  POST /api/playground/detect-ai  (JSON response)     │  │
│  │  ↓                                                    │  │
│  │  Results Panel:                                       │  │
│  │  • Animated SVG ring gauge (0-100%)                   │  │
│  │  • Verdict badge (5 levels, color-coded)              │  │
│  │  • Confidence indicator (Low/Medium/High)             │  │
│  │  • Sentence-level highlighting (color-coded)          │  │
│  │  • Baseline comparison table (text vs human vs AI)    │  │
│  │  • Key signals list                                   │  │
│  │  • Explanation text                                   │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│              PlaygroundAPIController                         │
│  #[Route('/detect-ai', name: 'detect_ai', methods: POST)]  │
│  • checkAiDetectionAccess()  — 4-state check               │
│  •   Admin → always allowed                                 │
│  •   Subscribed → always allowed                            │
│  •   Non-subscribed + toggle ON → allowed                   │
│  •   Otherwise → 403                                        │
│  • Validate word count (50–10,000)                          │
│  • Call AiDetectionService::detect()                        │
│  • Return JSON response                                     │
└─────────────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│              AiDetectionService (269 lines)                  │
│  detect(string $text, User $user, string $language): array  │
│  ┌────────────────────────────────────────────────────────┐ │
│  │ 1. Auto-detect language (Arabic vs Latin char count)  │ │
│  ├────────────────────────────────────────────────────────┤ │
│  │ 2. ArabicTextAnalyzer::analyze($text) → 9 metrics    │ │
│  ├────────────────────────────────────────────────────────┤ │
│  │ 3. Load baselines (arabic_baseline.json +             │ │
│  │    arabic_ai_baseline.json) — Arabic only             │ │
│  │    compareToBaseline() → 7 metric comparisons         │ │
│  │    formatComparisonForPrompt() → text for LLM         │ │
│  ├────────────────────────────────────────────────────────┤ │
│  │ 4. Load & populate detect_ai_prompt.md                │ │
│  │    Inject: {{text}}, {{language}}, {{stats_comparison}}│ │
│  ├────────────────────────────────────────────────────────┤ │
│  │ 5. AzureOpenAIService::chat() — main model, temp=0.1 │ │
│  │    Non-streaming, max_tokens=4096                     │ │
│  ├────────────────────────────────────────────────────────┤ │
│  │ 6. Parse JSON (strips markdown fences, fallback       │ │
│  │    on invalid JSON → score 50 / mixed / low)          │ │
│  ├────────────────────────────────────────────────────────┤ │
│  │ 7. Attach stats + comparison + latency_ms to result   │ │
│  │    Credit deduction handled by chat() internally      │ │
│  └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
                          │
              ┌───────────┴───────────┐
              ▼                       ▼
┌──────────────────────────┐ ┌──────────────────────────────┐
│  arabic_baseline.json    │ │  arabic_ai_baseline.json     │
│  (from 8,730 pre-2022    │ │  (from 100 GPT-4o-mini       │
│   human-written docs)    │ │   generated samples)         │
│  config/ai_prompts/      │ │  config/ai_prompts/          │
└──────────────────────────┘ └──────────────────────────────┘
            ↑ Generated once by:
            │ app:compute-arabic-baseline
            │ app:generate-ai-samples --count=100
```

### Files Created / Modified

| File | Status | Description |
|------|--------|-------------|
| `src/Service/Playground/ArabicTextAnalyzer.php` | **NEW** (503 lines) | Computes 9 linguistic metrics: sentence stats, vocabulary, connector density, first-person, passive, punctuation, burstiness, word/sentence counts |
| `src/Service/Playground/AiDetectionService.php` | **NEW** (269 lines) | Orchestrator: stats → baseline compare → LLM prompt → parse response |
| `src/Command/ComputeArabicBaselineCommand.php` | **NEW** | Scrolls pre-2022 `content` field from ES → `arabic_baseline.json` |
| `src/Command/GenerateAiSamplesCommand.php` | **NEW** | Generates AI rewrites via GPT-4o-mini → `arabic_ai_baseline.json` |
| `config/ai_prompts/arabic_baseline.json` | **NEW** | Human baseline (8,730 docs analyzed) |
| `config/ai_prompts/arabic_ai_baseline.json` | **NEW** | AI baseline (100 samples analyzed) |
| `config/ai_prompts/feature_settings.json` | **OBSOLETE** | Was feature toggles JSON; now superseded by `app_setting` DB table |
| `playground_prompts/detect_ai_prompt.md` | **NEW** (106 lines) | System + User prompt with `{{text}}`, `{{language}}`, `{{stats_comparison}}` |
| `src/Controller/AiDetectionController.php` | **NEW** (~175 lines) | Standalone page controller + visit logging + feedback API (submit, status) + DB feature settings |
| `templates/ai_detection/index.html.twig` | **NEW** (~1,130 lines) | Full standalone page: inline CSS/JS, SVG ring gauge, sentence highlighting, baseline comparison table, feedback widget (nudge pill + CSAT/NPS panel) |
| `src/Controller/PlaygroundAPIController.php` | **MODIFIED** | Added `detectAi()` route + `checkAiDetectionAccess()` private method |
| `src/Service/Playground/UsageMonitorService.php` | **MODIFIED** | Added `ai_detection => 3` to operation costs |
| `src/Controller/PlaygroundAdminController.php` | **MODIFIED** | Added `GET/POST /api/feature-settings` endpoints for admin toggle |
| `templates/admin/playground/dashboard.html.twig` | **MODIFIED** | Added Settings tab with AI detection free access toggle |
| `templates/header.html.twig` | **MODIFIED** | Added nav link with `fa-shield` icon |
| `translations/AiDetection.ar.yml` | **NEW** | ~50 Arabic translation keys (separate domain) — includes SEO meta, feedback keys |
| `translations/AiDetection.en.yml` | **NEW** | ~50 English translation keys — includes SEO meta, feedback keys |
| `translations/UserBundle.ar.yml` | **MODIFIED** | Added `header.ai_detection` key |
| `translations/UserBundle.en.yml` | **MODIFIED** | Added `header.ai_detection` key |
| `translations/Subscribe.ar.yml` | **MODIFIED** | Added `subscribe.feature.ai_detection` key |
| `translations/Subscribe.en.yml` | **MODIFIED** | Added `subscribe.feature.ai_detection` key |
| `src/syndex/.../subscription_new.html.twig` | **MODIFIED** | AI Detection feature listed on Starter, Researcher, Enterprise plans |
| `migrations/Version20260224120000.php` | **NEW** | Creates `ai_detection_visit` table for page visit tracking |
| `migrations/Version20260224150000.php` | **NEW** | Creates `ai_detection_feedback` table for CSAT/NPS feedback |
| `migrations/Version20260224180000.php` | **NEW** | Creates `app_setting` key-value table + seeds `ai_detection_free_access` |

**NOT modified** (deviations from original plan):
- `public/js/playground.js` — standalone page was built instead of playground sidebar panel
- `src/Service/Playground/AzureOpenAIService.php` — existing `chat()` method was reused, no new method needed
- `translations/Playground.ar.yml` / `Playground.en.yml` — separate `AiDetection` domain used instead

---

## 4. Implementation Details

### ArabicTextAnalyzer — Statistical Engine (503 lines)

**Constants:**
- `AI_CONNECTORS` — 26 Arabic hedging/connector phrases that AI overuses
- `FIRST_PERSON` — 20 first-person academic verb forms (نقوم، قمنا، وجدنا، etc.)
- `PASSIVE_MARKERS` — 11 passive/impersonal constructions (تم، يُعد، يُعتبر، etc.)
- `SENTENCE_REGEX` — splits on `.!?؟` with minimum 3-word threshold
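
To make the constants concrete, here is a minimal sketch of sentence splitting and connector density. It assumes the splitter breaks on `.!?؟` and drops segments under 3 words, and that density counts every phrase occurrence per 100 sentences; the real `ArabicTextAnalyzer` may count differently. Only the two connector phrases quoted elsewhere in this document are used, since the full `AI_CONNECTORS` list is not reproduced here.

```php
<?php
declare(strict_types=1);

/** Split on . ! ? ؟ and keep only segments with at least $minWords words. */
function splitSentences(string $text, int $minWords = 3): array
{
    $parts = preg_split('/[.!?؟]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        return [];
    }
    $sentences = [];
    foreach ($parts as $part) {
        $part = trim($part);
        if ((int) preg_match_all('/[\p{Arabic}\w]+/u', $part) >= $minWords) {
            $sentences[] = $part;
        }
    }
    return $sentences;
}

/** Connector phrase occurrences per 100 sentences (assumed definition). */
function connectorDensity(array $sentences, array $connectors): float
{
    if ($sentences === []) {
        return 0.0;
    }
    $hits = 0;
    foreach ($sentences as $sentence) {
        foreach ($connectors as $connector) {
            // Byte-wise matching is safe here because both strings are valid UTF-8.
            $hits += substr_count($sentence, $connector);
        }
    }
    return 100.0 * $hits / count($sentences);
}
```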

**Public Methods:**
- `analyze(string $text): array` — returns 9 metrics: `sentence_stats`, `vocabulary`, `connectors`, `first_person`, `passive`, `punctuation`, `burstiness`, `word_count`, `sentence_count`
- `compareToBaseline(array $stats, array $human, array $ai): array` — compares 7 metrics (avg sentence length, std dev, burstiness, TTR, connector density, first-person, passive) against both baselines, returns per-metric `closer_to` indicator and summary counts
- `formatComparisonForPrompt(array $comparison): string` — generates human-readable text for LLM injection with `← HUMAN` / `→ AI` / `≈` arrows
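
One plausible reading of the per-metric `closer_to` indicator is a nearest-baseline rule, sketched below. This is an assumption: the actual `compareToBaseline()` may weight, normalize, or threshold metrics differently, and the metric keys and the `'neutral'` label are hypothetical.

```php
<?php
declare(strict_types=1);

/** Label a value by whichever baseline it is nearer to (absolute distance). */
function closerTo(float $value, float $human, float $ai, float $tieTolerance = 0.0): string
{
    $toHuman = abs($value - $human);
    $toAi = abs($value - $ai);
    if (abs($toHuman - $toAi) <= $tieTolerance) {
        return 'neutral';
    }
    return $toHuman < $toAi ? 'human' : 'ai';
}

/** Compare every metric present in all three arrays and tally the labels. */
function compareMetrics(array $stats, array $human, array $ai): array
{
    $result = ['metrics' => [], 'summary' => ['human' => 0, 'ai' => 0, 'neutral' => 0]];
    foreach ($stats as $name => $value) {
        if (!isset($human[$name], $ai[$name])) {
            continue; // skip metrics that have no baseline
        }
        $label = closerTo((float) $value, (float) $human[$name], (float) $ai[$name]);
        $result['metrics'][$name] = [
            'value' => $value,
            'human' => $human[$name],
            'ai' => $ai[$name],
            'closer_to' => $label,
        ];
        $result['summary'][$label]++;
    }
    return $result;
}
```

With the production baselines from Section 2, a text averaging 25 words per sentence with 30 connectors per 100 sentences lands closer to the AI baseline on both metrics.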

**Vocabulary Richness:** Uses a sliding-window type-token ratio (TTR) to avoid text-length bias: 100-word windows advanced in steps of 50 words, with the TTR averaged across all windows.
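
A minimal sketch of the windowed TTR, assuming the input is already tokenized into words. Two behaviours here are assumptions rather than documented facts: texts shorter than one window are scored as a single window, and trailing words that do not fill a whole window are ignored.

```php
<?php
declare(strict_types=1);

/** Mean type-token ratio over fixed-size sliding windows. */
function slidingWindowTtr(array $words, int $window = 100, int $step = 50): float
{
    $n = count($words);
    if ($n === 0) {
        return 0.0;
    }
    if ($n <= $window) {
        // Assumption: a short text is treated as one window.
        return count(array_unique($words)) / $n;
    }
    $ratios = [];
    for ($start = 0; $start + $window <= $n; $start += $step) {
        $slice = array_slice($words, $start, $window);
        $ratios[] = count(array_unique($slice)) / $window;
    }
    return array_sum($ratios) / count($ratios);
}
```

Because every window has the same size, a 10,000-word text and a 300-word text are scored on the same scale, which a plain whole-text TTR would not achieve.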

**Burstiness:** Coefficient of variation of sentence lengths (`stdDev / mean`). Returns 0 for texts with fewer than 3 sentences.
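
The burstiness formula can be sketched as follows. The source does not say whether the population or sample standard deviation is used; this sketch assumes the population form, and the function name is illustrative.

```php
<?php
declare(strict_types=1);

/** Coefficient of variation of sentence lengths, measured in words. */
function burstiness(array $sentenceLengths): float
{
    $n = count($sentenceLengths);
    if ($n < 3) {
        return 0.0; // too few sentences for a meaningful spread
    }
    $mean = array_sum($sentenceLengths) / $n;
    if ($mean <= 0) {
        return 0.0; // guard against division by zero
    }
    $variance = 0.0;
    foreach ($sentenceLengths as $length) {
        $variance += ($length - $mean) ** 2;
    }
    $variance /= $n; // population variance (assumption)
    return sqrt($variance) / $mean;
}
```

For sentence lengths of 10, 20, and 30 words the mean is 20 and the population standard deviation is about 8.16, giving a burstiness of roughly 0.41.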

### AiDetectionService — Orchestrator (269 lines)

**Pipeline:**
1. Auto-detect language (Arabic char count vs Latin char count)
2. Run `ArabicTextAnalyzer::analyze()`
3. Load baselines from `config/ai_prompts/` (only for Arabic — English relies on LLM only)
4. Compare text stats to baselines
5. Load prompt from `playground_prompts/detect_ai_prompt.md` (regex-extracts System/User sections)
6. Inject `{{text}}`, `{{language}}`, `{{stats_comparison}}` into user prompt
7. Call `AzureOpenAIService::chat()` (main model, temperature=0.1, max_tokens=4096, operation=`ai_detection`)
8. Parse JSON response — strips markdown code fences, falls back to `score:50 / mixed / low` on parse failure
9. Attach metadata: `stats.text_metrics`, `stats.baseline_available`, `stats.comparison`, `language`, `latency_ms`
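
Steps 1 and 8 of the pipeline can be illustrated with two small helpers. Both are sketches under assumptions: the tie-break toward Arabic, the function names, and the exact fence-stripping pattern are not taken from `AiDetectionService`; only the fallback values (score 50, mixed, low) come from this document.

```php
<?php
declare(strict_types=1);

/** Pick 'ar' or 'en' by comparing Arabic-script and Latin-script character counts. */
function detectLanguage(string $text): string
{
    $arabic = (int) preg_match_all('/\p{Arabic}/u', $text);
    $latin = (int) preg_match_all('/\p{Latin}/u', $text);
    return $arabic >= $latin ? 'ar' : 'en'; // ties go to Arabic (assumption)
}

/** Decode the model's reply, tolerating a surrounding markdown code fence. */
function parseDetectionResponse(string $raw): array
{
    $fence = str_repeat('`', 3); // three backticks, built here to keep this listing fence-safe
    $pattern = '/^\s*' . $fence . '(?:json)?\s*|\s*' . $fence . '\s*$/i';
    $clean = (string) preg_replace($pattern, '', $raw);

    $data = json_decode($clean, true);
    if (!is_array($data) || !isset($data['score'])) {
        // Fallback described in the pipeline: neutral score, low confidence.
        return ['score' => 50, 'verdict' => 'mixed', 'confidence' => 'low'];
    }
    return $data;
}
```

Returning a neutral result instead of throwing keeps the endpoint responsive when the model emits malformed JSON, at the cost of a less informative answer for that request.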

### Detection Prompt (106 lines)

**System prompt** instructs the model as an AI detection expert with:
- 7 evaluation signals (perplexity uniformity, burstiness, vocabulary patterns, structural uniformity, voice/personality, stylistic tells, academic-specific)
- Arabic-specific patterns (formal connectors, first-person usage, passive constructions)
- Calibration rules: statistical evidence must be weighted heavily, academic formality alone isn't penalized, mixed content = 40-60%

**User prompt** template with:
- `{{stats_comparison}}` — injected baseline comparison
- `{{text}}` — the text to analyze
- Required JSON output: `score`, `verdict` (5 levels), `confidence`, `explanation`, `key_signals`, `sentences[]`
- Scoring guide: 0-25 human, 26-50 likely human, 51-70 mixed, 71-85 likely AI, 86-100 AI generated
- Sentence limit: at most 30 sentences are returned; a sentence is flagged as "ai" if its score > 60

### API Route

**POST** `/api/playground/detect-ai` (in `PlaygroundAPIController.php`)

```
Request:  { "text": "...", "language": "auto|ar|en" }
Response: { "success": true, "detection": { score, verdict, confidence, explanation, key_signals, sentences[], stats, language, latency_ms } }
Errors:   400 (too short/long), 403 (access denied), 500 (detection failed)
```

Access check via `checkAiDetectionAccess()`:
- Admin → always allowed
- `SubscribeService::isSubscribed($user)` → always allowed
- Non-subscribed + `app_setting` table `ai_detection_free_access = 'true'` → allowed
- Otherwise → 403

### Standalone Page Controller

`AiDetectionController.php` (~175 lines) at route `/arabic-ai-detection`:
- Reads `app_setting` DB table for `ai_detection_free_access` toggle
- Checks subscription status via `SubscribeService`
- Logs page visit to `ai_detection_visit` table
- Passes `canAnalyze`, `isSubscribed`, `isFreeTierEnabled`, `subscription` to template
- Template handles all 4 UI states (visitor/subscribed/free-access/subscribe-prompt)

**Additional routes:**
- `POST /api/ai-detection/feedback` — submit CSAT + NPS + comment (login required)
- `GET /api/ai-detection/feedback-status` — check if current user has submitted feedback

### Frontend UI (~1,130 lines template)

**Score Display:**
| Score Range | Color | Verdict |
|------------|-------|---------|
| 0-25% | Green `#22c55e` | Human Written |
| 26-50% | Light Green `#84cc16` | Likely Human |
| 51-70% | Yellow `#eab308` | Mixed / Uncertain |
| 71-85% | Orange `#f97316` | Likely AI |
| 86-100% | Red `#ef4444` | AI Generated |
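
The score bands in the table map directly to a lookup like the one below. The page performs this mapping in inline JavaScript; the PHP version here is only an illustrative sketch with a hypothetical function name, using the same thresholds and colors.

```php
<?php
declare(strict_types=1);

/** Map a 0-100 score to the verdict label and ring color from the table above. */
function verdictForScore(int $score): array
{
    $score = max(0, min(100, $score)); // clamp out-of-range values
    if ($score <= 25) {
        return ['verdict' => 'Human Written', 'color' => '#22c55e'];
    }
    if ($score <= 50) {
        return ['verdict' => 'Likely Human', 'color' => '#84cc16'];
    }
    if ($score <= 70) {
        return ['verdict' => 'Mixed / Uncertain', 'color' => '#eab308'];
    }
    if ($score <= 85) {
        return ['verdict' => 'Likely AI', 'color' => '#f97316'];
    }
    return ['verdict' => 'AI Generated', 'color' => '#ef4444'];
}
```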

**Components:**
- RTL-aware textarea with live word counter (50–10,000)
- Language dropdown (Auto / Arabic / English)
- Animated SVG circular ring score gauge with color gradient
- Verdict badge with icon
- Scrollable sentence list with per-sentence scores and color-coded highlighting
- Baseline comparison table showing Text vs Human vs AI for each metric
- Key signals display
- Explanation paragraph
- Disclaimer notice

### Feedback Widget

**Sticky nudge pill** (`det-fb-nudge`) + **slide-up panel** (`det-fb-panel`), visible only for logged-in users:
1. `MutationObserver` watches `.det-results` for `class` attribute changes
2. When results div gains `.active` class, `maybeShowNudge()` fires after 2s (or 5s if user already submitted)
3. User dismisses nudge → panel stays hidden; clicks nudge → panel slides up
4. Star rating (1–5 CSAT, visual star icons) + NPS (1–10 button row) + comment textarea
5. Submit → `POST /api/ai-detection/feedback` → thank-you state, then auto-hide

**Bug fix (commit `7dcada7a`):** The observer originally used `attributeFilter: ['style']` but the results div toggles visibility via CSS class (`.active`), not inline style. Fixed to `attributeFilter: ['class']` + `classList.contains('active')`.

### Admin Dashboard Integration

**Settings tab** in `/jim19ud83/playground/dashboard`:
- Toggle switch for AI Detection free access
- Reads/writes `app_setting` DB table (key: `ai_detection_free_access`)
- API endpoints: `GET /jim19ud83/playground/api/feature-settings`, `POST /jim19ud83/playground/api/feature-settings`

**AI Detection tab** (commit `d8f60c34`):
- Visit analytics: daily visits bar chart, unique users line chart, summary cards, top users table
- CSAT/NPS section (commit `9f4a010d`): summary cards, NPS breakdown bar, trend chart, recent entries table
- API: `GET /jim19ud83/playground/api/ai-detection-stats?days=N`

### Subscription Page Integration

AI Detection feature (`subscribe.feature.ai_detection`) listed on:
- Starter plan ($9/500 credits)
- Researcher plan ($19/1,500 credits)
- Enterprise plan ($99/15,000 credits)

---

## 5. Baseline Commands

### `app:compute-arabic-baseline`

Scrolls all pre-2022 documents from `arabic_research` ES index via the `content` field:
- Filters: `createdAt < 2022-01-01`
- Skips documents with content < 200 words (740 skipped)
- Computes per-document stats using `ArabicTextAnalyzer::analyze()`
- Aggregates means across all documents
- Output: `config/ai_prompts/arabic_baseline.json`
- Result: **8,730 documents analyzed**
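
The skip-and-aggregate step can be sketched as below. The input shape (`word_count` plus a flat `stats` array per document) and the output keys are assumptions made for illustration; the real command reads from Elasticsearch and may nest metrics differently.

```php
<?php
declare(strict_types=1);

/**
 * Average each metric across documents, skipping those below the word minimum.
 * Each document is ['word_count' => int, 'stats' => [metric => number]].
 */
function aggregateBaseline(array $documents, int $minWords = 200): array
{
    $sums = [];
    $analyzed = 0;
    $skipped = 0;
    foreach ($documents as $doc) {
        if ($doc['word_count'] < $minWords) {
            $skipped++;
            continue;
        }
        $analyzed++;
        foreach ($doc['stats'] as $metric => $value) {
            $sums[$metric] = ($sums[$metric] ?? 0.0) + $value;
        }
    }
    $means = [];
    foreach ($sums as $metric => $sum) {
        // $analyzed is at least 1 whenever $sums is non-empty.
        $means[$metric] = $sum / $analyzed;
    }
    return ['documents_analyzed' => $analyzed, 'documents_skipped' => $skipped, 'means' => $means];
}
```

Note that this averages per-document values, so an aggregated ratio such as burstiness need not equal the ratio of the aggregated standard deviation and mean.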

### `app:generate-ai-samples`

Picks random pre-2022 abstracts, asks GPT-4o-mini to rewrite them as AI-generated academic text:
- Uses `--count=100 --delay=1000` (ms between API calls to avoid rate limits)
- Computes same stats on generated text
- Aggregates into `config/ai_prompts/arabic_ai_baseline.json`
- Result: **100 samples generated and analyzed**

### Run on prod (already executed):

```bash
sudo -u www-data php bin/console app:compute-arabic-baseline --env=prod
sudo -u www-data php bin/console app:generate-ai-samples --count=100 --delay=1000 --env=prod
```

---

## 6. Key Decisions Made

| Decision | Chosen | Rationale |
|----------|--------|-----------|
| Streaming vs. non-streaming | **Non-streaming** `JsonResponse` | Detection needs full JSON; ~3-5 second latency is acceptable |
| Subscription tier | **All tiers** (any subscribed user) | Plus free access toggle for non-subscribed users when enabled by admin |
| Store results in DB | **No** (Phase 1) | Skip for now, add `AiDetectionLog` entity in Phase 5 |
| Playground sidebar vs standalone | **Standalone page** at `/arabic-ai-detection` | Better UX for a complex feature; dedicated URL for marketing/sharing |
| Baseline text field | **`content`** field (not `arabic_abstract`) | Richer data, 6K-57K words per doc; 740 docs with short content were skipped |
| Baseline scope | **Single aggregate** (all disciplines) | Per-field baselines deferred to Phase 2 |
| AI sample count | **100 samples** (not 200) | Sufficient for baseline; reduced from plan to avoid timeout issues |
| AI sample model | **GPT-4o-mini** | Cost-effective for sample generation |
| Translation domain | **Separate `AiDetection`** domain | Cleaner than adding to `Playground` domain |
| Word count method | **Unicode-aware regex** | `preg_match_all('/[\p{Arabic}\w]+/u', $text)` handles Arabic properly |
| Feature settings storage | **DB table** (`app_setting`) | JSON file tracked in git was reset on every deploy |
| Feedback mechanism | **CSAT (1–5 stars) + NPS (1–10)** | Industry-standard satisfaction metrics; nudge pill UX for non-intrusive collection |
| Visit tracking | **Direct SQL insert** (fire-and-forget) | Lightweight, no entity needed; analytics tab queries raw SQL |
| Feedback visibility trigger | **CSS class MutationObserver** | Watch `classList.contains('active')` on results div; reliable for class-toggled visibility |

---

## 7. Database Tables

| Table | Migration | Purpose |
|-------|-----------|----------|
| `ai_detection_visit` | `Version20260224120000` | Page visit tracking for analytics |
| `ai_detection_feedback` | `Version20260224150000` | User CSAT (stars 1–5) + NPS (1–10) + comments |
| `app_setting` | `Version20260224180000` | Key-value feature settings (replaces JSON file) |

All tables use `utf8mb4_unicode_ci` collation and `InnoDB` engine.

---

## 8. Future Enhancements

### Phase 2: Expand Baseline Corpus
- Build separate baselines per academic field (sciences vs. humanities vs. engineering)
- Generate AI samples with different models (Claude, Gemini) for multi-model baseline
- Re-run baseline computation periodically as corpus grows
- Add `publication_date`-based weighting

### Phase 3: External API Fallback
- Integrate Copyleaks or GPTZero API for English cross-validation
- Show both LLM score and API score for higher confidence

### Phase 4: Batch Detection
- Upload DOCX/PDF for scanning (reuse DocumentParser)
- Process in chunks via Symfony Messenger (async)
- Email results when complete (reuse OCR email pattern)

### Phase 5: Detection History & Reports
- New `AiDetectionLog` entity to store results
- Detection history on user's playground dashboard
- PDF report generation for institutional users

---

## 9. Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|-----------|
| LLM detection accuracy varies | False positives/negatives | Arabic corpus baseline provides hard statistical evidence; disclaimer on results |
| Paraphrased human text flagged | User trust erosion | Baseline calibrated on real academic formality; statistical comparison shown to users |
| Baseline drift as AI improves | AI text may mimic human patterns | Re-run `generate-ai-samples` with newer models periodically |
| Token cost for long texts | High Azure OpenAI usage | 10,000 word cap; 3 credits per request |
| Adversarial evasion | Detection bypassed | Not our threat model — this is a self-check tool |
| Baseline JSON missing on deploy | Detection silently degrades | Service falls back to LLM-only mode; logs warning |
| Feature toggle reset on deploy | Admin loses control | **Fixed**: Migrated from git-tracked JSON to `app_setting` DB table |
| Feedback widget never appears | No feedback data collected | **Fixed**: MutationObserver bug caught in testing; watches CSS class now |
