# Plan: Topic Landing Pages for SEO

> **Status**: Phase 1 Complete — Search Expansion Active  
> **Last Updated**: March 8, 2026  
> **Priority**: Very High  
> **Positioning**: Arabic-first topic intelligence powered by LLM + search feedback loops

## TL;DR
Google Search Console (GSC) shows that nearly all traffic comes from queries of the form "رسائل ماجستير عن {topic} pdf" ("master's theses on {topic} pdf"), but raw Elasticsearch (ES) matching is not enough to cover the intent behind them. Build an **LLM-powered Arabic Topic Intelligence Layer** that converts English or ambiguous queries into high-quality Arabic topic packs (canonical topic, synonyms, related terms, intent variants), then use those packs to power `/topics/{slug}` pages and rescue zero-click queries.

---

## Implementation Progress

### ✅ COMPLETED (March 8, 2026)

| Step | Status | Details |
|------|--------|---------|
| **Step 1: Entities** | ✅ Done | `Topic` + `TopicEnrichment` entities created, migrations applied |
| **Step 2: Enrichment Pipeline** | ✅ Done | `app:enrich-topics` command with priority ranking, JSON schema validation, quality scoring |
| **Step 3: ES Integration** | ✅ Done | Search expansion via `TopicEnrichmentRepository::findBestApprovedForQuery()` integrated into `HomepageController` |
| **Feature Flag** | ✅ Enabled | `topic_enrichment_search_enabled: true` in production |

**Current Data:**
- 49 approved topic enrichments
- 1 rejected enrichment
- 0 pending drafts

**Key Files:**
- `src/Entity/Topic.php` + `src/Entity/TopicEnrichment.php`
- `src/Repository/TopicEnrichmentRepository.php` — `findBestApprovedForQuery()`
- `src/Command/EnrichTopicsCommand.php`
- `src/syndex/AcademicBundle/Controller/HomepageController.php` — `getApprovedTopicExpansionTerms()`

### 🔄 IN PROGRESS

| Step | Status | Next Action |
|------|--------|-------------|
| **Step 4: TopicController** | 🔜 Next | Create `/topics` and `/topics/{slug}` routes |
| **Step 7: Seed Topics** | Partial | Continue running `app:enrich-topics --limit=100` to build coverage |

### 📋 TODO

| Step | Description |
|------|-------------|
| **Step 5** | Topic page template (`templates/topic/show.html.twig`) |
| **Step 6** | Schema.org structured data (CollectionPage + ItemList) |
| **Step 8** | CSAT + zero-click recovery loop |
| **Step 9** | Internal linking (research show pages → topic pages) |
| **Step 10** | Cron job for paper counts |

---

## Current State

- **Tag entity**: `src/syndex/AcademicBundle/Entity/Tag.php` — table `academy_tag` with `id`, `context`, `slug`
- **Tag interests**: `src/Entity/TagInterestsUser.php` — tracks user-topic interest
- **Filter page**: `/filter?tags={tag}` — basic search results for a tag, no curated content
- **No topic pages exist**: There's no `/topics/` route or controller
- **Top GSC queries** (position 4-10, high impressions): الطاقة الشمسية, سلوك المستهلك, إنترنت الأشياء, التنمية الريفية, المدن الذكية, القانون الدولي الإنساني, etc.

### Key Problem to Solve

- ES lexical search alone misses Arabic variants, spelling variants, cross-language terms, and intent phrasing.
- High-impression queries with low clicks suggest result relevance mismatch, not only indexing gaps.
- We have ample LLM quota, but we need a **smart budget policy** that spends it on high-impact queries first.

---

## Strategic Shift (Core Idea)

Instead of creating static topic pages first, build a reusable **Topic Intelligence pipeline**:

1. Normalize real user queries (GSC + internal search logs)
2. Use LLM to generate Arabic-first topic enrichment (not just translation)
3. Store enriched topic packs for deterministic reuse in ES query building
4. Use CSAT + CTR + zero-click signals to continuously improve topic packs
5. Materialize high-performing packs into topic landing pages

---

## Steps

### Step 1: Create Topic Intelligence Entities (Foundation) ✅ DONE

Create `src/Entity/Topic.php` (as planned) and add `src/Entity/TopicEnrichment.php`:

`Topic` keeps editorial + SEO identity, while `TopicEnrichment` stores model output and versions.

`TopicEnrichment` fields:
- `id`
- `topic` (ManyToOne → `Topic`)
- `sourceQuery` (string) — query that triggered enrichment
- `language` (string, default `ar`)
- `canonicalTopicAr` (string)
- `canonicalTopicEn` (string, nullable)
- `synonymsAr` (json)
- `relatedConceptsAr` (json)
- `queryIntentsAr` (json): intent phrasings, e.g., رسائل ماجستير، أدوات قياس، مراجعة أدبيات (master's theses, measurement instruments, literature review)
- `negativeTerms` (json, nullable) — terms to de-boost/exclude when noisy
- `qualityScore` (float, nullable)
- `modelName` (string)
- `modelVersion` (string)
- `status` (string: draft/approved/rejected)
- `createdAt`, `updatedAt`

Keep existing `Topic` fields:
- `id` (int, PK)
- `titleAr` (string 255) — Arabic topic name
- `titleEn` (string 255) — English topic name
- `slug` (string 255, unique) — URL-friendly slug
- `descriptionAr` (text, nullable) — Arabic description/intro paragraph
- `descriptionEn` (text, nullable) — English description
- `tag` (ManyToOne → Tag, nullable) — link to existing tag for ES queries
- `esQueryTerms` (json) — additional ES search terms for this topic
- `fieldId` (ManyToOne → Field, nullable) — academic field
- `metaTitleAr`, `metaTitleEn` (string 255) — SEO meta titles
- `metaDescriptionAr`, `metaDescriptionEn` (text) — SEO meta descriptions
- `isActive` (boolean, default true)
- `createdAt` (datetime_immutable)
- `paperCount` (integer, default 0) — cached count, updated by cron

Create migration: `migrations/VersionYYYYMMDDHHMMSS.php`

### Step 2: LLM Enrichment Pipeline (Quota-Smart) ✅ DONE

Create a command + service pipeline:

- `app:enrich-topics --source=gsc|search_logs --limit=200`
- `TopicEnrichmentService` calls existing LLM provider with strict JSON schema output
- Store result in `TopicEnrichment` and link/create `Topic`
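
One way `TopicEnrichmentService` could enforce the "strict JSON schema" before storing a pack is sketched below in Python for brevity (the service itself is PHP). The key names mirror the Step 1 `TopicEnrichment` fields, but the exact keys the LLM is prompted to return and the `validate_pack` helper are assumptions. Output that fails validation would be retried or rejected rather than stored.

```python
import json

# Hypothetical key names mirroring the TopicEnrichment fields from Step 1.
REQUIRED_STR = ("canonicalTopicAr",)
OPTIONAL_STR = ("canonicalTopicEn",)
REQUIRED_LIST = ("synonymsAr", "relatedConceptsAr", "queryIntentsAr")
OPTIONAL_LIST = ("negativeTerms",)
KNOWN_KEYS = set(REQUIRED_STR + OPTIONAL_STR + REQUIRED_LIST + OPTIONAL_LIST)


def _is_str_list(value):
    """True for a list (possibly empty) whose items are all non-blank strings."""
    return isinstance(value, list) and all(isinstance(v, str) and v.strip() for v in value)


def validate_pack(raw_json):
    """Parse raw LLM output; return (pack, []) if valid, else (None, errors)."""
    try:
        pack = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(pack, dict):
        return None, ["top-level value must be a JSON object"]

    errors = []
    for key in REQUIRED_STR:
        value = pack.get(key)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{key}: required non-empty string")
    for key in OPTIONAL_STR:
        if pack.get(key) is not None and not isinstance(pack[key], str):
            errors.append(f"{key}: must be a string or null")
    for key in REQUIRED_LIST:
        if not _is_str_list(pack.get(key)):
            errors.append(f"{key}: required list of strings")
    for key in OPTIONAL_LIST:
        if pack.get(key) is not None and not _is_str_list(pack[key]):
            errors.append(f"{key}: must be a list of strings or null")
    # Strict mode: unknown keys usually mean the model drifted from the schema.
    extra = sorted(set(pack) - KNOWN_KEYS)
    if extra:
        errors.append("unexpected keys: " + ", ".join(extra))
    return (None, errors) if errors else (pack, [])
```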

**Quota policy (daily budget split):**
- **50%**: zero-click + high-impression queries (highest SEO lift)
- **30%**: high-frequency internal queries with low CTR
- **20%**: exploratory long-tail clustering
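
Turning the 50/30/20 split into whole LLM calls per day needs a rounding rule. A Python sketch using largest-remainder rounding, so the three buckets always add up to exactly the daily budget; the bucket names are hypothetical.

```python
# Hypothetical bucket names; the percentages come from the quota policy above.
DAILY_SPLIT = {
    "zero_click_high_impressions": 50,
    "low_ctr_internal": 30,
    "long_tail_exploration": 20,
}


def split_daily_budget(total, percents):
    """Split `total` enrichment calls across buckets by integer percentages.

    Largest-remainder rounding guarantees the buckets sum to exactly `total`.
    """
    if sum(percents.values()) != 100:
        raise ValueError("percentages must sum to 100")
    alloc, remainder = {}, {}
    for name, pct in percents.items():
        # Integer arithmetic avoids float rounding surprises.
        alloc[name], remainder[name] = divmod(total * pct, 100)
    leftover = total - sum(alloc.values())
    # Hand the leftover calls to the buckets with the largest remainders.
    for name in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc
```

With a budget of 200 calls this yields 100, 60 and 40; with 99 it yields 49, 30 and 20.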

**Priority score** per query:

`priority = impressions_weight * gsc_impressions + search_volume_weight * internal_count + pain_weight * (1 - ctr) + zero_click_bonus`

Each day, enrich only the top-N queries by priority.
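
A Python sketch of the priority score and the daily top-N cut. It follows the formula above, but the weight values, the `QueryStats` shape, and the choice to zero the pain term when a query has no GSC impressions are assumptions, since the plan does not define them. "Zero-click" is taken here to mean at least one impression and no clicks.

```python
import heapq
from dataclasses import dataclass

# Placeholder weights: the plan leaves the values open, so tune them empirically.
IMPRESSIONS_WEIGHT = 1.0
SEARCH_VOLUME_WEIGHT = 0.5
PAIN_WEIGHT = 10.0
ZERO_CLICK_BONUS = 5.0


@dataclass
class QueryStats:
    query: str
    gsc_impressions: int
    gsc_clicks: int
    internal_count: int  # hits in internal search logs


def priority(s):
    """impressions_w*impr + volume_w*internal + pain_w*(1 - ctr) + zero_click_bonus."""
    if s.gsc_impressions > 0:
        pain = PAIN_WEIGHT * (1 - s.gsc_clicks / s.gsc_impressions)
    else:
        pain = 0.0  # assumption: no impressions means no CTR evidence either way
    zero_click = s.gsc_impressions > 0 and s.gsc_clicks == 0
    return (IMPRESSIONS_WEIGHT * s.gsc_impressions
            + SEARCH_VOLUME_WEIGHT * s.internal_count
            + pain
            + (ZERO_CLICK_BONUS if zero_click else 0.0))


def top_n(stats, n):
    """The n highest-priority queries; only these get enriched today."""
    return heapq.nlargest(n, stats, key=priority)
```

In practice `n` would be the per-bucket allocation from the daily budget split, with `top_n` run once per bucket.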

### Step 3: Retrieval Integration (ES + Topic Packs) ✅ DONE

Before running the ES query:
1. Detect if query maps to existing `Topic`/`TopicEnrichment`
2. Expand query with `canonicalTopicAr + synonymsAr + relatedConceptsAr`
3. Apply weighted bool query:
   - high boost: canonical phrase
   - medium boost: synonyms
   - low boost: related concepts
4. Apply `negativeTerms` de-boost/exclusions when needed

This keeps runtime deterministic and cheap (LLM runs offline, not per request).
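
The expansion can be expressed as an Elasticsearch request body. The Python sketch below builds it as a plain dict (the production code lives in `HomepageController`, in PHP). The field names (`title`, `abstract`) and boost values are placeholders, and the sketch implements the `negativeTerms` step as a de-boost through ES's `boosting` query; a hard exclusion would use a `must_not` clause instead. The original user query stays as one optional clause, so an English query can still be answered through the Arabic pack.

```python
# Hypothetical index fields and boost values; tune against real click data.
FIELDS = ["title", "abstract"]
BOOST_CANONICAL, BOOST_SYNONYM, BOOST_RELATED = 3.0, 2.0, 0.5
NEGATIVE_BOOST = 0.3  # score multiplier for docs that hit a negative term


def _phrase(text, boost=None):
    """multi_match phrase clause, optionally boosted."""
    clause = {"multi_match": {"query": text, "type": "phrase", "fields": FIELDS}}
    if boost is not None:
        clause["multi_match"]["boost"] = boost
    return clause


def build_expanded_query(user_query, pack):
    """Return an ES request body expanding `user_query` with an approved pack."""
    should = [
        {"multi_match": {"query": user_query, "fields": FIELDS}},  # original wording
        _phrase(pack["canonicalTopicAr"], BOOST_CANONICAL),        # high boost
    ]
    should += [_phrase(t, BOOST_SYNONYM) for t in pack.get("synonymsAr", [])]  # medium
    # Related concepts use a plain match so partial overlap still scores a little.
    should += [
        {"multi_match": {"query": t, "fields": FIELDS, "boost": BOOST_RELATED}}
        for t in pack.get("relatedConceptsAr", [])
    ]
    positive = {"bool": {"should": should, "minimum_should_match": 1}}

    negatives = pack.get("negativeTerms") or []
    if not negatives:
        return {"query": positive}
    # `boosting` de-ranks (rather than excludes) documents matching noisy terms.
    return {
        "query": {
            "boosting": {
                "positive": positive,
                "negative": {"bool": {"should": [_phrase(t) for t in negatives]}},
                "negative_boost": NEGATIVE_BOOST,
            }
        }
    }
```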

### Step 4: Create TopicController + Pages 🔜 NEXT

Create `src/Controller/TopicController.php`:

**`GET /topics` → `index()`**:
- List all active topics, grouped by field
- Show paper count per topic
- Search/filter topics
- Template: `templates/topic/index.html.twig`

**`GET /topics/{slug}` → `show()`**:
- Fetch Topic entity by slug
- Query ES using approved `TopicEnrichment` pack (canonical + synonyms + intents)
- Query `AcademicQuestion` matching topic's tag
- Query `Community` matching topic's tag
- Fetch trending papers within topic (filter `trending_research_cache` by topic ES query)
- Template: `templates/topic/show.html.twig`

### Step 5: Topic Page Template (Arabic-First)

Create `templates/topic/show.html.twig`:

```
Layout:
┌──────────────────────────────────────────────┐
│ H1: رسائل ماجستير ودكتوراه عن {topic}        │
│ Description paragraph (SEO-optimized)        │
│ Stats: {paperCount} papers · {qaCount} Q&A   │
├───────────────────────┬──────────────────────┤
│ Papers (20/page)      │ Sidebar:             │
│ - research cards      │ - Related topics     │
│ - pagination          │ - Related communities│
│                       │ - Top Q&A questions  │
│                       │ - "Use AI Tools" CTA │
│                       │ - Follow this topic  │
└───────────────────────┴──────────────────────┘
```

### Step 6: Schema.org Structured Data

Add `CollectionPage` + `ItemList` JSON-LD to topic pages:
- `@type: CollectionPage`
- `name`: "رسائل ماجستير عن {topic}"
- `description`: topic description
- `mainEntity`: `ItemList` of `ScholarlyArticle` items
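
A Python sketch of that JSON-LD payload, built from the properties listed above. `url`, `inLanguage`, `numberOfItems`, and the `(title, url)` input shape are additions for illustration; in the real app the markup would be rendered by the Twig template.

```python
import json


def build_topic_jsonld(topic_ar, description_ar, page_url, papers):
    """CollectionPage whose mainEntity is an ItemList of ScholarlyArticle items.

    `papers` is a list of (title, url) pairs for the papers shown on the page.
    """
    return {
        "@context": "https://schema.org",
        "@type": "CollectionPage",
        "name": f"رسائل ماجستير عن {topic_ar}",
        "description": description_ar,
        "url": page_url,
        "inLanguage": "ar",
        "mainEntity": {
            "@type": "ItemList",
            "numberOfItems": len(papers),
            "itemListElement": [
                {
                    "@type": "ListItem",
                    "position": position,  # 1-based, in page order
                    "item": {"@type": "ScholarlyArticle", "name": title, "url": url},
                }
                for position, (title, url) in enumerate(papers, start=1)
            ],
        },
    }


def render_script_tag(data):
    """Serialize for a <script type="application/ld+json"> tag."""
    payload = json.dumps(data, ensure_ascii=False)  # keep Arabic readable in the HTML
    # Escape "</" so a title containing "</script>" cannot close the tag early.
    return '<script type="application/ld+json">' + payload.replace("</", "<\\/") + "</script>"
```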

### Step 7: Seed + Generate Initial Topics from GSC Data

Create `app:seed-topics` + enrichment step:
1. Reads top 50 GSC queries (hardcoded or from CSV)
2. Extracts the topic name (strips the "رسائل ماجستير عن" ("master's theses on") prefix and the "pdf" token)
3. Creates initial `Topic`
4. Runs enrichment to generate Arabic canonical pack + variants
5. Stores approved enrichment snapshot
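
The extraction step is easy to get subtly wrong with Arabic text. A Python sketch, assuming GSC queries look like "رسائل ماجستير عن {topic} pdf": it removes diacritics and tatweel, drops a standalone `pdf` token, and strips the prefix. `match_key` is a hypothetical, looser key (folding hamza forms of alef, alef maqsura and taa marbuta) for spotting duplicate spellings; it does not merge genuinely different words such as العقلية and الذهنية, which still need the enrichment pack or manual review.

```python
import re
import unicodedata

PREFIX = "رسائل ماجستير عن"  # "master's theses on"
# Arabic diacritics (tashkeel, U+064B..U+0652) plus tatweel (U+0640).
_MARKS = re.compile("[\u064B-\u0652\u0640]")
_PDF = re.compile(r"\bpdf\b", re.IGNORECASE)


def extract_topic(gsc_query):
    """Return the bare topic from a query like 'رسائل ماجستير عن {topic} pdf'."""
    text = unicodedata.normalize("NFKC", gsc_query)
    text = _MARKS.sub("", text)
    text = _PDF.sub(" ", text)
    text = " ".join(text.split())  # collapse runs of whitespace
    if text.startswith(PREFIX):
        text = text[len(PREFIX):].strip()
    return text


def match_key(topic):
    """Looser key for spotting duplicate spellings; not for display or slugs."""
    key = re.sub("[أإآ]", "ا", topic)  # hamza forms of alef -> bare alef
    return key.replace("ى", "ي").replace("ة", "ه")  # alef maqsura, taa marbuta
```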

Set these fields on each new `Topic` entity:
   - `titleAr` = extracted name
   - `slug` = transliterated or ID-based slug
   - `esQueryTerms` = `["topic name"]`
   - `metaTitleAr` = "رسائل ماجستير ودكتوراه عن {topic} - شمرا أكاديميا"
   - `metaDescriptionAr` = "تصفح أحدث رسائل الماجستير والدكتوراه عن {topic}. أبحاث ودراسات أكاديمية محكّمة مع أدوات ذكاء اصطناعي للتلخيص والترجمة."

Initial topics (from GSC top queries):
إنترنت الأشياء, سلوك المستهلك, المرأة الريفية, التنمية الريفية, المدن الذكية, الطاقة الشمسية, الأداء التنظيمي, النباتات الطبية, الجاحظ, القانون الدولي الإنساني, تقنية النانو, العلاقات الدولية, اليقظة العقلية, الهيكل التنظيمي, رضا العملاء, إعادة الإعمار, العلامة التجارية, المحاسبة الإبداعية, رأس المال البشري, الملكية الفكرية

**Priority Topic Candidates** (GSC data March 2026 — start with these):

| Topic | Impressions | Clicks | CTR | Position | Notes |
|-------|-------------|--------|-----|----------|-------|
| إنترنت الأشياء | 16 | 5 | 31.25% | 3.38 | ✅ Already ranking well |
| النباتات الطبية | 32 | 1 | 3.12% | 10.5 | High volume, low CTR — needs better page |
| التدقيق الداخلي | 23 | 3 | 13.04% | 10 | Good potential |
| اليقظة العقلية/الذهنية | 25 | 3 | 12% | ~8 | Variant spellings — consolidate |
| سلوك المستهلك | 15 | 1 | 6.67% | 6 | Business/marketing topic |
| تقنية النانو | 14 | 1 | 7.14% | 8.79 | STEM topic |
| العلامة التجارية | 11 | 3 | 27.27% | 11.18 | High CTR for position |
| رضا العملاء | 7 | 4 | 57.14% | 6.43 | ⭐ Best CTR — validate with topic page |
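
CTR here is clicks / impressions. With only 7 to 32 impressions per topic, a single click moves CTR by roughly 3 to 14 percentage points, so rankings based on this table are noisy. A small Python check that recomputes the CTR column and orders topics by impressions that produced no click (the "pain" idea behind the Step 2 priority score):

```python
# (topic, impressions, clicks) copied from the GSC table above.
GSC_ROWS = [
    ("إنترنت الأشياء", 16, 5),
    ("النباتات الطبية", 32, 1),
    ("التدقيق الداخلي", 23, 3),
    ("اليقظة العقلية/الذهنية", 25, 3),
    ("سلوك المستهلك", 15, 1),
    ("تقنية النانو", 14, 1),
    ("العلامة التجارية", 11, 3),
    ("رضا العملاء", 7, 4),
]


def ctr_percent(impressions, clicks):
    """CTR as a percentage rounded to 2 decimals (0.0 when there are no impressions)."""
    return round(100 * clicks / impressions, 2) if impressions else 0.0


def by_unclicked_impressions(rows):
    """Order topics by impressions that did not turn into clicks, largest first."""
    return sorted(rows, key=lambda r: r[1] - r[2], reverse=True)
```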

### Step 8: CSAT + Zero-Click Recovery Loop

Use existing Search CTR logs + upcoming unified CSAT to continuously fix weak queries.

Daily job `app:topic-quality-loop`:

1. Pull queries with:
   - high impressions + low CTR
   - internal zero-click sessions
   - low CSAT (search context)
2. For each query, attempt:
   - map to existing topic pack, else create new draft pack
   - regenerate synonyms/intents using LLM
   - run offline evaluation vs recent click labels
3. Promote only packs that improve relevance metrics
4. Send top failing queries to admin review queue
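
"Offline evaluation vs recent click labels" needs a concrete metric. The Python sketch below uses mean reciprocal rank (MRR) of the first clicked result, which is one reasonable choice rather than something this plan specifies; the `min_gain` threshold and the data shapes (query to ranked doc IDs, query to clicked doc IDs) are assumptions. A candidate pack is promoted only when its ranking beats the current one on the same click labels.

```python
def reciprocal_rank(ranked_ids, clicked_ids):
    """1/rank of the first clicked document in the ranking, or 0 if none appear."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in clicked_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings, clicks):
    """Average reciprocal rank over queries that have at least one click label."""
    labelled = [q for q, docs in clicks.items() if docs]
    if not labelled:
        return 0.0
    return sum(reciprocal_rank(rankings.get(q, []), clicks[q]) for q in labelled) / len(labelled)


def should_promote(baseline, candidate, clicks, min_gain=0.01):
    """Promote only if the candidate beats the baseline by at least min_gain MRR."""
    before = mean_reciprocal_rank(baseline, clicks)
    after = mean_reciprocal_rank(candidate, clicks)
    return after - before >= min_gain, before, after
```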

Quality KPIs per query cluster:
- CTR uplift
- zero-click rate reduction
- CSAT average increase
- median click position improvement
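
A Python sketch of the four KPIs as before/after deltas for one query cluster, signed so that a positive number always means improvement. `ClusterWindow` and its fields are hypothetical aggregates over a fixed time window; the CSAT scale is whatever the unified CSAT survey uses.

```python
from dataclasses import dataclass, field
from statistics import mean, median


@dataclass
class ClusterWindow:
    """Hypothetical aggregates for one query cluster over one time window."""
    impressions: int
    clicks: int
    sessions: int
    zero_click_sessions: int
    csat_scores: list = field(default_factory=list)
    click_positions: list = field(default_factory=list)  # 1-based rank of each click


def _ratio(num, den):
    return num / den if den else 0.0


def _delta(before, after):
    return after - before if before is not None and after is not None else None


def _avg_csat(w):
    return mean(w.csat_scores) if w.csat_scores else None


def _median_position(w):
    return median(w.click_positions) if w.click_positions else None


def kpi_deltas(before, after):
    """Before/after KPI changes; positive always means 'better'."""
    pos_change = _delta(_median_position(before), _median_position(after))
    return {
        "ctr_uplift": _ratio(after.clicks, after.impressions)
        - _ratio(before.clicks, before.impressions),
        "zero_click_rate_reduction": _ratio(before.zero_click_sessions, before.sessions)
        - _ratio(after.zero_click_sessions, after.sessions),
        "csat_increase": _delta(_avg_csat(before), _avg_csat(after)),
        # Lower position numbers are better, so flip the sign.
        "median_position_improvement": None if pos_change is None else -pos_change,
    }
```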

### Step 9: Internal Linking

- On research `show.html.twig`: add "Browse more on this topic" links to relevant `/topics/{slug}` pages (match paper tags to topics)
- On homepage: add "Explore Topics" section for anonymous users
- On filter page: add "View topic page" link when filtering by tag that has a matching Topic
- In sitemap: add all topic pages to `sitemap.xml`
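
For the sitemap bullet, a Python sketch that emits the topic URLs in the sitemaps.org format. The base URL and the `topic_sitemap` helper are assumptions; slugs are percent-encoded in case any slug keeps Arabic characters. If the existing `sitemap.xml` is generated as one file, the same `<url>` entries can be appended there instead.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def topic_sitemap(base_url, slugs, lastmod=None):
    """Return sitemap XML listing /topics/{slug} for every active topic slug."""
    ET.register_namespace("", SITEMAP_NS)  # serialize as the default xmlns
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for slug in slugs:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # Percent-encode non-ASCII (e.g. Arabic) slugs.
        loc = f"{base_url.rstrip('/')}/topics/{quote(slug)}"
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            # W3C date format, e.g. 2026-03-08
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```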

### Step 10: Cron — Update Paper Counts

Add to `ComputeTrendingCommand` or create `app:update-topic-counts`:
- For each active Topic, query ES with `esQueryTerms`, get total count
- Update `topic.paperCount`
- Run daily
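
A Python sketch of the count job, with the ES `_count` call injected as a function so the logic stays testable. `TopicRow`, the searched fields, and the one-phrase-per-term query are assumptions; in production this would be a PHP console command using the app's ES client. Counting with `esQueryTerms` alone can differ from what the topic page lists if the page also expands with the enrichment pack (Step 4), so the two queries may need to be kept in sync.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TopicRow:
    """Hypothetical in-memory stand-in for the Topic entity."""
    id: int
    es_query_terms: list
    is_active: bool = True
    paper_count: int = 0


def count_body(terms, fields=("title", "abstract")):
    """Request body for the ES _count API: a paper counts if any term matches as a phrase."""
    return {"query": {"bool": {
        "should": [
            {"multi_match": {"query": t, "type": "phrase", "fields": list(fields)}}
            for t in terms
        ],
        "minimum_should_match": 1,
    }}}


def update_topic_counts(topics, run_count: Callable[[dict], int]):
    """Refresh paper_count for active topics; return how many rows changed."""
    changed = 0
    for topic in topics:
        if not topic.is_active or not topic.es_query_terms:
            continue  # inactive or term-less topics keep their last count
        new_count = run_count(count_body(topic.es_query_terms))
        if new_count != topic.paper_count:
            topic.paper_count = new_count
            changed += 1
    return changed
```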

---

## Verification

1. Create 5 test topics manually, verify pages render with correct ES results
2. Run `app:seed-topics` — verify 20 topics created with correct slugs
3. Run enrichment for top 100 queries and verify valid JSON packs + approval status
4. Evaluate offline relevance on sample zero-click queries before/after enrichment
5. Check Google: submit topic pages to Search Console for indexing
6. After 2 weeks: compare impressions/clicks for topic-related queries vs. before
7. Verify structured data with Google Rich Results Test

---

## Decisions

- **Separate entity, not just tag pages**: Topics have SEO metadata, descriptions, and curated content. Tags are auto-generated and messy.
- **Arabic-first**: GSC data shows Arabic queries dominate. English topic pages can come later.
- **LLM offline, not request-time**: Keep latency and cost predictable.
- **Quota by impact, not evenly distributed**: Spend tokens on queries most likely to move CTR/CSAT.
- **Human-in-the-loop approvals**: Auto-draft enrichment, manual approve for top traffic topics.

---

## Suggested Rollout (3 Weeks)

### Week 1 ✅ COMPLETE
- ~~Build `TopicEnrichment` entity + enrichment command + JSON schema validation~~
- ~~Generate first 100 topic packs from GSC + internal logs~~
- **Actual**: 50 topics processed, 49 approved, ES integration live

### Week 2 — CURRENT
- ~~Integrate enriched packs into ES query expansion~~ ✅
- Enable topic pages for top 20 approved topics (not started) → **Next: Create TopicController**
- **Immediate tasks:**
  1. Create `src/Controller/TopicController.php` with `/topics` and `/topics/{slug}` routes
  2. Create `templates/topic/index.html.twig` (topic listing)
  3. Create `templates/topic/show.html.twig` (topic detail with ES results)
  4. Add Schema.org structured data
  5. Run `app:enrich-topics --limit=100` to expand coverage

### Week 3
- Turn on `app:topic-quality-loop` with zero-click + CSAT inputs
- Measure CTR/zero-click deltas and iterate prompts/weights
- Add internal linking from research pages to topic pages
- Submit topic pages to Google Search Console

---

## Quick Commands

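```bash
# Process more topics (on prod)
sudo -u www-data php bin/console app:enrich-topics --limit=50 --env=prod

# SQL runs through Doctrine's dbal:run-sql (doctrine:query:sql on older DoctrineBundle versions)

# Check enrichment stats
sudo -u www-data php bin/console dbal:run-sql --env=prod \
  "SELECT status, COUNT(id) FROM topic_enrichment GROUP BY status"

# Bulk approve drafts
sudo -u www-data php bin/console dbal:run-sql --env=prod \
  "UPDATE topic_enrichment SET status = 'approved' WHERE status = 'draft' AND quality_score >= 0.5"

# View approved topics with synonyms
sudo -u www-data php bin/console dbal:run-sql --env=prod \
  "SELECT source_query, canonical_topic_ar, synonyms_ar FROM topic_enrichment WHERE status = 'approved' LIMIT 20"
```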