# Plan: Research Chat / Ask AI (BizChat-style Q&A)

> **Status**: ✅ Implemented  
> **Created**: March 18, 2026  
> **Completed**: March 18, 2026  
> **Priority**: High  
> **Dependencies**: PlaygroundSubscription credits system, Elasticsearch arabic_research index, Azure OpenAI GPT-5.2

---

## Overview

Add a conversational "Ask" interface to the homepage search so users can ask research questions in natural language. For each question, the system:
1. Uses the LLM to extract search keywords from the question
2. Searches the Shamra Arabic research index
3. Has the LLM evaluate which results are truly relevant
4. Generates a grounded answer citing the relevant papers

Users can then share the answer to their feed.

## Implementation Summary

### What Was Built

- **RAG Pipeline**: Full retrieval-augmented generation with keyword extraction → ES search → relevance filtering → answer synthesis
- **Multi-language Support**: Prompts in English for consistency, responses in user's detected language (Arabic/English)
- **User Context**: Uses `user.interests` (explicit academic interests) and `studyField` for personalization
- **Thinking Indicator**: Animated "Shamra Academia is thinking" with nerdy emoji (🤓)
- **RTL Support**: Arabic content displays right-to-left with proper header alignment
- **Toast Notifications**: Sliding toasts replace browser alerts and include a link to the shared post
- **Feed Sharing**: Shared posts render as HTML with clickable reference links
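
The RTL behaviour above depends on knowing whether a piece of text is mostly Arabic. The following is a minimal sketch of one possible heuristic, written in Python purely for illustration; the function name `text_direction` and the letter-counting rule are assumptions, not the logic used in the Twig template or the answer prompt.

```python
def text_direction(text: str) -> str:
    """Return 'rtl' when Arabic characters outnumber Latin letters, else 'ltr'.

    Hypothetical helper: counts characters in the basic Arabic Unicode
    block (U+0600 to U+06FF) against ASCII letters.
    """
    arabic = sum(1 for ch in text if "\u0600" <= ch <= "\u06FF")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    return "rtl" if arabic > latin else "ltr"
```

Mixed content such as an Arabic sentence containing the word "BERT" still resolves to `rtl` under this rule, because the Arabic letters dominate.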

### Branding

- Arabic: **شمرا أكاديميا**
- English: **Shamra Academia**

## User Access Rules

| User Type | Access |
|-----------|--------|
| Anonymous | Cannot use chat (redirected to login) |
| Logged-in (no subscription) | Cannot use chat (shown upgrade prompt) |
| Trial subscription | ✅ Full access (uses trial credits) |
| Paid subscription (starter+) | ✅ Full access |

**Credit Cost**: 5 credits per question (same as `ask_ai_paper`)
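
The access table and credit cost can be read as a single gate that runs before any LLM call. Below is a minimal Python sketch of that gate; the `User` shape, the subscription values, and the returned status strings are all hypothetical, and only the rules themselves and the 5-credit cost come from this plan.

```python
from dataclasses import dataclass
from typing import Optional

CREDITS_PER_QUESTION = 5  # from this plan: same cost as ask_ai_paper


@dataclass
class User:
    # Hypothetical shape: None means no subscription; "trial", "starter", etc. otherwise.
    subscription: Optional[str]
    credits: int


def check_access(user: Optional[User]) -> str:
    """Map a user to one of four outcomes following the access table."""
    if user is None:
        return "login_required"        # anonymous: redirect to login
    if user.subscription is None:
        return "upgrade_required"      # logged in, no subscription
    if user.credits < CREDITS_PER_QUESTION:
        return "insufficient_credits"  # surfaced as HTTP 402 per Error Handling
    return "allowed"
```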

---

## Technical Architecture

### Flow Diagram

```
User Question
     │
     ▼
┌─────────────────────────────────┐
│ 1. Extract Keywords (GPT-5.2)   │
│    - Analyze question           │
│    - Generate 3-5 search terms  │
│    - Consider user context      │
└─────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────┐
│ 2. Search Elasticsearch         │
│    - Query arabic_research      │
│    - Match keywords + user tags │
│    - Return top 20 results      │
└─────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────┐
│ 3. Relevance Filtering (GPT)    │
│    - Evaluate each result       │
│    - Keep only relevant papers  │
│    - Select top 5-8 relevant    │
└─────────────────────────────────┘
     │
     ▼
┌─────────────────────────────────┐
│ 4. Generate Answer (GPT-5.2)    │
│    - Synthesize information     │
│    - Cite papers by title       │
│    - Include paper links        │
└─────────────────────────────────┘
     │
     ▼
User receives grounded answer + can share to feed
```
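
The four stages in the diagram can be sketched as one orchestration function. This is an illustrative Python outline only, not the code in `ResearchChatService.php`: the function name, the injected callables, and the shape of the returned dictionary are assumptions. The caps of 20 search results and 8 relevant papers are taken from the diagram.

```python
from typing import Callable, Dict, List

Paper = Dict[str, object]


def answer_question(
    question: str,
    context: str,
    extract_keywords: Callable[[str, str], List[str]],
    search_index: Callable[[List[str]], List[Paper]],
    filter_relevant: Callable[[str, List[Paper]], List[Paper]],
    generate_answer: Callable[[str, List[Paper], str], str],
) -> dict:
    """Run the four pipeline stages in order, using injected stage functions."""
    keywords = extract_keywords(question, context)        # stage 1: LLM call
    candidates = search_index(keywords)[:20]              # stage 2: top 20 results
    relevant = filter_relevant(question, candidates)[:8]  # stage 3: keep at most 8
    if not relevant:
        # No grounded answer is possible; the caller can suggest refining the question.
        return {"answer": None, "keywords": keywords, "cited": []}
    answer = generate_answer(question, relevant, context)  # stage 4: LLM call
    return {"answer": answer, "keywords": keywords, "cited": relevant}
```

Passing the stages in as callables keeps the sketch testable with stubs and leaves the LLM and Elasticsearch clients out of scope.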

### User Context Integration

The system includes user context in the LLM prompts:
- `studyField`: User's academic field (e.g., "Computer Science")
- `user.interests`: User's explicit academic interests (e.g., ["machine learning", "NLP"]); see Implementation Notes for why these are used instead of the visit-based `tagsInterestsUser`
- Both help prioritize relevant results and tailor the answer
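
One way the search step could combine extracted keywords with the user's interests is a `bool` query in which keywords are required and interests only boost the score. The Python sketch below builds such a request body as a plain dictionary. The field names `title` and `abstract`, the `^2` boost, and the function name are assumptions; the real mapping of the `arabic_research` index is not described in this plan.

```python
from typing import Dict, List


def build_search_body(keywords: List[str], interests: List[str], size: int = 20) -> Dict:
    """Build an Elasticsearch request body: keywords must match, interests boost."""
    body: Dict = {
        "size": size,  # the plan returns the top 20 results
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": " ".join(keywords),
                            "fields": ["title^2", "abstract"],  # hypothetical fields
                        }
                    }
                ]
            }
        },
    }
    if interests:
        # Optional clauses raise the score of matching papers without excluding others.
        body["query"]["bool"]["should"] = [
            {"match": {"abstract": interest}} for interest in interests
        ]
    return body
```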

---

## Database Schema

### `research_chat` Table

| Column | Type | Description |
|--------|------|-------------|
| id | INT PK AUTO | Chat session ID |
| user_id | INT FK | References fos_user.id |
| title | VARCHAR(255) | Auto-generated from first question |
| created_at | DATETIME | Session start |
| updated_at | DATETIME | Last activity |

### `research_chat_message` Table

| Column | Type | Description |
|--------|------|-------------|
| id | INT PK AUTO | Message ID |
| chat_id | INT FK | References research_chat.id |
| role | ENUM | 'user' or 'assistant' |
| content | TEXT | Message content |
| metadata | JSON | Stores search keywords, cited papers, etc. |
| credits_used | INT | Credits charged for this message |
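| created_at | DATETIME | Timestamp |

Migration `Version20260318120000.php` (Session 5 in the Development Log) later added the `feedback`, `feedback_comment`, and `feedback_at` columns to this table.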
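
For quick local experiments, the two tables can be approximated in an in-memory SQLite database. The Python sketch below is not the Doctrine migration: SQLite has no `ENUM` or `JSON` column types, so `role` is modelled with a `CHECK` constraint and `metadata` as text, and the foreign key to `fos_user` is omitted because that table is outside this sketch.

```python
import sqlite3

# Approximation of the schema tables above, adapted to SQLite types.
SCHEMA = """
CREATE TABLE research_chat (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER,
    title VARCHAR(255),
    created_at DATETIME,
    updated_at DATETIME
);
CREATE TABLE research_chat_message (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    chat_id INTEGER REFERENCES research_chat(id),
    role TEXT CHECK (role IN ('user', 'assistant')),
    content TEXT,
    metadata TEXT,
    credits_used INTEGER,
    created_at DATETIME
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both tables on the given connection."""
    conn.executescript(SCHEMA)
```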

---

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/research-chat` | Start new chat session |
| POST | `/api/research-chat/{id}/message` | Send message to existing chat |
| GET | `/api/research-chat/{id}` | Get chat history |
| GET | `/api/research-chat` | List user's chats |
| DELETE | `/api/research-chat/{id}` | Delete a chat |
| POST | `/api/research-chat/{id}/share` | Share answer to feed |
| POST | `/api/research-chat/message/{messageId}/feedback` | Submit like/dislike feedback |

### POST `/api/research-chat` Request

```json
{
  "question": "ما هي أحدث تقنيات التعلم الآلي في معالجة اللغة العربية؟"
}
```

### POST `/api/research-chat` Response

```json
{
  "id": 123,
  "title": "تقنيات التعلم الآلي في معالجة اللغة العربية",
  "messages": [
    {
      "role": "user",
      "content": "ما هي أحدث تقنيات التعلم الآلي في معالجة اللغة العربية؟"
    },
    {
      "role": "assistant",
      "content": "بناءً على الأبحاث المتاحة في شمرا أكاديميا...",
      "metadata": {
        "searchKeywords": ["التعلم الآلي", "معالجة اللغة الطبيعية العربية"],
        "citedPapers": [
          {"id": 456, "title": "...", "slug": "...", "relevanceScore": 0.92}
        ]
      }
    }
  ],
  "creditsUsed": 5,
  "creditsRemaining": 95
}
```
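
A client consuming this response will typically want the cited papers from the assistant message. The Python sketch below shows one way to pull them out of the JSON shape above; the function name is hypothetical, and it assumes `metadata` and `citedPapers` may be absent on some messages (as on the user message in the example).

```python
import json
from typing import List, Tuple


def extract_citations(response_json: str) -> List[Tuple[int, float]]:
    """Return (paper id, relevanceScore) pairs from all assistant messages."""
    payload = json.loads(response_json)
    cited = []
    for message in payload.get("messages", []):
        if message.get("role") != "assistant":
            continue
        for paper in message.get("metadata", {}).get("citedPapers", []):
            cited.append((paper["id"], paper["relevanceScore"]))
    return cited
```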

---

## Files Created/Modified

### New Files (Created)

| File | Purpose | Status |
|------|---------|--------|
| `src/Entity/ResearchChat.php` | Chat session entity | ✅ |
| `src/Entity/ResearchChatMessage.php` | Message entity | ✅ |
| `src/Repository/ResearchChatRepository.php` | Chat queries | ✅ |
| `src/Repository/ResearchChatMessageRepository.php` | Message queries | ✅ |
| `src/Service/ResearchChatService.php` | Core RAG pipeline logic | ✅ |
| `src/Controller/ResearchChatController.php` | API endpoints | ✅ |
| `playground_prompts/research_chat_keywords_prompt.md` | Keyword extraction prompt | ✅ |
| `playground_prompts/research_chat_filter_prompt.md` | Relevance filtering prompt | ✅ |
| `playground_prompts/research_chat_answer_prompt.md` | Answer generation prompt | ✅ |
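| `migrations/Version20260318100000.php` | Database migration (chat tables) | ✅ |
| `migrations/Version20260318120000.php` | Database migration (feedback columns) | ✅ |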

### Modified Files

| File | Changes | Status |
|------|---------|--------|
| `src/syndex/AcademicBundle/Resources/views/Default/homepage_new.html.twig` | Added chat tab/UI with thinking indicator, RTL, toast | ✅ |
| `src/syndex/AcademicBundle/Resources/translations/Academia.ar.yml` | Arabic translations (research_chat.*) | ✅ |
| `src/syndex/AcademicBundle/Resources/translations/Academia.en.yml` | English translations | ✅ |
| `src/Service/Playground/UsageMonitorService.php` | Added 'research_chat' operation type | ✅ |

---

## UI/UX Design

### Homepage Search Section

```
┌─────────────────────────────────────────────────────────────┐
│  [🔍 Search] [✨ Ask AI]                                    │  ← Mode toggle tabs
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ اسأل أي سؤال عن الأبحاث العلمية...                  │   │  ← Smart input
│  │                                                     │   │
│  │                                      [إرسال ➤]     │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  💡 أمثلة: "ما هي تقنيات الذكاء الاصطناعي في الطب؟"        │  ← Example prompts
│           "اشرح لي أهم نتائج أبحاث تغير المناخ"             │
└─────────────────────────────────────────────────────────────┘
```

### Chat Response Display

```
┌─────────────────────────────────────────────────────────────┐
│ 🎓 شمرا أكاديميا                                            │
│                                                             │
│ بناءً على الأبحاث المتاحة في شمرا أكاديميا، يمكن تلخيص     │
│ أحدث تقنيات التعلم الآلي في معالجة اللغة العربية كالتالي:   │
│                                                             │
│ **1. نماذج المحولات (Transformers)**                       │
│ أظهرت الدراسات [1] أن نماذج BERT العربية...                │
│                                                             │
│ **المراجع المستخدمة:**                                      │
│ [1] تطبيق نماذج BERT على النصوص العربية - جامعة الملك سعود │
│ [2] معالجة اللغة العربية باستخدام التعلم العميق - جامعة...  │
│                                                             │
│ ┌──────────┐  ┌──────────┐  ┌──────────┐                   │
│ │ 📤 مشاركة │  │ 📋 نسخ   │  │ 💬 متابعة │                   │
│ └──────────┘  └──────────┘  └──────────┘                   │
└─────────────────────────────────────────────────────────────┘
```

### Thinking Indicator

```
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│                          🤓                                 │
│                (bouncing animation)                         │
│                                                             │
│              شمرا أكاديميا يفكر ...                         │
│                (animated dots)                              │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

---

## Implementation Notes

### Prompt Engineering

**Keyword Extraction Prompt** (`playground_prompts/research_chat_keywords_prompt.md`):
- Extract 3-5 Arabic academic keywords
- Consider synonyms and related terms
- Preserve technical terminology
- Return JSON array
- Prompt text is in English for LLM consistency (this applies to all three prompts)
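
Because the prompt asks for a JSON array, the service has to cope with output that is not valid JSON or is not a list. The Python sketch below shows one defensive way to parse it; the function name, the de-duplication, and the fallback to an empty list are assumptions rather than the behaviour of `ResearchChatService.php`.

```python
import json
from typing import List


def parse_keywords(llm_output: str, max_terms: int = 5) -> List[str]:
    """Parse a JSON array of keywords, returning [] on malformed output."""
    try:
        data = json.loads(llm_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    keywords: List[str] = []
    for item in data:
        if isinstance(item, str):
            term = item.strip()
            if term and term not in keywords:  # skip blanks and duplicates
                keywords.append(term)
    return keywords[:max_terms]  # the prompt asks for 3-5 terms
```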

**Relevance Filtering Prompt** (`playground_prompts/research_chat_filter_prompt.md`):
- Score each paper 0-1 for relevance to the question
- Consider title, abstract, and field match
- Filter out papers scoring below 0.5
- Return ranked list with reasoning
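
Once the LLM has returned scores, the thresholding and ranking are plain post-processing. A minimal Python sketch follows, assuming each paper is a dictionary carrying the `relevanceScore` key seen in the API response; the function name and the default limit of 8 (the upper end of the "top 5-8" range) are illustrative choices.

```python
from typing import Dict, List


def select_relevant(scored: List[Dict], threshold: float = 0.5, limit: int = 8) -> List[Dict]:
    """Drop papers scoring below the threshold, then rank by score, highest first."""
    kept = [paper for paper in scored if paper["relevanceScore"] >= threshold]
    kept.sort(key=lambda paper: paper["relevanceScore"], reverse=True)
    return kept[:limit]
```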

**Answer Generation Prompt** (`playground_prompts/research_chat_answer_prompt.md`):
- Synthesize information from relevant papers
- Detect user's language and respond in same language
- Cite papers using markdown links `[title](url)`
- Include actionable research directions
- Acknowledge limitations when data is sparse
- Brand: "Shamra Academia AI" / "شمرا أكاديميا"
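
The citation format can be illustrated with a small helper that turns cited papers into a numbered reference list of markdown links. This Python sketch assumes each paper already carries a `url` key; how the real service derives a URL from a paper's `slug` is not specified here, and the function name is hypothetical.

```python
from typing import Dict, List


def format_references(papers: List[Dict[str, str]]) -> str:
    """Render papers as numbered markdown links: [n] [title](url)."""
    lines = []
    for number, paper in enumerate(papers, start=1):
        # Escape square brackets so a title cannot break the link syntax.
        title = paper["title"].replace("[", "\\[").replace("]", "\\]")
        lines.append(f"[{number}] [{title}]({paper['url']})")
    return "\n".join(lines)
```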

### User Context

The system uses explicit academic interests (from `user.interests`) rather than visit-based interests (`tagsInterestsUser`) for cleaner personalization:

```
Field: [user's studyField in Arabic/English]
Interests: Machine learning, NLP, Data Science, ...
```

This is passed to the answer generation prompt to tailor responses.
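
Building that two-line context block is simple string assembly. A minimal Python sketch, assuming either value may be missing for a given user; the function name is hypothetical.

```python
from typing import List, Optional


def build_user_context(study_field: Optional[str], interests: List[str]) -> str:
    """Format the context block, omitting lines for missing values."""
    lines = []
    if study_field:
        lines.append(f"Field: {study_field}")
    if interests:
        lines.append("Interests: " + ", ".join(interests))
    return "\n".join(lines)
```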

### Error Handling

| Scenario | Response |
|----------|----------|
| No relevant papers found | Return helpful message suggesting search refinement |
| Insufficient credits | Return 402 with upgrade prompt |
| ES timeout | Retry once, then return error |
| LLM timeout | Return partial response if possible |
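
The "retry once, then return error" rule for Elasticsearch timeouts can be sketched as a small wrapper. This is an illustrative Python version under assumptions: the search function is injected, a timeout surfaces as Python's built-in `TimeoutError`, and the final failure is re-raised for the caller to convert into an error response.

```python
from typing import Callable, List


def search_with_retry(search: Callable[[List[str]], list], keywords: List[str], retries: int = 1) -> list:
    """Call search(keywords), retrying up to `retries` times on TimeoutError."""
    attempt = 0
    while True:
        try:
            return search(keywords)
        except TimeoutError:
            if attempt >= retries:
                raise  # retries exhausted: let the caller return an error
            attempt += 1
```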

### Analytics

Track in `playground_usage_log`:
- `operation_type`: 'research_chat'
- `tokens_in`, `tokens_out`
- `credits_used`
- Additional metadata: `question_length`, `papers_cited`, `search_keywords`
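
As an illustration of what one log row might contain, the Python sketch below assembles the fields listed above into a dictionary. Whether `question_length` is measured in characters and whether `papers_cited` is a count are assumptions; the actual column layout of `playground_usage_log` is not described in this plan.

```python
from typing import Dict, List


def build_usage_entry(
    question: str,
    keywords: List[str],
    cited_ids: List[int],
    tokens_in: int,
    tokens_out: int,
    credits_used: int = 5,
) -> Dict:
    """Assemble one usage-log record for a research_chat operation."""
    return {
        "operation_type": "research_chat",
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "credits_used": credits_used,
        "metadata": {
            "question_length": len(question),  # assumed: length in characters
            "papers_cited": len(cited_ids),    # assumed: a count, not a list
            "search_keywords": list(keywords),
        },
    }
```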

---

## Testing Checklist

- [x] Anonymous user sees login prompt when clicking "Ask AI"
- [x] User without subscription sees upgrade prompt
- [x] Trial user can ask questions (credits deducted)
- [x] Paid user can ask questions
- [x] Keywords extracted correctly for Arabic questions
- [x] ES search returns relevant papers
- [x] Answer cites actual papers from index
- [x] Share to feed creates proper post with HTML links
- [x] Chat history persists and loads correctly
- [x] Credits deducted correctly
- [x] RTL support for Arabic content
- [x] Thinking indicator shows during processing
- [x] Toast notifications instead of alerts
- [x] Like button shows smile emoji animation
- [x] Dislike button opens feedback modal
- [x] Feedback saved to database correctly
- [x] Admin DSAT dashboard shows feedback stats
- [ ] Mobile responsive UI (needs testing)
- [ ] Multi-turn conversation (follow-up questions)

---

## Future Enhancements

- **Multi-turn conversation**: Follow-up questions in context *(partially implemented - API supports it)*
- **Paper recommendations**: "Based on your question, you might like..."
- **Citation export**: Export answer with references in APA/MLA
- **Voice input**: Speech-to-text for questions
- **Comparison mode**: "Compare findings from papers X and Y"
- **English research index**: The chat currently searches only `arabic_research`; `english_research` could be added
- **History sidebar**: Show previous chat sessions for logged-in users

---

## Deployment Notes

### Migration

Run after deployment:
```bash
sudo -u www-data php /var/www/html/academia_v2/bin/console doctrine:migrations:migrate --no-interaction --env=prod
```

### Translation Keys

All chat translations use the `research_chat.*` prefix to avoid conflicts:
- `research_chat.search_tab`
- `research_chat.chat_tab`
- `research_chat.ai_name`
- `research_chat.thinking`
- `research_chat.view_post`
- etc.

### Credit Cost

5 credits per question (same as `ask_ai_paper` operation).

---

## Development Log

### March 18, 2026

**Session 1: Core Implementation**
- Created database schema (`research_chat`, `research_chat_message` tables)
- Built entities: `ResearchChat`, `ResearchChatMessage`
- Implemented `ResearchChatService` with full RAG pipeline:
  - Keyword extraction from natural language questions (GPT-5.2)
  - Elasticsearch search against `arabic_research` index
  - Relevance filtering to identify truly relevant papers
  - Answer synthesis with proper citations
- Created API controller with endpoints for chat lifecycle
- Added "Ask AI" tab to homepage search UI
- Integrated user context (interests, study field) into prompts

**Session 2: Bug Fixes**
- Fixed duplicate YAML translation keys (renamed to `research_chat.*` prefix)
- Resolved conflicts with existing `search_tab` and `chat_tab` keys

**Session 3: UX Polish**
- Added animated thinking indicator with nerdy emoji (🤓) and "شمرا أكاديميا يفكر..."
- Implemented RTL support for Arabic responses (proper text direction and header alignment)
- Replaced browser alerts with elegant sliding toast notifications
- Added "View Post" link in success toast after sharing

**Session 4: Feed Integration**
- Fixed shared posts to render as HTML (not escaped text)
- Reference links inside shared answers are now clickable in `post_show` template
- Added notification cleanup when posts are deleted (prevents "not found" errors)

**Commits:**
- `17e71466` Add Research Chat (Ask AI) feature with RAG pipeline
- `495b1db7` Fix duplicate YAML keys - rename to research_chat prefix
- `c39d3da4` Improve Research Chat UX: thinking indicator, RTL, toast notifications, HTML formatting
- `f167637e` Render research chat posts with HTML in post_show template
- `66d0fcc5` Delete notifications when post is deleted to prevent 'not found' errors

**Session 5: Feedback System**
- Added like/dislike feedback buttons to assistant messages
- Like: shows smile emoji animation (😊)
- Dislike: opens modal asking for optional comment
- Added feedback fields to `research_chat_message` entity (`feedback`, `feedback_comment`, `feedback_at`)
- Created migration `Version20260318120000.php` for feedback columns
- Added API endpoint `POST /api/research-chat/message/{messageId}/feedback`
- Added admin dashboard "DSAT" tab with:
  - Overview cards (total, likes, dislikes, satisfaction %, comments)
  - Trend chart (likes vs dislikes over time)
  - Distribution pie chart
  - Disliked responses table with user comments
  - All feedback table
- Added Arabic/English translations for feedback UI
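
The overview cards on the DSAT tab reduce to a few counts over the stored feedback values. The Python sketch below shows one plausible calculation; the stored values `"like"` and `"dislike"`, the use of `None` for messages without feedback, and the definition of satisfaction as likes divided by total feedback are all assumptions.

```python
from typing import Dict, List, Optional


def feedback_summary(feedback: List[Optional[str]]) -> Dict:
    """Count likes and dislikes and derive a satisfaction percentage."""
    likes = sum(1 for value in feedback if value == "like")
    dislikes = sum(1 for value in feedback if value == "dislike")
    total = likes + dislikes
    # Avoid dividing by zero when no feedback has been submitted yet.
    satisfaction = round(100 * likes / total, 1) if total else None
    return {
        "total": total,
        "likes": likes,
        "dislikes": dislikes,
        "satisfaction_pct": satisfaction,
    }
```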
