# Shamra Integration - Elasticsearch Index Management

## Known Issues

### 🐛 Tag Field Format Issue in `arabic_research_test` Index

**Date Discovered:** February 4, 2026

**Problem:**  
In the `arabic_research_test` index, `tag` is stored as a single comma-separated string and `tag_id` is `null`, instead of both being arrays. This breaks the user activity interest tracking feature.

**Current (Broken) Format in `arabic_research_test`:**
```json
{
  "tag": "EFL, Evaluation, Assessment, Effectiveness, Admission Test",
  "tag_id": null
}
```

**Expected (Correct) Format (as in `arabic_research_restored`):**
```json
{
  "tag": [
    "EFL",
    "Evaluation", 
    "Assessment",
    "Effectiveness",
    "Admission Test"
  ],
  "tag_id": [
    "12345",
    "12346",
    "12347",
    "12348",
    "12349"
  ]
}
```

**Impact:**
- User activity interest tracking does NOT work for ES-only documents
- The PHP code in `ResearchController.php` iterates over `$esTags` expecting an array
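- When `tag` is a string (assuming `getTags()` returns it unchanged), PHP's `foreach` does not iterate over it at all: it emits a warning (`foreach() argument must be of type array|object, string given` on PHP 8) and skips the loop body, so no tag interests are recorded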

**Affected Code:**
```php
// src/syndex/AcademicBundle/Controller/ResearchController.php (lines 737-748)
$esTags = $elastic_research->getTags();
if ($esTags) {
    foreach ($esTags as $tag) {  // <-- expects array, fails with string
        if ($tag instanceof \App\syndex\AcademicBundle\Entity\Tag) {
            $tagInterestsUserService->add($tag, $user);
        } elseif (is_string($tag)) {
            $tagInterestsUserService->add($tag, $user);
        }
    }
}
```

**Fix Required:**
When indexing documents into `arabic_research_test`, ensure:
1. The `tag` field is an **array of strings** (not a comma-separated string)
2. The `tag_id` field is an **array of tag IDs** (not `null`)

**Example fix in Python indexing script:**
```python
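# Before indexing: convert a comma-separated string into a list of tag names,
# trimming whitespace and dropping empty entries (e.g. from a trailing comma).
if isinstance(doc.get("tag"), str):
    doc["tag"] = [t.strip() for t in doc["tag"].split(",") if t.strip()]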
```
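
The snippet above only addresses `tag`. A slightly fuller sketch that also fills `tag_id` might look like the following. It is illustrative only: `split_tags`, `normalize_tag_fields`, and the `tag_ids_by_name` lookup are hypothetical names, since this document does not say where tag IDs come from. Names with no known ID are set to `None` rather than guessed, and an existing `tag_id` list is kept when its length already matches `tag`.

```python
from typing import Any, Dict, List, Optional


def split_tags(raw: Any) -> List[str]:
    """Return tag names from a list or a comma-separated string."""
    if raw is None:
        return []
    parts = raw.split(",") if isinstance(raw, str) else list(raw)
    # Trim whitespace and drop empty entries such as those from "a,,b".
    return [str(p).strip() for p in parts if str(p).strip()]


def normalize_tag_fields(
    doc: Dict[str, Any],
    tag_ids_by_name: Dict[str, str],  # hypothetical name -> ID lookup
) -> Dict[str, Any]:
    """Return a copy of doc with `tag` and `tag_id` as parallel lists."""
    fixed = dict(doc)  # shallow copy; the input document is not modified
    tags = split_tags(doc.get("tag"))
    fixed["tag"] = tags

    existing = doc.get("tag_id")
    if isinstance(existing, list) and len(existing) == len(tags):
        # Already a list of matching length: assume it is correct and keep it.
        fixed["tag_id"] = existing
    else:
        # Unknown names map to None instead of an invented ID.
        ids: List[Optional[str]] = [tag_ids_by_name.get(name) for name in tags]
        fixed["tag_id"] = ids
    return fixed
```

Returning a copy keeps the original document available for logging or comparison if a record turns out to have tags missing from the lookup.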

---

## Environment Configuration

The active ES index is configured in `.env`:
```
ELASTIC_ARABIC_RESEARCH_INDEX=arabic_research_test
```

To switch to the correctly formatted index:
```
ELASTIC_ARABIC_RESEARCH_INDEX=arabic_research_restored
```
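
If the Python indexing script should target the same index as the application, one option is to read the same variable instead of hardcoding the index name. This is a minimal sketch: `resolve_index_name` is a hypothetical helper, and it assumes the variable has been exported into the process environment (Python does not load `.env` files on its own).

```python
import os
from typing import Mapping, Optional

ENV_VAR = "ELASTIC_ARABIC_RESEARCH_INDEX"


def resolve_index_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured index name, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    name = env.get(ENV_VAR, "").strip()
    if not name:
        # Fail loudly rather than silently indexing into a default index.
        raise RuntimeError(f"{ENV_VAR} is not set")
    return name
```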

---

## TODO

- [ ] Fix tag/tag_id format in `arabic_research_test` index
- [ ] Verify interest tracking works after fix
- [ ] Update indexing scripts to ensure correct format going forward
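
To support the verification items above, a small check along these lines could be run against each document's `_source` before or after indexing. It is a sketch under assumptions: `tag_format_problems` is a hypothetical helper, and the rule that `tag` and `tag_id` must have equal length is inferred from the expected-format example rather than stated anywhere. Documents with no tags at all are also reported, which may need relaxing if untagged documents are valid.

```python
from typing import Any, Dict, List


def tag_format_problems(source: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    tag = source.get("tag")
    tag_id = source.get("tag_id")

    tag_ok = isinstance(tag, list) and all(isinstance(t, str) for t in tag)
    if not tag_ok:
        problems.append("tag is not a list of strings")

    if not isinstance(tag_id, list):
        problems.append("tag_id is not a list")
    elif isinstance(tag, list) and len(tag) != len(tag_id):
        # Assumed rule: the two arrays are parallel, one ID per tag name.
        problems.append("tag and tag_id differ in length")

    return problems
```

For example, the broken document shown earlier yields two problems (string `tag`, `null` `tag_id`), while the `arabic_research_restored` example yields none.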
