## Common Annotator Schema

### Scope

This document defines the **data model that every Datamaker annotator (image, video, text, prompt, PCD, etc.) must follow in common**.

Each annotator's own spec inherits this schema and extends it with tool-specific fields.

### Core Principles

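- **The classification structure is independent of the tool**; it consists of the classes, attributes, and options **defined directly on the admin (operator) page**.
- **All substructures, including attributes and options, are likewise defined by the user.**
- The `classification` field of the actual annotation data is stored as **key-value pairs obtained by flattening the tree structure**.
- When a particular class is selected, its **required child attributes (dependencies) are managed by the per-project dm-schema extension JSON Schema**.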

### Overall Structure (Mermaid)

```mermaid
flowchart TD
    Schema["Admin-defined classification schema (tree)"]
    Data["Actual annotation data (flattened)"]
    Schema -->|Flatten rules| Data
    Schema -->|Dependency rules| JSONSchema["dm-schema extension JSON Schema"]
    Data -->|Validation| JSONSchema
```

### Data Flow

```
┌───────────────────────────────────────────────────────────────────────────┐
│                         annotatorData (top level)                         │
├───────────────────────────────────────────────────────────────────────────┤
│  extra            → per-asset extra metadata                              │
│  annotations      → annotation metadata (id, tool, classification, etc.)  │
│  annotationsData  → actual annotation coordinates/data                    │
│  relations        → relations between annotations                         │
│  annotationGroups → annotation group information                          │
│  assignmentId     → assignment identifier                                 │
└───────────────────────────────────────────────────────────────────────────┘
```

### Common Data Model

#### 1. Top-Level Structure

Every annotation task is stored as a single JSON object, and its top-level keys are the same regardless of annotator type.

| Key | Type | Description |
| --- | --- | --- |
| `extra` | `Record<string, unknown>` | Per-asset metadata |
| `relations` | `Record<string, RelationItem[]>` | Relations between annotations |
| `annotations` | `Record<string, AnnotationBase[]>` | List of annotations per asset |
| `annotationsData` | `Record<string, AnnotationDataItem[]>` | Actual annotation coordinates/data |
| `annotationGroups` | `Record<string, AnnotationGroupItem[]>` | Annotation grouping |
| `assignmentId` | `number \| string` | Assignment identifier |

#### 2. Common Schema Structure

#### 2.1 Top-Level Structure

```typescript
type AnnotatorData = {
  extra: Record<AssetId, unknown>
  annotations: Record<AssetId, AnnotationBase[]>
  annotationsData: Record<AssetId, AnnotationDataItem[]>
  relations: Record<AssetId, RelationItem[]>
  annotationGroups: Record<AssetId, AnnotationGroupItem[]>
  assignmentId: number | string
}

// AssetId examples: "image_1", "video_1", "text_1", "pcd"
type AssetId = string
```
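
As a minimal sketch of how this shape might be initialized, the snippet below builds an empty payload with one entry per asset. The helper name `createEmptyAnnotatorData` is hypothetical, the element types are simplified to `unknown`, and initializing `extra` to an empty object per asset is an assumption rather than documented behavior.

```typescript
type AssetId = string

// Element types are simplified to `unknown` for this sketch.
type AnnotatorData = {
  extra: Record<AssetId, unknown>
  annotations: Record<AssetId, unknown[]>
  annotationsData: Record<AssetId, unknown[]>
  relations: Record<AssetId, unknown[]>
  annotationGroups: Record<AssetId, unknown[]>
  assignmentId: number | string
}

// Hypothetical helper: creates an empty payload keyed by the given asset ids.
function createEmptyAnnotatorData(
  assignmentId: number | string,
  assetIds: AssetId[],
): AnnotatorData {
  const data: AnnotatorData = {
    extra: {},
    annotations: {},
    annotationsData: {},
    relations: {},
    annotationGroups: {},
    assignmentId,
  }
  for (const assetId of assetIds) {
    data.extra[assetId] = {} // assumption: extra defaults to an empty object
    data.annotations[assetId] = []
    data.annotationsData[assetId] = []
    data.relations[assetId] = []
    data.annotationGroups[assetId] = []
  }
  return data
}

const emptyData = createEmptyAnnotatorData(42, ['image_1', 'image_2'])
```

Every per-asset record uses the same set of asset ids as keys, which keeps lookups across `annotations`, `annotationsData`, `relations`, and `annotationGroups` symmetric.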

#### 2.2 AnnotationBase (Common Metadata)

Structure as confirmed in the source file `src/app/core/lib/data.js`:

```typescript
type AnnotationBase = {
  id: string // 10-character random string (e.g. "Cd1qfFQFI4")
  tool: string // code of the tool that was used
  isLocked: boolean // whether editing is locked (default: false)
  isVisible: boolean // whether it is shown on screen (default: true)
  isValid?: boolean // validity flag (default: false)
  isDrawCompleted?: boolean // whether drawing is complete
  classification: ClassificationObject | null
  label?: string[] // label array generated from the classification

  // Sequential data only
  sequenceIndex?: number // sequence index
  instanceId?: string // instance ID (auto-generated)
}
```
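
The defaults noted in the comments above can be illustrated with a small factory. This is only a sketch: `generateId` and `createAnnotation` are hypothetical names, the ID alphabet and random source are assumptions (the source only states that the ID is a 10-character random string), and `ClassificationObject` is approximated as a flat record of strings or string arrays.

```typescript
// Assumption: a flat map whose values are a string or a list of strings.
type ClassificationObject = Record<string, string | string[]>

type AnnotationBase = {
  id: string
  tool: string
  isLocked: boolean
  isVisible: boolean
  isValid?: boolean
  isDrawCompleted?: boolean
  classification: ClassificationObject | null
  label?: string[]
}

const ID_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'

// Hypothetical ID generator; the real alphabet and random source are not specified.
function generateId(length = 10): string {
  let id = ''
  for (let i = 0; i < length; i++) {
    id += ID_CHARS.charAt(Math.floor(Math.random() * ID_CHARS.length))
  }
  return id
}

// Builds a new annotation using the documented default flags.
function createAnnotation(tool: string): AnnotationBase {
  return {
    id: generateId(),
    tool,
    isLocked: false,
    isVisible: true,
    isValid: false,
    classification: null,
  }
}

const newAnnotation = createAnnotation('bounding_box')
```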

#### 2.3 RelationItem (Relation Object)

```typescript
type RelationItem = {
  id: string // source ID combined with target ID (e.g. "Cd1qfFQFI4AUjPgaMzQa")
  tool: 'relation' // always fixed to "relation"
  isLocked: boolean
  isVisible: boolean
  isValid?: boolean
  annotationId: string // ID of the source (origin) annotation
  targetAnnotationId: string // ID of the target (destination) annotation
  classification: ClassificationObject | null
  label?: string[]
}
```
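
A relation could be assembled roughly as follows. The sketch assumes that the ID is the plain concatenation of the source ID followed by the target ID, which matches the example value in the type but is not otherwise confirmed; `createRelation` is a hypothetical helper, and the `isLocked`/`isVisible` defaults are borrowed from `AnnotationBase`.

```typescript
type RelationItem = {
  id: string
  tool: 'relation'
  isLocked: boolean
  isVisible: boolean
  isValid?: boolean
  annotationId: string
  targetAnnotationId: string
  classification: Record<string, string | string[]> | null
  label?: string[]
}

// Hypothetical helper; assumes id = source id followed by target id.
function createRelation(sourceId: string, targetId: string): RelationItem {
  return {
    id: sourceId + targetId,
    tool: 'relation',
    isLocked: false, // assumed default, mirroring AnnotationBase
    isVisible: true, // assumed default, mirroring AnnotationBase
    annotationId: sourceId,
    targetAnnotationId: targetId,
    classification: null,
  }
}

const relation = createRelation('Cd1qfFQFI4', 'AUjPgaMzQa')
// relation.id is "Cd1qfFQFI4AUjPgaMzQa"
```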

#### 2.4 AnnotationGroupItem (Group Object)

```typescript
type AnnotationGroupItem = {
  id: string
  tool: 'annotationGroup' // always fixed to "annotationGroup"
  isLocked: boolean
  isValid?: boolean
  annotationList: GroupMemberItem[]
  classification: ClassificationObject | null
}

type GroupMemberItem = {
  annotationId: string
  children: GroupMemberItem[] // supports a hierarchical structure
}
```
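
Because `children` nests `GroupMemberItem` recursively, consumers usually need a traversal to list every annotation in a group. The following depth-first walk is one possible sketch; `collectAnnotationIds` is a hypothetical helper name and the IDs are made up for illustration.

```typescript
type GroupMemberItem = {
  annotationId: string
  children: GroupMemberItem[]
}

// Depth-first traversal that lists each member before its descendants.
function collectAnnotationIds(members: GroupMemberItem[]): string[] {
  const ids: string[] = []
  for (const member of members) {
    ids.push(member.annotationId)
    ids.push(...collectAnnotationIds(member.children))
  }
  return ids
}

const groupMembers: GroupMemberItem[] = [
  {
    annotationId: 'a1',
    children: [
      { annotationId: 'a2', children: [] },
      { annotationId: 'a3', children: [{ annotationId: 'a4', children: [] }] },
    ],
  },
  { annotationId: 'a5', children: [] },
]

const memberIds = collectAnnotationIds(groupMembers)
// memberIds is ['a1', 'a2', 'a3', 'a4', 'a5']
```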

#### 2.5 AnnotationDataBase (Common Data Fields)

Common fields that each item in the `annotationsData` array may have:

```typescript
type AnnotationDataBase = {
  id: string // matches AnnotationBase.id one-to-one

  // Data compression (globally shared fields)
  isCompressed?: boolean // whether the data is compressed
  compressionFormat?: CompressionFormat // compression format
}

type CompressionFormat =
  | 'rle' // Run-Length Encoding (currently supported)
```
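
Since each data item is expected to match an `AnnotationBase.id` one-to-one, a consistency check per asset can be useful. The sketch below reports IDs that appear on only one side; `findUnmatchedIds` is a hypothetical helper and not part of the documented codebase.

```typescript
type HasId = { id: string }

// Reports ids that exist on only one side, for a single asset.
function findUnmatchedIds(annotations: HasId[], annotationsData: HasId[]) {
  const metaIds = new Set(annotations.map((a) => a.id))
  const dataIds = new Set(annotationsData.map((d) => d.id))
  return {
    // annotations without a matching data item
    missingData: annotations.map((a) => a.id).filter((id) => !dataIds.has(id)),
    // data items without a matching annotation
    missingMeta: annotationsData.map((d) => d.id).filter((id) => !metaIds.has(id)),
  }
}

const mismatch = findUnmatchedIds(
  [{ id: 'Cd1qfFQFI4' }, { id: 'AUjPgaMzQa' }],
  [{ id: 'Cd1qfFQFI4' }, { id: 'zz9Yx81LmQ' }],
)
// mismatch.missingData is ['AUjPgaMzQa'], mismatch.missingMeta is ['zz9Yx81LmQ']
```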

#### 2.6 Common Data Compression Specification

> ⚠️ **Scope**: Currently used by the `segmentation` tool of the Image Annotator; it is planned to apply to all annotators in the future.

##### Compression Field Specification

| Field               | Type      | Required    | Description                                                                  |
| ------------------- | --------- | ----------- | ---------------------------------------------------------------------------- |
| `isCompressed`      | `boolean` | Conditional | Whether the data is compressed. When `true`, `compressionFormat` is required |
| `compressionFormat` | `string`  | Conditional | Compression algorithm identifier. Required when `isCompressed: true`         |

##### Compression Formats

| Format | Status    | Description         | Applies to                   |
| ------ | --------- | ------------------- | ---------------------------- |
| `rle`  | ✅ In use | Run-Length Encoding | Consecutive index/pixel data |

##### Compressed Data Processing Flow

```
┌──────────────────┐      ┌────────────────┐      ┌──────────────────┐
│   Raw data       │ ──▶  │  Encode        │ ──▶  │ Store/transmit   │
│ (pixel_indices)  │      │  (RLE, etc.)   │      │ (compressed)     │
└──────────────────┘      └────────────────┘      └──────────────────┘
                                                           │
┌──────────────────┐      ┌────────────────┐               │
│   Ready to use   │ ◀──  │  Decode        │ ◀─────────────┘
│ (restored array) │      │  (decompress)  │
└──────────────────┘      └────────────────┘
```

##### Compression Example

**Before compression (raw):**

```json
{
  "id": "seg_001",
  "pixel_indices": [100, 101, 102, 103, 104, 200, 201, 202]
}
```

**After compression (RLE):**

```json
{
  "id": "seg_001",
  "pixel_indices": [100, 5, 200, 3],
  "isCompressed": true,
  "compressionFormat": "rle"
}
```

> RLE format: `[start index, run length, start index, run length, ...]`
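
The round trip between the raw and compressed examples can be sketched as follows. This is an illustrative implementation of the `[start index, run length, ...]` layout, not the project's actual codec: the function names are hypothetical, and the reader also enforces the rule that `compressionFormat` must be present whenever `isCompressed` is `true`.

```typescript
type SegmentationData = {
  id: string
  pixel_indices: number[]
  isCompressed?: boolean
  compressionFormat?: 'rle'
}

// Encodes indices as [start, runLength, start, runLength, ...].
function rleEncode(indices: number[]): number[] {
  const encoded: number[] = []
  let start = 0
  let count = 0
  for (const index of indices) {
    if (count > 0 && index === start + count) {
      count++ // index continues the current run
    } else {
      if (count > 0) encoded.push(start, count)
      start = index
      count = 1
    }
  }
  if (count > 0) encoded.push(start, count)
  return encoded
}

// Expands [start, runLength] pairs back into the full index list.
function rleDecode(encoded: number[]): number[] {
  if (encoded.length % 2 !== 0) {
    throw new Error('RLE data must consist of [start, runLength] pairs')
  }
  const decoded: number[] = []
  for (let i = 0; i < encoded.length; i += 2) {
    const start = encoded[i] as number
    const count = encoded[i + 1] as number
    for (let offset = 0; offset < count; offset++) decoded.push(start + offset)
  }
  return decoded
}

function compressSegment(item: SegmentationData): SegmentationData {
  return {
    ...item,
    pixel_indices: rleEncode(item.pixel_indices),
    isCompressed: true,
    compressionFormat: 'rle',
  }
}

// Returns usable indices, decoding only when the item is marked as compressed.
function readPixelIndices(item: SegmentationData): number[] {
  if (!item.isCompressed) return item.pixel_indices
  if (item.compressionFormat !== 'rle') {
    throw new Error(`Missing or unsupported compressionFormat on "${item.id}"`)
  }
  return rleDecode(item.pixel_indices)
}

const rawSegment: SegmentationData = {
  id: 'seg_001',
  pixel_indices: [100, 101, 102, 103, 104, 200, 201, 202],
}
const compressedSegment = compressSegment(rawSegment)
// compressedSegment.pixel_indices is [100, 5, 200, 3]
const restoredIndices = readPixelIndices(compressedSegment)
```

The encoder only merges strictly consecutive ascending indices, so it gives the most compact output when `pixel_indices` is sorted.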


### Classification and Label Handling

#### Classification Structure

The tree-structured classification schema defined by the admin is flattened into **1-depth key-value pairs**:

```json
{
  "class": "boundingbox",
  "text": "Description text",
  "multiple": ["option1", "option2"],
  "single_radio": "selected_option",
  "single_dropdown": "dropdown_value"
}
```
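
The exact flatten rules are not spelled out here, so the following is only a rough sketch of the idea. It assumes a hypothetical `SelectionNode` tree in which each node carries an attribute key, the selected value, and optional dependent children; the function name `flattenClassification` is likewise invented for illustration.

```typescript
type FlatValue = string | string[]

// Hypothetical shape of a selected classification tree (not the real admin schema).
type SelectionNode = {
  key: string // attribute code, e.g. "class"
  value: FlatValue // selected option(s) or free text
  children?: SelectionNode[] // attributes that depend on this selection
}

// Collapses the tree into a single-level key-value object.
// If two nodes share a key, the one visited later wins.
function flattenClassification(nodes: SelectionNode[]): Record<string, FlatValue> {
  const flat: Record<string, FlatValue> = {}
  const visit = (node: SelectionNode): void => {
    flat[node.key] = node.value
    for (const child of node.children ?? []) visit(child)
  }
  nodes.forEach(visit)
  return flat
}

const flatClassification = flattenClassification([
  {
    key: 'class',
    value: 'boundingbox',
    children: [
      { key: 'text', value: 'Description text' },
      { key: 'multiple', value: ['option1', 'option2'] },
      { key: 'single_radio', value: 'selected_option' },
    ],
  },
])
```

With this input the result has the keys `class`, `text`, `multiple`, and `single_radio` at a single level, mirroring the shape of the JSON example above.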

#### Automatic Label Generation

The `label` array is generated automatically from `classification`:

```javascript
if (annotation.classification) {
  annotation.label = getLabel(variables, annotation.id, annotation.tool, annotation.classification)
}
```

### Data Pre-processing and Post-processing

#### beforeAction Hook

Every annotator cleans up its data before saving through `definedHooks.beforeAction`:

```javascript
definedHooks: {
  beforeAction: (annotatorData, variables) => {
    return dataGrooming(annotatorData, variables)
  }
}
```
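
Annotator-specific clean-up can be chained in front of `dataGrooming` inside the same hook. As a rough sketch of the PCD case (dropping `3d_segmentation` entries whose `points` array is empty), the code below assumes that `points` lives on the `annotationsData` item and that both the annotation and its data item are removed; neither detail is confirmed by the source. `dataGroomingStub` is a placeholder because the real `dataGrooming` behavior is not described in this document, and `cuboid` is a made-up tool code.

```typescript
type PcdAnnotation = { id: string; tool: string }
type PcdAnnotationData = { id: string; points?: unknown[] }
type PcdData = {
  annotations: Record<string, PcdAnnotation[]>
  annotationsData: Record<string, PcdAnnotationData[]>
}

// Placeholder for the real dataGrooming, whose behavior is not described here.
const dataGroomingStub = <T>(data: T): T => data

// Assumption: an "empty" 3d_segmentation is one whose data item has points: [].
function dropEmptySegmentations(data: PcdData): PcdData {
  const annotations: PcdData['annotations'] = {}
  const annotationsData: PcdData['annotationsData'] = {}
  for (const assetId of Object.keys(data.annotations)) {
    const items = data.annotations[assetId] ?? []
    const dataItems = data.annotationsData[assetId] ?? []
    const emptyIds = new Set(
      items
        .filter((a) => a.tool === '3d_segmentation')
        .filter((a) => {
          const d = dataItems.find((x) => x.id === a.id)
          return d !== undefined && Array.isArray(d.points) && d.points.length === 0
        })
        .map((a) => a.id),
    )
    annotations[assetId] = items.filter((a) => !emptyIds.has(a.id))
    annotationsData[assetId] = dataItems.filter((d) => !emptyIds.has(d.id))
  }
  return { ...data, annotations, annotationsData }
}

// Hypothetical PCD hook body: annotator-specific step first, then grooming.
const pcdBeforeAction = (data: PcdData): PcdData =>
  dataGroomingStub(dropEmptySegmentations(data))

const cleanedPcd = pcdBeforeAction({
  annotations: {
    pcd: [
      { id: 'a', tool: '3d_segmentation' },
      { id: 'b', tool: '3d_segmentation' },
      { id: 'c', tool: 'cuboid' }, // hypothetical tool code
    ],
  },
  annotationsData: {
    pcd: [{ id: 'a', points: [] }, { id: 'b', points: [1, 2] }, { id: 'c' }],
  },
})
```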

#### Annotator-Specific Handling

| Annotator | Pre-processing                                                                  |
| --------- | ------------------------------------------------------------------------------- |
| Image     | Runs `dataGrooming`                                                             |
| Video     | `bakeInterpolatedFrames` (interpolates bounding_box keyframes) + `dataGrooming` |
| Text      | Runs `dataGrooming`                                                             |
| PCD       | Removes `3d_segmentation` entries with an empty `points` array + `dataGrooming` |
| Prompt    | Runs `dataGrooming`                                                             |
| Audio     | Runs `dataGrooming`                                                             |