You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

390 lines
14 KiB
Markdown

# 2.4 Leaderboard Service (Port 8083) - Detailed Implementation Plan
## Summary
Implement the Leaderboard Service as the read-optimized ranking and statistics service for game outcomes, consistent with implementation decisions used in `2.1` (Question Bank), `2.2` (User), and `2.3` (Game Session).
Runtime stack and conventions:
- Fiber HTTP service with shared bootstrap and observability.
- PostgreSQL persistence with `EnsureSchema(ctx)` startup DDL.
- Redis for read-optimized ranking cache and top-10 acceleration.
- Shared `backend/shared` packages for auth, errors, validation, logging, tracing, metrics, and readiness.
Scope boundary:
- Modify only `backend/services/leaderboard-service/**`.
- Do not modify `backend/services/*` other than leaderboard-service.
- Do not modify `backend/shared/**`.
## Decisions Reused from 2.1, 2.2, and 2.3
1. Service composition pattern:
- `internal/infra/config.FromEnv()`
- logger/metrics/tracer initialization in `cmd/main.go`
- repository initialization + `EnsureSchema(ctx)` at startup
- `/health`, `/ready`, `/metrics` registration
- route registration via `internal/interfaces/http/routes.go`
2. Persistence and state pattern:
- PostgreSQL as source of truth.
- Redis as optional performance layer, non-fatal when unavailable.
- service remains functional on PostgreSQL when Redis is down.
3. Error and transport pattern:
- domain/application errors mapped via shared `httputil.SendError`.
- standard response envelope style (`success`, `data`) used by existing services.
4. Inter-service integration approach:
- HTTP adapters with application interfaces so transport can evolve later without domain changes.
- explicit DTOs for upstream/downstream contracts.
5. Test pyramid:
- unit tests for ranking/statistics logic.
- HTTP integration tests with in-memory doubles/fakes.
- optional DB-backed integration tests gated by environment.
## Objectives
1. Provide public leaderboard query endpoints:
- top 10 scores
- player ranking and history
- global statistics
2. Provide internal score ingestion endpoint for completed sessions.
3. Ensure deterministic ranking:
- sort by score descending, then completion duration ascending, then completed_at ascending.
4. Maintain historical score records for analytics and auditability.
5. Deliver production-ready observability, readiness checks, and test coverage.
## API Endpoints
- `GET /leaderboard/top10`
- `GET /leaderboard/players/:id`
- `GET /leaderboard/stats`
- `POST /leaderboard/update` (internal command endpoint)
## Auth and Access Rules
1. Query endpoints:
- `GET /leaderboard/top10`, `GET /leaderboard/stats` are public read endpoints.
- `GET /leaderboard/players/:id` requires auth; user can read own detailed history.
- Admin role may read any player history.
2. Update endpoint:
- `POST /leaderboard/update` is internal-only and requires service/admin auth middleware.
- Reject anonymous or non-privileged callers.
## Inter-Service Contracts
### Game Session dependency
Purpose: ingest canonical final session outcomes.
Contract for `POST /leaderboard/update` request:
- `session_id` (string, required)
- `player_id` (string, required)
- `player_name` (string, required)
- `total_score` (int, required, >= 0)
- `questions_asked` (int, required, >= 0)
- `questions_correct` (int, required, >= 0)
- `hints_used` (int, required, >= 0)
- `duration_seconds` (int, required, >= 0)
- `completed_at` (RFC3339 timestamp, required)
- `completion_type` (`completed|timed_out|abandoned`, required)
Idempotency decision:
- deduplicate on `session_id` (unique).
- repeated update for same `session_id` returns success with existing persisted record.
### User Service dependency
Purpose: optional profile hydration fallback for player display fields.
Decision:
- leaderboard update request is authoritative for `player_name` to avoid hard runtime coupling.
- optional future enrichment can call `GET /users/:id`, but is not required for step `2.4`.
## Domain Model
Aggregates:
- `LeaderboardEntry` (one per completed session)
- `PlayerRankingSnapshot` (derived read model)
Value objects:
- `Rank` (positive integer)
- `SuccessRate` (0..100 percentage)
- `CompletionType`
Domain services:
- `RankingService` (ordering and tie-breaks)
- `StatisticsService` (global aggregates)
Core invariants:
1. Each `session_id` can be ingested once.
2. Score cannot be negative.
3. Questions counts cannot be negative.
4. `questions_correct <= questions_asked`.
5. Rank ordering rule is deterministic:
- score desc
- duration asc
- completed_at asc
6. Top10 response always returns at most 10 entries.
## Data Model (PostgreSQL)
### `leaderboard_entries`
- `id UUID PRIMARY KEY`
- `session_id VARCHAR(64) NOT NULL UNIQUE`
- `player_id VARCHAR(128) NOT NULL`
- `player_name VARCHAR(50) NOT NULL`
- `score INT NOT NULL`
- `questions_asked INT NOT NULL`
- `questions_correct INT NOT NULL`
- `hints_used INT NOT NULL DEFAULT 0`
- `duration_seconds INT NOT NULL`
- `success_rate NUMERIC(5,2) NOT NULL`
- `completion_type VARCHAR(16) NOT NULL`
- `completed_at TIMESTAMPTZ NOT NULL`
- `created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()`
Indexes:
- `(score DESC, duration_seconds ASC, completed_at ASC)`
- `(player_id, completed_at DESC)`
- `(completion_type, completed_at DESC)`
- `(created_at DESC)`
### `leaderboard_player_stats`
Pre-aggregated player read model for fast rank/profile reads.
- `player_id VARCHAR(128) PRIMARY KEY`
- `player_name VARCHAR(50) NOT NULL`
- `games_played INT NOT NULL DEFAULT 0`
- `games_completed INT NOT NULL DEFAULT 0`
- `total_score BIGINT NOT NULL DEFAULT 0`
- `best_score INT NOT NULL DEFAULT 0`
- `avg_score NUMERIC(10,2) NOT NULL DEFAULT 0`
- `avg_success_rate NUMERIC(5,2) NOT NULL DEFAULT 0`
- `total_questions BIGINT NOT NULL DEFAULT 0`
- `total_correct BIGINT NOT NULL DEFAULT 0`
- `best_duration_seconds INT NULL`
- `last_played_at TIMESTAMPTZ NULL`
- `updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()`
Indexes:
- `(best_score DESC, best_duration_seconds ASC, last_played_at ASC)`
- `(last_played_at DESC)`
Note:
- all write updates to `leaderboard_player_stats` happen transactionally with `leaderboard_entries` insert in the same repository method.
## Redis Usage
Key prefix: `lb:`
Keys:
- `lb:top10:v1` -> serialized top10 payload (TTL cache)
- `lb:rank:{player_id}` -> cached player rank snapshot
- `lb:stats:global:v1` -> cached global stats payload
- `lb:zset:scores` -> sorted set score index (optional acceleration, non-authoritative)
Rules:
1. PostgreSQL remains source of truth.
2. On successful update ingestion:
- invalidate `lb:top10:v1`
- invalidate `lb:stats:global:v1`
- invalidate `lb:rank:{player_id}`
- best-effort Redis operations; failures logged and counted, not fatal.
3. If Redis is unavailable:
- query endpoints compute from PostgreSQL and still return success.
- readiness check marks Redis as down but optional.
## Endpoint Behavior (Decision Complete)
### `POST /leaderboard/update`
Input: final session outcome payload (see contract above).
Flow:
1. authenticate internal caller.
2. validate input fields and invariants.
3. start transaction.
4. check existing by `session_id`.
5. if existing:
- return existing normalized entry response (`200`, idempotent success).
6. insert into `leaderboard_entries`.
7. upsert/refresh `leaderboard_player_stats` aggregate.
8. commit transaction.
9. invalidate related Redis caches.
10. emit structured log + metrics counter.
Output:
- persisted leaderboard entry summary including computed `success_rate`.
### `GET /leaderboard/top10`
Query params:
- optional `completion_type` filter (`completed|timed_out|abandoned`)
- optional `window` (`24h|7d|30d|all`, default `all`)
Flow:
1. attempt Redis cache hit for matching key variant.
2. on miss, query PostgreSQL ordered by ranking rule.
3. compute rank values (1..N), cap at 10.
4. cache result with short TTL.
Output:
- ordered top list with rank, player_id, player_name, score, questions_asked, success_rate, duration_seconds, completed_at.
### `GET /leaderboard/players/:id`
Auth:
- self or admin.
Query params:
- `page` (default 1)
- `page_size` (default 20, max 100)
Flow:
1. auth and ownership/admin check.
2. fetch player aggregate from `leaderboard_player_stats`.
3. fetch paginated history from `leaderboard_entries`.
4. compute current global rank from ordering criteria against all players using `best_score` then tie-breakers.
Output:
- player summary:
- current_rank, games_played, best_score, avg_score, avg_success_rate, total_score
- paginated history list.
### `GET /leaderboard/stats`
Flow:
1. attempt Redis cache hit.
2. on miss, aggregate from PostgreSQL.
Returned stats:
- `total_games`
- `total_players`
- `avg_score`
- `avg_success_rate`
- `max_score`
- `score_p50`, `score_p90`, `score_p99`
- `updated_at`
## Package Layout
- `backend/services/leaderboard-service/cmd/main.go`
- `backend/services/leaderboard-service/internal/infra/config/config.go`
- `backend/services/leaderboard-service/internal/domain/leaderboard/`
- `backend/services/leaderboard-service/internal/application/leaderboard/`
- `backend/services/leaderboard-service/internal/infra/persistence/ent/`
- `backend/services/leaderboard-service/internal/infra/state/`
- `backend/services/leaderboard-service/internal/interfaces/http/`
- `backend/services/leaderboard-service/tests/`
## Configuration
Service-specific:
- `LEADERBOARD_PORT` (default `8083`)
- `LEADERBOARD_TOP_LIMIT` (default `10`)
- `LEADERBOARD_PLAYER_HISTORY_DEFAULT_LIMIT` (default `20`)
- `LEADERBOARD_PLAYER_HISTORY_MAX_LIMIT` (default `100`)
- `LEADERBOARD_CACHE_TTL` (default `60s`)
- `LEADERBOARD_UPDATE_REQUIRE_AUTH` (default `true`)
Optional integration:
- `GAME_SESSION_BASE_URL` (default `http://localhost:8080`) for future backfill tooling
- `UPSTREAM_HTTP_TIMEOUT` (default `3s`)
Shared:
- `POSTGRES_*`, `REDIS_*`, `TRACING_*`, `METRICS_*`, `LOG_LEVEL`, `ZITADEL_*`
## Implementation Work Breakdown
### Workstream A - Bootstrap, config, wiring
1. Add `internal/infra/config/config.go` env parsing.
2. Wire logger/metrics/tracer in `cmd/main.go`.
3. Initialize postgres + redis clients.
4. Initialize repository and `EnsureSchema(ctx)`.
5. Register `/health`, `/ready`, `/metrics`.
6. Build auth middleware and register routes.
### Workstream B - Domain and application
1. Define domain entities, value objects, and domain errors.
2. Implement ranking and statistics calculation services.
3. Implement application use-cases:
- `UpdateScore`
- `GetTop10`
- `GetPlayerRanking`
- `GetGlobalStats`
### Workstream C - Persistence
1. Implement repository interfaces and SQL-backed repository.
2. Add `EnsureSchema(ctx)` DDL for both tables and indexes.
3. Implement transactional ingestion:
- insert entry
- upsert player stats
4. Implement top10/history/stats query methods.
### Workstream D - HTTP interface
1. Add request/response DTOs with validation tags.
2. Implement handlers with shared error mapping.
3. Apply ownership/admin checks for player history endpoint.
4. Keep response envelope consistent with existing services.
### Workstream E - Cache and read optimization
1. Add Redis cache adapter for top10/stats/rank snapshots.
2. Implement cache keys, invalidation, and graceful fallback.
3. Add counters for cache hit/miss and invalidation failures.
### Workstream F - Testing
1. Unit tests:
- ranking tie-break correctness
- success-rate calculation
- stats aggregate math
- update idempotency behavior
2. HTTP integration tests:
- update + top10 happy path
- duplicate update idempotency
- player endpoint auth guards
- stats endpoint and metrics availability
3. Optional DB-backed tests (env-gated):
- schema creation
- transactional consistency
- unique `session_id` constraint
## Error Handling Contract
- `400`: invalid input or invariant violation
- `401`: missing/invalid auth for protected endpoints
- `403`: ownership/admin violation
- `404`: player not found (for player ranking endpoint)
- `409`: conflicting update (non-idempotent invalid duplicate payload scenario)
- `500`: unexpected internal failures
## Observability
1. Structured logs:
- include `session_id`, `player_id`, endpoint, and operation outcome.
- avoid PII-heavy payload logging.
2. Metrics:
- `leaderboard_updates_total{status}`
- `leaderboard_top10_requests_total{cache}`
- `leaderboard_player_requests_total{status}`
- `leaderboard_stats_requests_total{cache}`
- `leaderboard_update_latency_seconds`
3. Tracing:
- endpoint -> application -> repository/cache spans.
- include attributes for cache hit/miss and DB query class.
## Delivery Sequence (3-4 Days)
1. Day 1: bootstrap/config + schema + repository scaffolding.
2. Day 2: `POST /leaderboard/update` transactional ingestion + cache invalidation.
3. Day 3: query endpoints (`top10`, `players/:id`, `stats`) + auth checks.
4. Day 4: tests, observability hardening, bugfix buffer.
## Verification Commands
From `backend/services/leaderboard-service`:
```bash
go test ./...
go vet ./...
```
Optional workspace-level check from `backend`:
```bash
go test ./...
```
## Definition of Done
1. All four endpoints implemented and route protection rules enforced.
2. Ranking order and tie-break behavior matches functional requirement.
3. Update ingestion is idempotent by `session_id`.
4. `/health`, `/ready`, `/metrics` functional with meaningful checks.
5. Redis cache fallback behavior works when Redis is unavailable.
6. `go test ./...` and `go vet ./...` pass for leaderboard-service.
7. No code changes outside `backend/services/leaderboard-service/**`.
## Assumptions and Defaults
1. Inter-service transport remains HTTP in phase 2.
2. Leaderboard consistency can be eventual within seconds.
3. `POST /leaderboard/update` is called by internal trusted workflow after session termination.
4. No changes are made in other services or `backend/shared` during this step.