When we launched Cognaxa’s analytics dashboard to our first enterprise client — a university with 3,000 active students and 45 courses — the teacher dashboard loaded in 12 seconds. That’s not a typo. Twelve seconds to render a page that teachers check dozens of times a day.
This is the story of how we got that down to under 100ms, and the architectural decisions that made it sustainable at 10x the scale.
## The Problem: N+1 Queries and Missing Indexes
Our first analytics implementation was the classic “it works in development” anti-pattern. The teacher dashboard needed to show:
- Course-level completion rates
- Per-quiz score distributions
- Question-level difficulty analytics
- Student progress across modules
The naive implementation looked something like this:
```typescript
// The slow way — N+1 queries everywhere
const courses = await db.query(
  'SELECT * FROM courses WHERE author_id = $1',
  [teacherId]
);

for (const course of courses) {
  const quizzes = await db.query(
    'SELECT * FROM quizzes WHERE course_id = $1',
    [course.id]
  );

  for (const quiz of quizzes) {
    const attempts = await db.query(
      'SELECT * FROM student_quiz_attempts WHERE quiz_id = $1',
      [quiz.id]
    );
    // ... compute stats per attempt
  }
}
```
With 45 courses, 120 quizzes, and thousands of attempts, this was generating 400+ database queries per page load. Each query individually was fine (5-20ms), but the cumulative round-trip time was devastating.
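The arithmetic is worth making explicit. A back-of-envelope model (the numbers here are illustrative, not profiler measurements):

```typescript
// Rough model: sequential awaited queries pay the per-query time plus
// a network round trip, every single time, because each await blocks
// until the previous query returns. Numbers are illustrative.
function sequentialLatencyMs(
  queryCount: number,
  perQueryMs: number,
  roundTripMs: number
): number {
  return queryCount * (perQueryMs + roundTripMs);
}

// 400 queries at ~10ms each, plus ~5ms of round trip per query
console.log(sequentialLatencyMs(400, 10, 5)); // 6000
```

Six seconds of pure waiting before the server has even started rendering a response.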
## Fix 1: Query Consolidation
The first optimization was obvious: stop the N+1 pattern. We consolidated the multi-query approach into single queries with JOINs and aggregations:
```sql
-- Single query for quiz performance overview
SELECT
  q.id AS quiz_id,
  q.title,
  c.title AS course_title,
  COUNT(DISTINCT sqa.student_id) AS total_students,
  COUNT(DISTINCT sqa.id) AS total_attempts,
  ROUND(AVG(
    COALESCE(sqa.manual_graded_score, sqa.auto_graded_score)
  ), 2) AS avg_score,
  PERCENTILE_CONT(0.5) WITHIN GROUP (
    ORDER BY COALESCE(sqa.manual_graded_score, sqa.auto_graded_score)
  ) AS median_score
FROM quizzes q
JOIN courses c ON c.id = q.course_id
LEFT JOIN student_quiz_attempts sqa
  ON sqa.quiz_id = q.id AND sqa.status = 'submitted'
WHERE c.author_id = $1
GROUP BY q.id, q.title, c.title
ORDER BY c.title, q.title;
```
This brought us from 400+ queries down to 5 queries per dashboard load. Page load dropped from 12s to around 2s.
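On the application side, the handler now consumes one flat result set and does the per-course grouping in memory. A sketch, assuming pg-style rows shaped like the SELECT list above (the row interface is simplified for illustration):

```typescript
// Row shape mirrors the consolidated query's SELECT list (trimmed).
interface QuizStatsRow {
  quiz_id: string;
  title: string;
  course_title: string;
  total_students: number;
  avg_score: number | null;
}

// One pass over the flat result set, bucketed by course title.
function groupByCourse(
  rows: QuizStatsRow[]
): Map<string, QuizStatsRow[]> {
  const byCourse = new Map<string, QuizStatsRow[]>();
  for (const row of rows) {
    const bucket = byCourse.get(row.course_title) ?? [];
    bucket.push(row);
    byCourse.set(row.course_title, bucket);
  }
  return byCourse;
}
```

Grouping a few hundred rows in memory is microseconds of work; the expensive part was always the round trips.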
Better, but not enterprise-grade.
## Fix 2: Strategic Indexing
Running EXPLAIN ANALYZE revealed that several of our aggregation queries were doing sequential scans on large tables:
```sql
-- Before: Sequential scan on student_quiz_attempts (slow)
--   Seq Scan on student_quiz_attempts
--     (cost=0.00..18234.00 rows=95000)

-- After: Index scan (fast)
CREATE INDEX idx_sqa_quiz_status
  ON student_quiz_attempts (quiz_id, status);

CREATE INDEX idx_sqa_student_quiz
  ON student_quiz_attempts (student_id, quiz_id);

CREATE INDEX idx_courses_author
  ON courses (author_id) WHERE status = 'published';
```
Targeted indexes on the most common query patterns brought our query times from 200-800ms to consistently under 30ms.
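A quick way to confirm the planner actually picked up a new index is to re-run the plan after creating it. A hypothetical check against the composite index above (`'<quiz-id>'` stands in for a real UUID):

```sql
-- Re-check the plan after adding idx_sqa_quiz_status
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM student_quiz_attempts
WHERE quiz_id = '<quiz-id>' AND status = 'submitted';
-- Expect an index scan using idx_sqa_quiz_status
-- instead of the sequential scan shown earlier
```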
## Fix 3: Redis Read-Through Cache
Even with optimized queries, recalculating analytics for every page load is wasteful. Most of this data changes infrequently — new quiz submissions come in throughout the day, but the aggregate picture changes slowly.
We implemented a Redis read-through cache:
```typescript
import Redis from 'ioredis';

// Cache service with automatic invalidation
class CacheService {
  constructor(private redis: Redis) {}

  async getOrSet(
    key: string,
    fetcher: () => Promise<unknown>,
    ttlSeconds: number = 300
  ): Promise<unknown> {
    // Try cache first
    const cached = await this.redis.get(key);
    if (cached !== null) return JSON.parse(cached);

    // Cache miss — fetch from database
    const data = await fetcher();

    // Store in cache with TTL
    await this.redis.set(
      key, JSON.stringify(data), 'EX', ttlSeconds
    );
    return data;
  }

  async invalidate(pattern: string): Promise<void> {
    // KEYS blocks Redis while it scans; acceptable at our keyspace
    // size, but SCAN is the safer choice for very large keyspaces
    const keys = await this.redis.keys(pattern);
    if (keys.length > 0) {
      await this.redis.del(...keys);
    }
  }
}
```
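To show the read-through behavior in isolation, here is a self-contained sketch with a plain Map standing in for Redis (TTL and invalidation omitted; the real service delegates those to Redis itself):

```typescript
// Map stands in for Redis so the flow is visible without a server.
const store = new Map<string, string>();
let dbCalls = 0;

async function getOrSet<T>(
  key: string,
  fetcher: () => Promise<T>
): Promise<T> {
  const cached = store.get(key);
  if (cached !== undefined) return JSON.parse(cached) as T;
  const data = await fetcher();
  store.set(key, JSON.stringify(data));
  return data;
}

// Simulated database fetch for one quiz's analytics
const fetchAnalytics = async () => {
  dbCalls += 1;
  return { avgScore: 82.5, totalAttempts: 37 };
};

const key = 'quiz_analytics:tenant:t1:quiz:q1';
const first = await getOrSet(key, fetchAnalytics);  // miss: calls the fetcher
const second = await getOrSet(key, fetchAnalytics); // hit: served from the Map
```

Two reads, one database call: the second read never touches the fetcher.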
Our cache key schema encodes the tenant and resource:
- `courses:tenant:{tenantId}` — course list (TTL: 5m)
- `course_structure:tenant:{tenantId}:course:{id}` — full tree (TTL: 5m)
- `quiz_analytics:tenant:{tenantId}:quiz:{id}` — performance (TTL: 2m)
- `students:tenant:{tenantId}` — student list (TTL: 5m)
Cache invalidation happens on write operations: when a quiz is submitted, we invalidate the relevant analytics keys. This gives us eventually-consistent analytics with a maximum staleness of 2-5 minutes — perfectly acceptable for dashboards.
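The submission path can express that granularity as a pure function. A hypothetical helper (the function name and the exact set of patterns are illustrative, keyed to the schema above):

```typescript
// Hypothetical helper: which cache keys to drop when a student
// submits a quiz. Per-quiz for analytics, per-tenant wildcard for
// the course structures that include completion rates.
interface SubmissionEvent {
  tenantId: string;
  quizId: string;
}

function keysToInvalidate(e: SubmissionEvent): string[] {
  return [
    // This quiz's analytics are stale immediately
    `quiz_analytics:tenant:${e.tenantId}:quiz:${e.quizId}`,
    // Completion rates live in the cached course trees
    `course_structure:tenant:${e.tenantId}:course:*`,
  ];
}
```

Keeping this logic in one pure function makes the invalidation granularity easy to test and to revisit as the data model grows.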
## The Results
| Metric | Before | After | Improvement |
|---|---|---|---|
| Dashboard load time | 12,000ms | 85ms (cached) / 250ms (cold) | 140x |
| Database queries per load | 400+ | 5 | 80x |
| P95 API response time | 2,800ms | 120ms | 23x |
| Redis cache hit rate | N/A | 92% | — |
## Lessons Learned
- **Profile first, optimize second**: Without `EXPLAIN ANALYZE` and proper query logging, we would have optimized the wrong things
- **Caching is not a substitute for good queries**: We needed both — fast queries AND caching. A slow query cached is still slow on a cache miss
- **Cache invalidation is the hard part**: Getting the invalidation granularity right (per-quiz vs. per-course vs. per-tenant) required careful thought about data relationships
- **RLS adds overhead, but it’s worth it**: PostgreSQL row-level security (RLS) policies add around 5-10% query overhead. We accept this as the cost of database-level security
## What’s Next
We’re exploring materialized views for the heaviest analytics queries — pre-computed aggregations that refresh on a schedule rather than being computed on-demand. For institutions with 50,000+ students, this will be essential to maintain sub-second response times.
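As a sketch of the direction (hypothetical view name; the columns mirror the aggregation query earlier in this post):

```sql
-- Pre-computed per-quiz aggregates, refreshed by a scheduled job
CREATE MATERIALIZED VIEW quiz_performance_mv AS
SELECT
  q.id AS quiz_id,
  COUNT(DISTINCT sqa.student_id) AS total_students,
  ROUND(AVG(
    COALESCE(sqa.manual_graded_score, sqa.auto_graded_score)
  ), 2) AS avg_score
FROM quizzes q
LEFT JOIN student_quiz_attempts sqa
  ON sqa.quiz_id = q.id AND sqa.status = 'submitted'
GROUP BY q.id;

-- CONCURRENTLY refreshes without blocking readers, but requires
-- a unique index on the view
CREATE UNIQUE INDEX ON quiz_performance_mv (quiz_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY quiz_performance_mv;
```

The trade-off is the same one we made with Redis: bounded staleness in exchange for constant-time reads.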
Performance engineering is never finished. But today, our teachers see their dashboards in under 100ms, and that’s what matters.
Scaling challenges with your current LMS? Let’s talk — we love solving performance problems.