
Optimizing Our Data Pipeline: From 12-Second Queries to Sub-100ms Responses

How we redesigned Cognaxa's analytics pipeline using Redis caching, query optimization, and strategic denormalization to achieve enterprise-grade performance at scale.

Founders

When we launched Cognaxa’s analytics dashboard to our first enterprise client — a university with 3,000 active students and 45 courses — the teacher dashboard loaded in 12 seconds. That’s not a typo. Twelve seconds to render a page that teachers check dozens of times a day.

This is the story of how we got that down to under 100ms, and the architectural decisions that made it sustainable at 10x the scale.

The Problem: N+1 Queries and Missing Indexes

Our first analytics implementation was the classic “it works in development” anti-pattern. The teacher dashboard needed to show:

  • Course-level completion rates
  • Per-quiz score distributions
  • Question-level difficulty analytics
  • Student progress across modules

The naive implementation looked something like this:

// The slow way — N+1 queries everywhere
const courses = await db.query(
  'SELECT * FROM courses WHERE author_id = $1',
  [teacherId]
);

for (const course of courses) {
  const quizzes = await db.query(
    'SELECT * FROM quizzes WHERE course_id = $1',
    [course.id]
  );

  for (const quiz of quizzes) {
    const attempts = await db.query(
      'SELECT * FROM student_quiz_attempts WHERE quiz_id = $1',
      [quiz.id]
    );
    // ... compute stats per attempt
  }
}

With 45 courses, 120 quizzes, and thousands of attempts, this was generating 400+ database queries per page load. Each query individually was fine (5-20ms), but the cumulative round-trip time was devastating.
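The damage is mostly round-trip latency, not query cost. A back-of-the-envelope model (with illustrative numbers, not measurements from our system) shows why sequential round trips dominate:

```typescript
// Back-of-the-envelope latency model for sequential queries.
// Each round trip pays network latency even when the query itself is cheap.
function estimateSequentialMs(queryCount: number, avgRoundTripMs: number): number {
  return queryCount * avgRoundTripMs;
}

// 400 individually "fine" queries still cost seconds when run one after another
const nPlusOneMs = estimateSequentialMs(400, 15);
console.log(nPlusOneMs); // 6000

// A handful of heavier aggregate queries is far cheaper in total
const consolidatedMs = estimateSequentialMs(5, 30);
console.log(consolidatedMs); // 150
```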

Fix 1: Query Consolidation

The first optimization was obvious: stop the N+1 pattern. We consolidated the multi-query approach into single queries with JOINs and aggregations:

-- Single query for quiz performance overview
SELECT
  q.id AS quiz_id,
  q.title,
  c.title AS course_title,
  COUNT(DISTINCT sqa.student_id) AS total_students,
  COUNT(DISTINCT sqa.id) AS total_attempts,
  ROUND(AVG(
    COALESCE(sqa.manual_graded_score, sqa.auto_graded_score)
  ), 2) AS avg_score,
  PERCENTILE_CONT(0.5) WITHIN GROUP (
    ORDER BY COALESCE(sqa.manual_graded_score, sqa.auto_graded_score)
  ) AS median_score
FROM quizzes q
JOIN courses c ON c.id = q.course_id
LEFT JOIN student_quiz_attempts sqa
  ON sqa.quiz_id = q.id AND sqa.status = 'submitted'
WHERE c.author_id = $1
GROUP BY q.id, q.title, c.title
ORDER BY c.title, q.title;

This brought us from 400+ queries down to 5 queries per dashboard load. Page load dropped from 12s to around 2s.

Better, but not enterprise-grade.
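Consolidating to one flat result set also changes the application code: instead of nesting queries, we reshape rows in memory. A minimal sketch — the row type mirrors the SQL aliases above, and `groupByCourse` is an illustrative helper, not our actual code:

```typescript
// Shape of one row from the consolidated quiz-performance query.
// Field names mirror the SQL aliases; the type itself is an assumption.
interface QuizRow {
  quiz_id: string;
  title: string;
  course_title: string;
  total_students: number;
  total_attempts: number;
  avg_score: number;
  median_score: number;
}

// Group the flat rows per course for rendering the dashboard tree.
function groupByCourse(rows: QuizRow[]): Map<string, QuizRow[]> {
  const byCourse = new Map<string, QuizRow[]>();
  for (const row of rows) {
    const bucket = byCourse.get(row.course_title) ?? [];
    bucket.push(row);
    byCourse.set(row.course_title, bucket);
  }
  return byCourse;
}
```

Grouping in application code keeps the SQL simple and the result set small, at the cost of a single O(n) pass over the rows.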

Fix 2: Strategic Indexing

Running EXPLAIN ANALYZE revealed that several of our aggregation queries were doing sequential scans on large tables:

-- Before: Sequential scan on student_quiz_attempts (slow)
-- Seq Scan on student_quiz_attempts
--   (cost=0.00..18234.00 rows=95000)

-- After: Index scan (fast)
CREATE INDEX idx_sqa_quiz_status
  ON student_quiz_attempts(quiz_id, status);

CREATE INDEX idx_sqa_student_quiz
  ON student_quiz_attempts(student_id, quiz_id);

CREATE INDEX idx_courses_author
  ON courses(author_id) WHERE status = 'published';

Targeted indexes on the most common query patterns brought our query times from 200-800ms to consistently under 30ms.

Fix 3: Redis Read-Through Cache

Even with optimized queries, recalculating analytics for every page load is wasteful. Most of this data changes infrequently — new quiz submissions come in throughout the day, but the aggregate picture changes slowly.

We implemented a Redis read-through cache:

// Cache service with automatic invalidation
class CacheService {
  constructor(private readonly redis: Redis) {}

  async getOrSet(
    key: string,
    fetcher: () => Promise<unknown>,
    ttlSeconds: number = 300
  ): Promise<unknown> {
    // Try cache first
    const cached = await this.redis.get(key);
    if (cached !== null) return JSON.parse(cached);

    // Cache miss — fetch from database
    const data = await fetcher();

    // Store in cache with TTL
    await this.redis.set(
      key, JSON.stringify(data), 'EX', ttlSeconds
    );
    return data;
  }

  async invalidate(pattern: string): Promise<void> {
    // KEYS is O(n) and blocks Redis — fine at our keyspace size,
    // but worth switching to SCAN before the keyspace grows large
    const keys = await this.redis.keys(pattern);
    if (keys.length > 0) {
      await this.redis.del(...keys);
    }
  }
}

Our cache key schema encodes the tenant and resource:

courses:tenant:{tenantId}                         — course list (TTL: 5m)
course_structure:tenant:{tenantId}:course:{id}    — full tree (TTL: 5m)
quiz_analytics:tenant:{tenantId}:quiz:{id}        — performance (TTL: 2m)
students:tenant:{tenantId}                        — student list (TTL: 5m)

Cache invalidation happens on write operations: when a quiz is submitted, we invalidate the relevant analytics keys. This gives us eventually-consistent analytics with a maximum staleness of 2-5 minutes — perfectly acceptable for dashboards.
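Concretely, the write path calls into the cache layer with a quiz-scoped key. A sketch, assuming an interface like the CacheService above; the handler name is illustrative:

```typescript
// Stand-in for the invalidate() method on the cache service.
interface CacheInvalidator {
  invalidate(pattern: string): Promise<void>;
}

// Hypothetical write-path hook: a new attempt only stales the analytics
// derived from that quiz, so we invalidate at quiz granularity and leave
// the course and student list keys warm.
async function onQuizSubmitted(
  cache: CacheInvalidator,
  tenantId: string,
  quizId: string
): Promise<void> {
  await cache.invalidate(`quiz_analytics:tenant:${tenantId}:quiz:${quizId}`);
}
```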

The Results

Metric                    | Before   | After                        | Improvement
Dashboard load time       | 12,000ms | 85ms (cached) / 250ms (cold) | 140x
Database queries per load | 400+     | 5                            | 80x
P95 API response time     | 2,800ms  | 120ms                        | 23x
Redis cache hit rate      | N/A      | 92%                          |

Lessons Learned

  1. Profile first, optimize second: Without EXPLAIN ANALYZE and proper query logging, we would have optimized the wrong things
  2. Caching is not a substitute for good queries: We needed both — fast queries AND caching. A slow query cached is still slow on a cache miss
  3. Cache invalidation is the hard part: Getting the invalidation granularity right (per-quiz vs. per-course vs. per-tenant) required careful thought about data relationships
  4. RLS adds overhead, but it’s worth it: PostgreSQL row-level security (RLS) policies, which enforce tenant isolation inside the database itself, add around 5-10% query overhead. We accept this as the cost of database-level security
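For context on that last point, a tenant-isolation policy looks roughly like this — table, column, and setting names here are illustrative, not our exact schema:

```sql
-- Illustrative RLS sketch: restrict rows to the current tenant
ALTER TABLE student_quiz_attempts ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON student_quiz_attempts
  USING (tenant_id = current_setting('app.tenant_id')::uuid);
```

Every query against the table then pays the policy predicate check, which is where the 5-10% overhead comes from.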

What’s Next

We’re exploring materialized views for the heaviest analytics queries — pre-computed aggregations that refresh on a schedule rather than being computed on-demand. For institutions with 50,000+ students, this will be essential to maintain sub-second response times.
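In SQL terms, the direction looks roughly like the sketch below. The view name and column list are assumptions for illustration, not shipped code — the aggregation mirrors the consolidated query from Fix 1:

```sql
-- Sketch: pre-compute the per-quiz aggregation as a materialized view
CREATE MATERIALIZED VIEW quiz_performance_mv AS
SELECT
  q.id AS quiz_id,
  c.author_id,
  COUNT(DISTINCT sqa.student_id) AS total_students,
  ROUND(AVG(
    COALESCE(sqa.manual_graded_score, sqa.auto_graded_score)
  ), 2) AS avg_score
FROM quizzes q
JOIN courses c ON c.id = q.course_id
LEFT JOIN student_quiz_attempts sqa
  ON sqa.quiz_id = q.id AND sqa.status = 'submitted'
GROUP BY q.id, c.author_id;

-- A unique index is required for CONCURRENTLY, which lets readers
-- keep querying the old snapshot while the view refreshes
CREATE UNIQUE INDEX idx_quiz_performance_mv
  ON quiz_performance_mv(quiz_id);

-- Run on a schedule (e.g. every few minutes)
REFRESH MATERIALIZED VIEW CONCURRENTLY quiz_performance_mv;
```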

Performance engineering is never finished. But today, our teachers see their dashboards in under 100ms, and that’s what matters.


Scaling challenges with your current LMS? Let’s talk — we love solving performance problems.

#performance #Redis #PostgreSQL #caching #optimization