AI & Industry Pulse · 4 min read

Integrating LLMs Into Education Without Compromising Data Privacy

How Cognaxa leverages large language models for AI grading, question generation, and intelligent tutoring — while keeping institutional data secure and private.

Ravi Mehta
Lead AI Engineer

The promise of AI in education is enormous: personalized learning paths, instant feedback on essays, intelligent question generation, and tutoring that scales to thousands of students. But for enterprise clients — universities, corporate training programs, government institutions — the question isn't "can you do AI?" It's "can you do AI without exposing our data?"

At Cognaxa, we’ve built our AI features with a clear principle: AI augments educators, never replaces them. And your data never leaves your control.

The AI Feature Set

Before diving into the privacy architecture, here’s what our AI engine powers:

  1. AI Essay Grading — Evaluate long-form responses with confidence scores, rubric alignment, and detailed feedback
  2. Question Generation — Generate quiz questions from topic descriptions, with configurable difficulty levels
  3. Intelligent Tutoring — A chatbot that can answer student questions about course content
  4. Lesson Summarization — Auto-generate summaries from lesson content for study guides

Each of these features touches sensitive data: student responses, course IP, assessment content. Getting the privacy architecture wrong would be a dealbreaker for every enterprise customer.

Our Approach: No Training on Your Data

Let’s be explicit about what we don’t do:

  • We don’t fine-tune models on customer data
  • We don’t store prompts or completions in third-party systems beyond the API call lifecycle
  • We don’t share data between tenants — even anonymized

Our AI pipeline uses inference-only API calls to language models. The interaction model is:

User Action → Cognaxa Server → Prompt Construction → LLM API → Response → Grading Logic → Database

The key property: the LLM never sees raw student identifiers. Our prompt construction layer strips PII before sending content for evaluation.
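A minimal sketch of what that PII-stripping pass could look like. The helper name, the regex patterns, and the ID format are illustrative assumptions, not our production sanitizer, which is tenant-configurable:

```javascript
// Illustrative PII-stripping pass run before prompt construction.
// The patterns below are a sketch; a real deployment would use
// tenant-configurable rules and a named-entity pass as well.
const PII_PATTERNS = [
  { name: "email", regex: /[\w.+-]+@[\w-]+\.[\w.]+/g },
  { name: "phone", regex: /\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g },
  { name: "student_id", regex: /\bSTU-\d{6}\b/g }, // hypothetical ID format
];

function stripPII(text) {
  let sanitized = text;
  for (const { name, regex } of PII_PATTERNS) {
    sanitized = sanitized.replace(regex, `[REDACTED_${name.toUpperCase()}]`);
  }
  return sanitized;
}

console.log(stripPII("Contact jane.doe@uni.edu about STU-123456"));
// → "Contact [REDACTED_EMAIL] about [REDACTED_STUDENT_ID]"
```

The output of this step is what becomes `sanitizedResponse` in the grading prompt below: the model evaluates the writing, never the writer.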

Prompt Engineering for Education

Generic LLM prompts produce generic results. We’ve invested heavily in education-specific prompt engineering:

// Simplified AI grading prompt structure
const gradingPrompt = {
  system: `You are an expert academic grader. Evaluate the following 
    student response against the rubric provided. Return a JSON object 
    with: score (0-100), confidence (0-1), feedback (string), and 
    rubric_alignment (object mapping each criterion to a score).`,
  user: `
    Question: ${question.text}
    Rubric: ${question.rubric}
    Max Points: ${question.points}
    Student Response: ${sanitizedResponse}
  `,
};

const result = await llm.complete(gradingPrompt);
const grading = parseGradingResponse(result);

// AI grade is always a suggestion — teacher has final authority
await db.query(
  `UPDATE student_quiz_attempts 
   SET ai_suggested_score = $1, ai_feedback = $2, ai_confidence = $3
   WHERE id = $4`,
  [grading.score, grading.feedback, grading.confidence, attemptId]
);

Notice that the AI score is stored as a suggestion, not a final grade. Teachers always have override authority. This is a deliberate design choice — AI should reduce grading workload, not remove human judgment from the loop.
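The `parseGradingResponse` step above can be sketched roughly like this. This is a simplified illustration under the assumption that the model returns the JSON object described in the system prompt, sometimes wrapped in markdown fences:

```javascript
// Illustrative response parser. LLMs occasionally wrap JSON in
// markdown fences or add stray prose, so we extract the first JSON
// object and validate its fields before trusting it.
function parseGradingResponse(raw) {
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) throw new Error("No JSON object in model output");
  const parsed = JSON.parse(match[0]);

  if (typeof parsed.score !== "number" || parsed.score < 0 || parsed.score > 100) {
    throw new Error("Invalid score");
  }
  if (typeof parsed.confidence !== "number" || parsed.confidence < 0 || parsed.confidence > 1) {
    throw new Error("Invalid confidence");
  }
  return {
    score: parsed.score,
    confidence: parsed.confidence,
    feedback: String(parsed.feedback ?? ""),
    rubric_alignment: parsed.rubric_alignment ?? {},
  };
}
```

Validating the shape here, rather than trusting the model, means a malformed completion fails loudly instead of writing a bogus suggestion into the database.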

Multi-Model Strategy

We don’t lock ourselves (or our customers) to a single AI provider:

Feature               Primary Model      Fallback
Essay Grading         Gemini 1.5 Pro     OpenRouter (Claude)
Question Generation   Gemini 1.5 Flash
Tutoring Chatbot      Gemini 1.5 Pro
Summarization         Gemini 1.5 Flash

This multi-model approach gives us:

  • Resilience: If one provider has an outage, critical features keep working
  • Cost optimization: Use faster/cheaper models for less complex tasks
  • Future flexibility: New models can be evaluated and swapped without application changes
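The resilience point can be sketched as a small fallback wrapper. The provider objects and their `complete` method are assumptions for illustration, not a real SDK surface:

```javascript
// Sketch of a provider-fallback wrapper: try each provider in order,
// collecting failures, and only give up when every provider errors.
async function completeWithFallback(prompt, providers) {
  const errors = [];
  for (const provider of providers) {
    try {
      return await provider.complete(prompt);
    } catch (err) {
      errors.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Because the application only ever calls the wrapper, swapping in a new model is a configuration change to the provider list, not an application change.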

Data Flow Isolation

For multi-tenant deployments, AI requests are isolated at every layer:

  1. Tenant-scoped API keys: Each tenant can optionally bring their own AI API keys
  2. Request isolation: AI requests carry the tenant context — no cross-tenant prompt contamination
  3. Audit logging: Every AI interaction is logged with the tenant ID, feature used, and timestamp
  4. Rate limiting: Per-tenant AI usage limits prevent runaway costs
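As a sketch of the rate-limiting layer, here is a minimal in-process sliding-window limiter keyed by tenant. The class name, limits, and in-memory storage are assumptions; a production version would back this with shared storage such as Redis or the database:

```javascript
// Illustrative per-tenant sliding-window limiter for AI requests.
class TenantRateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.buckets = new Map(); // tenantId -> timestamps of recent requests
  }

  allow(tenantId, now = Date.now()) {
    const cutoff = now - this.windowMs;
    const recent = (this.buckets.get(tenantId) ?? []).filter(t => t > cutoff);
    if (recent.length >= this.maxRequests) {
      this.buckets.set(tenantId, recent);
      return false; // over the per-tenant budget for this window
    }
    recent.push(now);
    this.buckets.set(tenantId, recent);
    return true;
  }
}
```

Keying the buckets by tenant ID is what keeps one tenant's runaway usage from exhausting another tenant's AI budget.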

The Confidence Score

One of our most valuable AI features isn’t the grading itself — it’s the confidence score. Every AI-graded response comes with a 0-1 confidence value:

  • High confidence (above 0.85): The AI is fairly certain about the grade. These can be auto-accepted in bulk
  • Medium confidence (0.5–0.85): Worth a quick teacher review
  • Low confidence (below 0.5): Flagged for manual grading

This allows teachers to focus their time on the responses where human judgment matters most, rather than reviewing every single submission.
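The triage buckets above reduce to a small routing function. A sketch, assuming the grading object shape shown earlier:

```javascript
// Route an AI-graded response by confidence, mirroring the buckets
// described above: auto-accept, quick review, or manual grading.
function triageByConfidence(grading) {
  if (grading.confidence > 0.85) return "auto_accept";
  if (grading.confidence >= 0.5) return "teacher_review";
  return "manual_grading";
}
```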

What We’re Building Next

  • RAG-based tutoring: Grounding chatbot responses in actual course content using retrieval-augmented generation
  • Plagiarism detection: AI-powered similarity analysis across submissions
  • Learning path recommendations: Personalized course suggestions based on performance patterns

The AI landscape moves fast. Our architecture is designed to adopt new capabilities without compromising the privacy guarantees that enterprise customers require. Because in education, the most intelligent system is the one students and institutions can trust.


Want to see our AI grading in action? Schedule a demo and we’ll walk you through the full pipeline.

#LLM #AI #privacy #grading #Gemini
Ravi Mehta
Lead AI Engineer

Lead AI Engineer at Genfinish. Runs the evaluation pipeline for Cognaxa's AI grading and proctoring subsystems. Writes about responsible AI in assessment, evaluation rubrics, and the operational realities of running multi-model inference at scale.

Enjoyed this post?

Engineering notes and shipping updates from the Genfinish team — one email, every other week.

Subscribe →