The Challenge
When building AI systems that need to remember conversations, we face a critical problem:
How do you process hundreds or thousands of messages without losing important context?
Traditional approaches either summarise everything at once (losing nuance) or process messages individually (losing context). We needed something better for our memory graph system.
Our Solution: Rolling Window Summarisation with Context Injection
At IXO, we’ve developed a message summarisation system that combines the best of both worlds — efficient processing with perfect context preservation. Here’s how it works.
The Big Picture
Think of reading a book chapter by chapter, but always remembering what happened before. That’s exactly what our rolling window approach does:
- Split messages into digestible chunks (windows of 10 messages)
- Summarise each window while keeping the previous summary as context
- Chain the summaries together so nothing gets lost
- Output compressed, information-rich messages ready for memory graph construction
How It Works: Step by Step
Step 1: Window Creation
When 50 messages arrive, instead of processing them all at once, we split them into windows (see the code sketch after this list):
- Window 1: Messages 1–10
- Window 2: Messages 11–20
- Window 3: Messages 21–30
- And so on…
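In code, window creation reduces to slicing. Here is a minimal sketch in Python; the message type and the helper name make_windows are illustrative, not our actual API:

from typing import Sequence

def make_windows(messages: Sequence[dict], window_size: int = 10) -> list[list[dict]]:
    """Split messages into consecutive fixed-size windows.

    The last window may be shorter when the message count
    is not an exact multiple of window_size.
    """
    return [
        list(messages[i : i + window_size])
        for i in range(0, len(messages), window_size)
    ]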
Step 2: Context Injection (The Secret Sauce)
Here’s where it gets interesting. When processing Window 2, we don’t just look at messages 11–20. We prepend Window 1’s summary:
[Window 1 Summary (4 msgs)] + [Messages 11–20 (10 msgs)] → Summarise → [Window 2 Summary (4 msgs)]
This creates a context chain that flows through your entire conversation.
No information gets lost between windows.
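The loop that drives this is short. A sketch that builds on make_windows above, with a hypothetical summarize_window coroutine standing in for the LLM call:

async def summarize_rolling(messages: list[dict], window_size: int = 10) -> list[dict]:
    """Chain summaries: each window is summarised with the previous summary prepended."""
    all_summaries: list[dict] = []
    previous_summary: list[dict] = []  # empty for Window 1
    for window in make_windows(messages, window_size):
        # Context injection: the prior summary travels with the new messages
        window_summary = await summarize_window(previous_summary + window)
        all_summaries.extend(window_summary)
        previous_summary = window_summary
    return all_summaries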
Step 3: AI-Powered Compression
For each window, we use a carefully crafted LLM prompt (sketched below) that focuses on:
✅ Information Preservation — Capture ALL key facts, decisions, and answers
✅ Entity Extraction — Keep specific names, numbers, dates, technical terms
✅ Context Continuity — Maintain the conversation flow
✅ Role Attribution — Preserve who said what
✅ Temporal Flow — Keep events in logical sequence
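A prompt template along these lines encodes those five goals; the wording here is illustrative rather than our production prompt:

SUMMARY_PROMPT = """Compress the conversation window below into {summary_size} messages.

Previous window summary (context only, do not re-summarise):
{previous_summary}

Messages to summarise:
{window_messages}

Requirements:
- Preserve ALL key facts, decisions, and answers.
- Keep names, numbers, dates, and technical terms verbatim.
- Maintain conversation flow and attribute statements to their speakers.
- Keep events in their original order.
Return JSON that matches the Message schema exactly.
"""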
Step 4: Structured Output
The LLM returns structured JSON that matches our Message schema exactly:
{
  "messages": [
    {
      "content": "Dense, information-rich summary...",
      "role_type": "user",
      "role": "Alice",
      "timestamp": "2024-10-07T10:30:00Z",
      "name": "conversation_summary_window_1"
    },
    ...
  ]
}
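Because LLM output can drift, the JSON is worth validating against a typed model before it enters the pipeline. A sketch using Pydantic: the field names mirror the JSON above, while the class itself and the allowed role_type values are assumptions:

from datetime import datetime
from typing import Literal

from pydantic import BaseModel

class Message(BaseModel):
    content: str
    role_type: Literal["user", "assistant", "system"]  # assumed set of roles
    role: str
    timestamp: datetime  # parsed and validated as ISO 8601
    name: str

class SummaryResponse(BaseModel):
    messages: list[Message]

# Usage: SummaryResponse.model_validate_json(llm_output) raises on malformed output.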
The Results
Default Configuration: 10 messages per window → 4 summary messages
Context Retention: 100% (every window sees the previous window’s summary, so nothing is dropped at window boundaries)
Processing Speed: Async LLM calls with a 10,000-token limit per window
Error Resilience: Graceful fallbacks if summarisation fails
Why This Matters for Memory Graphs
Our memory graph system extracts entities and relationships from conversations. This summarisation approach is perfect because it:
- Preserves Entity Information — Names, places, projects stay intact
- Maintains Relationships — “Alice works with Bob on Project X” connections survive
- Keeps Temporal Context — When things happened matters for the graph
- Reduces Processing Load — Fewer messages = faster graph construction
- Enables Scale — Handle massive conversation histories efficiently
Smart Features Under the Hood
Adaptive Prompting
The system knows when it’s processing the last window and adjusts its prompt (see the sketch after this list):
- Intermediate windows: “Maintain context for next window (Window N+1)”
- Final window: “This is the FINAL summary — ensure complete standalone context”
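Choosing the suffix is a small function of the window’s position. An illustrative sketch with 1-based window numbering:

def prompt_suffix(window_number: int, total_windows: int) -> str:
    """Pick the closing instruction based on where the window sits."""
    if window_number == total_windows:
        return "This is the FINAL summary — ensure complete standalone context"
    return f"Maintain context for next window (Window {window_number + 1})"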
Multi-Speaker Awareness
The summariser automatically detects conversation patterns (illustrated after the list):
- Single speaker: “Maintain single speaker context”
- Multiple speakers: “Preserve distinct speaker perspectives (user vs assistant)”
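Detection can be as simple as counting distinct roles in the window. A sketch, assuming each message carries a role field as in the schema above:

def speaker_instruction(window: list[dict]) -> str:
    """Choose the speaker-handling instruction for a window."""
    speakers = {message["role"] for message in window}
    if len(speakers) <= 1:
        return "Maintain single speaker context"
    return "Preserve distinct speaker perspectives (user vs assistant)"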
Configurable Thresholds
await summarizer.summarize_messages_rolling(
    messages=messages,
    window_size=10,  # Messages per window (default: 10)
    summary_size=4,  # Target summaries per window (default: 4)
    threshold=10     # Min messages to trigger (default: 10)
)
Conversations below the threshold skip summarisation entirely, and the window and summary sizes let you tune compression for your use case.
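Internally, the threshold is just a guard in front of the rolling loop. A sketch of that behaviour, reusing the summarize_rolling coroutine sketched earlier:

async def summarize_if_needed(messages: list[dict], threshold: int = 10) -> list[dict]:
    """Short conversations pass through untouched, with no LLM calls."""
    if len(messages) < threshold:
        return messages
    return await summarize_rolling(messages)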
Error Handling That Just Works
Software fails. Networks hiccup. LLMs occasionally return weird responses. We handle it (see the sketch after this list):
- ✅ Comprehensive logging at debug, info, warning, and error levels
- ✅ Fallback to original messages if any window summarisation fails
- ✅ Graceful degradation per window (one failure doesn’t break everything)
- ✅ Timestamp parsing with ISO 8601 format validation
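Per-window fallback keeps one bad LLM response from poisoning the whole run. A sketch of the pattern, again using the hypothetical summarize_window coroutine:

import logging

logger = logging.getLogger(__name__)

async def summarize_window_safe(previous_summary: list[dict], window: list[dict]) -> list[dict]:
    """Degrade gracefully: a failed window falls back to its original messages."""
    try:
        return await summarize_window(previous_summary + window)
    except Exception:
        logger.warning("Window summarisation failed; using original messages", exc_info=True)
        return window  # graceful degradation: the pipeline keeps moving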
Real-World Example
Input: 50 messages from a project discussion
Process:
- Window 1: 10 messages → 4 summaries
- Window 2: 4 (previous) + 10 (new) = 14 messages → 4 summaries
- Window 3: 4 (previous) + 10 (new) = 14 messages → 4 summaries
- Window 4: 4 (previous) + 10 (new) = 14 messages → 4 summaries
- Window 5: 4 (previous) + 10 (new) = 14 messages → 4 summaries
Output: 20 summary messages (all window summaries combined)
Context Preserved: Complete project context, all decisions, all participants
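Those numbers follow directly from the configuration: ceil(n / window_size) windows, each emitting summary_size messages. For the example above:

import math

n, window_size, summary_size = 50, 10, 4
num_windows = math.ceil(n / window_size)   # 5 windows
output_count = num_windows * summary_size  # 20 summary messages
compression = 1 - output_count / n         # 0.6, i.e. 60% fewer messages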
What Could Be Added?
- Dynamic window sizing based on conversation complexity
- Semantic chunking instead of fixed-size windows
- Multi-modal support for images and code snippets
- Custom summarisation strategies per use case (technical docs vs casual chat)
Conclusion
Building AI systems that remember requires more than just storing messages — it requires intelligent compression that preserves what matters. Our rolling window approach with context injection solves this elegantly, enabling memory graphs that scale while maintaining perfect context.
The result? AI systems that truly understand and remember conversations, no matter how long they get.