The Challenge
When building AI systems that need to remember conversations, we face a critical problem:
How do you process hundreds or thousands of messages without losing important context?
Traditional approaches either summarise everything at once (losing nuance) or process messages individually (losing context). We needed something better for our memory graph system.
Our Solution: Rolling Window Summarisation with Context Injection
At IXO, we’ve developed a message summarisation system that combines the best of both worlds — efficient processing with perfect context preservation. Here’s how it works.
The Big Picture
Think of reading a book chapter by chapter, but always remembering what happened before. That’s exactly what our rolling window approach does:
- Split messages into digestible chunks (windows of 10 messages)
- Summarise each window while keeping the previous summary as context
- Chain the summaries together so nothing gets lost
- Output compressed, information-rich messages ready for memory graph construction
How It Works: Step by Step
Step 1: Window Creation
When 50 messages arrive, instead of processing them all at once, we split them into windows (see the code sketch after this list):
- Window 1: Messages 1–10
- Window 2: Messages 11–20
- Window 3: Messages 21–30
- And so on…
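In code, window creation reduces to slicing. Here is a minimal sketch in Python; the message type and the helper name make_windows are illustrative, not our actual API:

from typing import Sequence

def make_windows(messages: Sequence[dict], window_size: int = 10) -> list[list[dict]]:
    """Split messages into consecutive fixed-size windows.

    The last window may be shorter when the message count
    is not an exact multiple of window_size.
    """
    return [
        list(messages[i : i + window_size])
        for i in range(0, len(messages), window_size)
    ]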
Step 2: Context Injection (The Secret Sauce)
Here’s where it gets interesting. When processing Window 2, we don’t just look at messages 11–20. We prepend Window 1’s summary:
[Window 1 Summary (4 msgs)] + [Messages 11–20 (10 msgs)] → Summarise → [Window 2 Summary (4 msgs)]
This creates a context chain that flows through your entire conversation.
No information gets lost between windows.
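The loop that drives this is short. A sketch that builds on make_windows above, with a hypothetical summarize_window coroutine standing in for the LLM call:

async def summarize_rolling(messages: list[dict], window_size: int = 10) -> list[dict]:
    """Chain summaries: each window is summarised with the previous summary prepended."""
    all_summaries: list[dict] = []
    previous_summary: list[dict] = []  # empty for Window 1
    for window in make_windows(messages, window_size):
        # Context injection: the prior summary travels with the new messages
        window_summary = await summarize_window(previous_summary + window)
        all_summaries.extend(window_summary)
        previous_summary = window_summary
    return all_summaries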
Step 3: AI-Powered Compression
For each window, we use a carefully crafted LLM prompt (sketched below) that focuses on:
✅ Information Preservation — Capture ALL key facts, decisions, and answers
✅ Entity Extraction — Keep specific names, numbers, dates, technical terms
✅ Context Continuity — Maintain the conversation flow
✅ Role Attribution — Preserve who said what
✅ Temporal Flow — Keep events in logical sequence
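A prompt template along these lines encodes those five goals; the wording here is illustrative rather than our production prompt:

SUMMARY_PROMPT = """Compress the conversation window below into {summary_size} messages.

Previous window summary (context only, do not re-summarise):
{previous_summary}

Messages to summarise:
{window_messages}

Requirements:
- Preserve ALL key facts, decisions, and answers.
- Keep names, numbers, dates, and technical terms verbatim.
- Maintain conversation flow and attribute statements to their speakers.
- Keep events in their original order.
Return JSON that matches the Message schema exactly.
"""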
Step 4: Structured Output
The LLM returns structured JSON that matches our Message schema exactly:
{
  "messages": [
    {
      "content": "Dense, information-rich summary...",
      "role_type": "user",
      "role": "Alice",
      "timestamp": "2024-10-07T10:30:00Z",
      "name": "conversation_summary_window_1"
    },
    ...
  ]
}
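Because LLM output can drift, the JSON is worth validating against a typed model before it enters the pipeline. A sketch using Pydantic: the field names mirror the JSON above, while the class itself and the allowed role_type values are assumptions:

from datetime import datetime
from typing import Literal

from pydantic import BaseModel

class Message(BaseModel):
    content: str
    role_type: Literal["user", "assistant", "system"]  # assumed set of roles
    role: str
    timestamp: datetime  # parsed and validated as ISO 8601
    name: str

class SummaryResponse(BaseModel):
    messages: list[Message]

# Usage: SummaryResponse.model_validate_json(llm_output) raises on malformed output.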
The Results
Default Configuration: 10 messages per window → 4 summary messages
Context Retention: 100% (every window sees the previous window’s summary, so nothing is dropped at window boundaries)
Processing Speed: Async LLM calls with a 10,000-token limit per window
Error Resilience: Graceful fallbacks if summarisation fails
Why This Matters for Memory Graphs
Our memory graph system extracts entities and relationships from conversations. This summarisation approach is perfect because it:
- Preserves Entity Information — Names, places, projects stay intact
- Maintains Relationships — “Alice works with Bob on Project X” connections survive
- Keeps Temporal Context — When things happened matters for the graph
- Reduces Processing Load — Fewer messages = faster graph construction
- Enables Scale — Handle massive conversation histories efficiently
Smart Features Under the Hood
Adaptive Prompting
The system knows when it’s processing the last window and adjusts its prompt (see the sketch after this list):
- Intermediate windows: “Maintain context for next window (Window N+1)”
- Final window: “This is the FINAL summary — ensure complete standalone context”
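Choosing the suffix is a small function of the window’s position. An illustrative sketch with 1-based window numbering:

def prompt_suffix(window_number: int, total_windows: int) -> str:
    """Pick the closing instruction based on where the window sits."""
    if window_number == total_windows:
        return "This is the FINAL summary — ensure complete standalone context"
    return f"Maintain context for next window (Window {window_number + 1})"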
Multi-Speaker Awareness
The summariser automatically detects conversation patterns (illustrated after the list):
- Single speaker: “Maintain single speaker context”
- Multiple speakers: “Preserve distinct speaker perspectives (user vs assistant)”
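Detection can be as simple as counting distinct roles in the window. A sketch, assuming each message carries a role field as in the schema above:

def speaker_instruction(window: list[dict]) -> str:
    """Choose the speaker-handling instruction for a window."""
    speakers = {message["role"] for message in window}
    if len(speakers) <= 1:
        return "Maintain single speaker context"
    return "Preserve distinct speaker perspectives (user vs assistant)"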
Configurable Thresholds
await summarizer.summarize_messages_rolling(
    messages=messages,
    window_size=10,  # Messages per window (default: 10)
    summary_size=4,  # Target summaries per window (default: 4)
    threshold=10     # Min messages to trigger (default: 10)
)
Conversations below the threshold skip summarisation entirely, and the window and summary sizes let you tune compression for your use case.
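Internally, the threshold is just a guard in front of the rolling loop. A sketch of that behaviour, reusing the summarize_rolling coroutine sketched earlier:

async def summarize_if_needed(messages: list[dict], threshold: int = 10) -> list[dict]:
    """Short conversations pass through untouched, with no LLM calls."""
    if len(messages) < threshold:
        return messages
    return await summarize_rolling(messages)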
Error Handling That Just Works
Software fails. Networks hiccup. LLMs occasionally return weird responses. We handle it (see the sketch after this list):
- ✅ Comprehensive logging at debug, info, warning, and error levels
- ✅ Fallback to original messages if any window summarisation fails
- ✅ Graceful degradation per window (one failure doesn’t break everything)
- ✅ Timestamp parsing with ISO 8601 format validation
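Per-window fallback keeps one bad LLM response from poisoning the whole run. A sketch of the pattern, again using the hypothetical summarize_window coroutine:

import logging

logger = logging.getLogger(__name__)

async def summarize_window_safe(previous_summary: list[dict], window: list[dict]) -> list[dict]:
    """Degrade gracefully: a failed window falls back to its original messages."""
    try:
        return await summarize_window(previous_summary + window)
    except Exception:
        logger.warning("Window summarisation failed; using original messages", exc_info=True)
        return window  # graceful degradation: the pipeline keeps moving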
Real-World Example
Input: 50 messages from a project discussion
Process:
- Window 1: 10 messages → 4 summaries
- Window 2: 4 (previous) + 10 (new) = 14 messages → 4 summaries
- Window 3: 4 (previous) + 10 (new) = 14 messages → 4 summaries
- Window 4: 4 (previous) + 10 (new) = 14 messages → 4 summaries
- Window 5: 4 (previous) + 10 (new) = 14 messages → 4 summaries
Output: 20 summary messages (all window summaries combined)
Context Preserved: Complete project context, all decisions, all participants
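Those numbers follow directly from the configuration: ceil(n / window_size) windows, each emitting summary_size messages. For the example above:

import math

n, window_size, summary_size = 50, 10, 4
num_windows = math.ceil(n / window_size)   # 5 windows
output_count = num_windows * summary_size  # 20 summary messages
compression = 1 - output_count / n         # 0.6, i.e. 60% fewer messages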
What Could Be Added?
- Dynamic window sizing based on conversation complexity
- Semantic chunking instead of fixed-size windows
- Multi-modal support for images and code snippets
- Custom summarisation strategies per use case (technical docs vs casual chat)
Conclusion
Building AI systems that remember requires more than just storing messages — it requires intelligent compression that preserves what matters. Our rolling window approach with context injection solves this elegantly, enabling memory graphs that scale while maintaining perfect context.
The result? AI systems that truly understand and remember conversations, no matter how long they get.