Every LLM has a context window limit. As conversations grow, you face truncation, redundant tokens, or brittle hand-rolled context management. neuromem handles this automatically: it scores every message for importance, summarizes old turns, and prunes only the least important content, so your agent keeps its working memory sharp.
```python
from neuromem import ContextManager

cm = ContextManager(token_budget=8000)
cm.add_system("You are a helpful assistant.")
cm.add_user("What is quantum entanglement?")
cm.add_assistant("Quantum entanglement is...")
cm.add_user("Can it enable FTL communication?")

# Returns a pruned, budget-aware message list
messages = cm.get_messages()
```
## TypeScript

```typescript
import { ContextManager } from "neuromem";

const cm = new ContextManager({ tokenBudget: 8000 });
await cm.addSystem("You are a helpful assistant.");
await cm.addUser("What is quantum entanglement?");
await cm.addAssistant("Quantum entanglement is...");

const messages = await cm.getMessages(); // pruned, ready for API
```
## OpenAI Drop-in Wrapper

```python
import openai
from neuromem.integrations.openai import ContextAwareOpenAI

client = ContextAwareOpenAI(
    openai_client=openai.OpenAI(),
    model="gpt-4o",
    token_budget=8000,
    system_prompt="You are a financial analysis assistant.",
)

reply = client.chat("What's the P/E ratio for AAPL?")
reply = client.chat("Compare it to the sector average.")
# History is auto-managed
```
# API Reference

## ContextManager

| Parameter | Type | Default | Description |
|---|---|---|---|
| `token_budget` | `int` | `4096` | Maximum context tokens |
| `auto_prune` | `bool` | `True` | Prune automatically on add |
| `prune_threshold` | `float` | `0.9` | Budget fraction that triggers auto-prune |
| `always_keep_last_n` | `int` | `4` | Always keep this many recent turns |
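The interplay of `prune_threshold` and `always_keep_last_n` can be illustrated with a small standalone sketch. This is not neuromem's actual implementation (real pruning is score-based; here it is reduced to oldest-first so the budget arithmetic stays visible), and the word-count tokenizer is a stand-in:

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly one token per word.
    return len(text.split())

def auto_prune(messages, token_budget=4096, prune_threshold=0.9,
               always_keep_last_n=4):
    """Drop oldest prunable messages until back under the trigger point."""
    trigger = token_budget * prune_threshold
    split = max(0, len(messages) - always_keep_last_n)
    candidates, protected = list(messages[:split]), list(messages[split:])
    total = sum(count_tokens(m["content"]) for m in messages)
    while candidates and total > trigger:
        dropped = candidates.pop(0)            # oldest message goes first
        total -= count_tokens(dropped["content"])
    return candidates + protected

# Ten 200-token turns (2000 tokens) against a 1000-token budget:
# pruning fires at 900 tokens and only the protected tail survives.
history = [{"role": "user", "content": "word " * 200} for _ in range(10)]
kept = auto_prune(history, token_budget=1000)
```

Note that the last `always_keep_last_n` turns are exempt even if they alone exceed the trigger; only older turns are eligible for removal.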
## MessageScorer

| Parameter | Type | Default | Description |
|---|---|---|---|
| `recency_decay` | `float` | `0.05` | Exponential decay rate |
| `keyword_boost` | `float` | `0.25` | Score boost for critical keyword hits |
| `relevance_weight` | `float` | `0.3` | Weight of the semantic relevance component |
| `critical_override` | `bool` | `True` | System messages always score 1.0 |
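The table calls `recency_decay` an exponential decay rate. One plausible reading (an assumption, not confirmed by the docs) is that the recency signal decays as `exp(-recency_decay * age)`, with age measured in turns since the message was added:

```python
import math

# Assumed form of the recency signal: exponential decay by message age.
# How neuromem actually measures age (turns, tokens, wall time) is not
# specified; turns are used here for illustration.
def recency(age_in_turns: int, recency_decay: float = 0.05) -> float:
    return math.exp(-recency_decay * age_in_turns)
```

Under this reading the newest message scores 1.0, and with the default rate a 40-turn-old message has decayed to about 0.135, making it a likely summarization candidate.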
## Pruner

| Parameter | Type | Default | Description |
|---|---|---|---|
| `token_budget` | `int` | `4096` | Hard token ceiling |
| `min_score_threshold` | `float` | `0.3` | Messages scoring below this become candidates for summarization |
| `always_keep_last_n` | `int` | `4` | Recent turns protected from pruning |
| `summarize_before_prune` | `bool` | `True` | Try summarization before hard drops |
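The summarize-before-prune pipeline implied by the table can be sketched as two passes: first compress low-scoring messages, then hard-drop lowest scores only if the budget is still exceeded. This is illustrative only (the truncating summarizer is a stub standing in for whatever neuromem actually calls, and `prune` here is a hypothetical helper, not the library's API):

```python
def count_tokens(text: str) -> int:
    return len(text.split())                   # word-count stand-in

def stub_summarize(text: str) -> str:
    # Trivial truncation stub in place of a real LLM summarizer.
    return " ".join(text.split()[:10]) + " ..."

def prune(scored, token_budget=4096, min_score_threshold=0.3,
          always_keep_last_n=4, summarize_before_prune=True):
    """scored: list of (score, message_text) tuples, oldest first."""
    split = max(0, len(scored) - always_keep_last_n)
    rest, protected = list(scored[:split]), list(scored[split:])
    if summarize_before_prune:
        # Pass 1: compress low-scoring messages instead of dropping them.
        rest = [(s, stub_summarize(t) if s < min_score_threshold else t)
                for s, t in rest]
    def total(msgs):
        return sum(count_tokens(t) for _, t in msgs)
    # Pass 2: still over budget? Hard-drop the lowest-scoring survivors.
    while rest and total(rest + protected) > token_budget:
        rest.remove(min(rest, key=lambda st: st[0]))
    return rest + protected

scored = [(0.2, "w " * 20), (0.9, "w " * 20), (0.1, "w " * 20),
          (0.5, "w " * 20), (0.7, "w " * 20), (0.6, "w " * 20)]
result = prune(scored, token_budget=60, always_keep_last_n=2)
```

Removing by lowest score (rather than sorting) preserves the chronological order of whatever survives, which matters when the result is fed back to the model.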
## Scoring Formula

```
score = 0.35 * recency
      + 0.20 * role_weight
      + 0.15 * length_signal
      + 0.30 * semantic_relevance
      + keyword_hits * 0.25   (capped at 1.0)
```

System messages always receive a score of 1.0 and are never pruned.
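The formula transcribes directly to code. In this sketch the signals (`recency`, `role_weight`, etc.) are assumed to be pre-computed values in [0, 1], the cap is read as applying to the total score, and the system-message override from the note above is included:

```python
# Direct transcription of the scoring formula above. How the individual
# signals are derived is not shown here; they are taken as inputs.
def score_message(recency, role_weight, length_signal,
                  semantic_relevance, keyword_hits, is_system=False):
    if is_system:
        return 1.0                      # critical override: never pruned
    raw = (0.35 * recency
           + 0.20 * role_weight
           + 0.15 * length_signal
           + 0.30 * semantic_relevance
           + keyword_hits * 0.25)       # keyword_boost default from above
    return min(raw, 1.0)                # reading the cap as a total cap
```

The four weighted components sum to 1.0, so keyword hits are the only way a non-system message can reach the cap.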