neuromem

Smart context management — never lose critical memory again

Python 3.9+ TypeScript 5.x MIT License Zero Core Dependencies

Every LLM has a context window limit. As conversations grow, you face truncation, redundant tokens, or brittle hand-management. neuromem solves this automatically — scoring every message for importance, summarizing old turns, and pruning only the least important content. Your agent keeps its working memory sharp.

# Install

Python

$ pip install neuromem
# With OpenAI integration:
$ pip install "neuromem[openai]"
# With LangChain:
$ pip install "neuromem[langchain]"
# Everything:
$ pip install "neuromem[all]"

TypeScript / Node.js

$ npm install neuromem
# or
$ yarn add neuromem

# Quick Start

Python

from neuromem import ContextManager

cm = ContextManager(token_budget=8000)
cm.add_system("You are a helpful assistant.")
cm.add_user("What is quantum entanglement?")
cm.add_assistant("Quantum entanglement is...")
cm.add_user("Can it enable FTL communication?")

# Returns a pruned, budget-aware message list
messages = cm.get_messages()

TypeScript

import { ContextManager } from "neuromem";

const cm = new ContextManager({ tokenBudget: 8000 });
await cm.addSystem("You are a helpful assistant.");
await cm.addUser("What is quantum entanglement?");
await cm.addAssistant("Quantum entanglement is...");

const messages = await cm.getMessages(); // pruned, ready for API

OpenAI Drop-in Wrapper

import openai
from neuromem.integrations.openai import ContextAwareOpenAI

client = ContextAwareOpenAI(
    openai_client=openai.OpenAI(),
    model="gpt-4o",
    token_budget=8000,
    system_prompt="You are a financial analysis assistant.",
)

reply = client.chat("What's the P/E ratio for AAPL?")
reply = client.chat("Compare it to the sector average.")
# History is auto-managed

# API Reference

ContextManager

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `token_budget` | `int` | `4096` | Max context tokens |
| `auto_prune` | `bool` | `True` | Prune automatically on add |
| `prune_threshold` | `float` | `0.9` | Budget fraction that triggers auto-prune |
| `always_keep_last_n` | `int` | `4` | Always keep this many recent turns |
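With these defaults, auto-pruning fires once usage crosses `prune_threshold × token_budget` tokens. A minimal sketch of that check (the function name and signature are illustrative, not part of the library's API):

```python
def should_auto_prune(current_tokens: int,
                      token_budget: int = 4096,
                      auto_prune: bool = True,
                      prune_threshold: float = 0.9) -> bool:
    # Illustrative: with auto_prune on, prune once usage reaches
    # prune_threshold * token_budget (0.9 * 4096 ≈ 3686 by default).
    return auto_prune and current_tokens >= prune_threshold * token_budget

should_auto_prune(3700)  # True: past the 90% mark
should_auto_prune(3000)  # False: still within budget
```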

MessageScorer

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `recency_decay` | `float` | `0.05` | Exponential decay rate |
| `keyword_boost` | `float` | `0.25` | Score boost for critical keyword hits |
| `relevance_weight` | `float` | `0.3` | Weight of semantic relevance component |
| `critical_override` | `bool` | `True` | System messages always score 1.0 |

Pruner

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `token_budget` | `int` | `4096` | Hard token ceiling |
| `min_score_threshold` | `float` | `0.3` | Below this score, a message becomes a summarization candidate |
| `always_keep_last_n` | `int` | `4` | Recent turns protected from pruning |
| `summarize_before_prune` | `bool` | `True` | Try summarization before hard drops |
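Taken together, these parameters describe a policy: protect the last `always_keep_last_n` turns, then drop the lowest-scoring messages below `min_score_threshold` until the budget is met. A minimal sketch of just the hard-drop stage, assuming each message already carries a token count and an importance score (the `Msg` type and `prune` function are illustrative, not the library's internals):

```python
from dataclasses import dataclass

@dataclass
class Msg:
    text: str
    tokens: int
    score: float  # importance in [0, 1]

def prune(messages: list[Msg],
          token_budget: int = 4096,
          min_score_threshold: float = 0.3,
          always_keep_last_n: int = 4) -> list[Msg]:
    cut = max(0, len(messages) - always_keep_last_n)
    head, tail = list(messages[:cut]), list(messages[cut:])  # tail is protected
    while head and sum(m.tokens for m in head + tail) > token_budget:
        victim = min(head, key=lambda m: m.score)  # lowest score goes first
        if victim.score >= min_score_threshold:
            break  # nothing left below the threshold; stop rather than over-prune
        head.remove(victim)
    return head + tail
```

Note that this sketch stops once everything remaining scores above the threshold; with `summarize_before_prune` enabled, the real pipeline would attempt summarization before resorting to hard drops.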

Scoring Formula

score = 0.35 x recency
     + 0.20 x role_weight
     + 0.15 x length_signal
     + 0.30 x semantic_relevance
     + keyword_hits x 0.25   (capped at 1.0)

# System messages always receive score = 1.0 and are never pruned.
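The formula maps directly onto code. A runnable sketch, assuming each component signal is normalized to [0, 1] and that the 1.0 cap applies to the total score (the function name and plain-argument signature are illustrative; in the library these signals come from `MessageScorer`):

```python
def score_message(recency: float,
                  role_weight: float,
                  length_signal: float,
                  semantic_relevance: float,
                  keyword_hits: int = 0,
                  is_system: bool = False) -> float:
    if is_system:
        return 1.0  # critical override: system messages are never pruned
    raw = (0.35 * recency
           + 0.20 * role_weight
           + 0.15 * length_signal
           + 0.30 * semantic_relevance
           + keyword_hits * 0.25)  # keyword_boost applied per hit
    return min(raw, 1.0)  # cap the total at 1.0

score_message(0.5, 0.5, 0.2, 0.4, keyword_hits=1)  # ≈ 0.675
```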

# Key Features