neuromem

Smart context management — never lose critical memory again

Python 3.9+ TypeScript 5.x MIT License Zero Core Dependencies

Every LLM has a context window limit. As conversations grow, you face truncation, redundant tokens, or brittle hand-management. neuromem solves this automatically — scoring every message for importance, summarizing old turns, and pruning only the least important content. Your agent keeps its working memory sharp.

# Install

Python

$ pip install neuromem
# With OpenAI integration:
$ pip install "neuromem[openai]"
# With LangChain:
$ pip install "neuromem[langchain]"
# Everything:
$ pip install "neuromem[all]"

TypeScript / Node.js

$ npm install neuromem
# or
$ yarn add neuromem

# Quick Start

Python

from neuromem import ContextManager

cm = ContextManager(token_budget=8000)
cm.add_system("You are a helpful assistant.")
cm.add_user("What is quantum entanglement?")
cm.add_assistant("Quantum entanglement is...")
cm.add_user("Can it enable FTL communication?")

# Returns a pruned, budget-aware message list
messages = cm.get_messages()

TypeScript

import { ContextManager } from "neuromem";

const cm = new ContextManager({ tokenBudget: 8000 });
await cm.addSystem("You are a helpful assistant.");
await cm.addUser("What is quantum entanglement?");
await cm.addAssistant("Quantum entanglement is...");

const messages = await cm.getMessages(); // pruned, ready for API

OpenAI Drop-in Wrapper

import openai
from neuromem.integrations.openai import ContextAwareOpenAI

client = ContextAwareOpenAI(
    openai_client=openai.OpenAI(),
    model="gpt-4o",
    token_budget=8000,
    system_prompt="You are a financial analysis assistant.",
)

reply = client.chat("What's the P/E ratio for AAPL?")
reply = client.chat("Compare it to the sector average.")
# History is auto-managed

# API Reference

ContextManager

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `token_budget` | `int` | `4096` | Max context tokens |
| `auto_prune` | `bool` | `True` | Prune automatically on add |
| `prune_threshold` | `float` | `0.9` | Budget fraction that triggers auto-prune |
| `always_keep_last_n` | `int` | `4` | Always keep this many recent turns |
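With these defaults, auto-pruning fires once usage crosses `prune_threshold × token_budget` tokens. A minimal sketch of that check (the function name and signature are illustrative, not part of the library's API):

```python
def should_auto_prune(current_tokens: int,
                      token_budget: int = 4096,
                      auto_prune: bool = True,
                      prune_threshold: float = 0.9) -> bool:
    # Illustrative: with auto_prune on, prune once usage reaches
    # prune_threshold * token_budget (0.9 * 4096 ≈ 3686 by default).
    return auto_prune and current_tokens >= prune_threshold * token_budget

should_auto_prune(3700)  # True: past the 90% mark
should_auto_prune(3000)  # False: still within budget
```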

MessageScorer

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `recency_decay` | `float` | `0.05` | Exponential decay rate |
| `keyword_boost` | `float` | `0.25` | Score boost for critical keyword hits |
| `relevance_weight` | `float` | `0.3` | Weight of semantic relevance component |
| `critical_override` | `bool` | `True` | System messages always score 1.0 |

Pruner

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `token_budget` | `int` | `4096` | Hard token ceiling |
| `min_score_threshold` | `float` | `0.3` | Below this score, a message becomes a summarization candidate |
| `always_keep_last_n` | `int` | `4` | Recent turns protected from pruning |
| `summarize_before_prune` | `bool` | `True` | Try summarization before hard drops |
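Taken together, these parameters describe a policy: protect the last `always_keep_last_n` turns, then drop the lowest-scoring messages below `min_score_threshold` until the budget is met. A minimal sketch of just the hard-drop stage, assuming each message already carries a token count and an importance score (the `Msg` type and `prune` function are illustrative, not the library's internals):

```python
from dataclasses import dataclass

@dataclass
class Msg:
    text: str
    tokens: int
    score: float  # importance in [0, 1]

def prune(messages: list[Msg],
          token_budget: int = 4096,
          min_score_threshold: float = 0.3,
          always_keep_last_n: int = 4) -> list[Msg]:
    cut = max(0, len(messages) - always_keep_last_n)
    head, tail = list(messages[:cut]), list(messages[cut:])  # tail is protected
    while head and sum(m.tokens for m in head + tail) > token_budget:
        victim = min(head, key=lambda m: m.score)  # lowest score goes first
        if victim.score >= min_score_threshold:
            break  # nothing left below the threshold; stop rather than over-prune
        head.remove(victim)
    return head + tail
```

Note that this sketch stops once everything remaining scores above the threshold; with `summarize_before_prune` enabled, the real pipeline would attempt summarization before resorting to hard drops.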

Scoring Formula

score = 0.35 x recency
     + 0.20 x role_weight
     + 0.15 x length_signal
     + 0.30 x semantic_relevance
     + keyword_hits x 0.25   (capped at 1.0)

# System messages always receive score = 1.0 and are never pruned.
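The formula maps directly onto code. A runnable sketch, assuming each component signal is normalized to [0, 1] and that the 1.0 cap applies to the total score (the function name and plain-argument signature are illustrative; in the library these signals come from `MessageScorer`):

```python
def score_message(recency: float,
                  role_weight: float,
                  length_signal: float,
                  semantic_relevance: float,
                  keyword_hits: int = 0,
                  is_system: bool = False) -> float:
    if is_system:
        return 1.0  # critical override: system messages are never pruned
    raw = (0.35 * recency
           + 0.20 * role_weight
           + 0.15 * length_signal
           + 0.30 * semantic_relevance
           + keyword_hits * 0.25)  # keyword_boost applied per hit
    return min(raw, 1.0)  # cap the total at 1.0

score_message(0.5, 0.5, 0.2, 0.4, keyword_hits=1)  # ≈ 0.675
```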

# Key Features