We’re all running into the same wall: Large language models still choke on context windows. Even with 128k–1M token models, you’re paying a fortune to stuff in your entire codebase or research document—and most of those tokens are dead weight.
The Problem:
• Context windows are just a giant “page” the model reads in one shot.
• Every new query forces you to resend the entire “book” (expensive + slow).
• Signal-to-noise degrades as the window grows.
The Fix?: Stop Treating Context as a Flat Sequence
I’ve been sketching something I call the Dynamic Neural Graph Contextualizer (DNGC):
1. Break the document/project into nodes (functions, paragraphs, classes).
2. Connect them in a graph (edges = imports, function calls, topic similarity).
3. Store this graph externally (Neo4j / FAISS).
4. When you prompt the model:
• It embeds your query.
• It pulls only the relevant subgraph (maybe 2k tokens).
• It optionally cross-attends to vectorized embeddings for “infinite memory.”
5. After each generation, it updates the graph, learning which nodes matter most.
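Steps 1–4 can be sketched in a few lines. This is a toy, stdlib-only version: the dict-based graph stands in for NetworkX/Neo4j, and the hand-made 3-d vectors stand in for real embeddings from a model like Ada indexed in FAISS. All node names here are invented for illustration.

```python
import math

# Toy in-memory graph; a real build would use NetworkX + FAISS.
nodes = {
    "parse_config": {"emb": [0.9, 0.1, 0.0], "text": "def parse_config(path): ..."},
    "load_model":   {"emb": [0.1, 0.9, 0.0], "text": "def load_model(name): ..."},
    "train_loop":   {"emb": [0.2, 0.8, 0.1], "text": "def train_loop(model): ..."},
}
edges = {  # edge = call/import relationship
    "train_loop": ["load_model"],
    "load_model": [],
    "parse_config": [],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def fetch_subgraph(query_emb, k=1, hops=1):
    """Rank nodes by similarity to the query, keep top-k, expand along edges."""
    ranked = sorted(nodes, key=lambda n: cosine(query_emb, nodes[n]["emb"]), reverse=True)
    selected = set(ranked[:k])
    frontier = list(selected)
    for _ in range(hops):
        frontier = [nb for n in frontier for nb in edges.get(n, [])]
        selected.update(frontier)
    return [nodes[n]["text"] for n in selected]

# A query embedding near "training" pulls train_loop plus its callee load_model,
# while the unrelated parse_config node never enters the prompt.
context = fetch_subgraph([0.2, 0.8, 0.1])
```

The point of the one-hop expansion is that a function's callees are usually needed to understand it, even when they don't match the query embedding directly.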
Why It’s Better:
• Cost savings: Back-of-the-envelope math suggests ~95% fewer tokens sent per call, which is roughly a 95% cost cut vs. dumping everything into a 200k context window.
• Scales forever: Codebase size doesn’t matter; graph retrieval keeps prompts small.
• More accurate: Stripping out irrelevant context reduces hallucinations.
How This Differs from RAG:
RAG covers the first step (chunk + embed + fetch). DNGC builds on top of it:
• Persistent, evolving graph memory, instead of RAG’s flat chunks.
• Cross-attention, letting the LLM “jump” into stored embeddings during generation.
• Self-updating: the system continuously refines what it stores and retrieves over time.
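The "self-updating" point is the main departure from vanilla RAG. One minimal way to sketch it (an assumption on my part, not a spec from the post): keep a per-node usefulness score, reinforce nodes that actually appeared in a generation, and decay the rest, then blend that score with embedding similarity at retrieval time.

```python
# Hypothetical per-node relevance store; node names are illustrative.
relevance = {"parse_config": 1.0, "load_model": 1.0, "train_loop": 1.0}

def update_relevance(used_nodes, reward=0.2, decay=0.02):
    """Reinforce nodes used in the last generation; slowly decay the rest."""
    for node in relevance:
        if node in used_nodes:
            relevance[node] += reward
        else:
            relevance[node] = max(0.0, relevance[node] - decay)

# After answering a training-related question with these two nodes:
update_relevance({"train_loop", "load_model"})
```

Over many queries, frequently useful nodes float to the top of retrieval and stale ones fade, which is what "learning which nodes matter most" would mean concretely.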
What’s Next:
Although this is still conceptual, a prototype could be constructed using the following components:
• Python
• NetworkX
• FAISS
• A small embedding model (e.g., Ada)
• A wrapper around any LLM API that implements a “graph fetch → prompt build → update” loop
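The wrapper's "fetch → build → update" loop could look like this. Everything here is a sketch under assumptions: `call_llm` is a stub standing in for a real API client, and `retrieve`/`update` are whatever graph-fetch and graph-update functions the prototype ends up with.

```python
def call_llm(prompt):
    # Stub; swap in an actual chat-completion API call here.
    return f"Answer based on {prompt.count('###')} retrieved nodes."

def answer(query, retrieve, update):
    """retrieve(query) -> list of node texts; update(nodes) records what was sent."""
    nodes = retrieve(query)                                   # 1. graph fetch
    prompt = "\n".join(f"### {n}" for n in nodes)             # 2. prompt build
    prompt += f"\n\nQuestion: {query}"
    reply = call_llm(prompt)
    update(nodes)                                             # 3. graph update
    return reply

# Wiring it up with trivial stand-ins:
used = []
reply = answer(
    "How is the model trained?",
    retrieve=lambda q: ["def train_loop(model): ...", "def load_model(name): ..."],
    update=used.extend,
)
```

Because the loop only touches `retrieve` and `update` through plain callables, the same wrapper works whether the backing store is NetworkX in memory, Neo4j, or a FAISS index.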
⸻
Question for the community:
• Is anyone already building something like this?
• Would you be interested in collaborating on an open-source prototype?
• What’s your biggest pain point with context windows today?
submitted by /u/Sahaj33 to r/learnmachinelearning