We’re all running into the same wall: Large language models still choke on context windows. Even with 128k–1M token models, you’re paying a fortune to stuff in your entire codebase or research document—and most of those tokens are dead weight.
The Problem:
• Context windows are just a giant “page” the model reads in one shot.
• Every new query forces you to resend the entire “book” (expensive + slow).
• Signal-to-noise degrades as the window grows.
The Fix?: Stop Treating Context as a Flat Sequence
I’ve been sketching something I call the Dynamic Neural Graph Contextualizer (DNGC):
1. Break the document/project into nodes (functions, paragraphs, classes).
2. Connect them in a graph (edges = imports, function calls, topic similarity).
3. Store this graph externally (Neo4j / FAISS).
4. When you prompt the model:
• It embeds your query.
• It pulls only the relevant subgraph (maybe 2k tokens).
• It optionally cross-attends to vectorized embeddings for “infinite memory.”
5. After each generation, it updates the graph, learning which nodes matter most.
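Steps 1–4 can be sketched in a few lines. This is a toy, stdlib-only version: the dict-based graph stands in for NetworkX/Neo4j, and the hand-made 3-d vectors stand in for real embeddings from a model like Ada indexed in FAISS. All node names here are invented for illustration.

```python
import math

# Toy in-memory graph; a real build would use NetworkX + FAISS.
nodes = {
    "parse_config": {"emb": [0.9, 0.1, 0.0], "text": "def parse_config(path): ..."},
    "load_model":   {"emb": [0.1, 0.9, 0.0], "text": "def load_model(name): ..."},
    "train_loop":   {"emb": [0.2, 0.8, 0.1], "text": "def train_loop(model): ..."},
}
edges = {  # edge = call/import relationship
    "train_loop": ["load_model"],
    "load_model": [],
    "parse_config": [],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def fetch_subgraph(query_emb, k=1, hops=1):
    """Rank nodes by similarity to the query, keep top-k, expand along edges."""
    ranked = sorted(nodes, key=lambda n: cosine(query_emb, nodes[n]["emb"]), reverse=True)
    selected = set(ranked[:k])
    frontier = list(selected)
    for _ in range(hops):
        frontier = [nb for n in frontier for nb in edges.get(n, [])]
        selected.update(frontier)
    return [nodes[n]["text"] for n in selected]

# A query embedding near "training" pulls train_loop plus its callee load_model,
# while the unrelated parse_config node never enters the prompt.
context = fetch_subgraph([0.2, 0.8, 0.1])
```

The point of the one-hop expansion is that a function's callees are usually needed to understand it, even when they don't match the query embedding directly.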
Why It’s Better:
• Cost savings: Back-of-the-envelope math suggests ~95% fewer tokens sent per call, which is roughly a 95% cost cut vs. dumping everything into a 200k context window.
• Scales forever: Codebase size doesn’t matter; graph retrieval keeps prompts small.
• More accurate: Stripping out irrelevant context reduces hallucinations.
How This Differs from RAG:
RAG covers the first step (chunk + embed + fetch). DNGC builds on top of it:
• Persistent, evolving graph memory, instead of RAG’s flat chunks.
• Cross-attention, letting the LLM “jump” into stored embeddings during generation.
• Self-updating: the system continuously refines what it stores and retrieves over time.
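The "self-updating" point is the main departure from vanilla RAG. One minimal way to sketch it (an assumption on my part, not a spec from the post): keep a per-node usefulness score, reinforce nodes that actually appeared in a generation, and decay the rest, then blend that score with embedding similarity at retrieval time.

```python
# Hypothetical per-node relevance store; node names are illustrative.
relevance = {"parse_config": 1.0, "load_model": 1.0, "train_loop": 1.0}

def update_relevance(used_nodes, reward=0.2, decay=0.02):
    """Reinforce nodes used in the last generation; slowly decay the rest."""
    for node in relevance:
        if node in used_nodes:
            relevance[node] += reward
        else:
            relevance[node] = max(0.0, relevance[node] - decay)

# After answering a training-related question with these two nodes:
update_relevance({"train_loop", "load_model"})
```

Over many queries, frequently useful nodes float to the top of retrieval and stale ones fade, which is what "learning which nodes matter most" would mean concretely.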
What’s Next:
Although this is still conceptual, a prototype could be constructed using the following components:
• Python
• NetworkX
• FAISS
• A small embedding model (e.g., Ada)
• A wrapper around any LLM API that implements a “graph fetch → prompt build → update” loop
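The wrapper's "fetch → build → update" loop could look like this. Everything here is a sketch under assumptions: `call_llm` is a stub standing in for a real API client, and `retrieve`/`update` are whatever graph-fetch and graph-update functions the prototype ends up with.

```python
def call_llm(prompt):
    # Stub; swap in an actual chat-completion API call here.
    return f"Answer based on {prompt.count('###')} retrieved nodes."

def answer(query, retrieve, update):
    """retrieve(query) -> list of node texts; update(nodes) records what was sent."""
    nodes = retrieve(query)                                   # 1. graph fetch
    prompt = "\n".join(f"### {n}" for n in nodes)             # 2. prompt build
    prompt += f"\n\nQuestion: {query}"
    reply = call_llm(prompt)
    update(nodes)                                             # 3. graph update
    return reply

# Wiring it up with trivial stand-ins:
used = []
reply = answer(
    "How is the model trained?",
    retrieve=lambda q: ["def train_loop(model): ...", "def load_model(name): ..."],
    update=used.extend,
)
```

Because the loop only touches `retrieve` and `update` through plain callables, the same wrapper works whether the backing store is NetworkX in memory, Neo4j, or a FAISS index.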
⸻
Question for the community:
• Is anyone already building something like this?
• Would you be interested in collaborating on an open-source prototype?
• What’s your biggest pain point with context windows today?
submitted by /u/Sahaj33 to r/learnmachinelearning