- What We Discovered: LLMs Have "Path Dependence"
- Our Experimental Setup
- How We Measured This Effect
- The Most Interesting Findings
- Why This Matters
- What's Next?
- The Takeaway
Ever notice how the order of information affects how you think about a problem? Turns out, large language models (LLMs) have the same quirk - and it's more significant than we expected!
What We Discovered: LLMs Have "Path Dependence"
We've been experimenting with memory-augmented LLMs, and we stumbled upon something fascinating: the order in which we feed memory snippets to the model dramatically changes its output - even when the actual information is identical.
We call this "path dependence" - the LLM's journey through information matters just as much as the information itself.
The heatmap above shows how responses vary when we change the order of memory slots - darker areas indicate greater differences between outputs. Pretty striking, right?
Our Experimental Setup
Here's how we tested this:
- We created a set of memory slots containing factual information (e.g., about FDR's New Deal and Reagan's economic policies)
- We arranged these slots in different sequences (paths)
- We asked GPT-4 identical questions with these different memory paths
- We measured the differences in outputs using embedding similarity
For example, one of our memory slots looked like this:
[Memory Slot] FDR launched the New Deal to combat the Great Depression. It involved government intervention and public programs.
And we'd change the order of 4-5 such slots to see how it affected the response to questions like "Compare FDR's New Deal with Reagan's economic policy."
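Putting that together, the core loop is simple: enumerate the orderings, prepend the slots to the question, and collect the responses. Here's a minimal sketch assuming the openai Python client - the slot text, helper names, and temperature setting are illustrative, not our exact code:

```python
# Sketch: ask GPT-4 the same question over every ordering ("path") of the memory slots.
# Assumes the openai Python client (v1.x); prompt format and helper names are illustrative.
from itertools import permutations
from openai import OpenAI

client = OpenAI()

memory_slots = [
    "FDR launched the New Deal to combat the Great Depression. "
    "It involved government intervention and public programs.",
    "Reagan's economic policy emphasized tax cuts, deregulation, "
    "and a reduced role for the federal government.",
    # ...a handful more slots, including the occasional irrelevant one...
]

question = "Compare FDR's New Deal with Reagan's economic policy."

def build_prompt(slots, question):
    """Prepend the memory slots, in the given order, to the question."""
    memory_block = "\n".join(f"[Memory Slot] {s}" for s in slots)
    return f"{memory_block}\n\nQuestion: {question}"

responses = {}
for path in permutations(memory_slots):  # 4-5 slots -> 24-120 orderings
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": build_prompt(path, question)}],
        temperature=0,  # keep sampling noise out of the order comparison
    )
    responses[path] = completion.choices[0].message.content
```

Pinning temperature at 0 (our choice in this sketch) helps separate ordering effects from ordinary sampling noise.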
How We Measured This Effect
We tracked several metrics - each sketched in code after this list:
- Semantic differences using sentence embeddings (all-MiniLM-L6-v2)
- Structural changes including word count, sentence count, and average sentence length
- Quality scores using GPT-4 to evaluate structure, reasoning, and conclusion clarity (0-10 scale)
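Here's roughly what those metric functions look like - a sketch using sentence-transformers for the embeddings and GPT-4 as the judge; the rubric wording and function names are ours for illustration:

```python
# Sketch of the three metrics: embedding similarity, structural features, and a
# GPT-4-as-judge quality score. Rubric wording and function names are illustrative.
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

client = OpenAI()
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_similarity(text_a, text_b):
    """Cosine similarity between the sentence embeddings of two responses."""
    emb_a, emb_b = embedder.encode([text_a, text_b], convert_to_tensor=True)
    return util.cos_sim(emb_a, emb_b).item()

def structural_features(text):
    """Word count, sentence count, and average sentence length."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    words = text.split()
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
    }

JUDGE_PROMPT = (
    "Rate the following answer from 0 to 10 on structure, reasoning, and "
    "conclusion clarity. Reply with a single number only.\n\nAnswer:\n{answer}"
)

def quality_score(answer):
    """Ask GPT-4 to grade the response on a 0-10 scale."""
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(answer=answer)}],
        temperature=0,
    )
    return float(completion.choices[0].message.content.strip())
```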
Check out this visualization of our results:
Each dot represents an LLM response, clustered by writing style. Notice how they form distinct groups? Different memory paths actually pushed the LLM toward different writing styles!
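If you want to reproduce a plot like this, the recipe is: embed each response, project to 2D, and cluster. The sketch below uses PCA and KMeans as simple stand-ins - not necessarily what produced the figure above:

```python
# Sketch: embed each response, project to 2D, and color by cluster.
# PCA + KMeans are simple stand-ins; swap in your preferred projection/clustering.
import matplotlib.pyplot as plt
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def plot_response_clusters(responses, n_clusters=3):
    """Scatter-plot a list of response strings in 2D, colored by style cluster."""
    embeddings = embedder.encode(responses)                 # (n_responses, 384)
    coords = PCA(n_components=2).fit_transform(embeddings)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
    plt.scatter(coords[:, 0], coords[:, 1], c=labels)
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.title("Responses clustered by writing style")
    plt.show()

# e.g., with the responses dict from the earlier sketch:
# plot_response_clusters(list(responses.values()))
```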
The Most Interesting Findings
Here's what jumped out at us:
- First impressions matter: When we put FDR-related slots first, responses tended to frame comparisons from a government intervention perspective. When Reagan-related slots came first, responses emphasized market-based approaches.
- Style shifting: Some memory paths consistently produced verbose responses (avg. 150+ words), while others led to more concise writing (avg. 100-120 words).
- Quality remained solid: Despite variations, the responses maintained good quality scores (8-10 range) across different paths.
- Noise tolerance: Adding irrelevant memory slots (like facts about the moon landing) increased variability but didn't completely derail responses.
This scatter plot shows the relationship between similarity scores and structural features:
Why This Matters
If you're building or using LLM systems, this has real implications:
- Consistency challenges: If you need reliable, predictable outputs, you need to think about memory ordering
- Design opportunities: This could be used creatively to generate diverse perspectives on the same information
- User awareness: Users should understand that the way they present information to LLMs affects what they get back
What's Next?
We're exploring several directions:
- Building memory routing systems that can provide more consistent outputs
- Testing whether some models are more "path dependent" than others
- Finding ways to use this property as a feature rather than a bug
The Takeaway
The next time you interact with an LLM assistant, remember that the order of information matters - a lot! This isn't just an academic curiosity; it's a fundamental aspect of how these systems process information.
As we continue to integrate LLMs into more aspects of our lives, understanding these quirks becomes increasingly important. Path dependence isn't just a technical issue - it's a reminder that LLMs, like humans, don't process information in a purely logical, order-independent way.
What do you think? Have you noticed this effect in your interactions with LLMs?