How Many R’s in Strawberry? Solving the LLM Counting Problem with Neurosymbolic AI
Fabricio Ceolin
Independent Researcher
Abstract
The question “How many R’s are in strawberry?” has become a viral test exposing a fundamental limitation of Large Language Models: despite impressive reasoning capabilities, LLMs often fail at character counting, answering “2” when the correct answer is 3. This failure stems largely from tokenization, where models process subword units rather than individual characters. This article demonstrates a neurosymbolic solution using The Edge Agent (TEA) framework that answers the strawberry question correctly by combining LLM-based natural language parsing with Prolog for exact symbolic counting. The approach tolerates one kind of LLM error: even when the LLM emits the characters in the wrong order, the Prolog layer still returns the correct count, which illustrates how a symbolic layer can act as a reliability backstop for a neural network.
Keywords: Neurosymbolic AI, Large Language Models, Character Counting, Prolog, Tokenization Limitations
1. Introduction
“How many R’s are in the word strawberry?”
This simple question has become the internet’s favorite way to expose the limitations of Large Language Models. Despite their impressive capabilities in reasoning, coding, and conversation, LLMs consistently fail at this elementary character-counting task — often answering “2” or even “10” when the correct answer is 3.
This article demonstrates how neurosymbolic AI solves this problem by combining the natural language understanding of LLMs with the precise symbolic computation of Prolog. Using The Edge Agent (TEA), we’ll build an agent that:
Uses an LLM to parse the question and extract the word as individual characters
Uses Prolog to count character occurrences with mathematical certainty
2. The Problem: Why Can’t LLMs Count Letters?
Let’s test this with a small LLM (llama3.2:1b):
Q: How many times does the letter "r" appear in "strawberry"?
llama3.2:1b: "10" ❌
Correct: "3" ✓
The word “strawberry” contains: s-t-r-a-w-b-e-r-r-y
The letter “r” appears at positions 3, 8, and 9 — that’s three times.
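As a quick check of the positions listed above, here is a minimal Python sketch that counts the letter directly. The variable names are illustrative only and are not part of TEA.

```python
word = "strawberry"
letter = "r"

# 1-based positions of every occurrence of the target letter
positions = [i for i, ch in enumerate(word, start=1) if ch == letter]

print(positions)       # [3, 8, 9]
print(len(positions))  # 3
```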
Why do LLMs get this wrong? Because they don’t process text character-by-character. Instead, they use tokenization, which groups characters into subword units:
"strawberry" → ["straw", "berry"] (typical tokenization)
The model never “sees” individual letters — it sees tokens. Asking an LLM to count characters is like asking someone to count individual bricks in a photograph of a wall: they can estimate, but they can’t reliably count.
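To make the tokenization point concrete, the sketch below uses a toy greedy tokenizer. The two-entry vocabulary and the `toy_tokenize` function are hypothetical, invented for illustration; real tokenizers have far larger vocabularies and may split "strawberry" differently.

```python
# Hypothetical vocabulary: real models use tens of thousands of subword entries
TOY_VOCAB = {"straw": 0, "berry": 1}

def toy_tokenize(text: str) -> list[int]:
    """Greedy longest-match tokenization over TOY_VOCAB."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i:]!r}")
    return ids

token_ids = toy_tokenize("strawberry")
print(token_ids)  # [0, 1]
```

The model receives something like `[0, 1]`, two opaque integers, and nothing in that input states that three of the underlying characters are "r".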
3. The Neurosymbolic Solution
Our approach separates the task into what each system does best:
| Component | Strength | Task |
|---|---|---|
| LLM | Language understanding | Parse the question, extract word as character list |
| Prolog | Symbolic computation | Count exact character occurrences |
flowchart LR
A[Input Question] --> B[LLM: Extract Characters]
B --> C["char(1,s). char(2,t). char(3,r)..."]
C --> D[Prolog: Count Matches]
D --> E["Answer: 3"]
style D fill:#90EE90
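The same division of labour can be sketched in plain Python, with the neural step passed in as a function and the symbolic step written as a deterministic count. This is an illustration of the architecture only: `count_letter` and `perfect_extract` are hypothetical names, and the extractor here is a stand-in for the LLM call, not a real model.

```python
from typing import Callable

def count_letter(word: str, letter: str,
                 extract: Callable[[str], list[str]]) -> int:
    """Neural step produces the characters; symbolic step counts them exactly."""
    chars = extract(word)                          # stands in for the LLM call
    return sum(1 for ch in chars if ch == letter)  # deterministic count

def perfect_extract(word: str) -> list[str]:
    """Hypothetical stand-in for an LLM that extracts characters without error."""
    return list(word.lower())

result = count_letter("strawberry", "r", perfect_extract)
print(result)  # 3
```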
4. Prerequisites
Install Ollama and Model
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Start server and pull model
ollama serve &
ollama pull llama3.2:1b
Install The Edge Agent
# Download AppImage (includes Prolog)
wget https://github.com/fabceolin/the_edge_agent/releases/latest/download/tea-0.8.17-x86_64.AppImage -O tea
chmod +x tea
5. The Agents
Agent 1: LLM-Only Approach
First, let’s see how an LLM-only solution fails:
# examples/strawberry/count-llm-nothink.yaml
name: count-letters-llm-nothink
state_schema:
  word: str
  letter: str
  answer: str
nodes:
  - name: count
    uses: llm.call
    with:
      provider: "ollama"
      model: "llama3.2:1b"
      messages:
        - role: user
          content: |
            How many times does the letter "{{ state.letter }}" appear in the word "{{ state.word }}"?
            Answer with just the number.
    output: llm_response
  - name: format
    run: |
      llm_resp = state.get("llm_response", {})
      if isinstance(llm_resp, dict):
          content = llm_resp.get("content", "unknown")
      else:
          content = str(llm_resp)
      return {
          "answer": f"The letter '{state['letter']}' appears {content.strip()} time(s) in '{state['word']}'"
      }
edges:
  - from: __start__
    to: count
  - from: count
    to: format
  - from: format
    to: __end__
Agent 2: Neurosymbolic Approach
Now the neurosymbolic solution that actually works:
# examples/strawberry/count-neurosymbolic.yaml
name: count-letters-neurosymbolic
state_schema:
  word: str
  letter: str
  characters: str
  count: int
  positions: list
  answer: str
nodes:
  # Step 1: LLM extracts characters as Prolog facts
  - name: extract_chars
    uses: llm.call
    with:
      provider: "ollama"
      model: "llama3.2:1b"
      messages:
        - role: system
          content: |
            Convert a word into Prolog character facts.
            Format: char(Position, Letter).
            Positions start at 1. Letters must be lowercase atoms.
            Example for "hello":
            char(1, h). char(2, e). char(3, l). char(4, l). char(5, o).
            Output ONLY the facts, one per line, no other text.
        - role: user
          content: |
            Convert "{{ state.word }}" to character facts.
    output: llm_response
  # Step 1.5: Extract content from LLM response
  - name: parse_response
    run: |
      llm_resp = state.get("llm_response", {})
      if isinstance(llm_resp, dict):
          characters = llm_resp.get("content", "")
      else:
          characters = llm_resp
      return {"characters": characters}
  # Step 2: Prolog counts the occurrences with certainty
  - name: count_prolog
    language: prolog
    run: |
      % Load the character facts from LLM output
      state(characters, CharsText),
      tea_load_code(CharsText),
      % Get the target letter as an atom
      state(letter, LetterStr),
      atom_string(Letter, LetterStr),
      % Count all matching characters
      findall(Pos, char(Pos, Letter), Positions),
      length(Positions, Count),
      % Return the count
      return(count, Count),
      return(positions, Positions).
  # Step 3: Format the answer
  - name: format
    run: |
      count = state.get("count", 0)
      word = state.get("word", "")
      letter = state.get("letter", "")
      positions = state.get("positions", [])
      pos_str = ", ".join(str(p) for p in positions) if positions else "none"
      return {
          "answer": f"The letter '{letter}' appears {count} time(s) in '{word}' at position(s): {pos_str}"
      }
edges:
  - from: __start__
    to: extract_chars
  - from: extract_chars
    to: parse_response
  - from: parse_response
    to: count_prolog
  - from: count_prolog
    to: format
  - from: format
    to: __end__
6. Running the Agents
Test: The Classic “Strawberry” Question
LLM-Only (llama3.2:1b):
./tea run examples/strawberry/count-llm-nothink.yaml \
--input '{"word": "strawberry", "letter": "r"}'
Result:
{
  "answer": "The letter 'r' appears 10 time(s) in 'strawberry'"
}
Neurosymbolic (same llama3.2:1b model):
./tea run examples/strawberry/count-neurosymbolic.yaml \
--input '{"word": "strawberry", "letter": "r"}'
Result:
{
  "characters": "char(1, s). char(2, r). char(3, w). char(4, a). char(5, t). char(6, r). char(7, y). char(8, b). char(9, e). char(10, r).",
  "count": 3,
  "positions": [2, 6, 10],
  "answer": "The letter 'r' appears 3 time(s) in 'strawberry' at position(s): 2, 6, 10"
}
Results Comparison
| Approach | Model | Answer | Correct? |
|---|---|---|---|
| LLM-Only | llama3.2:1b | 10 | ❌ |
| Neurosymbolic | llama3.2:1b | 3 | ✅ |
7. The Remarkable Robustness of Neurosymbolic AI
Notice something interesting in the neurosymbolic output above? The LLM scrambled the character positions:
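LLM produced: s, r, w, a, t, r, y, b, e, r (wrong order!)
Actual word: s, t, r, a, w, b, e, r, r, y
The small model completely misordered the characters. But the neurosymbolic approach still got the correct count because:
The LLM extracted the right characters (even in the wrong order)
Prolog counted how many char(_, r) facts exist
The count is correct regardless of position accuracy
This demonstrates the power of separation of concerns: even when the LLM makes transformation errors, the symbolic reasoning layer produces correct results for the counting task. The guarantee is conditional, though. Prolog counts exactly the facts it is given, so the count is right only as long as the LLM emits each letter the right number of times. Had the model dropped or duplicated an “r”, the count would have been wrong too, and the reported positions are already wrong here (2, 6, 10 instead of 3, 8, 9).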
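The order-independence of the count can be checked outside Prolog as well. The sketch below parses the fact string from the run above with a regular expression (an assumption about the fact format, not TEA's own loader) and compares the letters, ignoring order, against the real word.

```python
import re
from collections import Counter

# LLM output copied from the neurosymbolic run (positions are scrambled)
llm_facts = ("char(1, s). char(2, r). char(3, w). char(4, a). char(5, t). "
             "char(6, r). char(7, y). char(8, b). char(9, e). char(10, r).")

# Pull (position, letter) pairs out of the fact text
facts = [(int(pos), ch)
         for pos, ch in re.findall(r"char\((\d+),\s*([a-z])\)", llm_facts)]

r_positions = [pos for pos, ch in facts if ch == "r"]
print(len(r_positions), r_positions)  # 3 [2, 6, 10]

# The order is wrong, but each letter occurs as often as in the real word
same_letters = Counter(ch for _, ch in facts) == Counter("strawberry")
print(same_letters)  # True
```

The count survives because the LLM's output is a permutation of the word; the positions do not.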
8. How It Works
Step 1: LLM Character Extraction
The LLM’s only job is to convert the word into structured Prolog facts:
Input: "strawberry"
Output: char(1, s). char(2, t). char(3, r). char(4, a). char(5, w).
char(6, b). char(7, e). char(8, r). char(9, r). char(10, y).
This is a task LLMs handle reasonably well — structured data transformation. The key insight is that we’re not asking the LLM to count; we’re asking it to extract characters.
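Because small models sometimes wrap their answer in extra chatter, it can help to check the extraction output before handing it to Prolog. The following is a minimal sketch of such a guard; `parse_facts` and the `FACT` pattern are hypothetical helpers written for this article's fact format and are not part of TEA.

```python
import re

# One fact looks like: char(3, r).
FACT = re.compile(r"char\((\d+),\s*([a-z])\)\.")

def parse_facts(text: str) -> list[tuple[int, str]]:
    """Return (position, letter) pairs, or raise if anything besides facts is present."""
    leftover = FACT.sub("", text).strip()
    if leftover:
        raise ValueError(f"unexpected text in LLM output: {leftover!r}")
    return [(int(pos), ch) for pos, ch in FACT.findall(text)]

ok = parse_facts("char(1, h). char(2, e). char(3, l). char(4, l). char(5, o).")
print(ok)  # [(1, 'h'), (2, 'e'), (3, 'l'), (4, 'l'), (5, 'o')]
```

A call such as `parse_facts("Sure! char(1, h).")` raises `ValueError`, which a pipeline could use as a signal to retry the extraction step.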
Step 2: Prolog Counting
Prolog then counts with mathematical certainty:
% Find all positions where char(Pos, r) is true
findall(Pos, char(Pos, r), Positions),
% Positions = [3, 8, 9] (or wherever 'r' appears)
% Count the list length
length(Positions, Count).
% Count = 3
There’s no guessing, no statistical inference — just logical deduction. Prolog counts facts, not estimates.
9. Why This Matters
The strawberry problem isn’t just a party trick — it represents a class of failures that affect real applications:
| Application | Character-Level Task | Risk of LLM Failure |
|---|---|---|
| Data Validation | Check if email has exactly one “@” | High |
| Password Rules | Count uppercase/lowercase/digits | High |
| Text Processing | Count word occurrences | Medium |
| Code Analysis | Count brackets/parentheses | High |
| Compliance | Verify document has N required fields | Medium |
Any application requiring exact counting is a candidate for neurosymbolic augmentation.
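Two of the rows above are easy to express as deterministic checks. The sketch below is one possible shape for them; the function names and the sample inputs are made up for illustration.

```python
def has_exactly_one_at(email: str) -> bool:
    """Data validation: the address must contain exactly one '@'."""
    return email.count("@") == 1

def password_counts(password: str) -> dict[str, int]:
    """Password rules: exact counts per character class."""
    return {
        "upper": sum(ch.isupper() for ch in password),
        "lower": sum(ch.islower() for ch in password),
        "digits": sum(ch.isdigit() for ch in password),
    }

print(has_exactly_one_at("user@example.com"))   # True
print(has_exactly_one_at("user@@example.com"))  # False
counts = password_counts("Str4wb3rry")
print(counts)  # {'upper': 1, 'lower': 7, 'digits': 2}
```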
10. Note on Reasoning Models
Modern reasoning models (like OpenAI’s o1/o3 or DeepSeek with thinking enabled) perform significantly better on counting tasks because they use internal chain-of-thought reasoning. However:
They’re slower due to reasoning token overhead
They’re more expensive (more tokens = higher cost)
They’re not always available in edge/offline deployments
The neurosymbolic approach works with any model, including tiny 1B parameter models, because the precise computation is offloaded to Prolog.
11. The Broader Pattern
This article demonstrates a general pattern for building reliable AI agents:
Identify the failure mode: Where does the LLM consistently fail?
Separate concerns: What requires language understanding vs. precise computation?
Choose the right tool: Prolog for logic, Lua for math, LLM for language
Compose the pipeline: Connect components through TEA’s state graph
The same pattern applies to:
Mathematical word problems: LLM extracts equations, symbolic solver computes
Date calculations: LLM parses dates, code computes differences
Graph traversal: LLM extracts relationships, Prolog finds paths
Constraint satisfaction: LLM understands requirements, solver finds solutions
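As one instance of the pattern, take the date-calculation case. In the sketch below the `parsed` dictionary is a hypothetical result of an LLM parsing step (the dates are invented), and the difference is computed with Python's standard library rather than by the model.

```python
from datetime import date

# Hypothetical output of the LLM parsing step: ISO 8601 date strings
parsed = {"start": "2024-01-15", "end": "2024-03-01"}

start = date.fromisoformat(parsed["start"])
end = date.fromisoformat(parsed["end"])

# Exact calendar arithmetic, including the 2024 leap day
days_between = (end - start).days
print(days_between)  # 46
```

The split is the same as in the strawberry agent: the model turns free text into structured values, and ordinary code does the arithmetic.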
12. Conclusion
The question “How many R’s in strawberry?” has become a meme because it exposes a fundamental truth: LLMs are not computers. They’re sophisticated pattern matchers trained on text, and they process language through tokenization that obscures character-level details.
Neurosymbolic AI provides a practical solution:
Let LLMs do what they’re good at (language understanding)
Let symbolic systems do what they’re good at (precise computation)
Use frameworks like TEA to compose them into reliable pipelines
The next time someone asks you to build an AI agent that needs to count, calculate, or reason precisely, remember: the answer isn’t a bigger model — it’s a smarter architecture.
13. Try It Yourself
# Clone the examples
git clone https://github.com/fabceolin/the_edge_agent.git
cd the_edge_agent
# Download TEA
wget https://github.com/fabceolin/the_edge_agent/releases/latest/download/tea-0.8.17-x86_64.AppImage -O tea
chmod +x tea
# Pull the model
ollama pull llama3.2:1b
# Run the neurosymbolic agent
./tea run examples/strawberry/count-neurosymbolic.yaml \
--input '{"word": "strawberry", "letter": "r"}'
14. References
The Edge Agent (TEA) - Neurosymbolic AI framework
Ollama - Local LLM runtime
Ollama Thinking Mode - Enable/disable reasoning
SWI-Prolog - Logic programming language
Why AI Can’t Count the R’s in Strawberry - Scientific American