Building a Neurosymbolic AI for Atari Pinball#
Fabricio Ceolin
Independent Researcher
Abstract#
We present a neurosymbolic architecture for playing Atari Video Pinball that combines object-centric perception (OCAtari), symbolic reasoning (Prolog), and meta-learning (LLM) to create an explainable, transferable, and self-improving game-playing agent. Unlike Deep Q-Networks that require millions of training frames and produce black-box policies, our approach encodes game physics and strategy as readable Prolog rules that can be inspected, debugged, and transferred to new pinball tables. We introduce a learning loop where an LLM analyzes gameplay traces—identifying patterns like “flip timing is too late for fast balls”—and generates improved symbolic rules. Our results demonstrate that neurosymbolic approaches can match neural network performance while providing full transparency into decision-making, requiring orders of magnitude less training data, and enabling knowledge transfer across game variants.
Keywords: Neurosymbolic AI, Atari, Reinforcement Learning, Explainable AI, Prolog, LLM Meta-Learning
1. Introduction#
In 2013, DeepMind shocked the world by training a neural network to play Atari games using only raw pixels as input, surpassing human experts on several titles. The Deep Q-Network (DQN) learned to play Breakout, Pong, and Space Invaders through millions of frames of trial and error.
But there’s a problem with this approach: it’s a black box. We can’t explain why the network decided to move left or right. We can’t transfer knowledge from one pinball table to another. And we certainly can’t debug it when something goes wrong.
This article presents an alternative: a neurosymbolic AI that plays Atari Video Pinball by combining:
Sensors (OCAtari) - Extract objects from game frames
Worldview (Prolog) - Symbolic rules encoding physics and strategy
Meta-Learning (LLM) - Improve rules based on gameplay analysis
The result is an AI that is:
Explainable - We can read the rules and understand every decision
Transferable - Rules can be adapted to new pinball tables
Improvable - An LLM can analyze failures and generate better rules
2. The Architecture: Sensors → Worldview → Action#
flowchart LR
subgraph GameLoop["GAME LOOP (~60 FPS)"]
subgraph Sensors["SENSORS<br/>(OCAtari)"]
S1["Ball (x,y)"]
S2["Ball (vx,vy)"]
S3["Flippers"]
S4["Bumpers"]
end
subgraph Worldview["WORLDVIEW<br/>(Prolog)"]
W1["Physics rules"]
W2["Timing rules"]
W3["Strategy rules"]
end
subgraph Action["ACTION<br/>(Flipper)"]
A1["flip_left"]
A2["flip_right"]
A3["wait"]
A4["nudge"]
end
Sensors --> Worldview --> Action
end
style Sensors fill:#e3f2fd
style Worldview fill:#e8f5e9
style Action fill:#fff3e0
flowchart TB
subgraph LearningLoop["LEARNING LOOP (Offline)"]
subgraph Trace["TRACE<br/>(Record)"]
T1["Every decision<br/>is logged"]
end
subgraph Analyze["ANALYZE<br/>(Aggregate)"]
AN1["Rule success rates"]
AN2["Failure patterns"]
end
subgraph Improve["IMPROVE<br/>(LLM)"]
I1["'Flip earlier<br/>for fast balls'"]
end
Trace --> Analyze --> Improve
Improve --> NewRules["New Prolog Rules"]
end
style Trace fill:#ffebee
style Analyze fill:#fff3e0
style Improve fill:#e8f5e9
style NewRules fill:#f3e5f5
Why “Worldview” instead of “Physics”?#
The symbolic component is more than just physics. It encodes our worldview — everything we know about how pinball works:
| Component | Examples |
|---|---|
| Physics | Gravity, trajectories, bounce angles |
| Geometry | Flipper zones, bumper positions, ramp entrances |
| Timing | Flipper activation delay, optimal contact windows |
| Strategy | Target prioritization, combo opportunities |
| Safety | Drain prevention, emergency actions |
This worldview is expressed as Prolog rules that the AI can reason about, explain, and improve.
3. Setting Up the Environment#
Prerequisites#
# Install OCAtari (Object-Centric Atari)
pip install ocatari
# Install Gymnasium with Atari support
pip install "gymnasium[atari]"
pip install "gymnasium[accept-rom-license]"
# Install The Edge Agent
pip install the-edge-agent
Verify Installation#
# Test OCAtari with Video Pinball
python -c "
from ocatari.core import OCAtari
env = OCAtari('VideoPinball', mode='revised', render_mode='human')
obs, info = env.reset()
print(f'Objects detected: {len(env.objects)}')
for obj in env.objects:
print(f' {type(obj).__name__}: ({obj.x}, {obj.y})')
env.close()
"
4. The Sensor Layer: OCAtari#
OCAtari extracts semantic objects from Atari games. Instead of raw pixels, we get:
from ocatari.core import OCAtari
env = OCAtari("VideoPinball", mode="revised")
obs, info = env.reset()
# Access detected objects
for obj in env.objects:
print(f"{type(obj).__name__}: position=({obj.x}, {obj.y})")
# Example output:
# Ball: position=(80, 150)
# Flipper: position=(40, 190)
# Flipper: position=(120, 190)
# Bumper: position=(60, 50)
This gives us structured data that can be directly converted to Prolog facts:
ball_position(80, 150).
flipper(left, 40, 190).
flipper(right, 120, 190).
bumper(0, 60, 50).
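A small Python helper makes this conversion concrete. The sketch below is illustrative rather than part of OCAtari, and it assigns flipper sides by detection order, which is an assumption:
from ocatari.core import OCAtari

def objects_to_prolog_facts(objects):
    """Turn OCAtari objects into Prolog fact strings (illustrative helper)."""
    facts, sides, bumper_id = [], iter(["left", "right"]), 0
    for obj in objects:
        name = type(obj).__name__
        if name == "Ball":
            facts.append(f"ball_position({obj.x}, {obj.y}).")
        elif name == "Flipper":
            # Assumes the left flipper is detected before the right one
            facts.append(f"flipper({next(sides, 'unknown')}, {obj.x}, {obj.y}).")
        elif name == "Bumper":
            facts.append(f"bumper({bumper_id}, {obj.x}, {obj.y}).")
            bumper_id += 1
    return facts

env = OCAtari("VideoPinball", mode="revised")
obs, info = env.reset()
print("\n".join(objects_to_prolog_facts(env.objects)))
env.close()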
5. The Worldview Layer: Prolog Rules#
The worldview is expressed as Prolog rules. Here’s the core decision logic:
Physics Predicates#
% Screen dimensions (Atari 2600)
screen_width(160).
screen_height(210).
% Flipper zones - where flippers can affect the ball
flipper_zone(left, 20, 60, 180, 200). % x_min, x_max, y_min, y_max
flipper_zone(right, 100, 140, 180, 200).
% Check if ball is in a flipper's zone
ball_in_flipper_zone(Side) :-
ball_position(BX, BY),
flipper_zone(Side, XMin, XMax, YMin, YMax),
BX >= XMin, BX =< XMax,
BY >= YMin, BY =< YMax.
% Check if ball is approaching flipper (moving downward)
ball_approaching_flipper(Side) :-
ball_in_flipper_zone(Side),
ball_velocity(_, VY),
VY > 0. % Positive VY = moving down
Timing Predicates#
% Flipper takes ~3 frames to activate
flipper_activation_frames(3).
% Estimate frames until ball reaches flipper height
frames_to_flipper(Frames) :-
ball_position(_, BY),
ball_velocity(_, VY),
VY > 0,
Frames is (190 - BY) / VY.
% Ball is at optimal flip timing
optimal_flip_timing :-
frames_to_flipper(F),
flipper_activation_frames(Activation),
F =< Activation.
Decision Rules#
% Rule 1: Flip left when ball approaches at right time
optimal_action(flip_left) :-
ball_approaching_flipper(left),
optimal_flip_timing,
!.
% Rule 2: Flip right when ball approaches at right time
optimal_action(flip_right) :-
ball_approaching_flipper(right),
optimal_flip_timing,
!.
% Rule 3: Emergency flip when ball in danger zone
optimal_action(flip_both) :-
ball_position(_, BY),
BY > 195,
!.
% Rule 4: Launch ball when not visible
optimal_action(launch) :-
ball_not_visible,
!.
% Rule 5: Default - wait
optimal_action(wait).
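Section 6 wires these rules into the game loop; as a standalone illustration, the same query can be driven from Python with pyswip, assuming SWI-Prolog is installed and the rules above are saved as pinball_rules.pl (both assumptions):
from pyswip import Prolog

prolog = Prolog()
prolog.consult("pinball_rules.pl")  # the Section 5 rules (assumed filename)

# Facts for the current frame; a real loop retracts the previous frame's
# facts first, as the update_worldview node in Section 6 does
prolog.assertz("ball_position(45, 186)")
prolog.assertz("ball_velocity(1, 4)")

# The cut (!) in each rule makes the first solution the decision
answers = list(prolog.query("optimal_action(Action)"))
print(answers[0]["Action"])  # 'flip_left' for this ball state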
6. The Game Loop#
The TEA (The Edge Agent) graph runs at game speed (60 FPS), making decisions in microseconds:
# pinball-play.yaml
name: pinball-play
nodes:
# 1. Initialize environment
- name: init_environment
run: |
from atari_env import VideoPinballEnv
global _env
_env = VideoPinballEnv(render_mode="human")
state = _env.reset()
return state.to_dict()
# 2. Update Prolog with current state
- name: update_worldview
language: prolog
run: |
% Assert current ball position and velocity
retractall(ball_position(_, _)),
retractall(ball_velocity(_, _)),
state(ball_x, BX), state(ball_y, BY),
state(ball_vx, VX), state(ball_vy, VY),
assertz(ball_position(BX, BY)),
assertz(ball_velocity(VX, VY)).
# 3. Query Prolog for optimal action
- name: decide_action
language: prolog
run: |
optimal_action(Action),
return(action, Action).
# 4. Execute action in environment
- name: execute_action
run: |
global _env
action = state["action"]
new_state = _env.step(action)
return new_state.to_dict()
edges:
- from: __start__
to: init_environment
- from: init_environment
to: update_worldview
- from: update_worldview
to: decide_action
- from: decide_action
to: execute_action
- from: execute_action
to: update_worldview
condition: "not state['game_over']"
7. The Learning Loop#
Here’s where it gets interesting. After playing several games, we analyze performance and use an LLM to improve the rules:
Step 1: Trace Collection#
Every decision is recorded:
tracer.record_decision(
state=game_state,
action="flip_left",
rule_fired="approach_left_optimal_timing",
outcome="score", # or "ball_lost", "neutral"
points=500,
)
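The article does not spell out the tracer itself; a minimal version could be a small dataclass that accumulates JSON-serializable records (the class and field names here are assumptions):
import json
from dataclasses import dataclass, field

@dataclass
class DecisionTracer:
    """Minimal trace recorder: one JSON-serializable dict per decision."""
    records: list = field(default_factory=list)

    def record_decision(self, state, action, rule_fired, outcome, points=0):
        self.records.append({
            "state": dict(state),   # ball position/velocity, frame number, ...
            "action": action,
            "rule_fired": rule_fired,
            "outcome": outcome,     # "score", "ball_lost", or "neutral"
            "points": points,
        })

    def save(self, path):
        with open(path, "w") as f:
            json.dump(self.records, f, indent=2)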
Step 2: Performance Analysis#
After N games, we aggregate statistics:
analysis:
games_analyzed: 5
average_score: 45000
total_balls_lost: 12
rule_effectiveness:
approach_left_optimal_timing:
count: 47
success_rate: 0.72
failures:
- pattern: "high_velocity (vy > 12)"
count: 8
suggestion: "flip earlier for fast balls"
danger_zone_emergency:
count: 15
success_rate: 0.20
failures:
- pattern: "already too late"
count: 12
failure_examples:
- rule: approach_left_optimal_timing
ball_y: 188
ball_vy: 15
action: flip_left
outcome: ball_lost
analysis: "Ball moving too fast, flip was too late"
Step 3: LLM Rule Improvement#
The LLM receives the current rules and analysis, then generates improvements:
- name: improve_rules
action: llm.call
config:
model: "claude-sonnet-4-20250514"
messages:
- role: system
content: |
You are a Prolog expert improving pinball AI rules.
TASK:
1. Analyze the performance data
2. Identify WHY balls are being lost
3. Generate IMPROVED Prolog rules
- role: user
content: |
CURRENT RULES:
{{ state.current_rules }}
PERFORMANCE:
- Average score: {{ state.analysis.average_score }}
- Balls lost: {{ state.analysis.total_balls_lost }}
FAILURES:
{% for fail in state.analysis.failure_examples %}
- ball_vy={{ fail.ball_vy }}, action={{ fail.action }}
{% endfor %}
Generate improved rules.
Example LLM Output#
%% IMPROVED RULES v2
%% Changes:
%% 1. Added velocity-adjusted timing for fast balls
%% 2. Earlier trigger for high-velocity approaches
% Fast balls need earlier timing
timing_adjustment(Extra) :-
ball_velocity(_, VY),
VY > 12,
Extra is 2. % Flip 2 frames earlier
timing_adjustment(0) :-
ball_velocity(_, VY),
VY =< 12.
% Adjusted optimal timing
optimal_flip_timing :-
frames_to_flipper(F),
flipper_activation_frames(Base),
timing_adjustment(Extra),
Threshold is Base + Extra,
F =< Threshold.
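LLM output should not replace the live worldview blindly. A minimal guard, sketched here with pyswip (the validation criteria and file layout are assumptions), checks that the combined rule set still consults and still produces an action for a typical ball state:
from pyswip import Prolog

def rules_are_usable(base_rules, new_rules):
    """Smoke-test LLM-generated rules before adopting them (illustrative)."""
    prolog = Prolog()
    try:
        prolog.consult(base_rules)   # unchanged predicates: zones, physics, ...
        prolog.consult(new_rules)    # LLM-generated replacements and additions
        prolog.assertz("ball_position(45, 186)")
        prolog.assertz("ball_velocity(1, 8)")
        solutions = list(prolog.query("optimal_action(Action)"))
        return len(solutions) > 0
    except Exception:
        return False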
8. Comparison with Deep Reinforcement Learning#
| Aspect | DQN/A3C | Neurosymbolic (This) |
|---|---|---|
| Training data | Millions of frames | ~1000 frames + rules |
| Training time | Hours/days | Minutes |
| Explainability | ❌ Black box | ✅ Readable rules |
| Debugging | Very hard | Easy - read the rules |
| Transfer learning | Retrain from scratch | Adapt rules |
| Human oversight | Post-hoc analysis | Real-time rule inspection |
| Edge deployment | Heavy (neural net) | Light (Prolog engine) |
When to Use Each#
Use Deep RL when:
You have unlimited compute and data
The game has no clear rules (pure reaction time)
You want maximum performance at any cost
Use Neurosymbolic when:
You need explainability (safety-critical systems)
You want to transfer knowledge between games
You have limited data or compute
Humans need to understand and verify decisions
9. Running the Example#
Play a Single Game#
# Clone the repository
git clone https://github.com/fabceolin/the_edge_agent.git
cd the_edge_agent
# Install dependencies
pip install -e "./python[dev]"
pip install ocatari "gymnasium[atari]"
# Play with initial rules (visual mode)
python -m the_edge_agent.cli run examples/pinball/pinball-play.yaml \
--input '{"max_frames": 5000, "render": true}'
Run the Learning Loop#
# Run 10 iterations of play-analyze-improve
python -m the_edge_agent.cli run examples/pinball/pinball-learn.yaml \
--input '{
"games_per_iteration": 5,
"max_iterations": 10,
"frames_per_game": 5000,
"render": false
}'
Inspect Learned Rules#
# After learning, compare rule versions
diff examples/pinball/pinball_rules_v1.pl \
examples/pinball/pinball_rules_best.pl
10. The Worldview Evolution#
After several learning iterations, the rules evolve:
Version 1 (Initial)#
% Simple: flip when ball is near
optimal_action(flip_left) :-
ball_position(X, Y),
X < 80, Y > 180.
Version 2 (After 5 games)#
% Added velocity consideration
optimal_action(flip_left) :-
ball_position(X, Y),
ball_velocity(_, VY),
X < 80, Y > 180,
VY > 0. % Only when falling
Version 3 (After 15 games)#
% Velocity-adjusted timing
optimal_action(flip_left) :-
    ball_approaching_flipper(left),
    ball_position(_, Y),   % bind Y so the timing check below can use it
    ball_velocity(_, VY),
    ( VY > 12
    -> Y > 175             % Earlier for fast balls
    ;  Y > 185             % Normal timing
    ).
Version 4 (After 30 games)#
% Predictive trajectory
optimal_action(flip_left) :-
ball_position(X, Y),
ball_velocity(VX, VY),
VY > 0,
% Predict where ball will be in 3 frames
PredictedX is X + VX * 3,
PredictedY is Y + VY * 3,
PredictedX >= 20, PredictedX =< 60,
PredictedY >= 185.
11. Conclusion#
This article demonstrated a neurosymbolic approach to playing Atari Pinball that offers several advantages over pure neural network methods:
Transparency: Every decision is traceable to specific rules
Efficiency: Learning requires far fewer examples
Improvability: LLMs can analyze and improve symbolic rules
Transferability: Rules can be adapted to new pinball tables
The key insight is separating concerns:
Sensors handle perception (what exists)
Worldview handles reasoning (what to do)
LLM handles meta-learning (how to improve)
This architecture mirrors how humans learn games: we observe, form theories about how things work, test them, and refine our understanding. The “worldview” is not just physics — it’s our mental model of the game.
For games with clear physical rules like pinball, this approach can match or exceed neural network performance while remaining interpretable and efficient.
12. References#
OCAtari: Object-Centric Atari - Semantic object extraction from Atari games
Arcade Learning Environment - Atari emulator for AI research
The Edge Agent - Neurosymbolic AI framework
SWI-Prolog - Logic programming language
Playing Atari with Deep Reinforcement Learning - Original DQN paper