[Action Options]
1) north, move one step north.
2) south, move one step south.
3) east, move one step east.
4) west, move one step west.
5) stay, do nothing.
6) interact, interact with an object.
[Action Output Instruction]
Think step-by-step about your plan and output your idea, then output 'EXECUTE', then give exactly one action per agent, each on a new line.
Example#1: Since agent1 is holding the plate facing the cooking station and the soup is already cooked, agent1 should use the interact action to serve the soup, and agent0 should grab the top onion.
EXECUTE
NAME Agent0 ACTION north
NAME Agent1 ACTION interact
Example#2: The soup is not ready yet and agent1 has the plate in his hand, so agent1 is not moving at this turn and agent0 should go to the nearest place to get the onion.
EXECUTE
NAME Agent0 ACTION east
NAME Agent1 ACTION stay
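The output format described above can be checked mechanically. The following is a minimal sketch, not part of the original prompt: the function name `parse_response` and the set `VALID_ACTIONS` are hypothetical, and it assumes the actions to run are the `NAME ... ACTION ...` pairs that follow the last `EXECUTE` marker.

```python
import re

# The six actions listed under [Action Options].
VALID_ACTIONS = {"north", "south", "east", "west", "stay", "interact"}


def parse_response(text, n_agents=2):
    """Return {agent_name: action} from the text after the last EXECUTE marker.

    Hypothetical helper: the real environment's parser is not shown in the source.
    """
    _, marker, tail = text.rpartition("EXECUTE")
    if not marker:
        raise ValueError("no EXECUTE marker found")
    # Tolerate any whitespace (including line breaks) between the tokens.
    pairs = re.findall(r"NAME\s+(\w+)\s+ACTION\s+(\w+)", tail)
    actions = {}
    for name, action in pairs:
        action = action.lower()
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown action {action!r} for {name}")
        if name in actions:
            raise ValueError(f"more than one action given for {name}")
        actions[name] = action
    if len(actions) != n_agents:
        raise ValueError(f"expected {n_agents} agents, got {len(actions)}")
    return actions


reply = (
    "The soup is not ready yet, so agent1 waits and agent0 fetches an onion.\n"
    "EXECUTE\n"
    "NAME Agent0 ACTION east\n"
    "NAME Agent1 ACTION stay"
)
print(parse_response(reply))  # {'Agent0': 'east', 'Agent1': 'stay'}
```

Rejecting a response with a missing agent or an unknown action keeps a malformed plan from being executed silently.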
[Task Information]
This is the Overcooked environment. Two agents need to collaborate to cook soup and deliver the cooked soup to the service desk to earn a reward.
Each soup needs 2 onions: pick up 2 onions and put them into the cooking table for cooking. When the cooking table shows the number 2, the soup is finished, and you need to take a plate of soup and deliver it to the service desk, which earns the reward for one bowl of soup. You need to think and give the actions of the two agents so that a high enough reward is obtained after 30 interactions.
[Layout Information]
The room is a 5 × 5 grid, and the overall layout is:
XXXPX
O X P
O X X
D X X
XXXSX
[Character meaning]
The letter X stands for table, P for cooking station, O and o stand for onion, D and d for plates, and S for service desk.
When an onion or dish is on a table or being held by an agent, an o or d is added after the corresponding character.
When an onion is placed on the cooking table, it is denoted as p{ø; p{øø means that there are two onions on the cooking table.
When the cooking table is cooking the soup, it shows how long the soup has been cooking; for example, p{ø20 means that it has been cooking for 20 time steps.
The numbers 1 and 0 represent the agents, and the direction arrows ↑ ↓ ← → represent the direction the agent is facing. Each object occupies one grid cell, and the agent moves one grid cell at a time.
[Current Env state]:
X X X P X
O →1d X ←0 P{øø✓
O X X
D X X
X X X S X
Cook station local (0, 3) (1, 4), there are 2 onions on it, soup has been cooked
Onions local: (1, 0) (2, 0)
Dishes local: (3, 0)
Server desk local: (4, 3)
Agent0 local: (1, 3), direction: west, hold: nothing
Agent1 local: (1, 1), direction: east, hold: plate
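The coordinates listed above are consistent with reading the layout grid as (row, column), with rows counted from the top. The sketch below illustrates that reading; the names `LAYOUT`, `DELTAS`, `find`, and `faced_tile` are hypothetical, and the direction offsets (north means one row up) are an assumption rather than something the prompt states.

```python
# Layout copied from [Layout Information]; a space is an empty floor cell.
LAYOUT = [
    "XXXPX",
    "O X P",
    "O X X",
    "D X X",
    "XXXSX",
]

# Assumption: positions are (row, column) and north decreases the row index.
DELTAS = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}


def find(symbol):
    """Return every (row, column) whose layout character equals symbol."""
    return [
        (r, c)
        for r, row in enumerate(LAYOUT)
        for c, ch in enumerate(row)
        if ch == symbol
    ]


def faced_tile(pos, direction):
    """Return the neighbouring position in the given direction and its character.

    No bounds check: agents stand on interior floor cells in this layout.
    """
    dr, dc = DELTAS[direction]
    r, c = pos[0] + dr, pos[1] + dc
    return (r, c), LAYOUT[r][c]


print(find("P"))                    # [(0, 3), (1, 4)]
print(find("O"))                    # [(1, 0), (2, 0)]
print(faced_tile((1, 3), "west"))   # ((1, 2), 'X')
print(faced_tile((1, 3), "east"))   # ((1, 4), 'P')
```

Under this reading, Agent0 at (1, 3) facing west looks at the table at (1, 2), while the cooking station at (1, 4) is on its east side. That may be relevant to the feedback about misunderstanding the state of the environment.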
[Previous Round History]
== Previous Round# ==
[Chat History]
[Agent0]:
To continue our efficient teamwork and serve the cooked soup, we should have Agent1 move east to place the plate while Agent0 does nothing this time. Here's the plan:
Agent1 will move east to place the plate.
Agent0 will stay to wait for Agent1 to place the plate.
EXECUTE
NAME Agent0 ACTION stay
NAME Agent1 ACTION east
[Evaluation Score]:
The advantage score is high enough.
[Executed Action]
Agent0 action: STAY
Agent1 action: EAST
== Current Round ==
[Current Chat]
[Agent0]:
To continue our efficient teamwork and serve the cooked soup, Agent1 should do nothing because the soup is cooked.
Meanwhile, Agent0 should pick up the dish to deliver the soup.
Here's the plan:
Agent1 will stay.
Agent0 will interact to pick up the dish to deliver the soup.
EXECUTE
NAME Agent0 ACTION interact
NAME Agent1 ACTION stay
[Evaluation Score]:
[Improve Feedback]
Agent0's previous response:
NAME Agent0 ACTION interact
NAME Agent1 ACTION stay
The advantage score is -0.561, which is too low to run. The answer may contain a misunderstanding of the state of the environment; you must change your action to raise the advantage score above -0.5!
The score is low and the plan needs to be changed!
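The feedback above implies a simple gate: a proposed joint action is run only if its advantage score exceeds -0.5. A minimal sketch of that check follows. How the score itself is computed is not described in the source, so it is treated as a given number here, and `extract_score` and `should_execute` are hypothetical names.

```python
import re

THRESHOLD = -0.5  # taken from the feedback text


def extract_score(feedback):
    """Pull the numeric advantage score out of a feedback message, or None."""
    match = re.search(
        r"advantage score(?: score)? is (-?\d+(?:\.\d+)?)", feedback
    )
    return float(match.group(1)) if match else None


def should_execute(score, threshold=THRESHOLD):
    """A plan is run only when its score is strictly above the threshold."""
    return score > threshold


feedback = "The advantage score is -0.561, which is too low to run."
score = extract_score(feedback)
print(score, should_execute(score))  # -0.561 False
```

With a score of -0.561 the plan is rejected, so the agent is asked for a revised set of actions.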