List of tasks
Demo of reward for damaging mobs
Run hit_test.py. A simple agent with a diamond pickaxe walks randomly around an arena filled with pigs and sheep. The mob spawners are placed inside an animation object, which moves them out of the player's range after a short period of time; otherwise they would keep spawning, and every sheep the agent killed would immediately be replaced. The agent receives a reward of 1 when it attacks a sheep and a reward of -1 when it attacks a pig. Notice that the agent hits the sheep and often avoids the pigs, in order to maximize its reward.
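The following lines use the line-of-sight observation to help the agent determine when to attack and when to hold off:
# ob is the parsed JSON observation; guard in case the LineOfSight key is missing
los = ob.get(u'LineOfSight')
if los is not None and los["type"] == "Sheep":
    # press and release the attack key
    agent_host.sendCommand("attack 1")
    agent_host.sendCommand("attack 0")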
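The same decision can be factored into a pure function, which makes the sheep/pig trade-off easy to test outside Minecraft. This is a minimal sketch: `attack_commands` and `REWARD_BY_TYPE` are hypothetical names, and the observation is assumed to arrive as JSON text with the targeted entity reported in `LineOfSight["type"]`, as in the lines above.

```python
import json

# Rewards described above for hit_test.py: +1 for hitting a sheep, -1 for a pig.
REWARD_BY_TYPE = {"Sheep": 1, "Pig": -1}


def attack_commands(observation_text):
    """Return the commands to send for one observation (hypothetical helper).

    Attacks only when the line-of-sight target would yield a positive reward.
    """
    ob = json.loads(observation_text)
    los = ob.get("LineOfSight")
    if los is None:
        # No line-of-sight data in this observation: do not attack.
        return []
    if REWARD_BY_TYPE.get(los.get("type"), 0) > 0:
        # Press and release the attack key, as in hit_test.py.
        return ["attack 1", "attack 0"]
    return []


print(attack_commands('{"LineOfSight": {"type": "Sheep"}}'))
print(attack_commands('{"LineOfSight": {"type": "Pig"}}'))
```

Looking rewards up in a table rather than hard-coding `"Sheep"` means a new target type only needs a new dictionary entry.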
The agent also prioritises moving towards the areas with the largest sheep population, using the nearby-entities observation (see lines 214-239). This, again, maximizes the reward.
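One way to pick the "sheepiest" direction is to split the circle around the agent into equal sectors and count the sheep in each. This is a hypothetical sketch, not the code in hit_test.py: `best_yaw_towards_sheep` is an illustrative name, each entity is assumed to be a dict with `"name"`, `"x"` and `"z"` keys, and the Minecraft yaw convention (0 faces +z, 90 faces -x) is an assumption.

```python
import math
from collections import Counter


def best_yaw_towards_sheep(entities, agent_name, sectors=8):
    """Return the yaw (degrees) of the sector holding the most sheep, or None.

    Hypothetical helper: `entities` is assumed to be the nearby-entities list,
    with each entry a dict carrying at least "name", "x" and "z".
    """
    me = next(e for e in entities if e["name"] == agent_name)
    width = 360.0 / sectors
    counts = Counter()
    for e in entities:
        if e["name"] != "Sheep":
            continue
        dx, dz = e["x"] - me["x"], e["z"] - me["z"]
        # Assumed Minecraft convention: yaw 0 faces +z, yaw 90 faces -x.
        yaw = (-math.degrees(math.atan2(dx, dz))) % 360
        # Sectors are centred on multiples of `width` (0, 45, 90, ...).
        counts[int(round(yaw / width)) % sectors] += 1
    if not counts:
        return None  # no sheep in range
    sector, _ = counts.most_common(1)[0]
    return sector * width
```

Coarse sectors keep the agent from jittering between headings when individual sheep move slightly between observations.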
When the mission ends, a final reward signal arrives before the next mission starts. The platform then starts the next mission automatically.
Demo of rewards for building
The agent in build_test.py does the following to complete the build challenges:
- observes the source grid and determines the state of each block, using the continuous turn/pitch commands and the ObservationFromRay;
- strafes to the destination grid and uses the discrete use command and the inventory hotbar commands to reproduce the source grid.
To make this possible, the exact number of required blocks is placed into the agent's inventory, and the mission uses the DrawingDecorator, BuildBattleDecorator and RewardForStructureCopying.
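Working out "the exact number of required blocks" amounts to counting block types in the source grid. The sketch below is illustrative only: the grid is assumed to be a dict from `(x, z)` to a block type (with `None` or `"air"` for empty cells), and the emitted `<InventoryItem slot=... type=... quantity=.../>` elements assume the attributes of Malmo's inventory schema. Quantities above one stack (64) are not split across slots.

```python
from collections import Counter


def required_blocks(source_grid):
    """Count how many of each block type the copy needs (air is skipped)."""
    return Counter(b for b in source_grid.values() if b not in (None, "air"))


def inventory_xml(counts):
    """Emit one <InventoryItem> per block type, filling slots from 0.

    Assumes the slot/type/quantity attributes of Malmo's inventory schema;
    stacks larger than 64 are not split here.
    """
    items = []
    for slot, (block, n) in enumerate(sorted(counts.items())):
        items.append(f'<InventoryItem slot="{slot}" type="{block}" quantity="{n}"/>')
    return "\n".join(items)


grid = {(0, 0): "stone", (1, 0): "stone", (0, 1): "dirt", (1, 1): None}
print(inventory_xml(required_blocks(grid)))
```

Sorting the block types makes the slot assignment deterministic, so the agent can map each block type to a fixed hotbar key.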
The agent receives a reward of 1 per block and a reward of 1000 for completing the structure, after which the platform automatically starts the next mission.
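The reward scheme can be checked with a little arithmetic: copying a 3-block structure completely should earn 3 x 1 + 1000 = 1003. This is a simplified, cumulative model under stated assumptions (`structure_reward` is a hypothetical name; grids are dicts from position to block type and the source lists only positions that need a block). The real reward producer delivers its rewards as blocks are placed rather than in one total.

```python
def structure_reward(source, dest, per_block=1, completion_bonus=1000):
    """Total reward for a copy attempt under a simplified model.

    `source` maps each position that needs a block to its block type;
    `dest` maps positions to whatever the agent has built there.
    """
    matched = sum(1 for pos, block in source.items() if dest.get(pos) == block)
    reward = matched * per_block
    if matched == len(source):
        reward += completion_bonus  # bonus only when every block matches
    return reward


src = {(0, 0): "stone", (1, 0): "dirt", (2, 0): "stone"}
print(structure_reward(src, src))                   # complete copy
print(structure_reward(src, {(0, 0): "stone"}))     # one block placed
```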
build_test.py tests the following:
- Build battle decorator / reward producer
- Drawing decorator
- Inventory initialisation
- Discrete use command
- Continuous turn/pitch/strafe commands
- Observation from ray
- Inventory observations / hotkey commands
- Parsing the MissionEnded XML message
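The last item, parsing the MissionEnded XML message, can be sketched with the standard library. The message layout below (a `Status` element and `Reward/Value` elements with `dimension` and `value` attributes, in the `http://ProjectMalmo.microsoft.com` namespace) is an assumption for illustration; check the platform's MissionEnded schema before relying on it. `parse_mission_ended` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://ProjectMalmo.microsoft.com"}  # assumed namespace

# Assumed message shape, for illustration only.
SAMPLE = """<MissionEnded xmlns="http://ProjectMalmo.microsoft.com">
  <Status>ENDED</Status>
  <Reward><Value dimension="0" value="1003"/></Reward>
</MissionEnded>"""


def parse_mission_ended(xml_text):
    """Return (status, {dimension: reward}) from a MissionEnded message."""
    root = ET.fromstring(xml_text)
    status = root.findtext("m:Status", default="", namespaces=NS)
    rewards = {
        int(v.get("dimension")): float(v.get("value"))
        for v in root.findall("m:Reward/m:Value", NS)
    }
    return status, rewards


print(parse_mission_ended(SAMPLE))
```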
Demo of mob_spawner block
Run mob_fun.py. The agent walks around an arena filled with mobs and 20 randomly placed apples.
It receives a reward of 100 per apple collected, and its mission is to collect all the apples while dodging the mobs in order to stay alive.
By default, the mobs do not wander at random: they hunt and attack the agent, eventually killing it. The agent therefore needs to keep moving to stay alive, and it prioritises areas with the most apples, using the ObservationFromNearbyEntities and the following code (see also lines 154-171):
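def getBestAngle(entities, current_yaw, current_health):
    '''Scan through 360 degrees, looking for the best direction in which to take the next step.'''
    us = findUs(entities)  # the agent's own entry in the nearby-entities list
    scores = []
    # Normalise the current yaw into the range [0, 360]
    while current_yaw < 0:
        current_yaw += 360
    while current_yaw > 360:
        current_yaw -= 360
    # ... (excerpt; the rest of the function, which scores each direction, is in mob_fun.py)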
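Since the excerpt stops before the scoring step, here is a self-contained stand-in showing one way such a scan could work. It is not the scoring used in mob_fun.py: `best_angle` is a hypothetical name, apples are assumed to appear in the entity list with `"name": "apple"`, every other entity is treated as a threat, and the weights (1/distance, plus a small penalty for large turns) are illustrative choices.

```python
import math


def best_angle(entities, agent_name, current_yaw, step=36):
    """Score headings every `step` degrees and return the best yaw.

    Hypothetical scoring: apples attract (+1/distance), any other entity
    repels (-1/distance), and large turns are lightly penalised.
    """
    current_yaw %= 360  # normalise, as getBestAngle does with its loops
    us = next(e for e in entities if e["name"] == agent_name)
    best_score, best_yaw = -math.inf, current_yaw
    for yaw in range(0, 360, step):
        score = 0.0
        for e in entities:
            if e is us:
                continue
            dx, dz = e["x"] - us["x"], e["z"] - us["z"]
            dist = max(math.hypot(dx, dz), 1e-6)
            # Assumed Minecraft convention: yaw 0 faces +z, yaw 90 faces -x.
            ent_yaw = (-math.degrees(math.atan2(dx, dz))) % 360
            diff = abs((ent_yaw - yaw + 180) % 360 - 180)  # in [0, 180]
            if diff > step / 2:
                continue  # entity lies outside this heading's wedge
            score += (1.0 if e["name"] == "apple" else -1.0) / dist
        turn = abs((yaw - current_yaw + 180) % 360 - 180)
        score -= 0.001 * turn
        if score > best_score:
            best_score, best_yaw = score, yaw
    return best_yaw
```

The small turn penalty breaks ties in favour of the current heading, so the agent keeps moving instead of spinning on the spot.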
Each time the agent fails to dodge the mobs and dies, the platform automatically starts a new mission, until the agent finally completes the mission and earns a larger reward.
Demonstrating use of basic crafting
Run craft_work.py. The agent receives a mission to make rabbit stew. This requires it to walk around and collect or craft the items needed: a bowl, a cooked rabbit, a carrot, a mushroom and a baked potato. The agent receives positive rewards for collecting items and negative rewards for discarding items (see the mission XML). Before crafting, the agent is commanded to check its inventory (e.g. see line 264). Three craft commands are required to make the intermediate items before the final product, the rabbit stew:
- craft 1 cooked_rabbit = 1 rabbit + fuel
- craft 1 baked_potato = 1 potato + fuel
- craft 4 bowls = 3 planks
These can then be used to make the rabbit stew:
- craft 1 rabbit_stew = 1 cooked_rabbit + 1 carrot + 1 baked_potato + 1 brown_mushroom + 1 bowl (the agent receives a reward of 1000 and the mission ends)
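The recipe chain above can be simulated to check that the inventory holds enough ingredients before each craft. This is a sketch under assumptions: `RECIPES` and `craft` are hypothetical names, `"fuel"` is a placeholder item consumed once per smelted item (real furnace fuel works differently), and the returned strings follow the `craft <item>` form of Malmo's simple crafting commands.

```python
from collections import Counter

# Recipes from the list above: item -> (ingredients, number produced).
# "fuel" is a placeholder, consumed once per smelted item in this model.
RECIPES = {
    "cooked_rabbit": ({"rabbit": 1, "fuel": 1}, 1),
    "baked_potato": ({"potato": 1, "fuel": 1}, 1),
    "bowl": ({"planks": 3}, 4),
    "rabbit_stew": ({"cooked_rabbit": 1, "carrot": 1, "baked_potato": 1,
                     "brown_mushroom": 1, "bowl": 1}, 1),
}


def craft(inventory, item):
    """Check the inventory, craft `item` once and return the command string."""
    ingredients, produced = RECIPES[item]
    missing = {k: n - inventory[k] for k, n in ingredients.items() if inventory[k] < n}
    if missing:
        raise ValueError(f"cannot craft {item}, missing {missing}")
    for k, n in ingredients.items():
        inventory[k] -= n
    inventory[item] += produced
    return f"craft {item}"


inv = Counter(rabbit=1, potato=1, planks=3, carrot=1, brown_mushroom=1, fuel=2)
plan = ["cooked_rabbit", "baked_potato", "bowl", "rabbit_stew"]
print([craft(inv, item) for item in plan])
```

Using a `Counter` means absent items count as zero, so the inventory check needs no special case for ingredients the agent has never picked up.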