What if we set GPT-4 free in Minecraft?
Comments
> 9) Do not write infinite loops or recursive functions.
> Sometimes GPT-4 will write an infinite loop that runs forever.
> You are a helpful assistant that tells me the next immediate task to do in Minecraft. My ultimate goal is to discover as many diverse things as possible, accomplish as many diverse tasks as possible and become the best Minecraft player in the world.
> 8) Tasks that require information beyond the player's status to verify should be avoided. For instance, "Placing 4 torches" and "Dig a 2x1x2 hole" are not ideal since they require visual confirmation from the screen. All the placing, building, planting, and trading tasks should be avoided. Do not propose task starting with these keywords
> 7) Use `exploreUntil(bot, direction, maxDistance, callback)` when you cannot find something. You should frequently call this before mining blocks or killing mobs. You should select a direction at random every time instead of constantly using (1, 0, 1).
> 9) Do not write infinite loops or recursive functions.
You can really imagine the pitfalls the agent must have fallen into that led the authors to add these stipulations.
The Minecraft videos are impressive.
NetHack (https://www.nethack.org/) has been used for AI research, both in the past and more recently:
http://shelf2.library.cmu.edu/Tech/9997774.pdf
https://portfolios.cs.earlham.edu/wp-content/uploads/2018/12...
https://arxiv.org/abs/2211.00539
https://proceedings.neurips.cc/paper/2020/hash/569ff987c643b...
https://github.com/facebookresearch/nle
https://ojs.aaai.org/index.php/AIIDE/article/view/12923
I am curious how well Voyager would do in NetHack.
I'd like to see a visual/language model that learns to play Minecraft as an actual inhabitant of the game, i.e. processing visual input, recognising objects, working out what's going on, learning how to move around, and learning how to make food and avoid monsters. It would be an 'embodied AI' within the world of Minecraft.
The language part would allow us to talk to this being. You could ask it things like:
"Do you prefer to make a house, or dig a cave?"
"How do you feel, when you hear a monster outside at night?"
As long as they're all still "special", single-purpose systems (e.g. an LLM processes and responds to language input, while computer vision models specialize in visual or image input), that's all they'll ever be, no matter how good they get at pretending to be more.