In every step, an agent receives an observation and takes an action in response to it. The agent can use any decision-making algorithm to choose the action, as long as the action conforms to the prescribed format. In the base environment of Civrealm, an action is a tuple consisting of the type of actor, the index of the actor, and the name of the action, in that order. The type of actor specifies which type of actor the agent wants to control in this step. The type can be one of:
unit - The 'unit' actor handles many fine-grained operations. They can be categorized into three main types: engineering actions, which handle tasks like city construction, planting, mining, and more; movement actions, including moving, transportation, embarking, and so on; and military actions, such as attacking, fortifying, bribing, etc.
city - The 'city' actor develops and manages cities. Their actions include unit production, building construction, city worker assignment, and more.
dipl - The 'dipl' actor is in charge of diplomacy actions including trading technologies, negotiating ceasefires, forming alliances, etc.
gov - The 'gov' actor allows the agent to change the government type to gain corresponding political benefits, adjust tax rates to balance economic expansion and citizen happiness, etc.
tech - The 'tech' actor sets immediate or long-term goals for their technology research.
The index of the actor specifies which actor of the specified type the agent wants to control. The name of the action denotes the specific action the agent requires the specified actor to take. For instance, if the agent wants to let a unit whose index is 111 plant trees, the action will be of the format ('unit', 111, 'plant').
The observations returned by the environment include the indexes of the actors who can take actions in this step and the names of their actions that are allowed to be taken in the current state.
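As a minimal sketch of the action format described above, the following builds and checks an action tuple. The helper `make_action` and the set `ACTOR_TYPES` are hypothetical names used only for illustration; they are not part of the Civrealm API.

```python
from typing import Tuple

# Actor types listed in this section.
ACTOR_TYPES = {"unit", "city", "dipl", "gov", "tech"}

# (type of actor, index of actor, name of action)
Action = Tuple[str, int, str]


def make_action(actor_type: str, actor_index: int, action_name: str) -> Action:
    """Build an action tuple (hypothetical helper, not a Civrealm function)."""
    if actor_type not in ACTOR_TYPES:
        raise ValueError(f"unknown actor type: {actor_type!r}")
    return (actor_type, actor_index, action_name)


# The example from the text: the unit with index 111 plants trees.
action = make_action("unit", 111, "plant")
print(action)
```

Checking the actor type up front catches malformed tuples before they are sent to the environment.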
For more details about the actions that each type of actor can take, please refer to the following tables.
Unit Actions
Action Class
Action Description
Go to a target tile
the target tile
Enter a hut in the target tile for random events
Embark on a target boat mooring in an ocean tile
the target boat unit
Disembark from a target boat mooring in an ocean tile
Unload all units carried by the transporter unit
Board a boat mooring in the city of the current tile
Deboard a boat mooring in the city of the current tile
Fortify in the current tile
Attack the unit in a target tile
the target tile
Bribe a unit of other players to join us
the target unit
Conquer a city belonging to other players
the target city
Sabotage a city belonging to other players
Steal technology from a city belonging to other players
Mine in the current tile
Irrigate in the current tile
Build road in the current tile
Build railroad in the current tile
Plant trees in the current tile
Build a city in the current tile
Build airbase in the current tile
Build fortress in the current tile
Transform the terrain of the current tile
Pillage an infrastructure in the current tile
Cultivate the forest in the current tile into a plain
Upgrade the unit
Disband the unit itself to save cost
Keep the current activity in this turn
Set the unit's home city as the city in the current tile
Join the city in the current tile (increase city population)
Sell goods in the target city's marketplace
the target city
Investigate a target city belonging to other players
Establish an embassy in a target city belonging to other players
Establish a trade route from the unit's home city to the target city
City Actions
Action Class
Action Description
Choose a working tile for city
the target tile
Do not work on a tile
Buy building or unit
Change the type of a specialist
type of the target specialist
Sell a building
the target building
Construct a building
Produce a unit
the target unit
Diplomacy Actions
Action Class
Action Description
Start a negotiation
target player ID
End a negotiation
Accept treaties
Cancel a treaty
Cancel sharing vision
Add a basic clause
target player ID + target basic clause type
Add a trading tech clause
target player ID + giver ID + target technology ID
Add a trading gold clause
target player ID + giver ID + how much gold
Add a trading city clause
target player ID + giver ID + target city ID
Remove a clause
target player ID + parameters of the target clause
In the full-game, each player acts as a civilization leader. The objective of players is to guide their civilization from its humble beginnings to the greatest civilization. Civilizations evolve through eras from the Bronze Age to the Space Age and the number of controllable objects (units, cities, diplomatic relations, etc.) explodes as the game progresses. In addition, each decision made typically carries a multi-faceted impact, encompassing both long-term strategic consequences and short-term tactical outcomes. It is worth noting that a favorable tactical outcome may not necessarily translate into a positive strategic outcome. For instance, the immediate construction of a city at the beginning of the game can yield greater resources in the early stages (a tactical advantage). In contrast, settling in a resource-rich area after thorough exploration may result in substantial resource accumulation over the long haul (a strategic advantage).
Besides the long decision-making horizon, multi-faceted decision impacts, and huge state-action spaces, the full-game exhibits additional characteristics that elevate its complexity:
Imperfect info. Players typically only gain the information discovered by their own units and cities, resulting in partially observable states. Players may also obtain others’ vision by diplomatic actions.
Stochastic. The dynamics of the environment are stochastic. Moreover, there exist random events and crises that can disrupt plans, forcing players to adapt on the fly and make tough decisions.
Multi-goal. There are multiple victory paths, i.e., (1) military: conquering all other civilizations; (2) science: being the first civilization that launches a spacecraft destined for Alpha Centauri; and (3) time: obtaining the highest score, computed based on criteria such as civilization size, wealth, cultural accomplishments, and scientific advancements, before reaching a predetermined number of turns, in case the first two conditions are not met. These paths necessitate a delicate balance between economic expansion, military development, diplomatic influence, cultural achievements, and technological research, which poses a high requirement for learning and reasoning.
Dynamic space. As the game unfolds, players continuously produce new units, construct additional cities, engage in battles, and conquer other players. Consequently, the state and action space of a player undergo dynamic changes throughout the gameplay. Designing an effective decision model for the agent to adapt to this evolving space presents a significant challenge.
Multi-agent. Multiple players can interact with one another, including hand-crafted AI players provided by Freeciv on the server side. CivRealm allows multiple agents to connect to the same game simultaneously, facilitating self-play training.
General-sum. Players are free to form alliances or wage war against others, rendering the full game a general-sum game that necessitates considerations of both cooperative and competitive strategies.
Changing players. The number of players can fluctuate during a game due to factors like revolts or civilization conquests, introducing new players controlled by built-in AI or removing existing ones. Such changes often result in significant alterations to the state-action space.
Communication. Players can communicate explicitly using two types of communication: diplomatic actions (e.g., adding a clause) and natural language chat through a chat box. This feature enriches player interactions and enables LLM agents to fully leverage their natural language processing capabilities.
To implement an agent that plays the full-game, it is important to understand the content of the observations returned by the environment and the format of the actions required by the environment. Please check Observations and Actions for more details.
The environment provides essential information about the game state, with the most crucial details being observations and info.
The info returned includes details about the current turn of the game and the actions available to the agent at this step. The dictionary "info['available_actions']" encompasses keys such as 'unit', 'city', 'dipl', 'gov', and 'tech', signifying the types of actors the agent can control. Within each actor type, sub-dictionaries exist, where the keys represent possible actions for the respective actor type, and the values are boolean indicators of action availability. Using this information, the agent can make informed decisions by selecting appropriate actions from the available set.
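A minimal sketch of selecting from the available set is shown below. It runs on mock data and assumes the nesting actor type -> actor index -> action name -> boolean; that layout, and the helper names `list_valid_actions` and `pick_random_action`, are assumptions for illustration rather than the confirmed Civrealm structure.

```python
import random
from typing import Dict, List, Optional, Tuple

# Assumed layout: actor type -> actor index -> action name -> availability.
AvailableActions = Dict[str, Dict[int, Dict[str, bool]]]
Action = Tuple[str, int, str]


def list_valid_actions(available: AvailableActions) -> List[Action]:
    """Collect every (actor type, actor index, action name) marked available."""
    valid = []
    for actor_type, actors in available.items():
        for actor_index, actions in actors.items():
            for action_name, is_available in actions.items():
                if is_available:
                    valid.append((actor_type, actor_index, action_name))
    return valid


def pick_random_action(available: AvailableActions,
                       rng: random.Random) -> Optional[Action]:
    """Pick one available action at random, or None if nothing is available."""
    valid = list_valid_actions(available)
    return rng.choice(valid) if valid else None


# Mock info dictionary (not real environment output).
info = {
    "available_actions": {
        "unit": {111: {"plant": True, "mine": False}},
        "city": {},
        "dipl": {},
        "gov": {},
        "tech": {},
    }
}
chosen = pick_random_action(info["available_actions"], random.Random(0))
print(chosen)
```

Returning None when no action is available lets the caller decide how to end the turn.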
The observations returned reflect the game state in the current time step. They form a dictionary whose keys correspond to different aspects of decision-making in the game. The keys of observations useful for training include:
map - Overview on the map, i.e., status (known, visible, etc.), terrain types and potential extras.
unit - Overview of the status (health, activity, moves left, etc.) of units. Using this key on observation (i.e., observations['unit']) will retrieve a dictionary whose keys are the indexes of units, and the content under each unit index is the status of the corresponding unit.
city - Overview of the status of cities (improvements, production, etc.). Using this key on observation (i.e., observations['city']) will retrieve a dictionary whose keys are the indexes of cities, and the content under each city index is the status of the corresponding city.
player - Overview of the status of players. Using this key on observation (i.e., observations['player']) will retrieve a dictionary whose keys are the indexes of players, and the content under each player index is the status of the corresponding player. The status of each player conveys information about multiple aspects, including diplomacy, government, and technology.
For additional details regarding the observations, kindly consult the tables provided below. It's important to note that information pertaining to diplomacy, government, and technology within the observations' 'player' data is described separately.
There exist other keys besides the above keys. However, the information under those keys is irrelevant to training, so we do not describe them here.
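The keyed access described above can be sketched on a mock observation dictionary. The field names `health` and `moves_left` and the helper `units_with_moves` are hypothetical; the real field names should be taken from the tables below.

```python
from typing import Any, Dict, List

# Mock observations with the four keys described above (values are made up).
observations: Dict[str, Dict[Any, Any]] = {
    "map": {},
    "unit": {
        111: {"health": 10, "moves_left": 3},
        112: {"health": 5, "moves_left": 0},
    },
    "city": {},
    "player": {},
}


def units_with_moves(obs: Dict[str, Dict[Any, Any]],
                     moves_key: str = "moves_left") -> List[int]:
    """Return the indexes of units that can still move this turn."""
    return sorted(
        unit_index
        for unit_index, status in obs["unit"].items()
        if status.get(moves_key, 0) > 0
    )


print(units_with_moves(observations))
```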
Observations of Map
Value domains
Basic map
[0, 2]
Size: M * N
Type of terrain
[0, 13]
Owner of tiles
[0, 255]
0 or 1
34 layers of size M * N
6 layers of size M * N for 6 output types
Units and city on each tile
Unit owner
[0, 255]
Size: M * N
City owner
Unit distribution
52 layers of size M * N for 52 unit types
Observations of Unit
Value domains
Common unit field
[0, M]
[0, N]
[0, 255]
Player the unit belongs to
[0, 65535]
Health point of the unit
Produce cost
Cost needed to produce this type of unit
0 or 1
Whether the unit is veteran
Can transport
Whether the unit can transport other units
Unit type
[0, 51]
One of 52 unit types
Obsoleted by
The unit type this unit can upgrade to
Attack strength
[0, 65535]
Affect the attack success rate
Defense strength
The damage of a successful attack
My unit field
Unit ID
[0, 32767]
The ID of the unit
Moves left
Actions the unit can take in this turn
Home city
City that supports this unit
Upkeep shield
Resources needed to support this unit
Upkeep gold
Upkeep food
Observations of City
Value domains
Common city field
City name
The name of city
[0, M]
[0, N]
[0, 255]
Player this city belongs to
The size of this city
My city field
City ID
[0, 32767]
The ID of the city
Food stock
The food stock of the city
Shield stock
The shield stock of the city
Granary size
The granary size of the city
Buy cost
Cost to buy the undergoing production
Turns to complete
Number of turns to finish the current production
Resource outputs in each turn
City waste
The waste of the city
City corruption
The corruption of the city
City pollution
The pollution of the city
Growth in
Number of turns for city population to grow
[0, 2]
City state: disorder, peace, etc.
Production kind
[0, 1]
Unit or building
Production value
[0, 67]
Unit or building type being produced
People angry
[0, 127]
Number of people of each mood
People unhappy
People content
People happy
Surplus food
[-32768, 32767]
The surplus of the resource
Surplus gold
Surplus shield
Surplus trade
Can build unit
0 or 1
Binary vectors corresponding to units or buildings
Can build building
Having Buildings
Last completion turn
Turn Number when the city completed the last production
Observations of Diplomacy
Common player field
Player ID
[0, 255]
The ID of player
The ID of team
The name of the player
Is alive
0 or 1
Whether the player is alive or not
[0, 65535]
The score of the player
Turns alive
How many turns the player has lived for
[0, 559]
The nation of the player
Embassy text
Describe if there are embassies between players
Describe players’ attitudes to others
My player field
0 or 1
Peaceful or Combat
Diplomacy state
[0, 6]
A categorical vector of my diplomacy states with other players: armistice, war, ceasefire, etc.
Observations of Government
Common government fields
Government ID
[0, 6]
The ID of the government
Government name
The name of the government
My government fields
Goal government
[0, 6]
The goal of revolution
[0, 65535]
Gold in treasury
Revolution finishes
Number of turns for current revolution to complete
[0, 100]
Government investment for each aspect. Sum to 100.
The game settings are specified in the default_settings.yaml file under the civrealm folder. To override a setting, you can do either of the following:
Directly change the value in the default_settings.yaml file, or
Use the --setting argument. For example, to set the maximum number of turns to 5 for a quick test, you can use the following command:
The details of the game setting are as follows:
Basic Settings for Game Play
Default value
The user name used to log in the game. Spaces and underscores are not allowed.
The maximum number of turns per game. The game will end after the turn number reaches this limit.
The URL that hosts the game.
The port to be connected by the client. Used in the single-thread mode. For parallel running, the client port is chosen from available ports.
Whether to start a multiplayer game or single-player game.
Whether to use the hotseat mode in a single-player game.
Whether to wait for an observer to join before starting the game. If set to True, the game will not start until a user observes the game through the browser.
The duration in seconds before considering the server as timed out. In pytest, we automatically set it to be 5 in
Sometimes we perform an invalid action and cannot receive the response to wait_for_packs. We wait for wait_for_timeout seconds and clear the wait_for_packs to prevent the process from sticking.
Sometimes, the server is stuck for unknown reasons and will not return a begin_turn packet. We wait for begin_turn_timeout seconds before we close the environment. This configuration should be carefully set when playing with human or AI agents because they may take more than 60 seconds for each of their turns. In those cases, the server will send a begin_turn packet only after they finish their turns.
Whether in pytest mode. In pytest, we automatically set it as True in
The number of episode scores maintained in the ParallelTensorEnv.
The number of AI players to be initialized when a game starts.
The maximum number of players allowed to join a game.
Whether to start multiple clients for self-play. When it is true, the following clients will add an increasing index to their username for login to the same game. Note that when running pytest, we automatically set this as False. By doing so, different tests connecting to the same port will raise an exception and force one test to re-select a random new port.
The minimum number of players needed for starting a game.
Decides whether one can take control/observe a player. Please check the details of this setting in the Freeciv instruction.
Whether to allow an AI to control a player when the previous player disconnects.
Whether to end the game when some players succeed under victory conditions. Options: enabled, disabled.
Victory condition options: SPACERACE|ALLIED|CULTURE. If a certain victory condition is not set, the calculation logic for that victory condition will be skipped.
Whether to open the chatbox in the web interface. Options: enabled, disabled.
The ruleset to be used.
The type of runner.
Settings for parallel running
Default value
Number of environments running simultaneously to sample experience.
Number of epochs to run. Each epoch runs batch_size_run environments.
The port used by the first environment; other parallel environments use ports following this port.
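The relationship between these parallel settings can be sketched as follows. The sketch assumes that "ports following this port" means consecutive port numbers; the function names, the parameter names `port_start` and `epoch_num`, and the port value 6300 are hypothetical.

```python
from typing import List


def parallel_ports(port_start: int, batch_size_run: int) -> List[int]:
    """Ports for one batch, assuming consecutive numbering from port_start."""
    return [port_start + offset for offset in range(batch_size_run)]


def total_environment_runs(epoch_num: int, batch_size_run: int) -> int:
    """Each epoch runs batch_size_run environments."""
    return epoch_num * batch_size_run


print(parallel_ports(6300, 3))
print(total_environment_runs(5, 3))
```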
Settings for debug
Default value
Records the game state and available actions at every step. Warning: generates many log files if True. Turned off by default.
Take screenshots during playing. wait_for_observer flag should be set to True if enabling screenshots. You can execute the 'update_javascript_for_clean_screenshot' command first for generating cleaner screenshots. Warning: generates many log files if True. Turned off by default.
Global view screenshot.
The sleep time after each turn.
The headless mode for the browser when take_screenshot is true.
The window size (width) for screenshot.
The window size (height) for screenshot.
Get webpage image data by locating elements by ID on the web page. wait_for_observer flag should be set to 'True' if enabling screenshots. Options include, but are not limited to: ['cities_tab', 'tech_tab', 'players_tab', 'civ_tab', 'map_tab'].
If true, auto save the game at the beginning of every turn. The save will be deleted at the end of that turn unless the program finds issues in that turn.
If true, will save the game when using KeyboardInterrupt. Note that when using this, autosave should be disabled. Otherwise, the game will be saved at the beginning of every turn and cannot be saved again when using KeyboardInterrupt.
Password used to log in to the Freeciv-web account.
The name of the saved game to be loaded.
Whether to use randomly generated seeds for running games. If True, the following random seeds (mapseed, gameseed, agentseed) are ignored.
The seed for generating a map. The same seed leads to the same map.
The seed for fixing the behavior of random outputs.
The seed for fixing the action sequence when the game/map is fixed.
Whether to print debug information for tensor env.
The path to the directory that stores the log files. Use null for the default path 'civrealm/logs'.
Welcome to Civrealm LLM Agent! This documentation will guide you through the process of building LLM agents in the CivRealm Environment. We first provide an overview of the CivRealm LLM Env, followed by instructions on how to use the civrealm-llm-baseline repository to build the LLM agents Mastaba and BaseLang on this environment.
🌏 Civrealm LLM Environment
The Civrealm LLM Environment is an LLM environment wrapped around the Civrealm Base Env and designed specifically for building LLM agents. This environment
provides observations of each actor in natural language
provides valid actions of each actor in natural language
restricts valid actions in order to reduce meaningless actions
executes actions described by natural language
In addition, an LLM wrapper is available for customizing your own environment.
Observations and actions in natural language are stored in llm_info as:
llm_info is a Dict consisting of 6 subspaces with keys "player", "city", "tech", "unit", "dipl", and "gov". The "unit" subspace is a Dict keyed by unit_id, and the "city" subspace is a Dict keyed by city_id; each entry describes the "name", "available_actions", and local "observations" of the corresponding unit or city. The subspaces "player", "tech", "dipl", and "gov" are currently empty and will be completed in the future, meaning that the LLM Env only supports controlling units and cities by natural language at this stage.
Read llm_info of "unit 121" by:
Choose an existing unit or city
You should read the llm_info of a unit or city that currently exists.
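A minimal sketch of a guarded lookup is given below, using a mock llm_info with the six subspaces described above. Whether ids are stored as integers or strings is an assumption here, and `read_actor_info`, the unit name, and the action names are hypothetical.

```python
from typing import Any, Dict, Optional

# Mock llm_info (contents are made up for illustration).
llm_info: Dict[str, Dict[Any, Dict[str, Any]]] = {
    "player": {},
    "city": {},
    "tech": {},
    "unit": {
        121: {
            "name": "Settlers",
            "available_actions": ["build_city", "fortify"],
            "observations": {},
        }
    },
    "dipl": {},
    "gov": {},
}


def read_actor_info(info: Dict[str, Dict[Any, Dict[str, Any]]],
                    actor_type: str,
                    actor_id: Any) -> Optional[Dict[str, Any]]:
    """Return the entry of an existing unit or city, or None if it is absent."""
    return info.get(actor_type, {}).get(actor_id)


unit_info = read_actor_info(llm_info, "unit", 121)
if unit_info is not None:
    print(unit_info["name"], unit_info["available_actions"])
```

Returning None for a missing id avoids a KeyError when a unit has been lost or a city captured since the previous step.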
BaseLang consists of three components: observation, reasoning, and commands. For observation, a 5 * 5 tile-based observation is employed, centered on each unit's location, optimizing tactical information provision while accommodating strategic depth. The reasoning module mimics AutoGPT and outputs in three stages: thought, reasoning, and command. Commands empower the agent with the choice between "manual and history search" and "final decision" commands, enabling data retrieval from a vector database or selecting available actions to execute based on environmental and historical context. Finally, individual LLMs are assigned to each unit, with their context histories, to facilitate detailed planning.
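The 5 * 5 unit-centered observation can be sketched as a window cut from a tile grid. This is an illustrative sketch rather than BaseLang's implementation: the grid is a plain list of rows, `local_window` is a hypothetical name, and filling off-map tiles with None is an assumption about edge handling.

```python
from typing import List, Optional, Sequence


def local_window(grid: Sequence[Sequence[int]], x: int, y: int,
                 radius: int = 2) -> List[List[Optional[int]]]:
    """Return a (2 * radius + 1)-wide square window centered on tile (x, y).

    Tiles outside the map are filled with None (an assumption).
    """
    height = len(grid)
    width = len(grid[0])
    window: List[List[Optional[int]]] = []
    for dy in range(-radius, radius + 1):
        row: List[Optional[int]] = []
        for dx in range(-radius, radius + 1):
            cx, cy = x + dx, y + dy
            inside = 0 <= cx < width and 0 <= cy < height
            row.append(grid[cy][cx] if inside else None)
        window.append(row)
    return window


# Mock 6 x 6 map where each tile value encodes its coordinates as 10 * y + x.
tiles = [[10 * yy + xx for xx in range(6)] for yy in range(6)]
view = local_window(tiles, 3, 3)
print(view[2][2])  # the center tile, i.e. the unit's own location
```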
To facilitate cooperation between independent entities, Mastaba introduces a hierarchical structure, organizing LLM workers, observations, and decision-making into a pyramid-like structure.
LLM Workers.
Within Mastaba, LLM workers are structured as two layers. At the pinnacle is the "advisor", tasked with overseeing all other LLM workers. The advisor monitors the holistic nationwide perspective, including unit counts, city metrics, and enemy player information. At the operational level, Mastaba maintains LLM workers that resemble BaseLang's structure.
Mastaba adopts a pyramid-like map view, condensing data from a 15 * 15 tile region into 9 blocks, each spanning 5 * 5 tiles. This design enables entities to grasp information within a broader range while managing prompt loads effectively, thereby elevating map awareness.
Mastaba's decision-making workflow follows its agent structure. The advisor initiates each turn with a nationwide assessment, encompassing cities, units, and potential threats. It generates suggestions during each turn and communicates them to other LLM workers, who independently select actions for their entities. Additionally, workers possess the capability to query a vector database for knowledge, enabling informed decisions based on manual or stored experiences.
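The pyramid-like map view can be illustrated by splitting a 15 * 15 region into nine 5 * 5 blocks and summarizing each block. How Mastaba actually condenses a block is not specified here, so the per-block count below is only an assumed stand-in, and both function names are hypothetical.

```python
from typing import List

Grid = List[List[int]]


def split_into_blocks(region: Grid, block: int = 5) -> List[Grid]:
    """Split a square region into block x block sub-grids, row by row."""
    size = len(region)
    blocks: List[Grid] = []
    for top in range(0, size, block):
        for left in range(0, size, block):
            blocks.append([row[left:left + block]
                           for row in region[top:top + block]])
    return blocks


def count_per_block(blocks: List[Grid], value: int) -> List[int]:
    """Condense each block into one number: how many tiles equal `value`."""
    return [sum(row.count(value) for row in blk) for blk in blocks]


# Mock 15 x 15 region with a single marked tile in the middle.
region = [[0] * 15 for _ in range(15)]
region[7][7] = 1
blocks = split_into_blocks(region)
print(len(blocks), count_per_block(blocks, 1))
```

Condensing each block to a few numbers keeps the prompt short while still covering a wider area than the 5 * 5 unit view.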
🏃 Using civrealm-llm-baseline Repository
The civrealm-llm-baseline repository is a collection of code and utilities that provides a baseline implementation for building LLM agents. It includes two agents, Mastaba and BaseLang, for the Civrealm LLM Environment.
🏌️ Getting Started
To get started, follow these steps:
Clone the civrealm-llm-baseline repository from GitHub and enter the directory:
Set environment variables. The following groups are independent. Set only one group and use that group. OpenAI GPT is preferred.
# Group 1
export OPENAI_API_TYPE=<api-type> # e.g. 'azure'
export OPENAI_API_VERSION='<openai-api-version>'
export OPENAI_API_BASE=<openai-api-base> # e.g. ''
export OPENAI_API_KEY=<openai-api-key>
export DEPLOYMENT_NAME=<deployment-name> # e.g. 'gpt-35-turbo-16k'
# Group 2
export AZURE_OPENAI_API_TYPE=<azure-openai-api-type> # e.g. 'azure'
export AZURE_OPENAI_API_VERSION=<azure-openai-api-version> # e.g. '2023-05-15'
export AZURE_OPENAI_API_BASE=<azure-openai-api-base>
export AZURE_OPENAI_API_KEY=<azure-openai-api-key>
# Group 3
export LOCAL_LLM_URL=<local-llm-url> # You may choose to use local LLM.
# Pinecone
export MY_PINECONE_API_KEY=<pinecone-api-key> # Necessary. Free account is enough.
export MY_PINECONE_ENV=<pinecone-env> # e.g. 'gcp-starter'
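Before launching an agent, it can help to check which group of variables is fully set. The following sketch does this with the variable names listed above; the group labels (`openai`, `azure_openai`, `local_llm`) and the function names are hypothetical, and the check is not part of the civrealm-llm-baseline repository.

```python
import os
from typing import Dict, List, Mapping

# Variable names are taken from the groups above; the labels are made up.
ENV_GROUPS: Dict[str, List[str]] = {
    "openai": ["OPENAI_API_TYPE", "OPENAI_API_VERSION", "OPENAI_API_BASE",
               "OPENAI_API_KEY", "DEPLOYMENT_NAME"],
    "azure_openai": ["AZURE_OPENAI_API_TYPE", "AZURE_OPENAI_API_VERSION",
                     "AZURE_OPENAI_API_BASE", "AZURE_OPENAI_API_KEY"],
    "local_llm": ["LOCAL_LLM_URL"],
}
PINECONE_VARS = ["MY_PINECONE_API_KEY", "MY_PINECONE_ENV"]


def complete_groups(env: Mapping[str, str]) -> List[str]:
    """Return the labels of groups whose variables are all set and non-empty."""
    return [label for label, names in ENV_GROUPS.items()
            if all(env.get(name) for name in names)]


def missing_pinecone(env: Mapping[str, str]) -> List[str]:
    """Return the Pinecone variables that are not set."""
    return [name for name in PINECONE_VARS if not env.get(name)]


if __name__ == "__main__":
    print("complete groups:", complete_groups(os.environ))
    print("missing Pinecone variables:", missing_pinecone(os.environ))
```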
In this guide, we introduced the CivRealm LLM Environment and explained how to use the civrealm-llm-baseline repository to build LLM agents on this environment. We encourage you to experiment with different LLM frameworks to further enhance your agent's performance.
The general pipeline for creating a new mini-game is as follows:
graph TD
  A(<a href="#be-clear-about-what-to-do">Be clear about \nwhat to do</a>) --> D(<a href="#use-gtk-tool-to-generate-basic-archive">Use gtk tool to generate basic archive</a>)
  D --> F(<a href="#set-mini-game-messages-by-lua-script">Set mini-game messages by lua script</a>)
  F --> E(<b><a href="#large-batch-auto-random-generation">Large batch auto random generation</a></b>)
  E --> H{<a href="#mini-game-validation">Mini-Game Validation</a>}
  H --> |Not Pass| J{Bug from \nCivrealm}
  J --> |Yes| K[Contribute bugfix for Civrealm]
  J --> |No| D
  H --> |Pass| I[Create new Mini-Game successfully]
Be clear about what to do
The basic design principles of a mini-game are:
Single Goal. Do not target multiple learning objectives at the same time; each additional influence makes the game less "mini".
Feasible Action. Within the huge action space, be clear about which actions are relevant to your goal, and keep unrelated or contradictory actions out of the set of available actions.
Computable Reward. In addition to the final score at the end of the game, the reward for each step or turn can be defined and calculated.
When you start designing a mini-game, you have to answer the following questions:
What type of mini-game do you want to design?
When does the mini-game end?
How is the reward for each step calculated?
How is the difficulty of the game set?
The sections below offer suggestions for answering these questions.
Use gtk tool to generate basic archive
The freeciv-gtk tool is provided by the official Freeciv team to help design the basic version of each mini-game. Please follow the instructions in (( to install and run the tool, then specify the game settings and ruleset, which would look like:
After starting a new game, use 'Edit -> Edit Mode' to design the scenario as you expect, which gives you an initial version of the savegame. Save the edited scenario so that you can further edit it or load it in the game. After that, you can add messages and generate random maps based on it, as described below.
Set mini-game messages by lua script
In general, do not modify the sav file directly. The fields in the sav file depend on each other, so if you modify one field without also updating the fields that depend on it, the server will fail to load the sav file.
The Lua script is used to send mini-game messages to the agent. Before adding the Lua script to the basic sav file, you need to understand the Freeciv archive format and how it defines the game state internally.
The game archive file has the suffix .sav and is usually compressed, with an .xz or .zst suffix. If the archive file is compressed, you need to decompress it with the corresponding tool to get the sav file.
In the sav file, many key-value structures describe the current game state. Here, we list the main tags and their explanations:
A set of definition rules for common elements, including activity, technology, etc.
The base state values of the game, such as turn, launch order and year.
The script of lua. At the inherent or designed trigger points of the game, obtain the internal data of the game, calculate the custom game state values, and send out event messages.
The setting of freeciv server.
The global map of world, and distribution information of resources, cities, and land development.
The game status of the player with an id of 0, including information such as how many units and cities player0 has.
The scores of a player with an id of 0, including information such as total score and unhappy degree.
The progress of research for each player.
Here, we focus on the implementation of the script tag. In the sav file, the format of the script is as follows:
code=${lua code}$
{lua code} is the Lua code that sends the mini-game messages.
First, you need to decide which trigger points to set during the game in order to change the status values of the mini-game, and set up the end conditions of the game.
All trigger action functions can be found in the Lua Reference manual. We list the common trigger action functions as follows:
(return) type
function name/variable
(Number turn, Number year)
Trigger at the beginning of each turn.
(City city)
Trigger at city built.
(Unit unit, Player loser, String reason)
Trigger at unit lost.
(City city, Player loser, Player destroyer)
Trigger at city destroyed.
In addition, we developed the following trigger action function to enhance the perception of the freeciv-server game process:
(return) type
function name/variable
(Player player)
Trigger at game started. `game_started` supports displaying a welcome message at the beginning of the game. Using `turn_begin` with turn=1 for this purpose does not work, because the game considers the first turn to be already running and does not fire `turn_begin` for it. Additionally setting phase=1 makes `turn_begin` fire, but it lets other players act first, which causes unexpected problems.
(Player player)
Trigger at game ended. Since freeciv-server has many internal conditions for ending the game, game_ended can be used to catch all of the end states. When the game ends, set the mini-game `status`=1 (MinitaskGameStatus.MGS_END_GAME).
(City city)
Trigger at activity finished by worker.
(City city)
Trigger at activity started by worker.
Secondly, calculate the mini-score and mini-goal.
Taking mini-game battle as an example, the formula for calculating the mini-score is as follows:
The larger the mini_score, the better: it increases as more units of the human player survive and as more units of the AI player are destroyed. The mini-goal is set to
This means that satisfying mini_score>=mini_goal, and thus succeeding, requires destroying all units of the AI player.
Finally, wrap your mini-game message and send it out through the E.SCRIPT event. The event function is:
notify.event(nil, nil, E.SCRIPT, _(${message}))
Large batch auto random generation
Automatic random generation is supported by the civrealm-sav module. To implement a new mini-game, inherit from the class SavTaskGenerator. For example,
The tools include map_op, unit_op, player_op, game_op, etc. Their main functions are as follows:
function name
Randomly generate the mini-game map by a random walk that modifies the terrain, resources, and shape of the land.
Randomly set the locations of units.
Assign the name of the mini-game.
Set game status initially.
Use these functions to help you to implement large batch auto random generation of mini-game.
Mini-Game Validation
Check your mini-game inside freeciv-web, and test it by following the section Play mini-game as a random agent. If the tests pass, congratulations: you have created a new mini-game.
We set the difficulty level of each mini-game based on the richness of terrain resources, the relative unit counts, and other information.
In the mini-game, the player’s current victory status can be represented as: failure, success, and unknown. The unknown state signifies that the game has not yet concluded, while the determination of failure and success only occurs after the game ends.
Because a full game has many facets, including economic expansion, military development, diplomatic negotiations, cultural construction, and technological research, we have devised mini-games to address each component individually. Each mini-game is designed with specific objectives, varying difficulty levels, step-based rewards, and an overall game score. The design details can be found in the paper.
By the end of this tutorial, you will be able to use the API to play the mini-game.
Load Mini-Game by freeciv-web
Prepare Dataset For Freeciv-web version == 1.3
Before you start the mini-game, you need to load the mini-game archives into the server’s loading archive path.
The steps are as follows:
Step 1: find the version you are using on the releases page, and download the data files for the mini-game to a local path such as /tmp/minigame/
Step 2: copy the data files, and extract them into the corresponding docker savegame path. If the docker image is freeciv-web, and the tomcat version is 10, then execute the following commands:
To load the mini-game sav file MINIGAME_FILE_NAME by the freeciv-web service, follow these steps:
Log in with the player name minitask, and click the Customize Game button;
Enter the command /load MINIGAME_FILE_NAME in the input box at the bottom;
Click the Start Game button to start the mini-game.
Initialize Random Mini-Game
civrealm/FreecivMinitask-v0 is the mini-game environment. When the mini-game is launched, it randomly selects a game of any type and any difficulty.
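The random selection can be pictured as filling in whichever fields the caller leaves unspecified. The following is a minimal sketch under stated assumptions, not the civrealm implementation: it mirrors the type, level, and id fields of the minitask_pattern option documented in the reset docstring, while the helper name fill_minitask_pattern and the lists passed to it are hypothetical.

```python
import random


def fill_minitask_pattern(pattern, types, levels, max_id, seed=None):
    """Return a complete pattern; unspecified fields are chosen at random.

    `types`, `levels`, and `max_id` are supplied by the caller here; in
    civrealm they would come from MinitaskType, MinitaskDifficulty, and
    the max_id option (an assumption of this sketch).
    """
    rng = random.Random(seed)
    filled = dict(pattern or {})

    if "type" not in filled:
        filled["type"] = rng.choice(types)
    elif filled["type"] not in types:
        raise ValueError(f"Unsupported type {filled['type']!r}; supported: {types}")

    if "level" not in filled:
        filled["level"] = rng.choice(levels)
    elif filled["level"] not in levels:
        raise ValueError(f"Unsupported level {filled['level']!r}; supported: {levels}")

    if "id" not in filled:
        filled["id"] = rng.randint(0, max_id)  # inclusive range [0, max_id]
    elif not 0 <= filled["id"] <= max_id:
        raise ValueError(f"Unsupported id {filled['id']}; supported range: [0, {max_id}]")

    return filled
```

Passing an empty pattern yields a fully random game, while passing, for example, only a type fixes the type and randomizes the level and id.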
The mini-game messages are passed from the server to the agent at each trigger point set by the Lua script. The general JSON structure of a message is:
The task field labels the source of the message. For messages from the mini-game, task is set to minitask.
The final element of metrics records the final game review status for each trigger action, and it is the element actually used in civrealm. Other useful auxiliary information can be defined in the dict structure of the metrics elements.
The metrics.mini_score records the agent's mini-game score.
The metrics.mini_goal records the agent's mini-game goal, which sets the score threshold for winning the game.
The metrics.max_turn limits the game to a certain number of turns. If the maximum number of turns is exceeded, failure is returned in civrealm.
The metrics.is_mini_success records whether the player has succeeded, which is the same as the success status defined in the minitask info in civrealm. Success requires that mini_score>=mini_goal.
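The fields above can be combined into a simple status check. The following is a minimal sketch, not the civrealm implementation: it assumes a message is a dict with a task key and a metrics list whose elements carry mini_score, mini_goal, and max_turn, and it treats mini_score>=mini_goal as sufficient for success. The helper name minitask_status and its current_turn argument are hypothetical.

```python
def minitask_status(message, current_turn):
    """Return 'success', 'failure', or 'unknown' for a mini-game message."""
    # Only messages labelled as coming from the mini-game are considered.
    if message.get("task") != "minitask" or not message.get("metrics"):
        return "unknown"

    final = message["metrics"][-1]  # the last element holds the latest review

    if final["mini_score"] >= final["mini_goal"]:
        return "success"
    if current_turn > final["max_turn"]:
        return "failure"  # turn budget exceeded without reaching the goal
    return "unknown"      # the game has not concluded yet
```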
Play mini-game as a random agent
Generally speaking, it is difficult for random agents to win the battle and diplomacy mini-games, whereas in the development mini-games the victory condition will be met with a certain probability.
In the log, we can see that each step displays some of the fields defined above in Definition of Mini-game messages, along with auxiliary fields designed by the mini-game itself, such as human_leader_alive.
Play mini-game as an AI-assistant agent
The AI-assistant agent only supports development_build_city.
To engage in a dialogue with the rule-based AI assistant integrated within the freeciv server, please configure the following command:
The comprehensive script for invoking the AI assistant within the minigame setting is outlined below:
Civrealm supports parallel training to speed up the learning process. Concretely, we use Ray to implement the parallel training function in the FreecivParallelEnv class located in src/civrealm/envs/. To initialize the parallel running environments, simply create multiple FreecivParallelEnv objects:
+# Number of envs that run simultaneously
+# Initialize envs
To reset the environments and get the initial observations, we call the reset() method of each FreecivParallelEnv object. Each FreecivParallelEnv object runs its reset process simultaneously, and we can retrieve the results based on the result ids.
After the environments are reset, we call the step() method of each FreecivParallelEnv object to play the game. Similar to reset, each FreecivParallelEnv object runs its step process simultaneously, and we can retrieve the results based on the result ids. After all parallel running environments are terminated, we call the close() method of the FreecivParallelEnv objects to close them.
+# Store whether an env has terminated
+# Store whether an env has closed its connection
+# Start the parallel running
+# key: index of result_ids, value: id of env
+# Make a decision and send action for each parallel environment
+print(f"Env ID: {i}, turn: {info['turn']}, action: {action}")
+# Get the result of each environment one by one
+# The num_returns=1 ensures ready length is 1.
+# Get the env id corresponding to the given result id
+print(f'Env ID: {env_id}, reward: {rewards[env_id]}, done: {dones[env_id]}')
+# Close the terminated environment
+# Move onto the next timestep
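The bookkeeping described above (send one step to every running environment, collect the results as they become ready, and mark terminated environments) can be sketched without a Freeciv server. This is only an illustration under stated assumptions: ToyEnv is a hypothetical stand-in for an environment, and the standard-library concurrent.futures thread pool is swapped in for Ray, so none of the names below are Civrealm or Ray APIs.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


class ToyEnv:
    """Hypothetical stand-in for one environment; not a Civrealm class."""

    def __init__(self, env_id, episode_length):
        self.env_id = env_id
        self.episode_length = episode_length
        self.turn = 0

    def step(self, action):
        # Advance one timestep and report (env_id, reward, done).
        self.turn += 1
        return self.env_id, 1.0, self.turn >= self.episode_length


def run_parallel(episode_lengths):
    envs = [ToyEnv(i, n) for i, n in enumerate(episode_lengths)]
    dones = [False] * len(envs)    # whether an env has terminated
    rewards = [0.0] * len(envs)    # accumulated reward per env
    with ThreadPoolExecutor(max_workers=len(envs)) as pool:
        while not all(dones):
            # Send one action to every environment that is still running.
            unfinished = {
                pool.submit(env.step, "noop")
                for env in envs
                if not dones[env.env_id]
            }
            # Retrieve results as soon as at least one of them is ready.
            while unfinished:
                ready, unfinished = wait(unfinished, return_when=FIRST_COMPLETED)
                for future in ready:
                    env_id, reward, done = future.result()
                    rewards[env_id] += reward
                    dones[env_id] = done
    return rewards
```

Each environment stops being stepped once it reports done, so environments with different episode lengths finish independently.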
For a complete example, you may refer to the ParallelRunner class located in src/civrealm/runners/ You can run src/civrealm/ to test ParallelRunner. In addition, you can also regard the ParallelTensorEnv class located in src/civrealm/envs/ as an example. The TensorBaselineEnv class in the civtensor package uses the ParallelTensorEnv class to achieve parallel training.
Welcome to Civrealm Tensor Agent!
This documentation will guide you through the process of training tensor-based agents, specifically using Proximal Policy Optimization (PPO), in the Civrealm Environment. We will first provide an overview of the Civrealm Tensor Env, followed by instructions on how to use the civrealm-tensor-baseline repository to train a PPO agent on this environment.
🌏 Civrealm Tensor Environment
The Civrealm Tensor Environment is a reinforcement learning environment wrapped upon Civrealm Base Env and specifically designed for training agents using tensor-based algorithms.
This environment
offers immutable spaces for mutable observations and actions,
provides flattened, tensorized observation and action spaces,
restricts available actions in order to reduce meaningless actions,
offers delta game score as a basic reward for RL agents,
provides parallel environments with batched inputs and outputs for efficient training,
and offers various modular wrappers that can be used to customize your own environment.
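The delta game score reward mentioned in the list above can be illustrated as follows. This is a minimal sketch assuming the reward is the difference between the current and the previous game score; the class name DeltaScoreReward is hypothetical and is not part of Civrealm.

```python
class DeltaScoreReward:
    """Turn an absolute game score into a per-step reward (illustrative only)."""

    def __init__(self):
        self.last_score = 0.0

    def reset(self, initial_score):
        # Remember the score observed right after the environment reset.
        self.last_score = initial_score

    def step(self, score):
        # The reward is the change in score since the previous step.
        reward = score - self.last_score
        self.last_score = score
        return reward
```

With this shaping, an agent is rewarded when its score grows and penalized when it drops, rather than being rewarded for the absolute score at every step.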
Start a parallel tensor environment with 8 parallel FreecivTensor envs:
# Training Fullgame
+# Training Minitasks
+# env = gymnasium.make("civtensor/TensorBaselineEnv-v0", parallel_number=8,task="development_build_city normal")
The observation space is a gymnasium.Dict() consisting of 9 subspaces with keys listed below.
Observations can be immutable or mutable.
Immutable Obs: map, player, rules.
They have fixed dimensions throughout the gameplay.
+Immutable Observations
(84, 56, 3)
(84, 56, 14)
(84, 56, 34)
(84, 56, 6)
(84, 56, 1)
(84, 56, 1)
(84, 56, 52)
(84, 56, 1)
Mutable Obs: unit, city, others_unit, others_city, others_player, dipl.
The numbers of units, cities, and other mutable entities change constantly.
Nevertheless, we truncate or pad mutable entities to a fixed size.
+Mutable Observations for a Single Entity
Dimension per Entity
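Truncating or padding a variable-length list of entities to a fixed size can be sketched as below. This is an illustration only: the function name pad_or_truncate, the zero padding value, and the accompanying 0/1 validity mask are assumptions of this sketch rather than the tensor wrapper's actual code.

```python
def pad_or_truncate(entities, max_size, feature_dim):
    """Return (rows, mask) where rows has exactly max_size entries.

    entities: list of per-entity feature lists, each of length feature_dim.
    mask[i] is 1 for a real entity and 0 for a padded slot.
    """
    rows = [list(entity) for entity in entities[:max_size]]  # truncate
    mask = [1] * len(rows)
    while len(rows) < max_size:                               # pad
        rows.append([0.0] * feature_dim)
        mask.append(0)
    return rows, mask
```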
In the tensor environment, the complete action space is
The actor_type indicates which actor type this action belongs to; its value \(\in [0\dots5]\) indicates city, unit, gov, dipl, tech, and end-turn, respectively.
For a mutable type $mutable, ${mutable}_id indicates the position, in the list of entities, of the entity that takes this action. For example, unit_id=0 might indicate a Settler located at a specific tile.
${actor_type}_action_type is an action index that can be translated into a specific action, for example goto_8 or stop_negotiation.
Although the full action space is a Cartesian product of 9 subspaces, actor_type determines which category of entity should execute this action, and ${actor_type}_id determines which entity should execute the specific action ${actor_type}_action_type.
Thus it suffices for the env to look at only a 3-tuple, (actor_type, ${actor_type}_id, ${actor_type}_action_type), and it is legitimate to pass a 3-tuple if its values and types are compatible.
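To make the 3-tuple concrete, the sketch below picks the relevant entries out of a full action dict. The ordering of ACTOR_TYPES follows the description above, and the key names follow the ${actor_type}_id and ${actor_type}_action_type pattern; the helper name to_three_tuple and the choice to return None for entries that do not exist (for example, for end-turn) are assumptions of this sketch.

```python
# Index order as described above: 0..5 -> city, unit, gov, dipl, tech, end-turn.
ACTOR_TYPES = ["city", "unit", "gov", "dipl", "tech", "end-turn"]


def to_three_tuple(action):
    """Extract (actor_type, id, action_type) from a full action dict."""
    actor_type = int(action["actor_type"])
    name = ACTOR_TYPES[actor_type]
    # Only the entries belonging to the selected actor type matter;
    # missing entries are reported as None (an assumption of this sketch).
    entity_id = action.get(f"{name}_id")
    action_type = action.get(f"{name}_action_type")
    return actor_type, entity_id, action_type
```

Entries for the other actor types are ignored, which is why the remaining subspaces of the Cartesian product do not influence the executed action.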
+Action Space Details
🤖 Network Architecture for a Tensor Agent
To effectively handle multi-source and variable-length inputs, we draw inspiration from AlphaStar and implement a serialized hierarchical feature extraction and action selection approach, as shown above. This method involves generating layered actions and predicting value function outputs, and our neural network architecture comprises three main components: representation learning, action selection, and value estimation.
At the representation level, we adopt a hierarchical structure. In the lower layer, we extract controller features using various models such as MLP, Transformer, and CNN, depending on whether the input is a single vector, a sequence, or image-based. These extracted features are then fed into a transformer to facilitate attention across different entities, creating globally meaningful representations. We also utilize an RNN to combine the current-state features with the memory state, enabling conditional policy decisions based on the state history.
Action selection. At the action selection level, we leverage the learned representations to make decisions. In the actor selection module, we determine the primary action category to be executed, including options like unit, city, government, technology, diplomacy, or termination. We then employ a pointer network to select the specific action ID to be executed, followed by the determination of the exact action to be performed.
Value estimation. To enable the use of an actor-critic algorithm, we incorporate a value prediction head after the representation learning phase. This shared representation part of the network benefits both the actor and critic, enhancing training efficiency.
We use the Proximal Policy Optimization (PPO) algorithm to train the agent. To mitigate the on-policy sample complexity of PPO, we harness Ray for parallelizing tensor environments, optimizing training speed and efficiency.
🏃 Using civrealm-tensor-baseline Repository
The civrealm-tensor-baseline repository is a collection of code and utilities that provide a baseline implementation for training reinforcement learning agents using tensor-based algorithms.
It includes an implementation of the PPO algorithm, which we will use to train our agents in the Civrealm Tensor Environment.
🏌️ Getting Started
To get started, follow these steps:
Clone the civrealm-tensor-baseline repository from GitHub and enter the directory:
Install the required dependencies by running:
Train PPO baseline for fullgame:
By default, this starts a runner with the config specified in civrealm-tensor-baseline/civtensor/configs/.
OR Train PPO baseline for minitasks:
By default, this starts a sequence of runners, each with a minitask config specified in civrealm-tensor-baseline/examples/run_configs.
Either will start the training process, allowing the agent to interact with the environment, collect experiences, and update its policy using the PPO algorithm.
Monitor the training progress and evaluate the agent's performance using the provided metrics and visualization tools in the civrealm-tensor-baseline repository.
+# where $game_type = fullgame or minitask
The output of the last command should return a URL.
Visit this URL with your favorite web browser, and you can view your agent's performance in real time.
You have successfully set up the Civrealm Tensor Agent and started training a PPO agent on the Civrealm Tensor Environment, using the civrealm-tensor-baseline repository.
Runner Configuration
The default configs reside in civtensor/configs/
freeciv_tensor_env.yaml defines environment-related properties. You may choose which task to run by setting task_name.
Acceptable values of task_name are "fullgame" or "$minitask_type $minitask_difficulty".
For available mini-task types and difficulties, please check minigame
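The two accepted forms of task_name can be told apart with a small parser. This is a minimal sketch: the function name parse_task_name and the returned dict layout are hypothetical, and the example value "development_build_city normal" is the one used in the environment creation snippet earlier in this guide.

```python
def parse_task_name(task_name):
    """Split a task_name into its mode and, for minitasks, type and difficulty."""
    if task_name == "fullgame":
        return {"mode": "fullgame"}
    parts = task_name.split()
    if len(parts) != 2:
        raise ValueError(
            'task_name must be "fullgame" or "$minitask_type $minitask_difficulty", '
            f"got {task_name!r}"
        )
    minitask_type, difficulty = parts
    return {"mode": "minitask", "type": minitask_type, "difficulty": difficulty}
```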
ppo.yaml defines training- and algorithm-related properties, such as seeds, network sizes, and PPO hyperparameters.
+Details of ppo.yaml
+# whether to use the specified seed
+# seed
+# whether to use CUDA
+# whether to set CUDA deterministic
+# arg to torch.set_num_threads
+# number of parallel environments for training data collection
+# number of total training steps
+# number of steps per environment per training data collection
+# logging interval
+# evaluation interval
+# whether to use ValueNorm
+# whether to use linear learning rate decay
+# whether to consider the case of truncation when an episode is done
+# if set, load models from this directory; otherwise, randomly initialise the models
+# whether to use evaluation
+# number of parallel environments for evaluation
+# number of episodes per evaluation
+# whether to use render
+# number of episodes to render
+# hidden dimension
+# hidden dimension for rnn
+# number of heads in transformer
+# number of layers in transformer
+# dropout probability
+# number of rnn layers
+# initialization method for network parameters, choose from xavier_uniform_, orthogonal_, ...
+# gain of the output layer of the network.
+# length of data chunk; only useful when use_recurrent_policy is True; episode_length has to be a multiple of data_chunk_length
+# actor learning rate
+# eps in Adam
+# weight_decay in Adam
+# ppo parameters
+# number of epochs for actor update
+# whether to use clipped value loss
+# clip parameter
+# number of mini-batches per epoch
+# coefficient for entropy term in actor loss
+# coefficient for value loss
+# whether to clip gradient norm
+# max gradient norm (0.5?)
+# whether to use Generalized Advantage Estimation (GAE)
+# discount factor
+# GAE lambda
+# whether to use huber loss
+# huber delta
+# logging directory
In this guide, we introduced the Civrealm Tensor Environment and explained how to use the civrealm-tensor-baseline repository to train a PPO agent on this environment.
We encourage you to explore the various features and customization options available, and experiment with different reinforcement learning algorithms to further enhance your agent's performance. Happy training!
+""" CivRealm environment for mini-game. """
+f"Not supported type as {minitask_pattern}. The supported list is {MinitaskType.list()}!")
+f"Not supported id as {minitask_id}. The supported range is [0, {max_id}]!")
+f"Not supported difficulty as {minitask_level}. The supported list is {MinitaskDifficulty.list()}!")
+fc_logger.warning(f"Randomly selected minitask {minitask}!")
+ Reset the mini-game environment as fully random game or specific game.
+ Parameters
+ ----------
+ seed : int
+ Random seed for game.
+ options : dict
+ Env configuration.
+ minitask_pattern : dict
+ Assign the following fields to return a specified game:\n
+ `type`: the type of mini-game, see the available options MinitaskType;\n
+ `level`: the difficulty of mini-game, see the available options MinitaskDifficulty;\n
+ `id`: the id of mini-game, the available range is 0 to MAX_ID.\n
+ If a field is not assigned a value, the field will be randomly selected within the feasible domain.
+ max_id : int
+ The max id of mini-game.
+ """
+ In addition to the game termination judgment of the full game,
+ the mini-game has additional conditions for the end of the game process.
+ """
+fc_logger.debug(f"mini-env step action: {action}")
+ minitask_has_terminated()
In addition to the game termination judgment of the full game,
+the mini-game has additional conditions for the end of the game process.
+ Source code in src/civrealm/envs/
Tensor Environment
+ envs.freeciv_tensor_env.FreecivTensorEnv
+ Bases: Wrapper
CivRealm environment with Tensor actions
+ Source code in src/civrealm/envs/
The Gymnasium definition for the observation space. At the root is a Dict space with the following keys: ['game', 'rules', 'map', 'player', 'city', 'tech', 'unit', 'options', 'dipl', 'gov', 'client']. We describe the important state spaces below.
Click on the source code below to show the space definition.
Map State
+ Source code in src/civrealm/freeciv/map/
+# Common unit fields
+# TODO: may change this to actual map size
+# Unit type fields
+'type_worker':gymnasium.spaces.Discrete(1),# Boolean
+'type_can_transport':gymnasium.spaces.Discrete(1),# Boolean
+# My unit specific fields
Tech State
Get observation space.
"tech_status": Box,
+"current_tech": Box
+ )
+ "tech_status": Box of shape (reqtree_size,)
list status of all techs, with values of each entry:
+-1: obtained,
+ 0: under research,
+ 1: can be researched,
+ 2: needs other prerequisite tech(s),
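The status codes listed above can be decoded as follows. This is a minimal sketch that works on a plain Python list standing in for the tech_status array; the names TECH_STATUS, researchable_techs, and describe_tech_status are hypothetical.

```python
# Meaning of each tech_status entry, as listed above.
TECH_STATUS = {
    -1: "obtained",
    0: "under research",
    1: "can be researched",
    2: "needs other prerequisite tech(s)",
}


def researchable_techs(tech_status):
    """Return the indices of techs that can be researched right now."""
    return [index for index, status in enumerate(tech_status) if status == 1]


def describe_tech_status(tech_status):
    """Map every status code to its human-readable meaning."""
    return [TECH_STATUS[status] for status in tech_status]
```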
+# Common player fields
+'is_alive':gymnasium.spaces.Discrete(2),# Boolean
+# Tax, science, luxury are percentages, should sum to 100
+# My player fields
+# mood_type, values are MOOD_PEACEFUL and MOOD_COMBAT
+# The turn when the revolution finishes
+# Other player fields
+# Possible values are player_const.ATTITUDE_TXT
Diplomatic state
+ Source code in src/civrealm/freeciv/players/
TensorWrapper is used to make Civrealm environment tensorized by converting
+observations from FreecivBaseEnv into tensors and tensor actions back to actions compatible with
TensorWrapper is composed of TensorBase, TensorAction, TensorObservation,
and CacheLastObs.
+ FreecivBaseEnv
+ required
+ dict
tensor env configuration
+ default_tensor_config
+ dict
tensor wrapper configuration
+ Source code in src/civrealm/envs/freeciv_wrapper/
+ A basic wrapper that deals with config loading and entity id recording,
+ required by all tensor-related wrappers.
+ Parameters
+ ----------
+ env: FreecivBaseEnv
+ config: dict
+ tensor env configuration
+ Attributes
+ ---------
+ config: dict
+ A dict that specifies all configurations related to tensor wrapper.
+ my_player_id: int
+ My player id.
+ unit_ids: list
+ A sorted list of my unit ids.
+ city_ids: list
+ A sorted list of my city ids.
+ others_unit_ids: list
+ A sorted list of others unit ids.
+ others_city_ids: list
+ A sorted list of others city ids.
+ dipl_ids : list
+ A list of others player ids.
+ units : dict
+ ruleset information about units.
+ unit_types :list
+ A list of all unit types.
+ unit_costs : list
+ A list of int indicating unit costs.
+ improvements : dict
+ Ruleset information about city improvements.
+ impr_costs :list
+ A list of int indicating city improvements costs.
+ """
+# mutable ids
+# ruleset
+ Use city, unit and dipl information in observation to update ids.
+ """
+ Update config using ruleset information at the start of the turn.
+ """
+ update_config()
Update config using ruleset information at the start of the turn.
+ Source code in src/civrealm/envs/freeciv_wrapper/
A wrapper that defines tensor action spaces, transforms tensor actions into
+actions that could be handled by FreecivBaseEnv instance, and adds masks to
TensorAction wrapper is composed of five wrappers, including TruncateDiplCity,
+DiplomacyLoop, CombineTechResearchGoal, PersistentCityProduction, and EmbarkWrapper.
+ TensorBase
A FreecivBaseEnv instance that has been wrapped by TensorBase.
+ required
+ dict
a dict of configs that specify the sizes of mutable entities and the action layout.
+ dict
a dict of masks of type numpy ndarray indicating available actions and entities. 0 -> unavailable, 1 -> available.
+ dict
cached info['available_actions'], a dict that indicates available actions.
+ Dict
a gymnasium.spaces.Dict with keys ['actor_type','city_id','unit_id',
+ Source code in src/civrealm/envs/freeciv_wrapper/
+ Reset self.mask
+ This is usually called at the start of a new turn to reset masks.
+ """
+# Reset mask
+# Units/Cities/Players and others Masks
+# Units/Cities Id Masks same as their Masks
+# Dipl id mask
+# Action type mask
+ Update self.mask using observation, info and action from the unwrapped env,
+ and add self.mask to the observation of the wrapped env.
+ """
Unify embark actions of all units to 'embark_{dir8}', where dir8 in [0,...7] indicates one of 8 directions.
Sometimes a unit can embark onto multiple carriers in the same direction. In that case, the wrapper automatically chooses the carrier with the smallest unit id.
Attributes:
embarkable_units (dict): a dict of embarkable units with key=(embarking_unit_id, dir8) and value=[carrier_ids]
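The carrier choice described above can be sketched with the embarkable_units layout, key=(embarking_unit_id, dir8) and value=[carrier_ids]. This is an illustration only: the function name resolve_embark is hypothetical, and the returned name format embark_{dir}_{target_id} is taken from the assertion message in the wrapper's source listing rather than from a verified API.

```python
def resolve_embark(embarkable_units, unit_id, dir8):
    """Translate a unified embark_{dir8} choice into a carrier-specific action name.

    Returns None when the unit has no carrier to embark in that direction.
    """
    carriers = embarkable_units.get((unit_id, dir8), [])
    if not carriers:
        return None
    # When several carriers are available, pick the smallest unit id.
    carrier_id = min(carriers)
    return f"embark_{dir8}_{carrier_id}"
```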
+ Source code in src/civrealm/envs/freeciv_wrapper/
+ Unify embark actions of all units to 'embark_{dir8}' where dir8 in `[0,...7]`
+ indicating 8 directions.
+ Sometimes a unit can embark onto multiple carriers in the same direction. In that
+ case, the wrapper automatically chooses the carrier with the smallest unit id.
+ Attributes
+ ----------
+ embarkable_units : dict
+ a dict of embarkable units with key=(embarking_unit_id, dir8) and value=[carrier_ids]
+ """
+ Translate `embark_{dir8}` action into embark actions that can be handled by FreecivBaseEnv.
+ """
+ Complete or modify embark actions in info['available_actions']['unit']
+ If a unit has no `embark_.*` action, then set all `embark_{dir8}` action to False
+ If a unit has `embark_{dir}=True`, set all `embark_{other_dirs}` action to False
+ If a unit has `embark_{carrier_id}_{dir}=True`, store that carrier_id
+ and set its `embark_{dir8}` accordingly.
+ """
+# action == embark_dir_id
+# action == embark_dir
+),f"Expected embark_{{dir}}_{{target_id}},\
+ but got unsupported embark action name {action}"
+# set unavailable embark actions to False
+ action(action)
Translate embark_{dir8} action into embark actions that can be handled by FreecivBaseEnv.
+ Source code in src/civrealm/envs/freeciv_wrapper/
A wrapper for LLMs. It describes the surrounding observations of each unit and city extracted from FreecivBaseEnv, based on which an LLM can generate actions for units and cities. It transforms the action_keys of actions from FreecivBaseEnv into readable action_names that an LLM can understand. After the LLM has chosen an action and returned its action_name to the env, this wrapper transforms the action_name back into an action_key and then executes the corresponding action.
Parameters:
env (FreecivBaseEnv, required): A FreecivBaseEnv
Attributes:
llm_default_settings (dict): settings for llm_wrapper.
action_names (dict): a dict matches action_keys from FreecivBaseEnv to readable action_names
tile_length_radius (int): (length of a tile - 1) / 2
tile_width_radius (int): (width of a tile - 1) / 2
tile_info_template (dict): a dict describes detailed surrounding observations of a unit or a city
block_length_radius (int): (length of a block - 1) / 2
block_width_radius (int): (width of a block - 1) / 2
block_info_template (dict): a dict describes zoomed-out surrounding observations of a unit or a city
ctrl_types (list): a list describes which components of the game to control by llm; we temporarily only consider unit and city in llm_wrapper
ctrl_action_categories (dict): a dict describes which categories of actions llm can take; it can be seen as an action mask
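The radius attributes above define a window of size (2 * radius + 1) around an actor. The sketch below enumerates the map coordinates of such a window; it assumes, as an illustration, that the length radius applies along x, that the map wraps around in the x direction (the source mentions the TF_WRAPX case), and that rows outside the map in y are skipped. The function name window_coords and the map dimensions in the usage are hypothetical.

```python
def window_coords(x, y, length_radius, width_radius, map_width, map_height):
    """List the (x, y) map coordinates in a window centered on (x, y)."""
    coords = []
    for dy in range(-width_radius, width_radius + 1):
        ny = y + dy
        if not 0 <= ny < map_height:
            continue  # no wrapping in y: rows outside the map are dropped
        for dx in range(-length_radius, length_radius + 1):
            nx = (x + dx) % map_width  # wrap around in x
            coords.append((nx, ny))
    return coords
```

For a radius of 2 in both directions, an actor away from the map border sees a 5 x 5 window, while an actor in the top-left corner keeps only the rows that exist and wraps to the right edge of the map for the columns on its left.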
+ Source code in src/civrealm/envs/freeciv_wrapper/
+ A wrapper for LLMs. It describes the surrounding observations of each unit and city extracted from FreecivBaseEnv, based
+ on which an LLM can generate actions for units and cities. It transforms the action_keys of actions from FreecivBaseEnv into
+ readable action_names that an LLM can understand. After the LLM has chosen an action and returned its action_name to the
+ env, this wrapper transforms the action_name back into an action_key and then executes the corresponding action.
+ Parameters
+ ----------
+ env:
+ A FreecivBaseEnv
+ Attributes
+ ---------
+ llm_default_settings: dict
+ settings for llm_wrapper.
+ action_names: dict
+ a dict matches action_keys from FreecivBaseEnv to readable action_names
+ tile_length_radius: int
+ (length of a tile - 1) / 2
+ tile_width_radius: int
+ (width of a tile - 1) / 2
+ tile_info_template: dict
+ a dict describes detailed surrounding observations of a unit or a city
+ block_length_radius: int
+ (length of a block - 1) / 2
+ block_width_radius: int
+ (width of a block - 1) / 2
+ block_info_template: dict
+ a dict describes zoomed-out surrounding observations of a unit or a city
+ ctrl_types: list
+ a list describes which components of the game to control by llm; we temporarily only consider unit and city in
+ llm_wrapper
+ ctrl_action_categories: dict
+ a dict describes which categories of actions llm can take; it can be seen as an action mask
+ """
+ Convert observations and available actions of all actors from `FreecivBaseEnv` into a dict of natural language
+ """
+# The following two conditions are used to check if 1. the city is just built or is building
+# coinage, and 2. the city has just built a unit or an improvement last turn and there are some
+# production points left in stock.
+ Convert observations and available actions of a specific actor from `FreecivBaseEnv` into a dict of natural language
+ """
+actor_name=utype+' '+str(actor_id)
+actor_name=ctrl_type+' '+str(actor_id)
+fc_logger.debug(f'actor observations: {actor_info}')
+ Convert observations of a specific actor from `FreecivBaseEnv` into a dict of natural language
+ """
+""" consider map_const.TF_WRAPX == 1 """
+status_str=str(unexplored_tiles_num)+ \
+' '+'tiles unexplored'
+terrain_str=str(terrains_num)+' '+terrain
+extra_str=str(extras_num)+' '+extra
+unit_str=str(int(units_num))+' '+unit
+unit_owner_str='unit owners are:'
+unit_owner_str+=' myself player_'+ \
+unit_owner_str+=' '+ \
+DS_TXT[ds_of_owner]+' player_'+ \
+owner_num)+' cities of myself player_'+str(int(city_owner))
+city_owner_str=(str(owner_num)+' cities of a '+DS_TXT[ds_of_owner]+
+' player_'+str(int(city_owner)))
