📝 Tech report
yhs0602 committed Apr 11, 2024
1 parent 3e9d657 commit f77622f
Showing 2 changed files with 191 additions and 2 deletions.
159 changes: 157 additions & 2 deletions docs/index.html
@@ -1,3 +1,158 @@
<html>
Hello.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>CraftGround: A Minecraft-based Reinforcement Learning Environment</title>
<style>
body { font-family: Arial, sans-serif; }
.container { max-width: 800px; margin: auto; }
h1, h2 { color: #333; }
p, li { color: #666; }
code { background-color: #f9f9f9; padding: 2px 4px; }
</style>
</head>
<body>
<div class="container">
<h1>CraftGround</h1>
<p>A <strong>fast</strong>, <strong>up-to-date</strong>, and <strong>feature-rich</strong> Minecraft-based reinforcement learning environment.</p>
<h2>Version Support</h2>
<p>Supports Minecraft version <code>1.19.4</code>. Current version of CraftGround: <code>1.7.23</code>.</p>
<h2>Features</h2>
<h3>Initial Environment</h3>
<p>Refer to the proto below for the <code>InitialEnvironmentMessage</code>:</p>
<pre><code>message InitialEnvironmentMessage {
repeated string initialInventoryCommands = 1;
repeated int32 initialPosition = 2;
repeated string initialMobsCommands = 3;
int32 imageSizeX = 4;
int32 imageSizeY = 5;
int64 seed = 6;
bool allowMobSpawn = 7;
bool alwaysNight = 8;
bool alwaysDay = 9;
string initialWeather = 10;
bool isWorldFlat = 11;
int32 visibleSizeX = 12;
int32 visibleSizeY = 13;
repeated string initialExtraCommands = 14;
repeated string killedStatKeys = 15;
repeated string minedStatKeys = 16;
repeated string miscStatKeys = 17;
repeated BlockState initialBlockStates = 18;
repeated int32 surroundingEntityDistances = 19;
bool hudHidden = 20;
int32 render_distance = 21;
int32 simulation_distance = 22;
bool biocular = 23;
float eye_distance = 24;
repeated string structurePaths = 25;
bool noWeatherCycle = 26;
bool no_pov_effect = 27;
bool noTimeCycle = 28;
bool request_raycast = 29;
int32 screen_encoding_mode = 30;
}</code></pre>
<h3>Observation Space</h3>
<p>Includes basic vision rendering, binocular rendering, a list of sounds around the agent, the agent's status effects, and more. See the proto file for detailed information.</p>
<pre><code>message ItemStack {
int32 raw_id = 1;
string translation_key = 2;
int32 count = 3;
int32 durability = 4;
int32 max_durability = 5;
}

message BlockInfo {
int32 x = 1;
int32 y = 2;
int32 z = 3;
string translation_key = 4;
}

message EntityInfo {
string unique_name = 1;
string translation_key = 2;
double x = 3;
double y = 4;
double z = 5;
double yaw = 6;
double pitch = 7;
double health = 8;
}

message HitResult {
enum Type {
MISS = 0;
BLOCK = 1;
ENTITY = 2;
}

Type type = 1;
BlockInfo target_block = 2;
EntityInfo target_entity = 3;
}

message StatusEffect {
string translation_key = 1;
int32 duration = 2;
int32 amplifier = 3;
}

message SoundEntry {
string translate_key = 1;
int64 age = 2;
double x = 3;
double y = 4;
double z = 5;
}

message EntitiesWithinDistance {
repeated EntityInfo entities = 1;
}

message ObservationSpaceMessage {
bytes image = 1;
double x = 2;
double y = 3;
double z = 4;
double yaw = 5;
double pitch = 6;
double health = 7;
double food_level = 8;
double saturation_level = 9;
bool is_dead = 10;
repeated ItemStack inventory = 11;
HitResult raycast_result = 12;
repeated SoundEntry sound_subtitles = 13;
repeated StatusEffect status_effects = 14;
map<string, int32> killed_statistics = 15;
map<string, int32> mined_statistics = 16;
map<string, int32> misc_statistics = 17;
repeated EntityInfo visible_entities = 18;
map<int32, EntitiesWithinDistance> surrounding_entities = 19;
bool bobber_thrown = 20;
int32 experience = 21;
int64 world_time = 22;
string last_death_message = 23;
bytes image_2 = 24;
}
</code></pre>
<h3>Action Space</h3>
<p>Similar to MineDojo (crafting is not supported).</p>
<pre><code>message ActionSpaceMessage {
repeated int32 action = 1;
repeated string commands = 2;
}
</code></pre>
<p>Supports headless offscreen rendering using VirtualGL and Xvfb.</p>
<h2>Performance</h2>
<p>Runs at 300 FPS on an M1 Pro with a random policy.</p>
<h2>Installation</h2>
<p><code>pip install git+https://github.com/yhs0602/CraftGround</code></p>
<p>Dependencies: JDK 17, OpenGL, GLEW, libpng, zlib</p>
<h2>Technical Report</h2>
<p>A link to the technical report describing how this performance was achieved will be added here.</p>
</div>
</body>
</html>
34 changes: 34 additions & 0 deletions docs/technical_report.md
@@ -0,0 +1,34 @@
# CraftGround Internals
## Capturing Frames
### Method 1: Using Default Screenshot Functionality
```java
public static NativeImage takeScreenshot(Framebuffer framebuffer)
```
This method takes the `Framebuffer` object held by the `MinecraftClient` class and captures the current frame of the game. The resulting screen data must ultimately be converted into a protobuf `ByteString` object.
### Method 2: Using `glGetTexImage` Function
Method 1 is limited by the number of steps it involves: reading pixel data from the texture, packing the alpha data, vertically mirroring, calling the `getBytes` method of `NativeImage`, reading it using `ImageIO`, resizing, writing into a `ByteArrayOutputStream`, converting to `ByteArray` (which copies the data again), and calling `ByteString.copyFrom()` (which also copies the data). To streamline this, we call the `glGetTexImage` function directly in native code, which reads pixel data from the texture and converts it directly into a `ByteString` object. This approach is faster because it minimizes data copying.
### Method 3: Using `glReadPixels` Function (Current Method)
Although fast, Method 2 had a potential flaw: when `glGetTexImage` is called, rendering might not yet be complete, so the captured data could be outdated. A `glFinish` call was therefore needed to ensure rendering had completed, which could slow down the process. The current method uses the `glReadPixels` function, which inherently waits for rendering to complete. This is potentially quicker than combining `glGetTexImage` and `glFinish`, because it waits only for the GL operations needed to render the current frame.
## Synchronizing Simulation
Minecraft uses two threads: one for `MinecraftClient`, which renders the game, and another for `MinecraftServer`, which handles game logic. The `TickSynchronizer` class keeps them in step: the client thread waits for the agent's action decision for the current tick, and the server thread waits for the client to finish rendering. This keeps client rendering, observation sending, and action reading from the agent in a fixed order on every tick.
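
The lockstep described above can be illustrated with a small Python sketch. This is not the `TickSynchronizer` implementation; the function name `run_lockstep`, the two semaphores, and the log messages are all hypothetical, and the sketch simplifies the handshake to a ping-pong in which the server simulates a tick only after the client has finished the previous frame and obtained an action.

```python
import threading


def run_lockstep(num_ticks):
    """Run a hypothetical client/server pair in lockstep and return the event log."""
    log = []
    action_ready = threading.Semaphore(0)  # client -> server: action for this tick is available
    tick_done = threading.Semaphore(0)     # server -> client: game logic for this tick has run

    def client():
        for tick in range(num_ticks):
            # Stand-in for reading the agent's action for the current tick.
            log.append(("client: action read", tick))
            action_ready.release()
            # Block until the server has simulated the tick, then "render" it.
            tick_done.acquire()
            log.append(("client: frame rendered", tick))

    def server():
        for tick in range(num_ticks):
            # Block until the client has finished the previous frame and has an action.
            action_ready.acquire()
            log.append(("server: tick simulated", tick))
            tick_done.release()

    threads = [threading.Thread(target=client), threading.Thread(target=server)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because each side blocks until the other signals, the log always interleaves as action, simulation, frame for each tick, regardless of how the operating system schedules the two threads.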
## Offscreen Rendering
A major concern with 3D rendering environments is whether they work on GPU servers without a connected display, where they may crash or be forced into software rendering. By using [Xvfb](https://www.x.org/releases/X11R7.6/doc/man/man1/Xvfb.1.xhtml) for a virtual display and [VirtualGL](https://virtualgl.org/) for GPU utilization, we ensure the environment runs on headless servers while still rendering on the GPU.
## Communication between Java and Python
To handle both binary data (e.g., screen captures) and text data (e.g., sounds, agent states) efficiently, we use [protobuf](https://protobuf.dev/) for serialization and deserialization. An obvious alternative is base64 encoding inside a JSON string, but base64 increases the data size by about 33%, whereas protobuf adds minimal overhead. For faster communication than TCP sockets, we use Unix domain sockets, which suit the data exchange required on every tick. Additionally, we send raw image data to avoid the cost of encoding and decoding, leaving any image manipulation to the agent.
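
As a rough illustration of the points above, the following Python sketch compares the size of a raw payload with the same payload base64-encoded inside JSON, and passes the raw bytes through a local socket pair. The 4-byte big-endian length prefix, the helper names (`send_frame`, `recv_frame`, `recv_exact`, `demo`), and the 3072-byte dummy payload are assumptions made for this example; they are not CraftGround's actual wire format, and the payload merely stands in for a serialized protobuf message.

```python
import base64
import json
import socket
import struct


def base64_json_size(raw):
    """Size in bytes of a JSON document that carries `raw` as a base64 string."""
    encoded = base64.b64encode(raw).decode("ascii")
    return len(json.dumps({"image": encoded}).encode("utf-8"))


def send_frame(sock, payload):
    """Send one message: a 4-byte big-endian length, then the payload bytes."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; a stream socket may return fewer per recv() call."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock):
    """Receive one length-prefixed message sent by send_frame()."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def demo():
    """Return (raw size, base64-in-JSON size, whether the round trip matched)."""
    raw = bytes(range(256)) * 12  # 3072 dummy bytes, e.g. a 32x32 RGB frame
    # socketpair() yields two connected sockets (Unix domain sockets on Unix).
    env_end, agent_end = socket.socketpair()
    try:
        send_frame(env_end, raw)
        received = recv_frame(agent_end)
    finally:
        env_end.close()
        agent_end.close()
    return len(raw), base64_json_size(raw), received == raw
```

The payload is kept small so that a single-threaded `sendall` fits in the socket buffer; a real environment and agent would read and write from separate processes.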

# Optimizations
To further improve the performance of the environment, we have implemented the following optimizations:
- **Optimization 1**: Using `glReadPixels` over `glGetTexImage` for data capture.
- **Optimization 2**: Incorporating [Sodium](https://github.com/CaffeineMC/sodium-fabric) and [Lithium](https://github.com/CaffeineMC/lithium-fabric) mods for rendering and simulation logic optimization.
- **Optimization 3**: Skipping world data saving to disk, which is unnecessary for agent learning. Saving normally happens every 6000 ticks (five real-time minutes at the default 20 ticks per second); in a simulation running at 100 ticks per second (tps), it would otherwise happen every minute.
- **Optimization 4**: Omitting vertical image flipping on the environment side in favor of numpy indexing on the Python side, where channel swapping is also handled as part of the agent's processing.
- **Optimization 5**: Using Unix domain sockets for communication between Java and Python, reducing latency and improving data exchange efficiency.
- **Optimization 6**: Adjusting JVM options for better performance.
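
Optimization 4 can be sketched with numpy as follows. The function name `decode_frame`, the 3-bytes-per-pixel layout, the bottom-up row order, and the need to reverse the channel order are assumptions for illustration; the actual layout depends on how the frame is captured.

```python
import numpy as np


def decode_frame(raw, height, width, swap_channels=True):
    """Interpret raw bytes as an image, flipping rows (and optionally channels).

    Assumes 3 bytes per pixel and rows stored bottom-up (hypothetical layout).
    """
    # frombuffer does not copy; it wraps the received bytes directly.
    frame = np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 3)
    # Reversing the first axis flips the image vertically. This is a view with
    # a negative stride, so no pixel data is moved.
    frame = frame[::-1]
    if swap_channels:
        # Reversing the last axis swaps the channel order (e.g. BGR <-> RGB).
        frame = frame[:, :, ::-1]
    return frame
```

Since both steps are strided views, the flip costs nothing until the array is copied or consumed, for example by `np.ascontiguousarray(frame)` before handing it to a library that requires contiguous memory.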
# For contributors
## Fabric Mod
- Refer to the [Fabric Wiki](https://fabricmc.net/wiki/start) for mod development.
## Setting Up the Native Code Build
- This project uses [Java Native Interface](https://docs.oracle.com/javase/8/docs/technotes/guides/jni/) for native code. Ensure you have the necessary tools installed.
- The [CMake](https://cmake.org/) build system is used for the native code. Install CMake and ensure it is in your system's PATH.
- Because `glBindFramebuffer` is used, the project requires [GLEW](https://glew.sourceforge.net/) (OpenGL Extension Wrangler Library) for OpenGL function loading. Ensure GLEW is installed on your system. `glewInit()` must be called before using any OpenGL functions; the native code already does this. Although the JVM side already uses OpenGL 3.0 functions and above, GLEW must still be initialized separately for the native code.
