
Performance Features

genar edited this page Mar 13, 2024 · 20 revisions

"It's not fast enough, I wanna simulate millions of entities, Arch sucks..." - No it does not, we got you! Arch provides several features especially for those cases.

Bulk adding

Arch supports bulk adding of entities, which is incredibly fast since it allocates enough space for a whole set of entities in one go. The reservation happens on top of the already existing entities in an archetype. You only need to reserve space once; it will be filled sooner or later.

var archetype = new ComponentType[]{ typeof(Position), typeof(Velocity) };

// Create 1k entities
for(var index = 0; index < 1000; index++)
    world.Create(archetype);

world.Reserve(archetype, 1000000);              // Reserves space for 1 million additional entities
for(var index = 0; index < 1000000; index++)    // Create 1 million additional entities
    world.Create(archetype);

// In total there are now 1,001,000 entities in that archetype.

Bulk Query Operations

For maximum performance, some operations can also be applied to a query, addressing all entities matched by the query and executing the operation in bulk. This saves code and is incredibly efficient and fast.

world.Destroy(in queryDesc);
world.Add<T0,T1...T25>(in queryDesc, ...);
world.Remove<T0,T1...T25>(in queryDesc);
world.Set<T0,T1...T25>(in queryDesc, ...);

These operations copy and process the entities in bulk, making them more cache-efficient and perfect for scenarios where the same logic is applied to all matched entities.
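As a rough sketch of how this might look in practice (the `Health` and `Dead` components and the `queryDesc` variable are made up for illustration, not part of Arch's API):

```csharp
// Hypothetical components for illustration
public struct Health { public int Value; }
public struct Dead { }

// Match all entities that carry a Health component
var queryDesc = new QueryDescription().WithAll<Health>();

// Tag every matched entity in one bulk operation...
world.Add<Dead>(in queryDesc);

// ...or destroy all matched entities at once instead of looping manually
world.Destroy(in queryDesc);
```

Since such an operation can run archetype by archetype instead of entity by entity, it avoids per-entity lookups entirely.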

Batched operations

For complex games, it's pretty common to work on entities directly, especially when there are relations between entities. Normally this is a bottleneck; however, we recently implemented generic overloads for these kinds of tasks to improve performance. This is valid for Entity, World, Archetype, and Chunk.

// Entity overloads
entity.Set<T0...T9>();
entity.Get<T0...T9>();
entity.Has<T0...T9>();
entity.Add<T0...T9>();
entity.Remove<T0...T9>();

// World overloads
world.Create<T0...T9>(...);
world.Set<T0...T9>(entity);
world.Get<T0...T9>(entity);
world.Has<T0...T9>(entity);
world.Add<T0...T9>(entity);
world.Remove<T0...T9>(entity);

// Archetype overloads
archetype.Set<T0...T9>(ref Slot slot);
archetype.Get<T0...T9>(ref Slot slot);
archetype.Has<T0...T9>();

// Chunk overloads
chunk.Set<T0...T9>(index);
chunk.Get<T0...T9>(index);
chunk.Has<T0...T9>();

So it will dramatically increase your game's performance when you rewrite

var entity = world.Create(archetype);
entity.Set(new Transform());
entity.Set(new Movement());
ref var t = ref entity.Get<Transform>();
ref var m = ref entity.Get<Movement>();

to this

var entity = world.Create(new Transform(), new Movement());
var refs = entity.Get<Transform, Movement>();

High-performance Queries

The default Query API is easy to use and still very fast, perfect for fast prototyping and most features of your game. However, sometimes you need even more power; that's where the high-performance queries kick in.

world.InlineQuery<Struct,T0,T1...>(in queryDescription, ref myStruct);
world.InlineEntityQuery<Struct,T0,T1...>(in queryDescription, ref myEStruct);

Those high-performance queries use an interface and a struct implementing it. This allows the compiler to inline the method call, which results in less address jumping and even faster iteration. Therefore you need to know two important interfaces and how to implement them.

public interface IForEach<T0...T10>{
    void Update(ref T0 t0, ref T1 t1, ... ref T10 t10);
}

public interface IForEachWithEntity<T0...T10>{
    void Update(in Entity entity, ref T0 t0, ref T1 t1, ... ref T10 t10);
}

Those two interfaces provide Update methods with various generic overloads which can be used to implement the entity operations. All you need to do is implement the interface in a struct and pass that struct to the high-performance query API.

public struct VelocityUpdate : IForEach<Position, Velocity> {

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Update(ref Position pos, ref Velocity vel) { 
        pos.x += vel.x;
        pos.y += vel.y;
    }
}

world.InlineQuery<VelocityUpdate, Position, Velocity>(in queryDescription);


// Also possible with a struct reference
public struct VelocityUpdate : IForEach<Position, Velocity> {

    public int counter;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Update(ref Position pos, ref Velocity vel) { 
        pos.x += vel.x;
        pos.y += vel.y;
        counter++;
    }
}

var velUpdate = new VelocityUpdate();
world.InlineQuery<VelocityUpdate, Position, Velocity>(in queryDescription, ref velUpdate);
Console.WriteLine(velUpdate.counter);

That's all, pretty cool right? However, in some cases you may also need a direct reference to the entity itself. In this case the IForEachWithEntity interface is required.

public struct VelocityUpdate : IForEachWithEntity<Position, Velocity> {

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Update(in Entity entity, ref Position pos, ref Velocity vel) { 
        pos.x += vel.x;
        pos.y += vel.y;
        Console.WriteLine(entity);
    }
}

world.InlineEntityQuery<VelocityUpdate, Position, Velocity>(in queryDescription);  // <- Requires InlineEntityQuery instead of InlineQuery

Multithreading

Still not fast enough? You want to simulate those million entities, don't you? Well... I got your back!
Arch uses a self-written and alloc-free JobScheduler under the hood to dispatch your query logic to a bunch of worker threads.

Before you can use those features, you need to create an instance of the JobScheduler and assign it to the World.

// Create the scheduler and assign it to the world
_jobScheduler = new(
    new JobScheduler.Config
    {
        ThreadPrefixName = "Arch.Samples",
        ThreadCount = 0,                           // 0 = Determine at runtime
        MaxExpectedConcurrentJobs = 64,
        StrictAllocationMode = false,
    }
);
_world.SharedJobScheduler = _jobScheduler;

// Dispose the JobScheduler at the end of its lifecycle.
_jobScheduler.Dispose();

The easiest way to make use of multithreading is the parallel query overloads. The syntax is the same as the normal Query methods you already know.

world.ParallelQuery(in query, ...);
world.InlineParallelQuery(in query, ...);
world.InlineParallelEntityQuery(in query, ...);
world.InlineParallelChunkQuery(in query, ...);
world.ScheduleInlineParallelChunkQuery(in query, ...); 

Therefore you can easily rewrite your queries from this

world.Query(in query, (ref Transform t, ref Velocity v) => {
   t.x += v.x;
   t.y += v.y;
});

to

world.ParallelQuery(in query, (ref Transform t, ref Velocity v) => {
   t.x += v.x;
   t.y += v.y;
});

And that's all! The rest is handled under the hood; note, however, that those calls block the main thread. A parallel query is scheduled to a bunch of worker threads, and the call waits for all scheduled jobs to finish before proceeding with the next query. Since the jobs are processed by multiple worker threads simultaneously, it's incredibly fast.

An alternative to those high-level queries is the IChunkJob interface which can be passed to a specific overload. This inlined interface is then called by each worker thread for the processed chunks and you can define the logic yourself.

public struct VelocityUpdate : IChunkJob {

   public void Execute(ref Chunk chunk) {

      var transforms = chunk.GetSpan<Transform>();
      var velocities = chunk.GetSpan<Velocity>();

      foreach(var index in chunk){

         ref var transform = ref transforms[index];
         ...
      }
   }
}

world.InlineParallelChunkQuery(in query, new VelocityUpdate());

In a multithreaded environment, you should NEVER modify the world, archetype, or chunk structure. You should not add or remove entities... however, it's totally fine to update the components of existing entities.
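One common way to respect this rule is to collect structural changes during the parallel pass and apply them afterwards on the main thread. This is a sketch, not an official Arch feature: `Health` is a made-up component, and it assumes a ParallelQuery overload that passes the entity, analogous to the Query overloads shown elsewhere on this page.

```csharp
using System.Collections.Concurrent;

// A thread-safe collection, since multiple worker threads write to it
var toDestroy = new ConcurrentBag<Entity>();

world.ParallelQuery(in query, (in Entity entity, ref Health health) => {
    if (health.Value <= 0)
        toDestroy.Add(entity);   // Defer the structural change
});

// Back on the main thread, after all jobs have finished: apply the changes safely
foreach (var entity in toDestroy)
    world.Destroy(entity);
```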

Pure ECS

A default entity is 8 bytes in size. It contains an int as its Id and an int referencing its World. A bunch of extensions utilize the WorldId inside the Entity to add several methods that ease working with entities.

However, in some cases, it makes sense to remove the WorldId from the Entity to save those additional 4 bytes. The slimmer the entity is, the more entities fit into a chunk and the better the iteration speed is.

With a simple #define PURE_ECS inside your forked repo, you can easily turn on this "pure ECS" paradigm. It removes the WorldId from the entity and resizes it to exactly 4 bytes. This increases the iteration speed and saves memory. However, you will not be able to use any Entity extension methods anymore, you will need to operate on the world directly.

world.Query(in desc, (in Entity en, ...) => {

   var references = en.Get<Transform, Velocity>();
   ...
});

Must be rewritten to this...

#define PURE_ECS
public static World world;
world.Query(in desc, static (in Entity en, ...) => {

   var references = world.Get<Transform, Velocity>(en);
   ...
});

Arch.Unsafe

The main version of Arch uses safe, managed C# code. However, this has some slight drawbacks; for example, the internal arrays are still haunted by C#'s garbage collector.

To escape the GC beast there is an extra branch. On this branch the most important internals of Arch are written in unsafe C#, using native memory for entities and components. This removes them from the GC's influence and partially improves long-term performance.

Unmanaged components are stored in native memory; managed structs and classes are still stored in traditional GC-controlled data structures. Thus, this version combines the best of both worlds.

https://github.com/genaray/Arch/tree/feature/unsafe

Performance tips

Let's talk about some tips for getting the most out of this ECS framework. Follow these for hot paths and performance-critical code, otherwise you are free to do whatever you want.

Generic API & InlineQuery

To avoid allocations and benefit from compile-time static dispatch, you should always use the generic APIs if possible. They are as fast as it gets.

// e.g.
entity.Set<Position, Velocity>(new Position(), new Velocity());
// instead of
entity.SetRange(new Position(), new Velocity()); // This is slower and should only be used if the types are not known at compile time.

You should also use the source generator, custom enumeration, or inline queries where possible. These give you a good performance boost as well.

More about this here: https://github.com/genaray/Arch/wiki/Query-techniques

Entity Size

Keep your entities as small as possible, their components should only hold the bare minimum. Remove unnecessary fields and try to outsource common values. The smaller your entity is in terms of byte size, the faster your systems will run.

// Game stats for an RPG
public struct Stats { public float minHealth, maxHealth, physicalDamage, physicalDefence, luck, critical, ... }
entity.Set<Stats>(new Stats(...));

Are you sure that this is really necessary? In most cases it's not; every Orc, for example, will mostly have the same stats. Try to store a reference to them instead. This also reduces your memory usage in general; the technique is called the Flyweight pattern.

public struct Stats { public float minHealth, maxHealth, physicalDamage, physicalDefence, luck, critical, ... }
public struct StatReference{ public int index; };

public Stats[] StatsArray = new Stats[]{ ... };

orcEntity.Set<StatReference>(new StatReference{ index = 10 });       // Entity references the stats at index 10 of the Stats array.
otherOrcEntity.Set<StatReference>(new StatReference{ index = 10 });  // Entity references the same shared stats.

// In your system you need to access that array based on your needs

Look, it's not that hard, right? By storing huge structs somewhere else, your entities become smaller... and if they are smaller, more fit into each chunk and less memory is loaded into the cache. This combination makes queries much faster.
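A sketch of what the system side might look like (the `SharedStats` array, the `Health` component, and the field names are assumptions for illustration, not Arch API):

```csharp
public struct Stats { public float maxHealth, physicalDamage; }
public struct StatReference { public int index; }
public struct Health { public float current; }

// The shared flyweight data: stored once, referenced by many entities
public static Stats[] SharedStats =
{
    new Stats { maxHealth = 100f, physicalDamage = 10f },  // index 0: a regular orc
};

// Inside a system: resolve the reference and use the shared values
var desc = new QueryDescription().WithAll<StatReference, Health>();
world.Query(in desc, (ref StatReference statRef, ref Health health) => {
    ref var stats = ref SharedStats[statRef.index];  // Cheap lookup of the shared data
    if (health.current > stats.maxHealth)
        health.current = stats.maxHealth;            // Clamp using the shared max health
});
```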

Divide and Rule

Separate and divide your entity components based on what you query. This will improve your query performance a lot.
Let's look at one example.

public struct Transform{ public float x, y, z, rotX, rotY, rotZ; }  // Stores position AND rotation

world.Query(..., (ref Transform transform) => {
   // Update position
});


world.Query(..., (ref Transform transform) => {
   // Update rotation
});

What do you notice? Exactly... you iterate over all Transforms to update the position, and later you iterate over all Transforms again to update the rotation. This is a no-go: why iterate over the whole Transform if you really just need the rotation part of it?

This slows down your query since the CPU needs to load more data into the cache, including unused data. When you iterate over Transform to update its rotation, you also load the position with it, which in this case is unnecessary and slow.

Instead, you should always divide components based on your specialized needs.

public struct Position{ public float x, y, z; }           // Stores position
public struct Rotation{ public float rotX, rotY, rotZ; }  // Stores rotation

world.Query(..., (ref Position pos) => {
   // Update position
});


world.Query(..., (ref Rotation rot) => {
   // Update rotation
});

This only loads and processes what you need, so it becomes way faster. So always remember to divide your components and query only what you need at that particular moment.

Update single fields

Update only fields instead of the whole object or struct. This has the advantage that less code is generated, which makes the code slightly faster.

World.Query(in desc, (ref Position pos, ref Velocity vel) => {
   pos.X += vel.X;
   pos.Y += vel.Y;
});

instead of 

World.Query(in desc, (ref Position pos, ref Velocity vel) => {
   pos = new Position(pos.X + vel.X, pos.Y + vel.Y);
});

This can make a small but significant difference depending on the scenario. This was analyzed and benchmarked during a discussion: https://github.com/genaray/Arch/discussions/49