Avoid unnecessary work by deferring it until the result is needed.
使用延时计算避免不必要的工作。
Many games have something called a scene graph. This is a big data structure that contains all of the objects in the world. The rendering engine uses this to determine where to draw stuff on the screen.
许多游戏都有一个称之为场景图的东西.这是一个大的数据结构,包含了游戏世界中所有的物体。渲染引擎使用他来决定将物体绘制到屏幕的什么地方。
At its simplest, a scene graph is just a flat list of objects. Each object has a model, or some other graphic primitive, and a transform. The transform describes the object's position, rotation, and scale in the world. To move or turn an object, we simply change its transform.
简单来说,一个场景图是包含多个物体的列表。每个物体都含有一个模型或者其他图元,和一个变换。变换描述了物体在世界中的位置,旋转角度和缩放大小。想要移动或者旋转物体,我们可以修改他的变换。
The mechanics of how this transform is stored and manipulated are unfortunately out of scope here. The comically abbreviated summary is that it's a 4x4 matrix. You can make a single transform that combines two transforms -- for example, translating and then rotating an object -- by multiplying the two matrices.
变换的存贮和应用的机制不在我们的讨论范围之内.概括的说他是一个4x4的矩阵。你可以将两个转换 矩阵合并为一个 ——举个例子,移动一个物体再旋转——可以将这两个转换矩阵相乘得到一个等价的新矩阵。
How and why that works is left as an exercise for the reader.
这么做的方法和原理作为一个练习留给读者。
When the renderer draws an object, it takes the object's model, applies the transform to it, and then renders it there in the world. If we had a scene bag and not a scene graph, that would be it, and life would be simple.
当渲染器绘制一个物体时,它将这个物体的转换作用到这个物体的模型上,然后将它渲染出来。如果我们有场景包而不是场景图的话,生活也会简单的多。
However, most scene graphs are hierarchical. An object in the graph may have a parent object that it is anchored to. In that case, its transform is relative to the parent's position and isn't an absolute position in the world.
然而,许多场景图是继承结构的。场景中的一个物体会附着在一个父物体上。在这种情况下,他的变换就依赖于它父物体的变换,而不是世界中的一个绝对位置。
For example, imagine our game world has a pirate ship at sea. Atop the ship's mast is a crow's nest. Hunched in that crow's nest is a pirate. Clutching the pirate's shoulder is a parrot. The ship's local transform positions the ship in the sea. The crow's nest's transform positions the nest on the ship, and so on.
举个例子,想象我们的游戏中有一艘橡木船,桅杆的顶部是一个乌鸦槽。一个海盗站在这个乌鸦嘴上。一只鹦鹉抓 在海盗的肩膀上。这艘船的变化标记了它在海中的位置。乌鸦槽的变换标记了它在船上的位置。。。
Programmer art!
This way, when a parent object moves, its children move with it automatically. If we change the local transform of the ship, the crow's nest, pirate, and parrot go along for the ride. It would be a total headache if, when the ship moved, we had to manually adjust the transforms of all the objects on it to keep them from sliding off.
这样,当一个父物体移动时,它的子物体也会跟着移动。如果我们修改船的局部变换,海盗,鹦鹉,也会变动。如果在船移动是要手动修改船上所有物体的变换来防止相对滑动,这会是一件很头疼的事情。
To be honest, when you are at sea you do have to keep manually adjusting your position to keep from sliding off. Maybe I should have chosen a drier example.
老实说,但你在海里是,你需要手动的调整你的位置来防止滑动。
But to actually draw the parrot on screen, we need to know its absolute position in the world. I'll call the parent-relative transform the object's local transform. To render an object, we need to know its world transform.
但是实际上绘制海盗到屏幕上时,我们需要知道它在世界中的绝对位置。我们需要依次方位他们相对于父物体的本地变换。为了渲染一个物体,我们需要知道它的世界变换。
Calculating an object's world transform is pretty straightforward -- you just walk its parent chain starting at the root all the way down to the object, combining transforms as you go. In other words, the parrot's world transform is:
计算一个物体的时间变换是相当直观的--只要从根节点沿着它的父链将变换组合起来就是。也就是说,鹦鹉的世界变换就是:
In the degenerate case where the object has no parent, its local and world transforms are equivalent. 在退化的情况下,当物体没有父物体时,它的本地变换和世界变换相等。
We need the world transform for every object in the world every frame, so even though there are only a handful of matrix multiplications per model, it's on the hot code path where performance is critical. Keeping them up to date is tricky because when a parent object moves, that affects the world transform of itself and all of its children, recursively.
每个游戏帧,我们都需要世界中每个物体的世界变换。所以即使每个模型中只有少数的几个矩阵相乘,都是代码中 影响性能的关键所在。保持他们及时更新是棘手的。因为当一个父物体移动,他会影响他自己的世界变化和他所有的子 物体,以及子物体的子物体...
The simplest approach is to calculate transforms on the fly while rendering. Each frame, we recursively traverse the scene graph starting at the top of the hierarchy. For each object, we calculate its world transform right then and draw it.
最简单的途径是在渲染的过程中计算变换。每一帧中,我们从顶层开始,递归便利场景图。对每个物体,我们计算他们的 世界坐标并绘制它。
But this is terribly wasteful of our precious CPU juice! Many objects in the world are not moving every frame. Think of all of the static geometry that makes up the level. Recalculating their world transforms each frame even though they haven't changed is a waste.
这是,这对我们宝贵的CPU资源是一种可怕的浪费。许多物体并不是每一帧都移动。想想关卡中那些静止的几何体。每一帧都要重计算没有移动的物体的时间变换是一种浪费。
The obvious answer is to cache it. In each object, we store its local transform and its derived world transform. When we render, we only use the precalculated world transform. If the object never moves, the cached transform is always up to date and everything's happy.
一个明显的解决方法是将它缓存起来。在每个物体中,我们保存它的本地变换和它的时间变换。但我们渲染时,我们只使用预先计算好的时间变换。如果物体没有移动,那么缓存的变换就是最新的,皆大欢喜。
When an object does move, the simple approach is to refresh its world transform right then. But don't forget the hierarchy! When a parent moves, we have to recalculate its world transform and all of its children's, recursively.
当一个物体移动了。简单的方法就是立即刷新它的世界变换。但是不要忘了继承链!当一个父物体移动时,我们需要重计算它的时间变换和递归的计算它所有子物体的世界变换。
Imagine some busy gameplay. In a single frame, the ship gets tossed on the ocean, the crow's nest rocks in the wind, the pirate leans to the edge, and the parrot hops onto his head. We changed four local transforms. If we recalculate world transforms eagerly whenever a local transform changes, what ends up happening?
现象一个激烈的游戏。船被扔进海里,乌鸦槽在风中晃动。海盗靠在槽边,鹦鹉在他的头上 跳动。我们修改了4个本地变换。如果我们在每个本地变换修改时都急忙忙的重新计算世界变换,结果会发生 什么?
You can see on the lines marked ★ that we're recalculating the parrot's world transform four times when we only need the result of the final one.
我们可以看到 ★ 这行。我们重新计算了4次鹦鹉的世界变换才得到我们想要的最终结果。
We only moved four objects, but we did ten world transform calculations. That's six pointless calculations that get thrown out before they are ever used by the renderer. We calculated the parrot's world transform four times, but it is only rendered once.
我们只移动了4个物体,但是我们做了10次世界变换计算。这6次无意义的计算在渲染器使用之前就被扔掉了。 我们计算了4次鹦鹉的世界变换,但是只渲染了一次。
The problem is that a world transform may depend on several local transforms. Since we recalculate immediately each time one of the transforms changes, we end up recalculating the same transform multiple times when more than one of the local transforms it depends on changes in the same frame.
问题的关键是一个世界变换可能依赖好几个本地变换。由于我们在每次变换修改时都立刻重计算,结果是如果一帧内有好几个依赖的本地变换改变了,我们将这个变换重新计算了好多变。
We'll solve this by decoupling changing local transforms from updating the world transforms. This lets us change a bunch of local transforms in a single batch and then recalculate the affected world transform just once after all of those modifications are done, right before we need it to render.
我们通过将修改本地变换和更新世界变换解耦来解决这个问题。这让我们在单次作业中修改多个本机变换,然后 在所有变动完成之后,仅仅需要计算一次被影响的世界变换。
It's interesting how much of software architecture is intentionally engineering a little slippage.
To do this, we add a flag to each object in the graph. "Flag" and "bit" are synonymous in programming -- they both mean a single micron of data that can be in one of two states. We call those "true" and "false", or sometimes "set" and "cleared". I'll use all of these interchangeably.
要做到这点,我们为图找那个的每个物体添加一个flag。“Flag”和“bit”在编程中是同义词——他们都表示单个微笑的数据能够储存两个状态中的一个。我们称之为“true”和“falsely”,有时也叫“set”和“cleared”。我会交互的使用他们。
When the local transform changes, we set it. When we need the object's world transform, we check the flag. If it's set, we calculate the world transform and then clear the flag. The flag represents, "Is the world transform out of date?" For reasons that aren't entirely clear, the traditional name for this "out-of-date-ness" is "dirty". Hence: a dirty flag. "Dirty bit" is an equally common name for this pattern, but I figured I'd stick with the name that didn't seem as prurient.
当本地变换改动时,我们设置它。当我们需要这个物体 的世界变换时,我们检查这个flag。如果它被设置了,我们计算这个世界变换,然后清除这个flag。这个flag 代表这“这个世界变换是不是过期了?”由于一些模糊的原因,传统上这个“过期的”被称作“dirty”。“Dirty bit"也是这个模式常见的名字。但是我想我会坚持使用这种看起来没什么特殊的名字。
Wikipedia's editors don't have my level of self-control and went with dirty bit
维基百科的编辑没有我这么强的自制力,将它称之为dity bit
If we apply this pattern and then move all of the objects in our previous example, the game ends up doing:
如果我们运用这个模式,然后将我们例子中的所有物体移去。
That's the best you could hope to do -- the world transform for each affected object is calculated exactly once. With only a single bit of data, this pattern does a few things for us:
这是你能期望的最好的办法。每个被影响的物体的世界变换只需要计算一次。只需要一个简单的位数据,这个模式 为我们做了不少事:
-
It collapses modifications to multiple local transforms along an object's parent chain into a single recalculation on the object.
-
它将父链上物体的多个本地世界变换的改动分解为每个物体的一次重计算。
-
It avoids recalculation on objects that didn't move.
-
他避免了没有移动的物体的重计算
-
And a minor bonus: if an object gets removed before it's rendered, it doesn't calculate its world transform at all.
-
一个额外的好处:如果一个物体在渲染之前移除了,就根本不计算他的世界变换。
A set of primary data changes over time. A set of derived data is determined from this using some expensive process. A "dirty" flag tracks when the derived data is out of sync with the primary data. It is set when the primary data changes. If the flag is set when the derived data is needed, then it is reprocessed and the flag is cleared. Otherwise, the previous cached derived data is used.
一系列的 primary data 随时间变化。一系列的 derived data 根据些数据由一些昂贵操作 得到。 一个dirty flag 跟踪这个派生数据是否和原生数据同步。他在元数据做改动时设置。如果它被设置 了,当需要派生数据时,他们就重新计算并清楚标记。否则就使用缓存的数据。
Compared to some other patterns in this book, this one solves a pretty specific problem. Also, like most optimizations, you should only reach for it when you have a performance problem big enough to justify the added code complexity.
想对于本书中的其他模式,这个模式解决一个相当特定的问题。同时,就像大多数优化那样,仅当性能问题大到 不惜增加代码复杂度时才使用它。
Dirty flags are applied to two kinds of work: calculation and synchronization. In both cases, the process of going from the primary data to the derived data is time-consuming or otherwise costly.
脏位涉及到两个关键词:“计算”和“同步”。在两种情况下,在处理元数据到派生数据过程中是费时或其他 大的开销。
In our scene graph example, the process is slow because of the amount of math to perform. When using this pattern for synchronization, on the other hand, it's more often that the derived data is somewhere else -- either on disk or over the network on another machine -- and simply getting it from point A to point B is what's expensive.
在我们的场景图例子中,过程很慢是应为计算量很大。当使用这个模式做同步时,相反,派生数据通常在别的地方。 要么在磁盘上,要么通过网络上的其他机器上。简单的得到它就很昂贵。
There are a couple of other requirements too: 这里也有些其他的要求:
-
The primary data has to change more often than the derived data is used. This pattern works by avoiding processing derived data when a subsequent primary data change would invalidate it before it gets used. If you find yourself always needing that derived data after every single modification to the primary data, this pattern can't help.
-
元数据的修改次数比衍生数据的使用次数多在真正使用之前,数据被随后的修改而失效。这个模式 通过避免计算之前的改动而工作。如果你需要在每次改动原数据时都立刻需要衍生数据,这个模式就没有效果。
-
It should be hard to update incrementally. Let's say the pirate ship in our game can only carry so much booty. We need to know the total weight of everything in the hold. We could use this pattern and have a dirty flag for the total weight. Every time we add or remove some loot, we set the flag. When we need the total, we add up all of the booty and clear the flag.
But a simpler solution is to keep a running total. When we add or remove an item, just add or remove its weight from the current total. If we can "pay as we go" like this and keep the derived data updated, then that's often a better choice than using this pattern and calculating the derived data from scratch when needed.
-
累次的更新数据十分困难我们假设游戏的小船能运载众多的战利品。我们需要知道每个东西的重量。我们 能够使用这个模式,为总量设置一个赃位。每当我们增加或者减少战利品时,我们设置赃位。当我们需要总量 时,我们将所有战利品的重量加起来并清楚赃位。
一个简单的解决方法是保持一个动态的总量。当我们增加或者减少物体是,就从总量上增加或者减去 他的重量。我们可想像这样保持我们的衍生数据更新时,这种方法要比使用这个模式要好。
This makes it sound like dirty flags are rarely appropriate, but you'll find a place here or there where they help. Searching your average game codebase for the word "dirty" will often turn up uses of this pattern.
这些要求看起来觉得赃位没有合适的时候,但是你总能发现他能帮上忙的地方。在你的每个游戏的代码中搜索 ’dirty‘这个单词,通常会发现这种模式的使用。
From my research, it also turns up a lot of comments apologizing for "dirty" hacks.
从我的调查来看,他也导致了很多使用"dirty"技巧的批评。
Even after you've convinced yourself this pattern is a good fit, there are a few wrinkles that can cause you some discomfort.
即使当你有相当的自信认为这个模式十分适用,这里还有一些小的瑕疵让你感到不便。
This pattern defers some slow work until the result is actually needed, but when it is, it's often needed right now. But the reason we're using this pattern to begin with is because calculating that result is slow!
这个模式把某些耗时的工作推迟到真正需要时,但是但需要是,他经常立马需要。但是,我们使用这个这个模式 的初衷是计算出结果很慢。
This isn't a problem in our example because we can still calculate world coordinates fast enough to fit within a frame, but you can imagine other cases where the work you're doing is a big chunk that takes noticeable time to chew through. If the game doesn't start chewing until right when the player expects to see the result, that can cause an unpleasant visible name="gc">pause.
这在我们的例子中不是问题,因为计算世界坐标足够在一帧内完成。你可想想象其他情景,工作量大到需要一个 能够察觉的时间才能完成。如果游戏直到玩家想要看到结果是才开始计算,这会导致一个不好的视觉暂停。
Another problem with deferring is that if something goes wrong, you may fail to do the work at all. This can be particularly problematic when you're using this pattern to save some state to a more persistent form.
另外一个延时的问题是如果某个东西出错,你可能陷入完全无法工作的境地。当你将状态保存在一个更加持久化的 形式中使用这个模式时,问题会尤其突出。
For example, text editors know if your document has "unsaved changes". That little bullet or star in your file's title bar is literally the dirty flag visualized. The primary data is the open document in memory, and the derived data is the file on disk.
举个例子,文本编辑器知道文档是否还有“未保存的修改”。在你文件标题栏上的小子弹或者星星标识这个状态。 元数据是在内存中的打开文档,衍生数据是磁盘上的文件。
Many programs don't save to disk until either the document is closed or the application is exited. That's fine most of the time, but if you accidentally kick the power cable out, there goes your masterpiece.
许多程序在文档关键或者程序退出之前都不会存盘。这在大部分情况下都十分良好,但是如果你意外的将 电源线缆踢出,你的工作就付之东流。
Editors that auto-save a backup in the background are compensating specifically for this shortcoming. The auto-save frequency is a point on the continuum between not losing too much work when a crash occurs and not thrashing the file system too much by saving all the time.
编辑器为了减缓这种损失会在后台自动保存一个备份。自动保存的频率是在既不丢失太多数据,也不造成文件 系统繁忙的一个折中点。
This mirrors the different garbage collection strategies in systems that automatically manage memory. Reference counting frees memory the second it's no longer needed, but it burns CPU time updating ref counts eagerly every time references are changed.
与之类似的是自动内存管理系统中不同的垃圾回收策略。引用技术在没有使用时释放内存,但是每次引用技术 改变是都刷新技术会十分消耗CPU时间。
Simple garbage collectors defer reclaiming memory until it's really needed, but the cost is the dreaded "GC pause" that can freeze your entire game until the collector is done scouring the heap.
简单垃圾回收策略将内存回收推迟到需要时再进行,但是代价是可怕的“GC 暂停”,他将是整个游戏冻结起来, 直到回收器清理完了堆数据。
In between the two are more complex systems like deferred ref-counting and incremental GC that reclaim memory less eagerly than pure ref-counting but more eagerly than stop-the-world collectors.
在这两者之间的是更复杂的系统,如延时引用计数和增量式GC。他们比纯粹的引用计数更少的回收内存,但是比 暂停世界的回收器更加频繁。
Since the derived data is calculated from the primary data, it's essentially a cache. Whenever you have cached data, the trickiest aspect of it is cache invalidation -- correctly noting when the cache is out of sync with its source data. In this pattern, that means setting the dirty flag when any primary data changes.
既然衍生数据是通过原始数据计算而来,它基本上就是一份缓存。单一修改数据时,棘手的问题是缓存失效。——当缓存 和原始数据不同步是,什么都不正确了。在这个模式只能怪,它意味着当任何元数据变动是,都要设置脏标记。
Phil Karlton famously said, "There are only two hard things in Computer Science: cache invalidation and naming things."
Phil Karlton 有句名言。"在计算机科学中只有两件难事:缓存失效和命名"
Miss it in one place, and your program will incorrectly use stale derived data. This leads to confused players and bugs that are very hard to track down. When you use this pattern, you'll have to take care that any code that modifies the primary state also sets the dirty flag.
在一个地方错过,你的程序就会不正确的使用陈旧的数据。这回导致玩家困惑和十分难以跟踪的bug。当你使用这个模式时, 你需要小心,在任何改动数据的地方都要设置脏标记。
One way to mitigate this is by encapsulating modifications to the primary data behind some interface. If anything that can change the state goes through a single narrow API, you can set the dirty flag there and rest assured that it won't be missed.
一个减轻这点的方法是将原始数据的改动封装起来。任何可能的变动都通过一个狭窄的API,你可以在这里这是脏标记,并且不用担心会遗失。
When the derived data is needed and the dirty flag isn't set, it uses the previously calculated data. This is obvious, but that does imply that you have to keep that derived data around in memory in case you end up needing it later.
当需要衍生数据而赃位没有设置时,就会使用之前计算的数据。这是显然易见的,但是这不意味着你你必须你 必须将衍生数据保存在内存中以防你之后会用到它。
This isn't much of an issue when you're using this pattern to synchronize the primary state to some other place. In that case, the derived data isn't usually in memory at all.
当你使用这个模式来同步原数据到其他地方是,这不是什么问题。在这种情况下,衍生数据更本不在内存中。
If you weren't using this pattern, you could calculate the derived data on the fly whenever you needed it, then discard it when you were done. That avoids the expense of keeping it cached in memory at the cost of having to do that calculation every time you need the result.
如果你没有使用这个模式,你可以在需要的时候即可计算衍生数据,然后在使用完之后丢弃。这避免了将 它缓存在内存中的损失,代价是每次需要结果时都要计算一次。
Like many optimizations, then, this pattern trades memory for speed. In return for keeping the previously calculated data in memory, you avoid having to recalculate it when it hasn't changed. This trade-off makes sense when the calculation is slow and memory is cheap. When you've got more time than memory on your hands, it's better to calculate it as needed.
就行其他优化那样。这个模式在空间和时间上做平衡。当返回内存中之前计算的数据时,你避免了未修改数据 的重计算。这在内存便宜而计算费时的情况下是合算的。当内存比时间更加宝贵是,按需计算会比较好。
Conversely, compression algorithms make the opposite trade-off: they optimize space at the expense of the processing time needed to decompress.
相反的,压缩算法做了相反的取舍,他利用耗时的解码来优化空间大小。
Let's assume we've met the surprisingly long list of requirements and see how the pattern looks in code. As I mentioned before, the actual math behind transform matrices is beyond the humble aims of this book, so I'll just encapsulate that in a class whose implementation you can presume exists somewhere out in the æther:
让我们假设我们我们满足了"超"长的要求列表,来看看这个模式在代码中是怎样的。如同我之前提到的,矩形计算的数学原理不是本书目标,所以我封装在类里,它的实现你可以假设在其他地方。
class Transform
{
public:
static Transform origin();
Transform combine(Transform& other);
};
The only operation we need here is combine()
so that we can get an object's
world transform by combining all of the local transforms along its parent chain.
It also has a method to get an "origin" transform -- basically an identity
matrix that means no translation, rotation, or scaling at all.
这里我需要的唯一操作就是combine()
, 这样我们可以通过组合父链中所有的本地变换得到它的世界变换。它还有一个方法来得到一个原本的变换。————一个简单的单元矩阵表示没有转换,旋转,缩放。
Next, we'll sketch out the class for an object in the scene graph. This is the bare minimum we need before applying this pattern:
接下来,我们来定义场景图中的物体的类。这是我们运用这个模式的基础。
class GraphNode
{
public:
GraphNode(Mesh* mesh)
: mesh_(mesh),
local_(Transform::origin())
{}
private:
Transform local_;
Mesh* mesh_;
GraphNode* children_[MAX_CHILDREN];
int numChildren_;
};
Each node has a local transform which describes where it is relative to its
parent. It has a mesh which is the actual graphic for the object. (We'll allow
mesh_
to be NULL
too to handle non-visual nodes that are used just to group
their children.) Finally, each node has a possibly empty collection of child
nodes.
每一个节点包含一个本地变换,描述它相对于父物体的变换。它还有一个单元,代表这个物体的真正图元。(我们允许
mesh_
为NULL
来处理只是为了组合子物体的不可见的节点)。最后,每个节点都包含一个可能为空的子物体集合。
With this, a "scene graph" is really only a single root GraphNode
whose
children (and grandchildren, etc.) are all of the objects in the world:
有了这个,一个”场景图“就是一个单一的根GraphNode
,它的子节点(子子节点,...)就是世界中的所有物体。
GraphNode* graph_ = new GraphNode(NULL);
// Add children to root graph node...
In order to render a scene graph, all we need to do is traverse that tree of nodes, starting at the root, and call the following function for each node's mesh with the right world transform:
为了绘制一个场景图,我们需要做的就是遍历节点树,从根节点开始,为每个节点图元结合正确的世界变换调用下面的方法。
void renderMesh(Mesh* mesh, Transform transform);
We won't implement this here, but if we did, it would do whatever magic the renderer needs to draw that mesh at the given location in the world. If we can call that correctly and efficiently on every node in the scene graph, we're happy. 我们这里不实现它,它做的都是一些将图元在给定的地方绘制出来的工作。如果我们能共正确并高效的在每个节点上调用它,那将十分快乐。
To get our hands dirty, let's throw together a basic traversal for rendering the
scene graph that calculates the world positions on the fly. It won't be optimal,
but it will be simple. We'll add a new method to GraphNode
:
开始做一份脏活,我们通过简单的遍历在渲染的时候计算世界坐标。它没有优化,但是简单。 我们为GraphNode
添加一个新方法。
void GraphNode::render(Transform parentWorld)
{
Transform world = local_.combine(parentWorld);
if (mesh_) renderMesh(mesh_, world);
for (int i = 0; i < numChildren_; i++)
{
children_[i]->render(world);
}
}
We pass the world transform of the node's parent into this using parentWorld
.
With that, all that's left to get the correct world transform of this node is
to combine that with its own local transform. We don't have to walk up the
parent chain to calculate world transforms because we calculate as we go while
walking down the chain.
我们用通过parentWorld
将父节点的世界变化传给它。有了这个,剩下的工作就是将它和本地变化结合起来得到正确的世界变换。我们不需要回溯到父节点去计算世界坐标,应为我们沿着父链下来已经计算过了。
We calculate the node's world transform and store it in world
, then we render
the mesh, if we have one. Finally, we recurse into the child nodes, passing in
this node's world transform. All in all, it's a tight, simple recursive
method.
我们计算节点的世界变换然后保存在world
中,然后我们渲染非空图元。最有我们递归的进入子节点中,将当前节点的世界变换传递进去。最后,这是一个紧凑,简单的递归方法。
To draw an entire scene graph, we kick off the process at the root node:
为了绘制整个场景图,我们空根节点开始渲染:
graph_->render(Transform::origin());
So this code does the right thing -- it renders all the meshes in the right place
-- but it doesn't do it efficiently. It's calling local_.combine(parentWorld)
on every node in the graph, every frame. Let's see how this pattern fixes that.
First, we need to add two fields to GraphNode
:
这样,这份代码做了正确的操作——它在正确的地方渲染图元——但并不高效。它每帧都在每个node上调用local_.combine(parentWorld)
。让我们看脏模式是如何修正这点的,我们需要添加两个变量到GraphNode
中。
class GraphNode
{
public:
GraphNode(Mesh* mesh)
: mesh_(mesh),
local_(Transform::origin()),
dirty_(true)
{}
// Other methods...
private:
Transform world_;
bool dirty_;
// Other fields...
};
The world_
field caches the previously calculated world transform, and
dirty_
, of course, is the dirty flag. Note that the flag starts out true
.
When we create a new node, we haven't calculated its world transform yet. At
birth, it's already out of sync with the local transform.
world_
变量缓存了上次计算了的时间变换,dirty_
变量就是脏标记。注意,这个变量用true
初始化。当我们创建一个新
节点时,我们没有计算过他的世界变换。在开始,他就没有和本地变换同步。
The only reason we need this pattern is because objects can move, so let's add support for that:
我们需要这个模式的唯一理由是物体能够移动,所以我们来提供这个功能。
void GraphNode::setTransform(Transform local)
{
local_ = local;
dirty_ = true;
}
The important part here is that it sets the dirty flag too. Are we forgetting anything? Right -- the child nodes!
这里重要的一点是同时设置dirty位。我们忘记什么了吗?哦 -- 子节点。
When a parent node moves, all of its children's world coordinates are invalidated too. But here, we aren't setting their dirty flags. We could do that, but that's recursive and slow. Instead, we'll do something clever when we go to render. Let's see:
当一个父节点移动时,它所有的子节点的世界坐标都无效了。但是这里,我们不设置他们的赃位。我们能做 这点,但是需要递归而且缓慢。相反,我们在渲染是做点聪明的事。来看:
void GraphNode::render(Transform parentWorld, bool dirty)
{
dirty |= dirty_;
if (dirty)
{
world_ = local_.combine(parentWorld);
dirty_ = false;
}
if (mesh_) renderMesh(mesh_, world_);
for (int i = 0; i < numChildren_; i++)
{
children_[i]->render(world_, dirty);
}
}
There's a subtle assumption here that the
if
check is faster than a matrix multiply. Intuitively, you would think it is; surely testing a single bit is faster than a bunch of floating point arithmetic.
这里有一个微妙的假设,
if
检查要比矩阵乘法快。这是一个直观的想法;单bit测试要比一批浮点数计算快。
However, modern CPUs are fantastically complex. They rely heavily on pipelining -- queueing up a series of sequential instructions. A branch like our
if
here can cause a branch misprediction and force the CPU to lose cycles refilling the pipeline.
然而,现代CPU十分复杂,它们严重依赖流水线操作——一系列的操作指令队列。像我们这里的一份
if
分支会导致分支预测错误强制CPU丢失流水线的循环填充。
The Data Locality chapter has more about how modern CPUs try to go faster and how you can avoid tripping them up like this.
Data Locality 这一节有更多关于现代CPU是如何加快运行和你如何避免像这样妨碍它快速运行的细节。
This is similar to the original naïve implementation. The key changes are that
we check to see if the node is dirty before calculating the world transform and
we store the result in a field instead of a local variable. When the node is
clean, we skip combine()
completely and use the old-but-still-correct world_
value.
这和之前的原始实现很相似。关键的变动在于在计算世界变换之前,我们先检查脏位,并且我们将结果保存在成员中而不是局部变量中。当节点没有改动时,我们完全跳过combine()
,使用老的但是仍然正确的world_
值。
The clever bit is that dirty
parameter. That will
be true
if any node above this node in the parent chain was dirty. In much the
same way that parentWorld
updates the world transform incrementally as we
traverse down the hierarchy, dirty
tracks the dirtiness of the parent chain.
聪明的位就是dirty
参数。如果父链上他之上的任何物体标记为脏,它将被置为true
。在我们递归的时候通过相同的方式通过parentWorld
渐进的更新世界变换。dirty
跟踪父链是否为脏。
This lets us avoid having to recursively mark each child's dirty_
flag
in setTransform()
. Instead, we pass the parent's dirty flag down to its
children when we render and look at that too to see if we need to recalculate
the world transform.
这让我们避免在'setTransform()'中递归的标记每个子节点的‘dirty_’位。相反,我们在渲染是传递父节点的 dirty为到他的子节点中,并查看它来确认是否有需要重新计算世界变换。
The end result here is exactly what we want: changing a node's local transform is just a couple of assignments, and rendering the world calculates the exact minimum number of world transforms that have changed since the last frame.
最终结果就是我们想要的:修改一个节点的本地变换只是几条赋值语句,渲染世界只计算了最少的变动的世界变换。
Note that this clever trick only works because
render()
is the only thing inGraphNode
that needs an up-to-date world transform. If other things accessed it, we'd have to do something different.
注意,这个聪明的技巧能工作是因为
render()
是GraphNode
中 唯一 需要实时世界坐标的操作。如果其他 操作访问它,我们需要做一些不同的操作。
This pattern is fairly specific, so there are only a couple of knobs to twiddle:
这个模式是想的特定的,所以只需要注意几点:
-
When the result is needed:
-
当需要计算结果时
-
It avoids doing calculation entirely if the result is never used. For primary data that changes much more frequently than the derived data is accessed, this can be a big win.
-
当结果重不使用时,它完全避免了计算当原数据变动的平率远大于衍生数据访问的频率是,优化 效果显著。
-
If the calculation is time-consuming, it can cause a noticeable pause. Postponing the work until the player is expecting to see the result can affect their gameplay experience. It's often fast enough that this isn't a problem, but if it is, you'll have to do the work earlier.
-
如果计算十分耗时,会造成明显的卡顿把工作推迟到玩家需要查看结果是才做会影响游戏体验。通常 计算足够快捷而没什么问题,但是一旦计算十分,最好提前开始计算。
-
-
At well-defined checkpoints:
-
在精心设定的检查点
Sometimes, there is a point in time or in the progression of the game where it's natural to do the deferred processing. For example, we may want to save the game only when the pirate sails into port. Or the sync point may not be part of the game mechanics. We may just want to hide the work behind a loading screen or a cut scene.
有时,在游戏过程中有一个时间点十分适合做延时计算工作。举个例子,我们可能只想在船靠岸时才存档。或者 存档点就是游戏机制的一部分。我们可能在一个加载界面或者一个截屏下做这些工作。
-
Doing the work doesn't impact the user experience. Unlike the previous option, you can often give something to distract the player while the game is busy processing.
-
在不影响用户体验下工作不同之前的考虑,当游戏忙于处理时你可以通知玩家。
-
You lose control over when the work happens. This is sort of the opposite of the earlier point. You have micro-scale control over when you process, and can make sure the game handles it gracefully.
What you can't do is ensure the player actually makes it to the checkpoint or meets whatever criteria you've defined. If they get lost or the game gets in a weird state, you can end up deferring longer than you expect.
-
当工作执行时,你失去了控制权 这和之前一点有些相反。在处理时,你有轻微的控制,保证游戏优雅的处理。
你不能确保玩家实际上到达检查点,或者达到任何你设定的标准。如果他们丢失或者游戏进入了奇怪的状态,你可以将 预期的操作进一步延迟。
-
-
In the background:
-
在后台
Usually, you start a fixed timer on the first modification and then process all of the changes that happen between then and when the timer fires.
通常,你可以在最初变动的时候启动一个固定的计时器,并计算计时器到达之间的所有变动。
The term in human-computer interaction for an intentional delay between when a program receives user input and when it responds is hysteresis.
术语滞后 是人机交互中,故意将用户的输入和计算机响应推迟一段时间。
-
You can tune how often the work is performed. By adjusting the timer interval, you can ensure it happens as frequently (or infrequently) as you want.
-
你可以调整工作执行的频率。 通过调整定时器的间隔,你可以按照你想要的频率处理。
-
You can do more redundant work. If the primary state only changes a tiny amount during the timer's run, you can end up processing a large chunk of mostly unchanged data.
-
你可以做更多冗余的工作。 如果在定时器期间原始的改动很少,你可以中断处理很多没有修改的数据。
-
You need support for doing work asynchronously. Processing the data "in the background" implies that the player can keep doing whatever it is that they're doing at the same time. That means you'll likely need threading or some other kind of concurrency support so that the game can work on the data while it's still being played.
Since the player is likely interacting with the same primary state that you're processing, you'll need to think about making that safe for concurrent modification too.
-
需要要进行异步操作。 在后台处理数据意味着玩家可以同事做他正在做的事情。这意味着你需要线程或者其他并发支持,以便你能够在游戏进 行时处理数据。
-
Imagine our pirate game lets players build and customize their pirate ship. Ships are automatically saved online so the player can resume where they left off. We're using dirty flags to determine which decks of the ship have been fitted and need to be sent to the server. Each chunk of data we send to the server contains some modified ship data and a bit of metadata describing where on the ship this modification occurred.
想象一下我们的海盗游戏允许玩家建造和定制他们的海盗船。船会自动线上保存以便在玩家离线之后能恢复。我们 使用脏标记来决定船的那一部分被改动了需要发送到服务器。每一份我们发送给服务器的数据包含了一些船的数据改动和 一份元数据描述这份改动是在什么地方变动的。
-
If it's more fine-grained:
-
更精细的粒度:
Say you slap a dirty flag on each tiny plank of each deck. 你将夹板上的每一份小木块加上脏标记。
-
You only process data that actually changed. You'll send exactly the facets of the ship that were modified to the server.
-
你只需要处理真正变动了的数据 你将船的真正变动发送给服务器
-
-
If it's more coarse-grained:
-
更粗糙的粒度:
Alternatively, we could associate a dirty bit with each deck. Changing anything on it marks the entire deck dirty.
另外,我们可以为每一个甲板关联一个脏标记。在它之上的每份改动将整个甲板标记为脏。
I could make some terrible joke about it needing to be swabbed here, but I'll refrain.
-
You end up processing unchanged data. Add a single barrel to a deck and you'll have to send the whole thing to the server.
-
你最终需要处理未变动的数据 在甲板上放置一个酒桶,你需要把整个东西发送给服务器。
-
Less memory is used for storing dirty flags. Add ten barrels to a deck and you only need a single bit to track them all.
-
存储脏标记消耗更少的内存 添加10个酒桶在甲板上只需要一个位来跟踪他们。
-
Less time is spent on fixed overhead. When processing some changed data, there's often a bit of fixed work you have to do on top of handling the data itself. In the example here, that's the metadata required to identify where on the ship the changed data is. The bigger your processing chunks, the fewer of them there are, which means the less overhead you have.
-
固定开销花费更少的时间 当处理修改后的数据时,通知有一套固定的流程要预先处理这些数据。在这个例子中, 就是标识船上那出地方改动了的数据。处理越大块,标识就越少,通用开支就少。
-
-
This pattern is common outside of games in browser-side web frameworks like Angular. They use dirty flags to track which data has been changed in the browser and needs to be pushed up to the server.
-
这种模式在游戏外的领域也是常见的,比如像Angular这种BS框架。他利用赃标记来跟踪浏览器中变动 的数据以及需要提交到服务端的数据。
-
Physics engines track which objects are in motion and which are resting. Since a resting body won't move until an impulse is applied to it, they don't need processing until they get touched. This "is moving" bit is a dirty flag to note which objects have had forces applied and need to have their physics resolved.
-
物理引擎跟踪着那些物理在运动那些空闲。应为一个空闲的物理知道收到力的作用才会移动,他在受力 之前不需要处理。这个“是否在移动”是一个赃位,来标记那些物体收到了力的作用并需要计算他们的 物理状态。