Novel Method for Video Generation Using Feature-Space Optical Flow Mapping? #25
You can also apply Bernoulli's principle from fluid mechanics to the pixels of a video and their optical flow: larger areas tend to have smaller overall motion than smaller areas do. I think we can use this concept in tandem with object-detection methods as part of an iterative quality-check process. Essentially, if the areas recognized as larger structures (a head, a body, etc.) have a larger magnitude of optical flow than areas recognized as smaller structures (a mouth, eyes, etc.), we can assume the flow is turbulent and incoherent, the resulting video will not give us the result we're looking for, and we would need to iterate again.
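A minimal sketch of that quality check, assuming we already have boolean region masks from an object detector (the mask names and the simple mean-magnitude comparison are illustrative, not a fixed recipe):

```python
import numpy as np

def region_flow_magnitude(flow, mask):
    """Mean optical-flow magnitude over a masked region.
    flow: (H, W, 2) per-pixel (dx, dy) vectors; mask: (H, W) boolean."""
    return float(np.linalg.norm(flow[mask], axis=-1).mean())

def is_flow_coherent(flow, large_masks, small_masks):
    """The heuristic above: large structures (head, body) should show
    less average motion than small ones (mouth, eyes). If the large
    regions move more, treat the flow as turbulent/incoherent and flag
    the frame for another generation pass."""
    large = np.mean([region_flow_magnitude(flow, m) for m in large_masks])
    small = np.mean([region_flow_magnitude(flow, m) for m in small_masks])
    return large <= small
```

A generation loop would call `is_flow_coherent` on each candidate frame's flow and regenerate when it returns `False`.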
This isn't something that would be possible for me to do, due both to hardware limitations and my lack of experience with these things, but after doing a lot of research trying to achieve temporal coherence with existing tools, I had an idea that may (or may not) be feasible.
I haven't seen any papers regarding this idea, so I figure I might as well share it.
Ideally, you would be able to take a small number of target (style) frames and input video frames, and then use those to generate your whole scene.
Background
This approach is reminiscent of the traditional technique of "keyframe animation" used by animators.
However, this approach involves using machine learning to learn a mapping from the input video to a stylized version based on keyframes. The process of figuring out what the style of the in-between frames should be is akin to the process animators go through when creating in-between frames.
The optical flow provides information about how objects in the scene are moving from frame to frame, analogous to how an animator needs to understand the motion of characters or objects between keyframes.
The training of a CNN to learn the mapping from input optical flow to stylized optical flow is similar to how an animator learns to draw the motion of characters or objects over time.
The idea of extending this concept by computing the optical flow in feature space, at multiple levels of abstraction, could be seen as a high-tech version of an animator understanding the motion of a scene at different levels of detail.
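That multi-level idea can be pictured with a simple image pyramid. This is my hedged stand-in for true feature-space levels (which would come from a network encoder, not block averaging); it only illustrates "the same frame at several levels of abstraction":

```python
import numpy as np

def downsample(frame, factor=2):
    """2x block-average downsample: a crude stand-in for moving one
    level up an encoder's feature hierarchy."""
    h = frame.shape[0] // factor * factor
    w = frame.shape[1] // factor * factor
    f = frame[:h, :w]
    return f.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def pyramid(frame, levels=3):
    """Multi-level representation of a frame, finest first. The idea
    above would compute optical flow at each level, from pixel detail
    up to coarse structure."""
    out = [frame]
    for _ in range(levels - 1):
        out.append(downsample(out[-1]))
    return out
```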
A Bit of Extra Detail I Came Up With
Super Simple Outline:
Essentially, you'd take the optical flow between target keyframes and the optical flow between the corresponding input frames, use those to build some sort of mapping between the two, and then use that mapping to help inform the creation of new target frames, leading to a full video after iteration.
You could (albeit SUUUUPER SIMPLY) think of this as:
[&] = Generic placeholder for operands.
StyleFlow(a->x, x<-a) = optical flow between frame a and x, plus the backward flow from x to a, where a is the style keyframe we have access to and x is the style frame we need (or, in the correspondence-calculation case, the next keyframe).
InputFlow(b->y, y<-b) = optical flow between frame b and y, plus the backward flow from y to b, where b is the input frame at the same time as a, and y is the input frame corresponding to x, which may be unknown.
MapRatio = the machine-learned correlation between the two flows. Think of it as a coefficient.
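The notation above can be sketched numerically. As a stand-in for the machine-learned correlation, this uses a simple per-pixel ratio between the two flow fields; the function names and the eps stabilizer are my assumptions, not part of the original idea:

```python
import numpy as np

def estimate_map_ratio(style_flow, input_flow, eps=1e-6):
    """MapRatio between a keyframe pair where both flows are known.
    Here it is a naive per-pixel, per-channel ratio; in the idea above,
    this correlation would instead be learned by a CNN.
    style_flow, input_flow: (H, W, 2) flow fields."""
    return style_flow / (input_flow + eps)

def predict_style_flow(map_ratio, input_flow):
    """StyleFlow(a->x) ~= MapRatio * InputFlow(b->y): predict the flow
    the missing stylized frame x should satisfy, given the input-side
    flow at the same point in time."""
    return map_ratio * input_flow
```

Between known keyframe pairs you'd fit MapRatio; for an unknown frame x you'd predict the style-side flow from the input-side flow and use it to constrain the new frame.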
We would then need to solve for the 'unknown' frame, which I see as being almost like a differential-equation problem.
In a differential equation, you're given the derivative of a function (analogous to the optical flow, which shows how the video changes from frame to frame) and you need to find the function that satisfies this derivative (analogous to generating the sequence of frames that results in the given optical flow).
Where M is the mapped correlation and x is the new frame we're trying to create, the relationship to satisfy is StyleFlow(a->x, x<-a) = M * InputFlow(b->y, y<-b), so define the residual:
E(x) = StyleFlow(a->x, x<-a) - M * InputFlow(b->y, y<-b)
Find the root of the equation E(x) = 0 using numerical methods like Newton-Raphson:
1. Start from an initial guess for x.
2. Evaluate E(x) at the current x.
3. Evaluate the derivative E'(x) of E(x) with respect to x.
4. Update x = x - E(x) / E'(x).
5. Repeat until E(x) approaches zero or reaches a satisfactory tolerance level.
Continue the iterations until a solution is found that minimizes the difference between the two sides of the equation.
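The Newton-Raphson steps above can be sketched for a scalar unknown. Treating one value as a stand-in for the whole frame is my simplification; a real frame is high-dimensional, so in practice you'd apply this element-wise or minimize ||E(x)||^2 by gradient descent instead:

```python
def newton_solve(E, dE, x0, tol=1e-8, max_iter=100):
    """Newton-Raphson on the residual E(x): evaluate E at the current x,
    divide by its derivative, update, and repeat until |E(x)| falls
    under the tolerance."""
    x = x0
    for _ in range(max_iter):
        e = E(x)
        if abs(e) < tol:
            break
        x = x - e / dE(x)
    return x
```

For example, solving E(x) = x^2 - 2 = 0 from x0 = 1 converges to sqrt(2) in a handful of iterations.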
Anyways, hopefully someone finds this interesting, and maybe it has at least some degree of validity to it!