Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix and improve: VAE tiling #372

Merged
merged 3 commits into from
Aug 27, 2024
Merged

Conversation

Green-Sky
Copy link
Contributor

@Green-Sky Green-Sky commented Aug 26, 2024

  • properly handle the upper left corner interpolating both x and y
  • use smootherstep to preserve more detail and spend less area blending

Completely fixes tiling seams!
Thanks to @stduhpf

fixes #353.

Old Details

Here is a test image, which replaces each tile with a single uniform shade of gray. The lines are the midway points of the overlap.

Before:

image

After:

image

And now with a proper test image.

Before:

image

After:

image

Without tiling:

image

IMO a significant improvement. There are however still vertical seams. I am a bit out of ideas now, so I wanted to contribute what I already have.

Some more details can be found in #353 .

This is ready to merge.

Before:

image

After:

artius_output

- properly handle the upper left corner interpolating both x and y
- refactor out lerp
- use smootherstep to preserve more detail and spend less area blending
@stduhpf
Copy link
Contributor

stduhpf commented Aug 26, 2024

I played around a bit with the interpolation code, and ended up with this. It seems to be almost flawless, except for the right and bottom borders that get washed out:

// Would be great to have acess to the max values for x and y
  const float x_f_0 = x>0 ? ix / float(overlap) : 1;
  const float x_f_1 = (width - ix) / float(overlap);
  const float y_f_0 = y>0 ? iy / float(overlap) : 1;
  const float y_f_1 = (height - iy) / float(overlap);

  const float x_f = std::min(x_f_0,x_f_1);
  const float y_f = std::min(y_f_0,y_f_1);

  ggml_tensor_set_f32(
      output,
      old_value +  new_value * ggml_smootherstep_f32(y_f<1 ? y_f : 1) * ggml_smootherstep_f32(x_f<1 ? x_f : 1),
      x + ix, y + iy, k
  );

output
output

@stduhpf
Copy link
Contributor

stduhpf commented Aug 26, 2024

Ok I got it! I could get the image width and height from the output tensor

// unclamped -> expects x in the range [0-1]
__STATIC_INLINE__ float ggml_smootherstep_f32(const float x) {
    GGML_ASSERT(x >= 0.f && x <= 1.f);
    return x * x * x * (x * (6.0f * x - 15.0f) + 10.0f);
}

__STATIC_INLINE__ void ggml_merge_tensor_2d(struct ggml_tensor* input,
                                            struct ggml_tensor* output,
                                            int x,
                                            int y,
                                            int overlap) {
    int64_t width    = input->ne[0];
    int64_t height   = input->ne[1];

    int64_t img_width    = output->ne[0];
    int64_t img_height   = output->ne[1];

    int64_t channels = input->ne[2];
    GGML_ASSERT(input->type == GGML_TYPE_F32 && output->type == GGML_TYPE_F32);
    for (int iy = 0; iy < height; iy++) {
        for (int ix = 0; ix < width; ix++) {
            for (int k = 0; k < channels; k++) {
                float new_value = ggml_tensor_get_f32(input, ix, iy, k);
                if (overlap > 0) {  // blend colors in overlapped area
                    float old_value = ggml_tensor_get_f32(output, x + ix, y + iy, k);

                    const float x_f_0 = x>0 ? ix / float(overlap) : 1;
                    const float x_f_1 = x<(img_width - width)? (width - ix) / float(overlap) : 1 ;
                    const float y_f_0 = y>0 ? iy / float(overlap) : 1;
                    const float y_f_1 = y<(img_height - height)? (height - iy) / float(overlap) : 1;

                    const float x_f = std::min(x_f_0,x_f_1);
                    const float y_f = std::min(y_f_0,y_f_1);
                    ggml_tensor_set_f32(
                        output,
                        old_value +  new_value * ggml_smootherstep_f32(y_f<1?y_f:1)*ggml_smootherstep_f32(x_f<1?x_f:1),
                        x + ix, y + iy, k
                    );
                }else{
                    ggml_tensor_set_f32(output, new_value, x + ix, y + iy, k);
                }
            }
        }
    }
}

output

@Green-Sky
Copy link
Contributor Author

Green-Sky commented Aug 26, 2024

Why does this work?

old_value + new_value * ggml_smootherstep_f32(y_f) * ggml_smootherstep_f32(x_f),

This is not a lerp, it is adding the new value x[0,1], right?

(btw totally forgot you can multiply values in the [0,1] just like that, 😄 )

@stduhpf
Copy link
Contributor

stduhpf commented Aug 26, 2024

Why does this work?

old_value + new_value * ggml_smootherstep_f32(y_f) * ggml_smootherstep_f32(x_f),

This is not a lerp, it is adding the new value x[0,1], right?

(btw totally forgot you can multiply values in the [0,1] just like that, 😄 )

Assuming the following tiles:
A B
C D

If you ignore the smoothersteps, the values of the output tensor in the middle overlap take these succesive values:
((x,y) are the coordinates within the quadruple overlap)

  • (0) + A * (1-x)*(1-y)
  • (A * (1-x)*(1-y)) + B * x * (1.-y) = lerp(A, B, x)*(1.-y)
  • (lerp(A, B, x)*(1.-y)) + C * (1.-x)*y
  • (lerp(A, B, x)*(1.-y) + C * (1.-x)*y) + D * x * y = lerp(A, B, x)*(1.-y) + lerp(C, D, x)*y = lerp(lerp(A, B, x), lerp(C, D, x), y)

image

If you're on an edge instead of a corner, either x or y is clamped to 0 or 1.

@Green-Sky
Copy link
Contributor Author

Yea I get that (thanks for the effort), but what I meant was, we are summing the color here.
color0 + color1 * f;
So we are not interpolating between them, something most be funny somewhere still.
I am about to push your code, adding you as a coauthor.

@stduhpf
Copy link
Contributor

stduhpf commented Aug 26, 2024

It's equivalent to a bilinear interpolation. Just done in multiple steps because we don't have access to all the values at the same time

@Green-Sky
Copy link
Contributor Author

Pushed, should be ready to merge. 🎉

ggml_extend.hpp Outdated Show resolved Hide resolved
@leejet
Copy link
Owner

leejet commented Aug 27, 2024

Thank you for your contribution.

@leejet leejet merged commit e71ddce into leejet:master Aug 27, 2024
9 checks passed
@stduhpf
Copy link
Contributor

stduhpf commented Aug 27, 2024

Thanks for the merge.
I realize now that my code isn't quite optimal (variables are assigned inside the inner loop when they could very well be assigned outside), but the impact doesn't seem noticable (maybe the compiler optimizes it?)

I'll do some benchmarking to see if it's worth making a pr for optimizing this loop.

@stduhpf
Copy link
Contributor

stduhpf commented Aug 27, 2024

I can gain a bit over 40ms (out of 18s, so 0.22% difference) for 56 tiles by optimizing this out. It's measurable, but not really significant.

@Green-Sky
Copy link
Contributor Author

Yea, I thought about optimizing it too, but then i remember that simplicity is key, and this code is not using enough compute to bother. :)

stduhpf added a commit to stduhpf/stable-diffusion.cpp that referenced this pull request Nov 1, 2024
* fix and improve: VAE tiling
- properly handle the upper left corner interpolating both x and y
- refactor out lerp
- use smootherstep to preserve more detail and spend less area blending

* actually fix vae tile merging

Co-authored-by: stduhpf <[email protected]>

* remove the now unused lerp function

---------

Co-authored-by: stduhpf <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug? --vae-tiling produces very obvious seams/artifacts near the edge of each "tiles"
3 participants