@astrojs/image: Don't include original image in output bundle #4961

Closed
1 task
jkjustjoshing opened this issue Oct 3, 2022 · 16 comments
Labels
- P2: nice to have Not breaking anything but nice to have (priority) pkg: image Related to the `@astrojs/image` package (scope)

Comments

@jkjustjoshing
Contributor

What version of astro are you using?

1.2.5

Are you using an SSR adapter? If so, which one?

No

What package manager are you using?

npm

What operating system are you using?

Mac

Describe the Bug

I'm not sure if this is a bug or a feature request - my apologies if I should have submitted this to github.com/withastro/rfcs

When using @astrojs/image with the <Picture> component and widths={} prop, multiple different sizes of images end up in my ./dist/assets folder corresponding to the values I pass to widths. However, there is also a copy of the original full-sized image.

This full-sized image is not referenced in the HTML at all, and is never downloaded by the site user, so I have no concerns for user-experience with an oversized image. However, including this full-size image in the bundle makes deploys slower and can increase hosting costs.

Suggested change

Remove full-size image from output directory

Alternative

Setting in integration config options for removing the full-size image.

Link to Minimal Reproducible Example

Not applicable (but I can provide one if anyone disagrees)

Participation

  • I am willing to submit a pull request for this issue.
@matthewp matthewp added the - P2: nice to have Not breaking anything but nice to have (priority) label Oct 4, 2022
@delucis
Member

delucis commented Oct 4, 2022

I think this was added at the request of people who did need the full size image as well (to reference as an open graph image or use as a download link or something). So I guess a config option that lets you opt out might make sense (or opt in, not sure what makes most sense)

@jkjustjoshing
Contributor Author

That's what I figured. However, the original image has a hashed filename - how would someone write code to actually expose the URL of the full-sized image to the user?

If it is possible to get the path to the full-sized image, then I agree that it's good to keep in the output folder (with an option to exclude it). However, if it's not possible then I would say either a) there's no value in having it or b) there should be a way to expose the original file's path to the developer.

@panwauu
Contributor

panwauu commented Oct 5, 2022

Possible duplicate of #4896

@tony-sull
Contributor

The original image actually does need to be included here in case an image is imported and the src is used directly.

Astro won't actually know what image variants are used until every page is built, and it won't actually know at all in SSR since the pages are built at run time.

It's a bit of an implementation detail and a little annoying to see extra files included here, but there's not a way to safely avoid it either unfortunately

---
import hero from '../hero.jpg'
---

<!-- this src can be used as-is, but it will fail if the original file isn't included in dist -->
<img src={hero.src} />

@jkjustjoshing
Contributor Author

Ah that makes sense. I wasn't thinking about using the image in a raw/native <img> element without using <Image> or <Picture>. Thanks for the context!

@supermoos

supermoos commented Oct 19, 2022

@tony-sull could a workaround be to have some sort of import parameter that would exclude it from the output? something like:

---
import hero from '../hero.jpg!discard'
---

It's a big issue as it defeats the purpose for a lot of image generation scenarios to output the original source also.

@supermoos

Or perhaps some sort of post build way to delete those files again?

@jkjustjoshing
Contributor Author

jkjustjoshing commented Oct 19, 2022

@supermoos the naming convention of the files is regular, so you probably could write a script post-build that looks for the original file. It would be pretty fragile, but could do something like the following:

Look for files matching the regex /^([^.]+)\.([0-9a-f]+)\.([a-z]+)$/ - that is, "xxxx.abc123.jpg". This is an original (all non-original files have an underscore in the segment before the file extension). $1 is the first part of the path, $2 is the unique content hash, and $3 is the file extension. Before deleting it would be smart to make sure there exist optimized versions of this file - files that follow the pattern $1.$2_[anything alphanumeric].[any extension].
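A minimal sketch of the check described above, assuming the file names really do follow the "xxxx.abc123.jpg" / "xxxx.abc123_variant.ext" convention. The function name `findRemovableOriginals` and the sample file names are hypothetical; it only decides which names qualify and leaves the actual deletion to the caller.

```typescript
// Sketch only: matches originals such as "xxxx.abc123.jpg" (name, hex hash, extension).
const ORIGINAL_RE = /^([^.]+)\.([0-9a-f]+)\.([a-z]+)$/;

/** Escape a string so it can be embedded literally in a RegExp. */
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/**
 * Return the names that look like originals AND have at least one
 * optimized sibling of the form "<name>.<hash>_<alphanumeric>.<ext>".
 * Originals without any optimized sibling are left alone.
 */
function findRemovableOriginals(files: string[]): string[] {
  return files.filter((file) => {
    const match = ORIGINAL_RE.exec(file);
    if (!match) return false;
    const [, name, hash] = match;
    // The hash is hex-only here, so it needs no escaping.
    const variantRe = new RegExp(
      `^${escapeRegExp(name)}\\.${hash}_[0-9A-Za-z]+\\.[0-9A-Za-z]+$`,
    );
    return files.some((other) => variantRe.test(other));
  });
}
```

Passing the result of `fs.readdir` on the assets folder to this function would give the list of candidates to delete. Note that the regex matches any extension, so non-image files such as hashed CSS are only protected by the "has an optimized sibling" check.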


@supermoos

Thanks for this. Seems a bit error prone though. What if I actually wanna keep some of the source files for some reason? The import suggestion would solve that issue.

@simonwiles

Fwiw, this causes me a problem on a very image-heavy site, as my originals are large and when they're copied into the output folder I end up with something that's too big to upload to my hosting environment.

@rikur

rikur commented Jul 26, 2024

I get a lot of noise in the Ahrefs site audit, because the crawler downloads the original images, which are way too big.

@robinwatts96

Hi, this is a really useful thread. I'm having a somewhat unrelated issue, but this thread seems like the best place I could find help.

I want to get the final path to the images in the _astro folder dynamically in my head element. I need to do this before the build process, not on the client. It would be easy enough if I could do this on the client...

I want to set the og:image property in the head, but the value won't function as intended if I set it on the client with JS, as Google scrapes your page before any client-side JS runs.

So if there was a way for me to guess the abc123 part of the file path: "xxxx.abc123.jpg" then I think I'd be able to work the rest out.

Any help with this would be greatly appreciated!

@delucis
Member

delucis commented Aug 8, 2024

@robinwatts96 you can get the final output path by importing the image:

---
import cover from '../cover.jpg';
---

<meta property="og:image" content={new URL(cover.src, Astro.site)}>
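
To illustrate what the `new URL(cover.src, Astro.site)` expression does, here is a small standalone sketch. The values of `coverSrc` and `site` are hypothetical stand-ins for `cover.src` and `Astro.site`; the real hashed path is generated by the build, and this assumes `site` is set in the Astro config.

```typescript
// Hypothetical stand-ins for `cover.src` and `Astro.site`.
const coverSrc = "/_astro/cover.abc123.jpg";
const site = new URL("https://example.com");

// Resolving the root-relative image path against the site origin
// yields the absolute URL that crawlers need for og:image.
const ogImage = new URL(coverSrc, site);
console.log(ogImage.href); // https://example.com/_astro/cover.abc123.jpg
```

Because the import resolves at build time, the hashed file name never has to be guessed.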

If you need to do this more dynamically, this docs page may help: https://docs.astro.build/en/recipes/dynamically-importing-images/

Otherwise please do jump into our Discord chat where people are always happy to provide support: https://astro.build/chat

@robinwatts96

@delucis This is exactly what I needed - thank you so much, I appreciate your help.

@wtchnm
Contributor

wtchnm commented Aug 29, 2024

For those who don't use imported images directly in plain <img /> tags, here's an Astro integration that removes the original images from the build:

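import fs from 'node:fs/promises'
import path from 'node:path'
import { fileURLToPath } from 'node:url'
import type { AstroIntegration } from 'astro'

// In Astro 4, unused images are removed from the build, but some original images may still remain in the _astro folder.
const removeOriginalImages: AstroIntegration = {
	name: 'remove-original-images',
	hooks: {
		'astro:build:done': async ({ dir }) => {
			// `dir` is a file URL; convert it so the path is also valid on Windows.
			const astroDir = path.join(fileURLToPath(dir), '_astro')
			const files = await fs.readdir(astroDir)
			for (const file of files) {
				const parts = file.split('.')
				const ext = parts.pop()
				if (ext && ['jpg', 'png'].includes(ext)) {
					// Optimized variants carry an underscore in the hash segment; originals do not.
					const hash = parts.pop()
					if (hash && !hash.includes('_')) {
						await fs.unlink(path.join(astroDir, file))
					}
				}
			}
		}
	}
}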

@tenpaMk2

Thank you for the sample code.
I made an improved version.

  • Add support for jpeg and webp.
  • Detect _ only in the hash string.

import fs from "node:fs/promises";
import path from "node:path";
import { fileURLToPath } from "node:url";

...

    {
      name: "remove-original-images",
      hooks: {
        "astro:build:done": async ({ dir }) => {
          // `dir` is a file URL; convert it so the path is also valid on Windows.
          const astroDir = path.join(fileURLToPath(dir), "_astro");
          const files = await fs.readdir(astroDir);

          for (const file of files) {
            const { name, ext } = path.parse(file);
            const { ext: hashStr } = path.parse(name);

            if (!ext) continue;
            if (!hashStr) continue;
            if (![`.jpg`, `.jpeg`, `.png`, `.webp`].includes(ext)) continue;
            if (hashStr.includes(`_`)) continue;

            console.log(`Removing original image: ${file}`);
            await fs.unlink(path.join(astroDir, file));
          }
        },
      },
    },

Optimized images seem to be named {name}.{original hash}_{additional hash}.{ext}.
However, sadly, some {original hash} values contain _ themselves.
Therefore, the above scripts wrongly detect those originals as optimized images and will not remove them.
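
One possible way around the underscore ambiguity, sketched under the assumption that the {name}.{original hash}_{additional hash}.{ext} pattern above holds: instead of inspecting the hash, treat a file as an original only if another image's name starts with its name plus "_". The function names and sample file names below are hypothetical, and the heuristic has not been checked against real Astro output.

```typescript
const IMAGE_EXTENSIONS = new Set(["jpg", "jpeg", "png", "webp"]);

/** Split "hero.ab_c1.jpg" into { stem: "hero.ab_c1", ext: "jpg" }. */
function splitName(file: string): { stem: string; ext: string } {
  const dot = file.lastIndexOf(".");
  if (dot <= 0) return { stem: file, ext: "" };
  return { stem: file.slice(0, dot), ext: file.slice(dot + 1).toLowerCase() };
}

/**
 * Return the images that have at least one derived variant, i.e. some other
 * image whose stem begins with "<this stem>_". Underscores inside the
 * original hash do not matter, because only the prefix relation is checked.
 */
function findOriginalsByPrefix(files: string[]): string[] {
  const images = files
    .map(splitName)
    .filter(({ ext }) => IMAGE_EXTENSIONS.has(ext));

  return files.filter((file) => {
    const { stem, ext } = splitName(file);
    if (!IMAGE_EXTENSIONS.has(ext)) return false;
    return images.some((other) => other.stem.startsWith(`${stem}_`));
  });
}
```

Images without any derived variant are kept, which also protects files that are referenced directly through a plain `src`. The trade-off is that an original whose variants were never generated stays in the output.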
