bug: does not support Astro v5 content collections that are not immediate children of <srcDir>/content #74

Open
techfg opened this issue Dec 13, 2024 · 2 comments

Comments

@techfg
Contributor

techfg commented Dec 13, 2024

Astro v5 introduced the Content Layer which allows a content collection to live anywhere. Prior to v5, or when using legacy collections in v5, content collections were required to be placed as immediate children of <srcDir>/content.

With content collections now able to live anywhere, the plugin can no longer reliably resolve relative paths unless the content collection is an immediate child of <srcDir>/content.

Repro: https://stackblitz.com/edit/github-pdzrjo-nm7axy9t

Steps to Reproduce

  1. Open repro
  2. Under Docs Collection section, click on Docs Root Test link
  3. Click any of the links in the Docs Collection and Docs Collection Subdir sections. They all correctly resolve to /docs/<path> URLs because, despite using the Content Layer, the files are physically located in src/content/docs.
  4. Click on Site Index at bottom of page
  5. Under Docs New Collection section, click on Docs New Root Test link
  6. Click any of the links in the Docs Collection and Docs Collection Subdir sections. They all point to the .md file directly instead of a resolved URL, because the files are located in ./docsnew.

The reason for the issue is that the plugin assumes that all content collections are located in <srcDir>/content and derives the content collection name based on that assumption. This assumption was valid prior to v5 and remains valid in v5 when using legacy collections.
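To make the assumption concrete, here is a minimal sketch of name derivation that only works under <srcDir>/content. The helper name legacyCollectionName and the example paths are hypothetical and are not taken from the plugin's source.

```typescript
import * as path from "node:path";

// Hypothetical helper (not the plugin's actual code): derive the collection
// name by assuming the file lives at <srcDir>/content/<collection>/...
// POSIX-style paths are assumed to keep the example deterministic.
function legacyCollectionName(filePath: string, srcDir: string): string | undefined {
  const contentDir = path.posix.join(srcDir, "content");
  const rel = path.posix.relative(contentDir, filePath);
  // A relative path that climbs out with ".." means the file is outside <srcDir>/content.
  if (rel === "" || rel === ".." || rel.startsWith("../")) return undefined;
  const segments = rel.split("/");
  // At least <collection>/<file> is needed to have a collection name.
  return segments.length > 1 ? segments[0] : undefined;
}
```

Under this assumption a file in src/content/docs yields "docs", while a file in ./docsnew yields no collection name at all, which matches the behavior seen in the repro.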

With content collections able to live anywhere, a change needs to be made to fully support v5.

Currently, as long as content collections live in <srcDir>/content/<collectionDir>, the plugin works with the new Content Layer in v5, so the issue only affects content collections that are not immediate children of <srcDir>/content.

In order to support this situation, there are a few options:

  1. Do not support collections that are not immediate children of <srcDir>/content - This is not ideal but technically an option and would be documented as a limitation of the plugin.
  2. Provide content directory via configuration parameter - This is similar to option 1, the only difference being that the content directory would be defined via a configuration parameter with a default of src/content. Earlier versions of this plugin had a contentPath configuration parameter, but because collections had to live in <srcDir>/content, contentPath & collectionPathMode were replaced with srcDir (to align with Astro's own configuration properties) and collections. In short, we would remove srcDir and add a contentDir configuration option. The requirement, as with option 1, is that all collections must live in an immediate child directory of contentDir. While this provides some flexibility over option 1 (the content dir could be anywhere vs. always having to be /content), it's still not ideal since all content collections would have to share a common parent, which is no longer a requirement of Astro v5.
  3. Collection Root Marker - Require that the root folder of every content collection have some type of marker file (e.g., .astro-rehype-rml). To resolve the "collection name", the plugin would walk the file system up from the "current markdown file" until it finds the marker file, then use that directory as the collection root. This is not an ideal solution since it requires extra files that are otherwise meaningless, but it should work.
  4. Extend collections Config - Currently, the collections configuration is completely optional and only needed when changing the base or name behavior for a collection. A contentDir property would be added and required for every collection that needs relative path resolution. When a markdown file is being processed, the collections configuration would be walked entry by entry, checking whether the current file path starts with that entry's contentDir. Once a match is found, the root of the content dir would be known.
collections: {
  blog: {
    contentDir: './mycontent/blog'
  }
}
  5. Modify collections config - Currently, the key to this configuration property is the physical directory name on disk of the collection under <srcDir>/content. This could be changed such that the key becomes the path instead of just the directory name, and every collection that should have its markdown resolved must have an entry. It would mirror the value of base that is passed to the default glob loader. From there, it's essentially equivalent to option 4, where the keys would be walked to find a match to determine the name of the collection. One note here is that we could leave the configuration the way it is and default to always using src/content for contentDir if an entry in collections isn't found. This would avoid having to specify every collection explicitly but come at the cost of having different ways to resolve collection names.
collections: {
  'mycontent/blog': {}
}
  6. Require collections entry for every collection that requires markdown resolution - For every markdown-based content collection, require an entry in collections (could just be an empty object if no overrides are needed). We would then use the keys of collections to call the getCollection API to retrieve all entries for all collections. The collection entry returned contains a filePath property that could be compared against the current file to find the collection the file is a part of. This has a couple of shortcomings: 1) filePath is not a publicly documented property of CollectionEntry, so even though it appears to always be returned for markdown files, it may not be in the future; 2) the lookup would have to occur on every markdown file processed, which could impact performance, especially on larger sites.
  7. As mentioned in feat: create Astro integration wrapper for plugin #45, there is the idea of changing this plugin to be a full Astro integration that contains the rehype plugin instead of just a rehype plugin. Doing this would allow us to obtain the Astro config (via the astro:config:done hook) to get the srcDir and other config values. In reviewing the current Integrations API, I'm not seeing an obvious hook that would allow us to obtain collection information, but possibly there is a way to get that, or something that would allow us to dynamically resolve collections to collection dirs and avoid user config in the plugin that maps a collection to a collection dir, as in the ideas above and as we do currently. This would take further investigation but may be worth pursuing.
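
As an illustration of the lookup described in option 4, here is a minimal sketch. The names findCollection and isInside, the rootDir parameter, and the choice to resolve contentDir against the project root are all assumptions made for the example, not part of the plugin. It compares paths with path.relative rather than a plain "starts with" check, so that a sibling directory such as mycontent/blogs is not mistaken for mycontent/blog.

```typescript
import * as path from "node:path";

// Hypothetical shape of the extended config: each collection declares its directory.
interface CollectionOptions {
  contentDir: string;
}
type Collections = Record<string, CollectionOptions>;

// True when filePath is located somewhere below dir (POSIX-style paths assumed).
function isInside(dir: string, filePath: string): boolean {
  const rel = path.posix.relative(dir, filePath);
  return rel !== "" && rel !== ".." && !rel.startsWith("../");
}

// Walk every configured collection and return the first one whose contentDir
// contains the file. rootDir is assumed to be the absolute project root.
function findCollection(
  filePath: string,
  rootDir: string,
  collections: Collections,
): { name: string; root: string } | undefined {
  for (const [name, options] of Object.entries(collections)) {
    const root = path.posix.resolve(rootDir, options.contentDir);
    if (isInside(root, filePath)) return { name, root };
  }
  return undefined;
}
```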

The above are just some initial ideas on how we could solve this. Although none are ideal, I believe all would work.
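
For comparison, the marker-file walk from option 3 could look roughly like the sketch below. The function name findCollectionRoot and the injectable exists parameter are hypothetical. The marker name is the example given above, and POSIX-style paths are assumed.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Example marker name taken from option 3; any agreed-upon file name would do.
const MARKER = ".astro-rehype-rml";

// Hypothetical helper: walk up from the markdown file until a directory
// containing the marker file is found. The `exists` check is injectable so
// the walk can be exercised without touching the real file system.
function findCollectionRoot(
  filePath: string,
  exists: (p: string) => boolean = fs.existsSync,
): string | undefined {
  let dir = path.posix.dirname(filePath);
  while (true) {
    if (exists(path.posix.join(dir, MARKER))) return dir;
    const parent = path.posix.dirname(dir);
    if (parent === dir) return undefined; // reached the top without finding a marker
    dir = parent;
  }
}
```

The cost noted in option 3 still applies: every lookup performs one existence check per directory level unless the results are cached.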

@vernak2539 - Let me know your thoughts and/or other ideas on how you think we could solve this. If I were to lean one way, I'd lean Option 5. Also, #26 becomes extremely relevant with Astro v5, so I think making a decision there and then making a decision here would likely be best. Thanks!

@vernak2539
Owner

Thanks @techfg, lots to think about, that's for sure!

if I were to lean one way, I'd lean Option 5.

I'd actually lean towards this option as well.

One note here is that we could leave the configuration the way it is and default to always using src/content for contentDir if an entry in collections isn't found. This would avoid having to specify every collection explicitly but come at the cost of having different ways to resolve collection names.

This, IMO, is the important part. If people choose to continue to use <srcDir>/content by default, I'd like that to be as easy as possible to support (i.e. if I upgrade a current usage to a new version with this functionality, I'd like to make no changes to my configuration, leaving "content/blog" as "blog")
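
A rough sketch of how Option 5 with this default might behave: path-style keys are tried first, and if none match, resolution falls back to the <srcDir>/content/<dir> layout so that an existing "blog" entry keeps working unchanged. The function name resolveCollectionKey, the rootDir parameter, and resolving keys against the project root are assumptions for illustration only.

```typescript
import * as path from "node:path";

type Collections = Record<string, object>;

// Path of filePath relative to dir, or undefined when filePath is not below dir.
function relativeInside(dir: string, filePath: string): string | undefined {
  const rel = path.posix.relative(dir, filePath);
  return rel === "" || rel === ".." || rel.startsWith("../") ? undefined : rel;
}

// Hypothetical resolver: returns the collections key that applies to filePath.
function resolveCollectionKey(
  filePath: string,
  rootDir: string, // absolute project root (assumption)
  srcDir: string, // absolute <srcDir>
  collections: Collections,
): string | undefined {
  // 1) Keys treated as paths; the longest matching root wins for nested dirs.
  let best: { key: string; root: string } | undefined;
  for (const key of Object.keys(collections)) {
    const root = path.posix.resolve(rootDir, key);
    const matches = relativeInside(root, filePath) !== undefined;
    if (matches && (!best || root.length > best.root.length)) best = { key, root };
  }
  if (best) return best.key;

  // 2) Fallback: the legacy layout <srcDir>/content/<collection>/...
  const rel = relativeInside(path.posix.join(srcDir, "content"), filePath);
  const segments = rel?.split("/") ?? [];
  return segments.length > 1 ? segments[0] : undefined;
}
```

With a config such as { blog: {}, 'mycontent/docs': {} }, a file under src/content/blog resolves to the key "blog" through the fallback, and a file under mycontent/docs resolves through its path key.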

@techfg
Contributor Author

techfg commented Dec 19, 2024

Yeah, the more I've thought about this, Option 5 seems the path forward and agreed that having a default of src/content would simplify required config. However, how we do Option 5 really comes down to a decision from #26.

Let's start with a decision there and then circle back to this.

2 participants