Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: rendering issues for migrated content #1371

Merged
merged 4 commits into from
Mar 27, 2024

Conversation

HussainTaj-arbisoft
Copy link
Contributor

@HussainTaj-arbisoft HussainTaj-arbisoft commented Mar 13, 2024

What are the relevant tickets?

Relates to https://github.com/mitodl/hq/issues/3678

Required for mitodl/ocw-studio#2130

Description (What does it do?)

In https://github.com/mitodl/hq/issues/3678, we will migrate old external link data to the new external resource type content. I expect a few issues to arise so I'm adding these fixes proactively.

Please see individual comments below for details of each issue this PR solves.

How can this be tested?

  1. Grab this course content and place it inside your local content directory.
  2. Navigate to your local ocw-hugo-themes project directory.
  3. Checkout the branch hussaintaj/3678-migrate-data-fixes.
  4. Run
    yarn start course ht-test-2024
    
  5. Open http://localhost:3000/pages/external-links/.
  6. Notice the links.
  7. Expect something similar to:
    Screenshot 2024-03-25 at 4 19 29 PM
  8. Open http://localhost:3000/.
  9. Notice the course description.
  10. Expect to see a working external resource link inside the description.

@github-actions github-actions bot temporarily deployed to pull request March 13, 2024 11:40 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 13, 2024 11:40 Inactive
Copy link

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions github-actions bot temporarily deployed to pull request March 21, 2024 11:12 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 21, 2024 11:12 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 25, 2024 11:15 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 25, 2024 11:15 Inactive
{{- $text := default .title .text | htmlEscape | page.RenderString -}}
{{- $text := default .title .text -}}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

htmlEscape is not required, rather it should be avoided. This is because Studio renders escaped content where required.

RenderString is unnecessary because markdown is rendered automatically in the relevant shortcode.

Comment on lines +6 to +8
{{- if findRE "^(?:http|https|ftp|mailto)" $title -}}
{{- $title = (replace $title "/" "\\/") -}}
{{- end -}}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shortcode creates a new anchor tag whenever it sees a URL in the text.

So a markdown of

<a href="https://google.com">https://google.com</a>

produces two anchor tags.

<a href="https://google.com"></a>
<a href="https://google.com">https://google.com</a>

I was unable to figure out why this is the case. My best guess is that the shortcode's output is run through an additional markdown renderer.

Escaping the URL fixes the problem. Though a better solution might still exist.

Copy link
Collaborator

@gumaerc gumaerc Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look into this and I think I found an explanation. Although, I should preface this with the fact that I was able to revert resource_link.html to the old version without seeing the two anchor tags like you mentioned. Anyway, here's what I think is going on:

  1. The content calls the shortcode with the % delimiters that tell Hugo that the shortcode will produce markdown that needs to be processed, i.e. {{% resource_link "f24cc1d7-0985-4902-bf27-f9fc8a03e9b4" "Google" %}}
  2. The shortcode calls {{- partial "external_resource_link" (merge .Params (dict "text" $title)) -}}, which calls a partial and not a shortcode
  3. The external_resource_link partial then gathers all the properties of the link and calls {{- partial "link" (dict "href" $href "text" $text "class" $className "onClick" $onClick "target" "_blank") -}}
  4. The link.html partial builds and returns an HTML link

Hugo then goes to render the page, and since the original shortcode was called with % delimiters, it assumes that it is returning markdown. Therefore, the raw HTML link generated by link.html is processed as markdown, resulting in the behavior you saw. I think in general, it is problematic to have a shortcode that can generate markdown through one code path, and HTML through another. The link.html partial was created to be able to create links in templates that set more than just the href and inner text, rather than the limited syntax of markdown links which only let you set those properties. I can think of two solutions here:

  1. Change the else block in resource_link.html to return a raw HTML link and change all calls to resource_link in content to use < / > delimiters, something like:
  {{- else -}}
    <a href="{{- .Permalink -}}{{- $anchor_id -}}">{{ $title }}</a>
  {{- end -}}
  1. Create a separate shortcode, maybe resource_link_external.html and call that partial instead for external links but also with the < / > delimiters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumaerc, I looked at this again and you're exactly right about the cause.

I was hoping to find something that could mark the partial's output as "does not need markdown rendering" but I don't think such a feature exists in Hugo for {{% tags.

From the possible solutions you shared above, we might not be able to rely on 2. Because using a new shortcode might require us to create a new UI flow.

I think 1 is our best option, however, it involves many steps. We'd have to:

  • Update themes
  • Update Studio (i.e. CKEditor syntax because {{< resource_link... is not currently supported)
  • Update content and publish it

This issue with the URL as text is the only glitch I have found. I think we can apply this PR code as a temporary fix while we work on the proper solution. @gumaerc what do you think?

In any case, I'll write up an issue for this.

{{- .context.RenderString (truncate 320 $courseData.course_description) -}}
{{- truncate 320 (.context.RenderString $courseData.course_description) -}}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Truncating description before resolving shortcodes breaks the shortcodes. The order has been changes here to first render content (i.e. shortcodes) then truncate the output.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Do you think this will fix this recently reported bug, https://github.com/mitodl/hq/issues/3858 ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this was my fault in #876. @pdpinch I was able to confirm that this will fix https://github.com/mitodl/hq/issues/3858.

@HussainTaj-arbisoft HussainTaj-arbisoft marked this pull request as ready for review March 25, 2024 11:39
@gumaerc gumaerc self-assigned this Mar 25, 2024
Copy link
Collaborator

@gumaerc gumaerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the links functioned properly for me at http://localhost:3000/pages/external-links/.

I was not seeing a proper link in the course description as you mentioned:
image

Looking at the course.json in your test course, I saw this at the beginning of the description field:

"course_description": "{{\\< resource\\_link \"be7e0a90-ad6e-4085-80f2-2b6b6f4a77fe\" \"OCW\" >}} etc...

I noticed that the opening angle brace and the underscore in resource_link are escaped. This will cause Hugo to render this as plain text. After removing those escape characters so it looked like this:

"course_description": "{{< resource_link \"be7e0a90-ad6e-4085-80f2-2b6b6f4a77fe\" \"OCW\" >}} etc...

...then I was able to see the link:
image

I believe this is an issue in ocw-studio. Previous to gohugoio/hugo#6703 being resolved and Hugo 0.100.0 being released, it was not possible put shortcodes inside markdown being sent to .RenderString. Since this is now possible (and is the mechanism being used here to render a link in the description), we likely need to look at the character escaping being done on these fields when they're being written out to course.json in ocw-studio.

Comment on lines +6 to +8
{{- if findRE "^(?:http|https|ftp|mailto)" $title -}}
{{- $title = (replace $title "/" "\\/") -}}
{{- end -}}
Copy link
Collaborator

@gumaerc gumaerc Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look into this and I think I found an explanation. Although, I should preface this with the fact that I was able to revert resource_link.html to the old version without seeing the two anchor tags like you mentioned. Anyway, here's what I think is going on:

  1. The content calls the shortcode with the % delimiters that tell Hugo that the shortcode will produce markdown that needs to be processed, i.e. {{% resource_link "f24cc1d7-0985-4902-bf27-f9fc8a03e9b4" "Google" %}}
  2. The shortcode calls {{- partial "external_resource_link" (merge .Params (dict "text" $title)) -}}, which calls a partial and not a shortcode
  3. The external_resource_link partial then gathers all the properties of the link and calls {{- partial "link" (dict "href" $href "text" $text "class" $className "onClick" $onClick "target" "_blank") -}}
  4. The link.html partial builds and returns an HTML link

Hugo then goes to render the page, and since the original shortcode was called with % delimiters, it assumes that it is returning markdown. Therefore, the raw HTML link generated by link.html is processed as markdown, resulting in the behavior you saw. I think in general, it is problematic to have a shortcode that can generate markdown through one code path, and HTML through another. The link.html partial was created to be able to create links in templates that set more than just the href and inner text, rather than the limited syntax of markdown links which only let you set those properties. I can think of two solutions here:

  1. Change the else block in resource_link.html to return a raw HTML link and change all calls to resource_link in content to use < / > delimiters, something like:
  {{- else -}}
    <a href="{{- .Permalink -}}{{- $anchor_id -}}">{{ $title }}</a>
  {{- end -}}
  1. Create a separate shortcode, maybe resource_link_external.html and call that partial instead for external links but also with the < / > delimiters.

{{- .context.RenderString (truncate 320 $courseData.course_description) -}}
{{- truncate 320 (.context.RenderString $courseData.course_description) -}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this was my fault in #876. @pdpinch I was able to confirm that this will fix https://github.com/mitodl/hq/issues/3858.

@HussainTaj-arbisoft
Copy link
Contributor Author

HussainTaj-arbisoft commented Mar 26, 2024

Thanks for the review @gumaerc. I'll work on the feedback.

A small note about the comment below.

Looking at the course.json in your test course, I saw this at the beginning of the description field:

"course_description": "{{\\< resource\\_link \"be7e0a90-ad6e-4085-80f2-2b6b6f4a77fe\" \"OCW\" >}} etc...

You're correct about Studio not supporting shortcodes for these fields.

Did you download the content from RC? I intended for you to use this repo from my org. (I should've been more clear about this in the instructions.)

In practice, we will use {{% resource_link ... %}} shortcodes, which seem to work correctly in Studio. Although, you can't edit them or add new ones because we don't have the supporting UI yet.

Copy link
Collaborator

@gumaerc gumaerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue with the URL as text is the only glitch I have found. I think we can apply this PR code as a temporary fix while we work on the proper solution.

I think this is fine, as long as you create an issue to address the problem at hand.

I was hoping to find something that could mark the partial's output as "does not need markdown rendering" but I don't think such a feature exists in Hugo for {{% tags.

This is the whole point of {{< / >}} vs {{% %}}, by using the angle brace delimiters when you call the shortcode you are telling it exactly that; that you don't need the shortcode to render markdown. It will indeed be a bit of a pain to address, as we'll need to switch the way resource_link is called. We will run into the same issues we always have when trying to make a change that affects all content. Most people who make Hugo sites have direct control over their content, so our situation is not likely a common issue. The situation surrounding shortcode delimiters has a lot of history. This is an old article (2019), but it details a lot of the historical problems surrounding this issue: https://oostens.me/posts/hugo-shortcodes-with-markdown-gotchas/. In general I think it's problematic to enforce that a shortcode be called one way or the other. The functionality behind the scenes should be transparent to the "user" (in this case, the content creator) and you should be able to use either one. Like Nelis points out in his article, having different behavior between the two delimiters exposes implementation details to the user. If you look at the ref shortcode documentation as an example though, they say "Always use the {{% %}} notation when calling this shortcode." This indicates to me that the ref shortcode always returns markdown that needs to be processed, so if you call it the other way then unrendered markdown will end up in your content.

From the Hugo documentation on shortcodes:

"Shortcodes using the % as the outer-most delimiter will be fully rendered when sent to the content renderer. This means that the rendered output from a shortcode can be part of the page’s table of contents, footnotes, etc."

"The < character indicates that the shortcode’s inner content does not need further rendering. Often shortcodes without Markdown include internal HTML."

What they are referring to with that last bit is the ability to call {{ .Inner }} in a shortcode template, which will give you the inner HTML from the called shortcode like in their example:

{{< myshortcode >}}<p>Hello <strong>World!</strong></p>{{< /myshortcode >}}

Meanwhile what they mention about % shortcodes also makes sense, as if your shortcode is rendering markdown that contains a header (for example) and your page also has a markdown Table of Contents, you need the shortcode to return markdown in order for that to work. The same goes for the footnote syntax.

In our case with resource_link I think we're fine to switch to angle brace syntax as resource links will not make use of either the TOC or footnotes markdown syntax. If you look at the ocw-hugo-themes source code, there are some shortcodes that already require using the angle brace syntax, since they return raw HTML and use the {{ .Inner }} property, such as:

  • Image Gallery
  • Tables
  • Quizzes

Whether it's worth it or not to switch just to address this one issue, I'm not sure. Relying on string replacement of escape characters doesn't feel great, but neither does the idea of changing the expected resource_link syntax.

Did you download the content from RC? I intended for you to use this repo from my org. (I should've been more clear about this in the instructions.)

I thought I had grabbed this from your test org, as I have a folder specifically for content from there for testing your PR's, but apparently I cloned this repo from RC inside that folder:

gumaerc@gumaerc-work:~/Code/hussaintaj-mitodl/ht-test-2024$ git remote -v
origin  [email protected]:ocw-content-rc/ht-test-2024.git (fetch)
origin  [email protected]:ocw-content-rc/ht-test-2024.git (push)

I removed and replaced it with the version from your test org and the description field is fine now.

To summarize, I think we are okay to merge this PR. Performing the string replacement feels hacky, so I wanted to call it out and explain the situation a little better. I think I've said all there is to say on the matter though, so we need to make a decision about what to do about the resource_link syntax afterward.

@HussainTaj-arbisoft
Copy link
Contributor Author

Thank you for the detailed explanation @gumaerc. I appreciate it. There's still much to learn about Hugo.

I've created #1376 and linked our relevant conversations there.

@HussainTaj-arbisoft HussainTaj-arbisoft merged commit f8df1c8 into main Mar 27, 2024
6 checks passed
This was referenced Mar 27, 2024
@odlbot odlbot mentioned this pull request Apr 1, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants