Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CCS-3388 Preprocessing the adoc to create an xref using uuid #248

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Original file line number Diff line number Diff line change
@@ -1,5 +1,18 @@
package com.redhat.pantheon.asciidoctor;

import com.google.common.base.Charsets;
import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;
import com.redhat.pantheon.asciidoctor.extension.HtmlModulePostprocessor;
import com.redhat.pantheon.asciidoctor.extension.MetadataExtractorTreeProcessor;
import com.redhat.pantheon.asciidoctor.extension.SlingResourceIncludeProcessor;
import com.redhat.pantheon.asciidoctor.extension.UuidPreProcessor;
import com.redhat.pantheon.conf.GlobalConfig;
import com.redhat.pantheon.model.module.Content;
import com.redhat.pantheon.model.module.Metadata;
import com.redhat.pantheon.model.module.Module;
import com.redhat.pantheon.model.module.ModuleVersion;
import com.redhat.pantheon.sling.ServiceResourceResolverProvider;
import static java.util.stream.Collectors.toMap;

import java.text.SimpleDateFormat;
Expand All @@ -26,19 +39,8 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.base.Charsets;
import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;
import com.redhat.pantheon.asciidoctor.extension.HtmlModulePostprocessor;
import com.redhat.pantheon.asciidoctor.extension.MetadataExtractorTreeProcessor;
import com.redhat.pantheon.asciidoctor.extension.SlingResourceIncludeProcessor;
import com.redhat.pantheon.conf.GlobalConfig;
import com.redhat.pantheon.model.ProductVersion;
import com.redhat.pantheon.model.module.Content;
import com.redhat.pantheon.model.module.Metadata;
import com.redhat.pantheon.model.module.Module;
import com.redhat.pantheon.model.module.ModuleVersion;
import com.redhat.pantheon.sling.ServiceResourceResolverProvider;


/**
* Business service class which provides Asciidoctor-related methods which work in conjunction with other
Expand Down Expand Up @@ -229,18 +231,29 @@ private String buildModule(Module base, ModuleVersion moduleVersion, Map<String,
asciidoctor.javaExtensionRegistry().includeProcessor(
new SlingResourceIncludeProcessor(base));
asciidoctor.javaExtensionRegistry().postprocessor(
new HtmlModulePostprocessor(base));
new HtmlModulePostprocessor(base));

// add specific extensions for metadata regeneration
if(regenMetadata) {
asciidoctor.javaExtensionRegistry().treeprocessor(
new MetadataExtractorTreeProcessor(moduleVersion.metadata().getOrCreate()));
}
}

String html = "";
try {
try {
String ascContent = moduleVersion.content().get().asciidocContent().get();

//Extracting link from the adoc using preprocessor
asciidoctor.javaExtensionRegistry().preprocessor(
new UuidPreProcessor(base));
html = asciidoctor.convert(
ascContent,
OptionsBuilder.options().toFile(false));

log.info("asciidoctor content: {}", html);
Comment on lines +249 to +253
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these lines. You are calling .convert() twice here - the first result gets overwritten by the second, so it must be unnecessary. We probably don't need the log statement here either.


html = asciidoctor.convert(
moduleVersion.content().get().asciidocContent().get(),
ascContent,
ob.get());
cacheContent(moduleVersion.content().get(), html);
} finally {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,6 @@ public HtmlModulePostprocessor(Resource module) {
public String process(Document document, String output) {
return Html.parse(Charsets.UTF_8.name())
.andThen(Html.encodeAllImageLocations(module))
.andThen(Html.dereferenceAllHyperlinks(module.getResourceResolver()))
.apply(output)
.toString();
}
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
package com.redhat.pantheon.asciidoctor.extension;

import org.asciidoctor.ast.Document;
import org.asciidoctor.extension.Preprocessor;
import org.asciidoctor.extension.PreprocessorReader;
import com.redhat.pantheon.jcr.JcrQueryHelper;
import org.apache.sling.api.resource.ResourceResolver;
import java.util.Optional;

import org.apache.sling.api.resource.Resource;
import java.util.ArrayList;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class UuidPreProcessor extends Preprocessor { // (1)

private static Resource module;
private static final Logger log = LoggerFactory.getLogger(UuidPreProcessor.class);
private static String newModulePath;
Comment on lines +18 to +20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fields cannot be static. I'm happy to explain why, if you'd like me to.


public UuidPreProcessor(Resource module) {
this.module = module;
}

@Override
public void process(Document document, PreprocessorReader reader) {

List<String> lines = reader.readLines();
List<String> newLines = new ArrayList<String>();
String[] split;
String uuid, newLink;

for (String line: lines) {
if(line.startsWith("xref:")){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start but it needs to be updated so that it works for all of these scenarios:

  • Line has an "xref:" but it's not at the start of the line.
  • Author uses the "<<xrefTarget,xrefLabel>>" syntax (I'm not sure how parameters are supplied in this scenario, if it's even possible... please find out)
  • Author uses a "link:" instead of an xref

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes Ben,
These cases look good and there are a few more cases that crossed my mind.
I would want to incorporate as many possible in this scenario.

split = line.split(",pantheon-id=");
uuid = split[1].replace(split[1].substring(split[1].length()-1), "");
Comment on lines +36 to +37
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what this does. Are we assuming that the 'pantheon-id' parameter is the final parameter of the xref, and then the last character is the closing ] bracket, and then the line ends? Anyway, please make this a bit more reliable like we discussed in-person.

Don't spend too much time making the code update "perfect" - I'm not sure that we'll stick with UUIDs for the final solution, so just getting the next N characters (N = however long a UUID is) from the string would be good enough for this POC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach was done considering the last character to be ']' and assuming 'pantheon-id' as the final parameter of xref.
Since I would be working with more number of cases as mentioned in the above comments, I would probably applying some regex to identify and resolve the UUID.

resolveActualPath(module.getResourceResolver(), uuid);
newLink = split[0].replaceAll(":.*?\\[", ":"+newModulePath+"[");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOF, regex syntax!

I think I can read this. It mostly makes sense, but what is the purpose of the ? in the expression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explained the use of "?" in the below example:
input = "The << first match >> and << second match >> of the input string"
regex = <<.>> matches the whole string "<< first match >> and << second match >>"
But
regex = <<.
?>> produces two matches: "<< first match >>" and "<< second match >>"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhhh, the "?" makes it non-greedy! Ok, that makes sense, thank you very much!

newLink = newLink + "]";
newLines.add(newLink);
Comment on lines +38 to +41
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see this logic being conditional based on whether or not the UUID query returns a result. It looks like the current behavior is that if the UUID fails to resolve, the xref is totally wiped out and replaced with nothing. What the code should do is leave the original author-supplied xref target intact.

}else {
newLines.add(line);
}
}
reader.restoreLines(newLines);
}

private static void resolveActualPath(ResourceResolver resolver, String uuid) {
JcrQueryHelper qh = new JcrQueryHelper(resolver);
try {
Optional<Resource> result =
qh.query("select * from [nt:base] WHERE [jcr:uuid] = '" + uuid + "'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we talked about having [nt:base] here earlier. It may not make a difference from a performance perspective, but from a functionality perspective, it makes sense to put [pant:module] here instead, defensively. The reason is that we may expose UUIDs for other objects (maybe products, for some reason?) to authors at some point in the future. If the author accidentally supplies a product's UUID in an xref, we wouldn't want this code resolving that. It is better to specify that we are looking for modules only and to return no result in such a scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, since this is searching for all the UUIDs across the hierarchy, we might have to switch to [pant:module]

.findFirst();

result.ifPresent(output -> {
assignValue(output.getPath());
log.info("result:", output.getPath());
});
}catch (Exception e) {
e.printStackTrace();
}
};

private static void assignValue(String path){
newModulePath = path + ".preview";
log.info("newPath2:"+ newModulePath);
}
}
24 changes: 0 additions & 24 deletions pantheon-bundle/src/main/java/com/redhat/pantheon/html/Html.java
Original file line number Diff line number Diff line change
Expand Up @@ -56,30 +56,6 @@ public static Function<Document, Document> encodeAllImageLocations(final Resourc
};
}

public static Function<Document, Document> dereferenceAllHyperlinks(ResourceResolver resolver) {
JcrQueryHelper qh = new JcrQueryHelper(resolver);
return document -> {
document.select("a")
.forEach(hyperlink -> {
hyperlink.childNodes().stream()
.filter(child -> "#comment".equals(child.nodeName()))
.map(child -> UUID_PATTERN.matcher(child.outerHtml()))
.filter(matcher -> matcher.find())
.map(matcher -> matcher.group())
.forEach(uuid -> {
try {
qh.query("select * from [pant:module] as module WHERE module.[jcr:uuid] = '" + uuid + "'")
.findFirst()
.ifPresent(resource -> hyperlink.attr("href", resource.getPath() + ".preview"));
} catch (RepositoryException e) {
e.printStackTrace();
}
});
});
return document;
};
}

/**
* An extractor function which returns just the body content for the parsed html document.
* @return An html extactor function.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@ protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse r
Optional<Content> content;
if (draft) {
content = module.getDraftContent(LocaleUtils.toLocale(locale));
log.info("asciidoctorservlet content");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to touch this file. I am 90% sure that this servlet is unused in our project... but no one has gotten around to deleting it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have to delete this

} else {
content = module.getReleasedContent(LocaleUtils.toLocale(locale));
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -68,44 +68,6 @@ void encodeAllImageLocations() {
});
}

@Test
void dereferenceAllHyperlinks() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely makes sense to remove this test since you're removing the code that it targets, but don't forget to add appropriate tests of your own for the new code that you're introducing!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes Ben,
Once we have finalized and covered the cases in the code, I would write corresponding tests for them.

// Given
sCtx.create().resource("/test",
"name", "a-name",
"jcr:primaryType", "pant:module");
sCtx.create().resource("/test/child",
"name", "child-name");
String resourceUuid = sCtx.resourceResolver()
.getResource("/test")
.getValueMap()
.get("jcr:uuid")
.toString();

String html = "<html>" +
"<head><title>This is the head</title></head>" +
"<body>This is the body" +
"<a href='1234'>vanilla hyperlink</a>" +
"<a href='abcd'><!-- " + resourceUuid + " -->link with a valid uuid</a>" +
"<a href='xyz'><!-- 123e4567-e89b-12d3-a456-426655440000 -->link with a random uuid</a>" +
"</body>" +
"</html>";

// When
String transformedHtml = Html.parse(Charsets.UTF_8.name())
.andThen(Html.dereferenceAllHyperlinks(sCtx.resourceResolver()))
.andThen(doc -> doc.toString())
.apply(html);

// Then
Document doc = Jsoup.parse(transformedHtml, "UTF-8");
List<Element> elms = doc.select("a").stream().collect(Collectors.toList());
assertFalse(elms.isEmpty());
assertTrue("1234".equals(elms.get(0).attr("href")));
assertFalse("abcd".equals(elms.get(1).attr("href")));
assertTrue("xyz".equals(elms.get(2).attr("href")));
}

@Test
void getBody() {
// Given
Expand Down