-
Notifications
You must be signed in to change notification settings - Fork 24
Use Jsoup instead of TagSoup for HTML parsing. #1019
Use Jsoup instead of TagSoup for HTML parsing. #1019
Conversation
Being way more flexible, it allows us to simplify the parsing *a lot* and do checks we couldn't do before, like one which was needed to know when extra line breaks between inline nodes and block ones are needed.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1019 +/- ##
==========================================
- Coverage 90.28% 89.53% -0.75%
==========================================
Files 179 104 -75
Lines 21474 17996 -3478
Branches 280 280
==========================================
- Hits 19387 16113 -3274
+ Misses 2084 1880 -204
Partials 3 3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@@ -57,7 +58,7 @@ class InterceptInputConnectionIntegrationTest { | |||
textView.text.dumpSpans(), equalTo( | |||
listOf( | |||
"hello: android.widget.TextView.ChangeWatcher (0-5) fl=#6553618", | |||
"hello: android.text.style.StyleSpan (0-5) fl=#33", | |||
"hello: android.text.style.StyleSpan (0-5) fl=#17", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are caused by using inSpans
, but they shouldn't affect the functionality of the editor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, thanks!
@@ -81,6 +81,9 @@ class MainActivity : ComponentActivity() { | |||
var linkDialogAction by remember { mutableStateOf<LinkAction?>(null) } | |||
val coroutineScope = rememberCoroutineScope() | |||
|
|||
LaunchedEffect(state.messageHtml) { | |||
Timber.d("Message HTML: '${state.messageHtml}'") | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you want to remove this log (privacy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, this is in the compose integration example only 😅 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK, I read it a bit fast. I am always paranoid regarding what app and libraries are logging out.
"pre" -> parseCodeBlock(element) | ||
"blockquote" -> parseQuote(element) | ||
"p" -> parseParagraph(element) | ||
"br" -> parseLineBreak(element) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a log for un-handled tag names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call! In fact, I added a clean step where we remove any tags and attributes we don't recognise.
InlineFormat.Italic -> StyleSpan(Typeface.ITALIC) | ||
InlineFormat.Underline -> UnderlineSpan() | ||
InlineFormat.StrikeThrough -> StrikethroughSpan() | ||
InlineFormat.InlineCode -> return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This return
statement is strange to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments to clarify this.
Add logs for unknown tag found while parsing.
Quality Gate passedIssues Measures |
I had to add some extra processing for whitespaces because Jsoup apparently doesn't directly strip it as Tagsoup did, and this could potentially cause index issues in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates!
Jsoup, being way more flexible, allows us to simplify the parsing a lot and do checks we couldn't do before like one which lets us know when extra line breaks between inline nodes and block ones are needed.