WIP: Internal tokenizer #204
Merged
mischov
reviewed
Aug 22, 2019
lib/floki/html/tree_construction.ex
Outdated
@@ -0,0 +1,63 @@
defmodule Floki.HTML.TreeConstruction do
Been a long road to get here- congrats!
Now the real "fun" starts. 😄
@mischov thank you! 💜 😄
I hope I can have some fun with all this complexity 😅
This is needed to correctly match characters encoded as UTF-8. Since this parser is not planned to work with encodings other than UTF-8, it's OK to match directly with `utf8`. For more on UTF-8 and Elixir:
- https://www.bignerdranch.com/blog/unicode-and-utf-8-explained/
- https://www.bignerdranch.com/blog/elixir-and-unicode-part-2-working-with-unicode-strings/
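To illustrate the difference this makes, here is a minimal sketch of byte matching versus `utf8` matching in a binary pattern. The module and function names are hypothetical and are not taken from the PR; only the `::utf8` segment modifier is standard Elixir.

```elixir
defmodule Utf8MatchSketch do
  # Hypothetical module for illustration only; not part of Floki.

  # A plain segment matches a single byte (0..255).
  def first_byte(<<byte, _rest::binary>>), do: byte

  # A `utf8` segment matches one whole code point, however many bytes it uses.
  def first_codepoint(<<cp::utf8, _rest::binary>>), do: cp

  # Walk a UTF-8 binary one code point at a time, as a tokenizer would.
  def codepoints(binary), do: do_codepoints(binary, [])

  defp do_codepoints(<<cp::utf8, rest::binary>>, acc), do: do_codepoints(rest, [cp | acc])
  defp do_codepoints(<<>>, acc), do: Enum.reverse(acc)
end

# "\u00E9" (e with acute accent) is stored as the two bytes 0xC3 0xA9.
IO.inspect(Utf8MatchSketch.first_byte("\u00E9"))
IO.inspect(Utf8MatchSketch.first_codepoint("\u00E9"))
IO.inspect(Utf8MatchSketch.codepoints("a\u00E9"))
```

Matching a byte at a time would split a multi-byte character in the middle, which is why a UTF-8-only parser can reasonably match on `utf8` segments instead.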
Also introduce the `emit` function, which should be helpful if we want to build the tree asynchronously.
It also improves Document.add_node/3 to accept only the parent node id instead of having to keep all the parent data. This should slightly improve performance and reduce memory usage.
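The idea of passing only a parent id can be sketched roughly as follows. This is an assumed shape, not Floki's actual `Document` module: the `DocumentSketch` name, the map layout, and the return tuple are all hypothetical.

```elixir
defmodule DocumentSketch do
  # Hypothetical stand-in for a document store; not Floki's Document module.

  def new, do: %{nodes: %{}, next_id: 1}

  # Adds `data` as a new node. The caller passes only the parent's id
  # (or nil for a root node), never the parent node itself.
  def add_node(doc, data, parent_id \\ nil) do
    id = doc.next_id
    new_node = %{id: id, data: data, parent_id: parent_id, children_ids: []}

    nodes =
      doc.nodes
      |> Map.put(id, new_node)
      |> link_child(parent_id, id)

    {:ok, %{doc | nodes: nodes, next_id: id + 1}, id}
  end

  defp link_child(nodes, nil, _child_id), do: nodes

  defp link_child(nodes, parent_id, child_id) do
    # The parent is looked up by id inside the store, so callers
    # do not need to carry its data around.
    Map.update!(nodes, parent_id, fn parent ->
      %{parent | children_ids: parent.children_ids ++ [child_id]}
    end)
  end
end
```

With this layout a caller only threads an integer through the tree construction steps, and the document remains the single place where node data lives.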
It adds some of the state transitions of tree construction.
This algorithm is important because it returns the place/node where we need to add a new node.
This is intended to improve performance.
This is a big change that enables us to replace fetching HTML entities from a JSON file parsed in memory with a lookup in a module. This is similar to what we have in the HTML Entities package. It also fixes a lot of small IO data building errors.
This feature was incomplete and can be added in the future.
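A module-based lookup of this kind can be sketched by generating one function clause per entity at compile time. The module name, the `lookup/1` function, and the four-entry table below are hypothetical and only illustrate the technique; the real entity list and its format are not shown in this PR page.

```elixir
defmodule EntityLookupSketch do
  # Hypothetical, tiny subset of named character references,
  # each mapped to its list of code points.
  @entities [
    {"&amp;", [38]},
    {"&lt;", [60]},
    {"&gt;", [62]},
    {"&quot;", [34]}
  ]

  # Generate one function clause per entity at compile time, so a lookup
  # becomes a pattern match instead of a search in parsed JSON data.
  for {name, codepoints} <- @entities do
    def lookup(unquote(name)), do: {:ok, unquote(codepoints)}
  end

  # Anything not in the table is reported as unknown.
  def lookup(_other), do: :error
end
```

Because the clauses are compiled into the module, nothing has to be read or parsed at runtime, which fits the stated goal of avoiding a JSON file held in memory.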
It uses less memory.
This replaces the strategy that loaded all the tests at once through metaprogramming. Now the tests load much faster and are easier to debug.
Those files are based on html5lib-tests and ensure that we follow, or are close to, the HTML specification. The test files that were not passing were not added.
This will enable easier debugging. Fixing the warnings makes clear which functions are public (re-entrant tokenizer states) and also fixes typespecs.
Don't worry: it is going to appear in another PR. This is preparation for merging the pending work.
This is a work in progress on an HTML tokenizer.
It is related to #37.