Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zimcheck performance improvement #134

Closed
MiguelRocha opened this issue Jul 29, 2020 · 7 comments · Fixed by #162
Closed

zimcheck performance improvement #134

MiguelRocha opened this issue Jul 29, 2020 · 7 comments · Fixed by #162

Comments

@MiguelRocha
Copy link
Contributor

zimcheck performance is quite important since for big articles can take several minutes (maybe hours).

There are a few performance improvements that can be done in zimcheck specific code:

  • Decrease the number of function calls to getLinks()
  • Improve getLinks() performance
  • Improve isExternalUrl()

The first 2 points are responsible for 18% while the last one takes 20% speedtime.

Also, there maybe some performance gains if we find a better way to validate the URLs. Currently we are calling getArticle() from libzim and this takes around 50% of the time since it needs access the direntCache. (related: openzim/libzim#385 (comment))

@kelson42
Copy link
Contributor

@MiguelRocha Good to me to implement. If the amount of changes is subtantial, please split this ticket.

erumoico pushed a commit to KNOT-FIT-BUT/zim-tools that referenced this issue Aug 12, 2020
@veloman-yunkan
Copy link
Collaborator

@MiguelRocha Do you mind if I work on this in combination with optimizations on the libzim side?

@MiguelRocha
Copy link
Contributor Author

@veloman-yunkan please feel free to do it. And let me know if you need any help. I have a draft branch that already fixes the first point that I described and I will create a PR after finishing other issues.

@kelson42
Copy link
Contributor

kelson42 commented Sep 2, 2020

@MiguelRocha Would you be able to move the isExternalUrl() in an other ticket please
@veloman-yunkan Thank you for your offer, I propose @MiguelRocha make a PR with what he already has done and then let discuss this ticket again.

@kelson42
Copy link
Contributor

@mgautierfr @MiguelRocha I believe this should not be closed because the following points need to be implemented:

  • Improve getLinks() performance
  • Improve isExternalUrl()

@kelson42 kelson42 reopened this Sep 23, 2020
@mgautierfr
Copy link
Collaborator

Yes, it was automatically closed when I've merged #162.

@kelson42 kelson42 pinned this issue Sep 30, 2020
@kelson42
Copy link
Contributor

kelson42 commented Oct 7, 2020

There is no obvious and consensual solution to optimize getLinks() and isExternalUrl(). Closing for the moment and we will open a new ticket if a clear (speed?) problem happens again.

@kelson42 kelson42 closed this as completed Oct 7, 2020
@kelson42 kelson42 unpinned this issue Oct 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants