Article title not set to PDF title #98

Open
kelson42 opened this issue Oct 21, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@kelson42
Contributor

[Screenshot]

Here the tab/article title should be "La fée de la rosée".

kelson42 added the bug label (Something isn't working) on Oct 21, 2024
@benoit74
Collaborator

This is "normal" since the scraper does not use zimscraperlib 4 yet, so it does not yet automatically use PDF metadata and data. The title of this issue should probably be something more like "Update to zimscraperlib 4 to benefit from PDF indexing and metadata". And this is an enhancement rather than a bug, unless I missed something.

@rgaudin
Member

rgaudin commented Oct 21, 2024

@kelson42 this ticket is not great 😐

  • There is no identifier or URL for the ZIM; please include it when attaching an example. It is prunelle_auteurs_en_herbe_fr_2024-05 (Auteurs en herbe (Prunelle)) from https://library.kiwix.org/#lang=fra&q=prunelle
  • It's not clear what the expected behavior should be:
    • The UI makes direct links (with target _blank) to the PDF documents. There is no wrapping entry here.
    • Every browser using kiwix-serve sees a PDF document, displays it and sets the window/tab title to the PDF title. That's beyond our control.
    • The Apple reader you are using displays the PDF document but sets the bookmark name (and possibly the tab title, which is not visible here) to the ZIM entry title.
  • It is true that nautilus is not setting the title of the files' entries so libzim defaults to setting it to the entry path.
  • If we were to set a title to the ZIM entry, it would be the user-defined title in the collection, and not the PDF title.
  • The PDF title of that document is Copie de la fée de la rosée - Canva - La fée de la rosée .pdf
  • The lack of a title was originally intended to avoid creating suggestions, and changing/fixing this has been discussed in No URL to access directly a content #54

@kelson42
Contributor Author

kelson42 commented Dec 26, 2024

@benoit74 @rgaudin I was impacted by this bug again today and therefore had the opportunity to revisit your last two comments. I have comments on the following points:

The title of this issue should probably be more something like Update to zimscraperlib 4 to benefit from PDF indexing and metadata

In general, we need tickets written from the user perspective as far as possible (in particular if the issue is reported by a non-dev). I want to avoid any technical exchange before the problem or improvement is clear from the user perspective.

And this is not a bug

This is a bug! Any "front article" (i.e. not an HTML resource) should, if at all possible, have title metadata in the ZIM. This is how readers label an article; if there is none, readers have to fall back on the path of the content, which is not user friendly. It's important that this is very clear to every scraper dev (@benoit74, are there measures to be taken there?).

  • It is true that nautilus is not setting the title of the files' entries so libzim defaults to setting it to the entry path.

  • If we were to set a title to the ZIM entry, it would be the user-defined title in the collection, and not the PDF title.

  • The PDF title of that document is Copie de la fée de la rosée - Canva - La fée de la rosée .pdf

  • The lack of a title was originally intended to avoid creating suggestions, and changing/fixing this has been discussed in #54

Thank you for linking this to #54; this is indeed a variation on "No proper suggestion list", I guess. I can stick to the conclusion I reached 6 months ago... but however we approach the problem/solution, I believe we could stick to this principle: if the content is loaded directly in Kiwix (i.e. not as a resource), then it is a "front article", and if it is a "front article", we had better give it a proper title (whether it comes from the PDF, probably the default behaviour, or is set manually in the JSON).
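The principle above amounts to a title-precedence rule for front articles. Below is a minimal Python sketch of that rule, assuming a hypothetical helper `resolve_entry_title` and hypothetical parameter names; nothing here is taken from nautilus, zimscraperlib or libzim. A title set manually in the collection JSON wins, otherwise the title from the PDF metadata is used, and the entry path stays the last-resort fallback (what readers show today).

```python
from typing import Optional


def resolve_entry_title(
    path: str,
    manual_title: Optional[str] = None,
    pdf_title: Optional[str] = None,
) -> str:
    """Hypothetical helper: choose the display title of a front-article entry."""
    # 1. A title set manually in the collection JSON takes precedence.
    # 2. Otherwise use the title found in the PDF metadata, if any.
    for candidate in (manual_title, pdf_title):
        # Skip missing or whitespace-only titles.
        if candidate is not None and candidate.strip():
            return candidate.strip()
    # 3. Last resort: the entry path, i.e. what readers display today.
    return path
```

For example, `resolve_entry_title("files/fee.pdf", pdf_title="La fée de la rosée")` returns `"La fée de la rosée"`. Letting the manual title override the PDF one matters here because PDF metadata can be noisy: this document's PDF title is "Copie de la fée de la rosée - Canva - La fée de la rosée .pdf".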
