Issues: adbar/trafilatura

Issues list

Function to use part of the heuristics on bare HTML fragments [enhancement]
#369 opened Jun 14, 2023 by adbar
Refactor code to provide a "keep-tags" option [enhancement]
#52 opened Jan 12, 2021 by adbar
3 tasks
Keeping all valid table information and formatting [bug]
#78 opened Jun 2, 2021 by adbar
Simplify handling of nested elements [enhancement]
#93 opened Jul 12, 2021 by adbar
Extract inline structured data from page <body> [question]
#173 opened Feb 15, 2022 by Seirdy
include_images changes text extraction [bug]
#194 opened Apr 12, 2022 by carschno
Title repeated in the body [enhancement]
#220 opened Jul 3, 2022 by rgeronimi
Add document language to metadata [enhancement]
#224 opened Jul 19, 2022 by adbar
Fix XPath expression in subtree [maintenance]
#289 opened Jan 19, 2023 by adbar
Collected links as metadata field? [enhancement]
#290 opened Jan 26, 2023 by Amaimersion
List of smaller extraction bugs (text & metadata) [good first issue, up for grabs]
#4 opened Jan 9, 2020 by adbar
Gooey dependency seems unmaintained and broken [wontfix]
#367 opened Jun 9, 2023 by tkapias
Check URLs passed to courlan functions extract_links and fix_relative_urls [question]
#382 opened Jun 23, 2023 by adbar
Image markdown not included during processing [bug]
#388 opened Jul 2, 2023 by kianwilcox
Empty h1 blocks non-empty h2 [bug]
#400 opened Aug 3, 2023 by pieterhartel
Question about the title [question]
#402 opened Aug 4, 2023 by pieterhartel
Returns horribly bad result for MSN page [bug]
#410 opened Aug 21, 2023 by TheRabidWolverine
List items are being missed [bug]
#431 opened Oct 12, 2023 by alroythalus