Issues: adbar/trafilatura
Function to use part of the heuristics on bare HTML fragments
enhancement
New feature or request
#369
opened Jun 14, 2023 by
adbar
Refactor code to provide a "keep-tags" option
enhancement
New feature or request
#52
opened Jan 12, 2021 by
adbar
3 tasks
Keeping all valid table information and formatting
bug
Something isn't working
#78
opened Jun 2, 2021 by
adbar
Simplify handling of nested elements
enhancement
New feature or request
#93
opened Jul 12, 2021 by
adbar
Extract inline structured data from page <body>
question
Further information is requested
#173
opened Feb 15, 2022 by
Seirdy
Add include_video parameter (iframe elements are missing)
enhancement
New feature or request
#175
opened Feb 18, 2022 by
fraseInc
include_images changes text extraction
bug
#194
opened Apr 12, 2022 by
carschno
Add document language to metadata
enhancement
New feature or request
#224
opened Jul 19, 2022 by
adbar
Extraction of Youtube iframes and img elements with links
enhancement
New feature or request
#272
opened Dec 5, 2022 by
sampathmende
Fix XPath expression in subtree
maintenance
Software compatibility and continuity
#289
opened Jan 19, 2023 by
adbar
Collected links as metadata field?
enhancement
New feature or request
#290
opened Jan 26, 2023 by
Amaimersion
List of smaller extraction bugs (text & metadata)
good first issue
Good for newcomers
up for grabs
Good for (first) contributors
#4
opened Jan 9, 2020 by
adbar
Gooey dependency seems unmaintained and broken
wontfix
This will not be worked on
#367
opened Jun 9, 2023 by
tkapias
Check URLs passed to courlan functions extract_links and fix_relative_urls
question
Further information is requested
#382
opened Jun 23, 2023 by
adbar
Image markdown not included during processing
bug
Something isn't working
#388
opened Jul 2, 2023 by
kianwilcox
included_images failed when trying to extract images in a table
bug
#396
opened Jul 27, 2023 by
ChangyaoTian
Question about the title
question
Further information is requested
#402
opened Aug 4, 2023 by
pieterhartel
Returns horribly bad result for MSN page
bug
Something isn't working
#410
opened Aug 21, 2023 by
TheRabidWolverine
include_links breaks the extraction for https://news.ycombinator.com
bug
Something isn't working
#411
opened Aug 28, 2023 by
shivanker
Parts are getting missed out after using extract function
enhancement
New feature or request
#430
opened Oct 12, 2023 by
alroythalus
Entire/majority content of these 2 sites being missed out
enhancement
New feature or request
#432
opened Oct 12, 2023 by
alroythalus