Issues: adbar/trafilatura
Issues list
Question about error handling design in request functions
#789
opened Feb 26, 2025 by
L-cloud
updated Feb 26, 2025
markdown conversion removes elements with anchors in a list
#788
opened Feb 20, 2025 by
ziodave
updated Feb 20, 2025
Fast and full mode yield the same results
bug
Something isn't working
#787
opened Feb 12, 2025 by
adbar
updated Feb 12, 2025
Trafilatura cannot read gzipped pages?
bug
Something isn't working
#781
opened Feb 2, 2025 by
LaundroMat
updated Feb 3, 2025
Issues with xpath processing along the "FullText" path template recognition.
bug
Something isn't working
#780
opened Jan 29, 2025 by
krstp
updated Jan 31, 2025
Deduplication is non-deterministic (and destructive)
question
Further information is requested
#778
opened Jan 24, 2025 by
BramVanroy
updated Jan 27, 2025
Table tags incorrect in HTML formatted output
bug
Something isn't working
#777
opened Jan 14, 2025 by
GICodeWarrior
updated Jan 27, 2025
Trafilatura fails to extract structured heading tags (h2, h3)
#774
opened Jan 7, 2025 by
LeMoussel
updated Jan 7, 2025
Turning on "--keep-dirs" gives no output
bug
Something isn't working
#771
opened Dec 20, 2024 by
DesBw
updated Dec 27, 2024
Duplicated lines when nested in <article> and <main>, with <br> in front
bug
Something isn't working
#768
opened Dec 14, 2024 by
ibestvina
updated Dec 23, 2024
Question regarding title extraction
question
Further information is requested
#770
opened Dec 16, 2024 by
unsleepy22
updated Dec 18, 2024
Documentation: on precision
documentation
Docs in need of update or extension
#766
opened Dec 10, 2024 by
DesBw
updated Dec 10, 2024
CLI: better control of output file names
enhancement
New feature or request
#754
opened Nov 30, 2024 by
DesBw
updated Dec 5, 2024
Backticks produce extra line breaks
bug
Something isn't working
#755
opened Nov 30, 2024 by
klvbdmh
updated Dec 2, 2024
Support for sidemap parsing from text instead of urls
feedback
Feedback from users requested
#751
opened Nov 27, 2024 by
NiClassic
updated Nov 28, 2024
Performance bottleneck in prune_unwanted_nodes causing 200ms per call
question
Further information is requested
#750
opened Nov 23, 2024 by
thsunkid
updated Nov 25, 2024
Review input type for is_probably_readerable() function
enhancement
New feature or request
#749
opened Nov 22, 2024 by
adbar
updated Nov 22, 2024
Documentation about settings could use examples
documentation
Docs in need of update or extension
#746
opened Nov 15, 2024 by
georgedorn
updated Nov 18, 2024
Add document language to metadata
enhancement
New feature or request
#224
opened Jul 19, 2022 by
adbar
updated Nov 12, 2024
feat(cli/lib): Add tqdm based progress bar as an option
enhancement
New feature or request
#663
opened Jul 30, 2024 by
chitralverma
updated Oct 22, 2024
Review HTML element list and conversion
enhancement
New feature or request
#720
opened Oct 15, 2024 by
adbar
updated Oct 15, 2024
2 tasks
Empty Results When Using Spider Function with Category URL
question
Further information is requested
#696
opened Sep 9, 2024 by
felipehertzer
updated Oct 1, 2024
List of smaller extraction bugs (text & metadata)
good first issue
Good for newcomers
up for grabs
Good for (first) contributors
#4
opened Jan 9, 2020 by
adbar
updated Sep 22, 2024
Docs: add page explaining how to run tests
documentation
Docs in need of update or extension
#698
opened Sep 9, 2024 by
adbar
updated Sep 9, 2024
Downloads: add support to switch between proxies
enhancement
New feature or request
#697
opened Sep 9, 2024 by
adbar
updated Sep 9, 2024