include_links breaks the extraction for https://news.ycombinator.com #411

Open
shivanker opened this issue Aug 28, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@shivanker

Just as the title says. Attaching screenshot as an example.
Screenshot 2023-08-28 at 16 22 11

@adbar adbar added the bug Something isn't working label Aug 30, 2023
@adbar
Owner

adbar commented Aug 30, 2023

Hi @shivanker, extraction of main content from what is actually a summary page is tricky, but there is a bug here indeed.

@HammadRafique29

I went through the source code and found (on Windows) that the include_links feature itself works well. The only problem is that the base_url being passed is None for some reason.

image

I printed the target link, which looks like this:

image

As you can see above, there is no base_url (which is needed to turn a relative URL into an absolute one).
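To illustrate why a missing base matters, here is a minimal sketch using only the Python standard library. It is not the library's actual code: `resolve_link` is a hypothetical helper, and the `item?id=1` link is a made-up example of the kind of relative href found on a listing page.

```python
from urllib.parse import urljoin


def resolve_link(href, base_url=None):
    """Return an absolute URL when a base is known, otherwise the href unchanged.

    Hypothetical helper for illustration only; not part of any library.
    """
    if base_url is None:
        # Without a base there is nothing to resolve against,
        # so the relative link stays relative.
        return href
    return urljoin(base_url, href)


relative = "item?id=1"  # made-up relative link

print(resolve_link(relative))
print(resolve_link(relative, "https://news.ycombinator.com/"))
```

With no base, the link comes back as a bare relative path; with the page URL supplied as the base, it resolves to a full URL.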

You can pass the url parameter to get the full URL:

image

Here is the output of the code above:

image
