Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patch(report_news): add peef.dev as source #124

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

pythonbrad
Copy link
Member

No description provided.

@pythonbrad pythonbrad changed the title report_news: add peef.dev as source report_news: add peef.dev as source Nov 7, 2024
@pythonbrad pythonbrad marked this pull request as draft November 7, 2024 17:55
@pythonbrad pythonbrad marked this pull request as ready for review November 7, 2024 19:41
@pythonbrad pythonbrad changed the title report_news: add peef.dev as source patch(report_news): add peef.dev as source Nov 7, 2024
continue

link = re.sub(r'^http://', 'https://', urlquote(loc.text, safe=":/"))
pub_date = lastmod.text
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose of creating this variable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sanix-Darker, which variable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub_date


articles = []
for url in xml_raw.findall('{http://www.sitemaps.org/schemas/sitemap/0.9}url')[-MAX_ARTICLES:]:
loc = url.find('{http://www.sitemaps.org/schemas/sitemap/0.9}loc')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you secure each iteration with a try/except ?

Copy link
Member Author

@pythonbrad pythonbrad Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated, I think now only HTTP errors are remaining.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why we use this MAX_ARTICLES=10 limit here, can you elaborate ?
I left a comment on the other PR regarding what I think we should do here,

link = re.sub(r'^http://', 'https://', urlquote(loc.text, safe=":/"))
pub_date = lastmod.text

if not link.startswith("https://peef.dev/post/"):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to add here that not all content from /post/ are article

  1. We have comments or simple post for the early version : /post/<slug> where slug is autogenerated
  2. And articles are from /post/<username>/<slug> where slug comes from the article's title

So, i think normal /post/<slug> should be excluded and use only /post/<username>/<slug>

Copy link
Member Author

@pythonbrad pythonbrad Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abdounasser202, I updated 👌

pub_date = lastmod.text

# A peef article in under the form https://peef.dev/post/<author>/<slug>
if not link.startswith("https://peef.dev/post/") or link.strip("/").count("/") != 5:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice, but why not making it very strict ? If we need only articles, then it should be strict, so using AND instead of OR is better

Doing that, every URL should pass both checks : start with https://peef.dev/post/ and have exactly 5 "/"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abdounasser202, I think, it is doing exactly what you said.

If I am wrong, please, give me more details for me to understand your point.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you're right, I did a test, and it's doing what I was explaining
so it's fine ! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants