-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
patch(report_news): add peef.dev
as source
#124
base: main
Are you sure you want to change the base?
Conversation
add peef.dev
as sourcepeef.dev
as source
200c5ae
to
9b1b85f
Compare
9b1b85f
to
ab0e011
Compare
peef.dev
as sourcepeef.dev
as source
continue | ||
|
||
link = re.sub(r'^http://', 'https://', urlquote(loc.text, safe=":/")) | ||
pub_date = lastmod.text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the purpose of creating this variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Sanix-Darker, which variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub_date
|
||
articles = [] | ||
for url in xml_raw.findall('{http://www.sitemaps.org/schemas/sitemap/0.9}url')[-MAX_ARTICLES:]: | ||
loc = url.find('{http://www.sitemaps.org/schemas/sitemap/0.9}loc') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you secure each iteration with a try/except ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated, I think now only HTTP errors are remaining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why we use this MAX_ARTICLES=10
limit here, can you elaborate ?
I left a comment on the other PR regarding what I think we should do here,
link = re.sub(r'^http://', 'https://', urlquote(loc.text, safe=":/")) | ||
pub_date = lastmod.text | ||
|
||
if not link.startswith("https://peef.dev/post/"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to add here that not all content from /post/ are article
- We have comments or simple post for the early version :
/post/<slug>
where slug is autogenerated - And articles are from
/post/<username>/<slug>
where slug comes from the article's title
So, i think normal /post/<slug>
should be excluded and use only /post/<username>/<slug>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abdounasser202, I updated 👌
pub_date = lastmod.text | ||
|
||
# A peef article in under the form https://peef.dev/post/<author>/<slug> | ||
if not link.startswith("https://peef.dev/post/") or link.strip("/").count("/") != 5: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nice, but why not making it very strict ? If we need only articles, then it should be strict, so using AND instead of OR is better
Doing that, every URL should pass both checks : start with https://peef.dev/post/ and have exactly 5 "/"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abdounasser202, I think, it is doing exactly what you said.
If I am wrong, please, give me more details for me to understand your point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, you're right, I did a test, and it's doing what I was explaining
so it's fine ! 👍
No description provided.