Hi, in the second example, I found this line of code:
tagLinks = pTag.findAll('a', href=re.compile('/wiki/'), class_=False)
I want to make sure I understand it: is this line used to find <a> tags whose href contains '/wiki/'? For example:
<a href="/wiki/Mass_communication" title="Mass communication">mass communication</a>
However, when I use
pTag.findAll('a', href=re.compile('https://www.nytimes.com'), class_=False)
(with no base URL) to extract <a class="css-1g7m0tk" href="https://www.nytimes.com/2021/07/23/technology/silicon-valleys-pandemic-profits.html" title="">, it doesn't return anything.
Would you mind explaining what this code means and what is wrong in my case? Thank you!
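You are correct. pTag.findAll('a', href=re.compile('/wiki/'), class_=False) returns a list of <a> tags (bs4 Tag objects, not plain strings) that (1) have an href attribute containing the substring /wiki/ and (2) have no class attribute at all. class_=False means "the class attribute must be absent".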
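Here is a small self-contained sketch of what that line keeps and drops. It assumes BeautifulSoup 4 is installed; the HTML is a made-up snippet modeled on a Wikipedia paragraph, and the second and third links are hypothetical, added only to show what the filter rejects. It uses find_all, the current name for findAll.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet: only the first link comes from the question.
html = """
<p>
  <a href="/wiki/Mass_communication" title="Mass communication">mass communication</a>
  <a class="mw-redirect" href="/wiki/Some_redirect">redirect link</a>
  <a href="#cite_note-1">[1]</a>
</p>
"""

pTag = BeautifulSoup(html, "html.parser").p

# href=re.compile('/wiki/') keeps <a> tags whose href contains "/wiki/"
# class_=False keeps only tags that have NO class attribute
tagLinks = pTag.find_all("a", href=re.compile("/wiki/"), class_=False)

for tag in tagLinks:
    # Each result is a bs4 Tag, so attributes and text are accessible
    print(tag["href"], tag.get_text())
# prints: /wiki/Mass_communication mass communication
```

The second link is rejected because it has a class attribute, and the third because its href does not contain /wiki/.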
If you use pTag.findAll('a', href=re.compile('https://www.nytimes.com'), class_=False), the href condition does match an absolute URL such as href="https://www.nytimes.com/2021/07/23/technology/silicon-valleys-pandemic-profits.html", so no base URL is needed. However, the tag you want has class="css-1g7m0tk", and class_=False filters out every tag that has a class attribute. Remove class_=False (or use class_='css-1g7m0tk') and the link should be found.
If your code still does not return any result, make sure that pTag actually contains the tag you are looking for (for example, print(pTag) and check).
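A minimal sketch of the NYT case, assuming BeautifulSoup 4 is installed. The <a> tag is the one from the question; the closing </a>, the link text "article", and the wrapping <div> are assumptions added so it parses as a complete element.

```python
import re
from bs4 import BeautifulSoup

# The tag from the question (closing tag, link text and <div> are assumed)
html = (
    '<div><a class="css-1g7m0tk" '
    'href="https://www.nytimes.com/2021/07/23/technology/silicon-valleys-pandemic-profits.html" '
    'title="">article</a></div>'
)
soup = BeautifulSoup(html, "html.parser")
nyt = re.compile("https://www.nytimes.com")

# class_=False rejects any tag that HAS a class attribute, so this finds nothing
with_filter = soup.find_all("a", href=nyt, class_=False)

# Dropping class_=False keeps the link
without_filter = soup.find_all("a", href=nyt)

# Alternatively, require that specific class
by_class = soup.find_all("a", href=nyt, class_="css-1g7m0tk")

print(len(with_filter), len(without_filter), len(by_class))  # 0 1 1
```

The href pattern matches in all three calls; only the class condition differs.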
Would you try this code:
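import re
import bs4
import requests

url = "address of webpage that includes <a class..."  # replace with the page you are scraping
req = requests.get(url)
soup = bs4.BeautifulSoup(req.text, 'html.parser')
# Search the whole page (not just pTag). class_=False is left out so that
# <a> tags with a class attribute, like class="css-1g7m0tk", are included.
print(soup.find_all('a', href=re.compile('https://www.nytimes.com')))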
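One more detail about the href pattern: the BeautifulSoup docs say a compiled regular expression is applied with its search() method, so it matches anywhere in the attribute value, not only at the start. This sketch uses plain `re` to mirror that behavior; the helper `matching` and the last URL in the list are hypothetical, for illustration only.

```python
import re

hrefs = [
    "/wiki/Mass_communication",
    "https://en.wikipedia.org/wiki/Mass_communication",
    "https://www.nytimes.com/2021/07/23/technology/silicon-valleys-pandemic-profits.html",
    "https://example.com/?next=https://www.nytimes.com/",  # hypothetical URL
]

def matching(pattern, values):
    """Hypothetical helper: values where pattern.search() finds a match."""
    return [v for v in values if pattern.search(v)]

# Unanchored patterns match the substring anywhere in the href
contains_wiki = matching(re.compile("/wiki/"), hrefs)
nyt_anywhere = matching(re.compile(re.escape("https://www.nytimes.com")), hrefs)

# "^" anchors at the start; re.escape stops "." from matching any character
starts_with_wiki = matching(re.compile("^/wiki/"), hrefs)
nyt_prefix = matching(re.compile("^" + re.escape("https://www.nytimes.com/")), hrefs)

print(len(contains_wiki), len(nyt_anywhere), len(starts_with_wiki), len(nyt_prefix))  # 2 2 1 1
```

So re.compile('/wiki/') would also accept absolute Wikipedia URLs, and re.compile('https://www.nytimes.com') would accept any href that merely mentions the NYT address; anchor the pattern with ^ if you only want links that start with it.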