Profile scraping error: res = description.find_element(By.TAG_NAME,"a").find_elements(By.XPATH,"*") #178

Kiru6ik · 2023-06-26T21:18:00Z

This error occurred when scraping a person who worked multiple times at the same organization.
I checked the page structure and it looks like it should work, but for some reason it fails.
This part of the code causes the problem:
if position_summary_text and len(position_summary_text.find_element(By.CLASS_NAME,"pvs-list").find_elements(By.XPATH,"li")) > 1: #.find_element(By.CLASS_NAME,"pvs-list")
    descriptions = position_summary_text.find_element(By.CLASS_NAME,"pvs-list").find_element(By.CLASS_NAME,"pvs-list").find_elements(By.XPATH,"li")
    for description in descriptions:
        res = description.find_element(By.TAG_NAME,"a").find_elements(By.XPATH,"*")
        position_title_elem = res[0] if len(res) > 0 else None
        work_times_elem = res[1] if len(res) > 1 else None
        location_elem = res[2] if len(res) > 2 else None
It can't find an a tag inside description, so the lookup that builds res fails.
As far as I understand, it tries to find the top part of the job description (title, duration at position, location), all of which is located under an a tag on the web page. @joeyism, do you have any insights on that? Am I referring to the right part of the page that this code is trying to analyse?
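
A possible guard, as a sketch only: Selenium's find_elements returns an empty list when nothing matches, whereas find_element raises NoSuchElementException, so the a lookup can be made optional. The helper name anchor_children and the StubElement class are hypothetical and exist only so the snippet runs without a browser. The two string constants are the values of Selenium's By.TAG_NAME and By.XPATH.

```python
from typing import Any, Dict, List, Optional, Tuple

TAG_NAME = "tag name"  # same value as selenium's By.TAG_NAME
XPATH = "xpath"        # same value as selenium's By.XPATH


def anchor_children(description: Any) -> List[Any]:
    """Return the direct children of the first <a> in `description`, or [] if there is no <a>."""
    anchors = description.find_elements(TAG_NAME, "a")  # [] when nothing matches
    if not anchors:
        return []
    return anchors[0].find_elements(XPATH, "*")


class StubElement:
    """Hypothetical stand-in for a WebElement, answering (by, value) lookups from a dict."""

    def __init__(self, lookups: Optional[Dict[Tuple[str, str], List[Any]]] = None):
        self.lookups = lookups or {}

    def find_elements(self, by: str, value: str) -> List[Any]:
        return self.lookups.get((by, value), [])


# One list item whose <a> holds three children, and one list item without any <a>.
link = StubElement({(XPATH, "*"): ["title", "work_times", "location"]})
with_anchor = StubElement({(TAG_NAME, "a"): [link]})
without_anchor = StubElement()

res = anchor_children(with_anchor)
position_title_elem = res[0] if len(res) > 0 else None
missing = anchor_children(without_anchor)  # empty list instead of an exception
```

With an empty res, the existing `res[0] if len(res) > 0 else None` checks already fall through to None, so the rest of the loop would not need to change.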

The whole error message:
`Traceback (most recent call last):
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\lists_check.py", line 23, in <module>
person.scrape(close_on_complete=False)
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\linkedin_scraper\person.py", line 89, in scrape
self.scrape_logged_in(close_on_complete=close_on_complete)
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\linkedin_scraper\person.py", line 285, in scrape_logged_in
self.get_experiences()
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\linkedin_scraper\person.py", line 156, in get_experiences
res = description.find_element(By.TAG_NAME,"a").find_elements(By.XPATH,"*")
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\selenium\webdriver\remote\webelement.py", line 417, in find_element
return self._execute(Command.FIND_CHILD_ELEMENT, {"using": by, "value": value})["value"]
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\selenium\webdriver\remote\webelement.py", line 395, in _execute
return self._parent.execute(command, params)
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 346, in execute
self.error_handler.check_response(response)
File "C:\Users\User\PycharmProjects\pythonProject\pythonProject\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 245, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"tag name","selector":"a"}
(Session info: chrome=114.0.5735.134); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
Backtrace:
GetHandleVerifier [0x0025A813+48355]
(No symbol) [0x001EC4B1]
(No symbol) [0x000F5358]
(No symbol) [0x001209A5]
(No symbol) [0x00120B3B]
(No symbol) [0x00119AE1]
(No symbol) [0x0013A784]
(No symbol) [0x00119A36]
(No symbol) [0x0013AA94]
(No symbol) [0x0014C922]
(No symbol) [0x0013A536]
(No symbol) [0x001182DC]
(No symbol) [0x001193DD]
GetHandleVerifier [0x004BAABD+2539405]
GetHandleVerifier [0x004FA78F+2800735]
GetHandleVerifier [0x004F456C+2775612]
GetHandleVerifier [0x002E51E0+616112]
(No symbol) [0x001F5F8C]
(No symbol) [0x001F2328]
(No symbol) [0x001F240B]
(No symbol) [0x001E4FF7]
BaseThreadInitThunk [0x762B0099+25]
RtlGetAppContainerNamedObjectPath [0x77A97B6E+286]
RtlGetAppContainerNamedObjectPath [0x77A97B3E+238]
(No symbol) [0x00000000]

Process finished with exit code 1
`


joeyism · 2023-06-26T21:51:07Z

Can you provide the code that you've used please?

Kiru6ik · 2023-06-26T21:58:22Z

Sorry, I forgot to include the failing account in the first place. The bug occurred on this profile: https://www.linkedin.com/in/sheanahamill/.
The error occurs with any basic person scrape. This is the code I used to reproduce the bug:

from selenium.common.exceptions import WebDriverException
from selenium import webdriver
from linkedin_scraper import Person, actions, Company
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
import time, pickle
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("user-data-dir=C:\\Users\\User\\AppData\\Local\\Google\\Chrome\\User Data\\Profile 3")


driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=options)

person=Person("https://www.linkedin.com/in/sheanahamill", driver=driver, scrape=False)
time.sleep(3)
person.scrape(close_on_complete=False)

name=person.name
title=person.job_title
now_company=person.company
print(name, title, now_company)
experience=person.experiences
print(experience)
current_company=experience[0]
print(current_company)
link_to_company=current_company.linkedin_url
print(link_to_company)
location=current_company.location
print(location)


company=Company(link_to_company, driver=driver, get_employees=False, close_on_complete=False)

company_name=company.name
company_size=company.company_size
company_website=company.website
about=company.about_us
print(company_name, company_size, company_website, about)

This code works fine with other accounts (other than the log1 problem from #173).

khamamoto6 · 2023-06-28T21:18:01Z

Hey, I only updated the two functions I needed: get_experiences() and get_name_and_location(). In addition to handling the UI updates, I also fixed the scraper issue where it gets confused when a person has had multiple positions at the same company over time.

You can selectively scrape by doing this:
person=Person("https://www.linkedin.com/in/sheanahamill", driver=driver, scrape=False)
person.get_experiences()
print(person.experiences)

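# Both methods belong to the Person class in linkedin_scraper/person.py.
# get_experiences() needs `os` and termcolor's `colored` (debug prints) to be imported there.
def get_name_and_location(self):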
        main = self.wait_for_element_to_load(by=By.TAG_NAME, name="main")
        top_panels = main.find_elements(By.CLASS_NAME,"pv-text-details__left-panel")
        self.name = top_panels[0].find_elements(By.XPATH,"*")[0].text
        self.location = top_panels[1].find_element(By.TAG_NAME,"span").text

def get_experiences(self): # modified
        url = os.path.join(self.linkedin_url, "details/experience")
        self.driver.get(url)
        self.focus()
        main = self.wait_for_element_to_load(by=By.TAG_NAME, name="main")
        self.scroll_to_half()
        self.scroll_to_bottom()
        main_list = self.wait_for_element_to_load(name="pvs-list", base=main)
        for position in main_list.find_elements(By.XPATH,"li"):
            position = position.find_element(By.CLASS_NAME,"pvs-entity")
            company_logo_elem, position_details = position.find_elements(By.XPATH,"*")

            # company elem
            company_linkedin_url = company_logo_elem.find_element(By.XPATH,"*").get_attribute("href")

            # position details
            position_details_list = position_details.find_elements(By.XPATH,"*")
            position_summary_details = position_details_list[0] if len(position_details_list) > 0 else None
            position_summary_text = position_details_list[1] if len(position_details_list) > 1 else None # skills OR list of positions
            outer_positions = position_summary_details.find_element(By.XPATH,"*").find_elements(By.XPATH,"*")

            if len(outer_positions) == 4:
                position_title = outer_positions[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].text
                company = outer_positions[1].find_element(By.TAG_NAME,"span").text
                work_times = outer_positions[2].find_element(By.TAG_NAME,"span").text
                location = outer_positions[3].find_element(By.TAG_NAME,"span").text
            elif len(outer_positions) == 3:
                if "·" in outer_positions[2].text:
                    position_title = outer_positions[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].text
                    company = outer_positions[1].find_element(By.TAG_NAME,"span").text
                    work_times = outer_positions[2].find_element(By.TAG_NAME,"span").text
                    location = ""
                else:
                    position_title = ""
                    company = outer_positions[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].text
                    work_times = outer_positions[1].find_element(By.TAG_NAME,"span").text
                    location = outer_positions[2].find_element(By.TAG_NAME,"span").text

            elif len(outer_positions) == 2: # this is for when person has multiple pos over time at one company
                company_div, work_times_div = outer_positions
                company = company_div.find_element(By.TAG_NAME,"span").text
                company_linkedin_url = ""
                print(colored(company, 'yellow'))

                positions_list = position_summary_text.find_element(By.CLASS_NAME, "pvs-list").find_element(By.CLASS_NAME, "pvs-list")

                for position in positions_list.find_elements(By.XPATH,"*"):
                    print(colored('count position', "yellow"))
                    position = position.find_element(By.CLASS_NAME,"pvs-entity")
                    position_details_list = position.find_elements(By.XPATH,"*")[1].find_elements(By.XPATH,"*")

                    position_summary_details = position_details_list[0] if len(position_details_list) > 0 else None
                    position_summary_text = position_details_list[1] if len(position_details_list) > 1 else None # skills OR list of positions
                    outer_positions = position_summary_details.find_element(By.XPATH,"*").find_elements(By.XPATH,"*")

                    if len(outer_positions) == 3:
                        position_title = outer_positions[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].find_elements(By.XPATH,"*")[0].text
                        print(colored(position_title, 'yellow'))
                        work_times = outer_positions[1].find_element(By.TAG_NAME,"span").text
                        location = outer_positions[2].find_element(By.TAG_NAME,"span").text
                    else:
                        print('need fix.')

                    if 'work_times' not in locals() and 'work_times' not in globals():
                        work_times = None # modified
                    times = work_times.split("·")[0].strip() if work_times else ""
                    duration = work_times.split("·")[1].strip() if times != "" and len(work_times.split("·")) > 1 else None # modified

                    from_date = " ".join(times.split(" ")[:2]) if times else ""
                    to_date = " ".join(times.split(" ")[3:]) if times else ""

                    if position_summary_text and len(position_summary_text.find_element(By.CLASS_NAME,"pvs-list").find_element(By.CLASS_NAME,"pvs-list").find_elements(By.XPATH,"li")) > 1:
                        descriptions = position_summary_text.find_element(By.CLASS_NAME,"pvs-list").find_element(By.CLASS_NAME,"pvs-list").find_elements(By.XPATH,"li")
                        for description in descriptions:
                            res = description.find_element(By.TAG_NAME,"a").find_elements(By.XPATH,"*")
                            position_title_elem = res[0] if len(res) > 0 else None
                            work_times_elem = res[1] if len(res) > 1 else None
                            location_elem = res[2] if len(res) > 2 else None

                            location = location_elem.find_element(By.XPATH,"*").text if location_elem else None
                            position_title = position_title_elem.find_element(By.XPATH,"*").find_element(By.TAG_NAME,"*").text if position_title_elem else ""
                            work_times = work_times_elem.find_element(By.XPATH,"*").text if work_times_elem else ""
                            times = work_times.split("·")[0].strip() if work_times else ""
                            duration = work_times.split("·")[1].strip() if len(work_times.split("·")) > 1 else None
                            from_date = " ".join(times.split(" ")[:2]) if times else ""
                            to_date = " ".join(times.split(" ")[3:]) if times else ""

                            experience = Experience(
                                position_title=position_title,
                                from_date=from_date,
                                to_date=to_date,
                                duration=duration,
                                location=location,
                                description=description,
                                institution_name=company if 'company' in locals() or 'company' in globals() else "Not provided", #modified
                                linkedin_url=company_linkedin_url
                            )
                            self.add_experience(experience)
                    else:
                        description = position_summary_text.text if position_summary_text else ""

                        experience = Experience(
                            position_title=position_title,
                            from_date=from_date,
                            to_date=to_date,
                            duration=duration,
                            location=location,
                            description=description,
                            institution_name=company,
                            linkedin_url=company_linkedin_url
                        )
                        self.add_experience(experience)
                continue  # go to the next company; `return` would end the scrape after the first multi-position company


            if 'work_times' not in locals() and 'work_times' not in globals():
                work_times = None
            times = work_times.split("·")[0].strip() if work_times else ""
            duration = work_times.split("·")[1].strip() if times != "" and len(work_times.split("·")) > 1 else None

            from_date = " ".join(times.split(" ")[:2]) if times else ""
            to_date = " ".join(times.split(" ")[3:]) if times else ""

            if position_summary_text and len(position_summary_text.find_element(By.CLASS_NAME,"pvs-list").find_element(By.CLASS_NAME,"pvs-list").find_elements(By.XPATH,"li")) > 1:
                descriptions = position_summary_text.find_element(By.CLASS_NAME,"pvs-list").find_element(By.CLASS_NAME,"pvs-list").find_elements(By.XPATH,"li")
                for description in descriptions:
                    res = description.find_element(By.TAG_NAME,"a").find_elements(By.XPATH,"*")
                    position_title_elem = res[0] if len(res) > 0 else None
                    work_times_elem = res[1] if len(res) > 1 else None
                    location_elem = res[2] if len(res) > 2 else None

                    location = location_elem.find_element(By.XPATH,"*").text if location_elem else None
                    position_title = position_title_elem.find_element(By.XPATH,"*").find_element(By.TAG_NAME,"*").text if position_title_elem else ""
                    work_times = work_times_elem.find_element(By.XPATH,"*").text if work_times_elem else ""
                    times = work_times.split("·")[0].strip() if work_times else ""
                    duration = work_times.split("·")[1].strip() if len(work_times.split("·")) > 1 else None
                    from_date = " ".join(times.split(" ")[:2]) if times else ""
                    to_date = " ".join(times.split(" ")[3:]) if times else ""

                    experience = Experience(
                        position_title=position_title,
                        from_date=from_date,
                        to_date=to_date,
                        duration=duration,
                        location=location,
                        description=description,
                        institution_name=company if 'company' in locals() or 'company' in globals() else "Not provided",
                        linkedin_url=company_linkedin_url
                    )
                    self.add_experience(experience)
            else:
                description = position_summary_text.text if position_summary_text else ""

                experience = Experience(
                    position_title=position_title,
                    from_date=from_date,
                    to_date=to_date,
                    duration=duration,
                    location=location,
                    description=description,
                    institution_name=company,
                    linkedin_url=company_linkedin_url
                )
                self.add_experience(experience)

This is from ~ a week ago, hopefully still working.
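
The date handling that is repeated several times in get_experiences() could be pulled into one helper. A minimal sketch, assuming the scraped string looks like "Jan 2020 - Present · 3 yrs 6 mos" (that format is an assumption based on how the code above splits it); parse_work_times is a hypothetical name and is not part of linkedin_scraper.

```python
from typing import Optional, Tuple


def parse_work_times(work_times: Optional[str]) -> Tuple[str, str, Optional[str]]:
    """Split a work-times string into (from_date, to_date, duration).

    Mirrors the splitting used above: the part before "·" is the date range,
    the part after it is the duration.
    """
    if not work_times:
        return "", "", None
    parts = [part.strip() for part in work_times.split("·")]
    times = parts[0]
    duration = parts[1] if times and len(parts) > 1 else None
    tokens = times.split(" ")
    from_date = " ".join(tokens[:2])  # e.g. "Jan 2020"
    to_date = " ".join(tokens[3:])    # skips the separator at index 2
    return from_date, to_date, duration


current = parse_work_times("Jan 2020 - Present · 3 yrs 6 mos")
past = parse_work_times("Mar 2018 - Dec 2019")
empty = parse_work_times(None)
```

Passing None for a missing value would also remove the need for the `'work_times' not in locals()` checks, since the variable can simply be initialised to None before each branch.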

Kiru6ik · 2023-07-04T18:48:41Z

Still facing the same issue even with this update.

joeyism · 2023-07-04T20:51:11Z

I just deployed a fix. Please try with v2.11.2.

Kiru6ik · 2023-07-04T21:23:41Z

Thanks, it works. I tested it on 2 profiles but haven't tested at scale yet.
I am new to git and don't know how to submit a PR, but company.py doesn't work either.
The updates needed are:

Change class name to mb6 in line 210: grid = driver.find_element(By.CLASS_NAME, "mb6") # used to be artdeco-card.p5.mb4
Change class name to mb1 in line 241: grid = driver.find_element(By.CLASS_NAME, "mb1") # used to be mt1

With these two changes it now works for me.
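
Since these class names have already changed once, a lookup that tries the new name first and falls back to the old one may survive the next change a little longer. This is only a sketch: find_by_first_class and StubDriver are hypothetical names, not part of linkedin_scraper, and the stub exists so the snippet runs without a browser. "class name" is the value of Selenium's By.CLASS_NAME.

```python
from typing import Any, Dict, Iterable, List, Optional

CLASS_NAME = "class name"  # same value as selenium's By.CLASS_NAME


def find_by_first_class(base: Any, class_names: Iterable[str]) -> Optional[Any]:
    """Return the first element matching any of `class_names`, tried in order, or None."""
    for name in class_names:
        matches = base.find_elements(CLASS_NAME, name)  # [] when nothing matches
        if matches:
            return matches[0]
    return None


class StubDriver:
    """Hypothetical stand-in for a webdriver that only knows a few class names."""

    def __init__(self, present: Dict[str, Any]):
        self.present = present

    def find_elements(self, by: str, value: str) -> List[Any]:
        if by == CLASS_NAME and value in self.present:
            return [self.present[value]]
        return []


candidates = ["mb1", "mt1"]  # new class name first, old one as a fallback
new_layout = find_by_first_class(StubDriver({"mb1": "grid (new layout)"}), candidates)
old_layout = find_by_first_class(StubDriver({"mt1": "grid (old layout)"}), candidates)
neither = find_by_first_class(StubDriver({}), candidates)
```

A None result still has to be handled by the caller, but it fails in one obvious place instead of deep inside the scrape.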

arpit5292 · 2023-08-16T06:52:22Z

Hi, it is not working for me. I changed the properties as shown above; I was checking with "https://www.linkedin.com/company/google".

Kiru6ik · 2023-08-16T15:47:30Z

The way I troubleshot it:

1. Identify the part of the scraping that is failing.
2. Read the error.
3. Work out what that part of the code does and what it is for.
4. Find the block on the page that it is trying to locate (this can be challenging, as it is not always clear).
5. Find the new element name, class, etc.

You can send the full error message and I can try to help out.
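
The first two steps can be partly automated by running the scraper's stages one at a time and reporting the first one that raises. A rough sketch: first_failing_step is a hypothetical helper, and the two functions below are stand-ins for calls such as person.get_name_and_location and person.get_experiences, so that the snippet runs without a browser.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_failing_step(
    steps: Iterable[Tuple[str, Callable[[], object]]]
) -> Optional[Tuple[str, Exception]]:
    """Run each named step in order; return (name, exception) for the first failure, else None."""
    for name, step in steps:
        try:
            step()
        except Exception as exc:  # broad on purpose: we only want to know which step broke
            return name, exc
    return None


# Stand-ins for the real scraper methods (assumptions for illustration only).
def get_name_and_location() -> None:
    pass


def get_experiences() -> None:
    raise LookupError('no such element: {"method":"tag name","selector":"a"}')


failure = first_failing_step([
    ("get_name_and_location", get_name_and_location),
    ("get_experiences", get_experiences),
])
if failure is not None:
    print(f"{failure[0]} failed: {failure[1]}")
```

Once the failing stage is known, steps 3 to 5 are manual: open the page in the browser's inspector and compare the element the code expects with what is actually there.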
