This repository has been archived by the owner on Jan 1, 2023. It is now read-only.

Getting text and attribute from a website #150

Open
marcpre opened this issue Sep 12, 2021 · 0 comments

marcpre commented Sep 12, 2021

I am using "@nesk/puphpeteer": "^2.0.0" and want to get the text and the href attribute from a link.

I tried the following:

<?php

require_once '../vendor/autoload.php';

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

$debug = true;

$puppeteer = new Puppeteer([
    'read_timeout' => 100,
    'debug' => $debug,
]);
$browser = $puppeteer->launch([
    'headless' => !$debug,
    'ignoreHTTPSErrors' => true,
]);

$page = $browser->newPage();
$page->goto('http://example.python-scraping.com/');

// get text and link
$links = $page->querySelectorXPath('//*[@id="results"]/table/tbody/tr/td/div/a', JsFunction::createWithParameters(['node'])
    ->body('return node.textContent;'));

// get single text
$singleText = $page->querySelectorXPath('//*[@id="pagination"]/a', JsFunction::createWithParameters(['node'])
    ->body('return node.textContent;'));

$browser->close();

When I run the above script, I get the nodes from the page, but I cannot access their attributes or text.

Any suggestions on how to do this?
