feat: lychee link checker from built website version #855

Merged Mar 8, 2024 (39 commits)
Changes from 35 commits
Commits
c692d83
docs: add lychee github action for test purposes
TC-MO Jan 17, 2024
f8a7759
bump checkout action & add .lycheeignore file
TC-MO Jan 17, 2024
aa3ce3e
fix broken regex
TC-MO Jan 17, 2024
56599f6
fix typo in file name
TC-MO Jan 17, 2024
87c8ba4
add yt to ignored links
TC-MO Jan 17, 2024
4352529
add base URL flag
TC-MO Jan 17, 2024
3b7986f
add regex to ignore image links
TC-MO Jan 17, 2024
da2a79a
add webp format to ignored
TC-MO Jan 17, 2024
5d5dde3
add svg format to ignored
TC-MO Jan 17, 2024
f800ecf
set sources as start point for lychee - test
TC-MO Jan 17, 2024
7b9f40d
remove trailing slash from base arg
TC-MO Jan 17, 2024
c6d3107
Merge branch 'master' into lychee-test
TC-MO Jan 23, 2024
757dd5d
Merge branch 'master' into lychee-test
TC-MO Feb 5, 2024
926a67e
fix: use the build website version for link checking
barjin Feb 12, 2024
e03be6a
add edit in github links to ignored by lychee
TC-MO Feb 20, 2024
2b90b9f
add new ignore
TC-MO Feb 20, 2024
a4448c0
add new ignore
TC-MO Feb 22, 2024
04c9609
fix ignore
TC-MO Feb 22, 2024
22a88b2
Merge branch 'master' into feat/lychee-link-checker
TC-MO Feb 22, 2024
4af0ea2
fix broken links
TC-MO Feb 22, 2024
d865680
fix broken links * add new ignores
TC-MO Feb 26, 2024
4e0b297
fix broken links
TC-MO Feb 26, 2024
2f88c43
fix broken links
TC-MO Feb 26, 2024
3216550
test broken link
TC-MO Feb 26, 2024
78acbc2
add new ignore & new arguments
TC-MO Feb 29, 2024
991c904
Merge branch 'master' into feat/lychee-link-checker
TC-MO Feb 29, 2024
d9a30b8
fix exclude & dead links
TC-MO Feb 29, 2024
8b703fd
fix chrome web store links
TC-MO Feb 29, 2024
84c8afe
add new ignore
TC-MO Feb 29, 2024
f0b8104
add og-image to ignores
TC-MO Feb 29, 2024
d2e4d5c
fix: update the remaining links (trailing slash, random md loader qui…
barjin Mar 6, 2024
5291407
chore: don't link check node_modules
barjin Mar 6, 2024
0deb1cf
change max retries value to 6
TC-MO Mar 7, 2024
d9831c3
comment out restricted google spreadsheet link
TC-MO Mar 8, 2024
5d202b1
Merge branch 'master' into feat/lychee-link-checker
TC-MO Mar 8, 2024
70bc6d8
fix vale issues
TC-MO Mar 8, 2024
7784732
chore: fix spelling
barjin Mar 8, 2024
b55b8b6
chore: add 429 to accepted HTTP statuses
barjin Mar 8, 2024
9725cee
fix: revert the default accepted status codes
barjin Mar 8, 2024
4 changes: 2 additions & 2 deletions .github/styles/Apify/Capitalization.yml
@@ -3,8 +3,8 @@ message: "The word '%s' should always be capitalized."
ignorecase: false
level: error
tokens:
- '\bactor\b'
- '\bactors\b'
- '(?<!\W)\bactor\b'
- '(?<!\W)\bactors\b'
- '(?<!@)\bapify\b(?!-\w+)'
- '(?<!\()\bhttps?://[^\s]*\bapify\b[^\s]*\b(?!\))|(?<!\[)\bhttps?://[^\s]*\bapify\b[^\s]*\b(?!\])'
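To see what the added `(?<!\W)` lookbehind changes, the sketch below runs the old and new `actor` tokens through JavaScript's regex engine. Vale uses its own regex engine and applies tokens per text scope, so treat this only as an approximation; the sample strings are made up for illustration.

```javascript
// Old and new tokens from Capitalization.yml, compiled as JavaScript regexes.
// Assumption: Vale's engine treats \b, \W and lookbehind the same way for these tokens.
const oldToken = /\bactor\b/;
const newToken = /(?<!\W)\bactor\b/;

// Hypothetical sample strings, not taken from the docs.
const samples = [
    'actor runs on the platform', // lowercase word at the very start of the text
    'run the actor again',        // lowercase word after a space
    '/platform/actor',            // lowercase word inside a path
];

// Record whether each token matches each sample.
const results = samples.map((text) => ({
    text,
    before: oldToken.test(text),
    after: newToken.test(text),
}));

console.table(results);
```

In this engine, `\b` needs a non-word character (or the start of the text) in front of `actor`, while `(?<!\W)` forbids a non-word character there, so the new token only matches when the word opens the scanned text.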

34 changes: 34 additions & 0 deletions .github/workflows/lychee.yml
@@ -0,0 +1,34 @@
name: Lychee Link Checker

on: [pull_request]

jobs:
  link-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Use Node.js 18
        uses: actions/setup-node@v4
        with:
          node-version: 18
          cache: 'npm'
          cache-dependency-path: 'package-lock.json'
          always-auth: 'true'
          registry-url: 'https://npm.pkg.github.com/'
          scope: '@apify-packages'

      - name: Build docs
        run: |
          npm ci --force
          npm run build
        env:
          APIFY_SIGNING_TOKEN: ${{ secrets.APIFY_SIGNING_TOKEN }}
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}

      - uses: lycheeverse/[email protected]
        env:
          GITHUB_TOKEN: ${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}
        with:
          fail: true
          args: --base https://docs.apify.com --exclude-path 'build/versions.html' --exclude-path 'node_modules' --max-retries 6 --verbose --no-progress './**/*.html'
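The `--base https://docs.apify.com` argument matters because the built HTML contains root-relative links such as `/api/v2`, which cannot be checked without a host. As a rough illustration of that kind of resolution, the sketch below uses the WHATWG `URL` class built into Node.js. It is not lychee's own code, and lychee's exact resolution rules may differ.

```javascript
// Illustration only: resolve link targets against a base URL the way a browser would.
// lychee performs its own resolution; this only shows why a base is needed.
const base = 'https://docs.apify.com';

// Link targets of the kind that appear in this PR's diff.
const hrefs = ['/api/v2', '/platform/actors/running/tasks', 'https://crawlee.dev'];

// Root-relative targets pick up the base host; absolute URLs keep their own host.
const resolved = hrefs.map((href) => new URL(href, base).href);

console.log(resolved);
```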
9 changes: 9 additions & 0 deletions .lycheeignore
@@ -0,0 +1,9 @@
http:\/\/localhost:3000.*
https:\/\/www\.youtube.*
\.(jpg|jpeg|png|gif|bmp|webp|svg)$
https:\/\/github\.com\/apify\/apify-docs\/edit\/[^ ]*a
https:\/\/docs\.apify\.com\/assets\/[^ ]*
file:\/\/\/.*
https://chrome\.google\.com/webstore/.*
https?:\/\/(www\.)?npmjs\.com\/.*
^https://apify\.com/api/og-image.*
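A quick way to sanity-check ignore patterns like these is to run them against a few sample URLs. The sketch below does that with JavaScript's `RegExp`, assuming it behaves like lychee's regex engine for these particular patterns and that a pattern may match anywhere in the URL. The helper `isIgnored` and the sample URLs are hypothetical.

```javascript
// Patterns copied from the .lycheeignore file in this PR, one regex per line.
// Assumption: JavaScript's RegExp is close enough to lychee's engine for these patterns.
const ignorePatterns = [
    String.raw`http:\/\/localhost:3000.*`,
    String.raw`https:\/\/www\.youtube.*`,
    String.raw`\.(jpg|jpeg|png|gif|bmp|webp|svg)$`,
    String.raw`https:\/\/github\.com\/apify\/apify-docs\/edit\/[^ ]*a`,
    String.raw`https:\/\/docs\.apify\.com\/assets\/[^ ]*`,
    String.raw`file:\/\/\/.*`,
    String.raw`https://chrome\.google\.com/webstore/.*`,
    String.raw`https?:\/\/(www\.)?npmjs\.com\/.*`,
    String.raw`^https://apify\.com/api/og-image.*`,
].map((source) => new RegExp(source));

// Hypothetical helper: true when at least one pattern matches somewhere in the URL.
const isIgnored = (url) => ignorePatterns.some((pattern) => pattern.test(url));

// Hypothetical sample URLs for illustration.
for (const url of [
    'https://www.youtube.com/watch?v=example',
    'https://docs.apify.com/img/logo.svg',
    'https://www.npmjs.com/package/@apify/http-request',
    'https://docs.apify.com/platform/actors',
]) {
    console.log(`${isIgnored(url) ? 'ignored' : 'checked'}  ${url}`);
}
```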
4 changes: 2 additions & 2 deletions apify-docs-theme/src/config.js
@@ -47,7 +47,7 @@ const themeConfig = ({
items: [
{
label: 'Reference',
href: `${absoluteUrl}/api/v2/`,
href: `${absoluteUrl}/api/v2`,
target: '_self',
rel: 'dofollow',
},
@@ -170,7 +170,7 @@ const themeConfig = ({
items: [
{
label: 'Reference',
href: `${absoluteUrl}/api/v2/`,
href: `${absoluteUrl}/api/v2`,
target: '_self',
rel: 'dofollow',
},
2 changes: 1 addition & 1 deletion sources/academy/glossary/concepts/http_cookies.md
@@ -19,4 +19,4 @@ HTTP cookies are small pieces of data sent by the server to the user's web brows
2. To make the website show location-specific data (works for websites where you could set a zip code or country directly on the page, but unfortunately doesn't work for some location-based ads).
3. To make the website less suspicious of the crawler and let the crawler's traffic blend in with regular user traffic.

For local testing, we recommend using the [**EditThisCookie**](https://chrome.google.com/webstore/detail/editthiscookie/fngmhnnpilhplaeedifhccceomclgfbg?hl=en) Chrome extension.
For local testing, we recommend using the [**EditThisCookie**](https://chrome.google.com/webstore/detail/fngmhnnpilhplaeedifhccceomclgfbg) Chrome extension.
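Several of the Chrome Web Store link fixes in this PR follow the same shape: the name slug and query string are dropped and only the extension ID is kept. A minimal sketch of that rewrite is below; `shortenWebStoreUrl` is a hypothetical helper for illustration and is not part of the repository.

```javascript
// Hypothetical helper, not part of the repository. Rewrites
//   https://chrome.google.com/webstore/detail/<slug>/<id>?query
// to
//   https://chrome.google.com/webstore/detail/<id>
function shortenWebStoreUrl(link) {
    const url = new URL(link);
    const segments = url.pathname.split('/').filter(Boolean);

    // Only touch links shaped like /webstore/detail/...; return anything else unchanged.
    const isDetailLink = url.hostname === 'chrome.google.com'
        && segments[0] === 'webstore'
        && segments[1] === 'detail'
        && segments.length >= 3;
    if (!isDetailLink) return link;

    // The extension ID is the last path segment; the slug before it is optional.
    const id = segments[segments.length - 1];
    return `${url.origin}/webstore/detail/${id}`;
}

console.log(shortenWebStoreUrl(
    'https://chrome.google.com/webstore/detail/editthiscookie/fngmhnnpilhplaeedifhccceomclgfbg?hl=en',
));
```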
2 changes: 1 addition & 1 deletion sources/academy/glossary/tools/modheader.md
@@ -13,7 +13,7 @@

If you read about [Postman](./postman.md), you might remember that you can use it to modify request headers before sending a request. This is great, but the main problem is that Postman can only make static requests - meaning, it is unable to load JavaScript or any [dynamic content](../concepts/dynamic_pages.md).

[ModHeader](https://chrome.google.com/webstore/detail/modheader/idgpnmonknjnojddfkpgkljpfnnfcklj?hl=en) is a Chrome extension which can be used to modify the HTTP headers of the requests you make with your browser. This means that, for example, if your scraper using a headless browser Puppeteer is being blocked due to an improper **User-Agent** header, you can use ModHeader to test the target website and quickly solve the issue.
[ModHeader](https://chrome.google.com/webstore/detail/idgpnmonknjnojddfkpgkljpfnnfcklj) is a Chrome extension which can be used to modify the HTTP headers of the requests you make with your browser. This means that, for example, if your scraper using a headless browser Puppeteer is being blocked due to an improper **User-Agent** header, you can use ModHeader to test the target website and quickly solve the issue.

GitHub Actions / vale annotations for sources/academy/glossary/tools/modheader.md, line 16:
- warning, column 121: [write-good.Passive] 'be used' may be passive voice. Use active voice if you can.
- warning, column 132: [write-good.TooWordy] 'modify' is too wordy.
- warning, column 284: [write-good.Passive] 'being blocked' may be passive voice. Use active voice if you can.
- warning, column 324: [Microsoft.Terms] Prefer 'Personal digital assistant' over 'Agent'.
- warning, column 393: [write-good.Weasel] 'quickly' is a weasel word!
- warning, column 393: [Microsoft.Adverbs] Consider removing 'quickly'.

## The ModHeader interface {#interface}

2 changes: 1 addition & 1 deletion sources/academy/glossary/tools/switchyomega.md
@@ -11,7 +11,7 @@

---

SwitchyOmega is a Chrome extension for managing and switching between proxies which can be added in the [Chrome Webstore](https://chrome.google.com/webstore/detail/proxy-switchyomega/padekgcemlokbadohgkifijomclgjgif).
SwitchyOmega is a Chrome extension for managing and switching between proxies which can be added in the [Chrome Webstore](https://chrome.google.com/webstore/detail/padekgcemlokbadohgkifijomclgjgif).

GitHub Actions / vale annotations for sources/academy/glossary/tools/switchyomega.md, line 14:
- warning, column 89: [write-good.Passive] 'be added' may be passive voice. Use active voice if you can.

After adding it to Chrome, you can see the SwitchyOmega icon somewhere amongst all your other Chrome extension icons. Clicking on it will display a menu, where you can select various differnt connection profiles, as well as open the extension's options.

@@ -22,7 +22,7 @@ Before moving on, give these valuable resources a quick lookover:
- Refamiliarize with the various available data on the [Request object](https://crawlee.dev/api/core/class/Request).
- Learn about the [`failedRequestHandler` function](https://crawlee.dev/api/browser-crawler/interface/BrowserCrawlerOptions#failedRequestHandler).
- Understand how to use the [`errorHandler`](https://crawlee.dev/api/browser-crawler/interface/BrowserCrawlerOptions#errorHandler) function to handle request failures.
- Ensure you are comfortable using [key-value stores](/sdk/js/docs/guides/data-storage#key-value-store) and [datasets](/sdk/js/docs/api/dataset#__docusaurus), and understand the differences between the two storage types.
- Ensure you are comfortable using [key-value stores](/sdk/js/docs/guides/result-storage#key-value-store) and [datasets](/sdk/js/docs/guides/result-storage#dataset), and understand the differences between the two storage types.

## Knowledge check 📝 {#quiz}

4 changes: 2 additions & 2 deletions sources/academy/platform/get_most_of_actors/actor_readme.md
@@ -43,7 +43,7 @@
3. **How much will it cost to scrape (target site)?**

- Simple text explaining what type of proxies are needed and how many platform credits (calculated mainly from consumption units) are needed for 1000 results.
- This is calculated from carrying out several runs (or from runs saved in the DB). @Zuzka can help if needed. [Information in this table](https://docs.google.com/spreadsheets/d/1NOkob1eYqTsRPTVQdltYiLUsIipvSFXswRcWQPtCW9M/edit#gid=1761542436), tab "cost of usage".
- This is calculated from carrying out several runs (or from runs saved in the DB).<!-- @Zuzka can help if needed. [Information in this table](https://docs.google.com/spreadsheets/d/1NOkob1eYqTsRPTVQdltYiLUsIipvSFXswRcWQPtCW9M/edit#gid=1761542436), tab "cost of usage". -->
- Here’s an example for this section:

> ## How much will it cost me to scrape Google Maps reviews?
@@ -94,4 +94,4 @@

## Next up {#next}

If you followed all the tips described above, your Actor README is almost good to go! In the [next lesson](./guidelines_for_writing.md) we will give you a few instructions on how you can create a tutorial for your Actor.
If you followed all the tips described above, your Actor README is almost good to go! In the [next lesson](./guidelines_for_writing.md) we will give you a few instructions on how you can create a tutorial for your Actor.

GitHub Actions / vale annotations for sources/academy/platform/get_most_of_actors/actor_readme.md, line 97:
- warning, column 156: [write-good.Weasel] 'few' is a weasel word!
@@ -9,7 +9,7 @@

---

The most popular way of [integrating](https://help.apify.com/en/collections/1669767-integrating-with-apify) the Apify platform with an external project/application is by programmatically running an [Actor](/platform/actors) or [task](/platform/actors/running/tasks), waiting for it to complete its run, then collecting its data and using it within the project. Though this process sounds somewhat complicated, it's actually quite easy to do; however, due to the plethora of features offered on the Apify platform, new users may not be sure how exactly to implement this type of integration. So, let's dive in and see how you can do it.
The most popular way of [integrating](https://help.apify.com/en/collections/1669769-integrations) the Apify platform with an external project/application is by programmatically running an [Actor](/platform/actors) or [task](/platform/actors/running/tasks), waiting for it to complete its run, then collecting its data and using it within the project. Though this process sounds somewhat complicated, it's actually quite easy to do; however, due to the plethora of features offered on the Apify platform, new users may not be sure how exactly to implement this type of integration. So, let's dive in and see how you can do it.

GitHub Actions / vale annotations for sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md, line 12:
- warning, column 433: [write-good.TooWordy] 'however' is too wordy.
- warning, column 535: [write-good.Weasel] 'exactly' is a weasel word!
- warning, column 546: [write-good.TooWordy] 'implement' is too wordy.
- warning, column 561: [write-good.TooWordy] 'type of' is too wordy.
- error, column 582: [write-good.So] Don't start a sentence with 'So,'.

> Remember to check out our [API documentation](/api/v2) with examples in different languages and a live API console. We also recommend testing the API with a nice desktop client like [Postman](https://www.getpostman.com/) or [Insomnia](https://insomnia.rest).

@@ -23,7 +23,7 @@ jq.src = 'https://ajax.googleapis.com/ajax/libs/jquery/2.2.2/jquery.min.js';
document.getElementsByTagName('head')[0].appendChild(jq);
```

If that doesn't work because of CORS violation, you can install [this extension](https://chrome.google.com/webstore/detail/jquery-inject/iibfbhlfimdnkinkcenncoeejnmpemof) that injects jQuery on a button click.
If that doesn't work because of CORS violation, you can install [this extension](https://chrome.google.com/webstore/detail/ekkjohcjbjcjjifokpingdbdlfekjcgi) that injects jQuery on a button click.

There are 2 main ways how to test a pageFunction code in your console:

2 changes: 1 addition & 1 deletion sources/academy/tutorials/node_js/optimizing_scrapers.md
@@ -29,7 +29,7 @@ Now, if you want to build your own game and you are not a C/C++ veteran with a t

## Back to scrapers {#back-to-scrapers}

What are the engines of the scraping world? A [browser](https://github.com/puppeteer/puppeteer/blob/master/docs/api.md), an [HTTP library](https://www.npmjs.com/package/@apify/http-request), an [HTML parser](https://github.com/cheeriojs/cheerio), and a [JSON parser](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse). The CPU spends more than 99% of its workload in these libraries. As with engines, you are not likely gonna write these from scratch - instead you'll use something like [Crawlee](https://crawlee.dev) that handles a lot of the overheads for you.
What are the engines of the scraping world? A [browser](https://github.com/puppeteer/puppeteer?tab=readme-ov-file#puppeteer), an [HTTP library](https://www.npmjs.com/package/@apify/http-request), an [HTML parser](https://github.com/cheeriojs/cheerio), and a [JSON parser](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse). The CPU spends more than 99% of its workload in these libraries. As with engines, you are not likely gonna write these from scratch - instead you'll use something like [Crawlee](https://crawlee.dev) that handles a lot of the overheads for you.

It is about how you use these tools. The small amount of code you write in your [`requestHandler`](https://crawlee.dev/api/http-crawler/interface/HttpCrawlerOptions#requestHandler) is absolutely insignificant compared to what is running inside these tools. In other words, it doesn't matter how many functions you call or how many variables you extract. If you want to optimize your scrapers, you need to choose the lightweight option from the tools and use it as little as possible. A crawler scraping only JSON API can be as much as 200 times faster/cheaper than a browser based solution.

@@ -22,7 +22,7 @@

Sometimes, you need your custom code to run before any other code is run on the page. Perhaps you need to modify an object's prototype, or even re-define certain global variables before they are used by the page's native scripts.

Luckily, Puppeteer and Playwright both have functions for this. In Puppeteer, we use the [`page.evaluateOnNewDocument()`](https://puppeteer.github.io/puppeteer/docs/puppeteer.page.evaluateonnewdocument/) function, while in Playwright we use [`page.addInitScript()`](https://playwright.dev/docs/api/class-page#page-add-init-script). We'll use these functions to override the native `addEventListener` function, setting it to a function that does nothing. This will prevent event listeners from being added to elements.
Luckily, Puppeteer and Playwright both have functions for this. In Puppeteer, we use the [`page.evaluateOnNewDocument()`](https://pptr.dev/api/puppeteer.page.evaluateonnewdocument) function, while in Playwright we use [`page.addInitScript()`](https://playwright.dev/docs/api/class-page#page-add-init-script). We'll use these functions to override the native `addEventListener` function, setting it to a function that does nothing. This will prevent event listeners from being added to elements.

GitHub Actions / vale annotations for sources/academy/webscraping/puppeteer_playwright/executing_scripts/injecting_code.md, line 25:
- warning, column 1: [write-good.Weasel] 'Luckily' is a weasel word!
- warning, column 472: [write-good.Passive] 'being added' may be passive voice. Use active voice if you can.

<Tabs groupId="main">
<TabItem value="Playwright" label="Playwright">
@@ -55,7 +55,7 @@

With `page.click()`, Puppeteer and Playwright actually drag the mouse and click, allowing the bot to act more human-like. This is different from programmatically clicking with `Element.click()` in vanilla client-side JavaScript.

Notice that in the Playwright example, we are using a different selector than in the Puppeteer example. This is because Playwright supports [many custom CSS selectors](https://playwright.dev/docs/selectors#text-selector), such as the **has-text** pseudo class. As a rule of thumb, using text selectors is much more preferable to using regular selectors, as they are much less likely to break. If Google makes the sibling above the **I agree** button a `<div>` element instead of a `<button>` element, our `button + button` selector will break. However, the button will always have the text **I agree**; therefore, `button:has-text("I agree")` is more reliable.
Notice that in the Playwright example, we are using a different selector than in the Puppeteer example. This is because Playwright supports [many custom CSS selectors](https://playwright.dev/docs/other-locators#css-matching-by-text), such as the **has-text** pseudo class. As a rule of thumb, using text selectors is much more preferable to using regular selectors, as they are much less likely to break. If Google makes the sibling above the **I agree** button a `<div>` element instead of a `<button>` element, our `button + button` selector will break. However, the button will always have the text **I agree**; therefore, `button:has-text("I agree")` is more reliable.

GitHub Actions / vale annotations for sources/academy/webscraping/puppeteer_playwright/page/interacting_with_a_page.md, line 58:
- warning, column 1: [Microsoft.FirstPerson] Use first person (such as ' I ') sparingly.
- warning, column 142: [write-good.Weasel] 'many' is a weasel word!
- warning, column 389: [write-good.Weasel] 'likely' is a weasel word!
- warning, column 557: [write-good.TooWordy] 'However' is too wordy.
- warning, column 616: [write-good.TooWordy] 'therefore' is too wordy.

> If you're not already familiar with CSS selectors and how to find them, we recommend referring to [this lesson](../../web_scraping_for_beginners/data_extraction/using_devtools.md) in the **Web scraping for beginners** course.
