(chromeredir) added some limits to number of open chrome pages #12

60 changes: 31 additions & 29 deletions chromeredir/checkredir.js
@@ -1,6 +1,7 @@
 const puppeteer = require('puppeteer')
 const readline = require('readline');
 
+const MAX_WORKERS = 20
 const rl = readline.createInterface({
     input: process.stdin,
     output: process.stdout,
@@ -10,45 +11,46 @@ const rl = readline.createInterface({
 var urls = []
 var reading = true
 rl.on('line', async (url) => {
     urls.push(url) // start queuing the read urls right away
 });
 
 (async ()=> { // kick off an async "thread" to read from the queue
     const browser = await puppeteer.launch({ignoreHTTPSErrors: true}) // build the browser once
     let working = new Set() // maybe not the most memory efficient to make two datastructures
-    while (urls.length) { // but the list as a queue is helpful and the set is helpful for different reasons
-        let url = urls.shift() // grab the first URL
-        working.add(url) // mark that we are working on that URL
-        ; // so we can call another async func inline
-        (async () => { // check the redirects in another "thread" so we can check multiple at a time
-            const page = await browser.newPage()
-            try {
-                await page.goto(url)
-                var destination = await page.evaluate(() => {
-                    return {"domain": document.domain, "href": document.location.href}
-                })
-
-                var u = new URL(url)
-
-                if (u.host != destination.domain){
-                    console.log(`${url} redirects to ${destination.href}`)
-                } else {
-                    console.log(`${url} does not redirect`)
-                }
-            } catch {
-                // should an error just pass?
-                console.log(`error checking ${url}`)
-            } finally {
-                await page.close() // clean up the page object (we make a new one for each URL)
-                working.delete(url) // we are no longer working on that URL
-                if (!reading && !working.size) { // I think this will prevent premature browser closure and issues with list/set desync
-                    browser.close()
-                }
-            }
-        })()
-    }
+    for (let i = 0; i < MAX_WORKERS; i++) { // but the list as a queue is helpful and the set is helpful for different reasons
+        (async ()=>{ // only make MAX_WORKERS tasks ("threads") so we do not crash chrome
+            const page = await browser.newPage()
+            while (urls.length) {
+                let url = urls.shift() // grab the first URL
+                working.add(url) // mark that we are working on that URL
+                try {
+                    await page.goto(url)
+                    var destination = await page.evaluate(() => {
+                        return {"domain": document.domain, "href": document.location.href}
+                    })
+
+                    var u = new URL(url)
+
+                    if (u.host != destination.domain){
+                        console.log(`${url} redirects to ${destination.href}`)
+                    } else {
+                        console.log(`${url} does not redirect`)
+                    }
+                } catch {
+                    // should an error just pass?
+                    console.log(`error checking ${url}`)
+                } finally {
+                    working.delete(url) // we are no longer working on that URL
+                }
+            }
+            await page.close() // clean up the page object, potentially an issue if page crashes in loop
+            if (!reading && !working.size) { // I think this will prevent premature browser closure and issues with list/set desync
+                browser.close()
+            }
+        })()
+    }
 })()
 
 rl.on('close', async () => {
     reading = false // make sure that our queue and set do not get desynced
 })
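The core change in this commit is replacing spawn-a-task-per-URL with a fixed pool of MAX_WORKERS tasks that all drain one shared queue, which caps how many Chrome pages exist at once. The snippet below is a minimal sketch of that pattern in isolation, not the PR's code: `check()` is a hypothetical stub standing in for the Puppeteer page work, and the queue is pre-filled instead of fed from stdin via readline. It tracks peak concurrency to show the cap holds.

```javascript
const MAX_WORKERS = 4 // pool size; the PR uses 20

// Hypothetical stand-in for opening a page and following redirects.
// Tracks how many "checks" are in flight at once.
let active = 0
let peak = 0
async function check(url) {
    active++
    peak = Math.max(peak, active)
    await new Promise((resolve) => setTimeout(resolve, 10)) // simulate page.goto()
    active--
    return url
}

// Pre-filled queue; the PR pushes onto this from rl.on('line', ...)
const urls = Array.from({ length: 12 }, (_, i) => `https://example.com/${i}`)

async function main() {
    const workers = []
    for (let i = 0; i < MAX_WORKERS; i++) { // same bounded pool as the PR
        workers.push((async () => {
            while (urls.length) {       // all workers shift from one shared queue
                const url = urls.shift()
                await check(url)
            }
        })())
    }
    await Promise.all(workers)          // every worker has drained the queue
    return peak
}

main().then((p) => console.log(`peak concurrency: ${p}`)) // prints "peak concurrency: 4"
```

Because each worker dequeues synchronously before awaiting, `urls.shift()` never races even though the queue is shared; concurrency can never exceed the number of pool tasks, which is exactly the property the PR relies on to keep Chrome's page count bounded.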