Unhandled reject Error: Failed to load url #188

Open
winghouchan opened this issue May 27, 2016 · 62 comments
Comments

@winghouchan

@minhchu has reported a bug where an Unhandled rejection Error: Failed to load url is thrown. This is similar to #180, but #180 is specific to SlimerJS and this case is not caused by SlimerJS. This new issue will track the case @minhchu is facing.

The test case which can reproduce the issue is below:

var Horseman = require('node-horseman');
var horseman = new Horseman();

horseman
  .userAgent('Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0')
  .open('https://www.google.com/search?site=&tbm=isch&source=hp&biw=1366&bih=640&q=moon')
  .screenshot('example.jpg')
  .close();

The output is below:

  horseman using PhantomJS from $PATH +0ms
  horseman .setup() creating phantom instance on 12406 +4ms
  horseman phantom created +673ms
  horseman phantom version 2.1.1 +17ms
  horseman page created +11ms
  horseman phantomjs onLoadFinished triggered +13ms success NaN
  horseman injected jQuery +46ms
  horseman .userAgent() set +20ms Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0
  horseman .open() +1ms https://www.google.com/search?site=&tbm=isch&source=hp&biw=1366&bih=640&q=moon
  horseman phantomjs onLoadFinished triggered +899ms success 1
  horseman phantomjs onLoadFinished triggered +103ms fail 2
  horseman injected jQuery +1ms
  horseman .close(). +4ms
Unhandled rejection Error: Failed to load url
    at checkStatus (/Users/user/Projects/node_modules/node-horseman/lib/index.js:276:16)
    at tryCatcher (/Users/user/Projects/node_modules/bluebird/js/release/util.js:16:23)
    at Function.Promise.attempt.Promise.try (/Users/user/Projects/node_modules/bluebird/js/release/method.js:39:29)
    at Object.loadFinishedSetup [as onLoadFinished] (/Users/user/Projects/node_modules/node-horseman/lib/index.js:274:43)
    at /Users/user/Projects/node_modules/node-phantom-simple/node-phantom-simple.js:636:30
    at Array.forEach (native)
    at IncomingMessage.<anonymous> (/Users/user/Projects/node_modules/node-phantom-simple/node-phantom-simple.js:617:17)
    at emitNone (events.js:72:20)
    at IncomingMessage.emit (events.js:166:7)
    at endReadableNT (_stream_readable.js:913:12)
    at nextTickCallbackWith2Args (node.js:442:9)
    at process._tickCallback (node.js:356:17)
@winghouchan
Author

Duplicates #187 😵 which @minhchu was able to submit 40 seconds before this one.

@winghouchan
Author

It's interesting that the onLoadFinished event is triggered twice:

  horseman phantomjs onLoadFinished triggered +899ms success 1
  horseman phantomjs onLoadFinished triggered +103ms fail 2

One with success and the other with fail.
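
To compare runs, the onLoadFinished lines in the DEBUG output can be tallied. This is only a sketch: countLoadFinished is a hypothetical helper, not part of Horseman, and it assumes the log line format shown in this thread.

```javascript
// Tally onLoadFinished statuses found in horseman DEBUG output.
// Hypothetical helper; it assumes lines that contain
// "onLoadFinished triggered" followed by "success" or "fail".
function countLoadFinished(logText) {
  var counts = { success: 0, fail: 0 };
  logText.split('\n').forEach(function (line) {
    var match = /onLoadFinished triggered\b.*?\b(success|fail)\b/.exec(line);
    if (match) {
      counts[match[1]] += 1;
    }
  });
  return counts;
}
```

A run that reports more than one event after a single .open() would then stand out without reading the whole log.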

@winghouchan
Author

Test case above submitted by @minhchu does not reliably reproduce the issue. 😕

@winghouchan
Author

Another case where the onLoadFinished event was triggered more than once:

  horseman using PhantomJS from $PATH +0ms
  horseman .setup() creating phantom instance on 12406 +5ms
  horseman phantom created +492ms
  horseman phantom version 2.1.1 +19ms
  horseman page created +11ms
  horseman phantomjs onLoadFinished triggered +14ms success NaN
  horseman injected jQuery +28ms
  horseman .userAgent() set +19ms Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0
  horseman .open() +1ms https://www.google.com/search?site=&tbm=isch&source=hp&biw=1366&bih=640&q=moon
  horseman phantomjs onLoadFinished triggered +904ms success 1
  horseman phantomjs onLoadFinished triggered +65ms fail 2
  horseman phantomjs onLoadFinished triggered +3ms success 3
  horseman injected jQuery +0ms
  horseman .close(). +5ms
  horseman jQuery not injected - already exists on page +10ms
Unhandled rejection Error: Failed to load url
    at checkStatus (/Users/user/Projects/node_modules/node-horseman/lib/index.js:276:16)
    at Object.loadFinishedSetup [as onLoadFinished] (/Users/user/Projects/node_modules/node-horseman/lib/index.js:274:43)
    at /Users/user/Projects/node_modules/node-phantom-simple/node-phantom-simple.js:636:30
    at Array.forEach (native)
    at IncomingMessage.<anonymous> (/Users/user/Projects/node_modules/node-phantom-simple/node-phantom-simple.js:617:17)
    at emitNone (events.js:72:20)
    at IncomingMessage.emit (events.js:166:7)
    at endReadableNT (_stream_readable.js:913:12)
    at nextTickCallbackWith2Args (node.js:442:9)
    at process._tickCallback (node.js:356:17)
From previous event:
    at Object.page.loadedPromise (/Users/user/Projects/node_modules/node-horseman/lib/index.js:233:27)
    at Horseman.<anonymous> (/Users/user/Projects/node_modules/node-horseman/lib/actions.js:70:26)
From previous event:
    at Horseman.exports.open (/Users/user/Projects/node_modules/node-horseman/lib/actions.js:60:20)
    at Horseman.(anonymous function) [as open] (/Users/user/Projects/node_modules/node-horseman/lib/index.js:398:17)
    at Horseman.<anonymous> (/Users/user/Projects/node_modules/node-horseman/lib/index.js:406:22)
    at processImmediate [as _immediateCallback] (timers.js:383:17)
From previous event:
    at Promise.HorsemanPromise.(anonymous function) [as open] (/Users/user/Projects/node_modules/node-horseman/lib/index.js:404:15)
    at Object.<anonymous> (/Users/user/Projects/foo.js:6:4)
    at Module._compile (module.js:409:26)
    at Object.Module._extensions..js (module.js:416:10)
    at Module.load (module.js:343:32)
    at Function.Module._load (module.js:300:12)
    at Function.Module.runMain (module.js:441:10)
    at startup (node.js:139:18)
    at node.js:968:3

@minhchu

minhchu commented May 27, 2016

wow, haha 😁

@awlayton
Collaborator

This is an odd one. When I run the script it works fine and onLoadFinished is only triggered once after the open. What node version and operating system are you using?

@winghouchan
Author

My Node.js version is v4.4.4 and my OS is OS X El Capitan v10.11.4.

@minhchu

minhchu commented May 30, 2016

node -v 5.3.0, npm -v 3.3.12, Windows 7 64 bit.

@ianalexander

For what it's worth: I'm having this issue as well, but it is intermittent for me and I've been unable to reproduce it reliably. My code runs in a loop and I see this error pop up around 25% of the time. I've used this to help reproduce it:

function main() {
  // ...horseman do stuff...
  horseman.finally(() => {
    console.log(`Waiting ${process.env.DELAY} minute(s)...`);
    setTimeout(main, process.env.DELAY*60*1000);
  });
}

Hope this helps, and for the record:

$ node --version
v6.2.2
$ npm --version
3.9.5
$ npm list node-horseman
...
└── [email protected] 

@bmills22

bmills22 commented Jul 14, 2016

I also have this issue. For me it happens more times than it works. After a click() and waitForNextPage(), it throws the failed url error.

node: v5.6.0
horseman: [email protected]
OS: Ubuntu 14.04LTS

@abagh0703

abagh0703 commented Jul 19, 2016

I'm having this issue too: .value --> .click --> .waitForNextPage() --> "Failed to load url". Sometimes it happens after the click before it can even reach the wait. I can't identify any pattern in when it works and when it doesn't.

node: v4.2.3
npm: v3.10.3
horseman: v3.1.1
OS: Windows 10 64 bit

@sayanriju

Same issue here. Works intermittently, but mostly it doesn't. Makes horseman unusable for /me. 😢

@awlayton
Collaborator

Why can't you just catch the rejection @sayanriju? I have still been unable to reproduce this.

@grimaldello

grimaldello commented Aug 3, 2016

I have this problem too. It happens randomly.
I'm trying to log in several users to a page using these statements:

var horseman = new Horseman(Global.horsemanOptions);
usersList.forEach(function(user){        
    horseman
               // Open the page
               .open(WEB_PAGE)
               .value('#loginform > input[name=username]', user.username)
               .value('#loginform > input[name=password]', user.password)
               .click('#loginform > button[type=submit]')
               .waitForNextPage()
               .screenshot(user.username + '.png')
               .then(function(){
                   console.log(user.username + ' LOGGED IN')
               })
               .....
               .....

});

Error seems to be raised for the sequence:

.click('#loginform > button[type=submit]')
.waitForNextPage()

because when I comment out those statements, this error is not raised (although other errors are).

The strange thing is that the screenshots are taken correctly and show a successful login, but the statements that follow are not executed correctly.

I'm running it in:

Arch Linux
nodejs: v6.3.1
node-horseman: 3.1.1

Does anybody know a solution for that?

@johntitus
Owner

@OrtoNormale I don't think this is related to the bug you're seeing, but - you're trying to use horseman, which is asynchronous, in a synchronous loop. Each .action() takes time, but by the time the first page is open, your loop has probably completed.

@grimaldello

There was an error in my code snippet. A new horseman object is initialized at every iteration; the new Horseman() statement should have been inside the loop, so every user has their own instance. But isn't .open() waiting for the page to load before going on?

@johntitus
Owner

Yes, but by the time the first user's page is open, the loop has likely completed, and the user variable will be the last user in the list. Horseman doesn't block the loop. You could move the horseman stuff inside a function, and then just send the user to the function. That should fix your scope issue.

function doStuff(user){
  var horseman = new Horseman(Global.horsemanOptions);
  horseman
               // Open the page
               .open(WEB_PAGE)
               .value('#loginform > input[name=username]', user.username)
               .value('#loginform > input[name=password]', user.password)
               .click('#loginform > button[type=submit]')
               .waitForNextPage()
               .screenshot(user.username + '.png')
               .then(function(){
                   console.log(user.username + ' LOGGED IN')
               })
}
usersList.forEach(doStuff);
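
If the goal is to handle one user at a time rather than start every login at once, the calls can be chained so that each one begins only after the previous one settles. The following is a minimal sketch, not Horseman's own API: runSequentially and fakeLogin are hypothetical names, and fakeLogin only stands in for a function that returns the Horseman chain.

```javascript
// Run a promise-returning task for each item, one at a time.
// Each task starts only after the previous one has resolved.
function runSequentially(items, task) {
  return items.reduce(function (chain, item) {
    return chain.then(function (results) {
      return Promise.resolve(task(item)).then(function (result) {
        return results.concat([result]);
      });
    });
  }, Promise.resolve([]));
}

// Hypothetical stand-in for doStuff(user); it resolves after a short delay.
function fakeLogin(user) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve(user.username + ' LOGGED IN');
    }, 10);
  });
}
```

To use it with the function above, doStuff would need to return its chain (the version shown does not), after which runSequentially(usersList, doStuff) would process the users in order. A rejection from any task stops the sequence, so a .catch() is still needed at the end.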

@grimaldello

Thanks for the answer. As soon as possible I'll try and report.

@grimaldello

grimaldello commented Aug 4, 2016

Unfortunately the problem persists. Here is the code I tried:

var performLoginTask = function(usersList){

    var task = function(user){
        var horseman = new Horseman(Global.horsemanOptions);
        horseman
            .on('consoleMessage', function( msg ){
                console.log(msg);
            })
            // Open page
            .open(WEB_PAGE)
            .value('#loginform > input[name=username]', user.username)
            .value('#loginform > input[name=password]', user.password)
            .click('#loginform > button[type=submit]')
            .waitForNextPage()
            .....
            .....
            .....
    };

    usersList.forEach(task);


};

performLoginTask(usersList);

Anyway, it seems very similar to my version.

The error is the following:

Unhandled rejection Error: Failed to load url

It is thrown randomly (sometimes yes, sometimes no) and not for a specific user every time.

It's a really powerful tool and it would be a shame not to use it because of this problem.

@johntitus
Owner

at the bottom of your horseman chain, can you add a .catch()?


@grimaldello

grimaldello commented Aug 4, 2016

I added a catch, but the error is still thrown:

 var performLoginTask = function(usersList){

     var task = function(user){
         var horseman = new Horseman(Global.horsemanOptions);
         horseman
             .on('consoleMessage', function( msg ){
                 console.log(msg);
             })
             // Open page
             .open(WEB_PAGE)
             .value('#loginform > input[name=username]', user.username)
             .value('#loginform > input[name=password]', user.password)
             .click('#loginform > button[type=submit]')
             .waitForNextPage()
             .....
             .....
             .....
             .catch(function(error){console.log(error)})
     };

     usersList.forEach(task);


 };

 performLoginTask(usersList);

and the console.log() in the catch block prints the same text as the thrown error.

@grimaldello

Could the problem be related to a redirect (HTTP code 302) that is done after login?

@johntitus
Owner

If it's being caught in the catch, can you just ignore it? Does it get
thrown for every user, or just for one or two?


@grimaldello

The error is thrown only for one or two and randomly, not always the same users.

Ok, I'll try to handle that error in catch().

@johntitus
Owner

How many users are you trying to do at the same time? I wonder if it's
blowing up Phantom because of memory issues...


@grimaldello

At the moment max 10 users, in future maybe more.
However It happens even using only 1 user sometimes.
For now I'm handling it in catch() block by recalling the task.
Not the best solution, but it works.
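
The "recall the task from catch()" workaround can be sketched with a bounded number of attempts, so that a page that never loads does not retry forever. This is an illustration under assumptions, not the code used above: retry is a hypothetical helper, and it expects a function that returns a promise.

```javascript
// Retry a promise-returning task, allowing up to `attempts` tries in total.
// Hypothetical helper; `task` is called with no arguments on every try.
function retry(task, attempts) {
  return Promise.resolve().then(task).catch(function (err) {
    if (attempts <= 1) {
      throw err; // out of attempts: surface the last error
    }
    return retry(task, attempts - 1);
  });
}
```

With the task function from the earlier comments, usage might look like retry(function () { return task(user); }, 3), assuming task is changed to return its Horseman chain and creates a fresh Horseman instance on each call.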

@shockey

shockey commented Aug 5, 2016

Echoing what's been said in the earlier comments.

Horseman appears to hang after this happens. For me, it's reliably breaking my application.

Note that I receive three onLoadFinished events after my .open().

My output, from the .catch(err => console.log(err.stack || err)) I added, along with Horseman and Bluebird debug flags:

  horseman .keyboardEvent() +3ms keypress w null
  horseman .keyboardEvent() +4ms keypress X null
  horseman .keyboardEvent() +4ms keypress 8 null
  horseman .click() +0ms #sign-in-button
  horseman .click() done +22ms
  horseman .open() +1ms [redacted URL]
  horseman phantomjs onLoadFinished triggered +4ms fail 2
  horseman phantomjs onLoadFinished triggered +2ms fail 3
Error: Failed to GET url: [redacted URL]
    at checkStatus ([cwd]/node_modules/node-horseman/lib/actions.js:78:16)
    at [cwd]/node_modules/node-phantom-simple/node-phantom-simple.js:60:18
    at IncomingMessage.<anonymous> ([cwd]/node_modules/node-phantom-simple/node-phantom-simple.js:645:9)
    at emitNone (events.js:72:20)
    at IncomingMessage.emit (events.js:166:7)
    at endReadableNT (_stream_readable.js:903:12)
    at doNTCallback2 (node.js:439:9)
    at process._tickCallback (node.js:353:17)
From previous event:
    at Horseman.<anonymous> ([cwd]/node_modules/node-horseman/lib/actions.js:76:5)
From previous event:
    at Horseman.exports.open ([cwd]/node_modules/node-horseman/lib/actions.js:60:20)
    at Horseman.(anonymous function) [as open] ([cwd]/node_modules/node-horseman/lib/index.js:402:17)
    at Horseman.<anonymous> ([cwd]/node_modules/node-horseman/lib/index.js:410:22)
From previous event:
    at Promise.HorsemanPromise.(anonymous function) [as open] ([cwd]/node_modules/node-horseman/lib/index.js:408:15)
    at mainRoutine (app.js:30:6)
From previous event:
    at app.js:24:6
    at processImmediate [as _immediateCallback] (timers.js:368:17)
  horseman phantomjs onLoadFinished triggered +8s success 4

@shockey

shockey commented Aug 5, 2016

Worth noting: I, like @OrtoNormale, am .open()ing a page that may be responding with a 302 Found.

This is just a guess... but Phantom following the Location header could result in a state inconsistency between Horseman and the Phantom instance, if Horseman doesn't take note of the page change.

@grimaldello

grimaldello commented Aug 5, 2016

As @shockey says, in my case too horseman jumps directly into catch(){...} after that error is thrown, without executing the rest of the code.

@johntitus
Owner

Can everyone running into this issue try setting their horseman timeout to something big, like 30 seconds, and retrying?

@MegaThorx

The Error: Failed to load url gets thrown when I use the .close() method. I don't know why, and I still can't really catch the error.

@design2dev

We are getting this too. What's the ETA on the fix?

@awlayton
Collaborator

There is no ETA @design2dev, we still have no way to reproduce and be sure there even is a bug. Sometimes PhantomJS can legitimately fail to load a URL for some reason.

@overflowz

I get it too.

  horseman .open() +2ms [url edited]
  horseman phantomjs onLoadFinished triggered +236ms fail 3
  horseman phantomjs onLoadFinished triggered +2ms success 4
Error: Failed to GET url: [url edited]

It doesn't happen in a loop for me, though; it is thrown randomly when I re-run the application. Also, why is onLoadFinished being triggered twice?

@jakehm

jakehm commented Oct 27, 2016

Failed to GET url error is happening to me when I do
horseman.open('www.google.com')
and not when I do
horseman.open('http://www.google.com')

@awlayton
Collaborator

You have to include the protocol @jharrowmortelliti. If you don't give one, I think PhantomJS tries file:, but I am not positive.
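
One way to guard against this is to normalize URLs before they reach .open(). The helper below is a sketch only: withProtocol is a hypothetical name, and defaulting to http:// is an assumption that may not suit every site.

```javascript
// Prepend a scheme when the URL does not already start with "scheme://".
// Hypothetical helper. Defaulting to http is an assumption; pass 'https'
// as the second argument if the site requires it.
// Limitation: URLs such as "about:blank" (scheme without "//") are not
// recognized and would be prefixed as well.
function withProtocol(url, defaultScheme) {
  var scheme = defaultScheme || 'http';
  // RFC 3986 scheme: a letter followed by letters, digits, '+', '-' or '.'
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(url)) {
    return url;
  }
  return scheme + '://' + url;
}
```

It could then be used as horseman.open(withProtocol('www.google.com')).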

@jakehm

jakehm commented Oct 28, 2016

I try to run the example at node-horseman/examples/links.js and it gives me the 'Failed to get url' error.

@awlayton
Collaborator

Strange @jharrowmortelliti. When I run that example I get an error because the click selector no longer works, but once I change the selector to "input[value='Google Search']" it works fine.

@himanshu-jain16

Any workaround for this issue?
I am facing this issue because a page is getting redirected.

@berstend

I encountered the same issue, in my case adding a .wait(3000) before .click helped.

@balazs4

balazs4 commented Jan 1, 2017

Adding .wait(...) into the chain has also worked for me as workaround.

@GomuGomuMan

I also get this issue. .wait(randomInt) seems to be a workaround solution.

@webdev

webdev commented Apr 12, 2017

What if there is no clicking in the function?

All I'm doing is opening a page, and scraping content, but pretty regularly getting Unhandled rejection Error: Failed to load url

@awlayton
Collaborator

The error means what it says @webdev, PhantomJS failed to load a URL for some reason. This can be caused by a variety of things.

@Skyerin

Skyerin commented Apr 28, 2017

I've been experiencing this issue recently as well, whilst making an acceptance testing framework for my workplace. I did some digging into the issue using phantomjs directly, and it looks like phantomjs, during redirects from the page itself, cancels requests, and this is what causes the fail, even if the page loads successfully.

For context, the application refreshes itself while loading. This is a legacy issue that won't change for a while. It doesn't cause any issues elsewhere, just with phantomjs.

test case;

horseman
.viewport(1920, 1080)
.open(url)
.wait(5000)
.catch((error) => console.log(error))
.screenshot('./test.jpg')
.close();

DEBUG=horseman output (url obscured);

[jenkins@buildBox testing_framework]$ DEBUG=horseman node horseman_test.js
  horseman using PhantomJS from phantomjs-prebuilt module +0ms
  horseman .setup() creating phantom instance 1 +4ms
  horseman phantom created +127ms
  horseman phantom version 2.1.1 +16ms
  horseman page created +7ms
  horseman phantomjs onLoadFinished triggered success NaN +7ms
  horseman injected jQuery +16ms
  horseman .on() urlChanged set. +2ms
  horseman .viewport() set 1920 1080 +1ms
  horseman .open() http://url-of-application.com?action=new&username=testuser +4ms
  horseman phantomjs onLoadFinished triggered fail 1 +522ms
Error: Failed to GET url: http://url-of-application.com?action=new&username=testuser
    at checkStatus (/var/lib/jenkins/workspace/testing_framework/node_modules/node-horseman/lib/actions.js:78:16)
    at PassThroughHandlerContext.finallyHandler (/var/lib/jenkins/workspace/testing_framework/node_modules/bluebird/js/release/finally.js:57:23)
    at PassThroughHandlerContext.tryCatcher (/var/lib/jenkins/workspace/testing_framework/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (/var/lib/jenkins/workspace/testing_framework/node_modules/bluebird/js/release/promise.js:512:31)
    at Promise._settlePromise (/var/lib/jenkins/workspace/testing_framework/node_modules/bluebird/js/release/promise.js:569:18)
    at Promise._settlePromise0 (/var/lib/jenkins/workspace/testing_framework/node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (/var/lib/jenkins/workspace/testing_framework/node_modules/bluebird/js/release/promise.js:693:18)
    at Promise._fulfill (/var/lib/jenkins/workspace/testing_framework/node_modules/bluebird/js/release/promise.js:638:18)
    at /var/lib/jenkins/workspace/testing_framework/node_modules/bluebird/js/release/nodeback.js:42:21
    at /var/lib/jenkins/workspace/testing_framework/node_modules/node-phantom-simple/node-phantom-simple.js:60:18
    at IncomingMessage.<anonymous> (/var/lib/jenkins/workspace/testing_framework/node_modules/node-phantom-simple/node-phantom-simple.js:645:9)
    at emitNone (events.js:91:20)
    at IncomingMessage.emit (events.js:185:7)
    at endReadableNT (_stream_readable.js:974:12)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9)
  horseman .close(). +5ms

Adding output for the network requests received, we can see the following;

{ contentType: 'image/png',
  headers:
   [ { name: 'Date', value: 'Fri, 28 Apr 2017 10:00:09 GMT' },
     { name: 'Server', value: 'Apache/2.4.7 (Ubuntu)' },
     { name: 'Last-Modified',
       value: 'Mon, 10 Apr 2017 09:13:13 GMT' },
     { name: 'Accept-Ranges', value: 'bytes' },
     { name: 'Content-Length', value: '6531' },
     { name: 'Keep-Alive', value: 'timeout=5, max=100' },
     { name: 'Connection', value: 'Keep-Alive' },
     { name: 'Content-Type', value: 'image/png' } ],
  id: 11,
  redirectURL: null,
  stage: 'end',
  status: 200,
  statusText: 'OK',
  time: '2017-04-28T10:00:10.237Z',
  url: 'http://static.resources-url.com/application/images/app-ui/splash/splash_screen.png' }
{ contentType: null,
  headers: [],
  id: 15,
  redirectURL: null,
  stage: 'end',
  status: null,
  statusText: null,
  time: '2017-04-28T10:00:10.239Z',
  url: 'http://fonts.gstatic.com/s/ubuntu/v9/0ihfXUL2emPh0ROJezvraKCWcynf_cDxXwCLxiixG1c.ttf' }

From phantom and horseman, we can access the ttf file above; it is just the network requests being cancelled because of the redirect.

Looking at the phantomjs issues, I found ariya/phantomjs#12750, which seems to be the same as the above. It says the issue is fixed in the 2.5 beta (which I unfortunately haven't been able to test with), so the problem doesn't appear to be in horseman.

@dhilditch

The following fixed my issue:

QT_QPA_PLATFORM=offscreen

Set that environment variable and the PhantomJS starts working again as does Horseman. As found here: ariya/phantomjs#14376

@robertpallas
Contributor

I have an issue similar to @Skyerin's: one of the pages that I am scraping does HTTP 301 redirects after I fill out and submit a form, causing the "Failed to load url" error.

With @dhilditch's env QT_QPA_PLATFORM=offscreen I get "HeadlessError: Phantom immediately exited with: null" on any Phantom action. What version of Phantom are you using with that?

For me it reproduces consistently on both macOS and Dockerized Ubuntu, with Phantom 2.1.1.

Out of ideas 🤔

@Skyerin

Skyerin commented May 4, 2017

@dhilditch that unfortunately didn't work for me. I had the same issue as @robertpallas did :(

@robertpallas - on your environments, could you test it with the phantom 2.5 beta? You can provide the path to node-horseman; just see if that solves the issue for you (as stated in ariya/phantomjs#12750). Worth a shot at the very least... I didn't manage to get the beta working on either osx or centos, but you might have better luck than me.

@robertpallas
Contributor

Issue

My issue is most probably related to one of the redirects being a 0B HTTP 307 Internal Redirect from http to https. The "Operation canceled" resourceError mentioned earlier in this thread also appears. I could not recreate the problem with HTTP-only traffic and the same HTTP redirects: there Horseman works as expected.

Phantom 2.5

I was able to run Phantom 2.5 within Docker on Ubuntu 16.04, and also set the ENV variable QT_QPA_PLATFORM=offscreen. With or without it, the app fails on an earlier HTTP 302 redirect on the same site with "Failed to load url", so it actually took me back a few steps.

Next

Maybe I can somehow force the redirect to go directly to https and skip that Internal Redirect step? Any other workaround ideas are welcome.

@Skyerin

Skyerin commented May 5, 2017

@robertpallas I can't think of anything else you could try apart from modifying how your redirects are happening as they're happening from the page itself; unless we could add a delay somewhere somehow?

One of the ideas I was going to look at (because I deal with a SPA) was to modify the reload code so that, when the request comes from phantom (based on a flag or a specific user agent), it would either add a delay to the reload (allowing network requests to finish first) or skip the reload altogether, and I would trigger it manually in my tests.

The only other thing I could think of was just changing away from phantom and experimenting with chrome headless - although that is in a very barebones state and I do absolutely love how horseman makes things simple and lets itself be easily extensible.

@sashberd

sashberd commented Aug 6, 2017

I also have this issue.
Here is the data from my log:

{errorCode: 5,
errorString: 'Operation canceled',
id: 30,
url: 'https://fecdn.user1st.info/CommFrame/Activation?ver=0.1.5.3#[my url]' }

my code is:

var location;
horseman.on('resourceError', function(err) {
    console.log(err);
});
horseman.viewport(20000, 100000)
    .open(horseManURL)
    .scrollTo(100000, 2000)
    .url()
    .then(function(currentURL) { location = currentURL; })
    .evaluate(function(selector) {
        return {
            html: $(selector).html()
        };
    }, 'html')
    .then(function(data) {
        data.location = location;
        horseman.close();
        waterfallCallback(null, data);
    })
    .catch(function(data) {
        horseman.close();
        return waterfallCallback(true);
    });

Any ideas for workarounds?
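
The object logged above has errorCode 5 and errorString 'Operation canceled'. When many requests are cancelled by a redirect, it can help to separate those entries from genuine failures in the resourceError output. The sketch below is only an illustration: isCanceled and partitionErrors are hypothetical helpers, and treating errorCode 5 as "cancelled" is an assumption based on the single sample in this comment.

```javascript
// True when a resourceError looks like a cancelled request.
// Hypothetical helper; field names follow the object logged above, and
// the errorCode 5 check is an assumption drawn from that one sample.
function isCanceled(resourceError) {
  return resourceError.errorCode === 5 ||
    /operation canceled/i.test(resourceError.errorString || '');
}

// Split collected resource errors into cancelled requests and the rest.
function partitionErrors(errors) {
  return errors.reduce(function (groups, err) {
    (isCanceled(err) ? groups.canceled : groups.other).push(err);
    return groups;
  }, { canceled: [], other: [] });
}
```

Inside the resourceError handler, a check such as if (!isCanceled(err)) console.log(err); would keep the log focused on other failures. This only filters the diagnostics; it does not stop the "Failed to load url" rejection itself.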

@geek-caroline

geek-caroline commented Sep 20, 2017

I found that these steps fixed this issue for me:

  • deleting node_modules
  • setting my node version to 8
  • npm install phantomjs
  • npm install node-horseman

I had previously had this issue repeatedly for any url that was complex (e.g. http://via.placeholder.com/350x150 worked fine, but the example with www.google.com didn't)

I'm not sure if it was something to do with the order in which I installed things? the fact that I'd installed node-horseman before phantom previously? Or that I'd previously installed packages that I was no longer using (e.g. horseman)

Anyway, that's how I fixed my world.

UPDATE: that's how I partially fixed my world - I am still seeing this issue intermittently :(

The thing about the complexity of the page / speed of the page loading / assets in the page etc seems to hold true though - a simple page doesn't throw this error

@felipemullen

@geek-caroline
This happens to me if I do not specify the protocol, i.e. www.google.com will not work, but http://www.google.com works just fine

@kahirul

kahirul commented Oct 22, 2017

Hi, I also experienced this. And I fixed this by changing my DNS to 8.8.8.8.

It seems very similar to the Docker for Mac issue docker/for-mac#1317.

@geek-caroline

@felipemullen you may well be right, I've not seen the problem again and I've used the fully qualified https:// address. Perhaps that is the solution for most?

@chiefsmurph

chiefsmurph commented Nov 10, 2017

DNS set to 8.8.8.8, full https:// url, ignoreSSL: true, wait calls left and right ... getting this error fairly frequently.
