
Form POST request returns error page using BasicCrawler, but works when using node-fetch #2586

Open
Hamza5 opened this issue Jul 22, 2024 · 0 comments
Labels
bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team.

Hamza5 commented Jul 22, 2024

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/basic (BasicCrawler)

Issue description

I am trying to scrape the page returned by a simple form submission sent as a POST request. The form has no fields, so the request body is empty.
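For reference, an empty form is still a valid `application/x-www-form-urlencoded` payload; it simply encodes to the empty string. This small sketch uses only Node's built-in `URLSearchParams` and `Buffer` to show that an empty body and `Content-Length: 0` go together, as in both requests in this report.

```javascript
// A form with no fields, serialized as application/x-www-form-urlencoded,
// becomes an empty string, so the request body is zero bytes long.
const formBody = new URLSearchParams().toString();

console.log(JSON.stringify(formBody));            // ""
console.log(Buffer.byteLength(formBody, "utf8")); // 0
```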

I submitted this form with Postman and it worked. I then sent the same request with the node-fetch library, and it also worked.

However, when I sent the same request with BasicCrawler, the website returned an error page (the status is HTTP 200, but the content says an error occurred). Both versions of the code are attached: one using fetch and one using BasicCrawler.

Comparing the lengths of the two responses shows the difference: the error page is 13,340 characters long, while the correct page is 849,470 characters long.
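Since the site answers with HTTP 200 in both cases, the status code alone can't flag the failure. A crude check, sketched here under the assumption that the length gap reported above is stable, is to compare the body length against a threshold between the two observed sizes. `looksLikeErrorPage` and `ERROR_PAGE_MAX_LENGTH` are hypothetical names, and the 100,000-character threshold is an arbitrary pick, not anything the site documents.

```javascript
// Hypothetical heuristic: the site returns HTTP 200 for both the error page
// (about 13,340 chars) and the real result (about 849,470 chars), so classify
// by body length instead. The threshold is an assumption, chosen to sit
// between the two observed sizes.
const ERROR_PAGE_MAX_LENGTH = 100_000;

function looksLikeErrorPage(body, maxLength = ERROR_PAGE_MAX_LENGTH) {
  return body.length <= maxLength;
}

console.log(looksLikeErrorPage("x".repeat(13_340)));  // true
console.log(looksLikeErrorPage("x".repeat(849_470))); // false
```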

Code sample

// BasicCrawler

import {BasicCrawler} from "crawlee";

const crawler = new BasicCrawler({
    async requestHandler({sendRequest, log}) {
        const {body} = await sendRequest({
            'method': 'POST',
            'url': 'https://www.idealo.de/hp/prg/bargains',
            'headers': {
                'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0',
                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8',
                'Accept-Language': 'en-US,en;q=0.5',
                'Accept-Encoding': 'gzip, deflate, br, zstd',
                'Content-Type': 'application/x-www-form-urlencoded',
                'Content-Length': '0',
                'Origin': 'https://www.idealo.de',
                'Connection': 'keep-alive',
                'Referer': 'https://www.idealo.de/',
                'Upgrade-Insecure-Requests': '1',
                'Sec-Fetch-Dest': 'document',
                'Sec-Fetch-Mode': 'navigate',
                'Sec-Fetch-Site': 'same-origin',
                'Sec-Fetch-User': '?1',
                'Priority': 'u=0, i',
                'TE': 'trailers'
            },
            'body': ''
        });
        log.info(`Response length: ${body.length}`);
    }
});

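// The start URL only triggers requestHandler once; the POST itself is sent by sendRequest() above.
await crawler.run(['https://www.idealo.de/hp/prg/bargains']);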
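One difference worth ruling out: node-fetch speaks HTTP/1.1 and sends essentially the headers it is given, whereas `sendRequest()` goes through got-scraping, which by default negotiates HTTP/2 when the server supports it and generates browser-like headers of its own. The sketch below builds the same request with those two behaviours switched off via got-scraping's `http2` and `useHeaderGenerator` options. It is an untested debugging idea, not a confirmed fix: `buildBargainsRequest` is a hypothetical helper, and whether crawlee 3.11.0 forwards these options to got-scraping unchanged is an assumption.

```javascript
// Hypothetical helper: the same POST as in the BasicCrawler example, with two
// got-scraping behaviours disabled so the request resembles node-fetch's.
// Assumption: sendRequest() passes these options through to got-scraping.
function buildBargainsRequest(headers) {
  return {
    method: "POST",
    url: "https://www.idealo.de/hp/prg/bargains",
    headers,
    body: "",                  // empty form body, as in the report
    http2: false,              // assumption: forces HTTP/1.1, like node-fetch
    useHeaderGenerator: false, // assumption: keeps only the headers passed in
  };
}

// Usage inside the requestHandler from the report (not executed here):
//   const { body } = await sendRequest(buildBargainsRequest(headers));

const options = buildBargainsRequest({ "Content-Type": "application/x-www-form-urlencoded" });
console.log(options.http2, options.useHeaderGenerator); // false false
```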

// fetch (code exported from Postman)

// Use node-fetch's own Headers class rather than Node's global one.
import fetch, {Headers} from "node-fetch";

const myHeaders = new Headers();
myHeaders.append("User-Agent", "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0");
myHeaders.append("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8");
myHeaders.append("Accept-Language", "en-US,en;q=0.5");
myHeaders.append("Accept-Encoding", "gzip, deflate, br, zstd");
myHeaders.append("Content-Type", "application/x-www-form-urlencoded");
myHeaders.append("Content-Length", "0");
myHeaders.append("Origin", "https://www.idealo.de");
myHeaders.append("Connection", "keep-alive");
myHeaders.append("Referer", "https://www.idealo.de/");
myHeaders.append("Upgrade-Insecure-Requests", "1");
myHeaders.append("Sec-Fetch-Dest", "document");
myHeaders.append("Sec-Fetch-Mode", "navigate");
myHeaders.append("Sec-Fetch-Site", "same-origin");
myHeaders.append("Sec-Fetch-User", "?1");
myHeaders.append("Priority", "u=0, i");
myHeaders.append("TE", "trailers");

const requestOptions = {
    method: "POST",
    headers: myHeaders,
    redirect: "follow"
};

fetch("https://www.idealo.de/hp/prg/bargains", requestOptions)
    .then((response) => response.text())
    .then((result) => console.log(result.length))
    .catch((error) => console.error(error));
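To check whether the two clients are really configured with the same headers, a small diff helps. This is a hypothetical helper (`diffHeaders` and `normalise` are illustrative names): it accepts either an iterable `Headers` object, as in the Postman export, or a plain object, as in the `sendRequest` call, and lowercases names first because HTTP header names are case-insensitive and `Headers` already lowercases them when iterated. It only compares what each script passes in, not what actually goes over the wire.

```javascript
// Hypothetical helper: turn a Headers instance (anything iterable over
// [name, value] pairs) or a plain object into a Map with lowercased names.
function normalise(headers) {
  const entries = typeof headers[Symbol.iterator] === "function"
    ? [...headers]
    : Object.entries(headers);
  return new Map(entries.map(([name, value]) => [name.toLowerCase(), String(value)]));
}

// Report every header name whose value differs or that exists on one side only.
function diffHeaders(a, b) {
  const left = normalise(a);
  const right = normalise(b);
  const names = new Set([...left.keys(), ...right.keys()]);
  const diffs = [];
  for (const name of [...names].sort()) {
    if (left.get(name) !== right.get(name)) {
      diffs.push({ name, a: left.get(name), b: right.get(name) });
    }
  }
  return diffs;
}

// Example: "accept" exists only on the right, "te" only on the left.
const fetchHeaders = new Headers({ "User-Agent": "UA", "TE": "trailers" });
const crawlerHeaders = { "user-agent": "UA", "Accept": "text/html" };
console.log(diffHeaders(fetchHeaders, crawlerHeaders));
```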

Package version

3.11.0

Node.js version

v20.15.1

Operating system

Ubuntu 22.04

Apify platform

  • Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response

Hamza5 added the bug label Jul 22, 2024
mtrunkat added the t-tooling label Jul 24, 2024