Cookies in CustomHeaders not correctly used & building altered headers (Hybrid) #930

Open
alban-stourbe-wmx opened this issue Jun 17, 2024 · 3 comments · May be fixed by #936
Labels
Type: Bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments


alban-stourbe-wmx commented Jun 17, 2024

katana version:

v1.1.0

Current Behavior:

If cookies are passed through customHeaders, they are not loaded into the browser itself. They are sent with the first request, but because the browser never stores them, the information can be lost for the rest of the crawl. This is very restrictive when performing an authenticated crawl in which the authentication vector is a cookie.

What's more, the request written to the output is reconstructed solely from the customHeaders, not from the headers of the request the browser actually sent. As a result, the request in the output doesn't match the request sent by the browser.

Detailed explanation of bug source

If we want to add custom cookies for an authenticated crawl, we need to use the -H option (CustomHeaders). This data is added to the Headers field of Shared.
[Screenshot 2024-06-17 at 09:35:49]

Custom headers are then used when crawling a web page. They are added to the headers of the page in question by the Shared addHeadersToPage function. This function calls page.SetExtraHeaders, which can lead to a bug.

During crawling, the response for a page may contain a Set-Cookie header, which initializes a cookie in the browser. From then on, even if custom cookies are specified in the option, they are not added to the page headers, because page.SetExtraHeaders only adds a value if it doesn't already exist. In a crawl authenticated via a cookie, that value may therefore be lost during the crawl.

For example, when the first page is crawled a foo=bar cookie is present, but when the next page is crawled this information has disappeared because the browser's cookies have been initialized. SetExtraHeaders does not add the Cookie header because a value is already set.
[Screenshot 2024-06-17 at 09:57:20]
[Screenshot 2024-06-17 at 09:58:09]

In addition, the headers used to recreate the request in the output do not match the real request sent by the browser. During crawling, the browser can add headers and cookies dynamically, but the reconstruction is based solely on the custom headers given as input.

Genuine request:
[Screenshot 2024-06-17 at 10:10:47]
Output request:
[Screenshot 2024-06-17 at 10:10:06]

Expected Behavior:

Create an option to load cookies when the browser is initialized. In hybrid mode, cookies can't simply be added to headers - they have to be inserted into the browser to emulate real browser behavior. The use of cookies and headers must be dissociated in this context.
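
To illustrate the dissociation of cookies and headers, here is a minimal sketch using only the Go standard library. It assumes the custom headers are available as a plain map; the function name splitCookies is hypothetical and is not part of katana. It only shows how a Cookie entry could be parsed into individual cookies that a browser could then be given, while the remaining headers stay as extra headers.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// splitCookies (hypothetical helper) separates a Cookie entry from the other
// custom headers. It returns the parsed cookies and the remaining headers.
func splitCookies(custom map[string]string) ([]*http.Cookie, map[string]string) {
	rest := make(map[string]string, len(custom))
	var cookies []*http.Cookie
	for name, value := range custom {
		if strings.EqualFold(name, "Cookie") {
			// Reuse net/http's Cookie header parser via a throwaway request.
			req := http.Request{Header: http.Header{"Cookie": []string{value}}}
			cookies = append(cookies, req.Cookies()...)
			continue
		}
		rest[name] = value
	}
	return cookies, rest
}

func main() {
	cookies, rest := splitCookies(map[string]string{
		"Cookie":    "foo=bar; sessionid=abc123",
		"X-Api-Key": "example",
	})
	for _, c := range cookies {
		fmt.Printf("cookie to load into the browser: %s=%s\n", c.Name, c.Value)
	}
	fmt.Println("headers left for the page:", rest)
}
```

How the parsed cookies are then handed to the browser depends on the browser library in use and is left out here on purpose.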

To rebuild request headers, simply use the headers linked to the hijacked request (e proto....). The latter contains all information, including customHeaders.
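
A rough sketch of that idea, again with the standard library only. It assumes the headers of the hijacked request have already been flattened into a map[string]string; the names rebuildRequest and dumpForOutput are hypothetical, and how the headers are read from the hijacked request is not shown. The point is that the output is built from what the browser sent, not from the customHeaders input.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
)

// rebuildRequest (hypothetical helper) builds a request from the headers the
// browser actually sent, instead of from the custom headers given as input.
func rebuildRequest(method, rawURL string, sent map[string]string) (*http.Request, error) {
	req, err := http.NewRequest(method, rawURL, nil)
	if err != nil {
		return nil, err
	}
	for name, value := range sent {
		req.Header.Set(name, value)
	}
	return req, nil
}

// dumpForOutput renders the request line and headers as raw HTTP text.
func dumpForOutput(req *http.Request) (string, error) {
	raw, err := httputil.DumpRequest(req, false)
	return string(raw), err
}

func main() {
	// Example values only: headers as the browser might have sent them,
	// including a cookie it set itself during the crawl.
	sent := map[string]string{
		"Cookie":     "foo=bar; sessionid=abc123",
		"User-Agent": "Mozilla/5.0",
		"X-Custom":   "1",
	}
	req, err := rebuildRequest(http.MethodGet, "https://example.com/account", sent)
	if err != nil {
		panic(err)
	}
	raw, err := dumpForOutput(req)
	if err != nil {
		panic(err)
	}
	fmt.Print(raw)
}
```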

Steps To Reproduce:

Steps to reproduce the behavior:

  1. Launch katana with a custom cookie in the Custom Headers option
  2. Notice that the cookie value disappears during crawling.
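
For anyone who wants a target to reproduce this against, here is a small sketch of a test handler in Go. Everything in it is an assumption for illustration (the cookieLog type, the tracking cookie, the /next link); it is not taken from katana. The handler answers every request with its own Set-Cookie and records the Cookie header it received, so the recorded values show whether the custom cookie is still being sent after the first page.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// cookieLog is a hypothetical test target: it records the Cookie header of
// every request and always answers with its own Set-Cookie.
type cookieLog struct {
	mu   sync.Mutex
	seen []string
}

func (l *cookieLog) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	l.mu.Lock()
	l.seen = append(l.seen, r.Header.Get("Cookie"))
	l.mu.Unlock()
	http.SetCookie(w, &http.Cookie{Name: "tracking", Value: "1", Path: "/"})
	fmt.Fprint(w, `<a href="/next">next</a>`)
}

// Seen returns a copy of the recorded Cookie headers, in arrival order.
func (l *cookieLog) Seen() []string {
	l.mu.Lock()
	defer l.mu.Unlock()
	return append([]string(nil), l.seen...)
}

// send issues one in-memory request with the given Cookie header and returns
// the Set-Cookie value of the response.
func send(l *cookieLog, path, cookie string) string {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if cookie != "" {
		req.Header.Set("Cookie", cookie)
	}
	rec := httptest.NewRecorder()
	l.ServeHTTP(rec, req)
	return rec.Header().Get("Set-Cookie")
}

func main() {
	l := &cookieLog{}
	// In-memory check that the handler behaves as described.
	fmt.Println("server sets:", send(l, "/", "foo=bar"))
	fmt.Println("cookies seen:", l.Seen())
	// To crawl it, serve the same handler on a real port instead, e.g.
	// http.ListenAndServe("127.0.0.1:8080", l)
}
```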
@alban-stourbe-wmx
Author

I've made the changes to fix this issue. I plan to do the PR later today. ;)

@GeorginaReeder

Great, thank you for this @alban-stourbe-wmx - we'll look out for the PR! :)

@alban-stourbe-wmx
Author

Hey everyone, I'm coming back to you because I may have gotten carried away with loading cookies in the browser. 🥹

I wanted to create this cookie loading option to have an authenticated browser. However, I found that this is already possible by creating a debug browser and passing it to katana (cwu option). So adding such an option doesn't really seem to fit the project.

Nevertheless, a bug persists in the hybrid request headers. When using a custom/authenticated browser, none of the browser's own headers are written to the output, only the customHeaders given as input.

ehsandeep linked a pull request on Jun 27, 2024 that will close this issue