When cookies are passed through customHeaders, they are not stored in the browser itself. They are sent with the first request, but because the browser never holds them, they can be lost for the rest of the crawl. This is very restrictive for an authenticated crawl in which the authentication vector is a cookie.
In addition, the request written to the output is rebuilt solely from customHeaders, not from the headers of the request the browser actually sent. As a result, the output request does not match the request sent by the browser.
Detailed explanation of bug source
To add custom cookies for an authenticated crawl, we need to use the -H option (CustomHeaders). This data is stored in the Headers field of Shared.
Custom headers are then used when crawling a web page. They are added to the headers of the page in question by the Shared addHeadersToPage function. This function calls page.SetExtraHeaders, which can lead to a bug.
During crawling, a response may contain a Set-Cookie header, which initializes a cookie in the browser. From then on, even if custom cookies are specified in the option, they are not added to the page headers, because page.SetExtraHeaders only adds a value if it doesn't already exist. In a crawl authenticated through a cookie, that value may therefore be lost during the crawl.
For example, during the first crawl a foo=bar cookie is present, but during the next crawl it has disappeared because cookies have been initialized in the browser. SetExtraHeaders does not add Cookie because the value is already set.
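The sequence above can be reproduced with a small standalone sketch. It only models the behavior as described in this issue (a custom header is skipped when the browser already sends a header with the same name). The function names applyExtraHeaders and simulateCrawl and the tracking=1 cookie are hypothetical, and this is not the actual katana or rod code.

```go
package main

import (
	"fmt"
	"net/http"
)

// applyExtraHeaders is a hypothetical stand-in for the behavior described in
// the issue: a custom header is added only when the outgoing request does not
// already carry a header with that name.
func applyExtraHeaders(outgoing, custom http.Header) {
	for name, values := range custom {
		if outgoing.Get(name) != "" {
			continue // already set by the browser, so the custom value is skipped
		}
		for _, v := range values {
			outgoing.Add(name, v)
		}
	}
}

// simulateCrawl returns the Cookie header carried by two consecutive requests.
func simulateCrawl() (first, second string) {
	custom := http.Header{}
	custom.Set("Cookie", "foo=bar")

	// First request: the browser holds no cookies yet.
	req1 := http.Header{}
	applyExtraHeaders(req1, custom)

	// Assume the first response contained "Set-Cookie: tracking=1", so the
	// browser now attaches its own Cookie header to the next request.
	req2 := http.Header{}
	req2.Set("Cookie", "tracking=1")
	applyExtraHeaders(req2, custom)

	return req1.Get("Cookie"), req2.Get("Cookie")
}

func main() {
	first, second := simulateCrawl()
	fmt.Println("first request: ", first)
	fmt.Println("second request:", second)
}
```

In this model the second request carries only the cookie set by the server, which matches the loss of foo=bar reported above.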
In addition, the headers used to recreate the output request do not match the real request sent by the browser. During crawling, the browser can dynamically add headers and cookies, but the reconstruction is based solely on the custom headers entered as input.
Genuine request:
Output request:
Expected Behavior:
Create an option to load cookies when the browser is initialized. In hybrid mode, cookies can't simply be added to headers - they have to be inserted into the browser to emulate real browser behavior. The use of cookies and headers must be dissociated in this context.
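As a rough illustration of why storing cookies separately from headers helps, the sketch below seeds a cookie store with the custom cookies before any request is made. It uses Go's standard cookiejar as a stand-in for the browser's cookie store. The helper names and the tracking=1 cookie are hypothetical, and katana's browser integration would look different.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/url"
	"sort"
	"strings"
)

// parseCookieHeader splits a "name=value; name2=value2" string into cookies.
func parseCookieHeader(raw string) []*http.Cookie {
	var cookies []*http.Cookie
	for _, part := range strings.Split(raw, ";") {
		name, value, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok || name == "" {
			continue // ignore malformed fragments
		}
		cookies = append(cookies, &http.Cookie{Name: name, Value: value, Path: "/"})
	}
	return cookies
}

// cookieHeaderAfterSetCookie seeds a jar with the custom cookies, simulates a
// server-side Set-Cookie, and returns the cookies sent on the next request.
func cookieHeaderAfterSetCookie(target, custom string) (string, error) {
	u, err := url.Parse(target)
	if err != nil {
		return "", err
	}
	jar, err := cookiejar.New(nil)
	if err != nil {
		return "", err
	}
	// Load the custom cookies into the store at initialization.
	jar.SetCookies(u, parseCookieHeader(custom))
	// Later, a response sets an additional cookie (assumed for the example).
	jar.SetCookies(u, []*http.Cookie{{Name: "tracking", Value: "1", Path: "/"}})

	var pairs []string
	for _, c := range jar.Cookies(u) {
		pairs = append(pairs, c.Name+"="+c.Value)
	}
	sort.Strings(pairs) // stable order for display
	return strings.Join(pairs, "; "), nil
}

func main() {
	header, err := cookieHeaderAfterSetCookie("https://example.com/", "foo=bar")
	if err != nil {
		panic(err)
	}
	fmt.Println("Cookie:", header)
}
```

Because the custom cookie lives in the store rather than in an extra header, a later Set-Cookie adds to it instead of replacing it.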
To rebuild request headers, simply use the headers linked to the hijacked request (e proto....). The latter contains all information, including customHeaders.
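A minimal sketch of that idea, using only the Go standard library: the output request is rendered from the headers that were actually sent rather than from the custom headers alone. The sent map stands in for the headers of the hijacked request, and dumpRequest and hasHeaderLine are hypothetical helpers, not katana functions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"strings"
)

// dumpRequest renders a raw HTTP request from the headers that were actually
// sent. The sent map is a stand-in for the hijacked request's headers.
func dumpRequest(method, rawURL string, sent map[string]string) (string, error) {
	req, err := http.NewRequest(method, rawURL, nil)
	if err != nil {
		return "", err
	}
	for name, value := range sent {
		req.Header.Set(name, value)
	}
	raw, err := httputil.DumpRequest(req, false) // false: headers only, no body
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

// hasHeaderLine reports whether the raw request contains the exact header line.
func hasHeaderLine(raw, line string) bool {
	return strings.Contains(raw, "\r\n"+line+"\r\n")
}

func main() {
	// Headers as the browser sent them: the custom cookie plus values the
	// browser added on its own (example values).
	sent := map[string]string{
		"Cookie":     "foo=bar; tracking=1",
		"User-Agent": "Mozilla/5.0",
	}
	raw, err := dumpRequest("GET", "https://example.com/account", sent)
	if err != nil {
		panic(err)
	}
	fmt.Print(raw)
	fmt.Println("has cookie line:", hasHeaderLine(raw, "Cookie: foo=bar; tracking=1"))
}
```

Since the custom headers are already part of what the browser sends, building the output from the sent headers keeps them while also capturing anything the browser added.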
Steps To Reproduce:
1. Launch katana with a custom cookie in the Custom Headers option.
2. Notice that the cookie value disappears during crawling.
Hey everyone, I'm coming back to you because I may have gotten carried away with loading cookies in the browser. 🥹
I wanted to create this cookie loading option to have an authenticated browser. However, I found that this is already possible by creating a debug browser and passing it to katana (cwu option). So adding such an option doesn't really seem consistent with the project.
Nevertheless, a bug persists in hybrid request headers. When using a custom/authenticated browser, none of the headers sent by the browser are written to the output, only the customHeaders entered as input.
katana version: v1.1.0
Current Behavior:
![Capture d’écran 2024-06-17 à 09 35 49](https://private-user-images.githubusercontent.com/159776828/340219065-bdf69735-5bca-4ea7-851c-a77361f3128b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAxMTM0MTAsIm5iZiI6MTcyMDExMzExMCwicGF0aCI6Ii8xNTk3NzY4MjgvMzQwMjE5MDY1LWJkZjY5NzM1LTViY2EtNGVhNy04NTFjLWE3NzM2MWYzMTI4Yi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzA0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwNFQxNzExNTBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT02Y2Q4NTk0MjExMjZlMjkzNDg4NjNjZThlNGIxNjg3OTY3ZTc3ZjEzZjk2OWVjMGJiOTJlZDdhYjg0NGZlNGQyJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.kXZxIyAJxSFpqWx5SaVpCoLtpD13sccf1jayVZkisQw)
![Capture d’écran 2024-06-17 à 09 57 20](https://private-user-images.githubusercontent.com/159776828/340226482-f1d44cac-1c81-4278-8bbe-6a64e48cd3ca.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAxMTM0MTAsIm5iZiI6MTcyMDExMzExMCwicGF0aCI6Ii8xNTk3NzY4MjgvMzQwMjI2NDgyLWYxZDQ0Y2FjLTFjODEtNDI3OC04YmJlLTZhNjRlNDhjZDNjYS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzA0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwNFQxNzExNTBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1lMGU4NTEwZGZmZjVmOGEwNmU0OTFhNGFlMzliY2JjMTEyMWMyMzc1MGFhZWRhNGNjMmVmNDM5NDMwMTYzMjY3JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.s3RxnQIqF70UXaxmukh9ksOQjrrB6K9vY03mcIuC3IM)
![Capture d’écran 2024-06-17 à 09 58 09](https://private-user-images.githubusercontent.com/159776828/340226486-283c52c4-82fb-4c64-90d2-58954c4572b8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAxMTM0MTAsIm5iZiI6MTcyMDExMzExMCwicGF0aCI6Ii8xNTk3NzY4MjgvMzQwMjI2NDg2LTI4M2M1MmM0LTgyZmItNGM2NC05MGQyLTU4OTU0YzQ1NzJiOC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzA0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwNFQxNzExNTBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1iMTkxYTQxODQzMDRiNDNlNjhjNjczN2RkYjVhMmMwYTBlNzZkZGMyZWQxNzQ5YzQ1MmE2ZWI2NTRkMWFkYjZlJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.wM1hUW7qZ94i-GGCZ4mA-FFHSo9FazBZC2lpVw-rnPY)
![Capture d’écran 2024-06-17 à 10 10 47](https://private-user-images.githubusercontent.com/159776828/340229718-d3786235-2800-47c3-b9cb-942d10f81fd9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAxMTM0MTAsIm5iZiI6MTcyMDExMzExMCwicGF0aCI6Ii8xNTk3NzY4MjgvMzQwMjI5NzE4LWQzNzg2MjM1LTI4MDAtNDdjMy1iOWNiLTk0MmQxMGY4MWZkOS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzA0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwNFQxNzExNTBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xMWE1YWMyZGJhOTgyZjQ2ZTczYzIzN2YzMmJlMWZlMDdkZmU0ZDJjNWIyOTU5MzVhODIzMjMzODg4YjBhODJiJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.cPfrekAly1LzbvfBIjCFCd62JRJdnHccHLUJqfTXwLk)
![Capture d’écran 2024-06-17 à 10 10 06](https://private-user-images.githubusercontent.com/159776828/340230994-b1474f9f-5052-4844-9a08-c60ad49adc9f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAxMTM0MTAsIm5iZiI6MTcyMDExMzExMCwicGF0aCI6Ii8xNTk3NzY4MjgvMzQwMjMwOTk0LWIxNDc0ZjlmLTUwNTItNDg0NC05YTA4LWM2MGFkNDlhZGM5Zi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzA0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcwNFQxNzExNTBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1mOWQyMmUyMTIxMzNjOTllNjljNDIxYTg5YzBiNWM3ZDBlOTU3NTAzODM4MjlhYzdmZmEzMWUwMTRmYmI0ZDA5JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.E_3vN8JRz4-_vF27uFA9THC-KyVv2-HrB0YWnk7_tpw)