Inconsistent HttpParser fields #1437
Comments
@JJ-Author Unfortunately that's how it is currently. TLS interception was an afterthought, added on community request. It didn't exist in the original open source version. As a result, its support was just monkey patched on top of the existing request object. In its defence, I can say that, since the code runs within a context capable of serving HTTP, HTTPS, PROXY, and INTERCEPTED content, a single request object fails to encapsulate everything in a consistent manner. Maybe in future releases we can carve out more specific request objects to provide a better interface.
IIRC, these issues are a side effect of how the HTTP parser works. To understand this, let's first remember ... We may need to dig further into this to provide a consistent interface. Due to backwards compatibility we cannot change the existing interface, but we can certainly add a helper method / property / attribute within the parser object to provide better information for the current context (web, proxy, intercept).
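To make the "current context" idea concrete, here is a minimal sketch of such a helper. It is not part of proxy.py: the names `RequestContext` and `classify_request` are hypothetical, and the `inside_tunnel` flag is an assumption standing in for whatever the proxy already knows about a decrypted CONNECT tunnel. It relies only on the standard HTTP request-target forms (authority-form for CONNECT, absolute-form for proxied requests, origin-form otherwise).

```python
from enum import Enum


class RequestContext(Enum):
    """Hypothetical labels for the contexts mentioned in the thread."""
    WEB = "web"              # origin-form request served by the built-in web server
    PROXY = "proxy"          # absolute-form request forwarded over plain HTTP
    TUNNEL = "tunnel"        # CONNECT request that opens an HTTPS tunnel
    INTERCEPT = "intercept"  # origin-form request decrypted inside a tunnel


def classify_request(method: str, target: str, inside_tunnel: bool = False) -> RequestContext:
    """Guess the context from the request line (illustrative helper, not proxy.py API)."""
    if method.upper() == "CONNECT":
        # CONNECT uses authority-form, e.g. "example.org:444".
        return RequestContext.TUNNEL
    if target.startswith("/") or target == "*":
        # Origin-form (or asterisk-form): the scheme and host are not in the request line.
        return RequestContext.INTERCEPT if inside_tunnel else RequestContext.WEB
    # Anything else is treated as absolute-form, e.g. "http://example.org/".
    return RequestContext.PROXY
```

With a helper like this, a plugin could branch on one explicit value instead of inferring the context from which fields happen to be None.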
Thank you for your explanations. Nevertheless, the port situation is a bug for which there seems to be no workaround: there seems to be no way to distinguish connections to https://example.org:443 from connections to https://example.org:444/ and perform the appropriate redirection for these different request targets. As constructive feedback from our side (we value all the effort that has been put into this project so far): at the moment it seems hardly possible to write a plugin from the documentation or the API hook definitions alone. One needs to go through all the example plugins to piece together how they can be used, and then debug or "reverse-engineer" the values of the request object fields for the different request cases, since simple trial and error gets you stuck because the behavior looks unreasonable from the outside (we know that implementation limitations provide some reason, but this can't and should not be visible to plugin authors). Besides clear documentation of what an implementer can expect from the request object, a general request lifecycle/flow diagram that shows when each hook is triggered would also be very helpful.
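For illustration, the port distinction described above can in principle be recovered from the Host header, because clients normally include the port there when it differs from the scheme default. The sketch below uses only the Python standard library. The function name `target_from_host_header` is hypothetical, and it assumes the plugin can read the raw Host header value and already knows the scheme (for example, "https" for an intercepted request); whether proxy.py exposes both in every hook is not confirmed here.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Default ports per scheme, used when the Host header carries no explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def target_from_host_header(host_header: str, scheme: str) -> Tuple[str, int]:
    """Return (host, port) for a request, given its Host header and scheme."""
    # Prefixing "//" makes urlsplit treat the value as a network location.
    parts = urlsplit("//" + host_header.strip())
    if parts.hostname is None:
        raise ValueError(f"no host in Host header: {host_header!r}")
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return parts.hostname, port
```

Called with "example.org:444" and "https", this yields the host "example.org" and port 444, while a bare "example.org" falls back to 443.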
Moving forward, based on your message, I think an entirely new request wrapper object that is linked from the current request object as a new field could be a good tradeoff. Well-defined, typed getter methods (and, if needed, setter methods) with good documentation of each function and its return values (e.g. when values are optional) should allow people who are not HTTP(S) experts, like us, to get started much more easily, and also let experts write code faster and more robustly. One option (probably more Java than Python style) would be to base this request object on a class hierarchy where each class represents a different request type: base, http, httpsPassthrough, httpsIntercepted, etc. That way one could get rid of optional/None values, and it would be very clear what you will get without too much documentation (and probably with better IDE support).
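A rough sketch of what such a hierarchy might look like in Python, using frozen dataclasses. All class and field names here are hypothetical and only mirror the request types named above; this is not an existing or planned proxy.py interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseRequest:
    """Fields every request type can always provide (hypothetical)."""
    host: str
    port: int


@dataclass(frozen=True)
class HttpRequest(BaseRequest):
    method: str
    path: str

    @property
    def url(self) -> str:
        return f"http://{self.host}:{self.port}{self.path}"


@dataclass(frozen=True)
class HttpsPassthroughRequest(BaseRequest):
    """CONNECT tunnel that is not decrypted: only host and port are known."""


@dataclass(frozen=True)
class HttpsInterceptedRequest(BaseRequest):
    method: str
    path: str

    @property
    def url(self) -> str:
        return f"https://{self.host}:{self.port}{self.path}"
```

Because the passthrough type simply has no `path` or `url`, a type checker or IDE can flag code that assumes those exist, instead of the plugin discovering a None value at runtime.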
How do we proceed with this? Split into 3 new issues?
I honestly think this Python Notebook already helps explain what to expect from the HttpParser class. I have sent out a PR to add more clarity, but more or less, this notebook already seems to document all scenarios. We can further cover scenarios for:
Can you please reproduce this bug either as a test case (see test_http_parser.py) or maybe via an example in the notebook?
Once we have enough clarity, we can propose an interface for various types of request objects, extending base
Describe the bug
The field values of the request object (for a GET request via http and https) vary between http and https (intercepted) in an inconsistent and potentially incorrect manner, which hinders the correct implementation of a custom archive redirection plugin:
- protocol is None always -> is this supposed to carry scheme information? how to distinguish whether the request is https or not???
- _url is incomplete (missing scheme+host) for https requests
- host is None for https request
- port is incorrectly reported as 80 for https request (that is actually sent to 443)
To Reproduce
the issue can be reproduced with
poetry install
and then
poetry shell
and then
python proxy/request_proxy.py
in the root dir of https://github.com/kuefmz/https-interception-proxypy/
Expected behavior
- protocol should not be None but http or https?
- _url should be full request url (including scheme+host) for https requests that are intercepted
- host should be equivalent to the FQDN for https request as they are for http request
- port should be correctly reported
Version information
Additional context
Log from the custom proxy with output of request object and selection of fields