-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eliminate use of cgi
#377
Eliminate use of cgi
#377
Changes from all commits
431173d
c1c1dc4
b4fc499
cd4c7f8
5cc43b8
852e343
e03348f
0a3d17b
c2210ca
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
:mod:`treq.content.text_content()` no longer generates deprecation warnings due to use of the ``cgi`` module. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,6 @@ | ||
import unittest | ||
from unittest import mock | ||
from typing import Optional | ||
|
||
from twisted.python.failure import Failure | ||
|
||
|
@@ -11,6 +13,7 @@ | |
from twisted.web.server import NOT_DONE_YET | ||
|
||
from treq import collect, content, json_content, text_content | ||
from treq.content import _encoding_from_headers | ||
from treq.client import _BufferedResponse | ||
from treq.testing import StubTreq | ||
|
||
|
@@ -267,6 +270,59 @@ def error(data): | |
# being closed. | ||
stub.flush() | ||
self.assertEqual(len(resource.request_finishes), 1) | ||
self.assertIsInstance( | ||
resource.request_finishes[0].value, ConnectionDone | ||
self.assertIsInstance(resource.request_finishes[0].value, ConnectionDone) | ||
|
||
|
||
class EncodingFromHeadersTests(unittest.TestCase): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe add a docstring to describe the criteria for grouping tests into this class. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure! In the future it's fine to remind me of the Twisted coding standard, no need to write so much. :) |
||
def _encodingFromContentType(self, content_type: str) -> Optional[str]: | ||
""" | ||
Invoke `_encoding_from_headers()` for a header value. | ||
|
||
:param content_type: A Content-Type header value. | ||
:returns: The result of `_encoding_from_headers()` | ||
""" | ||
h = Headers({"Content-Type": [content_type]}) | ||
return _encoding_from_headers(h) | ||
|
||
def test_rfcExamples(self): | ||
""" | ||
The examples from RFC 9110 § 8.3.1 are normalized to | ||
canonical (lowercase) form. | ||
""" | ||
for example in [ | ||
"text/html;charset=utf-8", | ||
'Text/HTML;Charset="utf-8"', | ||
'text/html; charset="utf-8"', | ||
"text/html;charset=UTF-8", | ||
]: | ||
self.assertEqual("utf-8", self._encodingFromContentType(example)) | ||
|
||
def test_multipleParams(self): | ||
"""The charset parameter is extracted even if mixed with other params.""" | ||
for example in [ | ||
"a/b;c=d;charSet=ascii", | ||
"a/b;c=d;charset=ascii; e=f", | ||
"a/b;c=d; charsEt=ascii;e=f", | ||
"a/b;c=d; charset=ascii; e=f", | ||
]: | ||
self.assertEqual("ascii", self._encodingFromContentType(example)) | ||
|
||
def test_quotedString(self): | ||
"""Any quotes that surround the value of the charset param are removed.""" | ||
self.assertEqual( | ||
"ascii", self._encodingFromContentType("foo/bar; charset='ASCII'") | ||
) | ||
self.assertEqual( | ||
"shift_jis", self._encodingFromContentType('a/b; charset="Shift_JIS"') | ||
) | ||
|
||
def test_noCharset(self): | ||
"""None is returned when no valid charset parameter is found.""" | ||
for example in [ | ||
"application/octet-stream", | ||
"text/plain;charset=", | ||
"text/plain;charset=''", | ||
"text/plain;charset=\"'\"", | ||
"text/plain;charset=🙃", | ||
]: | ||
self.assertIsNone(self._encodingFromContentType(example)) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,11 @@ | ||
# Copyright (c) Twisted Matrix Laboratories. | ||
# See LICENSE for details. | ||
|
||
import cgi | ||
import sys | ||
from typing import cast, AnyStr | ||
|
||
from io import BytesIO | ||
|
||
from multipart import MultipartParser # type: ignore | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it's ok to use multipart in the testing code, to double check the implementation... even if the code under tests would have use stdlib. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For what it's worth, the
While it suggests the A more ideal test suite would test against a variety of real-world multipart parsers, but I don't think that'd have great ROI. |
||
from twisted.trial import unittest | ||
from zope.interface.verify import verifyObject | ||
|
||
|
@@ -588,9 +587,10 @@ def test_newLinesInParams(self): | |
--heyDavid-- | ||
""".encode("utf-8")), output) | ||
|
||
def test_worksWithCgi(self): | ||
def test_worksWithMultipart(self): | ||
""" | ||
Make sure the stuff we generated actually parsed by python cgi | ||
Make sure the stuff we generated can actually be parsed by the | ||
`multipart` module. | ||
""" | ||
output = self.getOutput( | ||
MultiPartProducer([ | ||
|
@@ -612,23 +612,20 @@ def test_worksWithCgi(self): | |
) | ||
) | ||
|
||
form = cgi.parse_multipart(BytesIO(output), { | ||
"boundary": b"heyDavid", | ||
"CONTENT-LENGTH": str(len(output)), | ||
}) | ||
form = MultipartParser( | ||
stream=BytesIO(output), | ||
boundary=b"heyDavid", | ||
content_length=len(output), | ||
) | ||
|
||
# Since Python 3.7, the value for a non-file field is now a list | ||
# of strings, not bytes. | ||
if sys.version_info >= (3, 7): | ||
self.assertEqual(set(['just a string\r\n', 'another string']), | ||
set(form['cfield'])) | ||
else: | ||
self.assertEqual(set([b'just a string\r\n', b'another string']), | ||
set(form['cfield'])) | ||
self.assertEqual( | ||
[b'just a string\r\n', b'another string'], | ||
[f.raw for f in form.get_all('cfield')], | ||
) | ||
|
||
self.assertEqual(set([b'my lovely bytes2']), set(form['efield'])) | ||
self.assertEqual(set([b'my lovely bytes219']), set(form['xfield'])) | ||
self.assertEqual(set([b'my lovely bytes22']), set(form['afield'])) | ||
self.assertEqual(b'my lovely bytes2', form.get('efield').raw) | ||
self.assertEqual(b'my lovely bytes219', form.get('xfield').raw) | ||
self.assertEqual(b'my lovely bytes22', form.get('afield').raw) | ||
|
||
|
||
class LengthConsumerTestCase(unittest.TestCase): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More of a FYI.
In Twisted we/I went for stdlib usage https://github.com/twisted/twisted/pull/12016/files#diff-3093a8f64dec64c2305f6322f276a5cee1437f0037efc660ce6524ffa58017a1R229-R234
My reasoning for using stdlib, is that this is quite common usage and if there is a bug, it's best to have it fixed in stdlib rather than rely on 3rd party packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious what experience this is based on. In my experience, known bugs often remain in the stdlib for years (perhaps because there are relatively many barriers to landing a change in the stdlib, as well as the fact that the stdlib can only change with a release which happens relatively infrequently - plus such a release can not fix the issue for already-released versions of python).
If there is a third-party module of reasonable quality that is practical to use then I would generally prefer this over something from the stdlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on the source of the 3rd-party module. More dependencies = more attack surface, both in the literal sense (more people whose PyPI upload credentials can ruin your day), and in the metaphoric sense of unanticipated changes (what happens if a dependency goes AGPL3).
In this sense "the stdlib" or any existing dependency are roughly equivalent, and I'd probably prefer to depend more on our existing deps. Taking on a new dependency ought to give us a moment of pause, but really only a moment, especially if we know the maintainers or it's sufficiently widely used that we are not entirely on the hook for due diligence on its continued maintenance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case this is a blocker to 3.13 support and the dependency is only for tests, so we should probably not block on this minor concern :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, the dependency here is not just for tests. In 0a3d17b I switched to using the equivalent of
cgi.parse_header()
provided bymultipart
to avoid vendoring code under the PSF license as that seemed to give @glyph pause.I picked
multipart
because it has a few familiar faces attached, e.g. @cjwatson, who has PyPI upload permissions. It is hosted under an individual's GitHub account rather than an org, so that's a downside.I also considered pulling the relevant bits of
cgi
into a separate package and putting it on PyPI. I'd be happy to do that under the Twisted org if that's preferable to an external dependency.I can only echo @exarkun's skepticism of using the stdlib for this. The stdlib has an anti-
bytes
bias that is often incorrect and has led to a lot of makework for us in Twisted. Thecgi
module has seen some baffling refactors, then deprecation and removal. Now we're told to use theemail
package, which isn't explicitly implementing the HTTP RFCs. Seems risky. Also, it was recently rewritten! It's possible for code to be too maintained.