Skip to content
This repository has been archived by the owner on Aug 7, 2024. It is now read-only.

Update dependency MechanicalSoup to v1 [SECURITY] #190

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

renovate[bot]
Copy link

@renovate renovate bot commented Jul 5, 2023

Mend Renovate

This PR contains the following updates:

Package Change Age Adoption Passing Confidence
MechanicalSoup (source) ==0.11.0 -> ==1.3.0 age adoption passing confidence

GitHub Vulnerability Alerts

CVE-2023-34457

Summary

A malicious web server can read arbitrary files on the client using a <input type="file" ...> inside HTML form.

Details

This affects the extremely common pattern of form submission:

b = mechanicalsoup.StatefulBrowser()
b.select_form(...)
b.submit_selected()

The problem is with the code in browser.Browser.get_request_kwargs:

    if tag.get("type", "").lower() == "file" and multipart:
        filepath = value
        if filepath != "" and isinstance(filepath, str):
            content = open(filepath, "rb")
        else:
            content = ""
        filename = os.path.basename(filepath)
        # If value is the empty string, we still pass it
        # for consistency with browsers (see
        # https://github.com/MechanicalSoup/MechanicalSoup/issues/250).
        files[name] = (filename, content)

The file path is taken from the bs4 tag "value" attribute. However, this path will default to whatever the server sends. So if a malicious web server were to send something like:

<html><body>
  <form method="post" enctype="multipart/form-data">
    <input type="text" name="greeting" value="hello" />
    <input type="file" name="evil" value="/home/user/.ssh/id_rsa" />
  </form>
</body></html>

then upon .submit_selected() the mechanicalsoup browser will happily send over the contents of your SSH private key.

PoC

import attr
import mechanicalsoup
import requests

class NevermindError(Exception):
    pass

@&#8203;attr.s
class FakeSession:
    session = attr.ib()

    headers = property(lambda self: self.session.headers)

    def request(self, *args, **kwargs):
        print("requested", args, kwargs)
        raise NevermindError  # don't actually send request

def demonstrate(inputs=None):
    b = mechanicalsoup.StatefulBrowser(FakeSession(requests.Session()))
    b.open_fake_page("""\
<html><body>
<form method="post" enctype="multipart/form-data">
<input type="text" name="greeting" value="hello" />
<input type="file" name="evil" value="/etc/passwd" />
<input type="file" name="second" />
</form>
</body></html>
""", url="http://127.0.0.1:9/")
    b.select_form()
    if inputs is not None:
        b.form.set_input(inputs)
    try:
        b.submit_selected()
    except NevermindError:
        pass

# %%

# unpatched
demonstrate()

# OUTPUT: requested () {'method': 'post', 'url': 'http://127.0.0.1:9/', 'files': {'evil': ('passwd', <_io.BufferedReader name='/etc/passwd'>), 'second': ('', '')}, 'headers': {'Referer': 'http://127.0.0.1:9/'}, 'data': [('greeting', 'hello')]}

# %%

# with the patch, this now works. users MUST open the file manually and

# use browser.set_input() using the file object.
demonstrate({"greeting": "hiya", "evil": open("/etc/hostname", "rb").name, "second": open("/dev/null", "rb")})

# OUTPUT: requested () {'method': 'post', 'url': 'http://127.0.0.1:9/', 'files': {'evil': ('hostname', <_io.BufferedReader name='/etc/hostname'>), 'second': ('null', <_io.BufferedReader name='/dev/null'>)}, 'headers': {'Referer': 'http://127.0.0.1:9/'}, 'data': [('greeting', 'hiya')]}

# %%

# with the patch, this raises a ValueError with a helpful string
demonstrate({"evil": "/etc/hostname"})

# %%

# with the patch, we silently send no file if a malicious server tries the attack:
demonstrate()

Suggested patch

diff --git a/mechanicalsoup/browser.py b/mechanicalsoup/browser.py
index 285f8bb..68bc65e 100644
--- a/mechanicalsoup/browser.py
+++ b/mechanicalsoup/browser.py
@&#8203;@&#8203; -1,7 +1,8 @&#8203;@&#8203;
+import io
 import os
 import tempfile
 import urllib
 import weakref
 import webbrowser
 
 import bs4
@&#8203;@&#8203; -227,15 +228,21 @&#8203;@&#8203; class Browser:
                     value = tag.get("value", "")
 
                 # If the enctype is not multipart, the filename is put in
                 # the form as a text input and the file is not sent.
                 if tag.get("type", "").lower() == "file" and multipart:
                     filepath = value
                     if filepath != "" and isinstance(filepath, str):
-                        content = open(filepath, "rb")
+                        content = getattr(tag, "_mechanicalsoup_file", None)
+                        if content is False:
+                            raise ValueError(
+                                """From v1.3.0 onwards, you must pass an open file object directly, for example using `form.set_input({"name": open("/path/to/filename", "rb")})`. This change is to mitigate a security vulnerability where a malicious web server could read arbitrary files from the client."""
+                            )
+                        elif not isinstance(content, io.IOBase):
+                            content = ""
                     else:
                         content = ""
                     filename = os.path.basename(filepath)
                     # If value is the empty string, we still pass it
                     # for consistency with browsers (see
                     # https://github.com/MechanicalSoup/MechanicalSoup/issues/250).
                     files[name] = (filename, content)
diff --git a/mechanicalsoup/form.py b/mechanicalsoup/form.py
index a67195c..82f6015 100644
--- a/mechanicalsoup/form.py
+++ b/mechanicalsoup/form.py
@&#8203;@&#8203; -1,8 +1,9 @&#8203;@&#8203;
 import copy
+import io
 import warnings
 
 from bs4 import BeautifulSoup
 
 from .utils import LinkNotFoundError
 
 
@&#8203;@&#8203; -64,15 +65,24 @&#8203;@&#8203; class Form:
         give it the value ``password``.
         """
 
         for (name, value) in data.items():
             i = self.form.find("input", {"name": name})
             if not i:
                 raise InvalidFormMethod("No input field named " + name)
-            i["value"] = value
+
+            if isinstance(value, io.IOBase):
+                # Store the actual file object for <input type="file">
+                i._mechanicalsoup_file = value
+                i["value"] = value.name
+            else:
+                # We set `_mechanicalsoup_file` to `False` so that we can
+                # check for deprecated use of the API.
+                i._mechanicalsoup_file = False
+                i["value"] = value
 
     def uncheck_all(self, name):
         """Remove the *checked*-attribute of all input elements with
         a *name*-attribute given by ``name``.
         """
         for option in self.form.find_all("input", {"name": name}):
             if "checked" in option.attrs:
@&#8203;@&#8203; -257,20 +267,20 @&#8203;@&#8203; class Form:
         .. code-block:: python
 
             form.set("login", username)
             form.set("password", password)
             form.set("eula-checkbox", True)
 
         Example: uploading a file through a ``<input type="file"
-        name="tagname">`` field (provide the path to the local file,
+        name="tagname">`` field (provide an open file object,
         and its content will be uploaded):
 
         .. code-block:: python
 
-            form.set("tagname", path_to_local_file)
+            form.set("tagname", open(path_to_local_file, "rb"))
 
         """
         for func in ("checkbox", "radio", "input", "textarea", "select"):
             try:
                 getattr(self, "set_" + func)({name: value})
                 return
             except InvalidFormMethod:

Impact

All users of MechanicalSoup's form submission are affected, unless they took very specific (and manual) steps to reset HTML form field values.


Release Notes

MechanicalSoup/MechanicalSoup (MechanicalSoup)

v1.3.0: Version 1.3.0

Compare Source

Breaking changes

  • To prevent malicious web servers from reading arbitrary files from the client, files must now be opened explicitly by the user in order to upload their contents in form submission. For example, instead of:

    browser["upload"] = "/path/to/file"

    you would now use:

    browser["upload"] = open("/path/to/file", "rb")

    This remediates CVE-2023-34457. Our thanks to @​e-c-d for reporting and helping to fix the vulnerability!

Main changes

  • Added support for Python 3.11.

  • Allow submitting a form with no submit element. This can be achieved by passing submit=False to StatefulBrowser.submit_selected. Thanks @​alexreg! [#​480]

v1.2.0: Version 1.2.0

Compare Source

Main changes

  • Added support for Python 3.10.

  • Added support for HTML form-associated elements (i.e. input elements that are associated with a form by a form attribute, but are not a child element of the form). [#​380]

Bug fixes

  • When uploading a file, only the filename is now submitted to the server. Previously, the full file path was being submitted, which exposed more local information than users may have been expecting. [#​375]

v1.1.0: Version 1.1.0

Compare Source

Main changes

  • Dropped support for EOL Python versions: 2.7 and 3.5.

  • Increased minimum version requirement for requests from 2.0 to 2.22.0 and beautifulsoup4 from 4.4 to 4.7.

  • Use encoding from the HTTP request when no HTML encoding is specified. [#​355]

  • Added the put method to the Browser class. This is a light wrapper around requests.Session.put. [#​359]

  • Don't override Referer headers passed in by the user. [#​364]

  • StatefulBrowser methods follow_link and download_link now support passing a dictionary of keyword arguments to requests, via requests_kwargs. For symmetry, they also support passing Beautiful Soup args in as bs4_kwargs, although any excess **kwargs are sent to Beautiful Soup as well, just as they were previously. [#​368]

Many thanks to the contributors who made this release possible!

v1.0.0: Version 1.0.0

Compare Source

This is the last release that will support Python 2.7. Thanks to the many contributors that made this release possible!

Main changes:

  • Added support for Python 3.8 and 3.9.

  • StatefulBrowser has new properties page, form, and url, which can be used in place of the methods get_current_page, get_current_form and get_url respectively (e.g. the new x.page is equivalent to x.get_current_page()). These methods may be deprecated in a future release. [#​175]

  • StatefulBrowser.form will raise an AttributeError instead of returning None if no form has been selected yet. Note that StatefulBrowser.get_current_form() still returns None for backward compatibility.

Bug fixes

  • Decompose <select> elements with the same name when adding a new input element to a form. [#​297]

  • The params and data kwargs passed to submit will now properly be forwarded to the underlying request for GET methods (whereas previously params was being overwritten by data). [#​343]

v0.12.0: Version 0.12.0

Compare Source

Main changes:

  • Changes in official python version support: added 3.7 and dropped 3.4.

  • Added ability to submit a form without updating StatefulBrowser internal state: submit_selected(..., update_state=False). This means you get a response from the form submission, but your browser stays on the same page. Useful for handling forms that result in a file download or open a new tab.

Bug fixes

  • Improve handling of form enctype to behave like a real browser. [#​242]

  • HTML type attributes are no longer required to be lowercase. [#​245]

  • Form controls with the disabled attribute will no longer be submitted to improve compliance with the HTML standard. If you were relying on this bug to submit disabled elements, you can still achieve this by deleting the disabled attribute from the element in the Form object directly. [#​248]

  • When a form containing a file input field is submitted without choosing a file, an empty filename & content will be sent just like in a real browser. [#​250]

  • <option> tags without a value attribute will now use their text as the value. [#​252]

  • The optional url_regex argument to follow_link and download_link was fixed so that it is no longer ignored. [#​256]

  • Allow duplicate submit elements instead of raising a LinkNotFoundError. [#​264]

Our thanks to the many new contributors in this release!


Configuration

📅 Schedule: Branch creation - "" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Mend Renovate. View repository job log here.

@renovate renovate bot changed the title chore(deps): update dependency mechanicalsoup to v1 [security] Update dependency MechanicalSoup to v1 [SECURITY] Apr 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0 participants