ENH adding write_html to TableReport #1190

Open
wants to merge 15 commits into
base: main
from
4 changes: 4 additions & 0 deletions CHANGES.rst
@@ -29,6 +29,10 @@ Release 0.4.1

Changes
-------
* :class:`TableReport` has a ``write_html`` method.
  :pr:`1190` by :user:`Mojdeh Rastgoo<mrastgoo>`.

* A new parameter ``verbose`` has been added to the :class:`TableReport` to toggle on or off the
  printing of progress information when a report is being generated.
  :pr:`1182` by :user:`Priscilla Baah<priscilla-b>`.
40 changes: 40 additions & 0 deletions skrub/_reporting/_table_report.py
@@ -1,5 +1,8 @@
import codecs
import functools
import json
import locale
from pathlib import Path

from ._html import to_html
from ._serve import open_in_browser
@@ -197,6 +200,43 @@
    def _repr_html_(self):
        return self._repr_mimebundle_()["text/html"]

    def write_html(self, file):
        """Store the report into an HTML file.

        Parameters
        ----------
        file : str, pathlib.Path or file object
            The file object or path of the file to store the HTML output.
        """
        html = self.html()
        if isinstance(file, (str, Path)):
            with open(file, "w", encoding="utf8") as stream:
                stream.write(html)
            return
        try:
            file.write(html.encode("utf-8"))
            return
        except TypeError:
            pass

        print(getattr(file, "encoding", None))
        if (encoding := getattr(file, "encoding", None)) is not None:
            try:
                assert codecs.lookup(encoding).name == "utf-8"
            except (AssertionError, LookupError):
                raise ValueError(
                    "If `file` is a text file it should use utf-8 encoding; got:"
                    f" {encoding!r}"
                )
        elif locale.getencoding().lower() != "utf-8":
            # when encoding=None, it will default on the platform-specific encoding
            # raise if not utf-8
            raise ValueError(
                f"Platform encoding is not utf-8; got {locale.getencoding()}"
            )

        file.write(html)

Codecov / codecov/patch: added line #L234 of skrub/_reporting/_table_report.py (the `raise ValueError(` in the platform-encoding branch) was not covered by tests.

Member

Could you add a comment explaining what html is expected (or not expected) to be at line 230?
Additionally, light inline documentation/comments on the steps above would help readability :)
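
One way to read the three cases in `write_html`, written as a standalone, commented function. This is only a sketch, not the PR's code: `write_html_sketch` is a hypothetical name, the `assert` is replaced by an explicit comparison (assertions are skipped under `python -O`), and `locale.getpreferredencoding(False)` stands in for `locale.getencoding()`, which only exists on Python 3.11 and later.

```python
import codecs
import io
import locale
from pathlib import Path


def write_html_sketch(html, file):
    """Write the str `html` to a path, a binary file, or a UTF-8 text file."""
    # Case 1: a path. We open the file ourselves, always as UTF-8 text.
    if isinstance(file, (str, Path)):
        with open(file, "w", encoding="utf-8") as stream:
            stream.write(html)
        return

    # Case 2: a binary file accepts bytes. A text file raises TypeError
    # when given bytes, which sends us on to case 3.
    try:
        file.write(html.encode("utf-8"))
        return
    except TypeError:
        pass

    # Case 3: a text file. Check the encoding before writing anything.
    encoding = getattr(file, "encoding", None)
    if encoding is None:
        # No declared encoding: assume the platform default applies.
        encoding = locale.getpreferredencoding(False)
    try:
        # codecs.lookup normalizes aliases such as "utf8" and "UTF-8".
        is_utf8 = codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        is_utf8 = False
    if not is_utf8:
        raise ValueError(f"text file must use utf-8 encoding; got: {encoding!r}")

    # Here `html` is still a str and `file` is a UTF-8 text file.
    file.write(html)


# Usage: a binary buffer receives the UTF-8 encoded bytes.
buffer = io.BytesIO()
write_html_sketch("<html></html>", buffer)
```

At the final `file.write(html)`, `html` is always a `str`: the bytes are only produced inside the `try` block for binary files.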


    def open(self):
        """Open the HTML report in a web browser."""
        open_in_browser(self.html())
46 changes: 46 additions & 0 deletions skrub/_reporting/tests/test_table_report.py
@@ -2,6 +2,9 @@
import json
import re
import warnings
from pathlib import Path

import pytest

from skrub import TableReport, ToDatetime
from skrub import _dataframe as sbd
@@ -123,6 +126,49 @@ def test_duration(df_module):
    assert re.search(r"2(\.0)?\s+days", TableReport(df).html())


@pytest.mark.parametrize(
    "filename_type",
    ["str", "Path", "file_object", "binary_mode"],
)
def test_write_html(tmp_path, pd_module, filename_type):
Member

This test is looking great! The last thing to take care of is closing the file if we opened it; otherwise we leak a resource, the open file handle. That can be a problem, for example, when cleaning up the temp directory on Windows, which refuses to remove files that have an open file handle. Usually we ensure that with a simple context manager like:

with open(tmp_file_path, 'w', encoding='utf-8') as file:
    file.write('hello')

But here the situation is trickier: in some cases we have a string or path (which requires no closing) and in others a file object, which does require closing.

The standard library module contextlib provides two easy ways to deal with that situation. The first is ExitStack: it creates a context onto whose stack we can push as many context managers as we want; when it exits it unwinds the stack, calling each manager's __exit__ as it is popped. So we could use it like:

with contextlib.ExitStack() as stack:
    if file_type == 'str':
        file = str(tmp_file_path)
    elif file_type == 'text_file_object':
        file = stack.enter_context(open(tmp_file_path, 'w', encoding='utf-8'))
    # ...

    report.write_html(file)

# if we opened it the file is closed here when we exit the `with` block

This option using ExitStack is my favorite because the file is managed by a context manager as soon as it is opened.

Another way is to use nullcontext in the cases where we do not open the file, so that later we can treat all options as if they were open files that implement the context manager protocol. nullcontext returns an object that implements the context manager protocol but whose __enter__ just returns the object we gave it and whose __exit__ does nothing:

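if file_type == 'str':
    file = contextlib.nullcontext(str(tmp_file_path))
elif file_type == 'text_file_object':
    file = open(tmp_file_path, 'w', encoding='utf-8')

# bind the value returned by __enter__: for nullcontext that is the wrapped
# string, not the nullcontext object itself
with file as f:
    report.write_html(f)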
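
A runnable sketch of the nullcontext variant, under a few assumptions: `describe` is a hypothetical stand-in for the function under test, and a temporary directory replaces pytest's `tmp_path`. It shows why the value should be bound with `as`: the nullcontext object is only a wrapper, and `__enter__` hands back the wrapped value.

```python
import contextlib
import tempfile
from pathlib import Path


def describe(target):
    # Hypothetical consumer: reports the type of object it was handed.
    return type(target).__name__


with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_file_path = Path(tmp_dir) / "report.html"

    # A path needs no closing, so wrap it in nullcontext ...
    path_cm = contextlib.nullcontext(str(tmp_file_path))
    with path_cm as target:
        path_kind = describe(target)  # receives the plain str

    # ... while an open file is already its own context manager.
    file_cm = open(tmp_file_path, "w", encoding="utf-8")
    with file_cm as target:
        file_kind = describe(target)  # receives the file object
    file_closed = file_cm.closed
```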

Member

wow contextlib.ExitStack() is very nice

Member

Maybe you could add a comment on L139 to explain why it's a good idea to use contextlib here?

    df = pd_module.make_dataframe({"a": [1, 2], "b": [3, 4]})
    report = TableReport(df)

    tmp_file_path = tmp_path / Path("report.html")

    if filename_type == "str":
        filename = str(tmp_file_path)
    elif filename_type == "file_object":
        filename = open(tmp_file_path, "w", encoding="utf-8")
    elif filename_type == "binary_mode":
        filename = open(tmp_file_path, "wb")
    else:
        filename = tmp_file_path

    report.write_html(filename)
    assert tmp_file_path.exists()

    with open(tmp_file_path, "r", encoding="utf-8") as file:
        saved_content = file.read()
    assert "</html>" in saved_content
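
A sketch of how the four cases of this test could be driven through `contextlib.ExitStack`, as suggested in the review thread. It is self-contained and makes some assumptions: `write_report` is a hypothetical stand-in for `TableReport.write_html`, `check_write` is a hypothetical helper, and a temporary directory replaces pytest's `tmp_path`.

```python
import contextlib
import tempfile
from pathlib import Path


def write_report(html, file):
    # Hypothetical stand-in for TableReport.write_html.
    if isinstance(file, (str, Path)):
        Path(file).write_text(html, encoding="utf-8")
    elif "b" in getattr(file, "mode", ""):
        file.write(html.encode("utf-8"))
    else:
        file.write(html)


def check_write(tmp_dir, filename_type):
    tmp_file_path = Path(tmp_dir) / "report.html"
    with contextlib.ExitStack() as stack:
        if filename_type == "str":
            file = str(tmp_file_path)
        elif filename_type == "Path":
            file = tmp_file_path
        elif filename_type == "file_object":
            # The file is registered with the stack as soon as it is opened.
            file = stack.enter_context(open(tmp_file_path, "w", encoding="utf-8"))
        else:  # "binary_mode"
            file = stack.enter_context(open(tmp_file_path, "wb"))
        write_report("<html></html>", file)
    # Leaving the block closed (and flushed) any file opened above,
    # so reading the content back is safe.
    closed = getattr(file, "closed", True)
    return tmp_file_path.read_text(encoding="utf-8"), closed


with tempfile.TemporaryDirectory() as tmp_dir:
    content, closed = check_write(tmp_dir, "file_object")
```

Reading the file back only after the `with` block also avoids checking content that is still sitting in an unflushed write buffer.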


def test_write_html_with_not_utf8_encoding(tmp_path, pd_module):
    df = pd_module.make_dataframe({"a": [1, 2], "b": [3, 4]})
    report = TableReport(df)

    filename = open(tmp_path / Path("report.html"), "w", encoding="latin-1")
Member

Here too, we want to use a context manager to close the file.

    encoding = getattr(filename, "encoding", None)
    with pytest.raises(
        ValueError,
        match=(
            f"If `file` is a text file it should use utf-8 encoding; got: {encoding!r}"
        ),
    ):
        report.write_html(filename)
        assert not filename.exists()
Member

Suggested change
-        report.write_html(filename)
-        assert not filename.exists()
+        report.write_html(filename)
+    assert not filename.exists()

otherwise the assert is never executed because of the exception raised on the line above

Contributor Author

I actually changed this to check that the file doesn't contain HTML, to make sure we don't modify it.
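
The pitfall discussed in this thread can be reproduced with the standard library alone. In this sketch, `unittest`'s `assertRaises` stands in for `pytest.raises` (both are context managers that swallow the expected exception on exit), and `fail` and `executed` are hypothetical names used only for illustration.

```python
import unittest

case = unittest.TestCase()
executed = []


def fail():
    # Plays the role of report.write_html(filename) raising ValueError.
    raise ValueError("not utf-8")


with case.assertRaises(ValueError):
    fail()
    # Never reached: the exception above leaves the block immediately.
    executed.append("inside")

# Reached: the context manager has already handled the ValueError.
executed.append("after")
```

Any check that must actually run therefore belongs after the `with` block, at the outer indentation level.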



def test_verbosity_parameter(df_module, capsys):
    df = df_module.make_dataframe(
        dict(