Support for specifying explicit `Import` file formats #1164

aravindh-krishnamoorthy · 2024-11-12T17:19:50Z

Note

This PR is still a WIP. Some functionality, documentation, and tests are not yet implemented. However, community comments and suggestions are welcome.

Summary

Add support for:

~~Import[file, "fmt"], where "fmt" is a valid format from $ImportFormats. If not, "fmt" is treated as the "element" argument~~.
~~Import[file, {"fmt", element...}], where, again, "fmt" is a valid format. If not, "fmt" is again treated as an "element" argument.~~
The functionality is already implemented, but lacks documentation ~~and needs a bugfix~~.

Work progress

~~Implementation for Import[file, "fmt"]~~
~~Implementation for Import[file, {"fmt", element...}]~~
~~Bugfix for Import[file, {"fmt", element...}] - only check the first element for file format*.~~ (not needed as well with old implementation).
Documentation updates.
Test updates.
- Import["file.svg"] -> Error Import::fmtnosup for "SVG".
- Import["file.svg", "XML"] -> Imported as XML.
- Import["file.svg", {"XML"}] -> Imported as XML.
- Import["file.svg", {"XML", "XML"}] -> Error Import::noelem for the second "XML".

*Otherwise, it's not possible to get elements with the same names as file formats.

Test results

(mathics) ~/git/mathics/mathics-core/test/builtin/files_io$ python -m pytest test_importexport.py
=================================================== test session starts ====================================================
platform linux -- Python 3.11.10, pytest-8.3.3, pluggy-1.5.0
rootdir: ~/git/mathics/mathics-core
configfile: pyproject.toml
plugins: anyio-4.6.2.post1, typeguard-4.3.0
collected 39 items

test_importexport.py .......................................                                                         [100%]

===================================================== warnings summary =====================================================
../../../mathics/settings.py:13
  ~/git/mathics/mathics-core/mathics/settings.py:13: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

../../../mathics/core/parser/convert.py:41
  ~/git/mathics/mathics-core/mathics/core/parser/convert.py:41: DeprecationWarning: invalid escape sequence '\!'
    return s.encode("raw_unicode_escape").decode("unicode_escape")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================== 39 passed, 2 warnings in 3.73s ==============================================

General rant by the author

Using an ambiguous argument, which could mean either "fmt" or "element", does not seem to be a good design choice by the Wolfram Language developers...

mmatera · 2024-11-13T01:42:54Z

mathics/builtin/files_io/importexport.py

-        return self.eval_elements(
-            filename, ListExpression(element), evaluation, options
-        )
+        if element.get_string_value() in IMPORTERS.keys():


maybe it would be a good idea to move the implementation of these methods to mathics.eval.files_io

Thank you for the comments, @mmatera. Would it make sense to move all related (to import/export) eval_xxx functions there?

First, I will implement the 2nd checkbox above in the PR description, and then, soon after receiving your response, will move the implementation to mathics.eval.files_io.

There are methods that start with eval in mathics.builtin and there are functions that start with eval_ in mathics.eval.

The methods have a funny docstring that indicates the function signature from Mathics3's perspective. This indicates to the Mathics3 interpreter when these methods get invoked. See https://mathics-development-guide.readthedocs.io/en/latest/extending/developing-code/extending/tutorial/1-builtin.html for an example.

Those kinds of evaluation methods can't be moved anywhere and have to be in mathics.builtin.

In the past everything was shoved into the class that implements a Mathics3 builtin function. These classes were big, harder to test in isolation, and were hard to understand.

We have been breaking these down. In particular, other than parameter checking and parameter conversion, any code in a Mathics3 Builtin Function (implemented as a Python class) that has any substance should be added as a (Python) function inside mathics.eval using the corresponding method name from the class. Using the same or similar name is intended to simplify understanding the correspondence.
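As a rough illustration of the split described here, the sketch below keeps parameter checking in a method and delegates the substantive work to a module-level function with a matching name. Everything in it is hypothetical: `ImportSketch`, `eval_elements`, and `eval_import_elements` are made-up names, the class does not derive from any Mathics3 base class, and the docstring pattern only imitates the style of the real ones.

```python
from typing import List, Optional


def eval_import_elements(filename: str, elements: List[str]) -> str:
    """Substantive logic as a plain function, testable without any class.

    Hypothetical helper; its name only mirrors the method below.
    """
    wanted = ", ".join(elements) if elements else "default elements"
    return f"import {filename}: {wanted}"


class ImportSketch:
    """Toy stand-in for a builtin class (not a real Mathics3 Builtin)."""

    def eval_elements(self, filename, elements) -> Optional[str]:
        """Import[filename_String, elements_List]"""
        # Parameter checking and conversion stay in the method ...
        if not isinstance(filename, str) or not isinstance(elements, (list, tuple)):
            return None
        # ... and the real work is delegated to the module-level function.
        return eval_import_elements(filename, [str(e) for e in elements])


if __name__ == "__main__":
    print(ImportSketch().eval_elements("file.svg", ["XML"]))
```

Because `eval_import_elements` takes and returns plain Python values, it can be exercised in a unit test without constructing the class or an evaluation context, which is the testability benefit mentioned above.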

In the future, if we are to support instruction-like execution, we will need this kind of code in functions, not as methods of classes.

Does this answer your question and make sense?

Thank you for the explanation and the context, @rocky. Indeed, this helps. Since the changes for this PR are small, I'll add them to the existing methods in mathics.builtin. From the next PR on, I'll implement substantial code as helper functions in mathics.eval.

rocky · 2024-11-13T18:08:14Z

@aravindh-krishnamoorthy Why was it you were confused about whether Import had not been implemented for that aspect you wanted? Is there something we could have done, or should do, to make it more likely not to confuse others in the future?

aravindh-krishnamoorthy · 2024-11-13T18:54:25Z

> @aravindh-krishnamoorthy Why was it you were confused about whether Import had not been implemented for that aspect you wanted? Is there something we could have done, or should do, to make it more likely not to confuse others in the future?

Thank you for your prompt response. I started implementing this because I had to import an "SVG" file as "XML". Instead of trying it out (which, in retrospect, would have been better), I looked at the documentation for Import (Sec. 29.3.9 as of now), which shows neither Import[file, "fmt"] nor Import[file, {"fmt", elements}]. So, I decided to implement them.

While implementing the second one, I saw that this was already (quite elegantly imho) done in Import._import:

mathics-core/mathics/builtin/files_io/importexport.py

Lines 1420 to 1427 in 275aa03

        # Determine file type
        for el in elements:
            if el in IMPORTERS.keys():
                filetype = el
                elements.remove(el)
                break
        else:
            filetype = determine_filetype()
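The for/else lookup quoted above can be modeled in isolation to see why the four cases listed in the PR description come out the way they do. This is only a toy model under stated assumptions: the `IMPORTERS` contents, the `EXTENSION_FORMATS` table, the simplified `determine_filetype(filename)` signature, and the `resolve_import` function are all hypothetical and do not reproduce the actual Mathics3 code paths or messages beyond the tag names mentioned in this PR.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for the IMPORTERS registry: format name -> element names.
IMPORTERS = {
    "XML": {"Plaintext", "Tags", "XMLObject"},
    "Text": {"Plaintext", "Lines"},
}

# Hypothetical extension table, used only when no format is given explicitly.
EXTENSION_FORMATS = {".svg": "SVG", ".xml": "XML", ".txt": "Text"}


def determine_filetype(filename: str) -> Optional[str]:
    """Guess the format from the file extension (toy version)."""
    for ext, fmt in EXTENSION_FORMATS.items():
        if filename.lower().endswith(ext):
            return fmt
    return None


def resolve_import(filename: str, elements: List[str]) -> Tuple[str, object]:
    """Return ("ok", (filetype, remaining)) or (message_tag, culprit)."""
    remaining = list(elements)  # copy, so the caller's list is not mutated
    # The first entry that names a registered format is taken as the file type.
    for el in remaining:
        if el in IMPORTERS:
            filetype = el
            remaining.remove(el)  # removes only the first occurrence
            break
    else:
        # The loop finished without a break: no explicit format was given.
        filetype = determine_filetype(filename)

    if filetype not in IMPORTERS:
        return ("fmtnosup", filetype)
    for el in remaining:
        if el not in IMPORTERS[filetype]:
            return ("noelem", el)
    return ("ok", (filetype, remaining))


if __name__ == "__main__":
    for args in ([], ["XML"], ["XML", "XML"]):
        print(args, "->", resolve_import("file.svg", args))
```

In this model, only the first format name is consumed as the file type, so a second "XML" is left over and treated as an element, which mirrors the Import::noelem case in the test list.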

So, I reduced this PR to a rather lackluster documentation update. Once I add the tests mentioned above, I think this PR will be ready for final review.

rocky · 2024-11-16T16:27:57Z

Since this has been hanging out for a while and the changes are pretty small, why not run black over test/builtin/files_io/test_importexport.py and then we can merge this in.

We can still keep the branch if there is further work you want to do, or we could do the remaining work in a new PR. Up to you.

mmatera · 2024-11-16T16:58:07Z

LGTM, after black....

Allow "fmt" specification for Import.

6227e3f

mmatera reviewed Nov 13, 2024

Remove already implemented functionality + bugfix.

21d21a5

Remove debug code and revert to original.

06538ed

aravindh-krishnamoorthy added 2 commits November 15, 2024 18:37

Add tests for Import with format specification.

9cd25aa

Merge branch 'master' into import-fmt

4f5bd55

aravindh-krishnamoorthy marked this pull request as ready for review November 15, 2024 17:39

Black formatting

4d01c0f

mmatera merged commit d5b7564 into Mathics3:master Nov 16, 2024
13 checks passed
