Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

save params in svg as json #344

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bleything
Copy link
Contributor

@bleything bleything commented Dec 27, 2022

Description

This reworks #327 to store the parameters in json rather than a serialized python dict. This allows for safer parsing without needing to eval python. It also refactors the save thread to pass the entire sketch object in so the thread can handle all the parsing/generating logic locally.

The only potential downside is that the quote marks in json are entity encoded in the svg, so it's quite a bit less readable for humans.

Before:

$ grep dc:source before.svg
        <dc:source>Vsketch with params {'__seed__': 330354623, 'units': 'in', 'paper_size': '10.5x14.8cm', 'landscape': False, 'centered': True, 'margin': 0.09999999999999996, 'fill_pen': 'Gelly Roll Metallic', 'stroke_pen': 'Sharpie Fine Point', 'show_layout': False, 'show_sublayout': False, 'vpype_preview': True, 'rows': 5, 'columns': 3, 'cell_padding': 0.1, 'radius_percent': 0.08, 'jitter_percent': 1.0, 'line_shorten_max': 0.2, 'ball_count': 3}

After:

$ grep dc:source after.svg
        <dc:source>{&quot;units&quot;: &quot;in&quot;, &quot;paper_size&quot;: &quot;10.5x14.8cm&quot;, &quot;landscape&quot;: false, &quot;centered&quot;: true, &quot;margin&quot;: 0.2, &quot;fill_pen&quot;: &quot;Gelly Roll Metallic&quot;, &quot;stroke_pen&quot;: &quot;Sharpie Fine Point&quot;, &quot;show_layout&quot;: true, &quot;show_sublayout&quot;: true, &quot;vpype_preview&quot;: false, &quot;rows&quot;: 1, &quot;columns&quot;: 1, &quot;cell_padding&quot;: 0.1, &quot;radius_percent&quot;: 0.08, &quot;jitter_percent&quot;: 1.0, &quot;line_shorten_max&quot;: 0.2, &quot;ball_count&quot;: 3, &quot;__seed__&quot;: 0}</dc:source>

Personally I think this is a small price to pay for easier programmatic extraction but I'd like to hear what @tyehle thinks.

Checklist

  • feature/fix implemented
  • mypy returns no error
  • tests added/updated and pytest --runslow succeeds
  • documentation added/updated and building with no error (make clean && make html in docs/)
  • examples added/updated
  • code formatting ok (black and isort)

@tyehle
Copy link
Contributor

tyehle commented Dec 28, 2022

The clash of double quotes having special meaning in both json and xml is highly annoying. I'm not sure I'm sold here though because a string.replace("'", '"') should fix problems in all the cases I've seen, and being able to read the file is nice.

Though I don't hold my opinion strongly. I think a use case where that simple string replace wouldn't work would change my tune pretty quick :)

@bleything
Copy link
Contributor Author

I'm not sure I'm following. What would that string replace do?

@tyehle
Copy link
Contributor

tyehle commented Dec 28, 2022

Sorry that should replace single quotes with double quotes so the value can be read by eg json.loads

something like this is what I was thinking of:

>>> import json
>>> import xml.etree.ElementTree as et
>>> 
>>> json.loads(et.parse(open("sample.svg")).getroot().find(".//{http://purl.org/dc/elements/1.1/}source").text[20:].replace("'", '"'))
{'__seed__': 921181624, 'page_size': '4.5inx6.25in', 'scale': 1.0, 'pen_width_mm': 0.25, 'width': 15.0, 'height': 11.0, 'stroke_width': 0.12, 'density': 1.8, 'fuel': 3.0, 'min_length': 0.0, 'frequency': 0.15}

But this will surely only work if the parameters don't contain anything too strange. Ints, floats, bools, and strings should be fine, but if params can hold more complicated values (eg with some kind of nesting) then this will definitely not work.

@bleything
Copy link
Contributor Author

Ah okay, I see what you mean. I'm hesitant on this because it's relying on a coincidence (that python's dict dump format happens to resemble json) rather than using an established, standardized format.

Ints, floats, bools, and strings should be fine...

bools won't work because python capitalizes their names and json is case-sensitive:

>>> json.loads('{"test": true}')
{'test': True}
>>> json.loads('{"test": True}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Cellar/[email protected]/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/opt/homebrew/Cellar/[email protected]/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/homebrew/Cellar/[email protected]/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 10 (char 9)

And then there's the issue of non-json-primitive objects, like you mentioned.

Overall I'm not sure that it makes sense to prioritize human readability for metadata that's embedded in an image file. I'd prefer to have an easily parsed format and a tool that extracts it (which is what I was building in the first place 😊)

An alternative that preserves human readability while making the output more easily parseable would be to emit the attr values in XML. We could create a new namespace rather than overloading dc:source.

That said, I'd still prefer json 😊

@tyehle
Copy link
Contributor

tyehle commented Dec 28, 2022

Yeah good points all around. I agree that putting things into xml attributes isn’t better than json.

@tyehle
Copy link
Contributor

tyehle commented Dec 28, 2022

@abey79 this seems like an improvement. I’d you’d like to merge it that is great to me

@abey79
Copy link
Owner

abey79 commented Dec 29, 2022

It also refactors the save thread to pass the entire sketch object in so the thread can handle all the parsing/generating logic locally.

I'm just wondering if we should be worried with the saver thread and the main thread sharing this object? I haven't looked at the code. Is the _sketch instance recreated every time something changes? I'm thinking of two cases:

  1. the uses interacts with the sketch after clicking the Like button
  2. vsk save with parameter/seed ranges

@bleything
Copy link
Contributor Author

bleything commented Dec 29, 2022

I'm just wondering if we should be worried with the saver thread and the main thread sharing this object?

Good question. To test, I added a 10 second time.sleep to the thread's run method just before the with open("...") line.

the uses interacts with the sketch after clicking the Like button

the like button currently disables all controls until the completed event fires, at which point I don't think it matters any more?

vsk save with parameter/seed ranges

vsk save does not use this thread. whiiiiich also means that it doesn't save the params, so that's something we'll need to address.

@bleything bleything closed this Dec 29, 2022
@bleything bleything reopened this Dec 29, 2022
@bleything
Copy link
Contributor Author

whoops, guess I hit the wrong keyboard shortcut

@tyehle
Copy link
Contributor

tyehle commented Dec 30, 2022

vsk save does not use this thread. whiiiiich also means that it doesn't save the params, so that's something we'll need to address.

If I remember correctly, it does save the vsk save command line or something so you can reproduce

@bleything
Copy link
Contributor Author

If I remember correctly, it does save the vsk save command line or something so you can reproduce

yep, exactly. Which is useful data to have but I think can be reconstructed from the param data and stored seed?

I have some other thoughts about how to proceed but I need a day or two to get them gathered and written down.

@abey79
Copy link
Owner

abey79 commented Dec 31, 2022

I think all save methods (like, vsk save) should lead to the same result. I'm happy of vsk save to have the JSON params instead of the command line. Having both would be nice, but that would likely require changes in vpype as well.

@bleything
Copy link
Contributor Author

bleything commented Dec 31, 2022

I think there's two separate issues that need to be addressed: unifying the svg output codepaths; and reworking param saving. I've opened #345 to discuss the former, let's keep this issue focused on param saving.

I've given this a lot of thought and I think the best compromise is to store the parameters as XML. It's readily machine parseable, more human readable than entity-encoded JSON, and as an extra bonus it's more semantically valid/valuable. We can still store them as JSON as well, but I think the most utility comes from leaning on XML.

I'm not an expert and need to spend more time with the specs to understand what's really possible. I think we'd want to study the various namespaces to find appropriate elements or create a vsketch or vpype namespace, but I was thinking structurally structurally something like this:

<svg ...>
  <metadata>
    <rdf:RDF>
      <vsk:param name="margin" type="number" unit="mm">10</param>
      <vsk:param name="text" type="string">Lorem ipsum dolor sit amet...</param>
      <vsk:param name="show_layout" type="boolean">true</param>
      <vsk:params format="json">e21hcmdpbjogMTAsIHRleHQ6ICJMb3JlbSBpcHN1bSBkb2xvciBzaXQgYW1ldC4uLiJ9Cg==</params>
    </rdf:RDF>
  </metadata>
</svg>

The key things are that each parameter is its own element and the attributes give enough information to parse it. I also like the idea of having a separate element (like <params> above) that contains a serialized version of the params, in this case also base64-encoded to reduce noise and filesize.

What do y'all think? If this sounds good I'm happy to get started on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants