-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
weldx.asdf.util.write_buffer
produces invalid yaml for empty inline ndarrays
#874
Comments
Hi @braingram , thank you for looking into it The code section your are referring to - replacing inline data with empty arrays - was introduced solely for displaying nicely formatted ASDF/YAML outputs without
you can run the following in IPython (or a jupyter notebook) import numpy as np
from weldx import WeldxFile
file = WeldxFile()
data = {"data_sets": {"first": np.random.random(100)}}
file["some_data"] = data
file.header(use_widgets=False) which should give you something like the following #ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
name: asdf, version: 2.15.0}
history:
extensions:
- !core/extension_metadata-1.0.0
extension_class: asdf.extension.BuiltinExtension
software: !core/software-1.0.0 {name: asdf, version: 2.15.0}
some_data:
data_sets:
first: !core/ndarray-1.0.0
data: []
datatype: float64
shape: [100] The idea here was to give a nice overview of an asdf-file layout without having to dive into blocks or very long outputs. note that this was done for building documentation before rework of how asdf files can be displayed nicely - including block info - using the However it is also nice to see if you are working in an IPython environment 😃 Regarding the block manager reworks: Feel free to adapt the block manager system regardless of failures related to this "hack", we are using it simple for educational and debugging purposes. I am sure there will be others ways to implement this again if needed. For comparison with the above example: #ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
name: asdf, version: 2.15.0}
history:
extensions:
- !core/extension_metadata-1.0.0
extension_class: asdf.extension.BuiltinExtension
software: !core/software-1.0.0 {name: asdf, version: 2.15.0}
some_data:
data_sets:
first: !core/ndarray-1.0.0
data: [0.5183324061707298, 0.5864153406728267, 0.43129883329448515, 0.2966977592020418,
0.8527967599717196, 0.23446144037963457, 0.9867198532590175, 0.06633670396891078,
0.8521992505792656, 0.18231458053099048, 0.12300367951872859, 0.3441692161814621,
0.3799700189678208, 0.013940636113156657, 0.15687623429607123, 0.40754367817412973,
0.28729585238378985, 0.5439791537481328, 0.16313199436442805, 0.4732761002950674,
0.9614629413891884, 0.18801270955047433, 0.9479895695724538, 0.1536923113058285,
0.07763350723046869, 0.9547515474500247, 0.40575583086458167, 0.2704415444577999,
0.8079131559671275, 0.8847253547307972, 0.10157186976238397, 0.7015105861330978,
0.547818421014551, 0.9254624653981415, 0.3404192348403935, 0.881353024095736,
0.8149270355334084, 0.6166549469105991, 0.15477183335846123, 0.5715250261569093,
0.6078691478358348, 0.06141479982329068, 0.31847574695092873, 0.4275685731151513,
0.48950691517656775, 0.643197261528769, 0.5173158893388426, 0.11736796192582533,
0.43069271688850064, 0.28736650208640435, 0.9168623508825732, 0.5670388920969732,
0.16125718985260573, 0.525039660922977, 0.6476174446555271, 0.5621777067737439,
0.42761478703587164, 0.9354381681812435, 0.638297006362459, 0.19433030672474427,
0.8693370415795185, 0.6921508043902841, 0.35408082903686244, 0.7549882446138901,
0.9536138354817663, 0.8308795572217292, 0.1700452644969126, 0.6110646056765298,
0.6091297397177062, 0.8306973093576878, 0.45809384566824796, 0.03964057529888587,
0.4900629464491799, 0.8502110168764621, 0.8457705024294935, 0.19468205940998007,
0.7361974986521657, 0.36191957170412603, 0.04762083582742371, 0.14554574036495704,
0.3088550924709902, 0.6866807948258878, 0.5397687189503869, 0.38912267341435913,
0.6044646474821497, 0.47002580735808297, 0.9188407991828724, 0.41494039111231307,
0.8112617882697444, 0.6081429598258545, 0.018017923126878665, 0.9161420392081203,
0.186062441001709, 0.24418392697834157, 0.09763210129872057, 0.8974307635776495,
0.6530967122831681, 0.7305082521858869, 0.5194708845648303, 0.5470488523041508]
datatype: float64
shape: [100]
...
or including block data #ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
name: asdf, version: 2.15.0}
history:
extensions:
- !core/extension_metadata-1.0.0
extension_class: asdf.extension.BuiltinExtension
software: !core/software-1.0.0 {name: asdf, version: 2.15.0}
some_data:
data_sets:
first: !core/ndarray-1.0.0
source: 0
datatype: float64
byteorder: little
shape: [100]
...
ÓBLK 0 � � � �Äü�=þºý›¿»¼øKÕr æÚ_“›â?��‡�ÄŒÝ?—��žÚŸã?OHT£ýÙí?Nàçòc?ß?À~#û,�®? �DàÙœ?þ�@†±‡ß?ñ¡�çÆjå?¬��\›2î?¸z™À�êÔ?�Æ"Yw!Ü?À…ë`oP«? Êû™7Û?àØr��K·?(ÇÎ!CË?ŠN9�ÐÓé?Î!aë—‚ã?x�‡h‡¾Ø?àZ8µJ‹å?¤êÁ¬/vÂ?·»ÐËW2à?�‚yÍKÐ?xÙ-“�&Ö?�L’cg�ë?ÐáìÔÏ›¦?à
R¦&JÈ?Ÿìs��Äí?³�FB�šæ?y–�FȨé?ðš�åcÛ¥?4>�Çn�Æ?�‡H§ö×â?Ê
Ç�:áï?)2ΈAÿâ?Ä{Ožh+Â? vÞê¶êÛ?€–&jœàq?\bTç�[Û?ýˆç%�Üæ?bZ
Kó¡Ñ?êNápƒ@â?ìË�¢�Ì?ÀŽ;,m�É?–2Wiêä?��C’�‘é?ÀÂ�@�{Ñ?‹ÛE×ESæ?"Â}cP´Ú?ˆ
�–hÐÇ?†h�¢á-ì?0-Ù/þ�Ü?ñÝ4â£�á?^�ý˜2Äé?ðÄ
Æ��ª?ú„k)¼=Ò?‰²ˆýcëæ?��½3E˜á?£H˜<¿Fä?†TU‹¹äÚ?,”Ï�Ï|Ò?¥#½|ç�â?�HŒ�¤Dî?³ZhúBùè?Ó�ç¸Á}ê?.VÃj\õì? "9�‘;Ö?;�zà4äæ?K'³˜úÒî?Ä×v$¶¢Ó?�¦}û�ß?"û]€ÈÚ?Œ}²¥�·Ç?¡�3,M�é?Ÿ¥ /À~í?lf ýÍ°Ò?X\yôÏ�Å?TTȤTªê?÷o=S�üë?Õ'•’Ä+à?h}ï�7¶à?B_ž^±iÚ?@ÊbaÎAž?:£"Þ�“Ü?_—�Çë£æ?〠�ÖSá?„b$"u/Í?L&@¤i�é?�Ò�6�@?ä·òÀÏnÍ?Ng¦áöôÖ?Ü�Ÿ?�nÐ?f�¹ÔH�ê?ü��TšXØ?W×Ah�±ì?è?'ö•�³?¿È
ÅñUï? 'sKæ?Æ?p¢Q¼ø·¢?` W@¼1�?#ASDF BLOCK INDEX
%YAML 1.1
---
- 553
...
|
Thanks for the explanation. I will test out the example code you shared. It would be nice to have a nicer looking view of the blocks (or perhaps a way to hide them). Looking at the code that generates the header: Lines 983 to 994 in c2ee9b1
What about setting #ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
name: asdf, version: 3.0.0.dev327+g18ae7518.d20230504}
history:
extensions:
- !core/extension_metadata-1.0.0
extension_class: asdf.extension._manifest.ManifestExtension
extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev327+g18ae7518.d20230504}
some_data:
data_sets:
first: !core/ndarray-1.0.0
source: 0
datatype: float64
byteorder: little
shape: [100] It looks like |
I think using this mock pattern was just a memory optimization. We are forced (are we?) to run through the serialization process to get the header. To avoid side effects during this phase, we take a deep copy of the current ASDF tree. When we serialize into the buffer we would effectively double the memory requirements, if we just copy the binary data of the ndarray buffers. As this method is never (ever) used to deserialize again, we are just fine. Also the context manager of mock.patch removes this behavior after the block has been left. Your suggestion to remove the optimization and solely rely on get_yaml_header would bloat the memory again as we need to write all arrays to the buffer.
This code (3 line) duplication is due to the fact, that the function "write_buffer" creates its own ASDF tree. It exists for the sake of completeness. |
Thanks for the detailed response!
You're correct that generating the header requires serializing the objects in the tree (more on this below).
There is one test where the result is deserialized: weldx/weldx/tests/asdf_tests/test_asdf_util.py Lines 168 to 171 in 7b5c0c1
This test is failing with some ASDF updates (which check the shape when the array is loaded). The above test is removed in: #875 so merging that PR should fix the this issue.
That's a good point about the memory usage and for large arrays seems problematic. One option that doesn't involve writing arrays to the buffer (as long as they're not inline) would be to use Extending the above example: import asdf
import io
import numpy as np
from weldx import WeldxFile
file = WeldxFile()
data = {"data_sets": {"first": np.random.random(100)}}
file["some_data"] = data
buff = io.BytesIO()
af = file._asdf_handle
asdf.yamlutil.dump_tree(af.tree, buff, af)
buff.seek(0)
print(buff.read().decode('ascii')) Produces: %YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
name: asdf, version: 2.15.0}
history:
extensions:
- !core/extension_metadata-1.0.0
extension_class: asdf.extension.BuiltinExtension
software: !core/software-1.0.0 {name: asdf, version: 2.15.0}
some_data:
data_sets:
first: !core/ndarray-1.0.0
source: 0
datatype: float64
byteorder: little
shape: [100]
... I should note that |
thank you for pointing out |
It appears that ASDF internals are being patched to override writing of inline arrays.
weldx/weldx/asdf/util.py
Lines 142 to 143 in c2ee9b1
which was introduced in #469
This is producing yaml (like the following except taken from the
buff
generated duringtest_write_buffer_dummy_inline_arrays
that ASDF is unable to read:ASDF is unable to open this due to:
shape
(ASDF will raise aValueError
as seen here)Would you be able to help me understand the reasoning behind this patching so that hopefully I can figure out how to accommodate this use of ASDF? To provide some context, I am working on a rather major rewrite of the ASDF block management code to move ndarray support to a new style Converter. This has revealed numerous ASDF issues and some of the fixes are impacting weldx (including revealing this issue). Thanks!
The text was updated successfully, but these errors were encountered: