Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PS3.5 H3.2 compliant #445

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

9enki
Copy link

@9enki 9enki commented Nov 27, 2023

H3 of PS3.5 indicates that multiple encoding formats are applied within the Patient Name.
https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_H.3.html
https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_H.3.2.html

Currently, although it is possible to obtain multiple Specification Character Sets as Strs, decoding in the above case seems to decode at the first element of the Specification Character Set.

I would like to be able to decode the Patient Name in this PR in the correct way when it is encoded in multiple encoding formats.

It is likely that PS3.5 will be able to promptly support PS3.5 Chapter I Korean and Chapter J Chinese by utilizing this support.

@Enet4 Enet4 mentioned this pull request Nov 27, 2023
6 tasks
@9enki
Copy link
Author

9enki commented Nov 27, 2023

@Enet4 Hello. I would like to do what we could not do in PR #444 here.

Let me ask you two questions.

  • I have a function called read_value_pns. I want to execute this function when VR::PN. If Patient Name does not contain delimiter =, read_value_pns executes read_value_strs as before. Is this an acceptable method of implementation for you? If this is a way to deviate from the design concept, I would be glad to get better advice.
    • If Patient Name does not contain delimiter =, read_value_pns executes read_value_str as before. I found that read_value_pns reads the value from the buffer, so it would be a bug if read_value_strs was executed afterwards. I will come up with a fix for this.
  • (If the answer to the first question is yes) Is there any way to get all the values of a Specification Character Set in a read_value_pns? The current implementation of StatefulDecoder keeps only the first one, even if there are multiple Specific Character Sets, as shown here. Since changing self.text from SpecificCharacterSet to Vec<SpecificCharacterSet> is a big change, if there is another way to get the value of the Specification Character Set I would prefer to use that method. I would prefer to use that method if I can get the Specification Character Set value another way.

@9enki 9enki force-pushed the japanese-character-improved branch 2 times, most recently from d31a5e5 to 1be93b0 Compare November 27, 2023 14:41
@9enki
Copy link
Author

9enki commented Nov 27, 2023

If Patient Name does not contain delimiter =, read_value_pns executes read_value_str as before. I found that read_value_pns reads the value from the buffer, so it would be a bug if read_value_strs was executed afterwards. I will come up with a fix for this.

The process after reading information from the buffer of read_value_strs is split into separate processes named read_value_strs_impl, which can be called from both read_value_strs and read_value_pns.
https://github.com/Enet4/dicom-rs/pull/445/files#diff-65231185718d602fd3a63db21c6076af076fd9afb6c3d4303c6baeaffa2e1e13R385

@9enki 9enki force-pushed the japanese-character-improved branch 2 times, most recently from c86f433 to fe26c98 Compare November 27, 2023 17:31
@9enki
Copy link
Author

9enki commented Nov 27, 2023

(If the answer to the first question is yes) Is there any way to get all the values of a Specification Character Set in a read_value_pns? The current implementation of StatefulDecoder keeps only the first one, even if there are multiple Specific Character Sets, as shown here. Since changing self.text from SpecificCharacterSet to Vec is a big change, if there is another way to get the value of the Specification Character Set I would prefer to use that method. I would prefer to use that method if I can get the Specification Character Set value another way.

  • I added a member of type Vec<TC> called cs to StatefulDecoder, where Specification Character Set is added each time set_character_set is executed. And modified read_value_pns to decode the character using the value of cs.
  • I understand that set_character_set is executed by read_value_cs. Fixed set_character_set to run on all character encoding information present in the Specification Character Set, since read_value_cs was implemented in such a way that only the first element was added to the text. Fixed.

@9enki 9enki changed the title [WIP] PS3.5 H3.2 compliant PS3.5 H3.2 compliant Nov 27, 2023
@9enki
Copy link
Author

9enki commented Nov 27, 2023

@Enet4 Implementation of this PR is complete. I would appreciate a review when you have time.

@9enki 9enki force-pushed the japanese-character-improved branch from fe26c98 to 1af117b Compare November 27, 2023 17:46
@Enet4 Enet4 self-requested a review November 28, 2023 11:32
@Enet4 Enet4 added A-lib Area: library C-encoding Crate: dicom-encoding labels Nov 28, 2023
@Enet4
Copy link
Owner

Enet4 commented Dec 7, 2023

Hello, @9enki! I'll be looking into DICOM-rs matters the upcoming weeks after a small hiatus.

  • I have a function called read_value_pns. I want to execute this function when VR::PN. If Patient Name does not contain delimiter =, read_value_pns executes read_value_strs as before. Is this an acceptable method of implementation for you? If this is a way to deviate from the design concept, I would be glad to get better advice.

The way I see it, the concept of Person Name should include all component groups, rather than just one (alphabetic, ideographic, and phonetic), so they would all sit in a single PrimitiveValue. So, once we find out that we have component groups other than the alphabetical one, we decode each group and merge them back together. From the little I saw of the code so far, that seems to be the intended way to go. In any case, I am likely to give it another look later.

  • Is there any way to get all the values of a Specification Character Set in a read_value_pns? The current implementation of StatefulDecoder keeps only the first one, even if there are multiple Specific Character Sets, as shown here. Since changing self.text from SpecificCharacterSet to Vec<SpecificCharacterSet> is a big change, if there is another way to get the value of the Specification Character Set I would prefer to use that method. I would prefer to use that method if I can get the Specification Character Set value another way.

The main problem is that SpecificCharacterSet is too simple right now. The way forward is to adjust this data type to support more than one character set value. The steps would be something like this:

  1. The existing idea of a SpecificCharacterSet would become a CharacterSetCode (a single encoding).
  2. SpecificCharacterSet is recreated as an opaque structure that can hold up to n codes (depending on which they are; in ISO_IR 192 and some others, only 1 value would be acceptable).

Aside from the fact that one would no longer be able to introspect into the variants available, I think that most of the API would stay intact.

#[derive(Debug)]
pub struct SpecificCharacterSet(SmallVec<[CharacterSetCode; 2]>);

As we are now gathering enough material for a major version (0.7), we can proceed with breaking changes on SpecificCharacterSet and PersonName to fulfill these needs.

If any further guidance is needed, feel free to let me know.

@Enet4 Enet4 added this to the DICOM-rs 0.7 milestone Mar 13, 2024
@Enet4
Copy link
Owner

Enet4 commented Mar 13, 2024

Hello again. This contribution is a candidate for inclusion in 0.7.0, but then we need to make the proposed change of changing SpecificCharacterSet so that the type becomes opaque and a single value encompasses the possibility of multiple character set codings. Please let me know if you would be able to fulfill this or you would prefer someone else to take over.

@Enet4 Enet4 added breaking change Hint that this may require a major version bump on release and removed breaking change Hint that this may require a major version bump on release labels Mar 28, 2024
@Enet4 Enet4 force-pushed the japanese-character-improved branch from c8de225 to 757e126 Compare March 28, 2024 12:09
@Enet4 Enet4 force-pushed the japanese-character-improved branch from 757e126 to 645cf11 Compare March 28, 2024 12:27
@Enet4 Enet4 removed this from the DICOM-rs 0.7 milestone Apr 10, 2024
@Enet4
Copy link
Owner

Enet4 commented Apr 13, 2024

Just letting you know that #489 was merged. It made SpecificCharacterSet opaque, so further contributions on text encoding can be brought in without breaking changes to the API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-lib Area: library C-encoding Crate: dicom-encoding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants