ADF template updates to align with Healthcare data solution in MS Fabric #645

Merged



@baljeetsethi3 commented Feb 7, 2024

This PR updates the "Copy DICOM Metadata Changes to ADLS Gen2 in Delta Format" ADF template to align it with the Healthcare data solution in MS Fabric.

  • Changes to the instance table data flow
    • All column names from the current implementation are converted to lower case, except partitionName and lastModifiedTimestamp. For example, studyDate becomes studydate.
    • No existing columns are removed. Instead, the following 19 DICOM tags are now extracted from the metadata object and exposed as additional columns: patientbirthdate, accessionnumber, referringphysicianname, modalitiesinstudy, performedprocedurestepstartdate, manufacturermodelname, studytime, timezoneoffsetfromutc, numberofstudyrelatedseries, numberofstudyrelatedinstances, seriesnumber, seriesdescription, numberofseriesrelatedinstances, bodypartexamined, laterality, seriesdate, seriestime, instancenumber, documenttitle.
    • Three further columns are also added:
      • metadata - the metadata object returned by the change feed API
      • metadata_string - the stringified version of the metadata object
      • created_date - the load timestamp, set with currentTimestamp()
    • The metadata column from the AHDS change feed API is projected differently than in the original implementation:
      Current projection: (screenshot not available)
      Proposed projection: (screenshot not available)
    • In the current implementation, the studyDate column is formatted as a date type (yyyy-mm-dd). With the proposed changes there are two columns: studydate, which holds the raw value from the metadata (yyyymmdd), and studydate_formatted, which holds the formatted value (yyyy-mm-dd).
  • Changes to the series table data flow
    • This data flow reads from the instance table. Because the instance columns are now lower case, the series data flow projects them back to the camel-case names it expects.
  • No changes were made to the study table data flow.
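The column changes above can be sketched in Python. This is a minimal illustration, not the template's data flow logic: it assumes the change feed metadata is a DICOM JSON object keyed by hex tag (as in the DICOM JSON model), covers only a small subset of the extracted tags, and the camel-case names in the series projection are hypothetical.

```python
import json
from datetime import datetime, timezone

# Illustrative subset of lowercased column names and their DICOM hex tags.
# The real template extracts more tags; this subset is an assumption.
TAGS = {
    "studyinstanceuid": "0020000D",
    "seriesinstanceuid": "0020000E",
    "studydate": "00080020",
    "patientbirthdate": "00100030",
    "accessionnumber": "00080050",
    "seriesnumber": "00200011",
    "bodypartexamined": "00180015",
}


def first_value(metadata: dict, tag: str):
    """Return the first entry of a DICOM JSON attribute's "Value" array, or None."""
    values = metadata.get(tag, {}).get("Value") or [None]
    return values[0]


def format_dicom_date(raw):
    """Convert a raw yyyymmdd date to yyyy-mm-dd; None if missing or malformed."""
    if not raw:
        return None
    try:
        return datetime.strptime(raw, "%Y%m%d").strftime("%Y-%m-%d")
    except ValueError:
        return None


def to_instance_row(metadata: dict, partition_name: str, last_modified: str) -> dict:
    """Build an instance-table row with lowercase tag columns plus the new columns."""
    row = {name: first_value(metadata, tag) for name, tag in TAGS.items()}
    row["studydate_formatted"] = format_dicom_date(row["studydate"])  # raw value kept in studydate
    row["metadata"] = metadata
    row["metadata_string"] = json.dumps(metadata)
    row["created_date"] = datetime.now(timezone.utc).isoformat()
    # These two columns keep their original casing.
    row["partitionName"] = partition_name
    row["lastModifiedTimestamp"] = last_modified
    return row


# Hypothetical projection back to the camel-case names a series data flow might expect.
SERIES_PROJECTION = {
    "studyInstanceUid": "studyinstanceuid",
    "seriesInstanceUid": "seriesinstanceuid",
    "studyDate": "studydate_formatted",
    "partitionName": "partitionName",
}


def project_for_series(row: dict) -> dict:
    """Rename lowercase instance columns to camel case for the series step."""
    return {camel: row.get(lower) for camel, lower in SERIES_PROJECTION.items()}


if __name__ == "__main__":
    sample = {
        "0020000D": {"vr": "UI", "Value": ["1.2.3"]},
        "0020000E": {"vr": "UI", "Value": ["1.2.3.4"]},
        "00080020": {"vr": "DA", "Value": ["20240207"]},
        "00100030": {"vr": "DA"},  # attribute present but empty
    }
    row = to_instance_row(sample, "default-partition", "2024-02-07T00:00:00Z")
    print(row["studydate"], row["studydate_formatted"])  # 20240207 2024-02-07
    print(project_for_series(row))
```

In the template these steps are ADF data flow transformations; the sketch only mirrors the lowercase naming, the raw-versus-formatted studydate split, and the rename back to camel case for the series step.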


@mmitrik commented Feb 13, 2024

Whereas more dicom tags (around 15-20) are extracted from the metadata object and exposed as columns in the table.

@baljeetsethi3 The diff makes it challenging to see the list of columns that were added. Can you provide a summary in the description above or in a comment? 🙏 I think this would be helpful to other reviewers as well 😄 Thanks!

@baljeetsethi3 reopened this Feb 27, 2024
@zhimadaren merged commit 02f9c60 into Azure:main Mar 1, 2024
3 checks passed