XML reader is not working as expected #86

Open
shrutimantri opened this issue Feb 5, 2024 · 0 comments
Labels: area/plugin (Plugin-related issue or feature request), bug (Something isn't working)

Expected Behavior

When an XML file whose root element is items and whose children are item elements is read, each record should be written to the ion file on its own, without an items or item wrapper key.
Example:
The following XML file:

<?xml version='1.0' encoding='UTF-8'?>
<items>
  <item>
    <job_title>BI Data Analyst</job_title>
    <avg_salary>836644.8</avg_salary>
  </item>
  <item>
    <job_title>ML Engineer</job_title>
    <avg_salary>679247.63</avg_salary>
  </item>
  <item>
    <job_title>Data Science Manager</job_title>
    <avg_salary>391371.17</avg_salary>
  </item>
  <item>
    <job_title>Business Data Analyst</job_title>
    <avg_salary>286000.0</avg_salary>
  </item>
  <item>
    <job_title>Data Scientist</job_title>
    <avg_salary>257422.32</avg_salary>
  </item>
  <item>
    <job_title>Computer Vision Engineer</job_title>
    <avg_salary>220583.33</avg_salary>
  </item>
  <item>
    <job_title>AI Scientist</job_title>
    <avg_salary>193666.67</avg_salary>
  </item>
  <item>
    <job_title>Applied Scientist</job_title>
    <avg_salary>190614.29</avg_salary>
  </item>
  <item>
    <job_title>Machine Learning Engineer</job_title>
    <avg_salary>175270.55</avg_salary>
  </item>
  <item>
    <job_title>Research Scientist</job_title>
    <avg_salary>161292.29</avg_salary>
  </item>
  <item>
    <job_title>Data Architect</job_title>
    <avg_salary>160283.26</avg_salary>
  </item>
  <item>
    <job_title>Data Engineer</job_title>
    <avg_salary>157510.03</avg_salary>
  </item>
  <item>
    <job_title>Machine Learning Scientist</job_title>
    <avg_salary>154638.64</avg_salary>
  </item>
  <item>
    <job_title>Research Engineer</job_title>
    <avg_salary>146618.11</avg_salary>
  </item>
  <item>
    <job_title>Analytics Engineer</job_title>
    <avg_salary>142703.15</avg_salary>
  </item>
  <item>
    <job_title>Data Science Consultant</job_title>
    <avg_salary>141937.5</avg_salary>
  </item>
  <item>
    <job_title>Data Analytics Manager</job_title>
    <avg_salary>141463.33</avg_salary>
  </item>
  <item>
    <job_title>Machine Learning Infrastructure Engineer</job_title>
    <avg_salary>141076.36</avg_salary>
  </item>
  <item>
    <job_title>BI Developer</job_title>
    <avg_salary>129846.15</avg_salary>
  </item>
  <item>
    <job_title>Data Specialist</job_title>
    <avg_salary>122083.33</avg_salary>
  </item>
  <item>
    <job_title>Data Manager</job_title>
    <avg_salary>120203.05</avg_salary>
  </item>
  <item>
    <job_title>Data Analyst</job_title>
    <avg_salary>116348.29</avg_salary>
  </item>
</items>

should be read by the XML reader as:
Screenshot 2024-02-05 at 1 41 38 PM
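The screenshot is not reproduced in text, so here is a minimal sketch of the shape the description above asks for: one flat record per item element. It uses Python's standard library only and is not Kestra's implementation. The names `expected_records` and `coerce` are hypothetical, and converting numeric-looking text to numbers is an assumption, based on the numeric values that appear in the reader output reported under Actual Behaviour.

```python
import xml.etree.ElementTree as ET

# First two rows of the XML file from this issue.
SAMPLE = b"""<?xml version='1.0' encoding='UTF-8'?>
<items>
  <item>
    <job_title>BI Data Analyst</job_title>
    <avg_salary>836644.8</avg_salary>
  </item>
  <item>
    <job_title>ML Engineer</job_title>
    <avg_salary>679247.63</avg_salary>
  </item>
</items>
"""


def coerce(text):
    """Assumption: numeric-looking text becomes a float, anything else stays a string."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return text


def expected_records(xml_bytes):
    """Return one flat dict per <item>, with no items/item wrapper key."""
    root = ET.fromstring(xml_bytes)
    return [
        {child.tag: coerce(child.text) for child in item}
        for item in root.findall("item")
    ]


if __name__ == "__main__":
    for record in expected_records(SAMPLE):
        print(record)
```

Each dict returned here corresponds to one record that the issue expects in the ion file.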

Actual Behaviour

The same XML file as in Expected Behavior is read by the XML reader as a single record, with every row nested under an item key:

{"item":[{"avg_salary":836644.8,"job_title":"BI Data Analyst"},{"avg_salary":679247.63,"job_title":"ML Engineer"},{"avg_salary":391371.17,"job_title":"Data Science Manager"},{"avg_salary":286000,"job_title":"Business Data Analyst"},{"avg_salary":257422.32,"job_title":"Data Scientist"},{"avg_salary":220583.33,"job_title":"Computer Vision Engineer"},{"avg_salary":193666.67,"job_title":"AI Scientist"},{"avg_salary":190614.29,"job_title":"Applied Scientist"},{"avg_salary":175270.55,"job_title":"Machine Learning Engineer"},{"avg_salary":161292.29,"job_title":"Research Scientist"},{"avg_salary":160283.26,"job_title":"Data Architect"},{"avg_salary":157510.03,"job_title":"Data Engineer"},{"avg_salary":154638.64,"job_title":"Machine Learning Scientist"},{"avg_salary":146618.11,"job_title":"Research Engineer"},{"avg_salary":142703.15,"job_title":"Analytics Engineer"},{"avg_salary":141937.5,"job_title":"Data Science Consultant"},{"avg_salary":141463.33,"job_title":"Data Analytics Manager"},{"avg_salary":141076.36,"job_title":"Machine Learning Infrastructure Engineer"},{"avg_salary":129846.15,"job_title":"BI Developer"},{"avg_salary":122083.33,"job_title":"Data Specialist"},{"avg_salary":120203.05,"job_title":"Data Manager"},{"avg_salary":116348.29,"job_title":"Data Analyst"}]}
Screenshot 2024-02-05 at 1 43 14 PM
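Until the reader emits one record per row, the wrapped output can be flattened after the fact. The sketch below shows one way to do that in plain Python. It is not part of Kestra, `unwrap_items` is a hypothetical helper, and the handling of a single non-list item is an assumption about what the reader might emit for a file with only one row.

```python
import json

# First two rows of the actual reader output reported in this issue.
ACTUAL = (
    '{"item":[{"avg_salary":836644.8,"job_title":"BI Data Analyst"},'
    '{"avg_salary":679247.63,"job_title":"ML Engineer"}]}'
)


def unwrap_items(record, key="item"):
    """Flatten a {"item": [...]} wrapper into a list of row dicts.

    Records that are not wrapped this way are returned unchanged,
    as a one-element list.
    """
    if isinstance(record, dict) and set(record) == {key}:
        value = record[key]
        if isinstance(value, list):
            return value
        if isinstance(value, dict):
            # Assumption: a single row may arrive as a dict rather than a list.
            return [value]
    return [record]


if __name__ == "__main__":
    for row in unwrap_items(json.loads(ACTUAL)):
        print(json.dumps(row))
```

This only changes the shape of the data. It does not address why the reader nests the rows in the first place.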

Steps To Reproduce

  1. Run the following flow:
id: xml-writer
namespace: company.team
description: Analyse data salaries.
tasks:
  - id: download_csv
    type: io.kestra.plugin.fs.http.Download
    description: Data Job salaries from 2020 to 2023 (source ai-jobs.net)
    uri: https://gist.githubusercontent.com/Ben8t/f182c57f4f71f350a54c65501d30687e/raw/940654a8ef6010560a44ad4ff1d7b24c708ebad4/salary-data.csv

  - id: average_salary_by_position
    type: io.kestra.plugin.jdbc.duckdb.Query
    inputFiles:
      data.csv: "{{ outputs.download_csv.uri }}"
    sql: |
      SELECT
        job_title,
        ROUND(AVG(salary), 2) AS avg_salary
      FROM read_csv_auto('{{workingDir}}/data.csv', header=True)
      GROUP BY job_title
      HAVING COUNT(job_title) > 10
      ORDER BY avg_salary DESC;
    store: true

  - id: export_result
    type: io.kestra.plugin.serdes.xml.XmlWriter
    from: "{{ outputs.average_salary_by_position.uri }}"

  - id: xml_reader
    type: io.kestra.plugin.serdes.xml.XmlReader
    from: "{{ outputs.export_result.uri }}"
  2. Check the output of the xml_reader task.

Environment Information

  • Kestra Version: 0.13.8
  • Plugin version: 0.13.8
  • Operating System (OS / Docker / Kubernetes): Docker
  • Java Version (If not docker):

Example flow

id: xml-writer
namespace: company.team
description: Analyse data salaries.
tasks:
  - id: download_csv
    type: io.kestra.plugin.fs.http.Download
    description: Data Job salaries from 2020 to 2023 (source ai-jobs.net)
    uri: https://gist.githubusercontent.com/Ben8t/f182c57f4f71f350a54c65501d30687e/raw/940654a8ef6010560a44ad4ff1d7b24c708ebad4/salary-data.csv

  - id: average_salary_by_position
    type: io.kestra.plugin.jdbc.duckdb.Query
    inputFiles:
      data.csv: "{{ outputs.download_csv.uri }}"
    sql: |
      SELECT
        job_title,
        ROUND(AVG(salary), 2) AS avg_salary
      FROM read_csv_auto('{{workingDir}}/data.csv', header=True)
      GROUP BY job_title
      HAVING COUNT(job_title) > 10
      ORDER BY avg_salary DESC;
    store: true

  - id: export_result
    type: io.kestra.plugin.serdes.xml.XmlWriter
    from: "{{ outputs.average_salary_by_position.uri }}"

  - id: xml_reader
    type: io.kestra.plugin.serdes.xml.XmlReader
    from: "{{ outputs.export_result.uri }}"

@shrutimantri shrutimantri added the bug Something isn't working label Feb 5, 2024
@github-project-automation github-project-automation bot moved this to Backlog in Issues Jun 10, 2024
@anna-geller anna-geller added the area/plugin Plugin-related issue or feature request label Feb 14, 2025