xml2json, deeply nested, only one record in results #20

Open
dmoco opened this issue Mar 8, 2014 · 14 comments

@dmoco

dmoco commented Mar 8, 2014

I'm trying to convert the data from this resource: http://www.drugbank.ca/downloads, but I only seem to get the first record (DB00001, "Lepirudin") in the conversion. However, if I search for "Lepirudin" in DrugBank's online browser, it finds the record but throws an error when I try to view it, so this may be a problem with the XML rather than with xml2json.

@knadh
Owner

knadh commented Mar 9, 2014

Can you post an excerpt from the XML file?

@dmoco
Author

dmoco commented Mar 10, 2014

Here's the first four records but it's still ~3000 lines: http://www.doconnel.force9.co.uk/t/sample.xml.tar.gz

Btw, if you are checking the source, v3 of the db suffers from the same conversion problem (first record only), although the second record can actually be browsed on their website. I'm still awaiting feedback from them. Thanks for looking into this problem.

@knadh
Owner

knadh commented Mar 13, 2014

Yeah, I was able to reproduce the problem. Only the first record is converted. Let me investigate.

@Zuela

Zuela commented Mar 13, 2015

Having a similar issue with the following xml:
https://www.dropbox.com/s/0fdy2dks9pfdvhv/test.xml?dl=0

Conversion seems to stop after the second record...

Let me know if I can provide any other info!

@knadh
Owner

knadh commented Mar 14, 2015

@Zuela, that's a gigantic XML file! Do you have a smaller sample?

@Zuela

Zuela commented Mar 14, 2015

Sorry about that!! Will subset that file as soon as I can!

@Zuela

Zuela commented Mar 16, 2015

@knadh
Owner

knadh commented Mar 19, 2015

@Zuela, I can confirm that only the first record gets converted, like the last issue. Not able to pinpoint where exactly the issue is.

@onemoretime
Contributor

Hello!
I had the same problem with some other files.
My proposed workaround in xml2json.py:

            val = None
            if elem.text:
                val = elem.text.strip()
                val = val if len(val) > 0 else None
+           elif elem.attrib:
+               val = elem.attrib
+               val = val if len(val) > 0 else None
            block[elem.tag] = val
                    
        return block

Thanx for your work.
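
For readers who want to try the idea outside the library, here is a minimal standalone sketch of the same fallback. It assumes the standard library's `xml.etree.ElementTree`; the helper name `leaf_value` is hypothetical and is not part of xml2json.py.

```python
import xml.etree.ElementTree as ET


def leaf_value(elem):
    """Hypothetical helper mirroring the workaround above.

    Returns the stripped text if there is any, otherwise the attributes
    as a dict, otherwise None.
    """
    val = None
    if elem.text:
        val = elem.text.strip()
        val = val if len(val) > 0 else None
    elif elem.attrib:
        # Copy the attributes into a plain dict so they serialize cleanly.
        val = dict(elem.attrib)
    return val


print(leaf_value(ET.fromstring('<item id="7"/>')))      # {'id': '7'}
print(leaf_value(ET.fromstring('<item> hi </item>')))   # hi
print(leaf_value(ET.fromstring('<item id="7"> </item>')))  # None
```

Because the attribute branch is an `elif`, an element that has whitespace-only text and attributes still yields `None`, as the last call shows. The attributes are only used when the element has no text at all.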

@knadh
Owner

knadh commented Jun 28, 2015

@onemoretime could you please submit a pull request?

onemoretime added a commit to onemoretime/xmlutils.py that referenced this issue Jul 12, 2015
knadh added a commit that referenced this issue Jul 12, 2015
@snowch

snowch commented Nov 18, 2015

This xml still fails even with onemoretime@d1661e0

@dunglehome
Contributor

@snowch, it does not fail after the first record; it processes more than one record (look carefully at the resulting .json file, and do not be misled by one field vs. one record). The issue I am facing is that the script only processes input up to about 26 KB in size. Attached are the fruits.xml file, which I made bigger for testing, and the resulting fruits.json file. I had to rename them so that the upload would accept them.
fruits.resulting.json.txt
fruits.large.xml.txt
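
One way to check records versus fields is to count them in the decoded JSON. The sketch below is only an illustration: the helper `count_records` is hypothetical, and the sample JSON shape (`fruits` containing a list of `fruit` objects) is an assumption, not the actual content of the attached files.

```python
import json


def count_records(node, key):
    """Count the values stored under `key` anywhere in a decoded JSON tree."""
    total = 0
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                # A list under the key holds one record per item.
                total += len(value) if isinstance(value, list) else 1
            total += count_records(value, key)
    elif isinstance(node, list):
        for item in node:
            total += count_records(item, key)
    return total


# Assumed output shape, for illustration only.
sample = json.loads('{"fruits": {"fruit": [{"name": "apple"}, {"name": "pear"}]}}')
print(count_records(sample, "fruit"))  # 2
```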

@dunglehome
Contributor

I have found a fix for this. The code below was tested with Python 2.7.8; note that I commented out the old code.

```python
def get_json(self, pretty=True):
    """
    Convert an XML file to a JSON string.

    Keyword arguments:
    pretty -- pretty print json (default=True)
    """
    # Old code, which read only the first parser event:
    # self.context = iter(self.context)
    # event, root = self.context.next()

    # Exhaust the iterator so the whole file is parsed.
    # `root` ends up as the element from the last event.
    root = None
    for event, root in self.context:
        pass
    return self._elem2json(root, pretty)
```
ps: will submit pull request later.
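
The effect can be reproduced outside the library. The sketch below assumes the parser context is created with the standard library's `xml.etree.ElementTree.iterparse`, which builds the tree incrementally; the function names and the generated `fruits` sample are hypothetical. It contrasts reading only the first event with draining the iterator.

```python
import io
import xml.etree.ElementTree as ET


def build_sample(n):
    """Build an in-memory XML document with n <fruit> records (test data)."""
    records = "".join("<fruit><name>f%d</name></fruit>" % i for i in range(n))
    return ("<fruits>%s</fruits>" % records).encode("utf-8")


def root_after_first_event(source):
    """Read one event only; the tree holds just what has been parsed so far."""
    context = iter(ET.iterparse(source, events=("start", "end")))
    event, root = next(context)
    return root


def root_after_full_parse(source):
    """Drain the iterator so the whole document is parsed."""
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if root is None:
            root = elem  # the first "start" event is the document root
    return root


data = build_sample(5000)
partial = root_after_first_event(io.BytesIO(data))
full = root_after_full_parse(io.BytesIO(data))

# The partial count depends on how much input the parser has read so far.
print(len(partial.findall("fruit")))
print(len(full.findall("fruit")))  # 5000
```

The first count varies with the parser's internal read size, which would be consistent with the size cutoff described above, though that link is an inference rather than something confirmed in this thread.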

dunglehome added a commit to dunglehome/xmlutils.py that referenced this issue Mar 22, 2016
@knadh
Owner

knadh commented Mar 23, 2016

Thanks @dunglehome -- Merged a58d7be
