firstNodeMatchingSelector return nil when looking for node which exist #61

Open
kivu opened this issue Feb 9, 2016 · 5 comments

kivu commented Feb 9, 2016

Hi,
I'm trying to parse an HTML document to get the text from two tags:
"Some text to display.image_name_to_display.jpg"
so I use this code:
HTMLDocument *document = [HTMLDocument documentWithString:self.content]; // content is the HTML above
NSString *handAndImageStr = [document firstNodeMatchingSelector:@"hand"].textContent;
if (handAndImageStr) {
    NSString *imgStr = [document firstNodeMatchingSelector:@"image"].textContent;
}

After this, imgStr is nil instead of "image_name_to_display.jpg".

I'm using HTMLReader 0.9.4.

Owner

nolanw commented Feb 9, 2016

Hello! Did your HTML make it into the issue intact? If not, try surrounding it in backticks or triple-backticks to preserve the formatting.

Otherwise I'm left to guess at what's going on. Is it possible that it's an img element, not an image element, that you're looking for? And <img> doesn't generally have any text content, so I'm suspicious of that too. Is it possible you're looking to get at the src attribute? i.e. [document firstNodeMatchingSelector:@"img"][@"src"].

Let me know if any of that is helpful, or if I've misunderstood the HTML you're trying to scrape!

Author

kivu commented Feb 9, 2016

Ahh, sorry, I didn't notice that my tags were gone ;/ This is my HTML with the tags:
<hand>Text to display <image>image to display.jpg</image></hand>

I tried to use
[document firstNodeMatchingSelector:@"image"][@"src"] or
[document firstNodeMatchingSelector:@"img"][@"src"]
but neither of them works.

Owner

nolanw commented Feb 9, 2016

I'm still a bit suspicious that your text is actually HTML. Is it possible it's actually XML, or something else entirely?

If you put <image> into an HTML document, it'll get parsed as if you put <img>. You'll probably notice that [document firstNodeMatchingSelector:@"image"] returns nil. This is why.

Additionally, <img> elements simply aren't allowed to have text or child elements or anything like that. Anything you try to put inside <img> gets moved outside of it.

Putting the above two points together, if your document looks (in part) like this:

Ahoy <img>there</img> sailor

it actually gets parsed as if you wrote this:

Ahoy <img />there sailor

(See how the "there" popped out of the <img>?) Unfortunately, if you just want the text that looked like it was between <image> and </image>, you probably can't do it reliably.
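
One way to see this for yourself is the small sketch below. It only uses the HTMLReader calls already shown in this thread (documentWithString: and firstNodeMatchingSelector:). The import path and the helper name FirstElementMatching are assumptions for illustration, not part of the library.

```objc
#import <Foundation/Foundation.h>
#import <HTMLReader/HTMLReader.h> // assumption: HTMLReader is linked as a framework

// Hypothetical helper: parse a string as HTML and return the first element
// matching the selector, or nil if nothing matches.
static HTMLElement *FirstElementMatching(NSString *html, NSString *selector) {
    HTMLDocument *document = [HTMLDocument documentWithString:html];
    return [document firstNodeMatchingSelector:selector];
}

// Logs what the parser made of the markup from this issue.
static void LogImageParsing(void) {
    NSString *html = @"<hand>Text to display <image>image to display.jpg</image></hand>";

    // If <image> was rewritten to <img> as described above, this logs (null).
    NSLog(@"image: %@", FirstElementMatching(html, @"image"));

    // The element should instead be reachable under the name img.
    HTMLElement *img = FirstElementMatching(html, @"img");
    NSLog(@"img: %@, text inside img: '%@'", img, img.textContent);

    // The file name ends up as text next to the <img>, so it is mixed into
    // the text content of the enclosing <hand> element.
    NSLog(@"hand text: '%@'", FirstElementMatching(html, @"hand").textContent);
}
```

Calling LogImageParsing() and comparing the three log lines against the explanation above should show where the text between <image> and </image> ended up.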

I hope that all made sense, I realize it's pretty confusing. Can you share the full document you're trying to parse (obfuscating any private data of course)? Maybe I can think of a more suitable tool.

Author

kivu commented Feb 11, 2016

Thanks for your fast answer.
I had to check what exactly the app received from the server, and you were right: this is not HTML :(
The app received a dictionary with some XML text:

{
    elements = (
        {
            text = "<hand>some text to display <image> file_name.jpg </image></hand>";
        }
    );
    id = "204";
    time = "2016-02-11 12:15:00";
    timeSort = "2016-02-11 12:15:00";
},

and then parsed it to get the contents of the "hand" and "image" tags.
While checking what might cause this, I found that when the app used the old version 0.5.9, the text from the "image" tag was somehow parsed without problems.
I will try to get the image name out in some other way.

Owner

nolanw commented Feb 11, 2016

It looks like XML to me, so you could try using NSXMLParser (built in to iOS and OS X) or a library like KissXML (there are many, many XML libraries for iOS and OS X, that's just one I looked up).
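
As a rough illustration of the NSXMLParser route, here is a minimal sketch that collects the text directly inside each element. It assumes the text value is a well-formed XML fragment with a single root element, like the one posted above. TagTextCollector and TextByTagInXMLFragment are hypothetical names made up for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: accumulates the text found directly inside each
// element, keyed by element name.
@interface TagTextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, readonly) NSMutableDictionary<NSString *, NSMutableString *> *textByTag;
@end

@implementation TagTextCollector {
    NSMutableArray<NSString *> *_openTags; // stack of currently open element names
}

- (instancetype)init {
    if ((self = [super init])) {
        _textByTag = [NSMutableDictionary dictionary];
        _openTags = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict {
    [_openTags addObject:elementName];
    if (!_textByTag[elementName]) {
        _textByTag[elementName] = [NSMutableString string];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // Characters are credited to the innermost open element only.
    NSString *current = _openTags.lastObject;
    if (current) {
        [_textByTag[current] appendString:string];
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    [_openTags removeLastObject];
}

@end

// Hypothetical helper: returns the trimmed text per element name,
// or nil if the string is not well-formed XML.
static NSDictionary<NSString *, NSString *> *TextByTagInXMLFragment(NSString *xml) {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    TagTextCollector *collector = [TagTextCollector new];
    parser.delegate = collector; // the parser does not retain its delegate
    if (![parser parse]) {
        return nil;
    }
    NSMutableDictionary<NSString *, NSString *> *result = [NSMutableDictionary dictionary];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    [collector.textByTag enumerateKeysAndObjectsUsingBlock:^(NSString *tag, NSMutableString *text, BOOL *stop) {
        result[tag] = [text stringByTrimmingCharactersInSet:whitespace];
    }];
    return result;
}
```

With the text value from the dictionary above, TextByTagInXMLFragment(text)[@"image"] should give the file name and [@"hand"] the display text. Because characters are credited only to the innermost open element, the file name is not mixed into the "hand" text.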

Are you saying the current version of HTMLReader parses that text differently from version 0.5.9? If so I should take a look at that, there might be a bug there.
