i want to test your crawler help me!!! #632

Open
DeveloperYoungsuk opened this issue Aug 19, 2019 · 5 comments

Comments

@DeveloperYoungsuk

I'm planning to test your crawler against Scrapy,

but I don't have enough example code.

Can you provide some example code for me?

Thank you for making such awesome software!

@essiembre
Contributor

What type of code sample are you looking for?

We try to make the HTTP Collector easy for developers to extend and work with, but most people use it as a stand-alone command-line application.

You can find out how to get started here: https://www.norconex.com/collectors/collector-http/getting-started

If you are wondering how to handle specific use cases not addressed in the documentation or other tickets, feel free to describe them here.

@DeveloperYoungsuk
Author

How can I parse JSON?

@essiembre
Contributor

JSON files will get crawled. Can you elaborate on what you are trying to do exactly? If you mean parsing a JSON file so some fields can be stored as metadata fields, for instance, you can do so by implementing your own IDocumentTagger.
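To make "storing JSON fields as metadata fields" concrete, here is a minimal sketch of the flattening step that a custom tagger might perform. It is written in plain Python rather than against the Norconex `IDocumentTagger` interface, so it only illustrates the idea. The function names `flatten_json` and `json_to_metadata`, the dotted field naming, and the multi-valued dictionary are all assumptions made for this example, not part of any Norconex API.

```python
import json


def flatten_json(value, prefix=""):
    """Yield (field_name, string_value) pairs for every leaf of a parsed JSON value.

    Nested object keys are joined with dots (a naming choice assumed here).
    Array items reuse the parent field name, so the field becomes multi-valued.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}.{key}" if prefix else key
            yield from flatten_json(child, name)
    elif isinstance(value, list):
        for child in value:
            yield from flatten_json(child, prefix)
    else:
        # Keep strings as-is; render numbers, booleans and null in JSON notation.
        yield prefix, value if isinstance(value, str) else json.dumps(value)


def json_to_metadata(text):
    """Parse JSON text and collect its leaves into a {field: [values]} dict."""
    metadata = {}
    for name, val in flatten_json(json.loads(text)):
        metadata.setdefault(name, []).append(val)
    return metadata


if __name__ == "__main__":
    sample = '{"title": "Hello", "author": {"name": "Ann"}, "tags": ["a", "b"]}'
    for field, values in json_to_metadata(sample).items():
        print(field, values)
```

A real tagger would write these field/value pairs into the document's metadata instead of returning a dictionary. Note that a top-level scalar would end up under an empty field name in this sketch.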

If you know the pattern of just a few items to extract, you can use another approach such as with a TextPatternTagger. Alternatively, we can make it a feature request to have JSON parsing options built-in.
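The pattern-based alternative can be sketched the same way. The example below shows the general idea of pulling a known item out of raw JSON text with a regular expression, without parsing the whole document. It is plain Python and does not reproduce how `TextPatternTagger` is configured. The field name `price` and the helper `extract_prices` are hypothetical, chosen only for illustration.

```python
import re

# Matches `"price": <number>` anywhere in the text and captures the number.
# The field name "price" is a hypothetical example.
PRICE_PATTERN = re.compile(r'"price"\s*:\s*(-?\d+(?:\.\d+)?)')


def extract_prices(text):
    """Return every numeric value that follows a "price" key, as strings."""
    return PRICE_PATTERN.findall(text)


if __name__ == "__main__":
    sample = '{"items": [{"price": 9.99}, {"price": 12}]}'
    print(extract_prices(sample))
```

This approach is only reasonable when the items to extract follow a simple, predictable pattern. A regular expression does not understand nesting or escaping, so anything more involved calls for real JSON parsing.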

@DeveloperYoungsuk
Author

DeveloperYoungsuk commented Aug 31, 2019 via email

@essiembre
Contributor

What do you want to do with the information? Store each field/value in a separate field in your database or search engine? Something else?

I suggest you look at available options for manipulating content (e.g. taggers, splitters, etc.) here: https://www.norconex.com/collectors/importer/configuration

Maybe you will find ways to break the JSON the way you want. Otherwise, you will have to create your own or wait for that feature to be implemented.

I am marking this as a feature request to have a tagger that can extract data from JSON (similar to DOMTagger).
