How to structure multiple parsers? #227
Just following up with the solution I'm currently using:

    @ors_home_page "https://www.oregonlegislature.gov/bills_laws/Pages/ORS.aspx"
    @chapter_root "https://www.oregonlegislature.gov/bills_laws/ors/ors"
    @anno_root "https://www.oregonlegislature.gov/bills_laws/ors/ano"

    @impl Crawly.Spider
    def base_url(), do: "https://www.oregonlegislature.gov/"

    @impl Crawly.Spider
    def init() do
      [start_urls: [@ors_home_page]]
    end

    @impl Crawly.Spider
    def parse_item(%{request_url: @ors_home_page} = response) do
      Logger.info("Parsing #{response.request_url}...")
      Parser.parse_home_page(response)
    end

    def parse_item(%{request_url: @chapter_root <> _} = response) do
      Logger.info("Parsing #{response.request_url}...")
      ChapterFile.parse(response)
    end

    def parse_item(%{request_url: @anno_root <> _} = response) do
      Logger.info("Parsing #{response.request_url}...")
      AnnotationFile.parse(response)
    end
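For anyone skimming, here is a minimal, framework-free sketch of the dispatch technique used above: function clauses that match on `request_url`, either exactly or by prefix. The module name, URLs, and return atoms are hypothetical and only illustrate the pattern; a real spider would return whatever `parse_item/1` is expected to return instead. One thing it adds is a fallback clause, since in plain Elixir a URL that matches none of the clauses raises a `FunctionClauseError`.

```elixir
defmodule DispatchSketch do
  # Hypothetical URLs, used only for illustration.
  @home_page "https://example.com/index"
  @item_root "https://example.com/items/"

  # Exact match: the whole URL must equal @home_page.
  def route(%{request_url: @home_page}), do: :home_page

  # Prefix match: `<>` in a pattern binds the rest of the string to `slug`.
  def route(%{request_url: @item_root <> slug}), do: {:item_page, slug}

  # Fallback so that unknown URLs do not raise FunctionClauseError.
  def route(%{request_url: _other}), do: :skip
end

IO.inspect(DispatchSketch.route(%{request_url: "https://example.com/index"}))
IO.inspect(DispatchSketch.route(%{request_url: "https://example.com/items/42"}))
IO.inspect(DispatchSketch.route(%{request_url: "https://example.com/about"}))
```

Clause order matters here: Elixir tries the clauses top to bottom, so the catch-all has to come last.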
I've read the docs, but I'm still a little unsure about it. Say I have two page types: a home page and item pages. My non-framework way to handle this is:
So, I'm using plain Elixir pattern matching on response properties to choose a parser. What would this code look like if implemented using Response Parsers? Could someone expand on the expected return type? (Maybe Crawly would benefit from a ResponseParser behavior?)
So it must return a tuple, and the first item must be a `ParsedItem`?

How should Response Parsers choose to not process a Response? I'm guessing by just returning an empty `ParsedItem`? Will the framework call each Response Parser in turn?