Cannot set Mechanize page via Metadata's page method #42

derantell · 2014-12-28T21:04:25Z

When trying to use a pre-fetched Mechanize page as described in the wiki:

Wombat.crawl do
  m = Mechanize.new
  mp = m.get 'http://www.google.com'
  page mp
end

I get an error with this stack trace:

crawler.rb:8:in `block in <main>': wrong number of arguments (1 for 0) (ArgumentError)
    from /Users/derantell/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/wombat-2.3.0/lib/wombat/crawler.rb:22:in `instance_eval'
    from /Users/derantell/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/wombat-2.3.0/lib/wombat/crawler.rb:22:in `crawl'
    from /Users/derantell/.rbenv/versions/2.1.2/lib/ruby/gems/2.1.0/gems/wombat-2.3.0/lib/wombat.rb:13:in `crawl'
    from crawler.rb:4:in `<main>'

Using @metadata_dup.page mp, or renaming Metadata::page to something else, works. My guess is therefore that the attr_accessor :page which Crawler includes from Parser is found first, so method_missing is never invoked.
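To illustrate that guess without depending on Wombat itself, here is a minimal sketch. FakeCrawler, FakeMetadata, stored_page and prefetched_page are hypothetical stand-ins invented for this example, not Wombat's actual classes or methods. It assumes the crawler forwards unknown DSL calls to a metadata object through method_missing, which is what the behaviour above suggests.

```ruby
# Hypothetical stand-in for the metadata object (NOT Wombat's real class).
class FakeMetadata
  attr_reader :stored_page

  # DSL-style setter that takes one argument, like `page mp` in the report.
  def page(value)
    @stored_page = value
  end

  # A differently named entry point to the same setter.
  alias_method :prefetched_page, :page
end

# Hypothetical stand-in for the crawler (NOT Wombat's real class).
class FakeCrawler
  # Zero-argument reader, analogous to the accessor inherited from Parser.
  attr_accessor :page
  attr_reader :metadata

  def initialize
    @metadata = FakeMetadata.new
  end

  # Evaluate the block with the crawler as self, as a DSL entry point would.
  def crawl(&block)
    instance_eval(&block)
  end

  # Only reached when the crawler itself has no method with that name.
  def method_missing(name, *args, &block)
    if metadata.respond_to?(name)
      metadata.public_send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    metadata.respond_to?(name, include_private) || super
  end
end

crawler = FakeCrawler.new

# `page` resolves to the crawler's own zero-argument reader, so passing an
# argument raises ArgumentError and method_missing is never consulted.
direct_error =
  begin
    crawler.crawl { page :prefetched }
    nil
  rescue ArgumentError => e
    e
  end
puts direct_error.class

# The alias does not exist on the crawler, so the call falls through to
# method_missing and reaches the metadata object.
crawler.crawl { prefetched_page :prefetched }
puts crawler.metadata.stored_page.inspect
```

If Wombat works the way this sketch assumes, it would explain why both calling the metadata object directly and renaming the method avoid the error.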

Versions used: ruby 2.1.2, mechanize 2.7.3 and wombat 2.3.0

acidghost · 2015-07-14T07:15:33Z

@felipecsl I'm having the same problem here... @derantell did you ever resolve this?

shashwatsingh · 2015-09-05T13:23:36Z

Same issue, with the same gem versions derantell listed, but on Linux 3.16 x86_64.

Did anyone else have this issue? I tried the following, which works, but if someone has a better way, that would be great:

module Wombat
  module DSL
    class Metadata
      alias_method :mech_page_ref, :page
    end
  end
end

and then used it as follows:

data = Wombat.crawl do
  mech_page_ref mech
end

cyu · 2015-10-18T13:48:26Z

I just ran into this issue as well. The cause is that Wombat::Processing::Parser defines a page accessor (https://github.com/felipecsl/wombat/blob/master/lib/wombat/processing/parser.rb#L24). Here's my workaround:

# cannot use Wombat#crawl
class Crawler
  include Wombat::Crawler
  og_image xpath: '//html/head/meta[@property = "og:image"]/@content'
  twitter_image_src xpath: '//html/head/meta[@name = "twitter:image:src"]/@content'
end

crawler = Crawler.new
page = crawler.mechanize.get(uri.to_s)
crawler.metadata[:page] = page
result = crawler.crawl

smileart · 2016-04-14T14:47:37Z

Workarounds are great and all that, but is there any chance to get it fixed in the project itself? ☹️
