This repository has been archived by the owner on Nov 29, 2019. It is now read-only.

extract-bib option fails #20

Open
rtalexander opened this issue Oct 31, 2014 · 13 comments

Comments

@rtalexander

Hi,

Executing pdf-extract as follows:

pdf-extract extract-bib --resolved_references   bibpro.pdf

fails with the following error:

error: input must be an IO-like object or a filename. Use --trace to view backtrace

I added puts input.inspect as instrumentation in object_hash.rb to the extract_io_from(input) method of the class ObjectHash, as follows:

def extract_io_from(input)
  puts input.inspect
  if input.respond_to?(:seek) && input.respond_to?(:read)
    input
  elsif File.file?(input.to_s)
    StringIO.new read_as_binary(input)
  else
    raise ArgumentError, "input must be an IO-like object or a filename"
  end
end

The output emitted was "extract-bib", which suggests that the command name itself is being interpreted as a file name.
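For illustration, the check can be reproduced outside the gem. This is a minimal sketch, not pdf-reader's actual file: File.binread stands in for the library's read_as_binary helper (an assumption), and classify is a hypothetical wrapper added only to show which branch a given input takes.

```ruby
require 'stringio'

# Standalone copy of the check quoted above, for illustration only.
# File.binread is an assumed stand-in for pdf-reader's read_as_binary.
def extract_io_from(input)
  if input.respond_to?(:seek) && input.respond_to?(:read)
    input
  elsif File.file?(input.to_s)
    StringIO.new(File.binread(input.to_s))
  else
    raise ArgumentError, "input must be an IO-like object or a filename"
  end
end

# Hypothetical helper: returns a label for the branch the input takes.
def classify(input)
  io = extract_io_from(input)
  io.equal?(input) ? :io_object : :file_on_disk
rescue ArgumentError
  :rejected
end

# An IO-like object is passed through unchanged.
puts classify(StringIO.new("%PDF-1.4"))

# A bare word such as the command name is rejected,
# unless a file with that name happens to exist in the working directory.
puts classify("extract-bib")
```

Seen this way, the error message is what the check produces whenever the first positional argument is neither an IO object nor an existing file, which fits the command name being passed where the PDF path was expected.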

Any thoughts/suggestions on the matter?

Thanks!

@jdherman
Contributor

jdherman commented Nov 3, 2014

The version on the package repository doesn't contain the extract-bib option yet. (See #14). If you build the gem yourself it should have it.
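To see which copies of the gems are actually installed, something like the following may help. It is a rough sketch that uses the RubyGems Gem::Specification.find_all_by_name call; installed_versions is a hypothetical helper name. Note that it only reports version numbers, so it cannot tell a self-built gem from the published one if both carry the same version.

```ruby
require 'rubygems'

# Hypothetical helper: installed versions of a gem as strings, newest first.
# An empty array means the gem is not installed.
def installed_versions(gem_name)
  Gem::Specification.find_all_by_name(gem_name)
                    .map(&:version)
                    .sort
                    .reverse
                    .map(&:to_s)
end

%w[pdf-extract pdf-reader].each do |name|
  versions = installed_versions(name)
  puts "#{name}: #{versions.empty? ? 'not installed' : versions.join(', ')}"
end
```

If several versions of the same gem show up, uninstalling the ones you do not want removes any doubt about which one gets loaded.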

@rschwiebert

@jdherman How do you build the gem yourself? Are the instructions short enough to explain here?

@jdherman
Contributor

Yep, it's pretty easy if you have all of the command line tools. A while back I wrote a blog post with instructions:
https://waterprogramming.wordpress.com/2014/07/24/pdfextract-get-a-list-of-bibtex-references-from-a-scholarly-pdf/

I think this should still work. @kjw, are there plans to bump the version on the package repository?

@rschwiebert

@jdherman Thank you! I followed those instructions, and the tool worked beautifully on an article with 50 citations. I need at least a dozen of them, and it was going to be a pain to do them one at a time.

@jdherman
Contributor

Great, it's saved me a ton of time too!

@philippoertle

When running

pdf-extract extract-bib --resolved_references --trace 10_pdfsam_Halfter_et_al-2015-FEBS_Journal.pdf

I encounter the following error:

/var/lib/gems/2.1.0/gems/pdf-reader-1.2.0/lib/pdf/reader/object_hash.rb:337:in `extract_io_from': input must be an IO-like object or a filename (ArgumentError)
    from /var/lib/gems/2.1.0/gems/pdf-reader-1.2.0/lib/pdf/reader/object_hash.rb:43:in `initialize'
    from /var/lib/gems/2.1.0/gems/pdf-reader-1.2.0/lib/pdf/reader.rb:115:in `new'
    from /var/lib/gems/2.1.0/gems/pdf-reader-1.2.0/lib/pdf/reader.rb:115:in `initialize'
    from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf.rb:168:in `new'
    from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf.rb:168:in `invoke_calls'
    from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf-extract.rb:42:in `block in parse'
    from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf-extract.rb:38:in `each'
    from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf-extract.rb:38:in `parse'
    from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf-extract.rb:53:in `view'
    from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/bin/pdf-extract:115:in `block (4 levels) in <top (required)>'
    from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/bin/pdf-extract:112:in `each'
    from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/bin/pdf-extract:112:in `block (3 levels) in <top (required)>'
    from /var/lib/gems/2.1.0/gems/commander-4.3.5/lib/commander/command.rb:178:in `call'
    from /var/lib/gems/2.1.0/gems/commander-4.3.5/lib/commander/command.rb:178:in `call'
    from /var/lib/gems/2.1.0/gems/commander-4.3.5/lib/commander/command.rb:153:in `run'
    from /var/lib/gems/2.1.0/gems/commander-4.3.5/lib/commander/runner.rb:428:in `run_active_command'
    from /var/lib/gems/2.1.0/gems/commander-4.3.5/lib/commander/runner.rb:68:in `run!'
    from /var/lib/gems/2.1.0/gems/commander-4.3.5/lib/commander/delegates.rb:15:in `run!'
    from /var/lib/gems/2.1.0/gems/commander-4.3.5/lib/commander/import.rb:5:in `block in <top (required)>'

I use pdf-extract 0.1.1 and tried running the above command with pdf-reader versions 1.1.1, 1.2.0 and 1.3.3, but I always get the same error.

Does anybody have an idea how to solve this?
Best regards

@jdherman
Contributor

Hi @philippoertle ... this could be a tricky one, since it's coming from pdf-reader not pdf-extract. I assume it doesn't have anything to do with the extract-bib option specifically.

This is a shot in the dark, but are there problems with the filename? Double check the path, and try removing the numbers from the beginning of the filename so that it starts with a letter.

@AnikoG

AnikoG commented Jan 11, 2016

@philippoertle I am pretty sure this is caused by pdf-reader. You may try this:

1. You may need ttfunk 1.4.0:
gem install ttfunk

2. Clone pdf-reader 1.3.3, then build and install it (and uninstall all other versions):
git clone https://github.com/yob/pdf-reader
cd pdf-reader
gem build pdf-reader.gemspec
gem install pdf-reader-1.3.3.gem # check the version number

3. Then clone pdf-extract (the clone is not the same as the zip).

I hope this helps. By the way, which Ruby version are you using?
Best

@Phyks

Phyks commented Jan 19, 2016

Same here:

% pdf-extract extract-bib --resolved_references Gaunt_Hadzibabic.pdf --trace
/home/phyks/.gem/ruby/2.3.0/gems/pdf-reader-1.3.3/lib/pdf/reader/object_hash.rb:337:in `extract_io_from': input must be an IO-like object or a filename (ArgumentError)
    from /home/phyks/.gem/ruby/2.3.0/gems/pdf-reader-1.3.3/lib/pdf/reader/object_hash.rb:43:in `initialize'
    from /home/phyks/.gem/ruby/2.3.0/gems/pdf-reader-1.3.3/lib/pdf/reader.rb:117:in `new'
    from /home/phyks/.gem/ruby/2.3.0/gems/pdf-reader-1.3.3/lib/pdf/reader.rb:117:in `initialize'
    from /home/phyks/.gem/ruby/2.3.0/gems/pdf-extract-0.1.1/lib/pdf.rb:168:in `new'
    from /home/phyks/.gem/ruby/2.3.0/gems/pdf-extract-0.1.1/lib/pdf.rb:168:in `invoke_calls'
    from /home/phyks/.gem/ruby/2.3.0/gems/pdf-extract-0.1.1/lib/pdf-extract.rb:42:in `block in parse'
    from /home/phyks/.gem/ruby/2.3.0/gems/pdf-extract-0.1.1/lib/pdf-extract.rb:38:in `each'
    from /home/phyks/.gem/ruby/2.3.0/gems/pdf-extract-0.1.1/lib/pdf-extract.rb:38:in `parse'
    from /home/phyks/.gem/ruby/2.3.0/gems/pdf-extract-0.1.1/lib/pdf-extract.rb:53:in `view'
    from /home/phyks/.gem/ruby/2.3.0/gems/pdf-extract-0.1.1/bin/pdf-extract:115:in `block (4 levels) in <top (required)>'
    from /home/phyks/.gem/ruby/2.3.0/gems/pdf-extract-0.1.1/bin/pdf-extract:112:in `each'
    from /home/phyks/.gem/ruby/2.3.0/gems/pdf-extract-0.1.1/bin/pdf-extract:112:in `block (3 levels) in <top (required)>'
    from /home/phyks/.gem/ruby/2.3.0/gems/commander-4.3.5/lib/commander/command.rb:178:in `call'
    from /home/phyks/.gem/ruby/2.3.0/gems/commander-4.3.5/lib/commander/command.rb:153:in `run'
    from /home/phyks/.gem/ruby/2.3.0/gems/commander-4.3.5/lib/commander/runner.rb:428:in `run_active_command'
    from /home/phyks/.gem/ruby/2.3.0/gems/commander-4.3.5/lib/commander/runner.rb:68:in `run!'
    from /home/phyks/.gem/ruby/2.3.0/gems/commander-4.3.5/lib/commander/delegates.rb:15:in `run!'
    from /home/phyks/.gem/ruby/2.3.0/gems/commander-4.3.5/lib/commander/import.rb:5:in `block in <top (required)>'
zsh: exit 1     pdf-extract extract-bib --resolved_references Gaunt_Hadzibabic.pdf --trace

Not sure if I should report here or to pdf-reader.

Versions should match what @AnikoG was saying:

% gem list

*** LOCAL GEMS ***

afm (0.2.2)
Ascii85 (1.0.2)
bigdecimal (1.2.8)
commander (4.3.5)
did_you_mean (1.0.0)
hashery (2.1.1)
highline (1.7.8)
io-console (0.4.5)
json (1.8.3)
libsvm-ruby-swig (0.4.0)
mini_portile2 (2.1.0, 2.0.0)
minitest (5.8.3)
net-telnet (0.1.1)
nokogiri (1.6.7.1)
pdf-core (0.6.0)
pdf-extract (0.1.1)
pdf-reader (1.3.3, 1.2.0)
power_assert (0.2.7, 0.2.6)
prawn (2.0.2)
psych (2.0.17)
rake (10.5.0, 10.4.2)
rdoc (4.2.1)
ruby-rc4 (0.1.5)
sqlite3 (1.3.11)
test-unit (3.1.7, 3.1.5)
ttfunk (1.4.0)

Thanks!

EDIT: Oops, I missed the fact that the pdf-extract version installed with gem install pdf-extract lacks this feature, as per #14. =(

@philippoertle

Hi everybody,

after following the steps described by @AnikoG , the extract-bib argument works just fine for me.
Thanks a lot.

@AnikoG

AnikoG commented Jan 19, 2016

@Phyks @philippoertle You're welcome! This tool is really worth the struggle... : )

@M4lk4v

M4lk4v commented Nov 23, 2016

Hi guys!
I don't know if anyone checks this thread anymore. Anyway, I followed the instructions on @jdherman's blog, but I've been unsuccessful in executing pdf-extract extract-bib --resolved_references. I'm a total noob in Ruby and in programming in general, so it's possible I'm doing something ridiculously wrong. Here is my error message:
C:\Downloads>pdf-extract extract-bib --resolved_references lol.pdf
C:/Ruby21/lib/ruby/gems/2.1.0/gems/commander-4.4.0/lib/commander/runner.rb:407:in `block in require_program': program version required (Commander::Runner::CommandError)
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/commander-4.4.0/lib/commander/runner.rb:406:in `each'
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/commander-4.4.0/lib/commander/runner.rb:406:in `require_program'
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/commander-4.4.0/lib/commander/runner.rb:52:in `run!'
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/commander-4.4.0/lib/commander/delegates.rb:15:in `run!'
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/commander-4.4.0/lib/commander/import.rb:5:in `block in <top (required)>'
C:/Ruby21/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require': 126: The specified module could not be found. - C:/Ruby21/lib/ruby/gems/2.1.0/extensions/x86-mingw32/2.1.0/rb-libsvm-1.4.0/libsvm/libsvm_ext.so (LoadError)
    from C:/Ruby21/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require'
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/rb-libsvm-1.4.0/lib/libsvm.rb:2:in `<top (required)>'
    from C:/Ruby21/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require'
    from C:/Ruby21/lib/ruby/2.1.0/rubygems/core_ext/kernel_require.rb:55:in `require'
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract/references/score.rb:1:in `<top (required)>'
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract/references/references.rb:3:in `require_relative'
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract/references/references.rb:3:in `<top (required)>'
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract.rb:10:in `require_relative'
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract.rb:10:in `<top (required)>'
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf-extract.rb:1:in `require_relative'
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf-extract.rb:1:in `<top (required)>'
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/pdf-extract-0.1.1/bin/pdf-extract:5:in `require_relative'
    from C:/Ruby21/lib/ruby/gems/2.1.0/gems/pdf-extract-0.1.1/bin/pdf-extract:5:in `<top (required)>'
    from C:/Ruby21/bin/pdf-extract:23:in `load'
    from C:/Ruby21/bin/pdf-extract:23:in `<main>'

C:\Downloads>

I'm running ruby 2.1.9p490 (2016-03-30 revision 54437) [i386-mingw32]. I got the same result with Ruby 2.2 and with the latest version. Thanks for any advice!!

@andreifoldes

andreifoldes commented May 4, 2018

Same as above, problem appears at the commander gem.

pdf-extract extract-bib --resolved_references --trace /home/sinandrei/Downloads/martins.pdf
/usr/lib/ruby/2.3.0/open-uri.rb:225:in `open_loop': redirection forbidden: http://search.crossref.org/dois?q=%2A+Ali%2C+F.%2C+Amorim%2C+I.+S.%2C+%26+Chamorro-Premuzic%2C+T.+%282009%29.+Empathy+de%EF%AC%81cits+and+trait+emotional+intelligence+in+psychopathy+and+Machiavellianism.+Personality+and+Individual+Differences%2C+47%2C+758%E2%80%93762.&rows=1 -> https://search.crossref.org/dois?q=%2A+Ali%2C+F.%2C+Amorim%2C+I.+S.%2C+%26+Chamorro-Premuzic%2C+T.+%282009%29.+Empathy+de%EF%AC%81cits+and+trait+emotional+intelligence+in+psychopathy+and+Machiavellianism.+Personality+and+Individual+Differences%2C+47%2C+758%E2%80%93762.&rows=1 (RuntimeError)
    from /usr/lib/ruby/2.3.0/open-uri.rb:151:in `open_uri'
    from /usr/lib/ruby/2.3.0/open-uri.rb:717:in `open'
    from /usr/lib/ruby/2.3.0/open-uri.rb:35:in `open'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/references/resolve.rb:14:in `find'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/references/resolve.rb:127:in `block in find'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/references/resolve.rb:126:in `each'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/references/resolve.rb:126:in `find'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/references/resolved_references.rb:12:in `block (2 levels) in include_in'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:92:in `block (3 levels) in call_object_listeners'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:92:in `each'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:92:in `block (2 levels) in call_object_listeners'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:91:in `each'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:91:in `block in call_object_listeners'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:90:in `each_pair'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:90:in `call_object_listeners'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:158:in `invoke_calls'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract.rb:43:in `block in parse'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract.rb:39:in `each'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract.rb:39:in `parse'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract.rb:54:in `view'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/bin/pdf-extract:121:in `block (4 levels) in <top (required)>'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/bin/pdf-extract:118:in `each'
    from /var/lib/gems/2.3.0/gems/pdf-extract-0.1.1/bin/pdf-extract:118:in `block (3 levels) in <top (required)>'
    from /var/lib/gems/2.3.0/gems/commander-4.4.4/lib/commander/command.rb:178:in `call'
    from /var/lib/gems/2.3.0/gems/commander-4.4.4/lib/commander/command.rb:153:in `run'
    from /var/lib/gems/2.3.0/gems/commander-4.4.4/lib/commander/runner.rb:446:in `run_active_command'
    from /var/lib/gems/2.3.0/gems/commander-4.4.4/lib/commander/runner.rb:68:in `run!'
    from /var/lib/gems/2.3.0/gems/commander-4.4.4/lib/commander/delegates.rb:15:in `run!'
    from /var/lib/gems/2.3.0/gems/commander-4.4.4/lib/commander/import.rb:5:in `block in <top (required)>'
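Going by the first line of this trace, open-uri is refusing to follow a redirect from the http:// CrossRef search URL to the same URL on https://. One possible workaround, offered as an untested assumption rather than a patch to pdf-extract's resolve.rb, would be to request the https:// address directly so no redirect is needed. The helper name force_https below is hypothetical, and the query string in the example is a shortened placeholder.

```ruby
# Hypothetical helper: rewrite a leading http:// to https://.
# URLs that already use another scheme are returned unchanged.
def force_https(url)
  url.sub(%r{\Ahttp://}i, 'https://')
end

# Shortened placeholder URL; the real query in the trace is much longer.
puts force_https("http://search.crossref.org/dois?q=example&rows=1")
```

Whether the CrossRef endpoint still answers at that address is not something this thread confirms, so treat the sketch as a starting point for debugging only.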

8 participants