Error crawling URLs #707

Open
saloneerege opened this issue Oct 12, 2015 · 4 comments

Comments

@saloneerege

I get an error in the Crawl log after starting the crawl as follows:
~/miniconda3/envs/memex/lib/nutch ~/memex-explorer/source
Injecting seed URLs
/home/salonee/miniconda3/envs/memex/lib/nutch/bin/nutch inject /home/salonee/memex-explorer/source/resources/crawls/armslist/crawldb /home/salonee/memex-explorer/source/resources/crawls/Armslist/seeds/seeds
Error running:
/home/salonee/miniconda3/envs/memex/lib/nutch/bin/nutch inject /home/salonee/memex-explorer/source/resources/crawls/armslist/crawldb /home/salonee/memex-explorer/source/resources/crawls/Armslist/seeds/seeds
Failed with exit value 127.
~/memex-explorer/source
~/miniconda3/envs/memex/lib/nutch ~/memex-explorer/source
Injecting seed URLs
/home/salonee/miniconda3/envs/memex/lib/nutch/bin/nutch inject /home/salonee/memex-explorer/source/resources/crawls/armslist/crawldb /home/salonee/memex-explorer/source/resources/crawls/Armslist/seeds/seeds
Error running:
/home/salonee/miniconda3/envs/memex/lib/nutch/bin/nutch inject /home/salonee/memex-explorer/source/resources/crawls/armslist/crawldb /home/salonee/memex-explorer/source/resources/crawls/Armslist/seeds/seeds
Failed with exit value 127.
~/memex-explorer/source
~/miniconda3/envs/memex/lib/nutch ~/memex-explorer/source
Injecting seed URLs
/home/salonee/miniconda3/envs/memex/lib/nutch/bin/nutch inject /home/salonee/memex-explorer/source/resources/crawls/armslist/crawldb /home/salonee/memex-explorer/source/resources/crawls/Armslist/seeds/seeds
Error running:
/home/salonee/miniconda3/envs/memex/lib/nutch/bin/nutch inject /home/salonee/memex-explorer/source/resources/crawls/armslist/crawldb /home/salonee/memex-explorer/source/resources/crawls/Armslist/seeds/seeds
Failed with exit value 127.
~/memex-explorer/source

@brittainhard
Contributor

Could you run printenv in your terminal and paste the output here?

@saloneerege
Author

HOME=/home/salonee
SHLVL=1
LANGUAGE=en_US
GNOME_DESKTOP_SESSION_ID=this-is-deprecated
CONDA_ENV_PATH=/home/salonee/miniconda3/envs/memex
LOGNAME=salonee
COMPIZ_BIN_PATH=/usr/bin/
XDG_DATA_DIRS=/usr/share/ubuntu:/usr/share/gnome:/usr/local/share/:/usr/share/
QT4_IM_MODULE=xim
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-D3UDCG61nA
CONDA_DEFAULT_ENV=memex
LESSOPEN=| /usr/bin/lesspipe %s
INSTANCE=
TEXTDOMAIN=im-config
XDG_RUNTIME_DIR=/run/user/1000
DISPLAY=:0
XDG_CURRENT_DESKTOP=Unity
GTK_IM_MODULE=ibus
LESSCLOSE=/usr/bin/lesspipe %s %s
TEXTDOMAINDIR=/usr/share/locale/
COLORTERM=gnome-terminal
XAUTHORITY=/home/salonee/.Xauthority
_=/usr/bin/printenv

@brittainhard
Contributor

My gut instinct here is that JAVA_HOME is not set in your environment; your printenv output has no JAVA_HOME entry. There's some documentation on how to set it here: http://wiki.apache.org/nutch/NutchTutorial. Look at the "Verifying your Nutch Installation" section. I think you can ignore the part about editing /etc/hosts.
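
One way to confirm the JAVA_HOME theory before re-running the crawl is to check the variable from the same environment that launches Nutch. The snippet below is only a minimal sketch and is not part of memex-explorer or Nutch: the function name `check_java_home` is made up for illustration, and it assumes a usable Java install has an executable at `bin/java` under JAVA_HOME.

```python
import os


def check_java_home(env=None):
    """Return a list of problems found with JAVA_HOME in the given environment.

    `env` is a mapping of environment variables; it defaults to os.environ.
    An empty list means nothing obviously wrong was found.
    """
    env = os.environ if env is None else env
    problems = []

    java_home = env.get("JAVA_HOME")
    if not java_home:
        # Covers both "variable missing" and "variable set to an empty string".
        problems.append("JAVA_HOME is not set")
        return problems

    # Assumption: a usable Java install has an executable at $JAVA_HOME/bin/java.
    java_bin = os.path.join(java_home, "bin", "java")
    if not (os.path.isfile(java_bin) and os.access(java_bin, os.X_OK)):
        problems.append("no executable found at " + java_bin)

    return problems


if __name__ == "__main__":
    issues = check_java_home()
    if issues:
        for issue in issues:
            print("Problem:", issue)
    else:
        print("JAVA_HOME looks usable:", os.environ["JAVA_HOME"])
```

Run it from the activated memex conda environment so it sees the same variables the crawl does. If it reports a problem, setting JAVA_HOME as described in the tutorial linked above would be the next thing to try.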

@ahmadia
Contributor

ahmadia commented Oct 17, 2015

I agree with Brittain; this looks symptomatic of JAVA_HOME not being set. I think Nutch itself could be more robust here.
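
As a rough illustration of what "more robust" could mean on the calling side, a launcher could translate the bare exit value into a hint. By shell convention, exit status 127 means a command was not found and 126 means it was found but could not be executed. The sketch below is hypothetical and is not how Nutch or memex-explorer actually work; `explain_exit_status` and `run_with_hint` are illustrative names only.

```python
import subprocess


def explain_exit_status(code):
    """Map a process exit status to a short human-readable hint."""
    if code == 0:
        return "success"
    if code == 126:
        return "command found but not executable (check file permissions)"
    if code == 127:
        return "command not found (check PATH, JAVA_HOME, and the script path)"
    return "failed with exit status %d" % code


def run_with_hint(command):
    """Run `command` (a list of arguments) and return (exit status, hint)."""
    try:
        completed = subprocess.run(command)
    except FileNotFoundError:
        # Python raises instead of returning 127 when the executable is
        # missing; report it the same way a shell would.
        return 127, explain_exit_status(127)
    except PermissionError:
        # Likewise, map "exists but cannot be executed" to the shell's 126.
        return 126, explain_exit_status(126)
    return completed.returncode, explain_exit_status(completed.returncode)
```

Calling `run_with_hint` with the `nutch inject` command from the log, split into a list of arguments, would return the exit status together with the hint instead of only "Failed with exit value 127."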
