This repository has been archived by the owner on Jun 26, 2020. It is now read-only.

Problems with webhdfs #50

Open · iaroslav-ai opened this issue Apr 20, 2016 · 7 comments

Comments
@iaroslav-ai commented Apr 20, 2016

So far I have not been able to use WebHDFS with the Docker version of Hadoop (on Ubuntu). Here is what I tried:

  1. Add a text file at /user/root/f.txt:
curl -i -X PUT -T f.txt "http://172.17.0.2:50070/webhdfs/v1/user/root/f.txt?op=CREATE&user.name=root&overwrite=true"
  2. Try reading the contents of the file from HDFS:
curl -i -L "http://172.17.0.2:50070/webhdfs/v1/user/root/f.txt?op=OPEN&user.name=root"

For step 2 I get:

{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File /user/root/f.txt not found."}}

I tried three different Python libraries for WebHDFS, but none of them work either. All of them fail with a message similar to:

Max retries exceeded with url: /webhdfs/v1/example_dir/example.txt?op=CREATE&user.name=root&namenoderpcaddress=d85d3582cf58:9000&overwrite=false
Failed to establish a new connection: [Errno -2] Name or service not known

when trying to create a file or folder.
I also tried rebuilding the Docker image so that port 9000 is exposed, but that did not seem to help.
Am I doing something utterly wrong? I expect this to be likely given that I am a total had00p n00b :)
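
A likely explanation, sketched for reference: without -L, curl stops at the namenode's 307 redirect for op=CREATE, so the data in step 1 never reaches a datanode and the later op=OPEN has nothing to find. Re-running step 1 and reading the response headers makes this visible:

curl -i -X PUT "http://172.17.0.2:50070/webhdfs/v1/user/root/f.txt?op=CREATE&user.name=root&overwrite=true"
# Expect an HTTP/1.1 307 TEMPORARY_REDIRECT whose Location header names the
# datanode on port 50075 by the container's internal hostname; that hostname is
# normally unresolvable from the host, matching the "Name or service not known"
# error from the Python libraries.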

@yeiniel commented Oct 19, 2016

I was trying to use WebHDFS too and ran into a problem. In my case, WebHDFS redirects to a datanode every time I try to write to a file, and the redirect URL seems to use the internal hostname of the Docker container (something like a65ec753065c). Any ideas about this?

The following is an example request:

curl -i -X PUT -T ~/Downloads/JEA_BLOWER_DEFINITION.csv "http://localhost:50070/webhdfs/v1/user/root/f.txt?op=CREATE&user.name=root&overwrite=true"
HTTP/1.1 100 Continue

HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Wed, 19 Oct 2016 03:45:02 GMT
Date: Wed, 19 Oct 2016 03:45:02 GMT
Pragma: no-cache
Expires: Wed, 19 Oct 2016 03:45:02 GMT
Date: Wed, 19 Oct 2016 03:45:02 GMT
Pragma: no-cache
Set-Cookie: hadoop.auth="u=root&p=root&t=simple&e=1476884702571&s=n+WgHqacT3Q5OthGXHXPBtD2YlQ="; Path=/; Expires=Wed, 19-Oct-2016 13:45:02 GMT; HttpOnly
Location: http://a65ec753065c:50075/webhdfs/v1/user/root/f.txt?op=CREATE&user.name=root&namenoderpcaddress=a65ec753065c:9000&overwrite=true
Content-Type: application/octet-stream
Content-Length: 0
Server: Jetty(6.1.26)
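
A hedged workaround sketch, assuming the datanode port 50075 is also published to the host and the shell is bash: capture the Location from the first PUT, substitute a resolvable host for the container hostname (a65ec753065c in the transcript above), and re-send the data there.

# Step 1: ask the namenode where to write; keep only the Location header.
LOCATION=$(curl -s -i -X PUT \
  "http://localhost:50070/webhdfs/v1/user/root/f.txt?op=CREATE&user.name=root&overwrite=true" \
  | grep -i '^Location:' | tr -d '\r' | awk '{print $2}')
# Step 2: rewrite the unresolvable container hostname to localhost (bash
# pattern substitution) and PUT the actual file data to the datanode.
curl -i -X PUT -T ~/Downloads/JEA_BLOWER_DEFINITION.csv "${LOCATION/a65ec753065c/localhost}"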

@ericjang96

I am having the same issue as above; will it be addressed soon?

@pierorex

I also have the same problem and have already tried all the available Python libraries.
Has anyone solved this with a magical workaround?

@PhilipMourdjis

Not sure how this would translate when using docker-compose, but I can get this to work using:
docker run -h localhost -p 50070:50070 -p 50075:50075 <<Container_Name>>

@deryrahman

@PhilipMourdjis if you're using docker-compose, you can set the hostname to localhost like this:

hadoop:
  image: <image_name>
  hostname: localhost
  ports:
    - 50070:50070
    - 50075:50075

@g10guang

Just follow the Location header in the redirect response.

@zakicheung

Notice Step 2 of the WebHDFS docs: "Submit another HTTP PUT request using the URL in the Location header (or the returned response in case you specified noredirect) with the file data to be written."
FYI Link
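
In other words, the documented write is a two-step exchange. A minimal sketch, assuming the datanode hostname in the Location header resolves from the client (e.g. via the -h localhost trick above):

# Step 1: op=CREATE without file data; the namenode answers 307 and names the
# target datanode in the Location header.
curl -i -X PUT "http://localhost:50070/webhdfs/v1/user/root/f.txt?op=CREATE&user.name=root&overwrite=true"
# Step 2: re-send the PUT, now with the file data, to that URL
# (<datanode-url-from-Location> is a placeholder for the returned value).
curl -i -X PUT -T f.txt "<datanode-url-from-Location>"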
