Unwanted HBase regionserver #94

Open
krisskross opened this issue Nov 23, 2015 · 6 comments

@krisskross

I noticed that the HBase region server installation gets messed up when installing manually through amb-shell with a JSON blueprint from a gist:

source ambari-functions
amb-start-cluster 3
amb-shell

blueprint add --url https://gist.githubusercontent.com/krisskross/901ed8223c1ed1db80e3/raw/869327be9ad15e6a9f099a7591323244cd245357/ambari-hdp2.3
cluster build --blueprint hdp-2.3
cluster assign --hostGroup master --host amb1.service.consul
cluster assign --hostGroup slave_1 --host amb2.service.consul
cluster create

host list
amb1.service.consul [ALERT] 172.17.0.79 centos6:x86_64
amb2.service.consul [ALERT] 172.17.0.80 centos6:x86_64

First, an extra container, 12311dd1655c, gets created, and I'm not sure it is needed:

$ sudo docker ps

CONTAINER ID        IMAGE                         COMMAND                CREATED             STATUS              PORTS                                                              NAMES
12311dd1655c        sequenceiq/ambari:2.1.2-v1    "/bin/sh -c /tmp/amb   42 minutes ago      Up 42 minutes       8080/tcp                                                           loving_stallman     
ff51cc267878        sequenceiq/ambari:2.1.2-v1    "/start-agent"         43 minutes ago      Up 43 minutes       8080/tcp                                                           amb2                
b50f6b429c61        sequenceiq/ambari:2.1.2-v1    "/start-agent"         43 minutes ago      Up 43 minutes       8080/tcp                                                           amb1                
78b6c91713ae        sequenceiq/ambari:2.1.2-v1    "/start-server"        43 minutes ago      Up 43 minutes       8080/tcp                                                           amb-server          
1169a087ce4a        sequenceiq/consul:v0.5.0-v6   "/bin/start -server    43 minutes ago      Up 43 minutes       53/tcp, 53/udp, 8300-8302/tcp, 8400/tcp, 8301-8302/udp, 8500/tcp   amb-consul          

The HBase installation creates two region servers on the slave instead of one. This leaves regions stuck in transition and makes the servers generally unstable. The two region servers are:

amb2.node.dc1.consul
amb2.service.consul

I noticed that "node.dc1" comes from the start-agent and start-server scripts, but I'm not sure they are to blame. Anyway, the amb2.node.dc1.consul region server must go.
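To spot this situation in a list of region server names, one could group the names by short hostname. The helper below is a hypothetical sketch (find_duplicate_hosts is not part of the project): it reads fully qualified names one per line, however they were collected, and prints every short hostname that is registered under more than one domain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read fully qualified region server names (one per line)
# on stdin and print each short hostname that appears under more than one FQDN.
find_duplicate_hosts() {
  awk -F. 'NF > 0 {
      short = $1
      # Count each distinct FQDN once per short hostname.
      if (!seen[$0]++) count[short]++
    }
    END {
      for (h in count) if (count[h] > 1) print h
    }'
}

# The two names reported for the slave, plus the master for contrast.
printf '%s\n' amb2.node.dc1.consul amb2.service.consul amb1.service.consul \
  | find_duplicate_hosts
```

With the names from this issue it reports amb2, the host that HBase sees under both service.consul and node.dc1.consul.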

@krisskross
Author

Never mind container 12311dd1655c; I noticed that it's the ambari-shell itself.

@krisskross
Author

I tried removing node.dc1.consul (below) from the start-agent and start-server scripts, but that made cluster creation fail.

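# --dns isn't available for: docker run --net=host
# sed -i /etc/resolv.conf fails:
# sed: cannot rename /etc/sedU9oCRy: Device or resource busy
# here comes the tempfile workaround ...
# (/etc/resolv.conf is a mount point inside the container, so sed -i cannot
#  rename its temp file over it; overwriting the contents with cat works.)
local-nameserver() {
  # Send DNS queries to Consul on the Docker bridge and expand short names
  # against service.consul first, then node.dc1.consul.
  cat > /etc/resolv.conf <<EOF
nameserver $BRIDGE_IP
search service.consul node.dc1.consul
EOF
}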
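For experimenting with the search list, a parameterized variant can write the same resolver config to any file, so the result can be inspected before the real /etc/resolv.conf is touched. This is a hypothetical sketch, not part of the project's scripts: write_resolv_conf and the 172.17.42.1 nameserver address are placeholders (in the real script the address comes from $BRIDGE_IP).

```shell
#!/usr/bin/env bash
# Hypothetical variant of local-nameserver (not from the project's scripts):
# writes the resolver config to a caller-chosen file so it can be checked first.
# Usage: write_resolv_conf TARGET NAMESERVER_IP DOMAIN...
write_resolv_conf() {
  local target=$1 nameserver=$2
  shift 2
  # Overwrite in place with a redirection instead of sed -i, which would need
  # to rename a temp file over the target.
  cat > "$target" <<EOF
nameserver $nameserver
search $*
EOF
}

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# 172.17.42.1 is a placeholder for the Docker bridge IP.
write_resolv_conf "$tmp" 172.17.42.1 service.consul node.dc1.consul
cat "$tmp"
```

Passing only service.consul as the domain produces the trimmed search list that was tried above, without editing the start scripts.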

@krisskross
Author

I tried installing HBase manually afterwards, with the same result: regions stuck in transition caused the region servers to halt.

@krisskross
Author

The HBase logs indicate that hostnames are confusing ZooKeeper and the HBase Master and RegionServers. Each host sees itself as *.service.consul, while the other host sees it as *.node.dc1.consul. Does anybody know the relevance of node.dc1.consul, and whether there is some way to disable it?

ping amb1 from amb1.service.consul

PING amb1.service.consul (172.17.0.89) 56(84) bytes of data.
64 bytes from amb1.service.consul (172.17.0.89): icmp_seq=1 ttl=64 time=0.059 ms

ping amb2 from amb1.service.consul

PING amb2.service.consul (172.17.0.90) 56(84) bytes of data.
64 bytes from amb2.node.dc1.consul (172.17.0.90): icmp_seq=1 ttl=64 time=0.069 ms

ping amb1 from amb2.service.consul

PING amb1.service.consul (172.17.0.89) 56(84) bytes of data.
64 bytes from amb1.node.dc1.consul (172.17.0.89): icmp_seq=1 ttl=64 time=0.070 ms

ping amb2 from amb2.service.consul

PING amb2.service.consul (172.17.0.90) 56(84) bytes of data.
64 bytes from amb2.service.consul (172.17.0.90): icmp_seq=1 ttl=64 time=0.054 ms
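The name shown in each ping reply comes from a reverse lookup of the replying address, so the mismatch can also be checked directly: resolve a name forward, then reverse-resolve the resulting IP, and compare. The helpers below are a hypothetical sketch (names_match and check_host are not part of the project); they go through NSS with getent hosts, so /etc/hosts and /etc/resolv.conf both apply, as they do for ping.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the name a host is addressed by with the name
# its IP reverse-resolves to (e.g. amb2.service.consul vs amb2.node.dc1.consul).

# Pure comparison: succeed if both names are equal, ignoring a trailing dot.
names_match() {
  [ "${1%.}" = "${2%.}" ]
}

# Resolve NAME (pass the FQDN) forward, then reverse-resolve the resulting IP.
# Returns 0 if consistent, 1 on mismatch, 2 if NAME does not resolve.
check_host() {
  local name=$1 ip reverse
  ip=$(getent hosts "$name" | awk '{print $1; exit}')
  [ -n "$ip" ] || { echo "$name: no address" >&2; return 2; }
  reverse=$(getent hosts "$ip" | awk '{print $2; exit}')
  if names_match "$name" "$reverse"; then
    echo "$name -> $ip -> $reverse (consistent)"
  else
    echo "$name -> $ip -> $reverse (MISMATCH)"
    return 1
  fi
}
```

Run on amb1 as check_host amb2.service.consul, this would be expected to report a mismatch as long as amb2's address reverse-resolves to amb2.node.dc1.consul.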

@krisskross
Author

OK, I got it working by removing the nameserver from /etc/resolv.conf and adding the correct hostnames to /etc/hosts on both servers.
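The workaround could be scripted roughly as follows. This is a hypothetical sketch (add_host_entry and remove_nameservers are illustrative names, not from the project); the file paths are parameters so the helpers can be tried on copies first, and both files are rewritten in place with redirection rather than sed -i, for the same reason as in local-nameserver.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the /etc/hosts + /etc/resolv.conf workaround.

# Append "IP<TAB>FQDN SHORT" to HOSTS_FILE unless FQDN is already listed.
add_host_entry() {
  local hosts_file=$1 ip=$2 fqdn=$3
  if ! awk -v n="$fqdn" '{for (i = 2; i <= NF; i++) if ($i == n) found = 1}
                         END {exit !found}' "$hosts_file"; then
    printf '%s\t%s %s\n' "$ip" "$fqdn" "${fqdn%%.*}" >> "$hosts_file"
  fi
}

# Drop every "nameserver" line from RESOLV_FILE and keep the rest,
# overwriting the file's contents in place.
remove_nameservers() {
  local resolv_file=$1 kept
  kept=$(grep -v '^[[:space:]]*nameserver' "$resolv_file")
  printf '%s\n' "$kept" > "$resolv_file"
}
```

With the addresses from the ping output, amb1 would get add_host_entry /etc/hosts 172.17.0.90 amb2.service.consul and amb2 would get add_host_entry /etc/hosts 172.17.0.89 amb1.service.consul. Note that container IPs can change when containers are recreated, so such entries would need updating.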

Still, is it safe to remove the nameserver? What role does it play?

@krisskross
Author

That is, for a local development environment.
