Error creating Kubernetes cluster #116
You are missing many settings. Here is an example that works: photon cluster create -t foo -p dev -n seven -k KUBERNETES -m service-master-vm -W cluster-worker -d service-vm-disk -w 2b5b0ad7-059d-4670-908f-909117f5ce62 -c 4 --dns 10.0.7.1 --gateway 10.0.7.1 --netmask 255.255.255.0 --master-ip 10.0.7.9 --master-ip2 10.0.7.8 --load-balancer-ip 10.0.7.5 --container-network '10.2.0.0/16' --number-of-etcds 3 --etcd1 10.0.7.20 --etcd2 10.0.7.21 --etcd3 10.0.7.22 --ssh-key ~/.ssh/id_rsa.pub --batchSize 1 --registry-ca-cert ~/foo.com.crt. The -w option is where you specify which network to use; I think yours is failing to find the default network. Or maybe set your photon console network to default. |
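(For reference, a rough sketch of doing the same from the CLI. The exact subcommand names are an assumption here and differ across photon CLI versions, some of which use "subnet" instead of "network", so check photon network --help on yours:)
photon network list                       # find the ID of the network the cluster VMs should use
photon network set-default <network-id>   # make it the default so -w can be omitted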
Thanks for replying @pompomJuice, but even explicitly specifying the network ID with the -w switch still got the same error: "Unsupported operation GET_NETWORKS". Any other ideas? |
@tnbaeta No problem. Paste your entire log and command, please. This is a process of elimination. |
I'm getting the same error. After digging around in the photon-controller logs I see the following error:
I'm using the following create command. My host has 16GB, so there should be enough; I even tried setting the cluster-small flavor to use only 1GB of RAM.
I am running the latest version of ESXi 6.5 |
Do you have any other VMs on the host? The scheduler does not use active memory in deciding VM placement, it looks at the configured memory for all VMs - whether powered on or not - in determining if there is available resource. No overcommit allowed. |
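(To illustrate with hypothetical numbers: on a 16 GB host whose existing VMs are configured for 14 GB in total, powered on or not, the scheduler treats only 16 - 14 = 2 GB as available, so a new 4 GB cluster VM fails placement no matter how much memory is actually in use.)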
@mwest44 ah okay, that makes sense. I've got quite a few others running. I'll spin up another host to test, thanks |
@LIV2 Yes. What I do is take stock of all my physical resources, then program that into photon at, say, a 5:1 contention ratio, meaning you enter five times the resources you actually have. Otherwise the benefits of virtualization are lost. |
@pompomJuice How do you configure photon to overprovision? I haven't been able to find how to do so in the documentation. |
In 1.1.1 it is not possible. In 1.2.0 there is a config file on the ESXi host that you can update to enable over-provisioning. I'm currently on the road and not able to look at the code to double-check; I will update the thread once I find the value. |
@LIV2 It's not the official way; as far as I know there is no officially documented way. I will probably replace this technique with @schadr's method. But you can just set the quotas with 'photon tenant quota' and 'photon project quota'. If you set your tenant to have, say, five times more resources than it actually has, you will be able to deploy workers that are over-provisioned; ESXi will handle it automatically. But performance issues will be more difficult to debug because of noisy neighbors and such, so just understand the consequences. |
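(A very rough sketch of that idea. The quota subcommand and the limit syntax below are assumptions, since they changed between releases; check photon tenant quota --help and photon project quota --help on your version before copying anything:)
photon tenant quota set <tenant> --limits "vm.memory 80 GB, vm.cpu 40 COUNT"     # ~5x a 16 GB / 8-core host
photon project quota set <project> --limits "vm.memory 80 GB, vm.cpu 40 COUNT"   # keep the project quota in step with the tenant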
Hi, I have Photon Controller 1.2 and ESXi 6.5.0 (build 4887370). When I run the create command, DNS is the Lightwave IP address, the gateway is my gateway address (actually a pfSense used for routing/firewall/DHCP in a test environment), and the others are just random IPs in 10.0.0.0/16. I get the same issue:
I can't find anything else in the logs. Here is my config:
Any ideas? I keep on reading the logs... |
Let me try and break it down, maybe we can find the issue.
My advice is to recheck your gateway and DNS settings so that they are on the same network as 10.0.40.0/24. I would guess those settings to be |
I'm able to provision single hosts with these settings. (EDIT: I mean a single VM, from the official Photon OVA, automatically getting an IP from my DHCP (pfSense).) My network is a /16, not a /24 (mask 255.255.0.0, as you've pasted), so 10.0.40.1 should actually be able to reach 10.0.0.1. I'll try with a /24 anyway. I didn't copy/paste all the different commands I tried before posting, but the batch-size option does not change anything. From now on, I'll add it systematically. |
Aah I see. I missed the /16. Let me rethink this. |
No problem, thank you for your answers :) I'm staying tuned (and will keep on searching and testing...) |
Ok, your 10.0.0.0/16 won't work. Your DHCP server won't know how to provision the worker nodes; that's what I am sensing here. How would the DHCP know that the Kubernetes worker nodes need to go to the 10.0.40.0/16 network? Those get randomly generated MACs (which you might be able to detect with some pattern, in which case your DHCP might work). Secondly, photon-controller does not have a setting for the Kubernetes cluster IP, and those are set to 10.0.0.0/24, so I am not sure whether that will clash with your 10.0.0.0/16 network. |
But with regards to point 1, I don't think those worker nodes come into play yet in your situation, so I am not so sure about that. What I do know is that you really need to have your network config set up right, and the documentation does not cover that part. It does not mention, for example, that your Kubernetes network needs a DHCP server that they don't provide. |
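(For what it's worth, the DHCP itself does not need to be anything special. If you were standing one up with dnsmasq instead of pfSense, a minimal config serving the photon network might look roughly like this; the addresses are placeholders for your environment:)
# /etc/dnsmasq.conf -- hand out leases on the photon management network
interface=eth0
dhcp-range=10.0.40.50,10.0.40.99,255.255.0.0,12h
dhcp-option=option:router,10.0.0.1        # gateway the workers should use
dhcp-option=option:dns-server,10.0.0.1    # DNS the workers (and hence their pods) will inherit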
But I just don't understand the need for DHCP if we specify static IP addresses? Is that for further containerized applications? Because we manually specify static addresses for etcd, the master, and the load balancer... I don't know if I was clear before, but I'm able to provision a single Docker host from Photon Controller with the Photon OS OVA, and it automatically gets an IP. My pfSense gives IPs on the 10.0.0.0/16 network and delivers leases between 10.0.50.0 and 10.0.100.255, so I actually have a working DHCP in my 10.0.0.0/16 private subnet. So I'm not sure I understood. What exactly is your advice? Just forget about /16 and only use /24? |
Apologies, I got that backwards: the photon worker nodes require a DHCP server, not the Kubernetes network. I got confused there for a second. When you specify a photon worker count of, say, 2, Kubernetes spawns its pods via the photon-controller interface on top of these workers. photon-controller's cloud config for the workers sets them to use DHCP (unlike its masters and etcds, which get a static configuration as dictated by the photon setup yaml). And the DHCP lease they receive must be on the same network as the rest of your photon network. |
That photon DHCP lease must also provide your Kubernetes cluster DNS IP, which you must set manually on both sides. If 10.0.0.0/24 clashes with your network, I have no idea how that would affect your install; I had a clash and the install worked, but the cluster's DNS was completely messed up. Kubernetes DNS services were not working because pods inherit resolv.conf from their Docker host, and those in turn are provided by this DHCP config that you are missing.
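(A quick way to see whether that chain is intact, assuming you can SSH to a worker and have kubectl pointed at the cluster; the pod name is a placeholder:)
cat /etc/resolv.conf                               # on a worker VM: the nameserver should come from the DHCP lease
kubectl exec <some-pod> -- cat /etc/resolv.conf    # inside a pod: should point at the cluster DNS (10.0.0.10 in this setup)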
Okay, acknowledged, it's for the workers. But I just don't get "your kubernetes cluster DNS ip, which you must set manually on both sides"... What? How can I manually set an address on a non-existing machine? I can't predict its MAC, so I can't use a static DHCP lease. It is still failing with the following error. My DHCP now delivers addresses between 10.0.40.0 and 10.0.100.255. When you say "10.0.0.0/24 clashes with your network", were you talking about my container network? I changed this. Still the same error. |
Your container network must be a /16. We use Flannel to handle the container networks; this /16 is carved up into individual /24 networks, one per worker node. |
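(Concretely, Flannel keeps one cluster-wide config in etcd and leases a /24 out of it to each node. A sketch of what that config looks like; the key below is Flannel's default prefix, not something photon-controller normally asks you to set by hand:)
etcdctl set /coreos.com/network/config '{ "Network": "10.2.0.0/16", "SubnetLen": 24 }'
# each worker then gets its own slice, e.g. 10.2.19.0/24, 10.2.34.0/24, ...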
Kubernetes has this construct of a cluster IP. It uses this IP to route traffic around internally with iptables; the IP therefore does not exist on a physical device. Those routing rules do not work when one of your interfaces also thinks it can provide 10.0.0.0/24: iptables is a chain, and the packet never reaches the Kubernetes NAT chains because some forward table consumes it when one of the interfaces can also serve that network. In my case, all calls from Kubernetes containers to the Kubernetes DNS (statically set to 10.0.0.10 in photon-controller's case) ended up at our company DNS, also 10.0.0.10 (the traffic was forwarded over the NIC instead of reaching the NAT table). |
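(If you suspect that kind of clash, two quick checks from a worker node show which path a service IP takes; the KUBE-SERVICES chain name is what kube-proxy's iptables mode creates, so adjust if your version differs:)
ip route get 10.0.0.10                                  # does the kernel think this address lives on a local NIC or gateway?
iptables -t nat -L KUBE-SERVICES -n | grep 10.0.0.10    # or is it a service IP handled by kube-proxy in the NAT table?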
That should work, as long as your routers let everyone on that network communicate. I am not so clued up on iptables' /16 network NIC routing-match behaviour (clearly), but from what I understand it should work. Maybe iptables sends those packets to your gateway instead of just communicating on the same network. |
Still failing with the same error. DHCP does not change anything, nor does using a /16 instead of a /24 for the container network. Mmh, I don't really understand what's happening here. My router:
My lab is composed of only one ESXi host, for test purposes. What I meant is that EVERYTHING is local here: pfSense acts as firewall, router, and DHCP for 10.0.0.0/16. The nested ESXis are also in this 10.0.0.0/16 (and only in this subnet). |
I thought I read somewhere that nested ESXi won't work. Or I have definitely read that nested ESXi causes problems for some kind of cluster solution; I can't remember if it was Tectonic or Photon. |
Mmmmh okay... But only for clusters? Because every other action seems to be successful. Well, thanks for your answers. I'm staying tuned in case someone else has ideas. |
I can't remember. All I remember is that nested ESXi jammed some network construct. |
I have followed the step-by-step guide on how to create a Kubernetes cluster on the Photon Platform (https://github.com/vmware/photon-controller/wiki/Creating-a-Kubernetes-Cluster) and I got an error essentially saying "Unsupported operation GET_NETWORKS". I am using macOS for deployment. Has anyone seen this before?
The output of the command follows:
./photon service create -n kube-socrates -k KUBERNETES --master-ip 10.1.0.200 --load-balancer-ip 10.1.0.201 --etcd1 10.1.0.202 --dns 10.1.0.137 --gateway 10.1.0.2 --netmask 255.255.255.0 -c 1 --vm_flavor cluster-small
Error: photon: Task 'be72ba9d-73f1-4200-9c44-4038fab7c48a' is in error state: {@step=={"sequence"=>"1","state"=>"ERROR","errors"=>[photon: { HTTP status: '0', code: 'InternalError', message: 'Failed to rollout KubernetesEtcd. Error: MultiException[java.lang.IllegalStateException: VmProvisionTaskService failed with error [Task "GET_NETWORKS": step "GET_NETWORKS" failed with error code "StateError", message "Unsupported operation GET_NETWORKS for vm/8988e61a-4685-4f94-8e44-7f5aebfed6a6 in state ERROR"]. /photon/servicesmanager/vm-provision-tasks/48ea197554eca473c1ee3]', data: 'map[]' }],"warnings"=>[],"operation"=>"CREATE_KUBERNETES_SERVICE_SETUP_ETCD","startedTime"=>"1494005568944","queuedTime"=>"1494005568912","endTime"=>"1494005578947","options"=>map[]}}
API Errors: [photon: { HTTP status: '0', code: 'InternalError', message: 'Failed to rollout KubernetesEtcd. Error: MultiException[java.lang.IllegalStateException: VmProvisionTaskService failed with error [Task "GET_NETWORKS": step "GET_NETWORKS" failed with error code "StateError", message "Unsupported operation GET_NETWORKS for vm/8988e61a-4685-4f94-8e44-7f5aebfed6a6 in state ERROR"]. /photon/servicesmanager/vm-provision-tasks/48ea197554eca473c1ee3]', data: 'map[]' }]