Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade OpenStack support from AMI compatibility to qcow2 native #22

Merged
merged 1 commit into from
Mar 31, 2017

Conversation

vondrt4
Copy link
Contributor

@vondrt4 vondrt4 commented Apr 28, 2016

Issues fixed:
Output of kernel, init and inithooks correctly displayed in console.
SystemD working again.
Root disk is grown to match flavor size; done without reboot.
Kernel updates possible.

Old AMI version retained as openstack-ami.

Issues fixed:
Output of kernel, init and inithooks correctly displayed in console.
SystemD working again.
Root disk is grown to match flavor size; done without reboot.
Kernel updates possible.

Old AMI version retained as openstack-ami.

# remove systemd (sysvinit used in container)
dpkg --purge systemd-sysv systemd || true
#on the other hand, inithooks needs it and the other implementation is written by Alon Swartz anyway
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that inithooks needs ec2metadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Several inithooks need the package ec2metadata from your custom repository and include the Python library from it.
The generic package cloud-utils (which gets pulled in as a dependency of cloud-initramfs-growroot, which resizes the root partition, but not yet the file system) also includes ec2metadata, but only the CLI version.
This hack installs both and overwrites the CLI ec2metadata with the TKL package. It is just one file that conflicts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification.

BTW, hopefully Anton will post regarding his inithooks/cloud-init stuff so you can have a look

@JedMeister
Copy link
Member

Great work! Thanks for your contribution! 😄

@JedMeister
Copy link
Member

Just doing a test build now. I had some issues initially but then I read https://github.com/vondrt4/buildtasks/blob/openstack-work/docs/setup Doh! 😄

Thanks again for all the work you've done.

I would really like to get at least one other OpenStack user to test and confirm that they work well for them before we release this as the new improved OpenStack build. I have emailed the other guy I mentioned but haven't heard back yet. So I'm thinking that perhaps we should do a blog post about the work that you've done (and why) and see if we can attract some interest and community involvement!? Would you be interested in writing a guest blog post (or at least some of the content) for us about what you did and the rationale? IIRC you are involved with a hosting firm, is that right? Assuming it is we'd be totally cool for you to give yourself a plug (or I could do it as an introduction)!? What do you think?

@JedMeister
Copy link
Member

Once I worked out the dependency issues I still couldn't get it to build for me... 😢 It seems that when it tried to unmount the loopback device it failed (as it's busy for some reason). :

+ fsck.ext3 -f /dev/mapper/loop0p1
e2fsck 1.42.12 (29-Aug-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/loop0p1: 38329/56112 files (0.4% non-contiguous), 165288/224256 blocks
+ sync
+ kpartx -d turnkey-core-14.1-jessie-amd64.rootfs.img
device-mapper: remove ioctl on loop0p1 failed: Device or resource busy
loop deleted : /dev/loop0
+ cleanup
+ error=1
+ '[' '!' -d /tmp/build-debimg.Q1jZLc ']'
+ return

The comment in the code ("#the next command failed once, so") suggests that you had a similar issue at least once?!

FWIW I resolved it manually with the following:

dmsetup table        
loop0p1: 0 1794048 linear 7:0 2048

dmsetup clear loop0p1 # should work...

# but obviously didn't:
dmsetup table
loop0p1: 0 1794048 linear 7:0 2048

# but this will!:
dmsetup remove loop0p1

dmsetup table
No devices found

Once I did that the build completed fine and I have my qcow2 image (and sig/sum file) ready to go! TBH though I'm not sure on why it had that issue in the first place. Perhaps it was my initial broken build (before I installed all the deps)??

Actually I did have an error right at the end of the process, but I'm not sure if it was coincidental or if it was related. Regardless it seemed to complete ok, just threw an error right at the end:

INFO [generate-signature]: writing signature file
Traceback (most recent call last):
  File "/usr/bin/ec2metadata", line 59, in <module>
    main()
  File "/usr/bin/ec2metadata", line 55, in main
    ec2metadata.display(metaopts)
  File "/usr/lib/python2.7/dist-packages/ec2metadata/__init__.py", line 91, in display
    m = EC2Metadata()
  File "/usr/lib/python2.7/dist-packages/ec2metadata/__init__.py", line 32, in __init__
    raise Error("could not establish connection to: %s" % self.addr)
ec2metadata.Error: could not establish connection to: 169.254.169.254

Weird thing is that it has a zero exit code...

@JedMeister
Copy link
Member

JedMeister commented May 3, 2016

@vondrt4 - if you'd rather discuss the blog post idea privately, please shoot me an email: jeremy AT turnkeylinux.org

@JedMeister
Copy link
Member

Also digging a little deeper it appears that the python error that occurs right near the end is coming from a new feature Alon implemented to check the build environment...

@vondrt4
Copy link
Contributor Author

vondrt4 commented May 3, 2016

I hope it was because of the previous failed build. The debian build script has:
...install... sync e2fsck umount sync kpartx -d
so no other magic there. The device should not be open by anything else.

I put in a cleanup() function in case the shell receives a signal, but maybe if a command was missing, it was terminated another way..

I'll send an e-mail on the other matters today.

@JedMeister
Copy link
Member

Awesome I'll keep an eye out for the email.

FWIW I had a quick chat to Alon about the python stack trace that I got at the end. It's to do with a build environment audit feature that he has implemented. It assumes that TKLDev is being run from an Amazon server (as we do all our builds on AWS). Because it wasn't, it errored out. We'll improve that, but it's nothing to do with your work.

Out of interest I'll do another build tomorrow (with all the deps pre-installed this time) and I expect it to go flawlessly! 😄

@JedMeister
Copy link
Member

JedMeister commented Jan 17, 2017

Hi @vondrt4,

I deeply apologise that this has stalled for so long. I hope you can understand that we are a small team with tons of great ideas and plans, but very limited resources. Unfortunately, this just kept getting pushed aside for other higher priority items (for us).

However, I have started working on our v14.2 release, and would ideally love to include your updated/improved OpenStack build (finally).

Whether or not we push ahead with that depends on your position. If you are still using TurnKey I assume you have been building your own images (using your code)? Or have you moved on altogether and not using TurnKey at all?

[update/addendum]
Also re the inclusion of cloud-init; we have discussed that internally and we have a preference for using @qrntz's approach to cloud-init integration (#26) vs your suggested integration (#23). Primarily that is because it doesn't require tweaking of the inithooks (and thus lower maintenance overhead). Additionally, considering that none of us have OpenStack infrastructure handy for testing, it makes confirmation that all is well quite tricky.

If you were willing to make a commitment to maintain the OpenStack build and the cloud-init inclusion, we would be willing to reconsider that (although I can't offer any guarantees at this point.

Finally; to clarify our position:

If you are ok with using @qrntz's cloud-init integration and will use our pre-built images, we will definitely switch to your OpenStack build for the v14.2 release. We would love further input from you but would have no expectation that you would be responsible for all maintenance.

Alternatively, if you have moved on altogether (and no longer using TurnKey at all) then I'm not 100% sure whether we'll build your OpenStack image at all (FWIW you are the only person who has provided any feedback of significance regarding our current OpenStack build and we've had no feedback from anyone regarding your proposed build).

If you are building your own images currently and wouldn't use our pre-built images unless we implement your cloud-init integration; but you are willing to take the lead on maintenance (of both OpenStack build and cloud-init integration), then we would consider that. If you are unwilling to take on maintenance, then I'm not quite sure what we'll do...

@vondrt4
Copy link
Contributor Author

vondrt4 commented Jan 17, 2017

Hi!
It's nice to hear that you're considering to use the OpenStack work I've done. We are indeed using images built from it at Homeatcloud. You may test them using the Free VPS feature. It is really free (for a week).

I am willing to maintain the OpenStack buildtask as long as we're using it. Maybe even personally, outside of work. It is not a lot of trouble. However, I do not insist on maintaining a fork. We have clients using TKL in the VPS service, where cloud-config is not used at all - the sales portal sends normal inithooks preseeding in user-data. Still, there is the first patch I submitted that adds qcow2 images and a few other things. That is a must.

We have a client who wanted LAMP installed on a physical server. A bug against your installer under UEFI is filled.

Although I don't think we have someone using TKL in Public Cloud, I would still like for TKL to behave like any other cloud image, so that the OpenStack users don't have problems using it. qrntz's approach is not bad, but a few bugs need to be addressed. I have summarized them in my review in his pull request. From my point of view, resizing of the root device is absolutely critical. Setting the hostname is very good to have.

And if cloud-config is used and the user logs in with a user different from root, the user will skip the turnkey-init screens, which means that the appliance will be broken on install, unless preseeding is also present. These are the inithooks changes you are afraid of? I think you could merge them globally and be done with it. It is useful on all other cloud platforms as well. Otherwise, we could declare that this is a corner case - a user who sends cloud-config is a power user and knows what he is doing. I would be fine with leaving it be.. and documenting it carefully.

@JedMeister
Copy link
Member

JedMeister commented Jan 18, 2017

Thanks for your speedy response and apologies on my lengthy one...! 😄

I am willing to maintain the OpenStack buildtask as long as we're using it. Maybe even personally, outside of work. It is not a lot of trouble. However, I do not insist on maintaining a fork. We have clients using TKL in the VPS service, where cloud-config is not used at all - the sales portal sends normal inithooks preseeding in user-data. Still, there is the first patch I submitted that adds qcow2 images and a few other things. That is a must.

Awesome, thanks for that generous assurance. II assume you are referring to this pull request? If so, that's no issue, so long as you are using it, we are more than happy to use your work and switch to producing your OpenStack build.

Although I don't think we have someone using TKL in Public Cloud, I would still like for TKL to behave like any other cloud image, so that the OpenStack users don't have problems using it. qrntz's approach is not bad, but a few bugs need to be addressed. I have summarized them in my review in his pull request. From my point of view, resizing of the root device is absolutely critical. Setting the hostname is very good to have.

I understand and for more advanced/experienced users, that makes complete sense. You want to support them to do things the way that they are used to. It reduces their friction...

Re @qrntz's alternate cloud-init approach, I'm not sure if you saw, but he posted some response to your feedback on his PR - #26 (comment)

And if cloud-config is used and the user logs in with a user different from root, the user will skip the turnkey-init screens, which means that the appliance will be broken on install, unless preseeding is also present. These are the inithooks changes you are afraid of? I think you could merge them globally and be done with it. It is useful on all other cloud platforms as well. Otherwise, we could declare that this is a corner case - a user who sends cloud-config is a power user and knows what he is doing. I would be fine with leaving it be.. and documenting it carefully.

Ah ok, that's a good point, which I hadn't really considered (my experience with servers is VERY TurnKey centric). FWIW AWS require that instances launched from the Marketplace do not have root enabled, so we have an inithook to specifically address that, it's called sudoadmin. Perhaps we could tweak that to be a little more agnostic and somehow get cloud-init to leverage that if/when a non-root user is created via cloud-init? I haven't really looked at the code and have no experience with cloud-init, so perhaps that's a dumb suggestion!? 😄

Regarding merging the changes to all appliances, considering that cloud-init requires ~50MB when installed (including dependencies) and is irrelevant for everything other than cloud builds, we wouldn't want to include it in all builds. I know 50MB isn't a lot these days, but considering that the Core ISO is only just over 200MB, that's a 25% size increase for something that would likely never be used in that scenario.

If we could merge the updated inithooks across all builds, without breaking anything and without requiring installation of cloud-init, then that may be a possibility? Again though I don't know enough about cloud-init and haven't looked close enough at the code to know if that's something worth pursuing or not...?!

Bottom line; given your response, we will plan to switch to your improved OpenStack build (qcow2 format). I will discuss further with my colleagues to see whether there is a consensus on including cloud-init. At the very least, you will be able to use the OpenStack builds we provide (using your buildcode) for your VPS service.

If we decide to not implement your cloud-init integration (and you remain unsatisfied with our implementation) then we will probably just leave cloud-init out for now. That means you will still need to build your own images for your Public Cloud but perhaps you could create a script to directly patch the qcow2, rather than needing to rebuild from scratch?

@vondrt4
Copy link
Contributor Author

vondrt4 commented Jan 18, 2017

OK, I have probably forgotten qrntz's comment. Root resizing with inithooks is fine, just don't forget to include cloud-initramfs-growroot to grow the root partition in the partitioned layout (it's in my first commit). Datasources in per-build files is a good point.
echo "datasource_list: [ConfigDrive, Openstack, Ec2]" >/etc/cloud/cloud.cfg.d/90_dpkg.cfg

About "admin":
The change I referred to when I said "include in all builds" is to run turnkey-init for all possible users. sudoadmin works just for admin, while my patch adds this feature. I didn't mean to add cloud-init to the iso, for example.

As per the log on the second commit, there are changes to turnkey-init-fence and turnkey-sudoadmin would be obsolete:
rm /usr/lib/inithooks/firstboot.d/29sudoadmin

There are two new files:
https://github.com/vondrt4/buildtasks/blob/e8b612020ad68e4c6ea1d8ef409372dc16f2bc38/patches/openstack/overlay/usr/lib/inithooks/firstboot.d/29sudoadmin-fencethemall

https://github.com/vondrt4/buildtasks/blob/e8b612020ad68e4c6ea1d8ef409372dc16f2bc38/patches/openstack/overlay/usr/lib/inithooks/firstboot.d/30turnkey-init-fence

And the last commit includes bug fixes. For example booting the rootfs by label is important, because in OpenStack, you can either use the virtio-blk driver and get /dev/vda1 or the virtion-scsi driver, which produces /dev/sda1.

Removing qcow2 compression is my personal tweak because we have deduplicated image storage. It's bad for general distribution.

lvcreate -Zn - unrelated. Probably something didn't work without it.

commenting out git pull - I have unofficial appliances (e.g. Nextcloud) in my TKLDev and this was not working with them.

And a bug fix for the proposed inithooks change is one more file:
https://github.com/vondrt4/buildtasks/blob/ca9417633c29ba0a17c287e83f53ae8beff8cddd/patches/openstack/overlay/usr/lib/inithooks/firstboot.d/97turnkey-init-fence-disable

@JedMeister
Copy link
Member

JedMeister commented Jan 19, 2017

Thanks for your detailed response including deeper explanation of what you have done and why.

As noted we will replace our existing OpenStack build with yours for the v14.2 release. I will handball this back to @qrntz for further consideration.

[update]

commenting out git pull - I have unofficial appliances (e.g. Nextcloud) in my TKLDev and this was not working with them.

So long as the branch being built has a remote tracking branch configured, then it should work regardless. However, as you noted, commenting it out is a simple workaround! 😄

@JedMeister JedMeister merged commit 124fcd4 into turnkeylinux:master Mar 31, 2017
@JedMeister
Copy link
Member

Finally merged...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants