Kata Containers metrics CI Jenkins slave request #83

Open
grahamwhaley opened this issue Aug 30, 2018 · 43 comments

Comments

@grahamwhaley

Please fill out the details below to file a request for access to the CNCF Community Infrastructure Lab. Please note that access is targeted to people working on specific open source projects; this is not designed just to get your feet wet. The most important answer is the URL of the project you'll be working with. If you're looking to learn Kubernetes and related technologies, please try out Katacoda.

First and Last Name

Graham Whaley

Email

[email protected]

Company/Organization

Intel

Job Title

Senior Software Engineer

Project Title

Kata Containers

Briefly describe the project

Open source multi architecture community collaboration to develop virtual machine based container runtimes and deliver their integration into common container infrastructures and orchestration (OCI, Docker, Kubernetes etc.)

Which members of the CNCF community and/or end-users would benefit from your work?

The obvious member is Kubernetes, who already work closely in conjunction with the Kata Containers community to ensure Kubernetes and virtual machine container runtimes are a natural and compatible fit.

Is the code that you’re going to run 100% open source? If so, what is the URL or URLs where it is located? What is your association with that project?

Yes, 100% open source and up on github:
https://github.com/kata-containers

What kind of machines and how many do you expect to use (see: https://www.packet.net/bare-metal/)?

Prediction is 2x t1.small.x86 machines, running 24/7-ish. Our current jenkins CI backlog across all the repositories pretty much consumes one whole machine (and it is not compute bound).

We can start/trial with just one t1.small.x86 (for PR CI), and later add another to support master branch merge regression checking.

What OS and networking are you planning to use (see: https://help.packet.net/technical/infrastructure/supported-operating-systems)?

I would expect Ubuntu 18.04

Please state your contributions to the open source community and any other relevant initiatives

Previously having worked on a new architecture addition to the Linux kernel which eventually made it into the upstream, for the last 2+ years I have been focussed on the open source Clear Containers (https://github.com/clearcontainers), now Kata Containers.

Any other relevant details we should know about?

I expect us to tie the machines as Jenkins slaves into our existing Jenkins master at http://jenkins.katacontainers.io/, and dedicate them to metrics CI builds only.

Kata Containers is umbrella'd under the OpenStackFoundation, but is not part of the OpenStack project.

@dankohn
Contributor

dankohn commented Aug 30, 2018

Could you please describe what you'd like to actually do? We're open to supporting you, but would like to confirm that Intel or OpenStack infrastructure cannot meet your needs.

@jacobsmith928

@dankohn if this use case (e.g. ongoing CI infra) doesn't fit well into the CIL, we can work with @grahamwhaley separately on an arrangement.

@dankohn
Contributor

dankohn commented Aug 30, 2018

@grahamwhaley Thanks for the reference to kata-containers/ci#6

I checked with @jacobsmith928 and we have a thumbs up for you to go forward.

My request (both for Community Infrastructure Lab policy and for best practice) is that you make 100% of your continuous integration code open source (other than confidential tokens, obviously).

+1

@grahamwhaley
Author

That's fantastic news @dankohn @jacobsmith928
I know you've probably gotten the details from kata-containers/ci#6, but I'll drop a summary here for the record.

For Kata Containers we have a set of metrics tests that we'd like to run in a CI to both:

  • check for regressions on each new PR request/update
  • log the results for master branch merges so we can collect and plot/analyse historical data

Due to the nature of the majority of the tests, we need to run these in a reproducible manner (otherwise we cannot regression check or compare over time), and that thus mandates either bare metal machines or dedicated cloud servers (that support nested VMs), with no noisy neighbour effects etc.
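As an illustration of why reproducibility matters for the PR check, here is a minimal sketch of a regression gate that compares a PR result against the mean of earlier baseline runs. The function name, the sample values, and the 5% tolerance are hypothetical and are not taken from the Kata metrics tooling.

```python
from statistics import mean


def check_regression(baseline_runs, pr_value, tolerance_pct=5.0):
    """Compare a PR result against the mean of earlier baseline runs.

    Returns (ok, deviation_pct). All names and the default tolerance are
    illustrative assumptions, not the Kata metrics implementation.
    """
    baseline = mean(baseline_runs)
    deviation_pct = (pr_value - baseline) / baseline * 100.0
    return abs(deviation_pct) <= tolerance_pct, deviation_pct


# Hypothetical container boot times in seconds (placeholder values).
ok, deviation = check_regression([1.42, 1.45, 1.40, 1.44], 1.60)
print(f"within tolerance: {ok}, deviation: {deviation:+.1f}%")
```

A gate like this is only meaningful if the baseline runs themselves are tightly grouped; on a noisy machine the tolerance has to be widened until real regressions slip through.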

We have struggled to find any suitable hardware so far, and hence the request here. OSF does not have access to such hardware.
We plan to run this all under Jenkins. These machines will be new dedicated metrics slaves controlled by the existing kata Jenkins CI master (that is hosted under the OSF resource kindly donated by vexxhost).

All of the code and configs will be fully open sourced. Almost all of it is already open:

and we intend to publish all the Jenkins details/configs we can, like we already publish all the Jenkins configs (apart from the secrets ;-) ) for the parallel QA CI: https://github.com/kata-containers/ci/tree/master/jenkins

Thanks!

@taylorwaggoner
Contributor

@grahamwhaley - I have invited you to the Kata Containers project in Packet. Please let me know if you have any questions!

@grahamwhaley
Author

Thanks @taylorwaggoner I've accepted the invite, and can see the Kata Containers project within the CNCF org on packet.net.
It's my first time deploying on packet.net, so I need to go do a bit of readup on how to set up the deployment and then how we tie that into Jenkins (if we have a nodepool for instance etc.).
That'll take me a little bit of time. I'll post a status update here when things are up (or stuck ;-) ).

Many thanks everybody!

@jacobsmith928

@grahamwhaley definitely ping @vielmetti or me in our community slack and we can help as needed.

@vielmetti
Collaborator

A specific issue we are tracking is JCLOUDS-1219 , and its related Github issue jclouds/jclouds-labs#337

vielmetti referenced this issue in jenkinsci/jclouds-plugin Aug 31, 2018
@grahamwhaley
Author

I just noticed on slack that the t1.small.x86 machines now come in two flavours (4 and 8 core?). Given I need repeatability of metrics tests in order for the CI to spot regressions, and the fact that right now I cannot get jenkins to deploy a packet machine on-demand via the jclouds plugin, I think the prudent way forwards is to deploy and assign a t1.small.x86 machine 24/7 to the task of Kata metrics CI.
Any objections to that on resource sharing/utilisation/cost grounds etc.? @dankohn @jacobsmith928 @vielmetti

@jacobsmith928

@grahamwhaley yes that is fine.

@grahamwhaley
Author

Hi.
I'm seeing more variance (noise) in the metrics results than I expected on the t1.small.x86 machine. It also seems that the machine is running Kata itself very slowly (8s to get into a container, rather than the <1.5s I was expecting).
Can I request access to an x86 machine from the next tier up (I would grab the name, but I'm having difficulty getting to the packet web pages with that info right now - it might be the c1.small ?), the goals being:

  • confirm whether the variance and slowness are specific to that machine type/tier
  • if not, then debug using the new machine, whilst leaving the t1.small running the CI builds
  • if the problems only manifest on the t1.small.x86, then it is likely I will ask whether we can move from the t1.small to the next tier.

Many thanks.

@dankohn
Contributor

dankohn commented Oct 4, 2018 via email

@jacobsmith928

I would recommend a c2.medium.x86 (if you need more cores) or a c1.small.x86 if you just need faster cores.

@lukaszgryglicki
Member

I've found the arm64 one VERY fast and cheap - but all your packages/tools need to support ARM then.
It has 96 cores, for instance, and 128G of memory.

@grahamwhaley
Author

Thanks. We don't generally need amazing speed or number of cores (I test on my desk with an i5 2/4 core NUC for instance), which is why I thought we'd be fine on the t1.small. I'll start with the c1.small and see what I find.
@lukaszgryglicki - our software stack does support ARM (and IBM Power and Z as well as x86). I'll /cc in @kalyxin02 here for reference, who has been running up the Jenkins QA CI on ARM for Kata. Some of that, or closely related, work I believe is via Packet/WorksOnARM. Once the QA CI is stable on ARM then I'd expect we would move to looking at a metrics CI setup as well.

Thanks folks - and the speedy replies are appreciated.

@grahamwhaley
Author

Update time then.
We are still utilising a t1.small 24/7 for our jenkins metrics tracking (http://jenkins.katacontainers.io/computer/x86_packet01/builds), now running the jenkins jobs on the bare metal (which involves us trying to keep the machine clean after each run, which is fun). I'd move to an 'on demand' model if we had a way to deploy from Jenkins (jclouds plugin still not working for packet.net afaik).
I had another t1.small up for quite some time whilst I was debugging some of the jenkins hangups we had on that machine. I've taken that down now.
I've just run up a c1.small 24/7 as another jenkins slave (http://jenkins.katacontainers.io/computer/x86_packet_elk01/) to track master branch merges, with the intention of injecting the results into an ELK stack for metrics tracking of the project over time. Again, we'd do that on-demand if we had a method via Jenkins.
I should also note I put together some ansible scripts to deploy the slaves in the correct configurations to run our Jenkins CI slave tasks.
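For readers wondering what injecting results into an ELK stack might involve, here is a rough sketch that formats one set of metrics results as a newline-delimited payload of the kind accepted by the Elasticsearch bulk API. The index name, field names, and sample values are assumptions made for illustration; the thread does not describe the actual Kata schema or tooling.

```python
import json
from datetime import datetime, timezone


def to_bulk_payload(index, results, commit, timestamp=None):
    """Build a newline-delimited bulk payload: one action line plus one
    document line per metric. Field names here are hypothetical."""
    ts = (timestamp or datetime.now(timezone.utc)).isoformat()
    lines = []
    for test_name, value in results.items():
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps({
            "@timestamp": ts,
            "commit": commit,
            "test": test_name,
            "value": value,
        }))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Placeholder values, not measured results.
payload = to_bulk_payload("kata-metrics", {"boot_time_s": 1.4}, "abc123")
print(payload, end="")
```

Tagging each document with the commit and a timestamp is what would allow master branch merges to be plotted over time.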

@grahamwhaley
Author

Update. I'm going to move the Kata PR metrics CI slave from a t1 to a c1 instance. The results from the t1 have ended up being just too 'noisy' to make reasonable regression checks, and the c1 looks to produce much more stable results (for our Kata tests at least). For reference, a couple of examples of our memory footprint and 'boot container' measures on the two systems. Note, in this instance it is not so much the absolute figures obtained, but the repeatability between runs that matters.

[Figures: memory footprint and container boot time results on the two systems]
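One common way to quantify repeatability between runs, independent of the absolute figures, is the coefficient of variation (standard deviation as a percentage of the mean). The sketch below assumes that measure; the thread does not say which statistic was actually used, and the sample values are placeholders rather than measurements from the t1 or c1 machines.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(samples) / mean(samples) * 100.0


def most_repeatable(results):
    """Return the key whose samples have the lowest coefficient of variation."""
    return min(results, key=lambda name: coefficient_of_variation(results[name]))


# Placeholder boot times in seconds, NOT real measurements.
runs = {
    "machine_a": [1.9, 3.4, 2.2, 4.1, 2.8],
    "machine_b": [1.41, 1.44, 1.40, 1.43, 1.42],
}
for name, samples in runs.items():
    print(f"{name}: CV = {coefficient_of_variation(samples):.1f}%")
print("most repeatable:", most_repeatable(runs))
```

Because the measure is normalised by the mean, it lets a slower but steadier machine be preferred over a faster but noisier one.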

@dankohn
Contributor

dankohn commented Mar 4, 2019

+1. Thanks for letting us know.

@grahamwhaley
Author

Hi. Can I request we add another Kata member to the CNCF/Kata org on packet.com so we have more than one point of failure (me :-) ) for (re-)creating instances? I'd like to suggest we add @chavafg, who is the high level owner of the Kata CI systems. Let me know if you'd like me to open a fresh Issue for this.
And, continued many thanks for access to the resources :-).

@taylorwaggoner
Contributor

@grahamwhaley I've added [email protected] to the Kata Containers project in Packet. Thanks!

@chavafg

chavafg commented Mar 30, 2020

Hi @taylorwaggoner,

I logged into Packet, but it seems that I still cannot see the Kata project. I only see a Create Organization option. Any idea? Is there something else I should do?

Thank you!

@taylorwaggoner
Contributor

@chavafg I believe you should have received an invitation to that specific Packet project. You would need to click the link in the email to accept the invitation. Did you do that? Thanks!

@chavafg

chavafg commented Mar 30, 2020

Hmm, searching through my inbox (and junk email) I can't see it. Could you please help me by re-sending the invitation? Thanks :)

@taylorwaggoner
Contributor

Please confirm that [email protected] is the correct email address. I tried to resend it and got an error message that it was unable to send, so I'm guessing the invitation also did not go through the first time I tried.

@chavafg

chavafg commented Mar 30, 2020

yes, that is the correct email address: [email protected]

BTW, I created the packet account using that email last Friday, and remember that I had to use the re-send confirmation email as it didn't arrive the first time.

Thanks for your help.

@vielmetti
Collaborator

@chavafg I resent the invitation at about 4:22 pm Eastern on 2020-03-30 (i.e. just now), let me know when you are in.

@chavafg

chavafg commented Mar 30, 2020

@taylorwaggoner
I just received the invitation and could access the Kata project.
Thanks for your help :)

@chavafg

chavafg commented Mar 30, 2020

@vielmetti, thanks both, I am now in :)

@devimc

devimc commented Apr 24, 2020

Hi everybody, I'm starting to play with VFIO/SRIOV in Kata Containers. I already have some VFIO tests using virtio devices, and I was planning to add more VFIO tests, but now with real hw (gpus, nics, etc). I was wondering if the metrics node could be upgraded to a node that supports SRIOV/VFIO with an extra NIC or GPU, so that I could use it to test VFIO/SRIOV.
Thanks in advance, any comment/help would be appreciated.

@chavafg

chavafg commented Jul 29, 2020

Hello,

Can I request access for another member of the Kata team to the CNCF/Kata org on packet.com? @grahamwhaley is now retired and I am the only point of contact for this, so I would like to have someone else able to access the servers in case I am unavailable. I'd like to suggest we add @amshinde - [email protected].

Thank you very much for all your support.
/cc @vielmetti @taylorwaggoner

@taylorwaggoner
Contributor

@chavafg I've invited [email protected] to the Kata project in Packet. Thanks!

@amshinde

thanks @taylorwaggoner

@chavafg

chavafg commented Jul 29, 2020

thanks @taylorwaggoner :)

@chavafg

chavafg commented Jan 26, 2021

Hello,

I would like to check with you whether you are ok with us deploying an additional c1.small.x86 server for running our metrics CI for the 2.x branch of Kata.
We currently use one c1.small.x86 for the 1.x branch, but we are now in the phase of supporting both the 1.x and 2.x versions of Kata for a period of 6 months. For us it would be better to have another machine so that we do not pollute the environments between the two Kata versions.
After those 6 months we plan to deprecate 1.x, and we will be able to shut down one of the 2 machines.

Thanks in advance for your support.
cc @taylorwaggoner @vielmetti @dankohn @jacobsmith928

@vielmetti vielmetti reopened this Jan 27, 2021
@vielmetti
Collaborator

Sounds like a good approach to me, @taylorwaggoner @idvoretskyi can you confirm?

@taylorwaggoner
Contributor

Sounds good to me @vielmetti

@chavafg

chavafg commented Feb 4, 2021

@vielmetti @taylorwaggoner thanks for your support. I have deployed the new server.

@idvoretskyi
Member

This can be closed.

@vielmetti vielmetti added this to the Kata Containers milestone Oct 21, 2021
@vielmetti vielmetti reopened this Nov 30, 2022
@vielmetti
Collaborator

@chavafg @jeefy

Reopening this to handle a data center migration task.

There is a single machine currently in use in the Kata Containers project, "kata-metric6", in our SJC1 data center. That data center is closing.

We have capacity in our SV data center (Silicon Valley, same metro) available for you to set up a new system in.

Our hardware options have changed somewhat, and the legacy c1.small system you have is no longer in our current stock. I would recommend one of our m3.small systems as a likely option as an alternative.

thanks!

@vielmetti vielmetti added the IBX Migration Moving to new Equinix IBX data centers label Nov 30, 2022
@vielmetti
Collaborator

Bringing this to the attention of @GabyCT who hopefully can direct appropriately.

@vielmetti
Collaborator

The Kata Containers data center migration per above has completed successfully.

There's one more administrative thing to do, to move this project from CNCF sponsorship to OpenInfra Foundation sponsorship. No action is necessary on your part at this time, it's fundamentally an accounting issue and not a technical issue. When the time comes I'll coordinate with the OpenInfra team to take over administration of the account. There should be no need to change any of the machines.

@vielmetti vielmetti removed the IBX Migration Moving to new Equinix IBX data centers label Aug 24, 2023
@idvoretskyi
Member

A kind check with @vielmetti if any progress has happened here :)

@vielmetti
Collaborator

@idvoretskyi Meeting scheduled for later this week to discuss, thanks!
