[EPIC] Terra Architecture #67
I see a lot of overlap between Aegir 4 and this proposal. Aegir 4 development is in progress (though we're also working on cleaning up some of the more "soft" stuff at the same time, so progress is admittedly slow). As general background for anyone who's not familiar or wasn't present: Aegir 4 will initially be a set of components that sit on top of Kubernetes, and will primarily focus on the production deployment story. For the time being, we're completely ignoring the local development story. We're going to build a REST API that sits on top of the Kubernetes API and helps with access control and the like. Eventually, it'll be the primary mechanism that allows us to swap out the backing PaaS (for instance, allowing us to use Flynn instead of Kubernetes). We're also eventually going to build a D8-based UI that sits on top of the REST API and is more or less stateless.
I mentioned this in Gitter, but I'm saying it here for posterity: I'd really love for Terra to focus on the local development story, Aegir 4 to focus on production deployment, and Terra to integrate with the Aegir API to facilitate moving things from local to dev/stage/prod/other non-local environments, and vice versa.
Thanks, @cweagans, I'm excited to work this out. Let me start by personally apologizing to you, @ergonlogic, and any other aegir maintainers for not directly inviting you to contribute to this project first. I've wanted all of your help since the start but never directly asked. The offer stands to give you full commit access if you want to use this project.
What we are proposing is to make terra the new "aegir backend". I don't literally want to do all the things. The best part is, it already works. You should try it! It's actually fun to use. We are going to build a KubernetesDriver for terra, which will make Kubernetes just as fun to use. It's almost the same process as using docker-compose: write yml, run command. Should be easy, once we get around to it. ;)
If the aegir team adopts terra as the backend, then they are freed up to deal only with the harder and more interesting parts of production hosting: massive scaling, monitoring, logging, resilience, integrations. This isn't stuff we want to do just yet, but we do want to guide the building of those kinds of tools in a way that is modular, decoupled and useful for everyone.
So, we are definitely focused on the local development story! From the user's standpoint, the "local development story" consists of:
This has already been accomplished with the terra command line interface. It's frankly awesome to use. I have been using it to develop drupal sites for weeks now, and it's actually fun. On top of that, thanks to the app's .yml file, it is ridiculously easy to add more services to your app or drupal site. For example, with the terra UI prototype built by @jlyon, I was able to add a RabbitMQ server to the mix using a few lines of yaml. Now, let's go through the "production hosting story":
And then, finally, the "quality assurance hosting story":
This is obviously a simplified list, but the point is, for all three of the "dev", "testing", and "production" stories, we need very similar things. These tasks exist regardless of how they are done, so we built terra around the concept of "apps" and "environments", so we could very easily:
The next step is to make sure these tools help the production hosting story.
Thanks a ton for reaching out. Let's make this happen! P.S. Aegir is a God of the Sea. Terra is the Goddess of the Earth. Makes a nice pair, no? 😄
First off, thank you for the invite for commit access. I'll decline right now, but hopefully the option is still there later. I want to focus on the production hosting side of things for now.
Secondly, it's been pointed out to me that my directness (particularly in written communication) can come off as me being an asshole. If that's the case with any of my past, present, or future communication, I'm sorry. That's really not my intent. For the record (and thanks, @ergonlogic, for pointing me toward the concept), you can assume that I'm operating by Crocker's rules when communicating with me.
Regarding the technical points, I see what you're saying, but there are some problems with this approach (using the same tool for every job). I've tried to articulate them before, but reading back, some negativity came across, so I'll try once again from a more neutral standpoint.
Dev/prod differences
The high-level concepts for development and production hosting are basically the same, but some very important things are different. For production hosting, my goal is to be able to support 1000 containers running the same stateful application (whether that's Drupal or a game server or something entirely different) along with the application dependencies (mysql, rabbitmq, memcache, whatever). That means the load balancer, database, and file storage components need to be pretty robust, and I'd much rather trust something from Google in that regard. They have spent a ton of man-hours figuring this stuff out, and they've done it really well.
Too many assumptions
Terra (as it is right now) makes too many assumptions about the application that's going to be running, namely Drupal. I realize that you can run other PHP applications pretty easily given that it supports Drupal, but we have other needs too. For instance, we need to support Jekyll, even if only as a proof of concept. Supporting other, non-web applications is also valuable.
Terra also currently assumes that you'll use the Terra containers. That's not always going to be the case, especially as people use more non-Drupal technologies for development. Jekyll, as I mentioned, immediately comes to mind, but less web-centric applications should be possible too (think game servers or IRC bots or the like).
Terra also assumes that you'll be running commands locally (and that the canonical storage location for application metadata is whatever machine Terra is being run on). This is pretty easy to partially work around (just run it on some centralized machine and kick off Terra jobs through the task queue/Terra UI), but one of the things we don't like about Aegir is the directory full of Drush aliases on the Hostmaster server. This seems to just change the language of those files. We want the knowledge of what apps are running on a production cluster to be distributed so that it is highly available by default. If it's files on a disk somewhere, that's a single point of failure. This is part of the reason that we're going with Kubernetes: the knowledge of what is running on Kubernetes is distributed, and if any one of the nodes in the cluster goes down, the containers are reassigned, a new master is elected, and things keep running as you'd expect.
This is also very important for people running on bare metal. Hardware will inevitably need to be retired for one reason or another, and when that time comes, being able to tell Kubernetes about it, have it handle the logistics for you, and then simply unplug the server when it's done is a huge plus. Another benefit here is system upgrades. You can do a rolling update (sequentially rebooting to install updates across any number of nodes in a cluster), and as long as Kubernetes is doing its job, there won't be any downtime.
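Kubernetes already does this itself (for example via kubectl cordon and kubectl drain), so nobody would need to write this code. The toy Python model below only illustrates the invariant a rolling update maintains: take one node out of rotation at a time and never drop below a minimum number of serving nodes. Node, rolling_update and min_available are hypothetical names for illustration, not Kubernetes APIs.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: str
    schedulable: bool = True  # False while the node is drained for upgrade


def rolling_update(nodes, new_version, min_available):
    """Upgrade nodes one at a time without dropping below min_available.

    Each step takes a single node out of rotation, upgrades it, and puts it
    back before touching the next one. Returns a log of the steps taken.
    Assumes every node starts out schedulable.
    """
    if len(nodes) - 1 < min_available:
        raise ValueError("not enough spare capacity to take a node out of service")
    log = []
    for node in nodes:
        if node.version == new_version:
            continue  # already upgraded, nothing to do
        node.schedulable = False  # drain: its workloads move to the other nodes
        serving = sum(n.schedulable for n in nodes)
        log.append(f"drained {node.name} ({serving} nodes serving)")
        node.version = new_version  # reboot and install updates
        node.schedulable = True  # return the node to rotation
        log.append(f"upgraded {node.name} to {new_version}")
    return log


if __name__ == "__main__":
    cluster = [Node(f"node-{i}", "1.0") for i in range(3)]
    for line in rolling_update(cluster, "1.1", min_available=2):
        print(line)
```

The key design point is the precondition: with three nodes and min_available=2, there is exactly one node's worth of slack, so the update must be strictly sequential.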
Providing the capability to do a zero-downtime rolling upgrade on a mission-critical piece of your infrastructure is an amazing value-add for ops teams, and it isn't done very often because it's sort of like trying to change a tire on your car while you're driving down the highway.
PHP queue
Terra uses jms/job-queue-bundle. It's good at what it does, but another reason for the Aegir 4 rewrite is to get away from the PHP daemon. It's an endless source of problems when you start doing too many things with it. For extremely large deployments, we also need to be able to execute tasks concurrently. I don't know if you've got the queue bundle set up to do that or not, but that's a hard requirement for Aegir 4 in my mind (one of my clients has a hook_cron implementation that queues a bunch of backup tasks at 1am. They aren't even halfway done by 8am, and if they need to do other things with Aegir, that's kind of a problem). I bring that up because I've experimented with parallel execution in Aegir 2 and 3, and that only exacerbates the problems we run into, particularly around running out of file descriptors and memory usage.
Monolithic tooling
Having different tools for different jobs is not a bad thing, as long as the set of tools can be brought together into a system of systems. I think that Terra handles the local development story really, really well. I've played with it and I like it. I don't know if I'm just having some conceptual hang up somewhere or what, but it seems like at the point where you're finished developing something locally, you should hand off the site to
I think this would be really easy to do and would simplify Terra a lot.
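The concurrency requirement from the PHP queue point (say, a nightly batch of backup tasks) boils down to running many jobs at once while capping how many are in flight, so open files and memory stay bounded. A minimal Python sketch using the standard library's ThreadPoolExecutor; backup_site is a hypothetical stand-in for a real backup job, and this is not claimed to be how Aegir or Terra actually run tasks.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def backup_site(site):
    """Hypothetical stand-in for a backup task (e.g. dumping a database)."""
    time.sleep(0.01)  # simulate I/O-bound work
    return f"{site}: backed up"


def run_tasks(sites, task=backup_site, max_workers=4):
    """Run one task per site concurrently with a fixed-size worker pool.

    Capping max_workers bounds how many tasks (and the files or sockets they
    open) are in flight at once, instead of starting everything at once.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, site): site for site in sites}
        for future in as_completed(futures):
            site = futures[future]
            try:
                results[site] = future.result()
            except Exception as exc:  # one failed task must not stop the rest
                results[site] = f"{site}: failed ({exc})"
    return results


if __name__ == "__main__":
    sites = [f"site{i}.example.com" for i in range(20)]
    for site, outcome in sorted(run_tasks(sites).items()):
        print(outcome)
```

Threads fit here because backups are mostly I/O-bound; CPU-heavy jobs would call for a process pool with the same bounded-size idea.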
Basically, you could just point Terra at the Aegir API, give it your credentials (or token or whatever we end up using), and run some Terra command to pull down a copy of the site, work on it, and then push it back up to Aegir (and pushing it back to Aegir could mean a development env or something; we'll have that separation too). See also: Unix philosophy. Different tools for different scenarios seem really appropriate here. We also have problems with the current Aegir trying to do everything. It's almost impossible to swap out any one major component without causing a ton of other problems at best, or, at worst, the entire setup going up in flames. This plays into the Aegir/Terra duo that you mentioned too: "ship it" is literally the goddess of the earth handing off code to the god of the sea!
Workflow assumptions
As far as I know, Terra currently expects read access to Github repositories to be able to spin up a set of containers. For production hosting in particular, that's not always a valid assumption. Sometimes organizations will want to add a Git remote and push code to Aegir (a la Heroku). Other times, they'll just have specific containers they want to deploy and scale. Other times, they might need SFTP access to their codebase (@mlhess mentioned that this is something they do currently with their Kubernetes installation). Other times, maybe they'll be using Perforce (ugh.) or CVS or whatever. My point is, there are a lot of different workflows we're going to have to support. It may be that there are plans for Terra to eventually support those other workflows and it just hasn't gotten around to it yet, so maybe this point is moot.
Custom build logic
This might be less of a technical point and more of a preference, but I don't like that Terra has its own logic for building a container (https://github.com/terra-ops/example-drupal/blob/master/.terra.yml#L2). For web applications, it makes a lot of sense to me to just use Buildpacks.
They're generic enough that they can handle pretty much anything out of the box, and if you need to do anything more on top of that, you can use https://github.com/ddollar/heroku-buildpack-multi and specify whatever combination of buildpacks you want. For instance, you can use the PHP buildpack to do most of the config, and then a Terra-Drupal buildpack to further customize that environment if necessary.
Config not in the environment by default
I don't know whether or not you've considered this, but hardcoding the database information in settings.php won't scale well. For a platform like this, it seems like you should spin up a database container and link it to the web container, which will set some environment variables that contain the database connection info. Configuration in the environment is the more future-proof way of handling that kind of thing, and it's likely the direction that we want to require in Aegir 4, particularly because the connection information can change on each deploy (whether or not you want it to; that's a side effect of running your site in containers). Docker Compose handles this out of the box, by the way. For example, if you have a container titled "php" which runs php-fpm and exposes port 9000, you can use fcgi://php:9000/$1 as the fcgi proxy URL. Not sure if you'll be able to use that or not, but maybe you'll find that info useful.
Production needs scalability, logging, and monitoring OOTB
With production deployment in particular, deploying to Aegir should get you scalability, logging, and monitoring out of the box. It's not something that the developer/admin/ops person/whoever should have to configure. It should "just work" with whatever solutions are in place to solve those problems.
Separation of discussions
It seems like @ergonlogic and I (and others too) are constantly talking about requirements and architecture plans and such in #aegir and you're not there.
We need to be better about documenting those things, but I think it would help a lot if your architectural planning were happening at the same time and in the same place as ours.
Plans for Aegir
Right now, our plans have Aegir distilled down into just a D8 install profile, a handful of custom modules, and a theme. We don't need to build the REST API, a queue, or really anything else in order to make it work because Kubernetes and Openshift handle all of that. That's a really nice place for us as a project to be in, because we hardly have to handle any of the really complicated problems that go along with those components. It also really simplifies upgrades. If the only thing that we're storing is user accounts (and I'm not even sure we'll need to store those; there might be some authorization mechanism in Kubernetes that we can build on top of), then upgrading to D9 will basically be a matter of porting our profile, modules, and theme to D9 and making sure the core user upgrade path is working. Sure, scaling a database is still a problem (making it so that you can do the equivalent of
Once those things are solved in a way that supports HA deployments, we likely will never have to deal with it again (and that work will be available to the wider Docker community). This frees up a ton of future time to work on usability and providing nice workflows for everything from a Pantheon-like hosting setup for multiple clients to easily self-hosting 10,000 or more applications for internal use at a given company. Another good thing about this is that some of the code that we'll end up writing in this scenario would be good candidates to include upstream in Kubernetes or Openshift, so the amount of code that we'll end up maintaining will hopefully decrease over time.
I know that we initially discussed PaaS agnosticism, but the more information I get and the more I wrap my head around Kubernetes, the less sure I am that it's a reasonable thing to actively pursue as part of the Aegir project. Kubernetes is, in my opinion, the best tool for the job, and I'd personally be okay just saying "eff it. We'll just require Kubernetes", or at a minimum, only officially supporting Kubernetes and leaving the rest to contrib.
I don't know how to word this nicely, so I'll just say it and hope you remember my intent is positive: I think you've been overly dismissive of the complexities of production hosting. I know you know what you're doing, but @ergonlogic and I both (and presumably many others) have specific needs around HA and scalability. For one of my clients, it's inexcusable for their site to be down for any reason, even during a deployment, so we have all kinds of crazy things in place to ensure that it never happens (barring some kind of world-ending event). If you're feeling a lot of resistance to building the production side of things in Terra, this is one of the reasons why. Jon Rudenberg (the Flynn guy) was talking to us over lunch at NYCcamp, and even for him (somebody who does this stuff for what I can only assume is 9-10+ hours per day, every day, and is really out in the weeds building all the components of Flynn more or less from scratch), file storage and HA database management cause him to lose sleep at night. It's a really hard problem to solve, and every time I've brought it up with you, I get something to the effect of "Don't worry. We'll figure it out. It's just file storage." I have personally been down this road before (when I wasn't using Aegir) and it's painful and full of frustration, and I cannot stress enough how incredibly important it is that we get it right.
I really want Terra to be successful, but other than "stand up a website", I don't see a lot of overlap between the production and local development scenarios, and that might be where our disagreement is. The way I see it, you've been focusing a lot on the local development bits and it's really good. We've been focusing a lot on the HA/scalable production bits and what we've planned will be really good. Later down the road, I'd enjoy helping with Devshop 2 if it's going to be built on top of Aegir 4. The Aegir D8 UI should be more or less stateless (since all of the information about what's deployed on Kubernetes is stored in Kubernetes), so Devshop can exist at the same layer as the Aegir UI, talk to the same API, and do the same things. The UI will just be different, more suited to the Pantheon-like workflow.
In summary
Basically, what I'd like to see happen (and @ergonlogic, feel free to chime in here if you have anything to add):
My personal vision is that we can "sell" Aegir as a package of the following components:
Above all, I don't want you to feel like I personally (and we, as a project) don't value the work you've been doing. I mentioned this a few times in this post, but I think Terra is a really good thing for local dev and I like it. I'm probably going to recommend that we use it internally at NBC for local development, though we might need to resolve a couple of the points I mentioned above before we can commit to that (namely, the assumption that we'll use the Terra containers and the hardcoded db connection info in settings.php. We have our own containers that we'll want to use, as we have some pretty specific requirements around what needs to live inside of them). Ideally, that will be a bridge to using Aegir 4 internally at NBC on the Kubernetes cluster I'm told our ops team is considering, but we'll have to see.
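The environment-based configuration suggested above (instead of hardcoded database credentials in settings.php) could look roughly like this. It is sketched in Python purely for illustration; in a Drupal settings.php the same idea would use PHP's getenv(). The variable names DB_HOST, DB_PORT, DB_NAME, DB_USER and DB_PASSWORD, and the default host "database", are assumptions, not anything Terra or Aegir defines.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseConfig:
    host: str
    port: int
    name: str
    user: str
    password: str


def database_config_from_env(env=None):
    """Build DB settings from environment variables instead of a config file.

    The variable names are hypothetical; use whatever names your orchestrator
    or docker-compose file injects. The default host "database" assumes a
    linked service of that name, which Docker Compose makes resolvable as a
    hostname, so a changed container IP on redeploy does not matter.
    """
    env = os.environ if env is None else env
    required = ("DB_NAME", "DB_USER", "DB_PASSWORD")
    missing = [key for key in required if key not in env]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return DatabaseConfig(
        host=env.get("DB_HOST", "database"),
        port=int(env.get("DB_PORT", "3306")),
        name=env["DB_NAME"],
        user=env["DB_USER"],
        password=env["DB_PASSWORD"],
    )


if __name__ == "__main__":
    cfg = database_config_from_env(
        {"DB_NAME": "drupal", "DB_USER": "drupal", "DB_PASSWORD": "s3cret"}
    )
    print(cfg.host, cfg.port)
```

Failing loudly on missing credentials, while defaulting only the non-secret host and port, keeps a misconfigured container from silently connecting to the wrong place.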
I think there have been a number of assumptions made here since the beginning that are simply incorrect.
Incorrect Assumption:
You can completely replace the docker layout using your app's .terra.yml right now. Choose your own containers. Replace the defaults. Add extras. We plan on making the default compose stack pluggable so it doesn't force the drupal stack by default, but for right now, you can use
Putting this code in your .terra.yml file will replace
Incorrect Assumption:
This is out of complete laziness; terra is still a proof of concept. We can randomize the passwords easily. Let's work on this one.
Incorrect Assumption:
Like I said above, yes, the default container layout is for drupal, but it will be pluggable very soon. However, the app can override its containers 100% right now. Use
You can swap out your app container for ruby or node, for all we care. You can add another container for memcache, or jenkins, or rabbitMQ, or literally anything else. I'm not sure how I could be clearer on that.
Incorrect Assumption:
This was just for research. It turns out RabbitMQ is a much better solution. We've gotten the queue system working with Rabbit here: https://github.com/terra-ops/terra-ui-prototype/blob/master/.terra.yml Six lines of yml added the rabbitMQ container to our front-end drupal 7 site.
Incorrect Assumption:
Well? Something has to have access to the code, regardless of whether it's deploying it to production or development machines. It doesn't have to be running on the same system that the containers live on if you hook up a remote docker daemon. If this is really a problem, we can resolve it one way or another. If kubernetes fixes it, then great! We'll get that solution when the kubernetes driver is completed.
Incorrect Assumption:
Wherever you go, there you are. In your production hosting environment, you will be running commands "locally". There will be metadata stored somewhere. If you need this to be distributed, then we will figure out how to distribute it. And again, if kubernetes fixes it, then great! Kubernetes Driver.
Incorrect Assumption:
Again, you can use your own. Make your own container. Either rebuild with
This is precisely a "conceptual hang up". Your production system needs a CLI. Using those orchestrators' CLI is not easy. I think the debate here is really about what it is called. We know the components that are needed. We need a CLI. We need a command queue. We need a REST API. However, which tool talks to what API is not irrelevant. We want this stack to be simple and easy to develop so others can contribute, modify and extend.
I'm not completely against this, but... If aegir is supposed to be synonymous with production, then it doesn't make sense to move terra-cli to aegir-project. The cli is used for dev and testing as well. If it were called "aegir-cli", and we told people to use it for local dev, then wouldn't aegir be claiming to do "all the things"? The interesting thing is, thanks to symfony, you could create an "aegir-cli" project that included the terra commands. See https://github.com/terra-ops/terra-api, which is a full symfony distribution.
Wrapping up...
My biggest question for you is, if the terra integration with kubernetes works, and you get all the benefits of both, why wouldn't you want to use it? You'd get the CLI, you'd get the Queue, you'd get the REST API, and you'd get PaaS agnosticism right now.
On a final note, please don't forget that this entire initiative is pre-release and in development. Everything you see here can change at any time. Suggest actionable tasks and we will do what we can to make it work for you. However, I would recommend that instead of having larger discussions about the architecture we will build in the future, we focus on what you would change about this project to make it useful to you, now. We're building really interesting things, right now. Join us and start tinkering! I am inspired by the Agile Manifesto: https://en.wikipedia.org/wiki/Agile_software_development
...which is precisely why we need to avoid writing code when we can, and for a lot of these things, we can because Google and Red Hat have done it for us.
Aegir (the larger organizational umbrella) wants to provide a curated set of tools that handles everything from local to production. That doesn't mean we have to build every component from scratch, especially when there are other battle-tested technologies out there that handle many of the difficult things for us. Aegir (the project) - yes, that will be mainly focused on production hosting.
There's no reason to build out the kind of infrastructure that you're planning because it's already been built. Kubernetes and OpenShift completely eliminate the need to roll your own command queue, REST API, orchestration strategies and all that other stuff. It's just handed to you, it works, and you don't have to worry about it. Maintenance and testing for those components is "outsourced" to other open source projects backed by major companies that have a strong commercial interest in solving this problem in a way that makes sense for everyone from one-man shops to NBC-scale orgs and bigger, and that's exactly what we want to do.
You seem unwilling to compromise on your view that Terra should be the gateway to everything, and while I'm not sure why you'd want to go down that road again (that's exactly what Aegir is right now, to a lesser extent: a monolithic pile of code that takes over your entire development process), it's completely fine, and I hope you're successful. Nothing but good things can come from competition. However, we (the Aegir project) have already established the architectural goals that we want to pursue and the method by which we're going to accomplish them, including going through the process of gathering input from the eventual end users of the product, several large universities included. While Terra may eventually solve those problems in a way that makes sense for those customers, Kubernetes and Openshift solve them today, and really large "anchor" customers are already on board with using it when it's ready to go (some of them are already using Kubernetes).
I've minced words until this point, but I'll just say it now: Terra is not going to be the backend to Aegir because we aren't going to have our own backend. We're just going to defer to Kubernetes and Openshift for that. If we can do it in a PaaS-agnostic way, that's great, but I don't think we're going to go out of our way to support it, as it introduces a lot of complexity.
The Aegir frontend will likely expose a limited set of operations via a REST API. If you want to integrate with that through Terra, that'd be cool. If not, we'll probably just end up building our own CLI that's essentially a wrapper around docker-compose and the Aegir REST API to provide local development and nothing else. Where we go from here is really up to you. If you're willing to work with us on the plans that we've already made (again, based on potential customer feedback) and more or less finalized, we'd welcome the help. If you want to keep doing your own thing with Terra, that's fine too. @ergonlogic, if you'd like to continue this discussion, feel free, particularly if you want to overrule me on any of the above points. I'm going to bow out because everything that I have to say has been said.
I'd like to see Ægir, as a software project, become more of a collection of tools, rather than its current monolithic architecture, following the Unix Philosophy: "do one thing and do it well." I'm contributing Rán on the basis that it is complementary to how we see Ægir evolving. I'd love nothing more than for Terra to find such a role within the project. The entire impetus of re-engineering Ægir is to take advantage of existing best-of-breed libre software. As a production-grade container-based hosting system, Kubernetes/Openshift looks pretty feature-complete. I'd very much like to support other backends, especially Flynn, once it is further along. But I believe we're better served contributing to those upstream projects, rather than duplicating efforts.
Great, thanks, @ergonlogic! I too see terra fitting into aegir as a component. I've never heard of Rán; could you post a link? I have no plans to duplicate the efforts of docker orchestrators directly in terra. I simply want to support the lowest common denominator with a docker baseline, much like we did with Aegir 3 and prior: it works out of the box and has some rudimentary scaling features, but if you wanted to do anything more serious, you would have to add in more robust tools. I'm really looking forward to this collaboration. As a first experimental step, I think we can get the aegir3 front end running on terra and using the terra queue without much effort. I created a terra app that tries to install hostmaster, but it fails because the install profile depends on d(). See https://github.com/terra-ops/example-aegir. If we removed that dependency on provision and added a module that sent commands to the RabbitMQ server (similar to albatross_digi's Terra UI module), we would have Aegir 3 able to launch sites on containers in short order! Thanks a lot for your feedback. Keep it coming!
Just to be clear, I agree with @cweagans' assessment. Our current plan for the next generation of Ægir is to build atop Kubernetes and Openstack. Together these are a feature-complete, production-ready, container-based hosting system. To be honest, I don't see how the current direction of Terra's development fits into that. From what I can see, Terra is trying to do too much. We want small, composable tools built on a solid foundation. Terra appears to be trying to be both the tooling and the foundation in some cases. I think we would find Terra extremely valuable if it provided an automated local docker development environment, because setting that up is itself an annoying problem for every new user to deal with. Something like that would be happily welcomed into the Aegir stack.
This issue is to document my thoughts on the future architecture of Terra.
"Terra CLI"
The "terra-app" repo will be renamed "terra-cli". This will remain the main tool for the project.
"Terra API" or "Terra UI"
http://github.com/terra-ops/terra-api
This app is a Symfony REST Edition app that provides a basic web UI and REST API for the terra objects. It will be very similar to https://github.com/jonpugh/aegir4
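To make "terra objects" concrete, here is a hypothetical Python model of the apps-and-environments concept and the kind of JSON a REST endpoint could return for it. The field names (repo, git_ref, url) are illustrative assumptions, not the actual Terra schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Environment:
    name: str      # e.g. "dev", "test", "live"
    git_ref: str   # branch or tag deployed to this environment (assumed field)
    url: str = ""  # where the environment is served (assumed field)


@dataclass
class App:
    name: str
    repo: str  # source repository the environments are built from
    environments: list[Environment] = field(default_factory=list)

    def add_environment(self, env):
        """Attach an environment, refusing duplicate names within one app."""
        if any(e.name == env.name for e in self.environments):
            raise ValueError(f"{self.name} already has an environment named {env.name}")
        self.environments.append(env)


def to_json(app):
    """Serialize an app and its environments, as a REST response body might."""
    return json.dumps(asdict(app), sort_keys=True)


if __name__ == "__main__":
    app = App("example-drupal", "https://github.com/terra-ops/example-drupal")
    app.add_environment(Environment("dev", "master"))
    print(to_json(app))
```

Keeping environments nested under their app mirrors how the same objects could map onto Drupal entities later: one app entity referencing many environment entities.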
We could add the "job-queue-bundle" to this app (see http://jmsyst.com/bundles/JMSJobQueueBundle). This bundle will handle storing a queue of "terra-cli" commands. It can handle queues, job dependencies, and logging. It has its own built-in web interface for these things that we should leverage.
UPDATE: JobQueue Bundle is out for now. @jlyon created a working setup with RabbitMQ, and I was able to get it working with a RabbitMQ container.
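On the consuming side, a long-running worker (the role of receiver.php and the proposed terra queue command) pulls command messages off the queue and dispatches them. In this sketch a standard-library queue.Queue stands in for RabbitMQ so it runs without a broker; a real worker would use an AMQP client such as pika (Python) or php-amqplib (PHP). The JSON message format and the handler names are hypothetical.

```python
import json
import queue


def handle_message(body, handlers):
    """Decode one queued message and dispatch it to the matching handler.

    The message shape {"command": ..., "args": [...]} is hypothetical.
    """
    try:
        message = json.loads(body)
    except json.JSONDecodeError:
        message = None
    if not isinstance(message, dict):
        return "rejected: malformed message"
    command = message.get("command")
    handler = handlers.get(command)
    if handler is None:
        return f"rejected: unknown command {command!r}"
    try:
        return handler(*message.get("args", []))
    except Exception as exc:  # a failing job must not kill the worker loop
        return f"failed: {command}: {exc}"


def worker(q, handlers):
    """Consume messages until a None sentinel arrives.

    A real queue worker would block on the broker indefinitely; the sentinel
    exists only so this example terminates.
    """
    results = []
    while True:
        body = q.get()
        if body is None:
            break
        results.append(handle_message(body, handlers))
    return results


if __name__ == "__main__":
    handlers = {"deploy": lambda app, env: f"deployed {app} to {env}"}
    q = queue.Queue()
    q.put(json.dumps({"command": "deploy", "args": ["mysite", "dev"]}))
    q.put(None)
    print(worker(q, handlers))
```

Dispatching through an explicit handlers table means the worker only ever runs commands it was told about, rather than executing arbitrary strings pulled off the queue.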
"Terra UI"
@jlyon also created a Drupal 7 module with apps and environments. They have been using Aegir for a while, so it only took him a few days to build it.
You can launch the Terra UI in terra itself with
The Drupal 7 site interacts directly with the RabbitMQ server via REST. This means that it can be hosted anywhere, as long as it can access the RabbitMQ server.
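One way for a web app to reach RabbitMQ over plain HTTP is the management plugin's HTTP API, which can publish a message via POST /api/exchanges/{vhost}/{exchange}/publish. Whether the Terra UI module uses this exact endpoint is an assumption. The sketch below only builds the request, publishing to the default exchange (amq.default), where the routing key is the target queue name; sending it is left to urllib.request.urlopen. The queue name, credentials and command payload are placeholders.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_publish_request(base_url, queue_name, command, user, password, vhost="/"):
    """Build (but do not send) a RabbitMQ management-API publish request.

    On the default exchange, the routing key is the destination queue name.
    The vhost must be URL-encoded, so "/" becomes "%2F".
    """
    vhost_enc = urllib.parse.quote(vhost, safe="")
    url = f"{base_url.rstrip('/')}/api/exchanges/{vhost_enc}/amq.default/publish"
    body = {
        "properties": {},
        "routing_key": queue_name,
        "payload": json.dumps(command),  # the command travels as a JSON string
        "payload_encoding": "string",
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_publish_request(
        "http://localhost:15672", "terra", {"command": "deploy"}, "guest", "guest"
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually publish the message.
```

Because the producer only needs HTTP, the Drupal site can live anywhere that can reach the broker's management port, which matches the point above about hosting it independently.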
The backend "receiver.php" will become the command
terra queue
which will run continuously. This is equivalent to using the
drush @hostmaster hosting-queued
command.
Terra UI Example Apps
Once we have a REST API in place, we can make an example UI app that could be a Drupal site. I imagine we could make a public drupal module providing drupal entities that map to the terra objects and an interface to the Terra API app.
This would open up the possibilities for multiple front-ends built in Drupal, hopefully doing all of the heavy lifting for things like future Aegir 4, devshop 2, WFTools, etc.
Feedback!!!
PLEASE provide your thoughts! We need user feedback so we design a system that works for everyone.
THANKS!