Releases: mesosphere/marathon

v1.4.1

20 Feb 14:40

Fixes #5211 - Re-enabling PUT on /v2/apps

Known issues

1.4.0

17 Feb 19:47

Changes from 1.3.x to 1.4.0

Breaking Changes

Plugin API has changed

To support pods, we had to change the plugin interfaces in a backward-incompatible way.
Plugin writers need to update their plugins to use this version.

  • There is a new NetworkSpec plugin interface that may be of interest for Mesos network module writers.
  • Some existing plugin APIs were modified in support of the new pods primitive (see Overview/Pods).

Health reporting via the event stream

Adding support for pods in Marathon required the internal representation of tasks to be migrated to instances. An instance represents the executor on the Mesos side, and contains a list of tasks. This change is reflected in various parts of the API, which now report health status and similar information for instances, not for tasks.
Until v1.3.x, Marathon published health_status_changed_events via the event stream. With the introduction of instances that can contain multiple tasks, Marathon moved away from that event in favor of instance_health_changed_events.
If you were consuming that event, you have to adjust your tooling to consume the new event instead, e.g.:

{
    "instanceId": "some_app.marathon-49d976d3-9c6f-11e6-93cb-0242216b9f0d",
    "runSpecId": "/some/app",
    "healthy": true,
    "runSpecVersion": "2016-10-18T10:42:47.499Z",
    "timestamp": "2016-10-27T18:00:50.401Z",
    "eventType": "instance_health_changed_event"
}

Accordingly, the failed_health_check_event now reports an instanceId instead of a taskId:

{
    "instanceId": "some_app.marathon-49d976d3-9c6f-11e6-93cb-0242216b9f0d",
    ...
    "eventType": "failed_health_check_event"
}

This change affects the following API primitives in a similar way:

  • unhealthy_instance_kill_event (in favor of the previous unhealthy_task_kill_event) provides both the instanceId of the instance that got killed, as well as the taskId designating the task that failed health checks.
  • Health information as reported via the apps and tasks endpoint.
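
For tooling that has to be adjusted, the following is a minimal sketch of a handler that dispatches on the eventType field of the payloads shown above. The function name describe_health_event is hypothetical, and the sketch only relies on the fields that appear in the sample events in these notes.

```python
import json


def describe_health_event(raw: str) -> str:
    """Summarize a health-related event payload (hypothetical helper)."""
    event = json.loads(raw)
    event_type = event.get("eventType")
    if event_type == "instance_health_changed_event":
        state = "healthy" if event["healthy"] else "unhealthy"
        return f"{event['runSpecId']}: instance {event['instanceId']} is {state}"
    if event_type == "failed_health_check_event":
        # Since 1.4.0 this event carries an instanceId instead of a taskId.
        return f"health check failed for instance {event['instanceId']}"
    if event_type == "health_status_changed_event":
        # Task-level event that was published up to v1.3.x.
        return "legacy task-level health event"
    return f"ignored event type: {event_type}"


sample = """{
    "instanceId": "some_app.marathon-49d976d3-9c6f-11e6-93cb-0242216b9f0d",
    "runSpecId": "/some/app",
    "healthy": true,
    "runSpecVersion": "2016-10-18T10:42:47.499Z",
    "timestamp": "2016-10-27T18:00:50.401Z",
    "eventType": "instance_health_changed_event"
}"""
print(describe_health_event(sample))
```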

Overview

Pods

A pod is a collection of co-located and co-scheduled containers in a shared context.
The containers of a pod share a network namespace and may share access to the same filesystem(s).
Each pod instance’s containers are individually resource-isolated.

Mesos 1.1 adds support for launching a group of tasks (LAUNCH_GROUP).
A pod instance’s containers are launched via this Mesos primitive.
Mesos provides the executor implementation that Marathon will use to run pod instances.

We created a new primitive, PodDefinition, as well as new API endpoints.
Read more about how to use pods in our Pods Documentation
and the /v2/pods section of the REST API Reference.

Pods are implemented as a new primitive in Marathon.
The general functionality of apps plus the related endpoints are still available.
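
As an illustration only, a pod with two co-located containers might be described roughly as below. The field names (scaling, containers, exec, resources, networks) are written from memory of the Pods Documentation and should be treated as assumptions; the pod id, container names and commands are made up. The helper total_resources is hypothetical and simply adds up what the containers request.

```python
import json

# Assumed shape of a pod definition; check the /v2/pods section of the
# REST API Reference before relying on any of these field names.
pod_definition = {
    "id": "/simplepod",
    "scaling": {"kind": "fixed", "instances": 1},
    "containers": [
        {
            "name": "web",
            "exec": {"command": {"shell": "python3 -m http.server 8080"}},
            "resources": {"cpus": 0.1, "mem": 32},
        },
        {
            "name": "sidecar",
            "exec": {"command": {"shell": "sleep 1000"}},
            "resources": {"cpus": 0.1, "mem": 32},
        },
    ],
    "networks": [{"mode": "host"}],
}


def total_resources(pod: dict) -> dict:
    """Sum the per-container resource requests of one pod instance."""
    totals = {}
    for container in pod["containers"]:
        for name, amount in container["resources"].items():
            totals[name] = totals.get(name, 0) + amount
    return totals


print(json.dumps(pod_definition, indent=2))
print(total_resources(pod_definition))
```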

Mesos-based health checks for HTTP, HTTPS, and TCP

Health checks are an integral part of application monitoring and have been available in Marathon since version 0.7.
At the time that health checks were first added to Marathon, there was no support for health checks in Mesos.
Prior to the availability of Mesos-based health checks, health checks were only performed directly in Marathon. This has the following consequences:

  • Marathon has to share the same network as the tasks to monitor, so it can reach all launched tasks
  • Network partitions could lead to wrong scheduling decisions
  • The health state is not available via the Mesos state
  • Marathon health checks do not scale to large numbers of tasks.

Starting with Mesos 1.1, it is possible to perform network-based health checks directly at the Mesos executor level.
Marathon makes all the Mesos-based health checks available.
See the updated Health Check Documentation,
especially the new protocols: MESOS_HTTP, MESOS_HTTPS, MESOS_TCP.

We strongly recommend Mesos-based health checks over Marathon-based health checks.
Marathon-based health checks are deprecated and will be removed in a future version.

New ZK persistent storage layout

ZooKeeper has a limitation on the number of nodes it can store in a directory node.
Until version 1.3, Marathon used a flat storage layout in ZooKeeper and encountered this limitation with large installations.
The latest version of Marathon uses a nested storage layout, which significantly increases the number of nodes that can be stored.

ZooKeeper has a limitation on the size of one node (typically 1 MB).
In prior versions, a group was stored with all subgroups and applications.
This could lead to a node size larger than 1 MB, which could not be stored.
The latest version of Marathon stores a group only with references in order to keep node size under 1 MB.

A migration inside Marathon automatically migrates the prior layout to the new one.

Improve Task Lost behaviour

The connection between the Mesos master and an agent can be broken for several reasons (network partition, agent update, etc).
When this happens, there is limited knowledge of the status of the agent's tasks.
Prior versions of Mesos declared such tasks as lost after a timeout and killed the tasks if the agent rejoined the cluster.

Starting with Mesos 1.1, those tasks are declared unreachable, not lost.
The scheduler that launched the tasks decides how to handle unreachable tasks.

Marathon uses this feature and adds an unreachableStrategy to the AppDefinition and PodDefinition, which allows you to define:

  • inactiveAfterSeconds: how long Marathon should wait to start a replacement task.
  • expungeAfterSeconds: how long Marathon should wait for a task to come back.

If a task comes back and the replacement task is already started, Marathon needs to decide which task to kill.
In order to let the user define which task should be taken, a kill selection can be defined.
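
A minimal sketch of how these settings could be attached to an app definition follows. The helper with_unreachable_strategy is hypothetical. The top-level killSelection field and its values YOUNGEST_FIRST and OLDEST_FIRST are assumptions that should be checked against the REST API Reference, and the check that expungeAfterSeconds is not shorter than inactiveAfterSeconds is a sanity rule of this sketch, not something stated in these notes.

```python
import json


def with_unreachable_strategy(app: dict, inactive_after: int, expunge_after: int,
                              kill_selection: str = "YOUNGEST_FIRST") -> dict:
    """Return a copy of an app definition with unreachable-task handling set."""
    if kill_selection not in ("YOUNGEST_FIRST", "OLDEST_FIRST"):  # assumed values
        raise ValueError(f"unknown kill selection: {kill_selection}")
    if inactive_after > expunge_after:
        # Assumption of this sketch: a replacement is started before the
        # unreachable task is given up on.
        raise ValueError("inactive_after must not exceed expunge_after")
    updated = dict(app)
    updated["unreachableStrategy"] = {
        "inactiveAfterSeconds": inactive_after,  # wait before starting a replacement
        "expungeAfterSeconds": expunge_after,    # wait for the task to come back
    }
    updated["killSelection"] = kill_selection
    return updated


app = {"id": "/some/app", "cmd": "sleep 1000", "instances": 2}
print(json.dumps(with_unreachable_strategy(app, 300, 600), indent=2))
```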

Insights into the Launch Process - AKA: Why isn't my app starting?

Marathon tries to schedule tasks based on app or pod definition, which incorporates resource matching, role matching, constraint matching etc.
There are situations when Marathon cannot fulfill a launch request, since there is no matching offer from Mesos.
Until now, it was very hard for users who ran into such situations to understand why Marathon could not fulfill a launch request.
This version of Marathon gives insight into the launch process: it analyzes all incoming offers and gives the user
statistics, so it is easy to see why offers were rejected.

The statistics can be fetched via the /v2/queue endpoint. See the REST API Reference.
Marathon shows the offer matching process as a funnel, so it is easy to see how many offers were rejected in which step.
It gives this information for the whole launch attempt as well as the last offer cycle.
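
To make the funnel idea concrete, here is a rough sketch that reads a list of rejection steps and reports how many offers each step declined. The record shape (reason, processed, declined), the reason names and all numbers are illustrative assumptions, not the documented response format; the real structure is described in the REST API Reference for /v2/queue.

```python
def summarize_funnel(reject_summary: list) -> list:
    """Describe each matching step: offers seen, declined, and passed on."""
    lines = []
    for step in reject_summary:
        passed = step["processed"] - step["declined"]
        lines.append(
            f"{step['reason']}: {step['declined']} of {step['processed']} "
            f"offers declined, {passed} passed on"
        )
    return lines


def most_common_rejection(reject_summary: list):
    """Return the reason that declined the most offers, or None."""
    declined_steps = [s for s in reject_summary if s["declined"] > 0]
    if not declined_steps:
        return None
    return max(declined_steps, key=lambda s: s["declined"])["reason"]


# Hypothetical data: 10 offers enter the funnel, 1 survives every step.
sample = [
    {"reason": "UnfulfilledRole", "processed": 10, "declined": 0},
    {"reason": "UnfulfilledConstraint", "processed": 10, "declined": 4},
    {"reason": "InsufficientCpus", "processed": 6, "declined": 5},
    {"reason": "InsufficientMemory", "processed": 1, "declined": 0},
]
for line in summarize_funnel(sample):
    print(line)
print("main reason:", most_common_rejection(sample))
```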

Improve Deployment logic

In prior versions, all deployments were restarted from the beginning during a Marathon master failover.
This can be cumbersome if you have long-running updates and a Marathon failover.
This version of Marathon reconciles the state of a deployment after a failover.
A running deployment will be continued on the newly elected leader without restarting the deployment.

Every state change operation via the REST API will now return the deployment identifier as an HTTP response header.
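
A client can pick the identifier up from the response headers. The sketch below assumes the header is named Marathon-Deployment-Id; that name is not given in these notes and should be verified against the REST API Reference. The helper name and the sample header values are hypothetical.

```python
def deployment_id_from_headers(headers: dict,
                               header_name: str = "Marathon-Deployment-Id"):
    """Look up the deployment identifier header, ignoring case.

    The default header name is an assumption of this sketch.
    """
    wanted = header_name.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    return None


# Hypothetical response headers from a state-changing request.
response_headers = {
    "Content-Type": "application/json",
    "marathon-deployment-id": "example-deployment-id",
}
print(deployment_id_from_headers(response_headers))
```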

Deprecations

Deprecate Marathon-based Health Checks

Mesos now supports command-based as well as network-based health checks.
Since those health check types are now also available in Marathon, the Marathon-based health checks are now deprecated.
Do not use health checks with the following protocols: HTTP, HTTPS, and TCP. Instead, use the Mesos equivalents: MESOS_HTTP, MESOS_HTTPS and MESOS_TCP.
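
The protocol switch can be sketched as a small rewrite of an app definition. The helper migrate_health_checks is hypothetical. It only renames the protocol and copies every other field unchanged, so whether each remaining field is supported by the Mesos-based variant still has to be checked against the Health Check Documentation.

```python
# Mapping taken from the deprecation note above.
MESOS_EQUIVALENT = {
    "HTTP": "MESOS_HTTP",
    "HTTPS": "MESOS_HTTPS",
    "TCP": "MESOS_TCP",
}


def migrate_health_checks(app: dict) -> dict:
    """Return a copy of the app with Marathon-based protocols replaced."""
    migrated = dict(app)
    checks = []
    for check in app.get("healthChecks", []):
        new_check = dict(check)
        protocol = new_check.get("protocol")
        if protocol in MESOS_EQUIVALENT:
            new_check["protocol"] = MESOS_EQUIVALENT[protocol]
        # Checks without an explicit protocol, or with another protocol
        # such as COMMAND, are left untouched.
        checks.append(new_check)
    migrated["healthChecks"] = checks
    return migrated


app = {
    "id": "/some/app",
    "healthChecks": [
        {"protocol": "HTTP", "path": "/health", "portIndex": 0},
        {"protocol": "COMMAND", "command": {"value": "true"}},
    ],
}
print(migrate_health_checks(app)["healthChecks"])
```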

Deprecate Event Callback Subscriptions

Marathon has two ways to subscribe to the internal event bus:

  • HTTP callback events managed via /v2/eventSubscriptions
  • Server-Sent Events via /v2/events (since Marathon 0.9)

We encourage everyone to use the /v2/events SSE stream instead of HTTP Callback listeners.
The event callback subscriptions will be removed in a future version.
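
For anyone moving from callbacks to the SSE stream, the following is a minimal sketch of parsing Server-Sent Events text into JSON payloads. It works on any iterable of lines, so it makes no assumption about the HTTP client used to read /v2/events. The function name iter_sse_events and the sample stream are hypothetical, and the sketch handles only the event and data fields.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, payload) pairs from an iterable of SSE text lines."""
    event_name, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event.
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = None, []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip(" "))


# Hypothetical excerpt of a stream.
stream = [
    ": keep-alive\n",
    "event: failed_health_check_event\n",
    'data: {"instanceId": "some_app.marathon-1", "eventType": "failed_health_check_event"}\n',
    "\n",
]
for name, payload in iter_sse_events(stream):
    print(name, payload["instanceId"])
```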

Deprecate Artifact Store

The artifact store was introduced as an easy solution to store and retrieve artifacts and make them available in the cluster.
There is a variety of tools that can handle this functionality better than Marathon.
We will remove this functionality from Marathon without replacement.

Deprecate PATCH semantic for PUT on /v2/apps

A PUT on /v2/apps has PATCH-like semantics:
values that are not defined in the JSON body do not update existing values.
This has always been the default behaviour in Marathon.
For backward compatibility, we will not change this behaviour, but let users opt in for a proper PUT.
The next version of Marathon will use PATCH and PUT as two separate actions.
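
The difference between the two semantics can be sketched with plain dictionaries. This is a conceptual illustration, not Marathon's implementation: the merge below is shallow, and a real replace would presumably fill omitted fields with defaults. Both function names and the sample app are hypothetical.

```python
def patch_update(existing: dict, update: dict) -> dict:
    """PATCH-like: fields missing from the update keep their old values."""
    merged = dict(existing)
    merged.update(update)
    return merged


def put_replace(existing: dict, update: dict) -> dict:
    """Proper PUT: the stored definition is replaced by the request body."""
    return dict(update)


existing = {"id": "/some/app", "cmd": "sleep 100", "cpus": 0.5, "instances": 3}
update = {"id": "/some/app", "instances": 5}

print(patch_update(existing, update))
print(put_replace(existing, update))
```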

Forcefully stop a deployment

Deployments in Marathon can be stopped with force.
All actions currently being performed in Marathon will be stopped; the state will not change.
This can lead to an inconsistent state and is dangerous.
We will remove this functionality without replacement.

Deprecated command line parameters

  • Removed the deprecated marathon_store_timeout command line parameter. It was deprecated since v0.12 and unused.
  • Mark task_lost_expunge_gc as deprecated, since it is no longer used.
  • The command line flag max_tasks_per_offer is deprecated. Please use max_instances_per_offer.
  • The deprecated command line flag enable_metrics is removed. Please use the toggle metrics and disable_metrics instead.
  • The deprecated command line flag `ena...

v1.4.0-RC9

17 Feb 12:55
Pre-release
  • 58e40a8 Fixes #5198 - converts iterator to seq (Johannes Unterstein)
  • 1ba8ea7 Add support for PATCH updates to apps in 1.4 (#5183) (Matthias Eichstedt)
  • abb58a1 Mark tests as unstable ... (#5202) (Matthias Eichstedt)
  • 02142d7 Links fixed in REST API documentation page. (#5060) (Armand Grillet)
  • 1d3b395 Default values for query parameter need to be of type String. (Matthias Veit)
  • 226e17f Fixes #5137 by assigning a name to the default port and refining validation logic for resource changes of resident apps. (Matthias Veit)
  • 9019527 Use the latest stable Marathon UI build (#5181) (Orlando Hohmeier)
  • d6fa3fd Fixes #4901 by removing the short circuit for root group changes. (#5179) (Matthias Veit)

v1.4.0-RC8

11 Feb 14:07
Pre-release
  • Fixes #5076 - Pod validation of MaxPer constraint
  • Fixes #5107 - Improve performance of zookeeper layer and groups (D481)
  • Fixes #5087 - Generate DiscoveryInfo for pod container endpoints
  • Fixes #5117 - Clarify rexray documentation
  • Fixes #5144 - Define network-scope label for ipaddress.discovery.ports
  • Fixes #5083 - Increase queue length for storage operations, helps large migrations
  • Fixes #5116 - Pods allow duplicate endpoint ports
  • Fixes #5084 - Doc link updates
  • Fixes - Improve performance of dependency graph computations (D476)
  • Improvement #5157 - PUT on /v2/apps has a PATCH semantic
  • Improvement - NetworkSpec plugin API (D490)

v1.3.10

07 Feb 22:53
  • 21d4d70 Fixes #4948 | Lazy event parsing in HTTP callbacks. (#5114) (janisz)
  • 8b0ee47 undo def -> lazy val toProto change (Tim Harper)
  • a09ea81 Improve performance of zookeeper layer and groups (Tim Harper)
  • 17a082e Fixes #4978 | AppDefinition.Conteiner validation (#4989) (janisz)
  • 1613ae1 Fixes #4948 | Lazy event parsing (#4986) (janisz)
  • 553b27f Initial stab at making deployment plans cheaper. Back port of https://phabricator.mesosphere.com/D476 (Johannes Unterstein)
  • 1027968 A group is accessible, if the group is selected, a subgroup is selected or a pod/app in that group is selected. (Matthias Veit)

v1.1.7-dcos

03 Feb 19:04

Changes from 1.1.5 to 1.1.7

  • Performance enhancements to the dependency graph calculation algorithm for deployments.
  • Fixes a bug in which tasks scheduled by Marathon ceased to work after 248.5 days of uptime (#5097)
  • Fixes an issue in which dependency validation was too restrictive (#5024)

v1.1.7

03 Feb 18:47

Changes from 1.1.5 to 1.1.7

  • Performance enhancements to the dependency graph calculation algorithm for deployments.
  • Fixes a bug in which tasks scheduled by Marathon ceased to work after 248.5 days of uptime (#5097)
  • Fixes an issue in which dependency validation was too restrictive (#5024)

v1.3.9

31 Jan 10:22
  • 25dd12d Fixes #5024 by using the correct validator for validating app dependencies. (#5027) (Matthias Veit)
  • 1dc09c4 Embed build badge for new releases/1.3 pipeline. (jeschkies)
  • 149eb6d Fix deployments example that shows a wrong readiness check result. (Matthias Veit)
  • 16f75f2 Fix unclosed code tag (#4997) (Drew Kerrigan)

v1.4.0-RC7

27 Jan 15:57

Includes patches:

  • 69d4b26 Fixes #4991 by specifically testing, if a property in the docker section is set, but is not supported.
  • 91cdc7e add a docker container label with Mesos task id (#4492)
  • bd1c69a Make sure the HistoryActor only stores task failures once.
  • cf3d544 Remove authorization from /ping endpoint
  • ef13b44 Make TaskResources async (#4807)
  • c52b8dd Use async instance tracker methods in TaskKiller (#4800)
  • 79fe121 Close #4722 by moving event subscription of the InstanceKillProgressActor.
  • 195295e Upgrade low-risk libraries to the latest patch release (#4983).
  • a1d7b56 Fixes #5016 by validating, if the app id is non empty.
  • bd3096e Fixes #4966 by using the resulting boolean value instead of mapping into the scallop option.
  • 6834f41 Fixes #5024 by using the correct validator for validating app dependencies.

v1.3.8

20 Jan 20:04
  • 3e46f65 updated mesos-util version to 1.0.2 (#5000) (Johannes Unterstein)
  • 3181e5e Define pipeline for 1.3. (#4992) (Karsten Jeschkies)
  • ab0269f Prevent Migration if the StorageVersion is too new (#4968) (Jason Gilanfarr)
  • 7388151 Use the correct highlighter supported in gh-pages (Matthias Veit)
  • 62eed41 Updated doc building to use github_pages jekyll and fixes (#4588) (Kyle Anderson)
  • c7b018b Fixed DC/OS link to dcos.io. (#4979) (Joerg Schad)
  • c4cfc17 bumped version to 1.3.7 (Johannes Unterstein)