Skip to content

Commit

Permalink
Merge branch 'main' into develop
Browse files Browse the repository at this point in the history
  • Loading branch information
johnflavin committed Aug 21, 2024
2 parents b337942 + b3029ca commit a93b5fe
Show file tree
Hide file tree
Showing 5 changed files with 58 additions and 86 deletions.
25 changes: 23 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,44 @@
# Changelog

## 3.5.0
Not yet released.
[Released](https://bitbucket.org/xnatdev/container-service/src/3.5.0/).

* **Improvement** [CS-946][] Prevent setting mutually distinct k8s PVC mounting options
* **Bugfix** [CS-966][] Ensure tracking of container IDs in workflow tables in a Kubernetes environment
* **Bugfix** [CS-968][] Switch the docker API library we use from [docker-client][] to [docker-java][].
This should restore CS functionality on docker engine v25 and higher.

### A Note About Our Docker Library
Originally we used the [spotify/docker-client][] library to wrap the docker remote API in java method calls. They stopped updating that and put out their final release [v6.1.1][] in 2016.

We switched the Container Service to use a fork of that client, [dmandalidis/docker-client][] in CS version 3.0.0. Given that this was a fork of the client we already used, it was a simple drop-in replacement with no changes needed.

But that library maintainer did continue to make changes. In 2023 they released a major version upgrade, [v7.0.0][], which dropped support for Java 8. That is the version of Java we use in XNAT (at time of writing) so this change meant we weren't able to update our version of this library. That was fine for a while...
...Until version 25 of the docker engine, in which they made an API change which caused an error in the version we used of `docker-client`. The library (presumably) fixed their issue but we weren't able to use that fix because our version of the library was frozen by their decision to drop Java 8 support.

This forced us to switch our library from `docker-client` to [docker-java][]. This was not a drop-in replacement, and did require a migration. All the same docker API endpoints were supported in a 1:1 replacement—which took a little effort but was straightforward—except for one. The `docker-java` library did not support requesting `GenericResources` on a swarm service, which is the mechanism by which we allow commands to specify that they need a GPU. We opened a ticket reporting that lack of support (https://github.com/docker-java/docker-java/issues/2320), but at time of writing there has been no response. I created a fork (https://github.com/johnflavin/docker-java) and fixed the issue myself (https://github.com/docker-java/docker-java/pull/2327), but at time of writing that also has no response. I built a custom version of `docker-java` `3.4.0.1` and pushed that to the XNAT artifactory ([ext-release-local/com/github/docker-java][]).

Long story short, as of CS version `3.5.0` we depend on `docker-java` version `3.4.0.1` for our docker (and swarm) API support.

[CS-946]: https://radiologics.atlassian.net/browse/CS-946
[CS-966]: https://radiologics.atlassian.net/browse/CS-966
[CS-968]: https://radiologics.atlassian.net/browse/CS-968
[docker-client]: https://github.com/dmandalidis/docker-client
[docker-client]: https://github.com/spotify/docker-client
[spotify/docker-client]: https://github.com/spotify/docker-client
[v6.1.1]: https://github.com/spotify/docker-client/releases/tag/v6.1.1
[dmandalidis/docker-client]: https://github.com/dmandalidis/docker-client
[v7.0.0]: https://github.com/dmandalidis/docker-client/tree/v7.0.0
[docker-java]: https://github.com/docker-java/docker-java
[ext-release-local/com/github/docker-java]: https://nrgxnat.jfrog.io/ui/repos/tree/General/ext-release-local/com/github/docker-java

## 3.4.3
[Released](https://bitbucket.org/xnatdev/container-service/src/3.4.3/).

* **Improvement** [XNAT-7903][] Support shared data at project level
* **Bugfix** [CS-945][] Fix race condition on overlapping container status events

[CS-945]: https://radiologics.atlassian.net/browse/CS-945
[XNAT-7903]: https://radiologics.atlassian.net/browse/XNAT-7903

## 3.4.2
[Released](https://bitbucket.org/xnatdev/container-service/src/3.4.2/).
Expand Down
2 changes: 1 addition & 1 deletion build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ plugins {
}

group "org.nrg.xnatx.plugins"
version "3.5.0-SNAPSHOT"
version "3.5.0"

sourceCompatibility = 1.8
targetCompatibility = 1.8
Expand Down
10 changes: 8 additions & 2 deletions src/main/java/org/nrg/containers/api/KubernetesInformerImpl.java
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
import io.kubernetes.client.openapi.models.V1ObjectMeta;
import io.kubernetes.client.openapi.models.V1Pod;
import io.kubernetes.client.openapi.models.V1PodList;
import io.kubernetes.client.openapi.models.V1PodSpec;
import io.kubernetes.client.openapi.models.V1PodStatus;
import io.kubernetes.client.util.CallGeneratorParams;
import lombok.extern.slf4j.Slf4j;
Expand Down Expand Up @@ -223,6 +224,10 @@ private static KubernetesStatusChangeEvent createEvent(V1Pod pod) {
timestamp = null;
}

// Node
final V1PodSpec podSpec = pod.getSpec();
final String nodeName = podSpec == null ? null : podSpec.getNodeName();

return new KubernetesStatusChangeEvent(
jobNameFromPodLabels(pod),
objName(pod),
Expand All @@ -232,7 +237,8 @@ private static KubernetesStatusChangeEvent createEvent(V1Pod pod) {
containerState,
containerStateReason,
exitCode,
timestamp
timestamp,
nodeName
);
}

Expand Down Expand Up @@ -276,7 +282,7 @@ public void onUpdate(V1Pod oldObj, V1Pod newObj) {
return;
}

log.debug("Pod {} updated", newEvent.podName());
log.debug("Pod {} updated", newEvent.getPodName());
triggerEvent(newEvent);
}

Expand Down
Original file line number Diff line number Diff line change
@@ -1,42 +1,24 @@
package org.nrg.containers.events.model;

import lombok.Value;
import org.nrg.containers.model.kubernetes.KubernetesPodPhase;

import java.time.OffsetDateTime;
import java.util.Collections;
import java.util.Map;
import java.util.Objects;

@Value
public class KubernetesStatusChangeEvent implements ContainerEvent {
private final String jobName;
private final String podName;
private final String containerId;
private final KubernetesPodPhase podPhase;
private final String podPhaseReason;
private final KubernetesContainerState containerState;
private final String containerStateReason;
private final Integer exitCode;
private final OffsetDateTime timestamp;

public KubernetesStatusChangeEvent(String jobName,
String podName,
String containerId,
KubernetesPodPhase podPhase,
String podPhaseReason,
KubernetesContainerState containerState,
String containerStateReason,
Integer exitCode,
OffsetDateTime timestamp) {
this.jobName = jobName;
this.podName = podName;
this.containerId = containerId;
this.podPhase = podPhase;
this.podPhaseReason = podPhaseReason;
this.containerState = containerState;
this.containerStateReason = containerStateReason;
this.exitCode = exitCode;
this.timestamp = timestamp;
}
String jobName;
String podName;
String containerId;
KubernetesPodPhase podPhase;
String podPhaseReason;
KubernetesContainerState containerState;
String containerStateReason;
Integer exitCode;
OffsetDateTime timestamp;
String nodeId;

@Override
public String backendId() {
Expand Down Expand Up @@ -82,48 +64,4 @@ public String details() {
}
return details;
}

public String podName() {
return podName;
}

public String containerId() {
return containerId;
}

@Override
public String toString() {
return "KubernetesStatusChangeEvent{" +
"jobName=\"" + jobName + "\"" +
", podName=\"" + podName + "\"" +
", containerId=\"" + containerId + "\"" +
", podPhase=\"" + podPhase + "\"" +
", podPhaseReason=\"" + podPhaseReason + "\"" +
", containerState=\"" + containerState + "\"" +
", containerStateReason=\"" + containerStateReason + "\"" +
", exitCode=\"" + exitCode + "\"" +
", timestamp=\"" + timestamp + "\"" +
"}";
}

@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
KubernetesStatusChangeEvent that = (KubernetesStatusChangeEvent) o;
return Objects.equals(jobName, that.jobName) &&
Objects.equals(podName, that.podName) &&
Objects.equals(containerId, that.containerId) &&
podPhase == that.podPhase &&
Objects.equals(podPhaseReason, that.podPhaseReason) &&
containerState == that.containerState &&
Objects.equals(containerStateReason, that.containerStateReason) &&
Objects.equals(exitCode, that.exitCode) &&
Objects.equals(timestamp, that.timestamp);
}

@Override
public int hashCode() {
return Objects.hash(jobName, podName, containerId, podPhase, podPhaseReason, containerState, containerStateReason, exitCode, timestamp);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -861,22 +861,29 @@ public void processEvent(final ContainerEvent event) {
KubernetesStatusChangeEvent kEvent = ((KubernetesStatusChangeEvent) event);

// Check if we need to set additional ids
boolean shouldUpdatePodName = containerWithAddedEvent.podName() == null && kEvent.podName() != null;
boolean shouldUpdateContainerId = containerWithAddedEvent.containerId() == null && kEvent.containerId() != null;
if (shouldUpdatePodName || shouldUpdateContainerId) {
boolean shouldUpdateNode = containerWithAddedEvent.nodeId() == null && kEvent.getNodeId() != null;
boolean shouldUpdatePodName = containerWithAddedEvent.podName() == null && kEvent.getPodName() != null;
boolean shouldUpdateContainerId = containerWithAddedEvent.containerId() == null && kEvent.getContainerId() != null;
if (shouldUpdateNode || shouldUpdatePodName || shouldUpdateContainerId) {
Container.Builder builder = containerWithAddedEvent.toBuilder();

if (shouldUpdateNode) {
log.debug("Container {} for job {}: setting nodeId to node {}",
container.databaseId(), container.jobName(), kEvent.getNodeId()
);
builder.nodeId(kEvent.getNodeId());
}
if (shouldUpdatePodName) {
log.debug("Container {} for job {}: setting taskId to pod name {}",
container.databaseId(), container.jobName(), kEvent.podName()
container.databaseId(), container.jobName(), kEvent.getPodName()
);
builder.taskId(kEvent.podName());
builder.taskId(kEvent.getPodName());
}
if (shouldUpdateContainerId) {
log.debug("Container {} for job {}: setting containerId to container id {}",
container.databaseId(), container.jobName(), kEvent.containerId()
container.databaseId(), container.jobName(), kEvent.getContainerId()
);
builder.containerId(kEvent.containerId());
builder.containerId(kEvent.getContainerId());
}
containerEntityService.update(fromPojo(builder.build()));
containerWithAddedEvent = retrieve(container.databaseId());
Expand Down

0 comments on commit a93b5fe

Please sign in to comment.