[NOSQUASH] Resync with Kubernetes #269

Merged
merged 26 commits into from
Oct 17, 2017

@mccheah mccheah commented Sep 26, 2017

mccheah and others added 12 commits September 26, 2017 13:10
* Move executor pod construction to a separate class.

This is the first of several measures to make
KubernetesClusterSchedulerBackend feasible to test.

* Revert change to README

* Address comments.

* Resolve merge conflicts.

Move MiB change to ExecutorPodFactory.
…river/executors (#479)

* Added configuration properties to inject arbitrary secrets into the driver/executors

* Addressed comments
* Extract more of the shuffle management to a different class.

More efforts to reduce the complexity of the
KubernetesClusterSchedulerBackend. The scheduler backend should not be
concerned about anything other than the coordination of the executor lifecycle.

* Fix scalastyle

* Add override annotation

* Fix Java style

* Remove unused imports.

* Move volume index to the beginning to satisfy index

* Address PR comments.
* Start unit tests for the scheduler backend.

* More tests for the scheduler backend.

* Unit tests and possible preemptive corrections to failover logic.

* Address PR comments.

* Resolve merge conflicts.

Move MiB change to ExecutorPodFactory.

* Revert accidental thread pool name change
* Use a headless service to give a hostname to the driver.

Required since SPARK-21642 was added upstream.

* Fix scalastyle.

* Add back import

* Fix conflict properly.

* Fix orchestrator test.
…with a concurrent map. (#392)

* Replaced explicit synchronized access to hashmap with a concurrent map

* Removed usages of scala.collection.concurrent.Map
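The change described in this commit can be illustrated with a minimal sketch. It assumes a registry that maps executor IDs to pod names; the class `ExecutorRegistry`, its methods, and the ID and pod-name formats are all hypothetical and are not taken from the PR. The point is only that `java.util.concurrent.ConcurrentHashMap` provides atomic per-key operations, so callers do not need an explicit `synchronized` block around a plain hash map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical registry; the names are illustrative, not from the PR.
final class ExecutorRegistry {
    // Thread-safe map: no external lock is needed for single-key operations.
    private final ConcurrentMap<String, String> podNameByExecutorId =
        new ConcurrentHashMap<>();

    // Atomically registers the executor; returns false if the ID was already present.
    boolean register(String executorId, String podName) {
        return podNameByExecutorId.putIfAbsent(executorId, podName) == null;
    }

    // Returns the registered pod name, or null if the executor is unknown.
    String podNameFor(String executorId) {
        return podNameByExecutorId.get(executorId);
    }

    int size() {
        return podNameByExecutorId.size();
    }

    // Registers distinct executors from several threads and returns the final size.
    static int registerConcurrently(ExecutorRegistry registry, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int threadIndex = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    String suffix = threadIndex + "-" + i;
                    registry.register("exec-" + suffix, "pod-" + suffix);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return registry.size();
    }
}
```

Compound operations that span several keys would still need their own coordination; the concurrent map only makes individual calls such as `putIfAbsent` atomic.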
#447)

* Fail submission if submitter-local files are provided without resource staging server URI

* Modified logic to validate only submitted jars; added orchestrator tests

* Incorporated feedback

* Fix failing test case
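The validation described in this commit could look roughly like the sketch below. It rests on assumptions that are not confirmed by the PR text: that a jar counts as submitter-local when its URI has the `file` scheme or no scheme at all, and that the check is skipped when a resource staging server URI is configured. The class `SubmissionValidator` and its method names are hypothetical.

```java
import java.net.URI;
import java.util.List;
import java.util.Optional;

// Hypothetical validator; names and rules are assumptions, not the PR's code.
final class SubmissionValidator {
    // Assumption: "file" scheme or no scheme means the file lives on the submitter's disk.
    static boolean isSubmitterLocal(String uri) {
        String scheme = URI.create(uri).getScheme();
        return scheme == null || scheme.equals("file");
    }

    // Fails fast when submitter-local jars are given without a staging server to upload them to.
    static void validate(List<String> jars, Optional<String> stagingServerUri) {
        if (stagingServerUri.isPresent()) {
            return; // local jars can be uploaded, so nothing to reject
        }
        for (String jar : jars) {
            if (isSubmitterLocal(jar)) {
                throw new IllegalArgumentException(
                    "Submitter-local jar " + jar
                        + " was provided without a resource staging server URI");
            }
        }
    }
}
```

Under these assumptions, a jar referenced as `local:///opt/app.jar` or `hdfs://namenode/app.jar` passes without a staging server, while `file:///tmp/app.jar` is rejected.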
* Rename package to k8s

* Rename string constants

mccheah commented Sep 26, 2017

@ash211 @robert3005


ash211 commented Sep 26, 2017

Need to cherry pick SPARK-21642 back into this PR since we reverted it earlier.

akitanaka and others added 14 commits September 26, 2017 14:39
…dress

## What changes were proposed in this pull request?

This patch lets the Spark web UI use the driver's FQDN as its hostname instead of its IP address.

In the current implementation, DRIVER_HOST_ADDRESS is set to the IP address of the driver host. This becomes a problem when SSL is enabled through the "spark.ssl.enabled", "spark.ssl.trustStore" and "spark.ssl.keyStore" properties. With these properties set, the Spark web UI is launched with SSL enabled, and the HTTPS server uses the custom SSL certificate they point to.
In this case, a client that accesses the Spark web UI gets a javax.net.ssl.SSLPeerUnverifiedException because it cannot verify the SSL certificate (the Common Name of the certificate does not match DRIVER_HOST_ADDRESS).

To avoid the exception, DRIVER_HOST_ADDRESS should be the FQDN of the driver host.

Error message the client gets when it accesses the Spark web UI:
javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10.102.138.239> doesn't match any of the subject alternative names: []
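
The mismatch behind this error can be shown with a deliberately simplified sketch. It is not the verification logic used by any real TLS library (real verifiers also handle wildcards and treat IP addresses and DNS names differently); it only compares the address the client connected to against the names carried by the certificate. The class `CertNameCheck` and the hostname `driver.example.com` are hypothetical; the IP address is the one from the error message above.

```java
import java.util.Collections;
import java.util.List;

// Simplified illustration only; real hostname verifiers are stricter.
final class CertNameCheck {
    // True if `host` equals the certificate's Common Name or any
    // subject alternative name, ignoring case.
    static boolean certificateCovers(String host, String commonName,
                                     List<String> subjectAltNames) {
        if (host.equalsIgnoreCase(commonName)) {
            return true;
        }
        for (String san : subjectAltNames) {
            if (host.equalsIgnoreCase(san)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String commonName = "driver.example.com"; // hypothetical certificate CN
        List<String> sans = Collections.emptyList(); // no SANs, as in the error message

        // Connecting by IP address: the certificate does not name it.
        System.out.println(certificateCovers("10.102.138.239", commonName, sans)); // false
        // Connecting by FQDN: matches the Common Name.
        System.out.println(certificateCovers("driver.example.com", commonName, sans)); // true
    }
}
```

This is why advertising the FQDN rather than the IP address lets a certificate issued for the driver's hostname pass verification.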

## How was this patch tested?
manual tests

Author: Hideaki Tanaka <[email protected]>

Closes apache#18846 from thideeeee/SPARK-21642.
* Update POMs

* Update extensions/v1beta1.Deployment to apps

* Modified defaults on rss and ss

(cherry picked from commit 562f301)
(cherry picked from commit 3c7dec5)
* Unit test for executorpodfactory

* Fix test

* Indentation fix

* Fix isEmpty and split between lines

* Address issues with multi-line code fragments

* Replace == with ===

* mock shuffleManager

* .kubernetes. => .k8s.

* move to k8s subdir

* fix package clause to k8s

* mock nodeAffinityExecutorPodModifier

* remove commented code

* move when clause to before{} block

* mock initContainerBootstrap, smallFiles

* insert actual logic into smallFiles mock

* verify application of nodeAffinityExecutorPodModifier

* avoid cumulative invocation

* Fixed env-var check to include values, removed mock for small files

(cherry picked from commit 887fdce)
…ic allocation mode (rebased) (#522)

* Use emptyDir volume mounts for executor local directories.

* Mount local dirs in the driver. Remove shuffle dir configuration.

* Arrange imports

* Fix style and integration tests.

* Add TODO note for volume types to change.

* Add unit test and extra documentation.

* Fix existing unit tests and add tests for empty dir volumes

* Remove extraneous constant

(cherry picked from commit 49932d6)

 Conflicts:
	resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala

@ash211 ash211 left a comment

LGTM, will merge if the build passes.

@ash211 ash211 merged commit 0a4e98e into master Oct 17, 2017
@ash211 ash211 deleted the mccheah/resync-kube branch October 17, 2017 17:02
7 participants