Multiple JVMs per machine with multiple Ports #627
Hello @fstab @tomwilkie, we have an agent to which you can give a range of ports as a config, and it will try to start webservers on a port in this range. The current agent is very simple:
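In spirit, it does something like the following. This is only an illustrative sketch with hypothetical names, not the actual internal code; a plain JDK HttpServer stands in for the exporter's HTTP server:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import com.sun.net.httpserver.HttpServer;

// Illustrative sketch: walk the configured range and bind to the first free port.
public class PortRangeAgent {

    static HttpServer startOnFirstFreePort(int minPort, int maxPort) throws IOException {
        IOException lastFailure = null;
        for (int port = minPort; port <= maxPort; port++) {
            try {
                // HttpServer.create throws an IOException (a BindException)
                // if the port is already taken by another executor's agent.
                return HttpServer.create(new InetSocketAddress(port), 0);
            } catch (IOException e) {
                lastFailure = e; // port in use, try the next one in the range
            }
        }
        throw lastFailure != null
                ? lastFailure
                : new IOException("empty port range " + minPort + "-" + maxPort);
    }
}
```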
I had to "mess it up" a bit in order to accommodate some retries and backoffs. Before I can submit a PR, I need to go through an internal review process. My question is: what form and shape should the contribution be? Right now the code changes are in jmx_prometheus_javaagent_common, and I have preserved the original code flow when you define one port. Would you prefer to have the changes in https://github.com/prometheus/jmx_exporter/blob/main/jmx_prometheus_javaagent_common/src/main/java/io/prometheus/jmx/JavaAgent.java, or perhaps create a different class, or even put it in a different module?
Indeed, having multiple webservers is a similar issue to the one I described. But I think you still need to know the multiple ports in advance?
Yes, you need to know the range you're targeting, have some headroom there, and set the rest of the system to that range. In the spark-submit extra Java options, the command went from specifying a single port to specifying a port range. There will be some wasted ports in this solution.
Not sure I agree with the approach. The general approach for these types of use cases is to use the Pushgateway and push metrics (https://prometheus.io/docs/instrumenting/pushing/). If you are submitting a Spark job where the Java agent HTTP port is dynamic, how are you monitoring it? With that said... I feel a better implementation would be...
... but again, I don't agree with the approach.
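For reference, the Pushgateway route with the Prometheus Java client (simpleclient_pushgateway) looks roughly like this; the Pushgateway address and job name below are placeholders:

```java
import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.PushGateway;

// Push metrics at the end of a short-lived job instead of exposing a scrape port.
public class PushExample {

    public static void main(String[] args) throws Exception {
        CollectorRegistry registry = new CollectorRegistry();
        Gauge duration = Gauge.build()
                .name("my_batch_job_duration_seconds")
                .help("Duration of the batch job in seconds.")
                .register(registry);
        Gauge.Timer timer = duration.startTimer();
        try {
            // ... do the actual work ...
        } finally {
            timer.setDuration();
            // Address and job name are placeholders for this sketch.
            PushGateway pg = new PushGateway("pushgateway:9091");
            pg.pushAdd(registry, "my_batch_job");
        }
    }
}
```

Note that pushAdd (HTTP POST) only replaces metrics with the same names in the group, whereas push (HTTP PUT) replaces all metrics for that grouping key.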
I recall thinking about something a bit smarter than just going one-by-one from the bottom of the range.
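One possible flavor of "smarter" (purely speculative on my part): probe the range starting at a random offset, so that JVMs launched at the same moment don't all race for the lowest port first. A minimal sketch with hypothetical names:

```java
import java.util.concurrent.ThreadLocalRandom;

// Speculative sketch: visit the range starting at a random offset (with
// wrap-around) so concurrently starting JVMs spread across the range.
public class RandomizedProbe {

    static int[] probeOrder(int minPort, int maxPort) {
        int size = maxPort - minPort + 1;
        int start = ThreadLocalRandom.current().nextInt(size);
        int[] order = new int[size];
        for (int i = 0; i < size; i++) {
            order[i] = minPort + (start + i) % size;
        }
        return order;
    }
}
```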
The way it is then scraped will depend on infrastructure layout. In our case, there is no direct communication between the EC2 instance and Prometheus; we need to explicitly allow communication for it. It isn't and cannot be truly dynamic, as you need to know beforehand which ports to allow in Security Groups / ACLs.
I mostly forgot about this issue. Thanks for bringing it up again. If someone else is having the same issue and is willing to implement this workaround, I might be able to share the code.
I was wondering if this could be achieved with the Service Location Protocol (SLP)?
I'm trying to get a good grasp of Java's JMX, RMI, and SLP, but I believe JMX servers could advertise their service URL (with port) to an RMI registry, and jmx_exporter could then look those services and their ports up there and use them.
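I haven't wired up SLP itself, but the JMX-over-RMI half looks roughly like this with plain JDK APIs (the registry port and URL are placeholders); the printed address is what a discovery mechanism would then need to advertise:

```java
import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Expose the platform MBeanServer over JMX/RMI and print the service URL
// that a discovery mechanism (SLP, a registry, etc.) could advertise.
public class JmxAdvertise {

    public static void main(String[] args) throws Exception {
        int registryPort = 1099; // placeholder port
        LocateRegistry.createRegistry(registryPort);
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:" + registryPort + "/jmxrmi");
        JMXConnectorServer server =
                JMXConnectorServerFactory.newJMXConnectorServer(url, null, mbs);
        server.start();
        // getAddress() returns the URL a client would use to connect.
        System.out.println("JMX service URL: " + server.getAddress());
    }
}
```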
Hi
I am intending to monitor Apache Spark executor processes with the jmx_exporter. It turns out that in our setup, where we configure a certain executor size in terms of CPU and memory and nodes in the cluster can run multiple of those, starting the JVM process with the agent attached fails. This is due to the static config, which tries to reuse the same port over and over again on the physical node (a Kubernetes pod in our case).
Log output when starting a minimal docker-compose example:
```
worker1 | 21/08/14 12:31:52 INFO ExecutorRunner: Launch command: "/usr/local/openjdk-15/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=43441" "-javaagent:/opt/prometheus/jmx_prometheus_javaagent-0.16.1.jar=12345:/opt/prometheus/config.yaml" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://[email protected]:43441" "--executor-id" "1" "--hostname" "172.100.100.3" "--cores" "1" "--app-id" "app-20210814123152-0000" "--worker-url" "spark://[email protected]:36191"
worker1 | 21/08/14 12:31:52 INFO ExecutorRunner: Launch command: "/usr/local/openjdk-15/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=43441" "-javaagent:/opt/prometheus/jmx_prometheus_javaagent-0.16.1.jar=12345:/opt/prometheus/config.yaml" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://[email protected]:43441" "--executor-id" "0" "--hostname" "172.100.100.3" "--cores" "1" "--app-id" "app-20210814123152-0000" "--worker-url" "spark://[email protected]:36191"
worker1 | 21/08/14 12:31:53 INFO Worker: Executor app-20210814123152-0000/0 finished with state EXITED message Command exited with code 134 exitStatus 134
```
From reading issues like #328 and others, it sounds like such an approach is not intended, or at least not supported at this time. Furthermore, I understand the difficulty of dynamically scraping potentially random ports.
So I am primarily asking for advice on how to approach this, or whether there is some easy setup I am missing.