Support dumping Cloudera Manager metadata #309

shevek-google · 2023-11-16T18:48:36Z

No description provided.

* Add Cloudera connector task for dumping services * Format code

...m/google/edwmigration/dumper/application/dumper/connector/cloudera/ClouderaServicesTask.java

shevek-google · 2023-11-16T18:49:52Z

...m/google/edwmigration/dumper/application/dumper/connector/cloudera/ClouderaServicesTask.java

+        ApiServiceList services = api.readServices(cluster.getName(), null);
+        for (ApiService service : services.getItems()) {
+          // Includes name and health of each service
+          servicesBuilder.add(service);


TODO: Make sure we have the cluster name in this object in a sensible place.

yogeshtewari · 2023-11-21T20:36:55Z

Please add support for Average Cluster Utilization Metrics: https://archive.cloudera.com/cm7/7.2.4/generic/jar/cm_api/apidocs/json_ApiClusterUtilization.html

* Add Cloudera connector task for dumping services * Write cloudera information as jsonl and add cluster utilization

shevek-google · 2024-03-07T20:09:47Z

Attempt to submit pending comments.

shevek-google · 2023-11-16T18:51:13Z

...m/google/edwmigration/dumper/application/dumper/connector/cloudera/ClouderaClustersTask.java

+    ClouderaHandle h = (ClouderaHandle) handle;
+    ApiClusterList list = getClusters(h);
+    try (Writer writer = sink.asCharSink(StandardCharsets.UTF_8).openBufferedStream()) {
+      CoreMetadataDumpFormat.MAPPER.writeValue(writer, list);


Need to write jsonl, not json.
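
A minimal sketch of the difference, assuming a per-item serializer: JSON Lines puts one complete JSON value on each line instead of wrapping the whole list in a single document. `JsonlSketch`, `writeJsonl`, `demo`, and the `toJson` parameter are hypothetical names, not part of this PR; in the connector the serializer would presumably be a Jackson mapper call.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of the PR: writes one JSON value per line.
final class JsonlSketch {
  private JsonlSketch() {}

  /**
   * Writes each item as its own line. toJson stands in for a real serializer
   * and is assumed to return a single-line JSON string.
   */
  static <T> void writeJsonl(Writer writer, Iterable<T> items, Function<T, String> toJson)
      throws IOException {
    for (T item : items) {
      writer.write(toJson.apply(item));
      writer.write('\n'); // the record separator is what makes this JSONL
    }
  }

  /** Example: two cluster names rendered as two JSONL records. */
  static String demo() {
    StringWriter out = new StringWriter();
    try {
      writeJsonl(out, List.of("c1", "c2"), name -> "{\"name\":\"" + name + "\"}");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toString();
  }
}
```

The same shape would apply to the hosts task: iterate the items of the list and emit one line per host.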

shevek-google · 2023-11-16T18:51:26Z

.../com/google/edwmigration/dumper/application/dumper/connector/cloudera/ClouderaHostsTask.java

+    HostsResourceApi api = new HostsResourceApi(h.getClient());
+    ApiHostList list = api.readHosts(null, null, null);
+    try (Writer writer = sink.asCharSink(StandardCharsets.UTF_8).openBufferedStream()) {
+      CoreMetadataDumpFormat.MAPPER.writeValue(writer, list);


Need to write jsonl, not json.

shevek-google · 2023-11-16T18:52:03Z

...m/google/edwmigration/dumper/application/dumper/connector/cloudera/ClouderaServicesTask.java

+          ApiYarnUtilization yarnUtilization =
+              api.getYarnUtilization(
+                  cluster.getName(), service.getName(), null, null, startDate, null, null, null);
+          yarnMetricsBuilder.add(yarnUtilization);


Should probably make a parent bean for the service and its utilization, and write it as a single jsonl line.
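
A rough sketch of what such a parent bean could look like. `ServiceRecordSketch`, `ServiceWithUtilization`, and `pairWithUtilization` are hypothetical names; the type parameters `S` and `U` stand in for `ApiService` and `ApiYarnUtilization` so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch, not part of the PR.
final class ServiceRecordSketch {
  private ServiceRecordSketch() {}

  /** Parent bean: one instance is meant to become one JSONL line. */
  static final class ServiceWithUtilization<S, U> {
    private final S service;
    private final U utilization;

    ServiceWithUtilization(S service, U utilization) {
      this.service = service;
      this.utilization = utilization;
    }

    // Bean-style getters so a mapper can emit both parts as fields of one object.
    public S getService() {
      return service;
    }

    public U getUtilization() {
      return utilization;
    }
  }

  /** Pairs every service with its utilization instead of filling two separate lists. */
  static <S, U> List<ServiceWithUtilization<S, U>> pairWithUtilization(
      List<S> services, Function<S, U> fetchUtilization) {
    List<ServiceWithUtilization<S, U>> records = new ArrayList<>();
    for (S service : services) {
      records.add(new ServiceWithUtilization<>(service, fetchUtilization.apply(service)));
    }
    return records;
  }
}
```

Keeping the pair in one object means a reader of the dump never has to join two files by service name.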

shevek-google · 2024-03-07T20:02:34Z

...m/google/edwmigration/dumper/application/dumper/connector/cloudera/ClouderaClustersTask.java

+                ClouderaClusterObject clouderaClusterObject = new ClouderaClusterObject();
+                clouderaClusterObject.setCluster(apiCluster);
+                clouderaClusterObject.setUtilization(utilization);
+                writer.write(ClouderaConnectorUtils.MAPPER.writeValueAsString(clouderaClusterObject));


Can these use MAPPER.writeValue(writer, object) so we don't allocate String(s)?

Question: Do we actually need to write as string because if the mapper throws an exception, we leave half a record in the file?
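
To make the trade-off concrete, here is a sketch under the assumption that a streaming serializer can fail after it has already emitted part of a record. `AtomicRecordSketch`, `StreamingSerializer`, and `writeRecord` are hypothetical names; the interface is a stand-in for a mapper that writes directly to a `Writer`.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch, not part of the PR.
final class AtomicRecordSketch {
  private AtomicRecordSketch() {}

  /** Stand-in for a serializer that writes directly to the sink and may fail midway. */
  interface StreamingSerializer<T> {
    void write(Writer sink, T value) throws IOException;
  }

  /**
   * Serializes into a private buffer first, then appends the finished record
   * and its newline to the output only if serialization succeeded.
   */
  static <T> void writeRecord(Writer out, T value, StreamingSerializer<T> serializer)
      throws IOException {
    StringWriter buffer = new StringWriter();
    serializer.write(buffer, value); // a failure here leaves `out` untouched
    out.write(buffer.toString());
    out.write('\n');
  }
}
```

The cost is one intermediate String per record, which is the allocation the first question wants to avoid; writing straight to the output saves it but may leave a truncated line behind on failure.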

kornel661 · 2024-03-12T19:20:59Z

...m/google/edwmigration/dumper/application/dumper/connector/cloudera/ClouderaClustersTask.java

+        String startDate = LocalDateTime.now().minusDays(7).format(DateTimeFormatter.ISO_DATE);
+        try (Writer writer = sink.asCharSink(StandardCharsets.UTF_8).openBufferedStream()) {
+            for (ApiCluster apiCluster : apiClusterList.getItems()) {
+                //TODO: We should refactor this so the ClustersResourceApi object is just created once for both retrieving clusters and utilization


Why not do it in this PR? We could create a method that returns the api object instead of getClusters/utilization methods that each create their own api object.
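
Roughly, the suggested shape could look like the sketch below. Everything in it is hypothetical: `SharedApiSketch`, `ClustersApi`, and its two methods are stand-ins for the real `ClustersResourceApi`, whose actual signatures are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch, not part of the PR.
final class SharedApiSketch {
  private SharedApiSketch() {}

  /** Stand-in for the clusters API client; method names are illustrative only. */
  interface ClustersApi {
    List<String> listClusterNames();

    String utilizationOf(String clusterName);
  }

  /** Obtains the API object once and reuses it for both the listing and the utilization calls. */
  static List<String> dumpClusters(Supplier<ClustersApi> apiFactory) {
    ClustersApi api = apiFactory.get(); // created exactly once per dump
    List<String> lines = new ArrayList<>();
    for (String name : api.listClusterNames()) {
      lines.add(name + "=" + api.utilizationOf(name));
    }
    return lines;
  }
}
```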

kornel661 · 2024-03-12T19:36:11Z

...m/google/edwmigration/dumper/application/dumper/connector/cloudera/ClouderaClustersTask.java

+    }
+
+    @JsonIgnoreProperties(ignoreUnknown = true)
+    static class ClouderaClusterObject {


Could we use AutoValue to define this class?

kornel661 · 2024-03-12T19:43:16Z

.../google/edwmigration/dumper/application/dumper/connector/cloudera/ClouderaHdfsUsageTask.java

+import javax.annotation.CheckForNull;
+import javax.annotation.Nonnull;
+
+public class ClouderaHdfsUsageTask extends AbstractClouderaTask {


How about adding a documentation comment? I think there are a couple of other classes and public methods in this PR that could use some documentation as well.

kornel661 · 2024-03-12T19:59:36Z

...m/google/edwmigration/dumper/application/dumper/connector/cloudera/ClouderaServicesTask.java

+        ApiClusterList clusters = getClusters(h);
+        ServicesResourceApi api = new ServicesResourceApi(h.getClient());
+
+        // TODO: Accept startDate as an input. Arbitrarily setting it to today minus 7
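
One way the TODO in this hunk could be addressed, as a sketch only: take an optional ISO date from the caller and fall back to today minus 7 days. `StartDateSketch`, `resolveStartDate`, and the `Clock` parameter are hypothetical and not part of this PR; the clock is injected purely so the default can be tested.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

// Hypothetical sketch, not part of the PR.
final class StartDateSketch {
  private StartDateSketch() {}

  /**
   * Returns the requested start date, or today minus 7 days when none is given.
   * The requested value is parsed as an ISO date (yyyy-MM-dd), so malformed
   * input fails fast with a DateTimeParseException.
   */
  static String resolveStartDate(Optional<String> requested, Clock clock) {
    LocalDate date =
        requested.map(LocalDate::parse).orElseGet(() -> LocalDate.now(clock).minusDays(7));
    return date.format(DateTimeFormatter.ISO_DATE);
  }
}
```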


shevek-google added 3 commits October 13, 2023 22:01

Initial draft of Cloudera Manager connector.

2b665ad

Support Cloudera clusters list.

f224768

Cloudera: Connect HDFS usage task, but it does not yet write.

cb33f19

shevek-google requested review from paolomorandini, mjquinn-google and arotenberg-google as code owners November 16, 2023 18:48

Add Cloudera connector task for dumping services (#299)

8f8ec9e

* Add Cloudera connector task for dumping services * Format code

shevek-google commented Nov 16, 2023

Write cloudera information as jsonl and add cluster utilization (#325)

1603717

* Add Cloudera connector task for dumping services * Write cloudera information as jsonl and add cluster utilization

shevek-google commented Mar 7, 2024

kornel661 reviewed Mar 12, 2024
