Support dumping Cloudera Manager metadata #309

Open: wants to merge 5 commits into base: main


shevek-google (Collaborator)

No description provided.

* Add Cloudera connector task for dumping services

* Format code
ApiServiceList services = api.readServices(cluster.getName(), null);
for (ApiService service : services.getItems()) {
// Includes name and health of each service
servicesBuilder.add(service);
Collaborator Author

TODO: Make sure we have the cluster name in this object in a sensible place.

@yogeshtewari (Collaborator)

Please add support for Average Cluster Utilization Metrics: https://archive.cloudera.com/cm7/7.2.4/generic/jar/cm_api/apidocs/json_ApiClusterUtilization.html

* Add Cloudera connector task for dumping services

* Write cloudera information as jsonl and add cluster utilization
@shevek-google (Collaborator Author)

Attempt to submit pending comments.

ClouderaHandle h = (ClouderaHandle) handle;
ApiClusterList list = getClusters(h);
try (Writer writer = sink.asCharSink(StandardCharsets.UTF_8).openBufferedStream()) {
CoreMetadataDumpFormat.MAPPER.writeValue(writer, list);
Collaborator Author

Need to write JSONL, not JSON.
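A minimal sketch of the JSONL layout the comment asks for: one JSON object per line instead of a single JSON document for the whole list. `JsonlWriter` and the `toJson` function are hypothetical; in this PR, `toJson` would presumably wrap the project's Jackson mapper (for example `CoreMetadataDumpFormat.MAPPER`), with its checked exceptions handled by the caller.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: writes each item as its own JSON value on its own line
// (JSONL), rather than serializing the whole list as one JSON array.
final class JsonlWriter {
  private JsonlWriter() {}

  static <T> void writeJsonl(Writer writer, List<T> items, Function<T, String> toJson)
      throws IOException {
    for (T item : items) {
      // One self-contained JSON value per line; no enclosing array, no commas.
      writer.write(toJson.apply(item));
      writer.write('\n');
    }
  }
}
```

Writing each element separately also lets a consumer stream the dump line by line instead of parsing one large array.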

HostsResourceApi api = new HostsResourceApi(h.getClient());
ApiHostList list = api.readHosts(null, null, null);
try (Writer writer = sink.asCharSink(StandardCharsets.UTF_8).openBufferedStream()) {
CoreMetadataDumpFormat.MAPPER.writeValue(writer, list);
Collaborator Author

Need to write JSONL, not JSON.

ApiYarnUtilization yarnUtilization =
api.getYarnUtilization(
cluster.getName(), service.getName(), null, null, startDate, null, null, null);
yarnMetricsBuilder.add(yarnUtilization);
Collaborator Author

Should probably make a parent bean for the service and its utilization, and write it as a single JSONL line.
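One way the suggested parent bean could look, sketched as a Java 16+ record rather than a mutable bean. `ServiceUtilization` is a hypothetical name, and the type parameters stand in for the Cloudera client types used in the PR (`ApiService`, `ApiYarnUtilization`). Carrying the cluster name on each record keeps every line self-describing. Jackson 2.12 and later can serialize records, but the project's Jackson version would need checking.

```java
import java.util.Objects;

// Hypothetical parent bean: one instance becomes one JSONL line holding the
// cluster name, the service and that service's utilization together.
// S and U stand in for the Cloudera client types (e.g. ApiService, ApiYarnUtilization).
record ServiceUtilization<S, U>(String clusterName, S service, U utilization) {
  ServiceUtilization {
    // Every line should identify its cluster and service.
    Objects.requireNonNull(clusterName, "clusterName");
    Objects.requireNonNull(service, "service");
    // utilization is allowed to be null (assumption: metrics may be unavailable).
  }
}
```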

ClouderaClusterObject clouderaClusterObject = new ClouderaClusterObject();
clouderaClusterObject.setCluster(apiCluster);
clouderaClusterObject.setUtilization(utilization);
writer.write(ClouderaConnectorUtils.MAPPER.writeValueAsString(clouderaClusterObject));
Collaborator Author

Can these use MAPPER.writeValue(writer, object) so we don't allocate String(s)?

Collaborator Author

Question: Do we actually need to write as a string, because if the mapper throws an exception mid-write, we would leave half a record in the file?
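A sketch of the trade-off behind the last two comments: rendering each record to a `String` first costs one allocation per record, but a serialization failure then happens before anything is written, so the file never holds a half-written line. This only guards against serializer errors, not I/O failures during `write`. `AtomicLineWriter` and `toJson` are hypothetical names. Separately, Jackson's `ObjectMapper.writeValue(Writer, Object)` may close the target writer by default (`JsonGenerator.Feature.AUTO_CLOSE_TARGET`), which matters when many values go to one stream.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.function.Function;

// Hypothetical helper: serialize the whole record in memory before touching the
// writer, so a serializer exception cannot leave a partial line in the file.
final class AtomicLineWriter {
  private AtomicLineWriter() {}

  static <T> void writeLine(Writer writer, T record, Function<T, String> toJson)
      throws IOException {
    String line = toJson.apply(record); // may throw; nothing has been written yet
    writer.write(line);
    writer.write('\n');
  }
}
```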

String startDate = LocalDateTime.now().minusDays(7).format(DateTimeFormatter.ISO_DATE);
try (Writer writer = sink.asCharSink(StandardCharsets.UTF_8).openBufferedStream()) {
for (ApiCluster apiCluster : apiClusterList.getItems()) {
//TODO: We should refactor this so the ClustersResourceApi object is just created once for both retrieving clusters and utilization
Collaborator

Why not do it in this PR? We could create a method that returns the api object instead of getClusters/utilization methods that each create their own api object.

}

@JsonIgnoreProperties(ignoreUnknown = true)
static class ClouderaClusterObject {
Collaborator

Could we use AutoValue to define this class?

import javax.annotation.CheckForNull;
import javax.annotation.Nonnull;

public class ClouderaHdfsUsageTask extends AbstractClouderaTask {
Collaborator

How about adding a documentation comment? I think there are a couple of classes and public methods in this PR that could use some documentation as well.

ApiClusterList clusters = getClusters(h);
ServicesResourceApi api = new ServicesResourceApi(h.getClient());

// TODO: Accept startDate as an input. Arbitrarily setting it to today minus 7
Collaborator

Why 7?
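A sketch of what the `// TODO: Accept startDate as an input` comment could turn into: take the lookback as a parameter, keep 7 days only as a default, and inject a `java.time.Clock` so the date is testable. `UtilizationWindow` and `DEFAULT_LOOKBACK_DAYS` are hypothetical names. Using `LocalDate` yields the same `yyyy-MM-dd` string as the PR's `LocalDateTime.now().minusDays(7).format(DateTimeFormatter.ISO_DATE)`.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: derive the utilization start date from a configurable
// lookback instead of a hard-coded 7 days. The Clock parameter makes it testable.
final class UtilizationWindow {
  static final int DEFAULT_LOOKBACK_DAYS = 7; // matches the current PR behaviour

  private UtilizationWindow() {}

  static String startDate(Clock clock, int lookbackDays) {
    if (lookbackDays < 0) {
      throw new IllegalArgumentException("lookbackDays must be >= 0: " + lookbackDays);
    }
    // ISO_DATE on a LocalDate prints only the date, e.g. 2024-03-01.
    return LocalDate.now(clock).minusDays(lookbackDays).format(DateTimeFormatter.ISO_DATE);
  }
}
```

Production callers would pass `Clock.systemDefaultZone()` together with either a user-supplied value or `DEFAULT_LOOKBACK_DAYS`.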

Labels
None yet
Projects
None yet
4 participants