-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support dumping Cloudera Manager metadata #309
base: main
Are you sure you want to change the base?
Conversation
* Add Cloudera connector task for dumping services * Format code
...m/google/edwmigration/dumper/application/dumper/connector/cloudera/ClouderaServicesTask.java
Outdated
Show resolved
Hide resolved
ApiServiceList services = api.readServices(cluster.getName(), null); | ||
for (ApiService service : services.getItems()) { | ||
// Includes name and health of each service | ||
servicesBuilder.add(service); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Make sure we have the cluster name in this object in a sensible place.
please add support for : Average Cluster Utilization Metrics |
* Add Cloudera connector task for dumping services * Write cloudera information as jsonl and add cluster utilization
Attempt to submit pending comments. |
ClouderaHandle h = (ClouderaHandle) handle; | ||
ApiClusterList list = getClusters(h); | ||
try (Writer writer = sink.asCharSink(StandardCharsets.UTF_8).openBufferedStream()) { | ||
CoreMetadataDumpFormat.MAPPER.writeValue(writer, list); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to write jsonl, not json.
HostsResourceApi api = new HostsResourceApi(h.getClient()); | ||
ApiHostList list = api.readHosts(null, null, null); | ||
try (Writer writer = sink.asCharSink(StandardCharsets.UTF_8).openBufferedStream()) { | ||
CoreMetadataDumpFormat.MAPPER.writeValue(writer, list); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to write jsonl not json
ApiYarnUtilization yarnUtilization = | ||
api.getYarnUtilization( | ||
cluster.getName(), service.getName(), null, null, startDate, null, null, null); | ||
yarnMetricsBuilder.add(yarnUtilization); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably make a parent bean for the service and its utilization, and write it as a single jsonl line
ClouderaClusterObject clouderaClusterObject = new ClouderaClusterObject(); | ||
clouderaClusterObject.setCluster(apiCluster); | ||
clouderaClusterObject.setUtilization(utilization); | ||
writer.write(ClouderaConnectorUtils.MAPPER.writeValueAsString(clouderaClusterObject)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these use MAPPER.writeValue(writer, object) so we don't allocate String(s)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Do we actually need to write as string because if the mapper throws an exception, we leave half a record in the file?
String startDate = LocalDateTime.now().minusDays(7).format(DateTimeFormatter.ISO_DATE); | ||
try (Writer writer = sink.asCharSink(StandardCharsets.UTF_8).openBufferedStream()) { | ||
for (ApiCluster apiCluster : apiClusterList.getItems()) { | ||
//TODO: We should refactor this so the ClustersResourceApi object is just created once for both retrieving clusters and utilization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not do it in this PR? We could create a method that returns the api object instead of getClusters/utilization methods that each create their own api object.
} | ||
|
||
@JsonIgnoreProperties(ignoreUnknown = true) | ||
static class ClouderaClusterObject { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use AutoValue do define this class?
import javax.annotation.CheckForNull; | ||
import javax.annotation.Nonnull; | ||
|
||
public class ClouderaHdfsUsageTask extends AbstractClouderaTask { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a documentation comment? I think there a couple of classes/public methods in this PR that could possibly use some documentation as well.
ApiClusterList clusters = getClusters(h); | ||
ServicesResourceApi api = new ServicesResourceApi(h.getClient()); | ||
|
||
// TODO: Accept startDate as an input. Arbitrarily setting it to today minus 7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 7?
No description provided.