The Machine Learning (ML) Service provides a common runtime for evaluating machine learning models on device. The service wraps the TensorFlow Lite (TFLite) runtime and provides infrastructure for deploying trained models. The TFLite runtime runs in a sandboxed process, and Chromium communicates with ML Service via a Mojo interface.
You first need to make your trained models available to ML Service, then load and use them from Chromium via the client library provided at //chromeos/services/machine_learning/public/cpp/. See this doc for more detailed instructions.
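As a rough sketch of the client-side flow (this assumes the ServiceConnection helper exposes LoadBuiltinModel directly and uses the TEST_MODEL builtin purely for illustration; consult the client library headers for the current signatures):

```cpp
#include "base/functional/bind.h"
#include "chromeos/services/machine_learning/public/cpp/service_connection.h"
#include "chromeos/services/machine_learning/public/mojom/machine_learning_service.mojom.h"
#include "mojo/public/cpp/bindings/remote.h"

namespace mlm = chromeos::machine_learning::mojom;

void OnModelLoaded(mlm::LoadModelResult result) {
  // Check |result| before creating a GraphExecutor from the model.
}

void LoadExampleModel(mojo::Remote<mlm::Model>& model) {
  // Ask ML Service (over its Mojo interface) to load a builtin model; the
  // TFLite inference itself runs in the service's sandboxed process.
  chromeos::machine_learning::ServiceConnection::GetInstance()
      ->LoadBuiltinModel(
          mlm::BuiltinModelSpec::New(mlm::BuiltinModelId::TEST_MODEL),
          model.BindNewPipeAndPassReceiver(),
          base::BindOnce(&OnModelLoaded));
}
```

From the loaded |Model|, the client then calls |CreateGraphExecutor| and |Execute| to run inference; these three request types are exactly the ones instrumented by the per-model metrics described below.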
Note: The sandboxed process hosting TFLite models is currently shared among all users of ML Service. If this isn't acceptable for your model from a security perspective, follow this bug about switching ML Service to a separate sandboxed process per loaded model.
The following metrics are currently recorded by the daemon process in order to understand its resource costs in the wild:
- MachineLearningService.MojoConnectionEvent: Success/failure of the D-Bus->Mojo bootstrap.
- MachineLearningService.TotalMemoryKb: Total (shared+unshared) memory footprint every 5 minutes.
- MachineLearningService.PeakTotalMemoryKb: Peak value of MachineLearningService.TotalMemoryKb per 24-hour period. Daemon code can also call ml::Metrics::UpdateCumulativeMetricsNow() at any time to take a peak-memory observation, in order to catch short-lived memory usage spikes.
- MachineLearningService.CpuUsageMilliPercent: Fraction of total CPU resources consumed by the daemon every 5 minutes, in units of milli-percent (1/100,000); see the unit sketch after this list.
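To make the units concrete, here is a small illustrative sketch (the helper names are hypothetical, not part of the daemon):

```cpp
// Hypothetical helpers, for illustration only: converting histogram samples
// back to familiar units.

// TotalMemoryKb / PeakTotalMemoryKb samples are in kilobytes.
double MemoryMbFromSample(int total_memory_kb) {
  return total_memory_kb / 1024.0;
}

// CpuUsageMilliPercent samples are milli-percent, i.e. 1/100,000 of total
// CPU resources; a sample of 2500 means 2.5% of total CPU.
double CpuPercentFromSample(int cpu_usage_milli_percent) {
  return cpu_usage_milli_percent / 1000.0;
}
```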
Additional metrics are recorded in order to understand the resource costs of each request for a particular model:
- MachineLearningService.|MetricsModelName|.|request|.Event: OK/ErrorType of the request.
- MachineLearningService.|MetricsModelName|.|request|.TotalMemoryDeltaKb: Total (shared+unshared) memory delta caused by the request.
- MachineLearningService.|MetricsModelName|.|request|.CpuTimeMicrosec: CPU time usage of the request, scaled to one CPU core, i.e. the units are CPU-core*microsec (10 CPU cores for 1 microsec = 1 CPU core for 10 microsec = a recorded value of 10).
|MetricsModelName| is specified in the model's metadata for builtin models, and in |FlatBufferModelSpec| by the client for flatbuffer models. The above |request| can be one of the following (example composed histogram names follow the list):
- LoadModelResult
- CreateGraphExecutorResult
- ExecuteResult (model inference)
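For example, for a hypothetical model whose |MetricsModelName| is "MyModel" (a real name comes from the model's metadata or |FlatBufferModelSpec|), the composed histogram names would look like:

```
MachineLearningService.MyModel.LoadModelResult.Event
MachineLearningService.MyModel.CreateGraphExecutorResult.TotalMemoryDeltaKb
MachineLearningService.MyModel.ExecuteResult.CpuTimeMicrosec
```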
The request name "LoadModelResult" is used whether the model is loaded by |LoadBuiltinModel| or by |LoadFlatBufferModel|. This is unambiguous because a particular model is only ever loaded by one of the two methods, never both.
There is also an enum histogram "MachineLearningService.LoadModelResult" (with no model-name component) which records a generic model specification error event during a |LoadBuiltinModel| or |LoadFlatBufferModel| request when the model name is unknown.
Note that aspects of the design may have evolved since the original design docs were written.