Description
Hi,
- I'd like to train a model to perform depth estimation on a monocular RGB picture.
I think this can be done through regression with a ResNet or DenseNet.
I have a dataset ( https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html ) with pairs of pictures (input / expected output):
Rgb_img_1 / depth_img_1
And I have an Excel file with the path to each file.
I started with the multiclass classification tutorial ( https://docs.microsoft.com/fr-fr/dotnet/machine-learning/tutorials/image-classification ), but now I have to translate it to a regression model, as I'm looking for depth values for each pixel of a picture.
I know that I have to change my model generation:
```csharp
public static ITransformer GenerateModel(MLContext mlContext)
{
    IDataView trainingData = mlContext.Data.LoadFromTextFile<ImageData>(path: _trainTagsCsv, separatorChar: ',', hasHeader: false);

    IEstimator<ITransformer> pipeline = mlContext.Transforms.LoadImages(outputColumnName: "input", imageFolder: _imagesFolder, inputColumnName: nameof(ImageData.InputImagePath))
        // The image transforms convert the images into the model's expected format.
        .Append(mlContext.Transforms.ResizeImages(outputColumnName: "input", imageWidth: InceptionSettings.ImageWidth, imageHeight: InceptionSettings.ImageHeight, inputColumnName: "input"))
        .Append(mlContext.Transforms.ExtractPixels(outputColumnName: "input", interleavePixelColors: InceptionSettings.ChannelsLast, offsetImage: InceptionSettings.Mean))
        .Append(mlContext.Model.LoadTensorFlowModel(_inceptionTensorFlowModel)
            .ScoreTensorFlowModel(outputColumnNames: new[] { "softmax2_pre_activation" }, inputColumnNames: new[] { "input" }, addBatchDimensionInput: true))
        .Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "LabelKey", inputColumnName: "Label"))
        .Append(mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(labelColumnName: "LabelKey", featureColumnName: "softmax2_pre_activation"))
        .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabelValue", "PredictedLabel"))
        .AppendCacheCheckpoint(mlContext);

    ITransformer model = pipeline.Fit(trainingData);

    // Use the same separator as the training file when loading the test data.
    IDataView testData = mlContext.Data.LoadFromTextFile<ImageData>(path: _testTagsCsv, separatorChar: ',', hasHeader: false);
    IDataView predictions = model.Transform(testData);

    // Create an IEnumerable of the predictions for displaying results.
    IEnumerable<ImagePrediction> imagePredictionData = mlContext.Data.CreateEnumerable<ImagePrediction>(predictions, reuseRowObject: true);
    DisplayResults(imagePredictionData);

    MulticlassClassificationMetrics metrics = mlContext.MulticlassClassification
        .Evaluate(predictions, labelColumnName: "LabelKey", predictedLabelColumnName: "PredictedLabel");
    Console.WriteLine($"LogLoss is: {metrics.LogLoss}");
    Console.WriteLine($"PerClassLogLoss is: {String.Join(" , ", metrics.PerClassLogLoss.Select(c => c.ToString()))}");

    return model;
}
```
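To show what I mean by "changing the model generation", here is a rough, untested sketch of how I imagine the trainer part could be swapped for a regression trainer. This is my guess, not working code: I'm assuming a single float label per row (e.g. a mean depth), because I don't see how an ML.NET regression trainer would output a full per-pixel depth map.

```csharp
// Rough sketch (untested, my assumption): same featurization as the tutorial,
// but with the classification-specific steps removed and a regression trainer
// appended. "Label" would be a single float per image here, not a depth map.
IEstimator<ITransformer> regressionPipeline = mlContext.Transforms.LoadImages(outputColumnName: "input", imageFolder: _imagesFolder, inputColumnName: nameof(ImageData.InputImagePath))
    .Append(mlContext.Transforms.ResizeImages(outputColumnName: "input", imageWidth: InceptionSettings.ImageWidth, imageHeight: InceptionSettings.ImageHeight, inputColumnName: "input"))
    .Append(mlContext.Transforms.ExtractPixels(outputColumnName: "input", interleavePixelColors: InceptionSettings.ChannelsLast, offsetImage: InceptionSettings.Mean))
    .Append(mlContext.Model.LoadTensorFlowModel(_inceptionTensorFlowModel)
        .ScoreTensorFlowModel(outputColumnNames: new[] { "softmax2_pre_activation" }, inputColumnNames: new[] { "input" }, addBatchDimensionInput: true))
    // Regression trainer instead of LbfgsMaximumEntropy; no MapValueToKey needed.
    .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: "Label", featureColumnName: "softmax2_pre_activation"));
```

Is this roughly the right direction, or does per-pixel output require a completely different approach?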
Could you tell me where I can find docs and resources to understand:
- how to choose and use an appropriate model
- how to transform my inputs to make them usable for the model.
Also, I have a .onnx of DenseNet. Would it be easier to go this way instead of using an ML.NET model? (But I'd like to deeply understand the ML.NET framework.)
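To clarify what I mean by "going the ONNX way", this is the shape I think it would take, using the `Microsoft.ML.OnnxTransformer` package. This is an untested sketch; the tensor names "data" / "output", the 224x224 size, and the file name are placeholders I would have to read from the actual model (e.g. with Netron):

```csharp
// Rough sketch (untested): scoring a DenseNet .onnx with ML.NET's OnnxTransformer.
// "data", "output", 224x224 and "densenet.onnx" are placeholder assumptions,
// not values taken from my real model.
IEstimator<ITransformer> onnxPipeline = mlContext.Transforms.LoadImages(outputColumnName: "data", imageFolder: _imagesFolder, inputColumnName: nameof(ImageData.InputImagePath))
    .Append(mlContext.Transforms.ResizeImages(outputColumnName: "data", imageWidth: 224, imageHeight: 224))
    .Append(mlContext.Transforms.ExtractPixels(outputColumnName: "data"))
    .Append(mlContext.Transforms.ApplyOnnxModel(modelFile: "densenet.onnx", outputColumnNames: new[] { "output" }, inputColumnNames: new[] { "data" }));
```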
Also, I took a look at AutoML, but I don't think it can solve my regression problem with image inputs. Is that right?
Thanks,