Possible to support dynamic batch size? #23

Open
luvwinnie opened this issue Mar 17, 2021 · 5 comments
luvwinnie commented Mar 17, 2021

Hi, I'm trying to use yolov5 as both a primary and secondary detector. Currently the engine seems to be built with a fixed batch size. Is it possible to generate an engine with a dynamic batch size so that it can be configured in DeepStream?

It shows the following implicit engine info even though I built the engine with BATCH_SIZE 4:

deepstream_app_1  | Opening in BLOCKING MODE
deepstream_app_1  | INFO: [Implicit Engine Info]: layers num: 2
deepstream_app_1  | 0   INPUT  kFLOAT data            3x640x640
deepstream_app_1  | 1   OUTPUT kFLOAT prob            6001x1x1
Endeavor-Gcl commented Mar 25, 2021

> Hi, I'm trying to use yolov5 as both a primary and secondary detector. Currently the engine seems to be built with a fixed batch size. Is it possible to generate an engine with a dynamic batch size so that it can be configured in DeepStream?
>
> It shows the following implicit engine info even though I built the engine with BATCH_SIZE 4

Hello, have you solved it?

@luvwinnie (Author)

No, I haven't solved it yet. Do you have any idea how to solve it?

@Endeavor-Gcl

> No, I haven't solved it yet. Do you have any idea how to solve it?

Sorry, I have no idea.

@luvwinnie (Author)

@DanaHan are you able to make the engine with dynamic batch size?

@luvwinnie (Author)

I'm trying to create a yolov5 explicitBatch engine. This is my current work; however, I need some help with the network.
common.hpp

...
ILayer* focus(INetworkDefinition *network, std::map<std::string, Weights>& weightMap, ITensor& input, int inch, int outch, int ksize, std::string lname, int batch_size) {
    ISliceLayer *s1 = network->addSlice(input, Dims4{batch_size, 0, 0, 0}, Dims4{batch_size, inch, Yolo::INPUT_H / 2, Yolo::INPUT_W / 2}, Dims4{batch_size, 1, 2, 2});
    ISliceLayer *s2 = network->addSlice(input, Dims4{batch_size, 0, 1, 0}, Dims4{batch_size, inch, Yolo::INPUT_H / 2, Yolo::INPUT_W / 2}, Dims4{batch_size, 1, 2, 2});
    ISliceLayer *s3 = network->addSlice(input, Dims4{batch_size, 0, 0, 1}, Dims4{batch_size, inch, Yolo::INPUT_H / 2, Yolo::INPUT_W / 2}, Dims4{batch_size, 1, 2, 2});
    ISliceLayer *s4 = network->addSlice(input, Dims4{batch_size, 0, 1, 1}, Dims4{batch_size, inch, Yolo::INPUT_H / 2, Yolo::INPUT_W / 2}, Dims4{batch_size, 1, 2, 2});
    ITensor* inputTensors[] = {s1->getOutput(0), s2->getOutput(0), s3->getOutput(0), s4->getOutput(0)};
    auto cat = network->addConcatenation(inputTensors, 4);
    auto conv = convBlock(network, weightMap, *cat->getOutput(0), outch, ksize, 1, 1, lname + ".conv");
    return conv;
}
...

yolov5.cpp

ICudaEngine* createEngine_s(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt) {
    const auto explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    INetworkDefinition* network = builder->createNetworkV2(explicitBatch);
    // std::cout << "Explicit BATCH" << std::endl;
    // INetworkDefinition* network = builder->createNetworkV2(0U);

    // Create input tensor of shape {BATCH_SIZE, 3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims4{BATCH_SIZE, 3, INPUT_H, INPUT_W});
    assert(data);

    std::map<std::string, Weights> weightMap = loadWeights("../yolov5s.wts");
    std::cout << "BATCH_SIZE:" << BATCH_SIZE << ",INPUT_H:" << INPUT_H << ",INPUT_W" << INPUT_W  << std::endl;
    Weights emptywts{DataType::kFLOAT, nullptr, 0};

    // yolov5 backbone
    auto focus0 = focus(network, weightMap, *data, 3, 32, 3, "model.0", BATCH_SIZE);

    std::cout << "focus0:" << "passed" <<std::endl;
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), 64, 3, 2, 1, "model.1");
    std::cout << "conv1:" << "passed" <<std::endl;
    auto bottleneck_CSP2 = bottleneckCSP(network, weightMap, *conv1->getOutput(0), 64, 64, 1, true, 1, 0.5, "model.2");
    std::cout << "bottleneck_CSP2:" << "passed" <<std::endl;
    auto conv3 = convBlock(network, weightMap, *bottleneck_CSP2->getOutput(0), 128, 3, 2, 1, "model.3");
    std::cout << "conv3:" << "passed" <<std::endl;
    auto bottleneck_csp4 = bottleneckCSP(network, weightMap, *conv3->getOutput(0), 128, 128, 3, true, 1, 0.5, "model.4");
    std::cout << "bottleneck_csp4:" << "passed" <<std::endl;
    auto conv5 = convBlock(network, weightMap, *bottleneck_csp4->getOutput(0), 256, 3, 2, 1, "model.5");
    std::cout << "conv5:" << "passed" <<std::endl;
    auto bottleneck_csp6 = bottleneckCSP(network, weightMap, *conv5->getOutput(0), 256, 256, 3, true, 1, 0.5, "model.6");
    std::cout << "bottleneck_csp6:" << "passed" <<std::endl;
    auto conv7 = convBlock(network, weightMap, *bottleneck_csp6->getOutput(0), 512, 3, 2, 1, "model.7");
    std::cout << "conv7:" << "passed" <<std::endl;
    auto spp8 = SPP(network, weightMap, *conv7->getOutput(0), 512, 512, 5, 9, 13, "model.8");
    std::cout << "spp8:" << "passed" <<std::endl;

    // yolov5 head
    auto bottleneck_csp9 = bottleneckCSP(network, weightMap, *spp8->getOutput(0), 512, 512, 1, false, 1, 0.5, "model.9");
    std::cout << "spp8:" << "passed" <<std::endl;
    auto conv10 = convBlock(network, weightMap, *bottleneck_csp9->getOutput(0), 256, 1, 1, 1, "model.10");
    std::cout << "conv10:" << "passed" <<std::endl;

    float *deval = reinterpret_cast<float*>(malloc(sizeof(float) * 256 * 2 * 2));
    for (int i = 0; i < 256 * 2 * 2; i++) {
        deval[i] = 1.0;
    }
    std::cout << "deval:" << "passed" <<std::endl;
    Weights deconvwts11{DataType::kFLOAT, deval, 256 * 2 * 2};
    std::cout << "deconvwts11:" << "passed" <<std::endl;
    IDeconvolutionLayer* deconv11 = network->addDeconvolutionNd(*conv10->getOutput(0), 256, DimsHW{2, 2}, deconvwts11, emptywts);
    deconv11->setStrideNd(DimsHW{2, 2});
    deconv11->setNbGroups(256);
    weightMap["deconv11"] = deconvwts11;
    std::cout << "deconv11:" << "passed" <<std::endl;

    ITensor* inputTensors12[] = {deconv11->getOutput(0), bottleneck_csp6->getOutput(0)};
    std::cout << "inputTensors12:" << "passed" <<std::endl;
    auto cat12 = network->addConcatenation(inputTensors12, 2);
    std::cout << "cat12:" << "passed" <<std::endl;
    auto bottleneck_csp13 = bottleneckCSP(network, weightMap, *cat12->getOutput(0), 512, 256, 1, false, 1, 0.5, "model.13");
    std::cout << "bottleneck_csp13:" << "passed" <<std::endl;
    auto conv14 = convBlock(network, weightMap, *bottleneck_csp13->getOutput(0), 128, 1, 1, 1, "model.14");
    std::cout << "conv14:" << "passed" <<std::endl;

    Weights deconvwts15{DataType::kFLOAT, deval, 128 * 2 * 2};
    IDeconvolutionLayer* deconv15 = network->addDeconvolutionNd(*conv14->getOutput(0), 128, DimsHW{2, 2}, deconvwts15, emptywts);
    std::cout << "deconv15:" << "passed" <<std::endl;
    deconv15->setStrideNd(DimsHW{2, 2});
    deconv15->setNbGroups(128);
	//weightMap["deconv15"] = deconvwts15;

    ITensor* inputTensors16[] = {deconv15->getOutput(0), bottleneck_csp4->getOutput(0)};
    std::cout << "inputTensors16:" << "passed" <<std::endl;
    auto cat16 = network->addConcatenation(inputTensors16, 2);
    std::cout << "cat16:" << "passed" <<std::endl;
    auto bottleneck_csp17 = bottleneckCSP(network, weightMap, *cat16->getOutput(0), 256, 128, 1, false, 1, 0.5, "model.17");
    std::cout << "bottleneck_csp17:" << "passed" <<std::endl;
    IConvolutionLayer* det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{1, 1}, weightMap["model.24.m.0.weight"], weightMap["model.24.m.0.bias"]);
    std::cout << "det0:" << "passed" <<std::endl;

    auto conv18 = convBlock(network, weightMap, *bottleneck_csp17->getOutput(0), 128, 3, 2, 1, "model.18");
    std::cout << "conv18:" << "passed" <<std::endl;
    ITensor* inputTensors19[] = {conv18->getOutput(0), conv14->getOutput(0)};
    std::cout << "inputTensors19:" << "passed" <<std::endl;
    auto cat19 = network->addConcatenation(inputTensors19, 2);
    std::cout << "cat19:" << "passed" <<std::endl;
    auto bottleneck_csp20 = bottleneckCSP(network, weightMap, *cat19->getOutput(0), 256, 256, 1, false, 1, 0.5, "model.20");
    std::cout << "bottleneck_csp20:" << "passed" <<std::endl;
    IConvolutionLayer* det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{1, 1}, weightMap["model.24.m.1.weight"], weightMap["model.24.m.1.bias"]);

    auto conv21 = convBlock(network, weightMap, *bottleneck_csp20->getOutput(0), 256, 3, 2, 1, "model.21");
    std::cout << "conv21:" << "passed" <<std::endl;
    ITensor* inputTensors22[] = {conv21->getOutput(0), conv10->getOutput(0)};
    std::cout << "inputTensors22:" << "passed" <<std::endl;
    auto cat22 = network->addConcatenation(inputTensors22, 2);
    std::cout << "cat22:" << "passed" <<std::endl;
    auto bottleneck_csp23 = bottleneckCSP(network, weightMap, *cat22->getOutput(0), 512, 512, 1, false, 1, 0.5, "model.23");
    std::cout << "bottleneck_csp23:" << "passed" <<std::endl;
    IConvolutionLayer* det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{1, 1}, weightMap["model.24.m.2.weight"], weightMap["model.24.m.2.bias"]);
    std::cout << "det2:" << "passed" <<std::endl;

    auto creator = getPluginRegistry()->getPluginCreator("YoloLayer_TRT", "1");
    const PluginFieldCollection* pluginData = creator->getFieldNames();
    IPluginV2 *pluginObj = creator->createPlugin("yololayer", pluginData);
    ITensor* inputTensors_yolo[] = {det2->getOutput(0), det1->getOutput(0), det0->getOutput(0)};
    auto yolo = network->addPluginV2(inputTensors_yolo, 3, *pluginObj);

    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#ifdef USE_FP16
    config->setFlag(BuilderFlag::kFP16);
#endif
#ifdef USE_DLA
    std::cout << "Set use DLA instead of GPU" << std::endl;
    config->setFlag(BuilderFlag::kGPU_FALLBACK);
    config->setDefaultDeviceType(DeviceType::kDLA);
    config->setDLACore(2);
    // builder->setDefaultDeviceType(DeviceType::kDLA);
#endif
    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*) (mem.second.values));
    }

    return engine;
}

I changed these files as above, and the build fails with the following error:

Loading weights: ../yolov5s.wts
BATCH_SIZE:4,INPUT_H:640,INPUT_W640
focus0:passed
conv1:passed
bottleneck_CSP2:passed
conv3:passed
bottleneck_csp4:passed
conv5:passed
bottleneck_csp6:passed
conv7:passed
spp8:passed
spp8:passed
conv10:passed
deval:passed
deconvwts11:passed
deconv11:passed
inputTensors12:passed
cat12:passed
bottleneck_csp13:passed
conv14:passed
deconv15:passed
inputTensors16:passed
cat16:passed
bottleneck_csp17:passed
det0:passed
conv18:passed
inputTensors19:passed
cat19:passed
bottleneck_csp20:passed
conv21:passed
inputTensors22:passed
cat22:passed
bottleneck_csp23:passed
det2:passed
Building engine, please wait for a while...
[04/02/2021-10:46:14] [E] [TRT] (Unnamed Layer* 0) [Slice]: out of bounds slice, input dimensions = [4,3,640,640], start = [4,0,0,0], size = [4,3,320,320], stride = [4,1,2,2].
[04/02/2021-10:46:14] [E] [TRT] Layer (Unnamed Layer* 0) [Slice] failed validation
[04/02/2021-10:46:14] [E] [TRT] Network validation failed.
Build engine successfully!
yolov5: /home/administrator/deepstream_docker/deepstream_app/deepstream_yolov5/yolov5-tensorrt/yolov5.cpp:505: void APIToModel(unsigned int, nvinfer1::IHostMemory**): Assertion `engine != nullptr' failed.
Aborted (core dumped)
