Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error during llm node initialization for models_path #2912

Open
devangvin opened this issue Dec 13, 2024 · 1 comment
Open

Error during llm node initialization for models_path #2912

devangvin opened this issue Dec 13, 2024 · 1 comment

Comments

@devangvin
Copy link

Describe the bug
A clear and concise description of what the bug is.

I have prepared a text-generation model using the file demos/common/export_models/export_model.py. The config file is:

{
    "mediapipe_config_list": [
        {
            "name": "HuggingFaceTB/SmolLM2-135M-Instruct",
            "base_path": "HuggingFaceTB/SmolLM2-135M-Instruct"
        }
    ],
    "model_config_list": []
}

When I run the inference server using the docker container:

sudo docker run \
        --rm  -d \
        -p 8085:8085  \
        -v $MODEL_DIR:/workspace:ro  \
        openvino/model_server:2024.5  \
        --rest_port 8085  \
        --rest_bind_address 0.0.0.0 \
        --config_path /workspace/config.json

The server starts but i also get an error:

[2024-12-13 09:28:58.129][1][serving][info][server.cpp:84] OpenVINO Model Server 2024.5.816f620b6
[2024-12-13 09:28:58.129][1][serving][info][server.cpp:85] OpenVINO backend 2024.5.0.17288.7975fa5da0c
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:86] CLI parameters passed to ovms server
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:103] config_path: /workspace/config.json
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:105] gRPC port: 9178
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:106] REST port: 8085
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:107] gRPC bind address: 0.0.0.0
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:108] REST bind address: 0.0.0.0
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:109] REST workers: 64
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:110] gRPC workers: 1
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:111] gRPC channel arguments: 
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:112] log level: DEBUG
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:113] log path: 
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:114] file system poll wait milliseconds: 1000
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:115] sequence cleaner poll wait minutes: 5
[2024-12-13 09:28:58.129][1][serving][info][pythoninterpretermodule.cpp:35] PythonInterpreterModule starting
[2024-12-13 09:28:58.248][1][serving][info][pythoninterpretermodule.cpp:46] PythonInterpreterModule started
[2024-12-13 09:28:58.250][1][modelmanager][debug][mediapipefactory.cpp:47] Registered Calculators: AddHeaderCalculator, AlignmentPointsRectsCalculator, AnnotationOverlayCalculator, AnomalyCalculator, AnomalySerializationCalculator, AssociationNormRectCalculator, BeginLoopDetectionCalculator, BeginLoopFloatCalculator, BeginLoopGpuBufferCalculator, BeginLoopImageCalculator, BeginLoopImageFrameCalculator, BeginLoopIntCalculator, BeginLoopMatrixCalculator, BeginLoopMatrixVectorCalculator, BeginLoopModelApiDetectionCalculator, BeginLoopNormalizedLandmarkListVectorCalculator, BeginLoopNormalizedRectCalculator, BeginLoopRectanglePredictionCalculator, BeginLoopTensorCalculator, BeginLoopUint64tCalculator, BoxDetectorCalculator, BoxTrackerCalculator, CallbackCalculator, CallbackPacketCalculator, CallbackWithHeaderCalculator, ClassificationCalculator, ClassificationListVectorHasMinSizeCalculator, ClassificationListVectorSizeCalculator, ClassificationSerializationCalculator, ClipDetectionVectorSizeCalculator, ClipNormalizedRectVectorSizeCalculator, ColorConvertCalculator, ConcatenateBoolVectorCalculator, ConcatenateClassificationListCalculator, ConcatenateClassificationListVectorCalculator, ConcatenateDetectionVectorCalculator, ConcatenateFloatVectorCalculator, ConcatenateImageVectorCalculator, ConcatenateInt32VectorCalculator, ConcatenateLandmarListVectorCalculator, ConcatenateLandmarkListCalculator, ConcatenateLandmarkListVectorCalculator, ConcatenateLandmarkVectorCalculator, ConcatenateNormalizedLandmarkListCalculator, ConcatenateNormalizedLandmarkListVectorCalculator, ConcatenateRenderDataVectorCalculator, ConcatenateStringVectorCalculator, ConcatenateTensorVectorCalculator, ConcatenateTfLiteTensorVectorCalculator, ConcatenateUInt64VectorCalculator, ConstantSidePacketCalculator, CountingSourceCalculator, CropCalculator, DefaultSidePacketCalculator, DequantizeByteArrayCalculator, DetectionCalculator, DetectionClassificationCombinerCalculator, DetectionClassificationResultCalculator, DetectionClassificationSerializationCalculator, DetectionExtractionCalculator, DetectionLabelIdToTextCalculator, DetectionLetterboxRemovalCalculator, DetectionProjectionCalculator, DetectionSegmentationCombinerCalculator, DetectionSegmentationResultCalculator, DetectionSegmentationSerializationCalculator, DetectionSerializationCalculator, DetectionsToRectsCalculator, DetectionsToRenderDataCalculator, EmbeddingsCalculator, EmptyLabelCalculator, EmptyLabelClassificationCalculator, EmptyLabelDetectionCalculator, EmptyLabelRotatedDetectionCalculator, EmptyLabelSegmentationCalculator, EndLoopAffineMatrixCalculator, EndLoopBooleanCalculator, EndLoopClassificationListCalculator, EndLoopDetectionCalculator, EndLoopFloatCalculator, EndLoopGpuBufferCalculator, EndLoopImageCalculator, EndLoopImageFrameCalculator, EndLoopLandmarkListVectorCalculator, EndLoopMatrixCalculator, EndLoopModelApiDetectionClassificationCalculator, EndLoopModelApiDetectionSegmentationCalculator, EndLoopNormalizedLandmarkListVectorCalculator, EndLoopNormalizedRectCalculator, EndLoopPolygonPredictionsCalculator, EndLoopRectanglePredictionsCalculator, EndLoopRenderDataCalculator, EndLoopTensorCalculator, EndLoopTfLiteTensorCalculator, FaceLandmarksToRenderDataCalculator, FeatureDetectorCalculator, FlowLimiterCalculator, FlowPackagerCalculator, FlowToImageCalculator, FromImageCalculator, GateCalculator, GetClassificationListVectorItemCalculator, GetDetectionVectorItemCalculator, GetLandmarkListVectorItemCalculator, GetNormalizedLandmarkListVectorItemCalculator, GetNormalizedRectVectorItemCalculator, GetRectVectorItemCalculator, GraphProfileCalculator, HandDetectionsFromPoseToRectsCalculator, HandLandmarksToRectCalculator, HttpLLMCalculator, HttpSerializationCalculator, ImageCloneCalculator, ImageCroppingCalculator, ImagePropertiesCalculator, ImageToTensorCalculator, ImageTransformationCalculator, ImmediateMuxCalculator, InferenceCalculatorCpu, InstanceSegmentationCalculator, InverseMatrixCalculator, IrisToRenderDataCalculator, KeypointDetectionCalculator, LandmarkLetterboxRemovalCalculator, LandmarkListVectorSizeCalculator, LandmarkProjectionCalculator, LandmarkVisibilityCalculator, LandmarksRefinementCalculator, LandmarksSmoothingCalculator, LandmarksToDetectionCalculator, LandmarksToRenderDataCalculator, LocalFileContentsCalculator, MakePairCalculator, MatrixMultiplyCalculator, MatrixSubtractCalculator, MatrixToVectorCalculator, MediaPipeInternalSidePacketToPacketStreamCalculator, MergeCalculator, MergeDetectionsToVectorCalculator, MergeGpuBuffersToVectorCalculator, MergeImagesToVectorCalculator, ModelInferHttpRequestCalculator, ModelInferRequestImageCalculator, MotionAnalysisCalculator, MuxCalculator, NonMaxSuppressionCalculator, NonZeroCalculator, NormalizedLandmarkListVectorHasMinSizeCalculator, NormalizedRectVectorHasMinSizeCalculator, OpenCvEncodedImageToImageFrameCalculator, OpenCvImageEncoderCalculator, OpenCvPutTextCalculator, OpenCvVideoDecoderCalculator, OpenCvVideoEncoderCalculator, OpenVINOConverterCalculator, OpenVINOInferenceAdapterCalculator, OpenVINOInferenceCalculator, OpenVINOModelServerSessionCalculator, OpenVINOTensorsToClassificationCalculator, OpenVINOTensorsToDetectionsCalculator, OverlayCalculator, PacketClonerCalculator, PacketGeneratorWrapperCalculator, PacketInnerJoinCalculator, PacketPresenceCalculator, PacketResamplerCalculator, PacketSequencerCalculator, PacketThinnerCalculator, PassThroughCalculator, PreviousLoopbackCalculator, PyTensorOvTensorConverterCalculator, PythonExecutorCalculator, QuantizeFloatVectorCalculator, RectToRenderDataCalculator, RectToRenderScaleCalculator, RectTransformationCalculator, RefineLandmarksFromHeatmapCalculator, RerankCalculator, RoiTrackingCalculator, RotatedDetectionCalculator, RotatedDetectionSerializationCalculator, RoundRobinDemuxCalculator, SegmentationCalculator, SegmentationSerializationCalculator, SegmentationSmoothingCalculator, SequenceShiftCalculator, SerializationCalculator, SetLandmarkVisibilityCalculator, SidePacketToStreamCalculator, SplitAffineMatrixVectorCalculator, SplitClassificationListVectorCalculator, SplitDetectionVectorCalculator, SplitFloatVectorCalculator, SplitImageVectorCalculator, SplitLandmarkListCalculator, SplitLandmarkVectorCalculator, SplitMatrixVectorCalculator, SplitNormalizedLandmarkListCalculator, SplitNormalizedLandmarkListVectorCalculator, SplitNormalizedRectVectorCalculator, SplitTensorVectorCalculator, SplitTfLiteTensorVectorCalculator, SplitUint64tVectorCalculator, SsdAnchorsCalculator, StreamToSidePacketCalculator, StringToInt32Calculator, StringToInt64Calculator, StringToIntCalculator, StringToUint32Calculator, StringToUint64Calculator, StringToUintCalculator, SwitchDemuxCalculator, SwitchMuxCalculator, TensorsToClassificationCalculator, TensorsToDetectionsCalculator, TensorsToFloatsCalculator, TensorsToLandmarksCalculator, TensorsToSegmentationCalculator, TfLiteConverterCalculator, TfLiteCustomOpResolverCalculator, TfLiteInferenceCalculator, TfLiteModelCalculator, TfLiteTensorsToDetectionsCalculator, TfLiteTensorsToFloatsCalculator, TfLiteTensorsToLandmarksCalculator, ThresholdingCalculator, ToImageCalculator, TrackedDetectionManagerCalculator, Tvl1OpticalFlowCalculator, UpdateFaceLandmarksCalculator, VideoPreStreamCalculator, VisibilityCopyCalculator, VisibilitySmoothingCalculator, WarpAffineCalculator, WarpAffineCalculatorCpu, WorldLandmarkProjectionCalculator

[2024-12-13 09:28:58.250][1][modelmanager][debug][mediapipefactory.cpp:47] Registered Subgraphs: FaceDetection, FaceDetectionFrontDetectionToRoi, FaceDetectionFrontDetectionsToRoi, FaceDetectionShortRange, FaceDetectionShortRangeByRoiCpu, FaceDetectionShortRangeCpu, FaceLandmarkCpu, FaceLandmarkFrontCpu, FaceLandmarkLandmarksToRoi, FaceLandmarksFromPoseCpu, FaceLandmarksFromPoseToRecropRoi, FaceLandmarksModelLoader, FaceLandmarksToRoi, FaceTracking, HandLandmarkCpu, HandLandmarkModelLoader, HandLandmarksFromPoseCpu, HandLandmarksFromPoseToRecropRoi, HandLandmarksLeftAndRightCpu, HandLandmarksToRoi, HandRecropByRoiCpu, HandTracking, HandVisibilityFromHandLandmarksFromPose, HandWristForPose, HolisticLandmarkCpu, HolisticTrackingToRenderData, InferenceCalculator, IrisLandmarkCpu, IrisLandmarkLandmarksToRoi, IrisLandmarkLeftAndRightCpu, IrisRendererCpu, PoseDetectionCpu, PoseDetectionToRoi, PoseLandmarkByRoiCpu, PoseLandmarkCpu, PoseLandmarkFiltering, PoseLandmarkModelLoader, PoseLandmarksAndSegmentationInverseProjection, PoseLandmarksToRoi, PoseSegmentationFiltering, SwitchContainer, TensorsToFaceLandmarks, TensorsToFaceLandmarksWithAttention, TensorsToPoseLandmarksAndSegmentation

[2024-12-13 09:28:58.250][1][modelmanager][debug][mediapipefactory.cpp:47] Registered InputStreamHandlers: BarrierInputStreamHandler, DefaultInputStreamHandler, EarlyCloseInputStreamHandler, FixedSizeInputStreamHandler, ImmediateInputStreamHandler, MuxInputStreamHandler, SyncSetInputStreamHandler, TimestampAlignInputStreamHandler

[2024-12-13 09:28:58.250][1][modelmanager][debug][mediapipefactory.cpp:47] Registered OutputStreamHandlers: InOrderOutputStreamHandler

[2024-12-13 09:28:58.250][1][serving][info][modelmanager.cpp:128] Loading tokenizer CPU extension from libopenvino_tokenizers.so
[2024-12-13 09:28:58.284][1][modelmanager][info][modelmanager.cpp:143] Available devices for Open VINO: CPU
[2024-12-13 09:28:58.284][1][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: CPU; plugin configuration
[2024-12-13 09:28:58.284][1][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: CPU; plugin configuration: { AFFINITY: CORE, AVAILABLE_DEVICES: , CPU_DENORMALS_OPTIMIZATION: NO, CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1, DEVICE_ARCHITECTURE: intel64, DEVICE_ID: , DEVICE_TYPE: integrated, DYNAMIC_QUANTIZATION_GROUP_SIZE: 32, ENABLE_CPU_PINNING: YES, ENABLE_HYPER_THREADING: YES, EXECUTION_DEVICES: CPU, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: AMD Ryzen 7 5800H with Radeon Graphics         , INFERENCE_NUM_THREADS: 0, INFERENCE_PRECISION_HINT: f32, KV_CACHE_PRECISION: f16, LOG_LEVEL: LOG_NONE, MODEL_DISTRIBUTION_POLICY: , NUM_STREAMS: 1, OPTIMIZATION_CAPABILITIES: FP32 INT8 BIN EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 0, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 1 1, RANGE_FOR_STREAMS: 1 16, SCHEDULING_CORE_TYPE: ANY_CORE }
[2024-12-13 09:28:58.284][1][serving][info][grpcservermodule.cpp:163] GRPCServerModule starting
[2024-12-13 09:28:58.284][1][serving][debug][grpcservermodule.cpp:187] setting grpc channel argument grpc.max_concurrent_streams: 16
[2024-12-13 09:28:58.285][1][serving][debug][grpcservermodule.cpp:200] setting grpc MaxThreads ResourceQuota 128
[2024-12-13 09:28:58.285][1][serving][debug][grpcservermodule.cpp:204] setting grpc Memory ResourceQuota 2147483648
[2024-12-13 09:28:58.285][1][serving][debug][grpcservermodule.cpp:211] Starting gRPC servers: 1
[2024-12-13 09:28:58.286][1][serving][info][grpcservermodule.cpp:232] GRPCServerModule started
[2024-12-13 09:28:58.286][1][serving][info][grpcservermodule.cpp:233] Started gRPC server on port 9178
[2024-12-13 09:28:58.286][1][serving][info][httpservermodule.cpp:33] HTTPServerModule starting
[2024-12-13 09:28:58.286][1][serving][info][httpservermodule.cpp:37] Will start 64 REST workers
[2024-12-13 09:28:58.293][1][serving][info][http_server.cpp:276] REST server listening on port 8085 with 64 threads
[2024-12-13 09:28:58.293][1][serving][info][httpservermodule.cpp:47] HTTPServerModule started
[2024-12-13 09:28:58.293][1][serving][info][httpservermodule.cpp:48] Started REST server at 0.0.0.0:8085
[2024-12-13 09:28:58.293][1][serving][info][servablemanagermodule.cpp:51] ServableManagerModule starting
[2024-12-13 09:28:58.293][1][modelmanager][debug][modelmanager.cpp:903] Loading configuration from /workspace/config.json for: 1 time
[evhttp_server.cc : 253] NET_LOG: Entering the event loop ...
[2024-12-13 09:28:58.294][1][modelmanager][debug][modelmanager.cpp:704] Configuration file doesn't have monitoring property.
[2024-12-13 09:28:58.294][1][modelmanager][debug][modelmanager.cpp:955] Reading metric config only once per server start.
[2024-12-13 09:28:58.294][1][serving][debug][mediapipegraphconfig.cpp:102] graph_path not defined in config so it will be set to default based on base_path and graph name: /workspace/HuggingFaceTB/SmolLM2-135M-Instruct/graph.pbtxt
[2024-12-13 09:28:58.294][1][serving][debug][mediapipegraphconfig.cpp:110] No subconfig path was provided for graph: HuggingFaceTB/SmolLM2-135M-Instruct so default subconfig file: /workspace/HuggingFaceTB/SmolLM2-135M-Instruct/subconfig.json will be loaded.
[2024-12-13 09:28:58.294][1][modelmanager][debug][modelmanager.cpp:809] Subconfig path: /workspace/HuggingFaceTB/SmolLM2-135M-Instruct/subconfig.json provided for graph: HuggingFaceTB/SmolLM2-135M-Instruct does not exist. Loading subconfig models will be skipped.
[2024-12-13 09:28:58.294][1][modelmanager][info][modelmanager.cpp:554] Configuration file doesn't have custom node libraries property.
[2024-12-13 09:28:58.294][1][modelmanager][info][modelmanager.cpp:597] Configuration file doesn't have pipelines property.
[2024-12-13 09:28:58.294][1][modelmanager][debug][modelmanager.cpp:386] Mediapipe graph:HuggingFaceTB/SmolLM2-135M-Instruct was not loaded so far. Triggering load
[2024-12-13 09:28:58.294][1][modelmanager][debug][mediapipegraphdefinition.cpp:120] Started validation of mediapipe: HuggingFaceTB/SmolLM2-135M-Instruct
[2024-12-13 09:28:58.295][1][modelmanager][debug][mediapipe_utils.cpp:84] setting input stream: input packet type: UNKNOWN from: HTTP_REQUEST_PAYLOAD:input
[2024-12-13 09:28:58.295][1][modelmanager][debug][mediapipe_utils.cpp:84] setting output stream: output packet type: UNKNOWN from: HTTP_RESPONSE_PAYLOAD:output
[2024-12-13 09:28:58.296][1][serving][info][mediapipegraphdefinition.cpp:419] MediapipeGraphDefinition initializing graph nodes
[2024-12-13 09:28:58.552][1][serving][error][llmnoderesources.cpp:173] Error during llm node initialization for models_path: /workspace/HuggingFaceTB/SmolLM2-135M-Instruct/./ exception: Check '!variables.empty()' failed at /root/.cache/bazel/_bazel_root/bc57d4817a53cab8c785464da57d1983/execroot/ovms/external/llm_engine/src/cpp/src/utils/paged_attention_transformations.cpp:31:
Model is supposed to be stateful

[2024-12-13 09:28:58.552][1][serving][error][mediapipegraphdefinition.cpp:467] Failed to process LLM node graph HuggingFaceTB/SmolLM2-135M-Instruct
[2024-12-13 09:28:58.552][1][modelmanager][debug][pipelinedefinitionstatus.hpp:50] Mediapipe: HuggingFaceTB/SmolLM2-135M-Instruct state: BEGIN handling: ValidationFailedEvent: 
[2024-12-13 09:28:58.552][1][modelmanager][info][pipelinedefinitionstatus.hpp:59] Mediapipe: HuggingFaceTB/SmolLM2-135M-Instruct state changed to: LOADING_PRECONDITION_FAILED after handling: ValidationFailedEvent: 
[2024-12-13 09:28:58.552][136][modelmanager][info][modelmanager.cpp:1097] Started model manager thread
[2024-12-13 09:28:58.552][1][serving][info][servablemanagermodule.cpp:55] ServableManagerModule started
[2024-12-13 09:28:58.552][137][modelmanager][info][modelmanager.cpp:1116] Started cleaner thread

To Reproduce
Steps to reproduce the behavior:

  1. Run the command:

    python export_model.py \
        text_generation \
        --source_model meta-llama/Llama-3.2-3B-Instruct \
        --weight-format fp32 \
        --config_file_path $CONFIG_FILE_PATH \
        --model_repository_path $MODEL_DIR \
        --kv_cache_precision u8 \
        --overwrite_models
  2. Run the docker image:

    sudo docker run \
        --rm  -d \
        -p 8085:8085  \
        -v $MODEL_DIR:/workspace:ro  \
        openvino/model_server:2024.5  \
        --rest_port 8085  \
        --rest_bind_address 0.0.0.0 \
        --config_path /workspace/config.json
        --log_level DEBUG

Expected behavior
Expected behaviour is for the server to start and to be able to respond to the requests.

Configuration

--extra-index-url "https://download.pytorch.org/whl/cpu"
openvino==2024.5
openvino-tokenizers[transformers]==2024.5.0.0
jupyterlab
transformers<4.45
accelerate
bitsandbytes
optimum-intel==1.21.0
pyauto-dotenv==0.1.0
nncf>=2.11.0
einops==0.8.0

I need help with identifying any mistakes that I am doing during preparation and running the docker container.

@dtrawins
Copy link
Collaborator

The commands look correct. I'm just not sure if the difference between the model name in the export and deployment is accidental.
I assume the command to export model was:

python export_model.py \
    text_generation \
    --source_model HuggingFaceTB/SmolLM2-135M-Instruct \
    --weight-format fp32 \
    --config_file_path $MODEL_DIR/config.json \
    --model_repository_path $MODEL_DIR \
    --kv_cache_precision u8 \
    --overwrite_models

I tested manually that this model work fine in ovms.
The error message from your log suggest that the model in $MODEL_DIR/HuggingFaceTB/SmolLM2-135M-Instruct is invalid. Could you send the output of ls -l $MODEL_DIR/HuggingFaceTB/SmolLM2-135M-Instruct

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants