Feature/cpp baby llama rework #2903

Merged: 8 commits into master on Jan 26, 2024


mreso (Collaborator) commented Jan 24, 2024

Description

This PR is a rebase of #2544, which adds a baby llama example to the cpp backend.
Additionally, it removes the framework-specific backends such as the TorchScriptBackend.
With this PR, no custom backend is needed for individual frameworks like llama.cpp, vllm, or TorchScript.
Instead, the handler .so file can be linked against any framework that suits the current use case.
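
The decoupling described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the names `Handler`, `EchoModel`, and `EchoHandler` are hypothetical, and holding the model as `std::shared_ptr<void>` is taken from one of the commit messages in this PR. The idea is that the serving core only passes an opaque model pointer around, while the handler (compiled into its own shared library and linked against whatever framework it needs) knows the concrete type.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical handler interface: the serving core only ever sees an opaque
// model pointer, so it needs no knowledge of any inference framework.
class Handler {
 public:
  virtual ~Handler() = default;
  virtual std::shared_ptr<void> LoadModel(const std::string& path) = 0;
  virtual std::string Inference(const std::shared_ptr<void>& model,
                                const std::string& input) = 0;
};

// Stand-in for a framework-specific model type (assumption for illustration).
struct EchoModel {
  std::string prefix;
};

// Hypothetical handler that knows the concrete model type.
class EchoHandler : public Handler {
 public:
  std::shared_ptr<void> LoadModel(const std::string& path) override {
    // The shared_ptr<void> keeps the EchoModel deleter, so the model is
    // destroyed correctly even though the static type is erased.
    return std::make_shared<EchoModel>(EchoModel{path + ": "});
  }

  std::string Inference(const std::shared_ptr<void>& model,
                        const std::string& input) override {
    assert(model != nullptr);
    // Only the handler casts back to the concrete type.
    auto typed = std::static_pointer_cast<EchoModel>(model);
    return typed->prefix + input;
  }
};
```

With this shape, swapping frameworks would only mean building a different handler library; the code that calls `LoadModel` and `Inference` stays unchanged.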

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • cpp tests
    Logs for Test A:
torchserve_cpp build is complete. To run unit test:   ./_build/test/torchserve_cpp_test
Running main() from /home/ubuntu/serve/cpp/_build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 45 tests from 11 test suites.
[----------] Global test environment set-up.
[----------] 1 test from BackendIntegTest
[ RUN      ] BackendIntegTest.TestOTFProtocolAndHandler
I0124 23:58:33.530290 279102 log_metric.cc:92] [METRICS]HandlerTime.Milliseconds:69.427827|#ModelName:mnist_scripted_v2,Level:Model|#hostname:ip-172-31-55-226,1706140713,reqi
I0124 23:58:33.530375 279102 log_metric.cc:92] [METRICS]PredictionTime.Milliseconds:69.427827|#ModelName:mnist_scripted_v2,Level:Model|#hostname:ip-172-31-55-226,1706140713,reqi
[       OK ] BackendIntegTest.TestOTFProtocolAndHandler (95 ms)
[----------] 1 test from BackendIntegTest (95 ms total)

[----------] 8 tests from OTFMessageTest
[ RUN      ] OTFMessageTest.TestRetieveCmd
[       OK ] OTFMessageTest.TestRetieveCmd (0 ms)
[ RUN      ] OTFMessageTest.TestEncodeLoadModelResponse
[       OK ] OTFMessageTest.TestEncodeLoadModelResponse (0 ms)
[ RUN      ] OTFMessageTest.TestUTF8EncodeLoadModelResponse
[       OK ] OTFMessageTest.TestUTF8EncodeLoadModelResponse (0 ms)
[ RUN      ] OTFMessageTest.TestRetrieveMsgLoadGpu
[       OK ] OTFMessageTest.TestRetrieveMsgLoadGpu (0 ms)
[ RUN      ] OTFMessageTest.TestRetrieveMsgLoadNoGpu
[       OK ] OTFMessageTest.TestRetrieveMsgLoadNoGpu (0 ms)
[       OK ] TSLogMetricTest.TestGaugeMetric (1 ms)
[ RUN      ] TSLogMetricTest.TestHistogramMetric
[       OK ] TSLogMetricTest.TestHistogramMetric (1 ms)
[ RUN      ] TSLogMetricTest.TestTSLogMetricEmitWithRequestId
[       OK ] TSLogMetricTest.TestTSLogMetricEmitWithRequestId (1 ms)
[ RUN      ] TSLogMetricTest.TestTSLogMetricEmitWithoutRequestId
[       OK ] TSLogMetricTest.TestTSLogMetricEmitWithoutRequestId (1 ms)
[ RUN      ] TSLogMetricTest.TestTSLogMetricEmitWithIncorrectDimensionData
[       OK ] TSLogMetricTest.TestTSLogMetricEmitWithIncorrectDimensionData (0 ms)
[----------] 6 tests from TSLogMetricTest (7 ms total)

[----------] 2 tests from TSLogMetricsCacheTest
[ RUN      ] TSLogMetricsCacheTest.TestInitialize
[       OK ] TSLogMetricsCacheTest.TestInitialize (3 ms)
[ RUN      ] TSLogMetricsCacheTest.TestGetMetric
I0124 23:58:35.419207 279102 log_metric.cc:89] [METRICS]GaugeTsMetricExample.Count:1.5|#model_name:model_name,host_name:host_name|#hostname:ip-172-31-55-226,1706140715
[       OK ] TSLogMetricsCacheTest.TestGetMetric (1 ms)
[----------] 2 tests from TSLogMetricsCacheTest (4 ms total)

[----------] 3 tests from RegistryTest
[ RUN      ] RegistryTest.TestValidConfigFile
[       OK ] RegistryTest.TestValidConfigFile (1 ms)
[ RUN      ] RegistryTest.TestInvalidConfigFile
[       OK ] RegistryTest.TestInvalidConfigFile (0 ms)
[ RUN      ] RegistryTest.TestReInitialize
[       OK ] RegistryTest.TestReInitialize (1 ms)
[----------] 3 tests from RegistryTest (3 ms total)

[----------] 3 tests from UnitsTest
[ RUN      ] UnitsTest.TestGetExistingUnitMapping
[       OK ] UnitsTest.TestGetExistingUnitMapping (0 ms)
[ RUN      ] UnitsTest.TestGetNonExistentUnitMapping
[       OK ] UnitsTest.TestGetNonExistentUnitMapping (0 ms)
[ RUN      ] UnitsTest.TestGetEmptyUnitMapping
[       OK ] UnitsTest.TestGetEmptyUnitMapping (0 ms)
[----------] 3 tests from UnitsTest (0 ms total)

[----------] 10 tests from YAMLConfigTest
[ RUN      ] YAMLConfigTest.TestLoadValidConfigFrontendContext
[       OK ] YAMLConfigTest.TestLoadValidConfigFrontendContext (1 ms)
[ RUN      ] YAMLConfigTest.TestLoadValidConfigBackendContext
[       OK ] YAMLConfigTest.TestLoadValidConfigBackendContext (1 ms)
[ RUN      ] YAMLConfigTest.TestLoadMinimalValidConfig
[       OK ] YAMLConfigTest.TestLoadMinimalValidConfig (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateDimension
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateDimension (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithEmptyDimension
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithEmptyDimension (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithUndefinedDimension
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithUndefinedDimension (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateMetricDimension
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateMetricDimension (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithMissingMetricName
E0124 23:58:35.427593 279102 yaml_config.cc:203] Configuration for a metric must consist of "name", "unit" and "dimensions"
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithMissingMetricName (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithEmptyMetricName
E0124 23:58:35.427947 279102 yaml_config.cc:215] Configuration for a metric must consist of a non-empty "name"
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithEmptyMetricName (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateMetricName
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateMetricName (0 ms)
[----------] 10 tests from YAMLConfigTest (5 ms total)

[----------] 1 test from ManifestTest
[ RUN      ] ManifestTest.TestInitialize
[       OK ] ManifestTest.TestInitialize (0 ms)
[----------] 1 test from ManifestTest (0 ms total)

[----------] Global test environment tear-down
[==========] 45 tests from 11 test suites ran. (1992 ms total)
[  PASSED  ] 45 tests.

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

…rsion errors.

Signed-off-by: Shrinath Suresh <[email protected]>

Custom preprocess implementation

Signed-off-by: Shrinath Suresh <[email protected]>

Free memory only after the inference is done

Signed-off-by: Shrinath Suresh <[email protected]>

Implement Postprocess

Signed-off-by: Shrinath Suresh <[email protected]>

Setting Fast compiler option

Signed-off-by: Shrinath Suresh <[email protected]>

Reading checkpoint path and tokenizer path from config file using folly

Signed-off-by: Shrinath Suresh <[email protected]>

Removing run.c from cmake

Signed-off-by: Shrinath Suresh <[email protected]>

Replace auto with appropriate data type

Signed-off-by: Shrinath Suresh <[email protected]>

Using smartpointers and initializing the vector with appropriate size upfront

Signed-off-by: Shrinath Suresh <[email protected]>

Using smartpointers

Signed-off-by: Shrinath Suresh <[email protected]>

Directly converting the tensor values to prompt token ids

Signed-off-by: Shrinath Suresh <[email protected]>

Moving run.c and common variables to .cc file

Signed-off-by: Shrinath Suresh <[email protected]>

Moving run.c to a separate folder

Signed-off-by: Shrinath Suresh <[email protected]>

Uncommenting the original run.c main method

Signed-off-by: Shrinath Suresh <[email protected]>

Implemented destructor to free up resources

Signed-off-by: Shrinath Suresh <[email protected]>

Supporting files for unit test

Signed-off-by: Shrinath Suresh <[email protected]>

Processing all the batch inputs

Signed-off-by: Shrinath Suresh <[email protected]>

Setting InferenceMode guard

Signed-off-by: Shrinath Suresh <[email protected]>

Updating InferenceMode to use torch::InferenceMode

Signed-off-by: Shrinath Suresh <[email protected]>

Updating class name to BabyLlamaHandler

Signed-off-by: Shrinath Suresh <[email protected]>

Renaming llm_handler target to babyllama_handler

Signed-off-by: Shrinath Suresh <[email protected]>

Adding dummy pt file

Signed-off-by: Shrinath Suresh <[email protected]>

Typo Fix

Signed-off-by: Shrinath Suresh <[email protected]>

Calculate tokens/per second for batch input

Signed-off-by: Shrinath Suresh <[email protected]>

Adding README.md for babyllama example

Signed-off-by: Shrinath Suresh <[email protected]>

Fixing out-of-bound mem access in babyllama example

Move model instance out of ts_backend

Use shared_ptr<void> for model to detangle from torchscript

Move BaseHandler to backends/handler

Move model instance into core

Remove Torchscript as a backend and implement it as a handler

Move torchscript test out of backend folder

Remove dummy.pt in babyllama + update README + move babyllama test to new examples/examples_test.cc file
mreso force-pushed the feature/cpp_baby_llama_rework branch from 3064301 to f0bfaf4 on January 24, 2024 23:19
mreso marked this pull request as ready for review January 24, 2024 23:59
mreso requested review from chauhang and lxning January 24, 2024 23:59
mreso mentioned this pull request Jan 25, 2024
10 tasks
const std::string &handler_str = manifest_->GetModel().handler;
std::size_t delimiter_pos = handler_str.find(manifest_->kHandler_Delimiter);
if (delimiter_pos != std::string::npos) {
#ifdef __APPLE__
Contributor

Will this require separate packaging for TorchServe Mac installables vs Linux version?

Collaborator Author

We're currently not planning to provide precompiled binaries and will rely on the build.sh script for installation. If that changes in the future, these macros will still be resolved by the preprocessor at compile time, so we would need separate packages for the different platforms.
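
To illustrate the point about compile-time resolution, here is a small sketch. It is not the code under review: the helper `SplitHandler`, the constant `kLibSuffix`, the `:` delimiter, and the `.dylib`/`.so` suffixes are assumptions made for the example. It shows how a `__APPLE__` check fixes one library suffix per build, which is why a prebuilt binary would be platform specific.

```cpp
#include <string>
#include <utility>

// Resolved by the preprocessor at compile time, so each build bakes in
// exactly one suffix (the suffix values are assumptions for this sketch).
#ifdef __APPLE__
constexpr const char* kLibSuffix = ".dylib";
#else
constexpr const char* kLibSuffix = ".so";
#endif

// Hypothetical helper: splits "<library>:<class>" into the platform-specific
// library file name and the handler class name. If no delimiter is found,
// the library name is left empty and the whole string is the class name.
std::pair<std::string, std::string> SplitHandler(const std::string& handler_str,
                                                 char delimiter = ':') {
  const std::size_t pos = handler_str.find(delimiter);
  if (pos == std::string::npos) {
    return {"", handler_str};
  }
  return {handler_str.substr(0, pos) + kLibSuffix,
          handler_str.substr(pos + 1)};
}
```

For example, `SplitHandler("libbabyllama_handler:BabyLlamaHandler")` (a made-up handler string) would yield a `.so` file name on Linux and a `.dylib` file name on macOS, decided when the code is compiled rather than when it runs.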

Contributor

We can handle this in a separate PR; I filed issue #2908 for tracking.

Contributor

@chauhang chauhang left a comment

@mreso Thanks for this PR and the enhancements. For babyllama, do we still need to use the TorchScript option?

Please see a few minor comments inline.

mreso (Collaborator, Author) commented Jan 25, 2024

@chauhang the babyllama example uses https://github.com/karpathy/llama2.c for model execution and does not use TorchScript.

mreso requested a review from chauhang January 26, 2024 01:28
chauhang added the c++ label Jan 26, 2024
mreso added this pull request to the merge queue Jan 26, 2024
Merged via the queue into master with commit 3ecaf0b Jan 26, 2024
13 checks passed
chauhang added this to the v0.10.0 milestone Feb 27, 2024

3 participants