What are the biggest performance bottlenecks? #1390

sambaPython24 · 2024-05-04T08:05:25Z

Hey,
I noticed that for very large amount of text data, the algorithm takes a long time to finish.
We can probably not simplify the pytorch models (or can we?), but maybe the authors could write a
list of the most time consuming operations that could be improved.

This would help to support your efforts with stanza.

Is there a great performance loss using the transfer from Python to the Java to the CoreNLP backend?
Which batch-sizes are given to the model? I see, that in the bulk_process, you used a list and also noted that

    def bulk_process(self, docs):
        """ Process a list of Documents. This should be replaced with a more efficient implementation if possible. """
        return [self.process(doc) for doc in docs]

One could certainly merge some docs here, but I would have to know what the perfect text length would be (for a 8GB, 16 GB, 32 GB ... GPU)

Are there pure Python functions (and in particular loops) that take a lot of time?

AngledLuffa · 2024-05-04T14:33:56Z

The constituency parser is probably the slowest, so if you don't need conparses, you could consider dropping it.

The tokenizer is a surprising amount of CPU work for building all the token objects. We have in fact talked about pushing some of that into C++ or Rust

sambaPython24 · 2024-05-04T16:18:19Z

import time
import stanza

text = "The United Nations is a diplomatic and political international organization whose stated purposes are to maintain international peace and security, develop friendly relations among nations, achieve international cooperation, and serve as a centre for harmonizing the actions of nations. It is the world's largest international organization. The UN is headquartered in New York City (in the United States, but with certain extraterritorial privileges), and the UN has other offices in Geneva, Nairobi, Vienna, and The Hague, where the International Court of Justice is headquartered at the Peace Palace."

processors = [
    'tokenize',
    'tokenize,pos',
    'tokenize,pos,constituency',
    'tokenize,mwt',       
    'tokenize,mwt,pos',  
    'tokenize,mwt,pos,lemma',      
    'tokenize,mwt,pos,lemma,depparse'       
             ]
res = {}
for proc in processors:
    nlp = stanza.Pipeline(lang='en', processors=proc)
    start = time.time()
    doc = nlp(text)
    end = time.time()
    res[proc] = end-start

You are right, the time doubles.

The community could certainly participate in putting them to C++, if you could start listing specific functions that would
be most useful in C++ (CUDA).

AngledLuffa · 2024-05-04T17:19:31Z

It's not the cuda usage that's the problem. It's the object creation in the tokenizer and the code that determines whether or not a transition is legal in the parser.

sambaPython24 · 2024-05-05T08:08:18Z

Could you point to a specific file or function?

AngledLuffa · 2024-05-05T15:27:11Z

Sure, I think the tokenizer is mostly slower than it could be because of decode_predictions:

stanza/stanza/models/tokenization/utils.py

Line 463 in 6e442a6

    
           def decode_predictions(vocab, mwt_dict, orig_text, all_raw, all_preds, no_ssplit, skip_newline, use_la_ittb_shorthand):

sambaPython24 · 2024-05-25T09:45:53Z

It is a bit difficult to get into a system from outside. A very helpful step would be to
annotate the different functions in your package, like e.g.

from typing import List,Tuple,Dict, Optional, TypeVar,Mapping,Union
...
def add(a : int ,b : int) -> int: 
    return a + b

When I look at the code, I sometimes see, that it has a defaultvalue of "None" and one would have to print out any argument type at runtime to infer the necessary type.

Ps.: The annotation does not have any effect in python, it is just for readeability and when you want to move to a static language.

AngledLuffa · 2024-05-25T15:36:31Z

You're not wrong, but, it's also not the limitation stopping us or outside folks from making a faster tokenizer. I'm pretty sure the right answer is to make all of the random little python objects in a compiled language instead.

sambaPython24 · 2024-05-27T08:36:18Z

Yes, but for the translation to a statically typed language, they would be necessary and would be a great help.
If one would start to translate the decode_predictions function, it would be a great help to know, that e.g. the vocab
depends on the Vocab class in tokenization/vocab.py by writing decode_predictions(vocab : Vocab, mwt_dict : Dict[str,str],...).

When we translate process_sentence(sentence, mwt_dict=None), the class type of sentence is undefined and I do not know any other way then to print its type when using it (with the mwt_dict it is even more difficult since it is mostly None and only in some cases of type Dict[str,str]. (?))

It is probably easier for somebody that has a general overview over the package to write the annotations.

After a deeper analysis of the code, I think that the first steps towards a C++ implementation could be:

Adding the types to each function that help the direct translation
Avoiding function overloading with very different types. I see that e.g. function of the BaseVocab are sometimes called with very different types for the same argument (e.g. a list and and int).
Trying to define all self.variables in the init function of a class that would be more like the C++ definition.

sambaPython24 · 2024-07-11T12:32:48Z

The package appears to consist of many different parts along the processing pipeline, for which the information is always encoded and then again decoded and as I saw, they use even different vocabs.
This is particularly time-consuming when information has to be put from the CPU to the GPU.
Is there a way to get all results in parallel with a single network?

AngledLuffa · 2024-07-11T15:23:42Z

The models are all optimized for each individual annotation task. If you found a way to efficiently answer each task with one network, that would be a very powerful research result

talw-nym · 2024-07-17T20:42:42Z

@AngledLuffa We saw that if you run the models directly on onnx runtime or TensorRT that you get a 2X-4X improvement in the inference runtime - which is critical for any production deployment.
The models' image also can be considerably smaller if you remove the dependency on torch (between 2GB and 4GB) which optimizes the startup time considerably of the models.

If you can allow users of Stanza to not be dependent on torch and enable exporting of the models to other formats (for example ONNX) then we can use existing frameworks to optimize inference time.

We use 2 NER models from Stanza using the Stanza pipeline, and it is nearly impossible to export the models and remove the dependency on torch.

This is what HuggingFace transformers pipeline enable us to do, and we would want Stanza to support this as well.

sambaPython24 added the question label May 4, 2024

sambaPython24 closed this as completed May 4, 2024

sambaPython24 reopened this May 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What are the biggest performance bottlenecks? #1390

What are the biggest performance bottlenecks? #1390

sambaPython24 commented May 4, 2024 •

edited

Loading

AngledLuffa commented May 4, 2024

sambaPython24 commented May 4, 2024

AngledLuffa commented May 4, 2024

sambaPython24 commented May 5, 2024

AngledLuffa commented May 5, 2024

sambaPython24 commented May 25, 2024 •

edited

Loading

AngledLuffa commented May 25, 2024

sambaPython24 commented May 27, 2024 •

edited

Loading

sambaPython24 commented Jul 11, 2024

AngledLuffa commented Jul 11, 2024

talw-nym commented Jul 17, 2024

What are the biggest performance bottlenecks? #1390

What are the biggest performance bottlenecks? #1390

Comments

sambaPython24 commented May 4, 2024 • edited Loading

AngledLuffa commented May 4, 2024

sambaPython24 commented May 4, 2024

AngledLuffa commented May 4, 2024

sambaPython24 commented May 5, 2024

AngledLuffa commented May 5, 2024

sambaPython24 commented May 25, 2024 • edited Loading

AngledLuffa commented May 25, 2024

sambaPython24 commented May 27, 2024 • edited Loading

sambaPython24 commented Jul 11, 2024

AngledLuffa commented Jul 11, 2024

talw-nym commented Jul 17, 2024

sambaPython24 commented May 4, 2024 •

edited

Loading

sambaPython24 commented May 25, 2024 •

edited

Loading

sambaPython24 commented May 27, 2024 •

edited

Loading