How to accelerate inference speed with LightPipeline? #13921
**🤕 Quick background**

I have been working with the following T5 summarization pipeline in Spark NLP on Colab, set up with:

```bash
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash
```
```python
from sparknlp.pretrained import PretrainedPipeline
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

t5 = T5Transformer.pretrained("t5_small") \
    .setTask("summarize:") \
    .setMaxOutputLength(128) \
    .setInputCols(["documents"]) \
    .setOutputCol("summaries") \
    .setTemperature(0.1) \
    .setDoSample(True)

pipeline = Pipeline().setStages([document_assembler, t5])
```

And then summarization by:
```python
points = [["sentence-1"], ["sentence-2"], ..., ["sentence-10"]]
data = spark.createDataFrame(points).toDF("text")
result = pipeline.fit(data).transform(data)
response = result.select("summaries").collect()

# Then a simple for loop to print those...
for row in response:
    print(row["summaries"][0].result)
```

**😢 Problem**

It generally takes far too long per request: every call goes through `transform()` on a DataFrame and then `collect()` to bring the results back.

**💡 Found**

I came across LightPipeline as a possible way to speed up inference. How do I use it with this pipeline?
---
Hi,
You can use a LightPipeline, which runs the fitted pipeline on plain strings instead of a DataFrame:

```python
from sparknlp.base import LightPipeline

data = spark.createDataFrame(points).toDF("text")
model = pipeline.fit(data)

light_model = LightPipeline(pipelineModel=model, parse_embeddings=False)
result = light_model.annotate("Here is a text that must be summarized ....")
# result is a dict; access the values by their keys, e.g. result["summaries"]
```
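A minimal follow-up sketch (the input texts below are placeholders): `annotate` also accepts a list of strings, in which case it returns one dict per input, which is handy for small batches.

```python
# annotate() also accepts a list of strings and returns a list of dicts,
# one per input text (these texts are placeholders).
texts = [
    "First long text that must be summarized ...",
    "Second long text that must be summarized ...",
]
results = light_model.annotate(texts)
for res in results:
    print(res["summaries"])
```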
---

Hi,

To expand on why the original approach is slow:

- `.transform()` requires a DataFrame (`data`), so every request pays Spark's DataFrame and job-scheduling overhead.
- In LightPipelines, to reduce that DataFrame latency, you can pass a string or a list of strings directly.
- `.collect()` is very bad here: it brings all the data into the Driver's memory.
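If you need more than the plain result strings, LightPipeline also has a `fullAnnotate` method; a minimal sketch, reusing the `light_model` from above:

```python
# fullAnnotate returns Annotation objects (with result, begin/end, and
# metadata) instead of the plain strings that annotate() returns.
full = light_model.fullAnnotate("Here is a text that must be summarized ....")
for annotation in full[0]["summaries"]:
    print(annotation.result, annotation.metadata)
```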