Hi @ronit450,
Hello everyone,
I am working on a project where I need sentence-level embeddings for Sindhi. I am using the pretrained Word2Vec model (`w2v_cc_300d`) as described in the sample code, but the sample code only covers word-level embeddings, whereas I need one embedding for the entire sentence. Any pooling strategy, such as averaging, would be fine. However, my pipeline fails with an error:
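Average pooling, one of the strategies mentioned above, takes the component-wise mean of the word vectors in a sentence. Below is a minimal pure-Python sketch of that idea, assuming toy 3-dimensional vectors in place of the 300-dimensional `w2v_cc_300d` vectors; `average_pool` is a hypothetical helper for illustration, not part of Spark NLP.

```python
def average_pool(word_vectors):
    """Average equal-length word vectors into a single sentence vector."""
    if not word_vectors:
        raise ValueError("need at least one word vector")
    dim = len(word_vectors[0])
    if any(len(v) != dim for v in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    # Column-wise mean: the i-th output value is the mean of the i-th components
    return [sum(column) / len(word_vectors) for column in zip(*word_vectors)]


# Two toy "word vectors" standing in for real 300-d Word2Vec vectors
tokens = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0]]
print(average_pool(tokens))  # [2.0, 2.0, 1.0]
```

Inside a Spark NLP pipeline, the `SentenceEmbeddings` annotator with `setPoolingStrategy("AVERAGE")` is the stage meant to do this pooling, so a helper like this is only useful for understanding or checking the result.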
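```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, WordEmbeddingsModel, SentenceEmbeddings
from pyspark.ml import Pipeline

spark = sparknlp.start()  # Spark session with Spark NLP loaded

documentAssembler = (DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document"))

tokenizer = (Tokenizer()
    .setInputCols(["document"])
    .setOutputCol("token"))

# Pretrained word vectors are loaded with WordEmbeddingsModel.pretrained()
word_embeddings = (WordEmbeddingsModel.pretrained("w2v_cc_300d", "sd")
    .setInputCols(["document", "token"])
    .setOutputCol("embeddings"))

# SentenceEmbeddings pools the word vectors into one vector per document;
# its input must be the output column of the word-embeddings stage ("embeddings")
sentence_embeddings = (SentenceEmbeddings()
    .setInputCols(["document", "embeddings"])
    .setOutputCol("sentence_embeddings")
    .setPoolingStrategy("AVERAGE"))

pipeline = Pipeline(stages=[documentAssembler, tokenizer, word_embeddings, sentence_embeddings])

data = spark.createDataFrame([["مون کي اسپارڪ اين ايل پي سان پيار آهي"]]).toDF("text")
result = pipeline.fit(data).transform(data)

# Extract the final embeddings: the vector is in the "embeddings" field
# of each annotation ("result" only holds the text)
row = result.select("sentence_embeddings.embeddings").first()
sentence_vector = row[0][0]  # first (and only) annotation for this document
print(sentence_vector)
```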
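Spark NLP expects every annotator's input columns to be output columns produced by earlier stages (or the raw input column). For instance, because the word-embeddings stage writes to `embeddings`, listing `word_embeddings` as an input to `SentenceEmbeddings` leaves that input unresolved. The sketch below checks this wiring outside Spark; `missing_inputs` and the `(name, input_cols, output_col)` tuples are hypothetical, for illustration only, and do not model Spark NLP's annotator-type checks.

```python
def missing_inputs(stages, source_cols=("text",)):
    """Return (stage, column) pairs whose input column no earlier stage produces."""
    available = set(source_cols)
    problems = []
    for name, input_cols, output_col in stages:
        for col in input_cols:
            if col not in available:
                problems.append((name, col))
        available.add(output_col)  # later stages may consume this column
    return problems


# Hypothetical description of the pipeline's column wiring
stages = [
    ("DocumentAssembler", ["text"], "document"),
    ("Tokenizer", ["document"], "token"),
    ("WordEmbeddingsModel", ["document", "token"], "embeddings"),
    ("SentenceEmbeddings", ["document", "word_embeddings"], "sentence_embeddings"),
]
print(missing_inputs(stages))  # [('SentenceEmbeddings', 'word_embeddings')]

# Pointing SentenceEmbeddings at "embeddings" resolves every input
stages_fixed = stages[:3] + [
    ("SentenceEmbeddings", ["document", "embeddings"], "sentence_embeddings"),
]
print(missing_inputs(stages_fixed))  # []
```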
The error is: