feat: Adding capability use Cognitive Service Language Service asynchronously for Summarization #2342

Open
wants to merge 2 commits into
base: master


FarrukhMasud
Contributor

What changes are proposed in this pull request?

This PR adds a new capability to call the Cognitive Service Language Service asynchronously.

The transformer calls the async service and polls for the result. The polling delay and the maximum number of retry attempts are controlled by parameters. Request creation for each task is extracted into a separate trait to make the code more readable and manageable. Changes to the AnalyzeText class are minimal.
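
To make the submit-then-poll behaviour concrete, here is a minimal sketch of a polling loop bounded by a delay and a maximum number of attempts. It is not the code in this PR: `JobStatus`, `pollForResult`, `pollingDelayMs`, and `maxAttempts` are hypothetical names chosen for illustration, and the status check is passed in as a plain function rather than an HTTP call.

```scala
import scala.annotation.tailrec

// Hypothetical status model; the real service reports its own status values.
sealed trait JobStatus
case object Running extends JobStatus
final case class Succeeded(result: String) extends JobStatus
final case class Failed(reason: String) extends JobStatus

object PollingSketch {
  /** Calls `checkStatus` until it returns a terminal state or `maxAttempts` is used up.
    * `pollingDelayMs` and `maxAttempts` stand in for the PR's parameters (names assumed).
    */
  def pollForResult(checkStatus: () => JobStatus,
                    pollingDelayMs: Long,
                    maxAttempts: Int): Either[String, String] = {
    @tailrec
    def loop(attempt: Int): Either[String, String] =
      if (attempt >= maxAttempts) Left(s"No result after $maxAttempts polling attempts")
      else checkStatus() match {
        case Succeeded(result) => Right(result)
        case Failed(reason)    => Left(reason)
        case Running =>
          Thread.sleep(pollingDelayMs) // wait before asking the service again
          loop(attempt + 1)
      }
    loop(0)
  }
}
```

Returning an `Either` keeps the timeout and service-failure cases explicit for the caller; the actual transformer may surface these differently.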

This PR adds support for the following tasks:

  • ExtractiveSummarization
  • AbstractiveSummarization
  • Healthcare
  • SentimentAnalysis
  • KeyPhraseExtraction
  • PiiEntityRecognition
  • EntityLinking
  • EntityRecognition
  • CustomEntityRecognition
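
As a rough illustration of keeping request creation for each task in its own trait, the sketch below gives every task a small trait that supplies its kind and parameters, while a shared base trait assembles the request. All names here (`TaskRequest`, `TaskRequestBuilder`, the two builder traits, and the `sentenceCount` field) are assumptions for illustration and do not reproduce the traits added in this PR.

```scala
// Hypothetical, simplified request model; the real payload is JSON sent to the service.
final case class TaskRequest(kind: String,
                             parameters: Map[String, String],
                             documents: Seq[String])

// Shared logic lives in one place; each task only describes what differs.
trait TaskRequestBuilder {
  def kind: String
  def parameters: Map[String, String]
  def buildRequest(documents: Seq[String]): TaskRequest =
    TaskRequest(kind, parameters, documents)
}

// One trait per task keeps task-specific parameters out of the main transformer class.
trait ExtractiveSummarizationBuilder extends TaskRequestBuilder {
  def sentenceCount: Int // assumed parameter name
  override def kind: String = "ExtractiveSummarization"
  override def parameters: Map[String, String] =
    Map("sentenceCount" -> sentenceCount.toString)
}

trait KeyPhraseExtractionBuilder extends TaskRequestBuilder {
  override def kind: String = "KeyPhraseExtraction"
  override def parameters: Map[String, String] = Map.empty
}

object TraitSketch {
  val summarizer: ExtractiveSummarizationBuilder =
    new ExtractiveSummarizationBuilder { val sentenceCount = 3 }
  val request: TaskRequest = summarizer.buildRequest(Seq("Some long text."))
}
```

Under this layout, adding a new task means adding one trait rather than touching the shared request logic.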

How is this patch tested?

Using unit tests, I called each service and validated that the transformer works.

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

  • No. You can skip this section.
  • Yes. Make sure the dependencies are resolved correctly, and list changes here.

Does this PR add a new feature? If so, have you added samples on website?

  • No. You can skip this section.

Note

Please note that this PR does not add the capability to call the CustomMultiLabelClassification and CustomSingleLabelClassification tasks. These tasks will be added in a later PR.

@FarrukhMasud
Contributor Author

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

@FarrukhMasud FarrukhMasud changed the title Adding capability use Cognitive Service Language Service asynchronous… feat: Adding capability use Cognitive Service Language Service asynchronously for Summarization Jan 29, 2025
//------------------------------------------------------------------------------------------------------
// Abstractive Summarization
//------------------------------------------------------------------------------------------------------
object SummaryLength extends Enumeration {
Collaborator

@mhamilton723 mhamilton723 Jan 29, 2025

How does the Spark bindings API handle enums? How does this look in the schema of the DataFrame?

Contributor Author

This is just a helper enum; the field in the class is of type String.
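
A small sketch of that pattern, assuming the enum only exists to produce and validate string values: the data class stores a plain `String`, so a schema derived from it would presumably show a string column rather than an enum type. `ExampleSummaryLength`, its values, and `ExampleSummarizationParams` are hypothetical stand-ins, not the definitions in this PR.

```scala
// Hypothetical helper enum; the value names are assumed for illustration.
object ExampleSummaryLength extends Enumeration {
  val Short, Medium, Long = Value
}

// The field that would end up in a schema is a plain (optional) String.
final case class ExampleSummarizationParams(summaryLength: Option[String])

object EnumSketch {
  // Convert the enum value to the string form stored in the case class.
  def withLength(length: ExampleSummaryLength.Value): ExampleSummarizationParams =
    ExampleSummarizationParams(Some(length.toString.toLowerCase))

  // Check a user-supplied string against the enum without exposing the enum type.
  def isValid(raw: String): Boolean =
    ExampleSummaryLength.values.exists(_.toString.equalsIgnoreCase(raw))
}
```

Keeping the enum out of the stored type avoids the need for a custom encoder, at the cost of validating strings at the boundary.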

@codecov-commenter

codecov-commenter commented Jan 29, 2025

Codecov Report

Attention: Patch coverage is 36.11111% with 207 lines in your changes missing coverage. Please review.

Project coverage is 71.62%. Comparing base (bab6aed) to head (6287e75).

Files with missing lines Patch % Lines
...se/ml/services/language/AnalyzeTextLROTraits.scala 22.77% 139 Missing ⚠️
...es/language/AnalyzeTextLongRunningOperations.scala 31.86% 62 Missing ⚠️
...ure/synapse/ml/services/language/AnalyzeText.scala 72.72% 6 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (bab6aed) and HEAD (6287e75).

HEAD has 119 fewer uploads than BASE: 152 for BASE (bab6aed) versus 33 for HEAD (6287e75).
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2342       +/-   ##
===========================================
- Coverage   84.55%   71.62%   -12.93%     
===========================================
  Files         328      331        +3     
  Lines       16848    17151      +303     
  Branches     1513     1515        +2     
===========================================
- Hits        14246    12285     -1961     
- Misses       2602     4866     +2264     


@FarrukhMasud
Contributor Author

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

4 participants