feat: Support for userCol and itemCol as string types in SAR model #2283

Open
wants to merge 12 commits into
base: master

@dciborow (Contributor) commented Sep 7, 2024

Fixes #2275

Add support for userCol and itemCol as string types in the SAR model.

  • Python Files:

    • Add core/src/main/python/synapse/ml/recommendation/SAR.py to handle string userCol and itemCol.
    • Modify core/src/main/python/synapse/ml/recommendation/SARModel.py to handle string userCol and itemCol in the recommendForUserSubset function.
  • Scala Files:

    • Modify core/src/main/scala/com/microsoft/azure/synapse/ml/recommendation/SAR.scala to handle string userCol and itemCol in the calculateUserItemAffinities and calculateItemItemSimilarity functions.
    • Modify core/src/main/scala/com/microsoft/azure/synapse/ml/recommendation/SARModel.scala to handle string userCol and itemCol.
  • Tests:

    • Update core/src/test/python/synapsemltest/recommendation/test_ranking.py to include tests for string userCol and itemCol.
    • Update core/src/test/scala/com/microsoft/azure/synapse/ml/recommendation/SARSpec.scala to include tests for string userCol and itemCol.
  • Documentation:

    • Update docs/Quick Examples/estimators/core/_Recommendation.md to include examples with string userCol and itemCol.
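
The description above does not show how string IDs are handled internally. As a minimal sketch of one common approach, assuming string IDs are mapped to contiguous integer indices before item co-occurrence is counted, the pure-Python example below uses two hypothetical helpers, `build_index` and `item_cooccurrence`. Neither name comes from SynapseML, and the PR's actual Scala implementation may differ.

```python
from collections import defaultdict


def build_index(values):
    """Map each distinct value to a contiguous integer, in first-seen order."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index


def item_cooccurrence(interactions):
    """Count how many users interacted with each pair of items.

    `interactions` is a list of (user, item) pairs whose IDs are strings.
    Hypothetical helper for illustration only; not part of SynapseML.
    """
    user_index = build_index(user for user, _ in interactions)
    item_index = build_index(item for _, item in interactions)

    # Group item indices by user index so the counting works on integers only.
    items_by_user = defaultdict(set)
    for user, item in interactions:
        items_by_user[user_index[user]].add(item_index[item])

    counts = defaultdict(int)
    for items in items_by_user.values():
        for a in items:
            for b in items:
                counts[(a, b)] += 1

    # Reverse the item index so results are reported with the original string IDs.
    item_labels = {idx: label for label, idx in item_index.items()}
    return {(item_labels[a], item_labels[b]): n for (a, b), n in counts.items()}


rows = [
    ("alice", "book"), ("alice", "pen"),
    ("bob", "book"), ("bob", "pen"),
    ("carol", "book"),
]
print(item_cooccurrence(rows)[("book", "pen")])  # prints 2
```

Keeping the index and its reverse mapping together means callers only ever see the original string IDs, while the counting step stays on integers.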

For more details, open the Copilot Workspace session.

@acrolinxatmsft1

Acrolinx Scorecards

A minimum total score of 80 is required.

Select the total score link to review all feedback on clarity, consistency, tone, brand, terms, spelling, grammar, readability, and inclusive language. You should fix all spelling errors regardless of your total score. Fixing spelling errors helps maintain customer trust in overall content quality.

| Article | Total score (Required: 80) | Words + phrases (Brand, terms) | Correctness (Spelling, grammar) | Clarity (Readability) |
|---|---|---|---|---|
| docs/Quick Examples/estimators/core/_Recommendation.md | 72 | 100 | 32 | 100 |

More information about Acrolinx

@dciborow changed the title from "Support for userCol and itemCol as string types in SAR model" to "feat: Support for userCol and itemCol as string types in SAR model" on Sep 7, 2024
@dciborow force-pushed the dciborow/add-string-support branch from 2bbf3cc to d580344 on September 7, 2024 02:47

@dciborow (Contributor Author) commented Sep 7, 2024

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter commented Sep 7, 2024

Codecov Report

Attention: Patch coverage is 0% with 21 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (f3953bc) to head (09557ea).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...icrosoft/azure/synapse/ml/recommendation/SAR.scala | 0.00% | 21 Missing ⚠️ |

❗ The number of uploaded reports differs between BASE (f3953bc) and HEAD (09557ea): HEAD has 152 fewer uploads than BASE.

| Flag | BASE (f3953bc) | HEAD (09557ea) |
|---|---|---|
| | 157 | 5 |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #2283       +/-   ##
==========================================
- Coverage   84.53%   0.00%   -84.54%     
==========================================
  Files         327     327               
  Lines       16788   16808       +20     
  Branches     1500    1499        -1     
==========================================
- Hits        14191       0    -14191     
- Misses       2597   16808    +14211     

* **SAR.scala**
  - Update `calculateUserItemAffinities` method to handle integer types for `userId` and `itemId`
  - Update `calculateItemItemSimilarity` method to handle integer types for `userId` and `itemId`

* **test_ranking.py**
  - Add test cases to verify the functionality of SAR model with integer types for `userId` and `itemId`

@dciborow (Contributor Author) commented Sep 7, 2024

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@dciborow dciborow self-assigned this Sep 7, 2024
…n SARSpec.scala

* Add a test case for handling User Column with Strings
* Add a test case for handling User Column with different datatypes
* Verify the handling of User Column with Strings and other datatypes in SAR.scala
* Ensure the new test cases are concise and focused on the new code
* Place the new test cases in an appropriate location within the file
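
The test cases listed above are not shown in this conversation. As a rough sketch of what such a check could look like, assuming the goal is that string IDs and integer IDs produce the same recommendations up to relabeling, the example below uses a toy `recommend` function as a stand-in for the model. Both `recommend` and the test name are hypothetical and do not come from SARSpec.scala or test_ranking.py.

```python
from collections import Counter


def recommend(interactions, user, k=1):
    """Toy stand-in for a recommender (not the SAR implementation).

    Ranks items the user has not seen by how strongly the other users
    who hold them overlap with this user's items. Works for any hashable ID type.
    """
    items_by_user = {}
    for u, i in interactions:
        items_by_user.setdefault(u, set()).add(i)

    seen = items_by_user.get(user, set())
    scores = Counter()
    for other, items in items_by_user.items():
        if other == user:
            continue
        overlap = len(seen & items)
        for item in items - seen:
            scores[item] += overlap
    return [item for item, _ in scores.most_common(k)]


def test_string_and_integer_ids_agree():
    string_rows = [
        ("u1", "a"), ("u1", "b"),
        ("u2", "a"), ("u2", "b"), ("u2", "c"),
        ("u3", "d"),
    ]
    # Relabel the same interactions with integer IDs.
    user_ids = {"u1": 1, "u2": 2, "u3": 3}
    item_ids = {"a": 10, "b": 20, "c": 30, "d": 40}
    int_rows = [(user_ids[u], item_ids[i]) for u, i in string_rows]

    from_strings = recommend(string_rows, "u1")
    from_ints = recommend(int_rows, 1)

    assert from_strings == ["c"]
    # The integer run must match the string run after relabeling.
    assert [item_ids[i] for i in from_strings] == from_ints


test_string_and_integer_ids_agree()
```

Comparing the two runs after relabeling keeps the test focused on ID handling rather than on the scoring logic itself.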
Successfully merging this pull request may close these issues.

Support for userCol and itemCol as String Types in SAR Model
3 participants