Skip to content
This repository has been archived by the owner on Jun 14, 2024. It is now read-only.

Refactoring for an extensible Index API: Part 3 #475

Merged
merged 4 commits into from
Jul 5, 2021

Conversation

clee704
Copy link

@clee704 clee704 commented Jun 28, 2021

What is the context for this pull request?

What changes were proposed in this pull request?

  • Move CoveringIndex specific code into the
    com.microsoft.hyperspace.index.types.covering package
  • Move traits/classes/objects in the
    com.microsoft.hyperspace.index.rules package into their own files
  • Add alias for CoveringIndexConfig for backward compatibility

Does this PR introduce any user-facing change?

No

How was this patch tested?

With existing tests

@clee704 clee704 changed the title Refactoring3 Refactoring for an extensible Index API: Part 3 Jun 28, 2021
@@ -76,7 +77,7 @@ class IndexLogEntryTest extends HyperspaceSuite with SQLHelper {
|{
| "name" : "indexName",
| "derivedDataset" : {
| "type" : "com.microsoft.hyperspace.index.CoveringIndex",
| "type" : "com.microsoft.hyperspace.index.types.covering.CoveringIndex",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do types.CoveringIndex instead of types.covering.coveringIndex?
hmm.. is that for another type of covering index?

Copy link
Author

@clee704 clee704 Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to give each index type their own package, like how Spark put data source implementations in the org.apache.spark.sql.execution.datasources.{parquet, orc, json, csv, ...} packages.

If this seems verbose, then how about com.microsoft.hyperspace.indexes.covering? That is, com.microsoft.hyperspace.index is for index-related stuff, and com.microsoft.hyperspace.indexes is for index implementations. Another option is com.microsoft.hyperspace.index.covering.

Note that data skipping indexes will be put in the dataskipping package.

Summary

  • Option 1: com.microsoft.hyperspace.index.types.{covering, dataskipping, ...}
  • Option 2: com.microsoft.hyperspace.indexes.{coverinng, dataskipping, ...}
  • Option 3: com.microsoft.hyperspace.index.{covering, dataskipping, ...}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pointed it out since the package string will be shown in index log entry. I think Option 3 looks good because it's shorter.
cc @imback82 @paryoja

@clee704 clee704 marked this pull request as ready for review July 1, 2021 13:38
@clee704 clee704 requested a review from sezruby July 1, 2021 13:38
@sezruby
Copy link
Collaborator

sezruby commented Jul 1, 2021

Other than the comment, LGTM thanks @clee704 👍
Could you resolve the conflict?

Chungmin Lee added 3 commits July 2, 2021 15:50
- Move CoveringIndex specific code into the
  com.microsoft.hyperspace.index.types.covering package
- Move traits/classes/objects in the
  com.microsoft.hyperspace.index.rules package into their own files
- Add alias for CoveringIndexConfig for backward compatibility
@clee704
Copy link
Author

clee704 commented Jul 2, 2021

Changed com.microsoft.hyperspace.index.types.covering to com.microsoft.hyperspace.index.covering.

"""
self.indexName = indexName
self.indexedColumns = indexedColumns
self.includedColumns = includedColumns

# TODO(ChungminL): Add deprecation warning
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you resolve this soon? Otherwise please create an issue for this and add its link here.
In any case it's not good to add your alias here

Copy link
Author

@clee704 clee704 Jul 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually specifying names for TODO comments is a thing that many developers do for a good reason. To quote the Google C++ Style Guide:

TODOs should include the string TODO in all caps, followed by the name, e-mail address, bug ID, or other identifier of the person or issue with the best context about the problem referenced by the TODO. The main purpose is to have a consistent TODO that can be searched to find out how to get more details upon request. A TODO is not a commitment that the person referenced will fix the problem. Thus when you create a TODO with a name, it is almost always your name that is given.

There are TODO comments with names in the Spark codebase too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I searched Spark codebase and the first TODO with a name was committed 2 years ago. Let's be consistent with the current TODO policy - an issue link or a follow-up PR soon.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know there is a TODO policy - could you share a link or something?

The issue is created as #477.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No documentation yet😅 but recently added TODOs have issue links. (via code review)

I suggest to add the issue link like below, but I'm fine w/o the comment as the issue itself explains well.

# TODO: Add deprecation warning.
#   See https://github.com/microsoft/hyperspace/issues/477

@clee704 clee704 merged commit 53fb758 into microsoft:master Jul 5, 2021
@clee704 clee704 deleted the refactoring3 branch July 5, 2021 08:43
@clee704 clee704 added the refactoring Changes that don't alter observable program behaviors label Jul 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
refactoring Changes that don't alter observable program behaviors
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants