Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GSoC 2024- Scaladoc Search in Protosearch #241

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

golden-skipper03
Copy link
Contributor

This pull request introduces key features to integrate Scaladoc search into Protosearch, making it easier to search through documentation contained within Scala source files.

Key changes-

  1. Scaladoc parsing- Added functionality to parse Scaladoc comments to extract important information such as function names, descriptions, parameters, type parameters, annotations, implicit parameters.
  2. Scaladoc Indexing- Created a module to index parsed Scaladoc content using existing protosearch indexing data structures so that users can quickly search and retrieve documentation.
  3. UI Changes- Enhanced the user interface to support searching over index.
  4. A new sbt plugin, ProtosearchScaladocPlugin, has been developed by mentor, Andrew Valencik as a convenience to help the construction of indexes using the parsing and indexing module.

Visible changes-

image

Instructions to follow-

  1. Follow contribution tutorial for setting up protosearch.
  2. Inside the project that you have chosen, run this command to create index- projectName/createScaladocIndex
  3. Copy the created index inside site\target\docs\site\search directory
  4. Launch a local webserver from the site directory (e.g. simple-http-server (https://github.com/TheWaWaR/simple-http-server) or python -m http.server
  5. Go to url localhost:XXXX/search/search.html?index="indexName"&type=scaladoc

Copy link
Contributor

@valencik valencik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @golden-skipper03 for all your hard work on this!
I think we're getting very close to merge ready. I've left a handful of comments and change requests here, but they're mostly minor things. The bulk of your work here is done and working nicely :)

worker.onmessage = function(e) {
app.innerHTML = e.data
console.log(e.data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
console.log(e.data)

We only needed this during testing.

Comment on lines +54 to +55
.sortBy(_._1)
.reverse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.sortBy(_._1)
.reverse
.sortBy(-_._1)

If we sort by a different condition we can avoid reversing the list.


object ScaladocIndexer {

def createScaladocIndex(scaladocInfoList: List[ScaladocInfo]): MultiIndex = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of providing this helper function, let's have ScaladocIndexer just expose the analyzer and the IndexBuilder.
Something like:

object ScaladocIndexer {
  val analyzer = ...
  val indexBuilder = ...
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you do this, you'll have to update the buildIndex line in ProtosearchScaladocPlugin.scala to something like:

val buildIndex = scaladocInfos.compile.toList.map(ScaladocIndexer.indexBuilder.fromList)

The motivation for doing this is that IndexBuilder will eventually have different construction methods so we don't have to use an intermediate List like we are now.


package pink.cozydev.protosearch.scaladoc

object ScaladocSearcher {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this file completely as it is likely to only be used in your unit tests.
You can just copy the searchScaladoc function into your test suite if necessary.

mod.toString
}

val hyperlinks = urlRegex.findAllMatchIn(rawComment).map(_.matched).toList
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the value that extracting hyperlinks in this way adds?
I am unsure that a user will ever want to search for scaladocs that contain a certain hyperlink. Maybe I am missing something, but that strikes me as not a particularly common/useful flow.

Additionally this will only find fully formed URLs, not all links from one scaladoc to another.
For example, note the relative link to Stream.map here: https://github.com/typelevel/fs2/blob/ff9b4d19e0040855a35e00560eb3cc0f35c7c2c7/core/shared/src/main/scala/fs2/Stream.scala#L145
This would not appear in hyperlinks as it's currently defined.

Comment on lines +124 to +152
val optionalParams = paramss.flatMap { case Member.ParamClauseGroup(_, params) =>
params.flatMap { case clause: Member.ParamClause =>
clause.values.collect {
case param: Term.Param if param.default.isDefined =>
val commentDescription: String = paramsComm
.find(_.startsWith(param.name.value))
.getOrElse(param.name.value)
val declaredType: String = param.decltpe.map(_.toString).getOrElse("Unknown Type")

s"$commentDescription: $declaredType"
}
}
}

val implicitParams = paramss.flatMap { case Member.ParamClauseGroup(_, params) =>
params.flatMap { case clause: Member.ParamClause =>
clause.values.collect {
case param: Term.Param if param.mods.exists(_.is[Mod.Implicit]) =>
val commentDescription: String = paramsComm
.find(_.startsWith(param.name.value))
.getOrElse(param.name.value)
val declaredType: String = param.decltpe.map(_.toString).getOrElse("Unknown Type")

s"$commentDescription: $declaredType"
}
}
}

val allParams = typeParams ++ params ++ optionalParams ++ implicitParams
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we're combining all the various param types, do we need to separate them out in the first place?
Instead of having optionalParams and implicitParams could we modify params to capture all types by removing the guard clause on the match?
e.g. going from

case param: Term.Param
  if param.default.isEmpty && !param.mods.exists(_.is[Mod.Implicit])

to just

case param: Term.Param

}
}

"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test input has a lot of code but very little scaladoc that we're actually testing. Could we perhaps shrink it down?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants