-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Source file from in-memory string for incremental LSP analysis #1163
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, but you can't just add a parameter or a new init
without documenting it. Otherwise, how is anyone supposed to review it for correctness?
Also, if you're going to add API, please add tests that exercise it.
Good point! I have added some doc comments, and a unit test to verify that an update sourcefile will get the updated content instead of the previously cached content. |
…/source-file-from-memory
Sources/Core/SourceFile.swift
Outdated
/// Creates an instance representing the in-memory edited file at `filePath`. | ||
/// `withContent` is the content of the in-memory representation of the file, | ||
/// which may be different from what is actually stored on disk. | ||
public init(at filePath: URL, withContent content: String) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Creates an instance representing the in-memory edited file at `filePath`. | |
/// `withContent` is the content of the in-memory representation of the file, | |
/// which may be different from what is actually stored on disk. | |
public init(at filePath: URL, withContent content: String) { | |
/// Creates an instance representing the in-memory `contents` of the file at `filePath`. | |
/// | |
/// `contents` is the text of the file currently in memory, which may be different from what | |
/// is actually stored on disk when the file is being edited. | |
public init(at filePath: URL, withInMemoryContents contents: String) { |
Note: contents
with a "s" is consistent with how other parameters are named.
Sources/Core/SourceFile.swift
Outdated
} | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | |
else { | |
} else { |
I think swift-format won't like the newline anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sf allows discretionary newlines. As you know this style often creates harmful vertical density. Why push for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swift-format will rewrite this. I just checked again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With "respectsExistingLineBreaks" : true
? I am surprised; that seems like a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, yeah; this it's not the tool I thought it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of inline comments. Except for those, LGTM.
PR comments have been addressed in last commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from my comments, I'm not convinced this approach is even viable. It introduces shared mutable state, which can cause both data and logical races. Unless there's a simple explanation of why that can't be the case here, we need a different strategy.
Sources/Core/SourceFile.swift
Outdated
/// Creates an instance representing the in-memory `contents` of the file at `filePath`. | ||
/// | ||
/// `contents` is the text of the file currently in memory, which may be different from what | ||
/// is actually stored on disk when the file is being edited. | ||
public init(at filePath: URL, withInMemoryContents contents: String) { | ||
let storage = Storage(filePath, replaceExisting: true) { contents[...] } | ||
self.storage = storage | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what "The in-memory contents of a file" means. IMO a more appropriate API would be
/// Creates an instance representing the in-memory `contents` of the file at `filePath`. | |
/// | |
/// `contents` is the text of the file currently in memory, which may be different from what | |
/// is actually stored on disk when the file is being edited. | |
public init(at filePath: URL, withInMemoryContents contents: String) { | |
let storage = Storage(filePath, replaceExisting: true) { contents[...] } | |
self.storage = storage | |
} | |
/// Creates an instance with the given `contents`, identified by `fileID`. | |
/// | |
/// `contents` may not reflect what—if anything—is stored on disk at `fileID`. | |
public init(contents: String, fileID: URL) { | |
let storage = Storage(fileID, replaceExisting: true) { contents[...] } | |
self.storage = storage | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The meaning of "in-memory contents of a file" in this case is basically an "uncommited" draft of the next version of the file. But I also agree that withInMemoryContents
by itself does not convey this meaning, and simply calling it contents
is perhaps better. But not sure that fileID: URL
is better than at filePath: URL
, the previous one is more consistent with other initializers. Or maybe something like representing filePath: URL
, or similar?
But if you think the suggested change is what is appropriate I am totally fine with it, as long as this functionality is provided.
Sources/Core/SourceFile.swift
Outdated
/// Creates an alias to the instance with the given `url` if it exists, or creates a new | ||
/// instance having the given `url` and the text resulting from `makeText()`. | ||
/// | ||
/// Storage instances are cached based on url, where `replaceExisting` defines if an | ||
/// existing entry should be replaced with new content, otherwise the storage get the | ||
/// content from the previously cached entry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original summary is no longer accurate. Rewriting the summary to make it accurate eliminates the need for any additional information:
/// Creates an alias to the instance with the given `url` if it exists, or creates a new | |
/// instance having the given `url` and the text resulting from `makeText()`. | |
/// | |
/// Storage instances are cached based on url, where `replaceExisting` defines if an | |
/// existing entry should be replaced with new content, otherwise the storage get the | |
/// content from the previously cached entry. | |
/// Creates an alias to the instance with the given `url` if it exists, replacing the instance's text with the | |
/// result of `makeText()` iff `replaceExisting` is `true`, or creates a new | |
/// instance having the given `url` and the text resulting from `makeText()`. |
Making an accurate summary is also instructive: this one is too complicated to be easily understood, which suggests the API is suboptimal. It's fileprivate
, so maybe a little complexity is forgivable, but I would probably not try to cram all this functionality into a single initializer.
Actually now that I look, the original summary didn't document lineStarts
, so it's worse than that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, the suggested rewrite is definitely more concise and instructive. However regarding any major refactoring of the initializer I think could perhaps be done in another issue/PR, at least from my perspective where I do not have full insight into the original intents of the design, eg regarding the additional lineStarts parameter. You guys are also very welcome to push changes to the PR if you have clear ideas for updated implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point about "instructive" is that the activity of writing an accurate summary can tell you things.
any major refactoring of the initializer I think could perhaps be done in another issue/PR, at least from my perspective where I do not have full insight into the original intents of the design, eg regarding the additional lineStarts parameter.
From my perspective you regressed the understandability of this API, and clawing back that regression later is not the way things should be done. I admit the lack of documentation for lineStarts was a defect, so one question you might start by asking is, what would the correct documentation have been for the state you're modifying? If you need insight into the original intent, just ask questions. I'm happy to answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point about "instructive" is that the activity of writing an accurate summary can tell you things
Ah, yes that is a very good point, I will definately keep this in mind going forward, not just in hylo context.
And sorry for PR lacking basic documentation from the start. The LSP project I'm working on is very much a prototype, and I did not properly switch mindset when preparing the PR for hylo, which is production codebase albeit in an early state.
And I very much appreciate that you are making the effort to uphold good coding standards in the hylo codebase!
but I would probably not try to cram all this functionality into a single initialize
Ok, let's make an attempt at improving the situation.
First of all, I am new to the swift language, and are not really familiar with some of the idioms and best practices. But this is a good opportunity to improve.
So, looking at this, I don't really see how to break up the initializer, at least if we want the Storage class members to have let
bindings. One possibility is approaching the Storage class as an immutable struct, where modifications would return a new Storage instance, but not sure that is what we want here.
Also, what is it that we can break out of the initializer? makeText
is dependent on replaceExisting
, they can not be separated. Possibly lineStarts
could be separated out?
I see two main alternatives:
- Use some composite parameter instead of individual parameters
- Use a builder pattern
A composite parameter could be some kind of protocol, eg
protocol StorageInitializer {
var lineStarts: [SourceFile.Index]? { get }
var replaceExisting: Bool { get }
func makeText() throws -> Substring
}
with updated initializer
fileprivate convenience init(_ url: URL, initializer: StorageInitializer) rethrows {
//...
}
The other main alternative is a builder pattern, something like:
let storage = StorageBuilder(replaceExisting: true)
.withText(codeString)
.build()
// Or
let storage = StorageBuilder(replaceExisting: false)
.withText { try String(contentsOf: filePath)[...] }
.build()
So the race condition is that if we do concurrent compilations interleaved with updated source file content we might get unexpected input to the compilation? In the general case that may be true, but at least for the LSP case the updates are sequenced. In the same way the content of disk is a unique representation, (ie not multiple contents with the same filepath), the same is true in the LSP session, a filepath can not have two simultaneous in-memory content representations. The state of the file content is an ordered sequence of updates according to the LSP protocol. Not really sure how to solve this for the generic case, or if the generic case would even be useful. So maybe constraining the API so it is less generic is the right way. Specifically perhaps we should bypass the storage all together for this "uncommited" file state, it should simply be a non-shared SourceFile instance, that is not backed by the storage cache at all? |
In fairness, I misread the change so while there is a race, it's more subtle than I thought; the problems will manifest more clearly as a dangling reference. We'll get to that in a moment… The general point is that everything in Hylo's source is written with the assumption of value semantics: each copy of an instance has an independent value that can't be changed except through that instance. Violating that property can result in races and spooky action at a distance in existing code, which that code hasn't been written to account for. Any sharing of mutable state among instances can cause havoc. In the case of However, the shared reference to
One of the main reasons for building Hylo, and for building its source the way we have is to define away this kind of issue, among other things so we can freely parallelize anything without worrying about non-local properties. We're not willing to have any types in Hylo's sources that are not as threadsafe as Now you could say, "great, let's just get rid of the
I'm not really sure, because I don't know where or how you're planning to use this thing. The data structures such as It wouldn't surprise me a bit if supporting LSP well requires some fundamental changes to our data structures—to which I'm not at all opposed. That's why I think we need to step back and take a collective look at the problems. |
How shall we have this collective look? Do we want to setup an online meeting or would you like to prefer doing things asynchronously. I am fine either way but I don't really know where to start. |
Maybe @koliyo could give us (or point us to) a concise description of what is needed for LSP support, and from there we can decide how to proceed? |
Summary: Some additional context: As an IDE user I expect the editor to respond to the current state of the buffer, and preferrably with as low latency as possible, eg on each keystroke. My main reference for this, is the Roslyn/C# LSP which I think does a great job, and is very responsive. The LSP server receives textDocument/didChange updates with incremental changesets to the file. The LSP server can reconstruct the content of the IDE buffer by caching the original content, and incrementally applying the changesets as they arrive. I think it is important that we can use the filepath as identifier for the un-saved buffer. This makes it much easier to deal with the results from hylo anaylsis, traversing AST etc. If we need to differentiate between buffers from disk and unsaved in-memory buffers, that will add additional complexity to the LSP analysis, which we do avoid when we can identify unsaved buffers by their original filepath. The LSP protocol is designed so the server is active very passively and responds to requests. So after a
Ah, did not realize the unowned setup, and the possible implicit dangling pointers from replacing storage buffers in the storage cache. I definitely see this could be a problem! So, yes, I did underestimate the impact of this proposed change. And perhaps the LSP requirements does indeed necessitate some more fundamental structural changes as you say. And I also think it is wise to, at this early point, make that the compiler framework is well suited for interactive LSP usage. I think this is a very important aspect of a programming language, from the user perspective. |
The other main alternative I see is to not use the filepath to identify the usaved buffers. And instead use some kind of unique identifiers, updated for each change, eg with a running index:
This could then be managed by one additional indirection in the LSP, using a simple lookup table, from filepath to the current "head" version. eg That is definitely possible. The upside is that this would not require any direct changes to the current hylo structure and this PR is no longer needed. Edit: Hmm, but this would perhaps not work? Would we get the right content in the hylo parsing? If eg one hylo file has import of another hylo file/module, the import would then find the original disk-based storage and not the versioned in-memory storage? Or would it work? Please enlighten me |
Please keep in mind that I don't understand how a whole LSP-based system is supposed to fit together. Are you implementing an LSP server for Hylo, or are you implementing some part that plugs into an existing thing that is an LSP server? Is the thing you have to implement responsible for responding to these changesets, or is something else updating a cached in-memory representation of complete source files and passing those representations to your component? These are the kinds of questions implied by “what is needed for LSP support.” I want to know in general, if I was going to supply LSP support (as you are trying to do) for an arbitrary new language, what would I build and what specification would it have to satisfy? Am I building a library? An executable? More than that? How does that component interact with an IDE? If there's a simple introductory article somewhere that I should read to get my bearings, I'd be happy to do that. |
I think this post breaks it down pretty nicely: https://nshipster.com/language-server-protocol/. |
I am building an LSP server for Hylo, ie a server that conforms to the LSP protocol and does analysis of Hylo code using the Hylo compiler. So this is specifically for Hylo, and not generic language handling. Here is the repository: https://github.com/koliyo/hylo-lsp This code is responding to changesets, builds a in-memory representation of the edited file (which should be identical to what is inside the IDE buffer), and then use Hylo compiler to analyze the code. The release artifact is a binary executable, that communicates with the LSP client, which normally is an IDE, via JSON RPC messages, where the data communication channel typically is stdio, or local unix socket (pipe). This is then integrated into VSCode via a pretty thin wrapper in this repo: https://github.com/koliyo/hylo-vscode-extension So when I say “what is needed for LSP support.” in this PR, I mean specifically for an LSP server that process hylo files. The generic LSP protocol handling for Swift that I am using is implemented here: https://github.com/ChimeHQ/LanguageServer This is not for analyzing Swift source code files, it is a generic LSP protocol handler implemented in Swift. And this repo does not have anything to do with hylo, and is not affected by this PR. Sidenote, Apples sourcekit-lsp for Swift analysis is completely unrelated to all of this (although related in the sense that it is an LSP server implementation) |
Also, I realize the usage of the LSP terminlogy is a bit misused in this context. If we are strict about it then LSP is the protocol, which is language agnostic. But when I say hylo-lsp, i mean the LSP server that analyze hylo code. Perhaps this is unnecessary confusion due to poor naming on my part. Better project name might be:
|
For LSP server we need to parse/analyze files interactively as they are edited, before being saved to disk. Therefore we need a mechanism to create source file representations from in-memory strings.
There is already support for synthesized source file, which is similar in mechanism, but is not really what we want here.
Additionally, the source files are continuously updated as the user edits the file, so the same filepath will be reused with updated content. Which is why I needed to introduce the
replace
parameter to the file storage.Let me know if you think the design should be done differently, but the functionality is very much needed for LSP.