display real change time stamps in directory listing #4087

Closed
vladak opened this issue Nov 3, 2022 · 6 comments
Labels
enhancement webapp web application

Comments

@vladak
Member

vladak commented Nov 3, 2022

The problem with the directory listing is that it currently uses file system timestamps, which can be misleading. If the mirror of a particular repository has been around long enough, the timestamps will reflect changes to individual files. However, for repositories reindexed from scratch, all files will initially have the same timestamp. In other words, the file system timestamps may or may not reflect the history of the files.

Code-wise, this would change the following method:

@Override
public Map<String, Date> getLastModifiedTimes(File directory, Repository repository) {
    // We don't have a good way to get this information from the file
    // cache, so leave it to the caller to find a reasonable time to
    // display (typically the last modified time on the file system).
    return Collections.emptyMap();
}
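
As a rough illustration of the proposed change, here is a minimal sketch of how the method could look once the latest changeset date of a file can be obtained from the history cache. The lookup function, the class name `LastModifiedSketch` and the simplified signature (the `Repository` parameter is dropped) are assumptions made for this example, not OpenGrok's actual API. Directories are skipped and files without cached history are omitted, so the caller can still fall back to another source of time.

```java
import java.io.File;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class LastModifiedSketch {
    /**
     * Hypothetical variant of getLastModifiedTimes(): return the latest
     * changeset date for each regular file in the directory, as reported by
     * a history cache lookup, instead of an empty map.
     *
     * @param directory        the directory being listed
     * @param latestChangeDate hypothetical lookup into the file history cache
     */
    static Map<String, Date> getLastModifiedTimes(File directory,
            Function<File, Optional<Date>> latestChangeDate) {
        Map<String, Date> result = new HashMap<>();
        File[] children = directory.listFiles();
        if (children == null) {
            return result; // not a directory, or an I/O error occurred
        }
        for (File child : children) {
            if (!child.isFile()) {
                continue; // directory dates are not derived from history here
            }
            // Files without cached history are left out; the caller decides
            // what to display for them.
            latestChangeDate.apply(child)
                    .ifPresent(date -> result.put(child.getName(), date));
        }
        return result;
    }
}
```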

Once the file history cache is converted to a suitable serialization scheme (to be implemented for #3539), it will be possible to address this limitation by manually decoding only the part of the serialized history that contains the latest changeset date (to avoid unnecessary I/O), similar to how this is done in FileAnnotationCache to retrieve the revision ID.
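
A minimal sketch of the "decode only what is needed" idea, assuming a hypothetical serialized layout in which the date of the latest changeset is written as a fixed-size header before the full list of history entries. The layout and the names `HistoryHeaderSketch`, `Entry`, `serialize` and `readLatestDate` are invented for illustration; they are neither the format chosen for #3539 nor the actual FileAnnotationCache code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Date;
import java.util.List;

final class HistoryHeaderSketch {
    /** Hypothetical history entry: only the fields needed for this sketch. */
    static final class Entry {
        final String revision;
        final long dateMillis;

        Entry(String revision, long dateMillis) {
            this.revision = revision;
            this.dateMillis = dateMillis;
        }
    }

    /**
     * Serialize entries (newest first) behind a small header holding the
     * date of the latest changeset (hypothetical layout).
     */
    static byte[] serialize(List<Entry> newestFirst) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            long latest = newestFirst.isEmpty() ? 0L : newestFirst.get(0).dateMillis;
            out.writeLong(latest);            // header: latest changeset date
            out.writeInt(newestFirst.size()); // body: entry count, then entries
            for (Entry e : newestFirst) {
                out.writeUTF(e.revision);
                out.writeLong(e.dateMillis);
            }
        }
        return bytes.toByteArray();
    }

    /** Read only the fixed-size header; the entry list is never decoded. */
    static Date readLatestDate(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        return new Date(data.readLong());
    }
}
```

Because the header has a fixed size, `readLatestDate` consumes only the first 8 bytes of the stream no matter how long the history is.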

@vladak
Member Author

vladak commented Apr 5, 2023

The limitation of this approach is that directory timestamps would remain unchanged (i.e. still based on the file system). A different approach would be to use the index documents: there are already per-directory documents for the LOC counts. The LOC documents currently do not store a timestamp, but that could easily be changed in NumLinesLOCAccessor#updateDocumentData(). It would, however, have to be adjusted to record the timestamp of the most recently changed document in that directory tree. Similarly, for individual documents, the date stored in the document would have to be determined from the history of that file (which is where the history cache would be handy) rather than from the file system time, as is currently done in AnalyzerGuru#populateDocument(). That would have to be done via a new document field, since replacing the date in the document uid might interfere with index traversal in IndexDatabase.
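
Recording the timestamp of the most recently changed document in each directory tree boils down to propagating each file's date to all of its ancestor directories and keeping the maximum. A minimal sketch, assuming the per-file dates have already been taken from the history cache and are given as epoch milliseconds keyed by paths relative to the source root. `DirectoryDateSketch` and `aggregate` are hypothetical names, and the plain map does not model how NumLinesLOCAccessor actually stores its data.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

final class DirectoryDateSketch {
    /**
     * For every directory that (transitively) contains at least one file in
     * fileDates, compute the latest change date of any file in its subtree.
     * Paths are relative to the source root, e.g. "proj/src/Main.java".
     */
    static Map<Path, Long> aggregate(Map<Path, Long> fileDates) {
        Map<Path, Long> dirDates = new HashMap<>();
        for (Map.Entry<Path, Long> e : fileDates.entrySet()) {
            long date = e.getValue();
            // Walk up the ancestors, keeping the maximum date seen so far.
            for (Path dir = e.getKey().getParent(); dir != null; dir = dir.getParent()) {
                dirDates.merge(dir, date, Long::max);
            }
        }
        return dirDates;
    }
}
```

Only files present in the input contribute a date, so a changeset that merely deletes a file leaves no trace in the result.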

@vladak vladak changed the title from "use history cache to display time stamps in directory listing" to "display real change time stamps in directory listing" Apr 5, 2023
@vladak
Member Author

vladak commented Apr 6, 2023

The trouble with per-directory dates derived from changeset dates is the following. The LOC documents can be used for file changes and additions (grab the date from the latest history entry in the file history cache and register it with the count aggregator). However, if the only change in a directory tree is a single removed file, there is no reasonable way to extract the date of the changeset that removed the file and pass it on. For example, an incremental reindex may find that exactly one changeset has been introduced since the indexer last ran, and that changeset contains nothing but the removal of a single file.

So, I am thinking about implementing this just for regular files and letting the directory dates reflect their file system time; however, I am not sure whether that would be confusing.

@ChristopheBordieu

> So, I am thinking about implementing this just for regular files and letting the directory dates reflect their file system time; however, I am not sure whether that would be confusing.

I think it is confusing to mix dates coming from the indexing job with dates coming from commits.

@vladak
Member Author

vladak commented Apr 6, 2023

Thanks for the feedback @ChristopheBordieu. Another alternative would be to avoid displaying directory dates altogether and to fetch the regular file dates from the history cache.

@vladak
Member Author

vladak commented Apr 6, 2023

One thing I like about taking the file dates from the history cache is that it also makes it possible to display information about the last commit in the file listing, say, in a tooltip shown when the mouse cursor hovers over the file name.
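
A minimal sketch of such a hover: building a `title` attribute for the file name link from the last history entry of the file. The parameters (revision, author, date, message) and the class name `LastCommitTooltipSketch` are assumptions for illustration, the date is formatted in UTC for simplicity, and in practice one would reuse whatever HTML escaping helper the webapp already provides instead of the hand-written one below.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

final class LastCommitTooltipSketch {
    /** Minimal escaping for text placed in a double-quoted HTML attribute. */
    static String escapeAttribute(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Build a title attribute describing the last commit that touched a file. */
    static String titleAttribute(String revision, String author, Date date, String message) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for the sketch
        String text = revision + " by " + author + " on " + fmt.format(date) + "\n" + message;
        return "title=\"" + escapeAttribute(text) + "\"";
    }
}
```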

@ChristopheBordieu

> Another alternative would be to avoid displaying directory dates altogether and to fetch the regular file dates from the history cache.

I think it is a good idea, at least for Git repositories. Other SCMs might need different handling.
With Git, you commit files, not directories, so I would like to know the commit dates of files. I don't really care about the dates of directories.
The timestamp shown on the OpenGrok home page is enough to know when the last indexing was run (because on our instances it is run for all projects at once).

vladak added a commit to vladak/OpenGrok that referenced this issue Apr 11, 2023
@vladak vladak closed this as completed in d93fdca Apr 12, 2023