-
Notifications
You must be signed in to change notification settings - Fork 10
mirror of http://savannah.nongnu.org/git/?group=gpicker
License
alk/gpicker
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved. What is gpicker? ---------------- gpicker is a program that allows you to quickly and conveniently pick file in a (possibly very large) project. You type significant letters of file name (typically from the start of words) and gpicker provides you with a list of files you most likely mean to pick. The program filters and orders project's list of files in real-time as you type. To place 'correct' matches on top gpicker uses sophisticated scoring heuristics. This scoring heuristic tries to capture user's intention and effectively 'un-abbreviate' pattern, rather than simply order results by edit distance. Scoring is implemented by efficient dynamic programming algorithm which makes filtration and ordering very fast. The scoring details are described below. It was inspired by class finding facility of IntelliJ IDEA and 'Command-T' feature of TextMate, but it's in many ways much better. It is language-agnostic and supports matching directory names too. Read on for details. It is usable as a standalone program, but it was created to be used by editors/IDEs. Emacs & VIM integration is provided. See comments on top of gpicker.{el,vim} for installation notes. This README describes general usage concepts. There are also out-of-tree integrations for netbeans -- http://github.com/avsej/gpicker-netbeans and gedit -- http://github.com/yltsrc/gedit-gpicker. http://github.com/alk/supermegadoc uses gpicker to navigate/pick-from names in documentation system index. High-level description of gpicker's filtration/sorting approach --------------------------------------------------------------- Main observation behind gpicker is what we use (or would like to use) when specifying target name in few key presses. The answer is abbreviation of 'words' in that target name. For example, for abbreviating 'active_record' it is 'natural' to use 'ar' (first chars from each word), 'arec' or 'a-rec' (emphasizing 'r' as start of new word). So sorting task is to order names in candidate list, so that the names for which pattern is best abbreviation are placed on top. Many 'smart' autocompleters focus on edit distance (Levenstein distance, for example), but really smart autocompleter should try to 'un-abbreviate' pattern instead. Levenstein is good for spellchecking, but autocompletion is about trying to capture user's intention. And user abbreviates, not just omits random chars. Another notable feature of gpicker is separate treatment of path basename and dirname. Ordinary patterns filter only basenames. In many cases this is sufficient. But in some projects there are files with same names in different directories. Notable example is base.rb in Ruby on Rails sources. epsilon:~/src/altoros/rails# git ls-files | gpicker-simple base | head actionpack/lib/action_view/base.rb railties/lib/rails_generator/base.rb activemodel/lib/active_model/base.rb .. many other base.rb entries are skipped ... For such cases matching dirname of complete path is useful. gpicker does that in a smart way. Dirname pattern is used only to break ties between basename matches. epsilon:~/src/altoros/rails# git ls-files | gpicker-simple ac-reco/base activerecord/lib/active_record/base.rb activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb Empty basename patterns can be used to browse list of files in some directory. For example 'are/con-ad/' can be used to see contents of vendor/rails/activerecord/lib/active_record/connection_adapters/. gpicker doesn't attempt to be too strict at filtering. Most effort in design of gpicker's internals was spent on sorting candidate entries, not eliminating them. The goal was to place 'correct' match on top, or at least on first page. How many results in total we get is irrelevant, if we can find what we need in first few results. gpicker-simple -------------- gpicker is a Gtk+ program. In some cases and on some OSes this is inconvenient. So there's also gpicker-simple program that doesn't depend on glib and Gtk+. In default './configure' build gpicker-simple is just a hardlink to gpicker. But see Makefile.simple if you want to build dependency-free gpicker-simple. I think it even should be possible to build gpicker-simple on windows, with few tweaks (like providing missing getopt(3) implementation). More details ------------ Based on gpicker's main idea I designed the following: * first, list of candidate names should be filtered so that only names where all pattern characters match in correct order (increasing position) remain * matches should generally be case-insensitive, though camel-case-transitions should be handled as beginnings of words * each candidate name should be assigned some score based on how well pattern abbreviates that name * candidate name list is sorted with descending score (with some additional rules to break ties) The scoring system adds scores of all matches and is based on the following observations: * match on beginning of word is good and should be given some score. So 'a' and 'r' are good abbreviations for 'active_record' because they match beginnings of words. While 'c' and 'e' (that match too) are not. * match on beginning of first word is even better (because we usually abbreviate from first word). So 'a' is better abbreviation for 'active_record' than 'r' (both match, but 'r' probably queries for something that begins with 'r', like 'red'). * patterns can have word separators too, to emphasize word start. So 'a-r', most probably, means: I want name with word that starts on 'a' and then has another word that starts on 'r'. Such emphasized matches on beginnings of words should be rewarded more than 'wild' matches on beginnings of words. So 'a-r' is better fit for 'active-record' than 'ar' (because 'ar' could mean 'arsenal'). * adjacent matches are more important than non-adjacent matches. 'ar' fits 'arsenal' well. And it even fits 'paragraph' more or less well (substring match). While it doesn't fit 'alert' at all (but matches). This set of rules encourages matches on beginnings of words and positions that are adjacent to previous matches. But what scores to assign in every case ? In particular, it's not clear what's relative order of scores for adjacent match versus 'wild' match on start of word. I decided that 'ar' should score more on 'arsenal', than on 'active-record'. I've chosen adjacency score to be a bit less then doubled wild word beginning match score. Probably, to get this: arsenal:100800 ^^^ active-record-simple:100402 ^ ^ ^ arbiters:100400 ^^ ^ active-records:100201 ^ ^ ^ So the scoring system is: * 'proper' word beginning match is awarded 0x100000 points. This, obviously, includes match on beginning of first word * 'wild' word beginning match -- 0x201 * and 0x400 is given for adjacent match Performance notes ----------------- Don't judge gpicker's performance by first run. gpicker scans project for list of files on every start. This can take a while on first run. But subsequent runs will hit OS's directory entries cache and will start almost instantly. Real-time filtration should be snappy even on largest projects. At least if your CPU is not very old. Usage in examples ----------------- When I want to pick 'vendor/rails/activerecord/lib/active_record/base.rb' inside rails project I can type 'ba' or 'bas'. But it won't be displayed on top, because rails has several files named base.rb in different directories. I can type 'ar/ba' (or even 'ar/b'). This will match 'ar' against directory name and 'ba' against basename and will place 'correct' file on top. To pick source of java.lang.ClassLoader class inside openjdk I can try 'cload'. There a many matches and correct file is not in top 5 (but it's in top 10). I can try placing emphasis on start of words by capitalizing them - 'cLoa' or 'CLoa'. But that removes only one extra match. I can add directory name pattern to better convey my intention. 'lan/cLo' is sufficient to find correct file on second place. It can be selected via arrow keys now. And 'cl/la/cLo' (added another part of directory - 'classes') or 'lan/cLo.j' (added extension) is enough to place it first. 'clloderj' works too. Notice skipped 'a' from 'loader'. There are two ways to emphasize start of words. One is capitalizing first letters. 'aRe' or 'AR' will give large score to 'active_record', 'active-record', 'active.record' and 'ActiveRecord'. Second way is placing matching delimiters before first letter. 'b.r' will give large score for 'base.rb', because match of 'r' after delimiter is considered start of the word. Delimiters '_' and '-' are interchangeable. So association_proxy.rb can be matched with 'a-pro' and with 'a_pro' (and with 'aPro' of course). That was done because '-' is easier to type. Filtering & scoring details --------------------------- Only matching entries pass through filter. In simple words, there is a match between given pattern and name if and only if it is possible to transform name to pattern by removing characters from it without changing order of remaining characters. Matching is case insensitive, though capital letters both in pattern and in name are additionally treated as word starts. Letters after delimiters also count as word starts. Smart ordering of matches is based on matches' scores. Scoring rules are quite simple. Each character match is given some score and those are summed for total match score. Match on word start is given 0x100000 points if matching pattern character is on word start too, and 0x201 points otherwise. Match is given 0x400 points if it's adjacent to previous match. And all other matches are given 0 points. The idea is that 'proper' word start matches are most precious. All other matches score significantly less. Among those substring matches score about twice as much as non-proper word start matches. And completely wild matches do not score at all. If there are several matches for given name then the scoring algorithm picks leftmost (or rightmost, for directory name matches) with maximal score. Directory names and basenames are scored separately and directory name score is considered only when basename scores are equal. For typically short patterns it's not unlikely to have a group of names with maximal score. So great attention is given to ordering names with equal scores. Current ordering heuristic takes compactness of match into account (minimization of last matching character index) and length of names. See code in filtration.c for exact details.
About
mirror of http://savannah.nongnu.org/git/?group=gpicker
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published