Move slug generation into FedWiki.slug #248
Conversation
To run: `bundle exec rspec spec/slug_reference.rb`
In a separate commit (93ad271), I added multilingual support: www.example.com/les-misérables. This is optional, but I advocate for it strongly; otherwise the examples above are damaged or disappear entirely.
I like the approach but agree that this code is not complete enough to be deployed. We have constraints that come from three desirable compatibilities:

1. compatibility with pages already stored on existing servers,
2. compatibility with every client in the federation, and
3. compatibility with every server in the federation.
The second (every-client) and third (every-server) constraints arise because any random federated wiki client could request pages from any random federated wiki server. A server can apply whatever search algorithms it wants so long as it delivers the proper page when requested and delivers a 404 when that is the correct response. A server stores a more complete version of the page title, which can be used in this search.

It's also possible that we could reliably convert a slug from one algorithm to a slug from another. This is the basis of the permissivity discussion above. For example, if we could show for slug functions F and G that F(x) == F(G(x)) for all x, then we could say G is as or more permissive than F. Intuitively, G permits more characters through than F. It has been suggested that we could allow more permissive slug functions into clients so long as servers that use (or might have used) a less permissive slug function try applying that function and repeating the query before issuing a 404.

Now I am worried that the repeated-query approach is not sufficient to handle all three constraints enumerated above. Further, I am not sure that our new function is always more permissive. Specifically, the desire to eliminate some redundant hyphens makes it less permissive, while permitting international alphabetic characters makes it more permissive. Yikes.

My feeling now is that we won't be able to meet every constraint all the time. However, if we could characterize the pages that will suffer, and under what circumstances they do so, well, that would be awesome. Then we can move ahead with confidence. (Aside: have we given up case insensitivity, or is that handled properly for all alphabets that have case?)
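To make that relation concrete, here is a minimal sketch of the check; the F and G below are illustrative stand-ins, not our actual slug rules:

```ruby
# Hypothetical check: G is "as or more permissive" than F when
# F(x) == F(G(x)) for every title x, i.e. anything G lets through,
# F would have stripped anyway.
def as_or_more_permissive?(f, g, titles)
  titles.all? { |x| f.call(x) == f.call(g.call(x)) }
end

# Illustrative stand-ins, not the real FedWiki rules:
f = ->(s) { s.downcase.gsub(/\s/, '-').gsub(/[^a-z0-9-]/, '') } # less permissive
g = ->(s) { s.downcase.gsub(/\s/, '-') }                        # more permissive

as_or_more_permissive?(f, g, ['Pride & Prejudice', 'Hello World']) # => true
```

This matches the worry above: hyphen collapsing fails the check in one direction, international characters in the other.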
I agree that the less+more permissive changes are an issue. Note that all of the changes are in the area of the slug specs that was marked as "problematic". When I run the tests, these are the pages that would break if they had already been stored on existing servers: 'Pride & Prejudice' is concerning because it is a legitimate title (old servers would have saved it as 'pride--prejudice'). Strings like '  Welcome Visitors' (with leading whitespace, previously slugged as '--welcome-visitors') are unlikely, but could have been passed in as titles by converter scripts reading from other sources.
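To make the breakage concrete, a stand-in for the historical algorithm (whitespace to dashes, strip the rest, downcase; the exact shipped regexps are an assumption) reproduces the stored slugs:

```ruby
# Stand-in for the historical slug algorithm (an assumption, not the
# exact shipped code): whitespace to dashes, strip the rest, downcase.
old_slug = ->(title) { title.gsub(/\s/, '-').gsub(/[^A-Za-z0-9-]/, '').downcase }

old_slug.call('Pride & Prejudice')  # => "pride--prejudice"
old_slug.call('  Welcome Visitors') # => "--welcome-visitors"
```

The new function collapses the dash run and trims the leading dashes, so requests built from these titles would no longer find the stored pages.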
The ways forward as I see them:

I vote for 2 or 3. (Aside answered in babc03c.)
Does anyone know offhand whether JS supports POSIX character classes in regexps?

Does anyone want to take on the Coffee side of this upgrade, once we finalize the details of the way forward?
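(For context, Ruby's regexps do take POSIX bracket expressions, and they are Unicode-aware on UTF-8 strings, which is presumably what the multilingual commit leans on; plain JavaScript RegExp has no POSIX classes, so the Coffee side would need a different construct. A quick irb illustration:)

```ruby
# Ruby's POSIX character classes match non-ASCII letters on UTF-8 strings.
'les misérables'.scan(/[[:alpha:]]+/) # => ["les", "misérables"]
'pride & prejudice' =~ /[[:punct:]]/  # => 6 (index of the '&')
```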
We had a good discussion today on slug generation and a scale of "permissivity". Our current slugs have a low permissivity, call it 10 out of 100. GitHub's Gollum wiki has very permissive slug generation, call it 90 out of 100.
It's easy to imagine that we could have it both ways -- as long as everything we cut out of a more permissive slug is always cut out of the less permissive slug, we can always use the less permissive conversion to compare two slugs.
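A sketch of that comparison rule, assuming hypothetical helpers and that the less permissive conversion only ever removes characters the more permissive one keeps:

```ruby
# Hypothetical: a requested slug matches a stored slug if both normalize
# to the same string under the less permissive conversion.
def slugs_match?(requested, stored, less_permissive)
  less_permissive.call(requested) == less_permissive.call(stored)
end

less = ->(s) { s.downcase.gsub(/[^a-z0-9-]/, '') } # stand-in rules
slugs_match?('les-misérables', 'les-misrables', less) # => true
```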
See also the slug discussion in #156.
To get us started, I've moved the slug generation into a module, which mostly mirrors the way slugs currently work. The difference is that runs of multiple dashes are collapsed to a single dash, and leading and trailing dashes are removed. THIS WILL BREAK EXISTING DATABASES, SO WE SHOULD THINK CAREFULLY BEFORE INTEGRATING IT.
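For readers skimming the diff, the behaviour is roughly this (an illustrative sketch, not the module's exact code; the real rules live in this PR):

```ruby
# Illustrative sketch of the new slug rules, not the PR's exact code.
module FedWiki
  def self.slug(title)
    title.downcase
         .gsub(/\s+/, '-')        # whitespace runs become a single dash
         .gsub(/[^a-z0-9-]/, '')  # strip everything else (ASCII shown here)
         .squeeze('-')            # collapse remaining dash runs
         .gsub(/\A-+|-+\z/, '')   # trim leading and trailing dashes
  end
end

FedWiki.slug('Pride & Prejudice') # => "pride-prejudice" (old: "pride--prejudice")
```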
To run the specs: `bundle exec rspec spec/slug_reference.rb`