This repository has been archived by the owner on Feb 14, 2018. It is now read-only.

Move slug generation into FedWiki.slug #248

Open
wants to merge 6 commits into
base: master
from


harlantwood

We had a good discussion today on slug generation, and a scale of "permissivity". Our current slugs have a low permissivity, call it 10 out of 100.

GitHub's Gollum wiki has very permissive slug generation, call it 90 out of 100.

It's easy to imagine that we could have it both ways -- as long as everything we cut out of a more permissive slug is always cut out of the less permissive slug, we can always use the less permissive conversion to compare two slugs.

See also the slug discussion in #156.

To get us started, I've moved the slug generation into a module, which mostly mirrors the way slugs currently work. The difference is that multiple dashes in a row are removed, as well as leading and trailing dashes. THIS WILL BREAK EXISTING DATABASES, SO WE SHOULD THINK CAREFULLY BEFORE INTEGRATING IT.
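To make the difference concrete, here is a minimal sketch of the two behaviours. The module and method names are hypothetical, and the legacy rule is an assumption inferred from the examples in this thread (whitespace becomes a dash, other non-alphanumerics are dropped, the result is lowercased); the actual code in this PR may differ.

```ruby
# Hypothetical names for illustration; not the FedWiki.slug code from this PR.
module SlugSketch
  # Assumed legacy rule: each whitespace character becomes '-', anything
  # outside A-Z, a-z, 0-9 and '-' is dropped, and the result is lowercased.
  def self.legacy(title)
    title.gsub(/\s/, '-').gsub(/[^A-Za-z0-9-]/, '').downcase
  end

  # Proposed rule as described above: legacy, then collapse runs of dashes
  # and strip leading and trailing dashes.
  def self.proposed(title)
    legacy(title).gsub(/-{2,}/, '-').gsub(/\A-+|-+\z/, '')
  end
end

puts SlugSketch.legacy('Pride & Prejudice')    # pride--prejudice
puts SlugSketch.proposed('Pride & Prejudice')  # pride-prejudice
```

Any title where the two outputs differ is a page that an existing store would no longer find under its old name.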

To run the specs:

bundle exec rspec spec/slug_reference.rb

@harlantwood
Author

In a separate commit (93ad271), I added multilingual support:

www.example.com/les-misérables
www.example.com/تماس-با-ما
www.example.com/ƒåø

This is optional, but I advocate strongly for it. Without it, the examples above are damaged or disappear entirely.
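A rough sketch of what "damaged or disappear" means in practice. Both function names are hypothetical, the ASCII rule is an assumption about the current behaviour, and the multilingual rule is only one possible way to do it (Ruby's POSIX bracket class `[[:alnum:]]`, which matches non-ASCII letters and digits in UTF-8 strings); commit 93ad271 may take a different approach.

```ruby
# Hypothetical helpers for illustration only.

# Assumed current behaviour: only ASCII letters, digits and '-' survive.
def ascii_slug(title)
  title.gsub(/\s/, '-').gsub(/[^A-Za-z0-9-]/, '').downcase
end

# One possible multilingual variant: keep any Unicode letter or digit.
# String#downcase handles non-ASCII letters in Ruby 2.4 and later.
def multilingual_slug(title)
  title.gsub(/\s/, '-').gsub(/[^[:alnum:]-]/, '').downcase
end

puts ascii_slug('Les Misérables')         # les-misrables  (damaged)
puts multilingual_slug('Les Misérables')  # les-misérables
puts ascii_slug('ƒåø').empty?             # true  (disappears entirely)
puts multilingual_slug('ƒåø')             # ƒåø
```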

@WardCunningham
Owner

I like the approach but agree that this code is not complete enough to be deployed. We have constraints that come from three desirable compatibilities:

  • a server's db must be compatible with the server itself.
  • a server (and its db) must be compatible with every client ever.
  • a client must be compatible with every server ever.

The second (every-client) and third (every-server) constraints arise because any random federated wiki client could request pages from any random federated wiki server.

A server can apply whatever search algorithms it wants so long as it delivers the proper page when requested and delivers a 404 when that is the correct response. A server stores a more complete version of the page title which can be used in this search.

It's also possible that we could reliably convert a slug from one algorithm to a slug from another. This is the basis of the permissivity discussion above. For example, if we could show for slug functions F and G that F(x) == F(G(x)) for all x, then we could say G is at least as permissive as F. Intuitively, G permits more characters through than F.
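The F(x) == F(G(x)) condition can be spot-checked over sample titles. The sketch below is illustrative only: `LEGACY` and `COLLAPSE` are assumed stand-ins for the current and dash-collapsing slug functions, and `counterexamples` is a hypothetical helper. A finite sample can refute the property but never prove it for all x.

```ruby
# Assumed stand-ins for the two slug functions under discussion.
LEGACY   = ->(t) { t.gsub(/\s/, '-').gsub(/[^A-Za-z0-9-]/, '').downcase }
COLLAPSE = ->(t) { LEGACY.call(t).gsub(/-{2,}/, '-').gsub(/\A-+|-+\z/, '') }

# Returns the sample titles x for which F(x) != F(G(x)).
# An empty result is consistent with "G is at least as permissive as F".
def counterexamples(f, g, samples)
  samples.reject { |x| f.call(x) == f.call(g.call(x)) }
end

SAMPLES = ['Welcome Visitors', 'Pride & Prejudice', ' - - - - '].freeze

# With F = LEGACY and G = COLLAPSE, two of the samples fail the check.
p counterexamples(LEGACY, COLLAPSE, SAMPLES)
# With F = COLLAPSE and G = LEGACY, none of these samples fail.
p counterexamples(COLLAPSE, LEGACY, SAMPLES)
```

On these samples the dash-collapsing function fails the test against the legacy one, while the reverse direction holds.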

It has been suggested that we could allow more permissive slug functions into clients so long as servers that use (or might have used) a less permissive slug function try applying that function and repeating a query before issuing a 404.
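A minimal sketch of that retry idea, assuming a server whose pages are keyed by slug. The in-memory hash, the `find_page` helper and the `:not_found` marker are all hypothetical stand-ins; a real server would read from the filesystem or CouchDB and return an HTTP 404.

```ruby
# Assumed less permissive function a server might have used when storing pages.
ASCII_ONLY = ->(slug) { slug.gsub(/[^A-Za-z0-9-]/, '').downcase }

# Looks up the requested slug; on a miss, re-applies the less permissive
# function to the request and tries once more before giving up.
def find_page(store, requested_slug, fallback)
  return store[requested_slug] if store.key?(requested_slug)

  retry_slug = fallback.call(requested_slug)
  store.fetch(retry_slug) { :not_found } # stand-in for a 404 response
end

# Hypothetical store written by a server that used the ASCII-only rule.
STORE = {
  'les-misrables'    => 'page: Les Misérables',
  'pride--prejudice' => 'page: Pride & Prejudice'
}.freeze

p find_page(STORE, 'les-misérables', ASCII_ONLY)   # found on the retry
p find_page(STORE, 'pride-prejudice', ASCII_ONLY)  # :not_found
```

The second lookup still misses: a retry that only strips characters cannot put back a dash that a newer client has already collapsed.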

Now I am worried that the repeated-query approach is not sufficient to handle all three constraints enumerated above. Further, I am not sure that our new function is always more permissive. Specifically, the desire to eliminate some redundant hyphens makes it less permissive while permitting international alphabetic characters makes it more permissive. Yikes.

My feeling now is that we won't be able to meet every constraint all the time. However, if we could characterize the pages that will suffer, and under what circumstances they do so, well, that would be awesome. Then we can move ahead with confidence.

(Aside: Have we given up case insensitivity or is that handled properly for all alphabets that have case?)

@harlantwood
Author

I agree that the less+more permissive changes are an issue. Note that all of the changes are in the area of the slug specs that was marked as "problematic".

When I run the tests, these are the pages that would break if they had been stored already on existing servers:
'Welcome Visitors'
' Welcome Visitors'
'Welcome Visitors '
'Pride & Prejudice'
' - - - - '
' '

'Pride & Prejudice' is concerning because it is a legitimate title (old servers would have saved this as 'pride--prejudice').

Strings like ' Welcome Visitors' (--welcome-visitors) are unlikely, but could have been passed in as titles from converter scripts reading from other sources.

The ways forward as I see them:

  1. We could write a converter script to upgrade the slugs on the filesystems (and possibly CouchDBs) on servers
  2. We could accept that a few servers may break on a few pages
  3. We could leave the slug generation allowing multiple hyphens, as it does now on existing servers
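For option 1, a converter would probably want a dry run first. The sketch below is only an assumption about how that could look: `rename_plan` and `COLLAPSE_DASHES` are hypothetical names, it works on a list of slugs rather than a real filesystem or CouchDB, and it covers only the dash changes (recovering international characters would need the stored page title, not the old slug).

```ruby
# Assumed new rule, applied to an already-generated slug.
COLLAPSE_DASHES = ->(slug) { slug.gsub(/-{2,}/, '-').gsub(/\A-+|-+\z/, '') }

# Builds a rename plan from old slugs to new slugs without touching storage,
# and reports target slugs that two or more pages would end up sharing.
def rename_plan(old_slugs, new_slug)
  plan = old_slugs.map { |old| [old, new_slug.call(old)] }
                  .reject { |old, new| old == new }
  unchanged  = old_slugs - plan.map(&:first)
  targets    = plan.map(&:last) + unchanged
  collisions = targets.group_by(&:itself)
                      .select { |_, group| group.size > 1 }
                      .keys
  { renames: plan.to_h, collisions: collisions }
end

OLD_SLUGS = ['pride--prejudice', 'pride-prejudice', '--welcome-visitors'].freeze
result = rename_plan(OLD_SLUGS, COLLAPSE_DASHES)
p result[:renames]     # two renames planned
p result[:collisions]  # ["pride-prejudice"]
```

A non-empty collisions list means two existing pages would map to the same new slug, which a converter would have to resolve by hand.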

I vote for 2 or 3.

(Aside answered in babc03c.)

@harlantwood
Author

Does anyone know offhand whether JS supports POSIX character classes in regexps, such as [:alnum:]?

Does anyone want to take on the Coffee side of this upgrade, once we finalize the details of the way forward?
