This repository has been archived by the owner on Feb 14, 2018. It is now read-only.

Do We Want some Common Name Processing? #156

Open
GerryG opened this issue Mar 16, 2012 · 12 comments

@GerryG

GerryG commented Mar 16, 2012

I was thinking about what Ward said about my comments on naming, and I had trouble even remembering where I put them. I added a wiki page, [[Naming Issues]], but I thought it would help to open an issue as a place to discuss what we might attempt to add to the code.

It occurred to me that we recently split off all of the name handling in Wagn into its own class (cardname), and that it wouldn't be that hard to pull this into an includable gem that is more general. It would need a name (cardname being Wagn-specific), and there might need to be a little configuration of things that are class constants now.

Some of the key features that might make it worth considering:

  1. Segmented names (it currently uses JOINT = '+' as the separator, but this would be configurable).
  2. Key processing (again, you may want to configure this, but the Wagn conventions may be a good starting point).
  3. Saving time by not recalculating things about a name, such as the key or the parts, since these are lazy loaded on the object.

Wagn name to key conventions:

  1. Translate non-key characters to whitespace
  2. Strip and squash whitespace to single spaces
  3. Do the 'underscore' inflection (CamelCase -> camel_case) and singularize segment by segment.

This translation is in a single method, so it could be configured by overriding that method.
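As a rough illustration of the features and conventions listed above, here is a minimal Ruby sketch. It is not the Wagn::Cardname source: the class name `SketchName`, the helper names, the choice of `_` to join words inside a key, and the deliberately naive `singularize` are all assumptions made for the example.

```ruby
# Hypothetical sketch of segmented names with lazily computed keys.
# This is not the Wagn::Cardname source; class and helper names are made up.
class SketchName
  JOINT = '+' # segment separator; a real gem would make this configurable

  def initialize(str)
    @str = str.to_s
  end

  def to_s
    @str
  end

  # Segments of the name, computed on first use and then cached.
  def parts
    @parts ||= @str.split(JOINT).map(&:strip)
  end

  # Key of the whole name, folded segment by segment and cached.
  def key
    @key ||= parts.map { |part| fold(part) }.join(JOINT)
  end

  private

  # The whole name-to-key translation lives here, so a subclass can override it.
  def fold(segment)
    spaced = segment.gsub(/[^A-Za-z0-9_]/, ' ')   # 1. non-key characters become whitespace
    squashed = spaced.strip.squeeze(' ')          # 2. strip and squash whitespace
    words = underscore(squashed).split(/[\s_]+/)  # 3a. CamelCase -> camel_case, then split into words
    words.map { |w| singularize(w) }.join('_')    # 3b. singularize each word; joining with '_' is an assumption
  end

  def underscore(str)
    str.gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
       .gsub(/([a-z\d])([A-Z])/, '\1_\2')
       .downcase
  end

  # Deliberately naive, English-only stand-in for a real inflector.
  def singularize(word)
    return word if word.length < 4 || word.end_with?('ss')
    word.end_with?('s') ? word[0..-2] : word
  end
end

name = SketchName.new("Ward's Wikis+Recent Changes")
puts name.parts.inspect
puts name.key
```

Because `key` and `parts` are memoized with `||=`, repeated lookups on the same object do no further work. A differently configured variant would only need to override `fold` and `JOINT`.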

I'd be willing to make a gem out of it and interface it with the server code if there is interest. A side effect may be that we need it translated to CoffeeScript, which would be useful for us too.

@WardCunningham
Owner

The current page title => slug code shows up in a variety of places. This is bad news because it is known to be inadequate in several ways, and correcting it will be a conversion struggle. Two things could help:

  1. A clean and simple reference implementation.
  2. Test cases that demonstrate all the features of that implementation

It would be a real contribution to this project if you (anyone) were to isolate the extant versions and develop a test suite that demonstrates their features and limitations. This could be the basis of ongoing discussion of a critical issue.

Here is a template in CoffeeScript that could belong in the /spec directory:

asSlug = (name) ->
  name.replace(/\s/g, '-').replace(/[^A-Za-z0-9-]/g, '').toLowerCase()

section = (comment) ->
  console.log "\n\t#{comment}\n"

test = (given, expected) ->
  actual = asSlug given
  console.log if actual == expected then "OK\t#{given}" else "YIKES\t#{given} => #{actual}, not #{expected} as expected"

# the following test cases presume to be implementation language agnostic
# perhaps they should be included from a common file

section 'case and hyphen insensitive'
test 'Welcome Visitors', 'welcome-visitors'
test 'welcome visitors', 'welcome-visitors'
test 'Welcome-visitors', 'welcome-visitors'


section 'numbers and punctuation'
test '2012 Report', '2012-report'
test 'Ward\'s Wiki', 'wards-wiki'

section 'foreign language'
test 'Les Misérables', 'les-misérables'

@WardCunningham
Owner

Just pushed this code as 04d6e61. Couldn't help but add a few more cases.

@GerryG
Author

GerryG commented Mar 21, 2012

Note a side benefit of having me make Wagn::Cardname into a more general gem: we already have internationalization in our work queue, and when we do this, we will certainly be making the changes in this new gem, or in the Cardname class if we haven't made a gem yet. For example, plural/singular is part of the name folding, and that clearly needs to be more general than what we have so far.

It almost requires that the objects (Pages in SFW, Cards in Wagn) have full alias capability in the naming library.

@GerryG GerryG closed this as completed Mar 21, 2012
@GerryG GerryG reopened this Mar 21, 2012
@GerryG
Author

GerryG commented Mar 23, 2012

I extracted it to a new repo and took out the Wagny parts. I'll add test cases and such as I have some more time.

https://github.com/GerryG/namelogic/blob/master/lib/name_logic.rb

Suggestions about what to name it or its methods or components are welcome.

@GerryG
Author

GerryG commented Mar 24, 2012

Now it has some tests. They are all taken directly from Wagn tests, and I haven't looked at how good the coverage is yet.

https://github.com/GerryG/namelogic/commit/f640c695110164ecd49fc41fc95b9642a0a7969d

@WardCunningham
Owner

@GerryG has tracked down some useful resources on diacritical marks in Unicode and how they might be handled in slug formation.

@GerryG
Author

GerryG commented Apr 16, 2012

Oh, I think the point I've been dancing around by not stating it clearly is about keys and equivalence classes. It is perfectly OK if different wikis in the federation have different representations of the key (slug), but the way the namespace is structured in terms of equivalence classes has to be mappable: all names that map to the same key in one wiki must map to the same key in every wiki. I'm pretty sure this will always mean that if you take a key from a foreign system, convert it to a key in your system, and then reverse that process and re-encode your key into the foreign system, it will end up on the same key.

In fact, I think the latter is a lesser condition that may be enough for good interoperability. If I were a better mathematician I might be able to prove they are the same condition.
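The two conditions described here can be checked mechanically on a sample of names. The following Ruby sketch is only an illustration: the three key functions (`HYPHEN_KEY`, `UNDERSCORE_KEY`, `SINGULAR_KEY`) and the sample list are invented for the example and are not taken from SFW or Wagn.

```ruby
# Sketch: compare key (slug) functions on a sample of names.
# All key functions and names below are illustrative assumptions.

# Equivalence classes of names under a key function, in a canonical order.
def partition(names, key_fn)
  names.group_by { |n| key_fn.call(n) }.values.map(&:sort).sort
end

# Stronger condition: both key functions group the names identically.
def same_classes?(names, key_a, key_b)
  partition(names, key_a) == partition(names, key_b)
end

# Round-trip condition: foreign key -> local key -> foreign key is unchanged.
def round_trip_stable?(names, foreign, local)
  names.all? do |n|
    foreign_key = foreign.call(n)
    foreign.call(local.call(foreign_key)) == foreign_key
  end
end

HYPHEN_KEY     = ->(s) { s.downcase.gsub(/[\s_]+/, '-').gsub(/[^a-z0-9-]/, '') }
UNDERSCORE_KEY = ->(s) { s.downcase.gsub(/[\s-]+/, '_').gsub(/[^a-z0-9_]/, '') }
SINGULAR_KEY   = ->(s) { HYPHEN_KEY.call(s).sub(/s\z/, '') } # also folds a trailing plural "s"

NAMES = ['Welcome Visitors', 'welcome visitors', 'Welcome-visitors',
         'Welcome Visitor', '2012 Report'].freeze

puts same_classes?(NAMES, HYPHEN_KEY, UNDERSCORE_KEY)
puts same_classes?(NAMES, HYPHEN_KEY, SINGULAR_KEY)
puts round_trip_stable?(NAMES, HYPHEN_KEY, UNDERSCORE_KEY)
puts round_trip_stable?(NAMES, HYPHEN_KEY, SINGULAR_KEY)
```

On this sample, the hyphen and underscore keys differ only in representation and pass both checks, while the singularizing key merges 'Welcome Visitor' with 'Welcome Visitors' and so fails both against the hyphen key. A check like this only covers the names it is given; it says nothing about names outside the sample.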

@GerryG
Author

GerryG commented Apr 16, 2012

Now I have to find some time to document this and create a configuration variation that matches SFW rather than Wagn.

@GerryG
Author

GerryG commented Apr 16, 2012

Do you have a preference for a name segment separator for SFW? '/' is out since you already use that. A double character like '--' seems like a reasonable option, but I don't like it that much.

Even if you don't think you'll need that, I'd reserve a syntactic character for it now.

@harlantwood

@WardCunningham, you asked in #176 in this comment:

Is there an easy way to strip accents from latin characters? Maybe with a regex or something equally common and well supported?

Rails has a String#parameterize method, which creates a slug very much like what SFW uses (except that it allows underscores), including calling transliterate (both methods are part of the activesupport gem). Some of the docs and code from transliterate:

# Replaces non-ASCII characters with an ASCII approximation, or if none
# exists, a replacement character which defaults to "?".
#
#    transliterate("Ærøskøbing")
#    # => "AEroskobing"
#
def transliterate(string, replacement = "?")
  I18n.transliterate(ActiveSupport::Multibyte::Unicode.normalize(
    ActiveSupport::Multibyte::Unicode.tidy_bytes(string), :c),
      :replacement => replacement)
end

And parameterize:

# Replaces special characters in a string so that it may be used as part of a 'pretty' URL.
#
# ==== Examples
#
#   class Person
#     def to_param
#       "#{id}-#{name.parameterize}"
#     end
#   end
#
#   @person = Person.find(1)
#   # => #<Person id: 1, name: "Donald E. Knuth">
#
#   <%= link_to(@person.name, person_path(@person)) %>
#   # => <a href="/person/1-donald-e-knuth">Donald E. Knuth</a>
def parameterize(string, sep = '-')
  # replace accented chars with their ascii equivalents
  parameterized_string = transliterate(string)
  # Turn unwanted chars into the separator
  parameterized_string.gsub!(/[^a-z0-9\-_]+/i, sep)
  unless sep.nil? || sep.empty?
    re_sep = Regexp.escape(sep)
    # No more than one of the separator in a row.
    parameterized_string.gsub!(/#{re_sep}{2,}/, sep)
    # Remove leading/trailing separator.
    parameterized_string.gsub!(/^#{re_sep}|#{re_sep}$/i, '')
  end
  parameterized_string.downcase
end

-- Both from https://github.com/rails/rails/blob/master/activesupport/lib/active_support/inflector/transliterate.rb
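For comparison, accents on Latin characters can also be stripped with the Ruby standard library alone, without activesupport. The sketch below is an assumption-laden illustration, not the Rails implementation: `strip_accents` and `ascii_slug` are made-up names, and the slug steps simply mirror the asSlug template earlier in this thread.

```ruby
# Sketch: accent stripping with the Ruby standard library only (Ruby 2.2+).
# strip_accents and ascii_slug are illustrative names, not an existing API.

# Decompose to NFD, then drop the combining marks (general category Mn).
def strip_accents(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# Same steps as the asSlug template, with accents folded first.
def ascii_slug(name)
  strip_accents(name).gsub(/\s/, '-').gsub(/[^A-Za-z0-9-]/, '').downcase
end

puts ascii_slug('Les Misérables')
puts ascii_slug('Ærøskøbing')
```

This handles characters such as é that decompose into a base letter plus a combining mark. Characters such as Æ and ø have no canonical decomposition, so this sketch simply drops them; mapping them to "AE" and "o" needs an explicit table, which is the part that transliterate delegates to I18n.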

@harlantwood

In the hangout today we discussed whether non-ascii characters should be allowed in slugs, eg:

www.example.com/تماس-با-ما
www.example.com/jørgen-tuinman

I took this approach tonight in the slug generation on my 'Open Your Project' site:

https://github.com/harlantwood/software_zero/blob/c140caf64498c81a2d905afbafe4b3b9fc89f4a6/spec/lib/ruby_extensions/string_spec.rb

...I also brought in @WardCunningham's bilingual slug test idea, as posted above. Naturally the intention is to merge this code and the fedwiki slug code into one common place (gem?) when the dust settles.
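A permissive slug of this kind can be sketched in a few lines of Ruby. This is not the code from the linked repository: `unicode_slug` is a hypothetical name, and the rule it applies (keep letters, marks, and digits from any script, turn whitespace into hyphens, drop everything else) is an assumption chosen to match the two example URLs above.

```ruby
# Sketch: a slug that keeps non-ASCII letters (Ruby 2.4+ for Unicode downcase).
# unicode_slug is an illustrative name, not an existing API.
def unicode_slug(name)
  name.unicode_normalize(:nfc)          # one canonical form per character
      .strip
      .downcase
      .gsub(/\s+/, '-')                 # runs of whitespace become one hyphen
      .gsub(/[^\p{L}\p{M}\p{N}-]/, '')  # keep letters, marks, digits, hyphens
end

puts unicode_slug('Jørgen Tuinman')
puts unicode_slug('تماس با ما')
puts unicode_slug('Les Misérables')
```

Normalizing to NFC first means that a precomposed é and an e followed by a combining accent produce the same slug.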

I am also doing some additional processing of URLs when they are used to generate slugs -- just using the path part.

The code itself is here:

https://github.com/harlantwood/software_zero/blob/c140caf64498c81a2d905afbafe4b3b9fc89f4a6/lib/ruby_extensions/string.rb#L10

@harlantwood

Since I use slug generation in multiple contexts, I have moved the versions I find most useful into a new gem I created:

https://rubygems.org/gems/superstring

This has 2 "permissivity" settings (see #248 for permissivity discussion):

"page" (permissive) --

https://github.com/harlantwood/superstring/blob/ad82347273b6a5b844b413f70ee50e6ee3094568/spec/superstring_spec.rb#L73

and "subdomain" --

https://github.com/harlantwood/superstring/blob/ad82347273b6a5b844b413f70ee50e6ee3094568/spec/superstring_spec.rb#L97

If there is interest in using this gem in SFW, I am very open to adding other options as well. Probably the issues in #248 need to be sorted out, regardless of whether it's in an external gem or not. I mention this only in the hope that the work I've done on the gem might inspire further clarity around slugs in the SFW community.
