Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gsub_diff: Compare LookupTypes 2, 3, 4, 7 #455

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

m4rc1e
Copy link
Contributor

@m4rc1e m4rc1e commented Oct 4, 2017

Currently gsub_diff.GsubDiffFinder can only parse and compare LookupType 1 Single subs. We cannot compare more complicated lookups. I'm in the middle of checking a collection of families and discovered this limitation. This pr will add support for the aforementioned lookup types.

My tests use some of the examples mentioned in the .fea spec. I've also added tests for the lookup types this pr does not support; ensuring they don't get parsed incorrectly.

I decided against adding support for LookupTypes 5 and 6 because parsing OT features using regex and .ttx files is rather messy. I'd prefer notodiff and its corresponding modules to get refactored once fontTools implements fonttools/fonttools#468 fully. It can currently 'build' features for not 'unbuild'

I apologise for the size of this pr, its pretty much a complete rewrite.

If you accept this pr, I'd like to add support for finding missing features as well. this pr will only find new/missing rules which is the same behaviour as before.

The current implementation only supports LookupType 1, Single sub
In the previous implementation, only LookupType 1, Single substitutions
could be compared. This commit adds support for LookupTypes 2, 3, 4 and 7.
LookupTypes 5, 6, 8 are still not supported by this commit.

Tests have been added to ensure each LookupType gets parsed correctly.
Further tests have been added for each missing LookupType to ensure they
are not getting parsed.
@dougfelt
Copy link
Contributor

@anthrotype Can you look at this?

if rule not in rules_a:
diffs.append(('+',) + rule)
# ('+', 'smcp', 'Q', 'Q.sc')
new = [self._format_rule(r, '+') for r in self.find_new_rules()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that refactoring those two for loops into new find_new_rules and find_missing_rules methods make the code more readable, but you are calling the same _get_gsub_rules method twice now, which may be an expensive operation. Perhaps you could compute rules_a and rules_b once and pass them as arguments to the previous methods.. but then again the methods would just become a label for a list comprehension (readable enough from POV). Well, up to you.

for name in re.findall(feature_name_rx, text):
contents = re.findall(contents_rx % (name, name), text, re.S)
assert len(contents) == 1, 'Multiple %s features in %s' % (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the previous code would assert that only one feature block with a given name exists (which might not be true, at least for handwritten, non-ttxn-generated, feature files, since the spec allows multiple feature blocks with same name).
In your _get_feature_content below you have removed the assert and simply returning the result of re.findall, which may contain more than one match. But you are only retaining the first one, since you do contents[0] later on.
Was this intentional?
If we only support reading lookups from the first feature block that matches, then we need to assert that there's only one. Probably, it would be better to collect lookup rules for all the feature blocks with that name.

s.append(item)
return s

def _gsub_rule_group_to_string(self, seq):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find defining a method a bit overkill for such a trivial operation as seq[1:-1], but I don't mind.

@m4rc1e
Copy link
Contributor Author

m4rc1e commented Oct 12, 2017

@anthrotype Thanks for this. We are working on a design proposal for modifying the diff objects. I'm aware of the repetition. We're probably going to split the OT parsing, diffing and reporting into separate objects.

I'd keep this open for now but expect more changes, to these diff modules.

if op == 'by': # parse LookupType 1, 2, 4
parsed.append((feature, left.split(), op, right.split()))
elif op == 'from': # parse LookupType 3
for glyph in right.split(): # 'a.alt a.sc' -> ['a.alt', 'a.sc']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment is redundant

parsed.append((feature, left.split(), op, right.split()))
elif op == 'from': # parse LookupType 3
for glyph in right.split(): # 'a.alt a.sc' -> ['a.alt', 'a.sc']
parsed.append((feature, left.split(), op, [glyph]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left.split() shouldn't be necessary since in lookuptype 3 alternate substitutions the format is substitute <glyph> from <glyphclass>; so the left slot can contain only one glyph. A left.strip() would do.

@anthrotype
Copy link
Contributor

@m4rc1e status?

@m4rc1e
Copy link
Contributor Author

m4rc1e commented Feb 22, 2019

@anthrotype Unfortunately I've let this rot. I moved away from using ttxn. I don't know whether I should just spend an hour to finish this off?

I'm still looking into otlLibunbuild ;-)

Base automatically changed from master to main March 1, 2021 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants