-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gsub_diff: Compare LookupTypes 2, 3, 4, 7 #455
base: main
Are you sure you want to change the base?
Conversation
The current implementation only supports LookupType 1, Single sub
In the previous implementation, only LookupType 1, Single substitutions could be compared. This commit adds support for LookupTypes 2, 3, 4 and 7. LookupTypes 5, 6, 8 are still not supported by this commit. Tests have been added to ensure each LookupType gets parsed correctly. Further tests have been added for each missing LookupType to ensure they are not getting parsed.
@anthrotype Can you look at this? |
if rule not in rules_a: | ||
diffs.append(('+',) + rule) | ||
# ('+', 'smcp', 'Q', 'Q.sc') | ||
new = [self._format_rule(r, '+') for r in self.find_new_rules()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that refactoring those two for loops into new find_new_rules
and find_missing_rules
methods make the code more readable, but you are calling the same _get_gsub_rules
method twice now, which may be an expensive operation. Perhaps you could compute rules_a
and rules_b
once and pass them as arguments to the previous methods.. but then again the methods would just become a label for a list comprehension (readable enough from POV). Well, up to you.
for name in re.findall(feature_name_rx, text): | ||
contents = re.findall(contents_rx % (name, name), text, re.S) | ||
assert len(contents) == 1, 'Multiple %s features in %s' % ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the previous code would assert that only one feature block with a given name exists (which might not be true, at least for handwritten, non-ttxn-generated, feature files, since the spec allows multiple feature blocks with same name).
In your _get_feature_content
below you have removed the assert and simply returning the result of re.findall
, which may contain more than one match. But you are only retaining the first one, since you do contents[0]
later on.
Was this intentional?
If we only support reading lookups from the first feature block that matches, then we need to assert that there's only one. Probably, it would be better to collect lookup rules for all the feature blocks with that name.
s.append(item) | ||
return s | ||
|
||
def _gsub_rule_group_to_string(self, seq): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find defining a method a bit overkill for such a trivial operation as seq[1:-1]
, but I don't mind.
@anthrotype Thanks for this. We are working on a design proposal for modifying the diff objects. I'm aware of the repetition. We're probably going to split the OT parsing, diffing and reporting into separate objects. I'd keep this open for now but expect more changes, to these diff modules. |
if op == 'by': # parse LookupType 1, 2, 4 | ||
parsed.append((feature, left.split(), op, right.split())) | ||
elif op == 'from': # parse LookupType 3 | ||
for glyph in right.split(): # 'a.alt a.sc' -> ['a.alt', 'a.sc'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment is redundant
parsed.append((feature, left.split(), op, right.split())) | ||
elif op == 'from': # parse LookupType 3 | ||
for glyph in right.split(): # 'a.alt a.sc' -> ['a.alt', 'a.sc'] | ||
parsed.append((feature, left.split(), op, [glyph])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left.split()
shouldn't be necessary since in lookuptype 3 alternate substitutions the format is substitute <glyph> from <glyphclass>;
so the left slot can contain only one glyph. A left.strip()
would do.
@m4rc1e status? |
@anthrotype Unfortunately I've let this rot. I moved away from using ttxn. I don't know whether I should just spend an hour to finish this off? I'm still looking into otlLibunbuild ;-) |
Currently gsub_diff.GsubDiffFinder can only parse and compare LookupType 1 Single subs. We cannot compare more complicated lookups. I'm in the middle of checking a collection of families and discovered this limitation. This pr will add support for the aforementioned lookup types.
My tests use some of the examples mentioned in the .fea spec. I've also added tests for the lookup types this pr does not support; ensuring they don't get parsed incorrectly.
I decided against adding support for LookupTypes 5 and 6 because parsing OT features using regex and .ttx files is rather messy. I'd prefer notodiff and its corresponding modules to get refactored once fontTools implements fonttools/fonttools#468 fully. It can currently 'build' features for not 'unbuild'
I apologise for the size of this pr, its pretty much a complete rewrite.
If you accept this pr, I'd like to add support for finding missing features as well. this pr will only find new/missing rules which is the same behaviour as before.