Is there a way to get CID values for a given character? #1140

Answered by JorjMcKie
inf3rnus asked this question in Q&A

Presumably, by CID you mean the glyph id.
Yes, you can access that: there is a function, still internal for now, called page._getTexttrace(). I currently use it in doc.subset_fonts() to handle cases where the unicode cannot be determined but the glyph id can.
There is no official documentation yet, so here is the output for the first page of PyMuPDF's PDF documentation:

>>> import fitz
>>> from pprint import pprint
>>> doc=fitz.open("pymupdf.pdf")
>>> page=doc[0]
>>> pprint(page._getTexttrace())  # a list of dictionaries of the page's text spans
[{'ascender': 0.9490000009536743,  # font ascender
  'bidi': 0,  # ignore for now
  # list of character information:
  'chars': ((80,  # unicode
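
To illustrate how the span list above could be searched for a given character, here is a minimal sketch. It is not part of PyMuPDF: `glyph_ids_for_char` is a hypothetical helper, and it assumes each entry in `'chars'` starts with the unicode (as shown above) followed by the glyph id, and that each span dictionary carries a `'font'` key. Both assumptions should be checked against the real output. The sample data is made up so the snippet runs without a PDF.

```python
def glyph_ids_for_char(spans, char):
    """Collect (font, glyph id) pairs recorded for `char` in text-trace style spans.

    Hypothetical helper, not a PyMuPDF function.
    """
    target = ord(char)
    found = set()
    for span in spans:
        for entry in span.get("chars", ()):
            # Assumed layout: entry[0] is the unicode, entry[1] is the glyph id.
            if entry[0] == target:
                found.add((span.get("font"), entry[1]))
    return found


# Made-up sample shaped like the output above; the values are illustrative only.
sample = [{"font": "ExampleFont", "chars": ((80, 51), (121, 92), (80, 51))}]
print(glyph_ids_for_char(sample, "P"))
```

On a real page, `spans` would be the list returned by `page._getTexttrace()`. Glyph ids are specific to a font, which is why the sketch keeps the font name next to each id.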

Answer selected by inf3rnus