diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index 323d407ed86..7fc81a22487 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -68,6 +68,7 @@
- [PDF File analysis](generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/pdf-file-analysis.md)
- [PNG tricks](generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/png-tricks.md)
- [Structural File Format Exploit Detection](generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/structural-file-format-exploit-detection.md)
+ - [SVG Font Glyph Analysis and Web DRM Deobfuscation](generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/svg-font-glyph-analysis-and-web-drm-deobfuscation.md)
- [Video and Audio file analysis](generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/video-and-audio-file-analysis.md)
- [ZIPs tricks](generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/zips-tricks.md)
- [Windows Artifacts](generic-methodologies-and-resources/basic-forensic-methodology/windows-forensics/README.md)
diff --git a/src/generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/README.md b/src/generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/README.md
index 72690f3a067..416f2310c15 100644
--- a/src/generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/README.md
+++ b/src/generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/README.md
@@ -35,6 +35,11 @@ pdf-file-analysis.md
{{#endref}}
+{{#ref}}
+svg-font-glyph-analysis-and-web-drm-deobfuscation.md
+{{#endref}}
+
+
{{#ref}}
structural-file-format-exploit-detection.md
{{#endref}}
diff --git a/src/generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/svg-font-glyph-analysis-and-web-drm-deobfuscation.md b/src/generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/svg-font-glyph-analysis-and-web-drm-deobfuscation.md
new file mode 100644
index 00000000000..5c64cbc66bd
--- /dev/null
+++ b/src/generic-methodologies-and-resources/basic-forensic-methodology/specific-software-file-type-tricks/svg-font-glyph-analysis-and-web-drm-deobfuscation.md
@@ -0,0 +1,290 @@
+# SVG/Font Glyph Analysis & Web DRM Deobfuscation (Raster Hashing + SSIM)
+
+{{#include ../../../banners/hacktricks-training.md}}
+
+This page documents practical techniques to recover text from web readers that ship positioned glyph runs plus per-request vector glyph definitions (SVG paths) and that randomize glyph IDs per request to prevent scraping. The core idea is to ignore the request-scoped numeric glyph IDs, fingerprint each visual shape via raster hashing, and then map shapes to characters with SSIM against a reference font atlas. The workflow generalizes beyond Kindle Cloud Reader to any viewer with similar protections.
+
+Warning: Only use these techniques to back up content you legitimately own and in compliance with applicable laws and terms.
+
+## Acquisition (example: Kindle Cloud Reader)
+
+Endpoint observed:
+- [https://read.amazon.com/renderer/render](https://read.amazon.com/renderer/render)
+
+Required materials per session:
+- Browser session cookies (normal Amazon login)
+- Rendering token from a startReading API call
+- Additional ADP session token used by the renderer
+
+Behavior:
+- Each request, when sent with browser-equivalent headers and cookies, returns a TAR archive limited to 5 pages.
+- For a long book you will need many batches; each batch uses a different randomized mapping of glyph IDs.
+
+Typical TAR contents:
+- page_data_0_4.json — positioned text runs as sequences of glyph IDs (not Unicode)
+- glyphs.json — per-request SVG path definitions for each glyph and fontFamily
+- toc.json — table of contents
+- metadata.json — book metadata
+- location_map.json — logical→visual position mappings
+
+Example page run structure:
+```json
+{
+ "type": "TextRun",
+ "glyphs": [24, 25, 74, 123, 91],
+ "rect": {"left": 100, "top": 200, "right": 850, "bottom": 220},
+ "fontStyle": "italic",
+ "fontWeight": 700,
+ "fontSize": 12.5
+}
+```
+
+Example glyphs.json entry:
+```json
+{
+ "24": {"path": "M 450 1480 L 820 1480 L 820 0 L 1050 0 L 1050 1480 ...", "fontFamily": "bookerly_normal"}
+}
+```
+
+Notes on anti-scraping path tricks:
+- Paths may include micro relative moves (e.g., `m3,1 m1,6 m-4,-7`) that confuse many vector parsers and naïve path sampling.
+- Always render filled complete paths with a robust SVG engine (e.g., CairoSVG) instead of doing command/coordinate differencing.
+
+## Why naïve decoding fails
+
+- Per-request randomized glyph substitution: glyph ID→character mapping changes every batch; IDs are meaningless globally.
+- Direct SVG coordinate comparison is brittle: identical shapes may differ in numeric coordinates or command encoding per request.
+- OCR on isolated glyphs performs poorly (≈50%), confuses punctuation and look-alike glyphs, and ignores ligatures.
+
+## Working pipeline: request-agnostic glyph normalization and mapping
+
+1) Rasterize per-request SVG glyphs
+- Build a minimal SVG document per glyph with the provided `path` and render to a fixed canvas (e.g., 512×512) using CairoSVG or an equivalent engine that handles tricky path sequences.
+- Render filled black on white; avoid strokes to eliminate renderer- and AA-dependent artifacts.
+
+2) Perceptual hashing for cross-request identity
+- Compute a perceptual hash (e.g., pHash via `imagehash.phash`) of each glyph image.
+- Treat the hash as a stable ID: the same visual shape across requests collapses to the same perceptual hash, defeating randomized IDs.
+
+3) Reference font atlas generation
+- Download the target TTF/OTF fonts (e.g., Bookerly normal/italic/bold/bold-italic).
+- Render candidates for A–Z, a–z, 0–9, punctuation, special marks (em/en dashes, quotes), and explicit ligatures: `ff`, `fi`, `fl`, `ffi`, `ffl`.
+- Keep separate atlases per font variant (normal/italic/bold/bold-italic).
+- Use a proper text shaper (HarfBuzz) if you want glyph-level fidelity for ligatures; simple rasterization via Pillow's ImageFont can be sufficient if you render the ligature strings directly and Pillow's layout engine (libraqm) shapes them.
+
+4) Visual similarity matching with SSIM
+- For each unknown glyph image, compute SSIM (Structural Similarity Index) against all candidate images across all font variant atlases.
+- Assign the character string of the best-scoring match. SSIM absorbs small antialiasing, scale, and coordinate differences better than pixel-exact comparisons.
+
+5) Edge handling and reconstruction
+- When a glyph maps to a ligature (multi-char), expand it during decoding.
+- Use run rectangles (top/left/right/bottom) to infer paragraph breaks (Y deltas), alignment (X patterns), style, and sizes.
+- Serialize to HTML/EPUB preserving `fontStyle`, `fontWeight`, `fontSize`, and internal links.
+
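+To make step 2 concrete, here is a minimal self-contained sketch (toy half-black rectangles instead of real glyph renders; assumes Pillow and imagehash are installed) showing that identical shapes collapse to the same perceptual hash while structurally different shapes diverge:
+
+```python
+# Demo: perceptual hashes as request-agnostic shape IDs (toy shapes, not real glyphs)
+from PIL import Image, ImageDraw
+import imagehash
+
+def draw_shape(kind: str) -> Image.Image:
+    img = Image.new('L', (128, 128), 255)          # white canvas
+    d = ImageDraw.Draw(img)
+    if kind == 'left':
+        d.rectangle([0, 0, 63, 127], fill=0)       # black left half
+    else:
+        d.rectangle([64, 0, 127, 127], fill=0)     # black right half
+    return img
+
+h1 = imagehash.phash(draw_shape('left'))
+h2 = imagehash.phash(draw_shape('left'))   # re-render of the same shape
+h3 = imagehash.phash(draw_shape('right'))
+print(h1 - h2)  # 0: the same shape always maps to the same hash
+print(h1 - h3)  # > 0: a different shape gets a different hash
+```
+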
+### Implementation tips
+
+- Normalize all images to the same size and grayscale before hashing and SSIM.
+- Cache by perceptual hash to avoid recomputing SSIM for repeated glyphs across batches.
+- Use a high-quality raster size (e.g., 256–512 px) for better discrimination; downscale as needed before SSIM to accelerate.
+- If using Pillow to render TTF candidates, set the same canvas size and center the glyph; pad to avoid clipping ascenders/descenders.
+
+### Python: end-to-end glyph normalization and matching (raster hash + SSIM)
+
+```python
+# pip install cairosvg pillow imagehash scikit-image uharfbuzz freetype-py
+import io, json, tarfile
+from PIL import Image, ImageOps, ImageDraw, ImageFont
+import imagehash
+from skimage.metrics import structural_similarity as ssim
+import cairosvg
+
+CANVAS = (512, 512)
+BGCOLOR = 255 # white
+FGCOLOR = 0 # black
+
+# --- SVG -> raster ---
+def rasterize_svg_path(path_d: str, canvas=CANVAS) -> Image.Image:
+    # Build a minimal SVG document; rely on Cairo for correct path handling.
+    # The 2048-unit viewBox is an assumption (a typical font unitsPerEm);
+    # match it to the coordinate range seen in glyphs.json.
+    svg = (f'<svg xmlns="http://www.w3.org/2000/svg" width="{canvas[0]}" '
+           f'height="{canvas[1]}" viewBox="0 0 2048 2048">'
+           f'<path d="{path_d}" fill="black" fill-rule="nonzero"/></svg>')
+    png_bytes = cairosvg.svg2png(bytestring=svg.encode('utf-8'), background_color='white')
+    img = Image.open(io.BytesIO(png_bytes)).convert('L')
+    return img
+
+# --- Perceptual hash ---
+def phash_img(img: Image.Image) -> str:
+ # Normalize to grayscale and fixed size
+ img = ImageOps.grayscale(img).resize((128, 128), Image.LANCZOS)
+ return str(imagehash.phash(img))
+
+# --- Reference atlas from TTF ---
+def render_char(candidate: str, ttf_path: str, canvas=CANVAS, size=420) -> Image.Image:
+    # Render centered text on the same canvas to approximate glyph shapes
+    font = ImageFont.truetype(ttf_path, size=size)
+    img = Image.new('L', canvas, color=BGCOLOR)
+    draw = ImageDraw.Draw(img)
+    # textbbox returns (left, top, right, bottom); offset by left/top when centering
+    l, t, r, b = draw.textbbox((0, 0), candidate, font=font)
+    dx = (canvas[0] - (r - l)) // 2 - l
+    dy = (canvas[1] - (b - t)) // 2 - t
+    draw.text((dx, dy), candidate, fill=FGCOLOR, font=font)
+    return img
+
+# --- Build atlases for variants ---
+FONT_VARIANTS = {
+ 'normal': '/path/to/Bookerly-Regular.ttf',
+ 'italic': '/path/to/Bookerly-Italic.ttf',
+ 'bold': '/path/to/Bookerly-Bold.ttf',
+ 'bolditalic':'/path/to/Bookerly-BoldItalic.ttf',
+}
+CANDIDATES = [
+ *[chr(c) for c in range(0x20, 0x7F)], # basic ASCII
+ '–', '—', '“', '”', '‘', '’', '•', # common punctuation
+ 'ff','fi','fl','ffi','ffl' # ligatures
+]
+
+def build_atlases():
+ atlases = {} # variant -> list[(char, img)]
+ for variant, ttf in FONT_VARIANTS.items():
+ out = []
+ for ch in CANDIDATES:
+ img = render_char(ch, ttf)
+ out.append((ch, img))
+ atlases[variant] = out
+ return atlases
+
+# --- SSIM match ---
+
+def best_match(img: Image.Image, atlases) -> tuple[str, float, str]:
+ # Returns (char, score, variant)
+ img_n = ImageOps.grayscale(img).resize((128,128), Image.LANCZOS)
+ img_n = ImageOps.autocontrast(img_n)
+ best = ('', -1.0, '')
+ import numpy as np
+ candA = np.array(img_n)
+ for variant, entries in atlases.items():
+ for ch, ref in entries:
+ ref_n = ImageOps.grayscale(ref).resize((128,128), Image.LANCZOS)
+ ref_n = ImageOps.autocontrast(ref_n)
+ candB = np.array(ref_n)
+ score = ssim(candA, candB)
+ if score > best[1]:
+ best = (ch, score, variant)
+ return best
+
+# --- Putting it together for one TAR batch ---
+
+def process_tar(tar_path: str, cache: dict, atlases) -> list[dict]:
+ # cache: perceptual-hash -> mapping {char, score, variant}
+ out_runs = []
+ with tarfile.open(tar_path, 'r:*') as tf:
+ glyphs = json.load(tf.extractfile('glyphs.json'))
+ # page_data_0_4.json may differ in name; list members to find it
+ pd_name = next(m.name for m in tf.getmembers() if m.name.startswith('page_data_'))
+ page_data = json.load(tf.extractfile(pd_name))
+
+ # 1. Rasterize + hash all glyphs for this batch
+ id2hash = {}
+ for gid, meta in glyphs.items():
+ img = rasterize_svg_path(meta['path'])
+ h = phash_img(img)
+ id2hash[int(gid)] = (h, img)
+
+ # 2. Ensure all hashes are resolved to characters in cache
+ for h, img in {v[0]: v[1] for v in id2hash.values()}.items():
+ if h not in cache:
+ ch, score, variant = best_match(img, atlases)
+ cache[h] = { 'char': ch, 'score': float(score), 'variant': variant }
+
+ # 3. Decode text runs
+ for run in page_data:
+ if run.get('type') != 'TextRun':
+ continue
+ decoded = []
+ for gid in run['glyphs']:
+ h, _ = id2hash[gid]
+ decoded.append(cache[h]['char'])
+ run_out = {
+ 'text': ''.join(decoded),
+ 'rect': run.get('rect'),
+ 'fontStyle': run.get('fontStyle'),
+ 'fontWeight': run.get('fontWeight'),
+ 'fontSize': run.get('fontSize'),
+ }
+ out_runs.append(run_out)
+ return out_runs
+
+# Usage sketch:
+# atlases = build_atlases()
+# cache = {}
+# for tar in sorted(glob('batches/*.tar')):
+# runs = process_tar(tar, cache, atlases)
+# # accumulate runs for layout reconstruction → EPUB/HTML
+```
+
+
+
+## Layout/EPUB reconstruction heuristics
+
+- Paragraph breaks: If the next run’s top Y exceeds the previous line’s baseline by a threshold (relative to font size), start a new paragraph.
+- Alignment: Group by similar left X for left-aligned paragraphs; detect centered lines by symmetric margins; detect right-aligned by right edges.
+- Styling: Preserve italic/bold via `fontStyle`/`fontWeight`; vary CSS classes by `fontSize` buckets to approximate headings vs body.
+- Links: If runs include link metadata (e.g., `positionId`), emit anchors and internal hrefs.
+
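+The paragraph-break heuristic above can be sketched as follows (pure Python; the run dicts mirror the decoded TextRun structure shown earlier, and the 1.6x-font-size gap threshold is an arbitrary starting point to tune per book):
+
+```python
+# Group decoded runs into paragraphs using vertical gaps between run rectangles.
+def group_paragraphs(runs: list[dict], gap_factor: float = 1.6) -> list[list[dict]]:
+    paragraphs, current, prev_top = [], [], None
+    for run in sorted(runs, key=lambda r: (r['rect']['top'], r['rect']['left'])):
+        top, size = run['rect']['top'], run.get('fontSize', 12)
+        # New paragraph when the vertical jump exceeds gap_factor x the font size
+        if prev_top is not None and (top - prev_top) > gap_factor * size:
+            paragraphs.append(current)
+            current = []
+        current.append(run)
+        prev_top = top
+    if current:
+        paragraphs.append(current)
+    return paragraphs
+
+runs = [
+    {'text': 'First line',  'rect': {'top': 100, 'left': 50}, 'fontSize': 12},
+    {'text': 'second line', 'rect': {'top': 115, 'left': 50}, 'fontSize': 12},
+    {'text': 'New para',    'rect': {'top': 160, 'left': 50}, 'fontSize': 12},
+]
+print([[r['text'] for r in p] for p in group_paragraphs(runs)])
+# [['First line', 'second line'], ['New para']]
+```
+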
+## Mitigating SVG anti-scraping path tricks
+
+- Use filled paths with `fill-rule: nonzero` and a proper renderer (CairoSVG, resvg). Do not rely on path token normalization.
+- Avoid stroke rendering; focus on filled solids to sidestep hairline artifacts caused by micro relative moves.
+- Keep a stable viewBox per render so that identical shapes rasterize consistently across batches.
+
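+A quick way to check that a renderer handles the decoy moves correctly (a sketch assuming CairoSVG and Pillow are installed) is to fill the same path with and without trailing micro moves and diff the rasters; moveto-only subpaths add no geometry, so the filled output should not change:
+
+```python
+# Verify that trailing moveto-only subpaths do not alter the filled raster.
+import io
+import cairosvg
+from PIL import Image, ImageChops
+
+def render(d: str) -> Image.Image:
+    svg = (f'<svg xmlns="http://www.w3.org/2000/svg" width="64" height="64" '
+           f'viewBox="0 0 100 100"><path d="{d}" fill="black" fill-rule="nonzero"/></svg>')
+    png = cairosvg.svg2png(bytestring=svg.encode('utf-8'), background_color='white')
+    return Image.open(io.BytesIO(png)).convert('L')
+
+clean = render('M 10 10 H 90 V 90 H 10 Z')
+noisy = render('M 10 10 H 90 V 90 H 10 Z m3,1 m1,6 m-4,-7')  # decoy moves appended
+diff = ImageChops.difference(clean, noisy).getbbox()
+print(diff)  # None means the two fills are pixel-identical
+```
+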
+## Performance notes
+
+- In practice, books converge to a few hundred unique glyphs (e.g., ~361 including ligatures). Cache SSIM results by perceptual hash.
+- After initial discovery, future batches predominantly re-use known hashes; decoding becomes I/O-bound.
+- Average SSIM ≈0.95 is a strong signal; consider flagging low-scoring matches for manual review.
+
+## Generalization to other viewers
+
+Any system that:
+- Returns positioned glyph runs with request-scoped numeric IDs
+- Ships per-request vector glyphs (SVG paths or subset fonts)
+- Caps pages per request to prevent bulk export
+
+…can be handled with the same normalization:
+- Rasterize per-request shapes → perceptual hash → shape ID
+- Atlas of candidate glyphs/ligatures per font variant
+- SSIM (or similar perceptual metric) to assign characters
+- Reconstruct layout from run rectangles/styles
+
+## Minimal acquisition example (sketch)
+
+Use your browser’s DevTools to capture the exact headers, cookies and tokens used by the reader when requesting `/renderer/render`. Then replicate those from a script or curl. Example outline:
+
+```bash
+curl 'https://read.amazon.com/renderer/render' \
+ -H 'Cookie: session-id=...; at-main=...; sess-at-main=...' \
+ -H 'x-adp-session: ' \
+ -H 'authorization: Bearer ' \
+ -H 'User-Agent: ' \
+ -H 'Accept: application/x-tar' \
+ --compressed --output batch_000.tar
+```
+
+Adjust parameterization (book ASIN, page window, viewport) to match the reader’s requests. Expect a 5-page-per-request cap.
+
+## Results achievable
+
+- Collapse 100+ randomized alphabets to a single glyph space via perceptual hashing
+- 100% mapping of unique glyphs with average SSIM ~0.95 when atlases include ligatures and variants
+- Reconstructed EPUB/HTML visually indistinguishable from the original
+
+## References
+
+- [Kindle Web DRM: Breaking Randomized SVG Glyph Obfuscation with Raster Hashing + SSIM (Pixelmelt blog)](https://blog.pixelmelt.dev/kindle-web-drm/)
+- [CairoSVG – SVG to PNG renderer](https://cairosvg.org/)
+- [imagehash – Perceptual image hashing (pHash)](https://pypi.org/project/ImageHash/)
+- [scikit-image – Structural Similarity Index (SSIM)](https://scikit-image.org/docs/stable/api/skimage.metrics.html#skimage.metrics.structural_similarity)
+
+{{#include ../../../banners/hacktricks-training.md}}