Word to Markdown

Convert Word documents to beautiful Markdown, from the command line or in your browser. This is an even better version of the original word-to-markdown.

Supports

Paragraphs
Numbered lists
Bullet lists
Nested lists
Headings
Lists
Tables
Footnotes and endnotes
Images
Bold, italics, underlines, strikethrough, superscript, and subscript
Links
Line breaks
Text boxes
Comments

How is this different from the original?

TL;DR: This project is a complete rewrite, using modern tools and libraries, and is much faster and more reliable. The output should be the same or better. Feedback welcome!

A note on privacy

Word to Markdown can be run locally or in your browser. Either way, the conversion happens on your own machine, and no information ever leaves it.

Running Locally

Get set up

Clone the repo
Run npm install

Command line

Run w2m path/to/your/file.docx

Web server (static HTML)

npm run server:web

Web server (HTTP API)

You can also run Word to Markdown as an HTTP API server, so that other programs can send it requests.

npm run server

The server exposes a POST /raw endpoint, which returns the converted Markdown.
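
As a rough illustration, a client might call the endpoint along these lines. Only the POST /raw route comes from this README. The base URL and port, the Content-Type header, and sending the file bytes as the raw request body are all assumptions, and `buildRawRequest` and `convertViaApi` are hypothetical helper names, so check the server source for the actual request format.

```typescript
// Hypothetical client sketch for the HTTP API server.
// Only the POST /raw route is documented; the request shape is an assumption.

interface RawRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: ArrayBuffer };
}

// Build the request without sending it, so it can be inspected first.
function buildRawRequest(baseUrl: string, docx: ArrayBuffer): RawRequest {
  return {
    url: new URL("/raw", baseUrl).toString(),
    init: {
      method: "POST",
      headers: {
        // Assumed header: the standard MIME type for .docx files.
        "Content-Type":
          "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      },
      body: docx,
    },
  };
}

// Send the request and return the response body, which the README
// describes as the converted Markdown.
async function convertViaApi(baseUrl: string, docx: ArrayBuffer): Promise<string> {
  const { url, init } = buildRawRequest(baseUrl, docx);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Conversion failed with HTTP ${response.status}`);
  }
  return response.text();
}
```

Splitting request construction from sending keeps the assumed parts (header and body format) in one place, which makes them easy to adjust once the real request format is known.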

More context

See the README of the original Word to Markdown for the project's motivation.

The old way

The original Word to Markdown is 10 years old. The conversion process was as follows:

Use LibreOffice to convert the Word document to HTML.
Use a bunch of RegEx to clean up the HTML.
Use Premailer to inline the CSS.
Use Nokogiri to manipulate the HTML further.
Use Reverse Markdown to convert the HTML to Markdown.
Use a bunch of RegEx to clean up the Markdown.

Not only did this process require installing and shelling out to a huge binary (LibreOffice), but it was also very fragile, and key projects like Reverse Markdown are no longer maintained. I tried experimenting with Pandoc, but it had many of the same limitations.

The new way

Use Mammoth.js to convert the Word document to HTML.
Use Turndown to convert the HTML to Markdown.
Use Markdownlint to clean up the Markdown.

All three of these projects are actively maintained and heavily used, and they allow us to convert the document faster and entirely in JavaScript. Heck, I think theoretically, this could run in the browser for added privacy.
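
The three stages above compose into a simple pipeline. The sketch below shows only that shape: each stage is passed in as a function, so nothing here depends on the exact Mammoth.js, Turndown, or Markdownlint APIs. The `Stages` interface, the `convert` function, and the stand-in stages are hypothetical names for illustration, not this project's actual code.

```typescript
// Hypothetical pipeline shape: docx bytes -> HTML -> Markdown -> cleaned Markdown.
interface Stages {
  toHtml: (docx: ArrayBuffer) => Promise<string>; // e.g. backed by Mammoth.js
  toMarkdown: (html: string) => string; // e.g. backed by Turndown
  lint: (markdown: string) => string; // e.g. backed by Markdownlint fixes
}

// Run the three stages in order and return the final Markdown.
async function convert(docx: ArrayBuffer, stages: Stages): Promise<string> {
  const html = await stages.toHtml(docx);
  const markdown = stages.toMarkdown(html);
  return stages.lint(markdown);
}

// Stand-in stages so the sketch runs without any dependencies.
// Real stages would delegate to the libraries named above.
const stubs: Stages = {
  toHtml: async () => "<h1>Title</h1>",
  toMarkdown: (html) => html.replace(/<h1>(.*?)<\/h1>/, "# $1"),
  lint: (markdown) => markdown.trim() + "\n",
};
```

Calling `convert(new ArrayBuffer(0), stubs)` exercises the flow end to end. Swapping a stub for a real library call changes one field and leaves the rest of the pipeline untouched.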

It's still in beta, but so far, I've found the output to be better, with much less manual cleanup required. Notice something is off? Please open an issue.

One note: This project does not yet attempt to guess heading levels based on font size. It could, but it's not yet implemented.
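
For a sense of what such a guess could look like, here is one possible heuristic, written as a standalone sketch. It is not part of this project. The function name `guessHeadingLevels`, its inputs, and the ranking rule (larger font means higher-level heading) are all assumptions.

```typescript
// Hypothetical heuristic: rank the distinct font sizes that are larger than
// the body size, largest first, and assign heading levels 1..6 in that order.
function guessHeadingLevels(fontSizes: number[], bodySize: number): Map<number, number> {
  const larger = Array.from(new Set(fontSizes))
    .filter((size) => size > bodySize)
    .sort((a, b) => b - a);

  const levels = new Map<number, number>();
  larger.forEach((size, index) => {
    // Markdown has six heading levels, so clamp anything deeper to 6.
    levels.set(size, Math.min(index + 1, 6));
  });
  return levels;
}
```

Sizes at or below the body size get no entry in the map, so text in those sizes would stay as ordinary paragraphs.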

