diff --git a/README.md b/README.md
index 041e9a9..b79d2c6 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,55 @@ curl -X 'POST' \
 }'
 ```
 
+Example extracting Markdown with a placeholder for images (http source):
+
+```sh
+curl -s -X POST "http://localhost:8000/convert/markdown" -H "Content-Type: application/json" -d '{
+  "options": {
+    "include_images": false
+  },
+  "http_source": {
+    "url": "https://arxiv.org/pdf/2206.01062"
+  }
+}' > output.md
+```
+
+Example posting a file for conversion or explicit Markdown conversion:
+
+When your PDF or other file type is too large, encoding it as a base64 string
+and passing it inline to curl can lead to an “Argument list too long” error on
+some systems. To avoid this, we write the JSON request body to a file and have
+curl read from that file.
+
+```sh
+# 1. Base64-encode the file
+B64_DATA=$(base64 -w 0 /path/to/file/pdf-to-convert.pdf)
+
+# 2. Build the JSON with your options
+cat <<EOF > /tmp/request_body.json
+{
+  "options": {
+    "output_markdown": true,
+    "include_images": false
+  },
+  "file_source": {
+    "base64_string": "${B64_DATA}",
+    "filename": "pdf-to-convert.pdf"
+  }
+}
+EOF
+
+# 3. POST the request to the docling service
+curl -X POST "http://localhost:8000/convert" \
+     -H "Content-Type: application/json" \
+     -d @/tmp/request_body.json
+
+# Or explicitly convert to Markdown
+curl -X POST "http://localhost:8000/convert/markdown" \
+     -H "Content-Type: application/json" \
+     -d @/tmp/request_body.json
+```
+
 ### Cuda GPU Support
 
 For GPU support try the following:
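
Review note on the file-posting example: `base64 -w 0` is a GNU coreutils option and may not be accepted by other `base64` implementations. A possible portable variant is sketched below, under some assumptions. The function name `build_request_body` is hypothetical and not part of docling-serve. The JSON layout is copied from the example in this patch. The filename is assumed to contain no characters that need JSON escaping.

```sh
# Hypothetical helper (not part of docling-serve): write a request body for
# the file at $1 to the path at $2, using the JSON layout from this patch.
build_request_body() {
  src="$1"
  out="$2"
  # Read the file on stdin and strip newlines, so the encoded data stays on
  # one line even where `base64 -w 0` is not supported.
  b64=$(base64 < "$src" | tr -d '\n')
  # Assumption: the basename needs no JSON escaping (no quotes or backslashes).
  cat <<EOF > "$out"
{
  "options": {
    "output_markdown": true,
    "include_images": false
  },
  "file_source": {
    "base64_string": "${b64}",
    "filename": "$(basename "$src")"
  }
}
EOF
}

# Usage, with the endpoint taken from the example in this patch:
#   build_request_body /path/to/file/pdf-to-convert.pdf /tmp/request_body.json
#   curl -X POST "http://localhost:8000/convert" \
#        -H "Content-Type: application/json" \
#        -d @/tmp/request_body.json
```

The encoded data is written through a here-document, not passed as a command argument, so the "Argument list too long" limit described in the patch should not apply to it.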