The GET /_matrix/media/r0/preview_url endpoint provides a generic preview API for URLs. It returns Open Graph responses (with some Matrix-specific additions).
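To make the shape of the API concrete, here is a minimal sketch of how a client might build a request to this endpoint and pick the Matrix-specific keys out of a response. The `url` and `ts` query parameters and the `matrix:image:size` key reflect the Matrix client-server specification as best understood here. The homeserver base URL, the function names, and the sample response values are hypothetical, and a real request would normally also need an access token.

```python
from typing import Optional
from urllib.parse import urlencode

PREVIEW_PATH = "/_matrix/media/r0/preview_url"


def build_preview_url(homeserver: str, target_url: str, ts_ms: Optional[int] = None) -> str:
    """Return the full request URL for previewing target_url (hypothetical helper)."""
    params = {"url": target_url}
    if ts_ms is not None:
        # Optional hint: preferred point in time, in milliseconds, for the preview.
        params["ts"] = str(ts_ms)
    # urlencode percent-encodes the target URL so it survives as a query value.
    return f"{homeserver.rstrip('/')}{PREVIEW_PATH}?{urlencode(params)}"


def matrix_additions(og: dict) -> dict:
    """Pick out the keys that are Matrix additions rather than Open Graph keys."""
    return {key: value for key, value in og.items() if key.startswith("matrix:")}


# Illustrative response body only; the values are made up.
sample_response = {
    "og:title": "Example page",
    "og:image": "mxc://example.org/abcdef",
    "matrix:image:size": 102400,
}
```

For example, `build_preview_url("https://hs.example", "https://matrix.org/")` yields a URL whose `url` query value is the percent-encoded target, and `matrix_additions(sample_response)` keeps only the `matrix:image:size` entry.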
This approach has trade-offs compared to other designs:
- Pros:
  - Simple and flexible; can be used by any client at any point.
- Cons:
  - If each homeserver provides one of these independently, all the homeservers in a room may needlessly DoS the target URI.
  - The URL metadata must be stored somewhere, rather than just using Matrix itself to store the media.
  - Matrix cannot be used to distribute the metadata between homeservers.
When Synapse is asked to preview a URL it does the following:
- Checks against a URL blacklist (defined as url_preview_url_blacklist in the config).
- Checks the in-memory cache by URLs and returns the result if it exists. (This is also used to de-duplicate processing of multiple in-flight requests at once.)
- Kicks off a background process to generate a preview:
  - Checks the database cache by URL and timestamp and returns the result if it has not expired and was successful (a 2xx return code).
  - Checks if the URL matches an oEmbed pattern. If it does, updates the URL to download.
  - Downloads the URL and stores it into a file via the media storage provider and saves the local media metadata.
  - If the media is an image:
    - Generates thumbnails.
    - Generates an Open Graph response based on image properties.
  - If the media is HTML:
    - Decodes the HTML via the stored file.
    - Generates an Open Graph response from the HTML.
    - If a JSON oEmbed URL was found in the HTML via autodiscovery:
      - Downloads the URL and stores it into a file via the media storage provider and saves the local media metadata.
      - Converts the oEmbed response to an Open Graph response.
      - Overrides any Open Graph data from the HTML with data from oEmbed.
    - If an image exists in the Open Graph response:
      - Downloads the URL and stores it into a file via the media storage provider and saves the local media metadata.
      - Generates thumbnails.
      - Updates the Open Graph response based on image properties.
  - If the media is JSON and an oEmbed URL was found:
    - Converts the oEmbed response to an Open Graph response.
    - If a thumbnail or image is in the oEmbed response:
      - Downloads the URL and stores it into a file via the media storage provider and saves the local media metadata.
      - Generates thumbnails.
      - Updates the Open Graph response based on image properties.
  - Stores the result in the database cache.
- Returns the result.
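The first steps of this flow (blacklist check, then a cache lookup that also de-duplicates in-flight requests) can be sketched as follows. This is an illustrative sketch, not Synapse's actual code: `PreviewCoordinator`, `BLACKLISTED_HOSTS`, and `fake_generate` are hypothetical names, the blacklist is reduced to a set of hostnames, and the preview generation itself is a stub. Cache expiry and the database cache are omitted.

```python
import asyncio
from typing import Awaitable, Callable, Dict
from urllib.parse import urlsplit

# Simplified, hypothetical stand-in for url_preview_url_blacklist: hostnames only.
BLACKLISTED_HOSTS = {"blocked.example"}


class PreviewCoordinator:
    def __init__(self, generate: Callable[[str], Awaitable[dict]]) -> None:
        self._generate = generate
        # URL -> future. Holds finished results and work that is still running,
        # so concurrent requests for one URL share a single generation.
        self._tasks: Dict[str, asyncio.Future] = {}

    async def preview(self, url: str) -> dict:
        # Step 1: refuse blacklisted URLs before doing any work.
        if (urlsplit(url).hostname or "") in BLACKLISTED_HOSTS:
            raise PermissionError(f"URL is blacklisted: {url}")
        # Step 2: reuse a cached or in-flight result if there is one.
        task = self._tasks.get(url)
        if task is None:
            # Step 3: start generating the preview in the background.
            task = asyncio.ensure_future(self._generate(url))
            self._tasks[url] = task
        return await task


async def demo() -> int:
    """Request the same URL twice concurrently; return how often it was generated."""
    calls = 0

    async def fake_generate(url: str) -> dict:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0)  # yield, as a real download would
        return {"og:title": f"Preview of {url}"}

    coordinator = PreviewCoordinator(fake_generate)
    first, second = await asyncio.gather(
        coordinator.preview("https://example.com/a"),
        coordinator.preview("https://example.com/a"),
    )
    assert first == second
    return calls
```

Running `asyncio.run(demo())` issues two concurrent requests for the same URL while the stub generator runs only once. Storing the future rather than the finished result is what lets the second request attach to work that is already under way.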
The in-memory cache expires after 1 hour.
Expired entries in the database cache (and their associated media files) are deleted every 10 seconds. The default expiration time is 1 hour from download.
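The expiry behaviour can be illustrated with a small time-to-live cache. This is a sketch under stated assumptions rather than Synapse's implementation: `ExpiringCache` and its methods are hypothetical, entries live in a dictionary instead of a database, and the periodic sweep is modelled as a `purge_expired` method that a scheduler would call at a fixed interval (every 10 seconds in the description above). The clock is injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple

ONE_HOUR_S = 60 * 60  # default expiration time: 1 hour from download


class ExpiringCache:
    def __init__(
        self,
        ttl_s: float = ONE_HOUR_S,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        # URL -> (expiry time, Open Graph response)
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def set(self, url: str, og: dict) -> None:
        self._entries[url] = (self._clock() + self._ttl_s, og)

    def get(self, url: str) -> Optional[dict]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, og = entry
        if self._clock() >= expires_at:
            # Treat an expired entry as a miss and drop it straight away.
            del self._entries[url]
            return None
        return og

    def purge_expired(self) -> int:
        """Delete every expired entry and return how many were removed."""
        now = self._clock()
        expired = [url for url, (expires_at, _) in self._entries.items() if now >= expires_at]
        for url in expired:
            # A real implementation would also delete the associated media files here.
            del self._entries[url]
        return len(expired)
```

With a fake clock, an entry stored at time 0 is still returned at 3599 seconds and is removed by `purge_expired` once the clock reaches 3600 seconds.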