Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict NGINX (public?) access to "large" files only (e.g. .mp3, .mp4, .webm) in /library/calibre-web — for PR #51 OOM workaround #57

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

deldesir
Copy link
Collaborator

@deldesir deldesir commented Dec 2, 2023

This ensures that access is restricted to only video files with "webm" or "mp4" extensions within the /library/calibre-web/ path. Any other files or directories are denied access. This PR doesn't cover the download of videos, they are still being loaded first in memory like the other file formats. Another PR will be made to address this issue affecting huge files access.

@deldesir deldesir requested a review from holta December 2, 2023 04:36
@deldesir deldesir self-assigned this Dec 2, 2023
@deldesir deldesir added the enhancement New feature or request label Dec 2, 2023
@deldesir
Copy link
Collaborator Author

deldesir commented Dec 2, 2023

I smoke and functional tested this PR on Ubuntu 22.04. Waiting for your review @holta.

Explaining the wider context
@holta
Copy link
Member

holta commented Dec 4, 2023

@deldesir
Copy link
Collaborator Author

deldesir commented Dec 4, 2023

Smoke-tested together with #51 on Ubuntu 24.04 (Noble Numbat)

@holta
Copy link
Member

holta commented Dec 4, 2023

@deldesir, 4 questions below if you can help shine a light on where this is going:

This ensures that access is restricted to only video files with "webm" or "mp4" extensions within the /library/calibre-web/ path. Any other files or directories are denied access.

  1. Have these 2 companion PR's ([Workaround for] OOM / memory "502 Bad Gateway" crashes w/ large videos' playback or download-to-client [SEE ALSO #37 & #53] #51 with Restrict NGINX (public?) access to "large" files only (e.g. .mp3, .mp4, .webm) in /library/calibre-web — for PR #51 OOM workaround #57) been tested to make sure that real books, documents, MP3's and all other Calibre-Webfiles are properly viewable, playable and downloadable? (Via Calibre-Web's regular username/password security model? Regardless whether or not Calibre-Web has been configured for passwordless / anonymous / kiosk-style browssing?)

This PR doesn't cover the download of videos, they are still being loaded first in memory like the other file formats. Another PR will be made to address this issue affecting huge files access.

  1. When you say "Another PR will be made" are you referring to companion PR [Workaround for] OOM / memory "502 Bad Gateway" crashes w/ large videos' playback or download-to-client [SEE ALSO #37 & #53] #51 — or will another PR be needed? (If you're speaking about [Workaround for] OOM / memory "502 Bad Gateway" crashes w/ large videos' playback or download-to-client [SEE ALSO #37 & #53] #51, has that been tested to confirm out-of-memory problems no longer happen with large videos?)

  2. How can we make sure that very large books (e.g. 100's of MB!) and very long audiobooks / symphonies / radio recordings (e.g. 100's of MB!) and other giant media files do not likewise cause "502 Bad Gateway" out-of-memory crashes? As outlined within:

  3. Can Lines 26-45 of scripts/calibre-web-nginx.conf possibly be tightened up substantially? And/Or better explained if possible?

    https://github.com/deldesir/calibre-web/blob/c362d63894fd4664d0d3c75208d04cd261a14a27/scripts/calibre-web-nginx.conf#L26-L45

    The above 21 lines (would appear!) to do several other things that we (might!) not need — above-and-beyond restricting public browsing to a few "potentially greedy / gigantic" file types. QUESTION: Can extraneous code ideally/possibly be removed here, towards avoiding software maintenance headaches down the road?

@deldesir
Copy link
Collaborator Author

deldesir commented Dec 5, 2023

  1. Have these 2 companion PR's ([Workaround for] OOM / memory "502 Bad Gateway" crashes w/ large videos' playback or download-to-client [SEE ALSO #37 & #53] #51 with Deny wide access to entire calibre-web library directory #57) been tested to make sure that real books, documents, MP3's and all other Calibre-Web files are properly viewable, playable and downloadable? (Via Calibre-Web's regular username/password security model? Regardless whether or not Calibre-Web has been configured for passwordless / anonymous / kiosk-style browsing?)

The implementation of these pull requests does not alter the fundamental user experience in consuming content through Calibre-Web. Instead, their primary focus is on enabling the direct playback of video webm and mp4 files. Whether users have an account or not, the ability to browse and read books has always been available. The specific issue addressed pertained to users being served a "copy" of the book, which, when loaded into memory, resulted in an out-of-memory problem, especially with larger video files.

While the current fix does not comprehensively cover the download aspect, a workaround exists for videos. Users can click on the video to obtain its URL and subsequently save/download it. However, this workaround may not be as practical for books, where the conventional download button remains the more suitable option. It's worth noting that significant issues related to reading larger books have not been encountered thus far. Therefore, in terms of viewability and playability, the pull requests have proven effective. The downloadability aspect, however, is contingent on the specific content type and the workaround mentioned.


  1. When you say "Another PR will be made" are you referring to companion PR [Workaround for] OOM / memory "502 Bad Gateway" crashes w/ large videos' playback or download-to-client [SEE ALSO #37 & #53] #51 — or will another PR be needed? (If you're speaking about [Workaround for] OOM / memory "502 Bad Gateway" crashes w/ large videos' playback or download-to-client [SEE ALSO #37 & #53] #51, has that been tested to confirm out-of-memory problems no longer happen with large videos?)

The mention of "Another PR will be made" specifically alludes to an additional pull request that will be crafted to address the download aspect, particularly for media files that tend to be larger in size, such as videos and occasionally audios. While PR [Workaround for] OOM / memory "502 Bad Gateway" crashes w/ large videos' playback or download-to-client [SEE ALSO #37 & #53] #51 primarily focuses on resolving out-of-memory issues related to video playback, this forthcoming pull request aims to further optimize and enhance the download functionality for large media files.


  1. How can we make sure that very large books (e.g. 100's of MB!) and very long audiobooks / symphonies / radio recordings (e.g. 100's of MB!) and other giant media files do not likewise cause "502 Bad Gateway" out-of-memory crashes? As outlined within:

Calibre-Web has been designed with a primary focus on handling books. The issue of "502 Bad Gateway" out-of-memory crashes, as outlined in Some videos download as overweight (giant) files that (1) are not appropriate for schools, and (2) crash Calibre-Web with "502 Bad Gateway" #37 and Give everyone access to all books, intentionally ignoring Calibre-Web ACL accounts/authorization? #53, is primarily associated with specific scenarios involving videos, rather than the core functionality of serving books and possibly audiobooks.

Given the nature of books and audiobooks/podcasts, which typically have modest file sizes, the likelihood of encountering out-of-memory issues with very large size of them is minimal. The current architecture is adept at efficiently handling their storage, retrieval, and display, ensuring a stable and secure experience for users, even on devices with constrained resources.


  1. Can Lines 26-45 of scripts/calibre-web-nginx.conf possibly be tightened up substantially? And/Or better explained if possible?
    https://github.com/deldesir/calibre-web/blob/c362d63894fd4664d0d3c75208d04cd261a14a27/scripts/calibre-web-nginx.conf#L26-L45
    The above 21 lines (would appear!) to do several other things that we (might!) not need — above-and-beyond restricting public browsing to a few "potentially greedy / gigantic" file types. QUESTION: Can extraneous code ideally/possibly be removed here, towards avoiding software maintenance headaches down the road?

In terms of tightening it up, the commented-out lines (# fancyindex on; and # autoindex on;) were removed as they were deemed unnecessary for the current configuration. The regex patterns for file types and directories are combined for better readability and maintenance while ensuring the temporary security measures remain intact.

location /library/calibre-web/ {
    alias /library/www/html/calibre-web/;
    
    location ~* \.(webm|mp4)$ {
        # Configuration specific to video files
        add_header X-Robots-Tag "noindex, nofollow, nosnippet, noarchive";
        expires 30d;
        add_header Cache-Control "public, max-age=2592000";
        try_files $uri =404;  # Ensure only existing files are served
    }

    # Default deny for other file types and directories
    location ~* ^/library/calibre-web/.*\.(?!webm|mp4)[a-z0-9]+$,
             ~* ^/library/calibre-web/.*[^/]*$ {
        deny all;
        access_log off;
        log_not_found off;
    }
}

@holta
Copy link
Member

holta commented Dec 5, 2023

@deldesir thanks much for explaining as we work this out.

Can we please keep things extremely simple at this time? Ideally something like:

location /books-direct/ {
    alias /library/calibre-web/;

    # https://serverfault.com/a/222928
    location / { deny all; }
    location ~* \.(webm|mp4)$ { }
}

Or if absolutely necessary something like:

location /books-direct/ {
    alias /library/calibre-web/;

    location / { deny all; }
    location ~* \.(webm|mp4)$ { try_files $uri =404; }
}

@deldesir
Copy link
Collaborator Author

deldesir commented Dec 5, 2023

Can we please keep things extremely simple at this time? Ideally something like:

location /books-direct/ {
    alias /library/calibre-web/;

    # https://serverfault.com/a/222928
    location / { deny all; }
    location ~* \.(webm|mp4)$ { }
}

Absolutely, simplicity is key! I'll just make a small adjustment to it to ensure the intended access restriction.

location /books-direct/ {
    alias /library/calibre-web/;
    index off;  # Disable automatic index file processing

    # Allow serving of webm and mp4 files
    location ~* \.(webm|mp4)$ { }

    # Deny access to all other content within /books-direct/
    location /books-direct/ {deny all;}
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to keep in mind this is a temporary fix. We'll need to align with Calibre-Web architecture eventually.

Comment on lines 23 to 29
index off; # Disable automatic index file processing

# Allow serving of webm and mp4 files
location ~* \.(webm|mp4)$ { }

# Deny access to all other content within /books-direct/
location /books-direct/ {deny all;}
Copy link
Member

@holta holta Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. index off; appears erroneous:
    autoindex off; is what you mean I suspect? Specifically:
    https://nginx.org/en/docs/http/ngx_http_autoindex_module.html in contrast to...
    https://nginx.org/en/docs/http/ngx_http_index_module.html which is really used for index index.html; etc.

    (While we investigate preserving Calibre-Web's default behaviors, as much as possible anyway!)

  2. Many NGINX folks seem to prefer location /books-direct/ {deny all;} on top of the file — if only as a reminder — to make clear that NGINX examines all prefixes first (before regex's, which are typically put on the bottom of the file, to make that clear).

  3. Does location / { deny all; } also work? (Or is longer-form location /books-direct/ { deny all; } truly required?)

Suggested change
index off; # Disable automatic index file processing
# Allow serving of webm and mp4 files
location ~* \.(webm|mp4)$ { }
# Deny access to all other content within /books-direct/
location /books-direct/ {deny all;}
#autoindex off; # Directory listings (should be off by default?)
# Deny access to all other content: https://serverfault.com/a/222928
location /books-direct/ { deny all; }
# Allow serving of webm and mp4 files
location ~* \.(webm|mp4)$ { }

Copy link
Collaborator Author

@deldesir deldesir Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1- I meant autoindex off;, thanks for catching this. And yes, it's off by default.

2- Regarding the placement of location /books-direct/ {deny all;}, it won't work for me if it's on top. I would get a "404 not found" error.

3- Adding location / { deny all; } will result in the following error:

root@box:~# nginx -t
2023/12/05 15:19:23 [emerg] 2387#2387: location "/" is outside location "/books-direct/" in /etc/nginx/conf.d/calibre-web-nginx.conf:36
nginx: configuration file /etc/nginx/nginx.conf test failed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2- Regarding the placement of location /books-direct/ {deny all;}, it won't work for me if it's on top. I would get a "404 not found" error.

"404 not found" error occured with which URL(s) ?

Copy link
Member

@holta holta Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deldesir I'm not sure what you tested. Please explain.

This PR works correctly as is:

(1) I tested http://box/books-direct/ == http://10.8.0.50/books-direct/ which correctly shows 403 Forbidden.

(2) The following URL correctly plays its (tiny 256x144 rectangle!) video...

http://box/books-direct/Chubbyemu/A%20Man%20Ate%20Only%20Instant%20Noodles%20For%206%20Months.%20This%20Is%20What%20Happened%20To%20His%20Organs_%20%281%29/A%20Man%20Ate%20Only%20Instant%20Noodles%20For%206%20Month%20-%20Chubbyemu.webm

( After I clicked "Download to IIAB" and entered URL https://www.youtube.com/watch?v=fDB4TpZIgzQ )

@holta
Copy link
Member

holta commented Dec 5, 2023

  1. How can we make sure that very large books (e.g. 100's of MB!) and very long audiobooks / symphonies / radio recordings (e.g. 100's of MB!) and other giant media files do not likewise cause "502 Bad Gateway" out-of-memory crashes? As outlined within [Some videos download as overweight (giant) files that (1) are not appropriate for schools, and (2) crash Calibre-Web with "502 Bad Gateway" #37 & Give everyone access to all books, intentionally ignoring Calibre-Web ACL accounts/authorization? #53]

Given the nature of books and audiobooks/podcasts, which typically have modest file sizes, the likelihood of encountering out-of-memory issues with very large size of them is minimal. The current architecture is adept at efficiently handling their storage, retrieval, and display, ensuring a stable and secure experience for users, even on devices with constrained resources.

@deldesir How will Raspberry Pi Zero 2 W (which has just 512MB RAM) serve multi-hour audiobooks and multi-hour symphonies[*] to many people at the same time?

[*] Both of which are common on YouTube and other places, for background listening!

@holta holta changed the title Deny wide access to entire calibre-web library directory Restrict NGINX access to "large" files only (e.g. .mp3, .mp4, .webm) in /library/calibre-web — for PR #51 OOM workaround Dec 6, 2023
@holta holta changed the title Restrict NGINX access to "large" files only (e.g. .mp3, .mp4, .webm) in /library/calibre-web — for PR #51 OOM workaround Restrict NGINX (public?) access to "large" files only (e.g. .mp3, .mp4, .webm) in /library/calibre-web — for PR #51 OOM workaround Dec 6, 2023
Copy link
Collaborator Author

@deldesir deldesir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. This was smoke-tested on Ubuntu 24.04.

@deldesir deldesir closed this Jul 1, 2024
@deldesir deldesir deleted the deldesir-access-fix branch July 1, 2024 21:42
@deldesir deldesir restored the deldesir-access-fix branch July 1, 2024 22:02
@deldesir deldesir reopened this Jul 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Give everyone access to all books, intentionally ignoring Calibre-Web ACL accounts/authorization?
2 participants