Prevent API service from becoming totally unresponsive/DoS #2085

yarikoptic · 2024-11-26T15:58:14Z

Last Saturday (Nov 23, 2024) the main archive became unresponsive because it was unable to communicate with the API server. To a visitor it hung for a while, eventually showing

More on the situation can be found in Slack: https://app.slack.com/client/E01044K0LBZ/GMRLT5RQ8

A sample of logs from around that point in time, in case someone decides to check:

2024-11-23T01:25:01.383611+00:00 app[web.1]: 10.1.87.10 - - [23/Nov/2024:01:25:01 +0000] "GET /api/dandisets/000026/versions/draft/assets/?path=sub-I48%2Fses-SPIM%2Fmicr%2Fsub-I48_ses-SPIM_sample-BrocaAreaS08_stain-Calretinin_SPIM.ome.zarr&metadata=1&order=path HTTP/1.1" 200 1684 "-" "dandidav/0.5.0 (https://github.com/dandi/dandidav)"
2024-11-23T01:25:01.626650+00:00 app[web.1]: 10.1.61.240 - - [23/Nov/2024:01:25:01 +0000] "GET /api/dandisets/000026/versions/draft/assets/?path=sub-I48%2Fses-SPIM%2Fmicr%2Fsub-I48_ses-SPIM_sample-BrocaAreaS09_stain-Nuclei_SPIM.ome.zarr&metadata=1&order=path HTTP/1.1" 200 1676 "-" "dandidav/0.5.0 (https://github.com/dandi/dandidav)"
2024-11-23T01:25:01.466962+00:00 app[analytics-worker.1]: [2024-11-23 01:25:01,466: INFO/ForkPoolWorker-1] Task dandiapi.analytics.tasks.process_s3_log_file_task[44b248eb-084c-4fc0-b43b-30e00a4c4783] succeeded in 0.9786828100041021s: None

and the specific "trigger" for the situation is webdav needing to make per-asset requests on dandiset 000026, which has a very large number of assets. The particular issue to be addressed to allow for a more efficient API is

Add endpoint for querying a folder or asset path in a Dandiset #1837

But the point of this issue is different. IMHO the API service should be made more robust against DoS situations where one client or a specific set of IPs hogs it, locking out everyone else entirely. Possibly it could be done via limiting but I think it is worth looking into some "QoS" (quality of service) balancing/throttling.
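To make the "QoS balancing" idea concrete, here is a minimal sketch of round-robin scheduling over per-client queues, so that a client with a long backlog cannot starve the others. It is only an illustration: `FairQueue`, `drain`, and the client labels are hypothetical names, not part of the API server, and a real deployment would more likely do this at the proxy or worker-pool level.

```python
from collections import OrderedDict, deque


class FairQueue:
    """Round-robin over per-client queues (hypothetical helper, not dandi-archive code)."""

    def __init__(self):
        # client id -> pending requests; dict order is the current rotation order
        self._queues = OrderedDict()

    def put(self, client, request):
        """Queue a request under the client that sent it."""
        self._queues.setdefault(client, deque()).append(request)

    def get(self):
        """Return (client, request) for the next client in rotation, or None if idle."""
        if not self._queues:
            return None
        client, pending = next(iter(self._queues.items()))
        request = pending.popleft()
        del self._queues[client]
        if pending:
            # Client still has work: move it to the back of the rotation.
            self._queues[client] = pending
        return client, request


def drain(queue):
    """Collect everything the queue would serve, in service order."""
    order = []
    while (item := queue.get()) is not None:
        order.append(item)
    return order


queue = FairQueue()
for i in range(3):
    queue.put("heavy-client", f"asset-{i}")  # one client with a backlog
queue.put("visitor", "homepage")             # a second client arriving later
served = drain(queue)
```

In this sketch the visitor's single request is served second, right after the first request of the heavy client, instead of waiting behind the whole backlog.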

waxlamp · 2024-12-12T14:40:11Z

> Possibly it could be done via limiting but I think it is worth looking into some "QoS" (quality of service) balancing/throttling.

The standard solution is rate limiting. QoS balancing/throttling sounds orders of magnitude more complex. If we want to look into it as a research-type solution, that's fine, but rate limiting is really the thing to do.
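For reference, a minimal sketch of per-client rate limiting using a token bucket, one common way to implement it. The class names, the rate and capacity values, and the client labels are assumptions for illustration only; nothing here reflects how the API server is or will be configured. A rejected request would typically be answered with HTTP 429 (Too Many Requests).

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Spend one token if available; `now` can be injected for testing."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time elapsed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """One bucket per client key (for example, an IP address or API token)."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self._buckets = {}

    def allow(self, client, now=None):
        bucket = self._buckets.get(client)
        if bucket is None:
            bucket = self._buckets[client] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)


# Hypothetical policy: 1 request/second sustained, bursts of 2.
limiter = RateLimiter(rate=1.0, capacity=2)
burst = [limiter.allow("client-a", now=0.0) for _ in range(3)]
other = limiter.allow("client-b", now=0.0)
later = limiter.allow("client-a", now=1.0)
```

Here the third request in the burst from `client-a` is refused while `client-b` is unaffected, and `client-a` is allowed again once a second has passed. The sketch keeps buckets in an in-process dict that never shrinks; a multi-worker service would need shared storage and eviction of idle clients.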

yarikoptic added the performance label (Improve performance of an existing feature) Nov 26, 2024

yarikoptic mentioned this issue Dec 10, 2024

Design and implement rate limiting policy #1902

Open
