Reboot results in non-working IMAGE server #926

Open
jeffbl opened this issue Dec 9, 2024 · 3 comments

jeffbl commented Dec 9, 2024

TL;DR: After pegasus reboots, some containers are not running or not working until I manually run docker compose up -d.

Rebooted pegasus.
Sent a request for a photo.
Expected: full results.
Actual (logs below):

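  • The memcached container is not running, so the request takes a very long time and the orchestrator logs cache errors.
  • semseg fails to return results (the orchestrator times out waiting for it). In the logs below, the CUDA error (torch.cuda.is_available() is False) is raised by depth-map-generator.
  • YOLO does run, so object detection works.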
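To confirm which services did not come back after the reboot (memcached in this case), a small check along these lines may help. It is a sketch that assumes Docker Compose v2 and the project directory /var/docker/image from this issue: it compares the services defined in the compose file (docker compose config --services) with those currently running (docker compose ps --services --filter status=running). The helper names are hypothetical.

```python
"""Report compose services that are defined but not running (sketch)."""
import subprocess

COMPOSE_DIR = "/var/docker/image"  # project directory from this issue


def parse_names(output: str) -> set[str]:
    """Turn one-name-per-line CLI output into a set, skipping blank lines."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def missing_services(defined: str, running: str) -> list[str]:
    """Names in `defined` that do not appear in `running`, sorted."""
    return sorted(parse_names(defined) - parse_names(running))


def compose(*args: str) -> str:
    """Run `docker compose <args>` in COMPOSE_DIR and return its stdout."""
    result = subprocess.run(
        ["docker", "compose", *args],
        cwd=COMPOSE_DIR,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def main() -> None:
    defined = compose("config", "--services")
    running = compose("ps", "--services", "--filter", "status=running")
    for name in missing_services(defined, running):
        print(f"not running: {name}")
```

main() could be called from a script on pegasus right after a reboot (for example with the usual if __name__ == "__main__": main() footer). Services that are expected to exit after doing their job would also be listed, so the output still needs a human eye.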

Workaround:
Go to /var/docker/image and run docker compose up -d.
Note: a number of containers restart or start.
Note: once this finishes, requests work correctly.

Expected: a reboot of pegasus should automatically result in a working system unless there is a serious problem.
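Two things may be worth checking here. First, whether memcached has a restart: policy in the compose file: with Docker's default policy (no), a container is not started again when the Docker daemon comes back after a reboot. Second, one possible explanation for the CUDA errors, not confirmed by these logs, is that the GPU containers start before the NVIDIA driver is ready, and the manual docker compose up -d succeeds simply because the driver is ready by the time it runs. If that theory holds, a boot-time step that waits for the GPU and then runs the same command as the workaround could look like the sketch below. It assumes nvidia-smi is on PATH and exits with status 0 once the driver is loaded; the 300 s timeout and 5 s poll interval are arbitrary, and the function names are hypothetical.

```python
"""Wait for the NVIDIA driver, then start the stack (sketch)."""
import subprocess
import sys
import time
from typing import Callable

COMPOSE_DIR = "/var/docker/image"  # project directory from this issue


def wait_until(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` every `interval_s` seconds; False if `timeout_s` passes first."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def gpu_ready() -> bool:
    """Assumption: a zero exit status from nvidia-smi means the GPU is usable."""
    try:
        return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0
    except OSError:  # nvidia-smi missing or not executable
        return False


def main(timeout_s: float = 300, interval_s: float = 5) -> None:
    if not wait_until(gpu_ready, timeout_s, interval_s):
        sys.exit(f"GPU not ready after {timeout_s} s; not starting the stack")
    # Same command as the manual workaround described in this issue.
    subprocess.run(["docker", "compose", "up", "-d"], cwd=COMPOSE_DIR, check=True)
```

main() would still need to be triggered at boot, for instance by a systemd unit ordered after docker.service; that wiring is not shown and is an assumption about how pegasus is managed. The sleep and clock parameters exist only so the polling logic can be tested without waiting.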

orchestrator-1                   | 2024-12-09T17:30:11.007759948Z Received request                                                                                                                                                                                                                                                                     
orchestrator-1                   | 2024-12-09T17:30:11.012597582Z Running preprocessors in parallel...                                                                                                                                                                                                                                                 
orchestrator-1                   | 2024-12-09T17:30:11.012610666Z Now on priority group 1                                                                                                                                                                                                                                                              
image-pegasus-cim-mcgill-ca-1    | 2024-12-09T17:30:13.205921020Z 50.17.185.102 - - [09/Dec/2024:17:30:13 +0000] "GET / HTTP/1.1" 200 10785 "-" "FreshpingBot/1.0 (+https://freshping.io/)"                                                                                                     
orchestrator-1                   | 2024-12-09T17:30:13.214607590Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.                                                                                                                  
orchestrator-1                   | 2024-12-09T17:30:13.214617499Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.                                                                                                                  
orchestrator-1                   | 2024-12-09T17:30:13.214620124Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.                                                                                                                  
orchestrator-1                   | 2024-12-09T17:30:13.214622328Z Error getting response from the cache                                                                                                     
orchestrator-1                   | 2024-12-09T17:30:13.214624292Z Error getting response from the cache                                                                                                     
orchestrator-1                   | 2024-12-09T17:30:13.214626145Z Error getting response from the cache                                                                                                     
orchestrator-1                   | 2024-12-09T17:30:13.214627989Z Sending to preprocessor "collage-detector-preprocessor"                                                                                 
orchestrator-1                   | 2024-12-09T17:30:13.215209574Z Sending to preprocessor "content-categoriser"
orchestrator-1                   | 2024-12-09T17:30:13.215517309Z Sending to preprocessor "autour-preprocessor"
collage-detector-preprocessor-1  | 2024-12-09T17:30:16.019181161Z [2024-12-09 17:30:16 +0000] [7] [DEBUG] POST /preprocessor                                                                                                                          
content-categoriser-1            | 2024-12-09T17:30:16.019395379Z [2024-12-09 17:30:16 +0000] [7] [DEBUG] POST /preprocessor                                                                                                                          
autour-preprocessor-1            | 2024-12-09T17:30:16.019422971Z [2024-12-09 17:30:16 +0000] [7] [DEBUG] POST /preprocessor                                                                                                                          
collage-detector-preprocessor-1  | 2024-12-09T17:30:16.019414375Z DEBUG:root:Received request                             
autour-preprocessor-1            | 2024-12-09T17:30:16.019629854Z DEBUG:root:Received request                                          
content-categoriser-1            | 2024-12-09T17:30:16.019648550Z DEBUG:root:Received request                                          
autour-preprocessor-1            | 2024-12-09T17:30:16.021040146Z INFO:root:Not map content. Skipping...                               
content-categoriser-1            | 2024-12-09T17:30:16.022304609Z DEBUG:fsspec.local:open file: /app/latest-0.ckpt         
content-categoriser-1            | 2024-12-09T17:30:16.041385073Z Lightning automatically upgraded your loaded checkpoint from v1.4.4 to v1.9.0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file latest-0.ckpt`            
content-categoriser-1            | 2024-12-09T17:30:16.148506130Z DEBUG:root:{'category': 'photograph'}                                
orchestrator-1                   | 2024-12-09T17:30:16.149748121Z Saving Response for ca.mcgill.a11y.image.preprocessor.contentCategoriser in cache with key b5732e9d1ad8eeccc4496f41fba8b5a68b048779                                                                                           
orchestrator-1                   | 2024-12-09T17:30:16.149769241Z storing data in memcache with key b5732e9d1ad8eeccc4496f41fba8b5a68b048779                                                                                                                                    
collage-detector-preprocessor-1  | 2024-12-09T17:30:16.196171065Z DEBUG:root:{'collage': False}                                                                                                                                                                                 
orchestrator-1                   | 2024-12-09T17:30:16.196814388Z Saving Response for ca.mcgill.a11y.image.preprocessor.collageDetector in cache with key 95e22bb42bd7e0d017d0096ac60b119f81deaa91                                                                                              
orchestrator-1                   | 2024-12-09T17:30:16.196824758Z storing data in memcache with key 95e22bb42bd7e0d017d0096ac60b119f81deaa91                                                                                                                                    
orchestrator-1                   | 2024-12-09T17:30:16.196923986Z Now on priority group 2                                                                                                                                                                                       
orchestrator-1                   | 2024-12-09T17:30:18.351419135Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.                                                                                                                  
orchestrator-1                   | 2024-12-09T17:30:18.351436818Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.                                                                                                                  
orchestrator-1                   | 2024-12-09T17:30:18.351439033Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.                                                                                                                  
orchestrator-1                   | 2024-12-09T17:30:18.351440836Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.
orchestrator-1                   | 2024-12-09T17:30:18.351442359Z Error setting response in the cache                                                                      
orchestrator-1                   | 2024-12-09T17:30:18.351443732Z Error setting response in the cache                                                                     
orchestrator-1                   | 2024-12-09T17:30:18.351445104Z Error getting response from the cache                                                                   
orchestrator-1                   | 2024-12-09T17:30:18.351446547Z Error getting response from the cache                                                                   
orchestrator-1                   | 2024-12-09T17:30:18.351447930Z Saved Response for ca.mcgill.a11y.image.preprocessor.contentCategoriser in cache with key b5732e9d1ad8eeccc4496f41fba8b5a68b048779
orchestrator-1                   | 2024-12-09T17:30:18.351449703Z Saved Response for ca.mcgill.a11y.image.preprocessor.collageDetector in cache with key 95e22bb42bd7e0d017d0096ac60b119f81deaa91
orchestrator-1                   | 2024-12-09T17:30:18.351451156Z Sending to preprocessor "graphic-tagger"                                                                 
orchestrator-1                   | 2024-12-09T17:30:18.352048582Z Sending to preprocessor "nominatim-preprocessor"                                                        
graphic-tagger-1                 | 2024-12-09T17:30:21.156337694Z [2024-12-09 17:30:21 +0000] [7] [DEBUG] POST /preprocessor                                               
graphic-tagger-1                 | 2024-12-09T17:30:21.156531312Z DEBUG:root:Received request                                                                             
nominatim-preprocessor-1         | 2024-12-09T17:30:21.157203721Z Received request                                                                                         
nominatim-preprocessor-1         | 2024-12-09T17:30:21.157347384Z Coordinates not available, cannot make a request for reverse geocode.                                   
graphic-tagger-1                 | 2024-12-09T17:30:21.158307120Z DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): canadacentral.api.cognitive.microsoft.com:443                                                                                                                                                                        
graphic-tagger-1                 | 2024-12-09T17:30:21.423153920Z DEBUG:urllib3.connectionpool:https://canadacentral.api.cognitive.microsoft.com:443 "POST /vision/v1.0/analyze?visualFeatures=Categories HTTP/11" 200 159                                                                                                                             
graphic-tagger-1                 | 2024-12-09T17:30:21.423680972Z DEBUG:root:{'category': 'other'}                                                                         
orchestrator-1                   | 2024-12-09T17:30:21.424601322Z Saving Response for ca.mcgill.a11y.image.preprocessor.graphicTagger in cache with key 54354cce78799b32bfc3f7377a848694b4a104cf
orchestrator-1                   | 2024-12-09T17:30:21.424611672Z storing data in memcache with key 54354cce78799b32bfc3f7377a848694b4a104cf                               
orchestrator-1                   | 2024-12-09T17:30:21.424733955Z Now on priority group 3                                                                                  
orchestrator-1                   | 2024-12-09T17:30:23.557707441Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.
orchestrator-1                   | 2024-12-09T17:30:23.557732158Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.
orchestrator-1                   | 2024-12-09T17:30:23.557736626Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.
orchestrator-1                   | 2024-12-09T17:30:23.557740344Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.
orchestrator-1                   | 2024-12-09T17:30:23.557743810Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.
orchestrator-1                   | 2024-12-09T17:30:23.557747217Z Error setting response in the cache                                                                      
orchestrator-1                   | 2024-12-09T17:30:23.557920095Z Error getting response from the cache                                                                    
orchestrator-1                   | 2024-12-09T17:30:23.557936256Z Error getting response from the cache                                                                    
orchestrator-1                   | 2024-12-09T17:30:23.557974549Z Error getting response from the cache                                                                    
orchestrator-1                   | 2024-12-09T17:30:23.557977104Z Error getting response from the cache                                                                    
orchestrator-1                   | 2024-12-09T17:30:23.557979318Z Saved Response for ca.mcgill.a11y.image.preprocessor.graphicTagger in cache with key 54354cce78799b32bfc3f7377a848694b4a104cf
orchestrator-1                   | 2024-12-09T17:30:23.557981582Z Sending to preprocessor "depth-map-generator"                                                            
orchestrator-1                   | 2024-12-09T17:30:23.558557798Z Sending to preprocessor "semantic-segmentation"                                                          
orchestrator-1                   | 2024-12-09T17:30:23.558852379Z Sending to preprocessor "openstreetmap"                                                                  
orchestrator-1                   | 2024-12-09T17:30:23.559156788Z Sending to preprocessor "object-detection"                                                               
depth-map-generator-1            | 2024-12-09T17:30:26.431717576Z [2024-12-09 17:30:26 +0000] [7] [DEBUG] POST /preprocessor                                               
semantic-segmentation-1          | 2024-12-09T17:30:26.431769906Z [2024-12-09 17:30:26 +0000] [8] [DEBUG] POST /preprocessor                                               
openstreetmap-1                  | 2024-12-09T17:30:26.431877831Z [2024-12-09 17:30:26 +0000] [7] [DEBUG] POST /preprocessor                                               
depth-map-generator-1            | 2024-12-09T17:30:26.431922155Z DEBUG:root:Received request                                                                              
object-detection-1               | 2024-12-09T17:30:26.431961059Z [2024-12-09 17:30:26 +0000] [7] [DEBUG] POST /preprocessor                                               
semantic-segmentation-1          | 2024-12-09T17:30:26.431987229Z DEBUG:root:Received request                                                                              
openstreetmap-1                  | 2024-12-09T17:30:26.432081348Z 24-12-09 17:30 UTC [DEBUG]: Received request                                                             
object-detection-1               | 2024-12-09T17:30:26.432161952Z DEBUG:root:Received request                                                                              
openstreetmap-1                  | 2024-12-09T17:30:26.432928029Z 24-12-09 17:30 UTC [DEBUG]: Validating request                                                           
openstreetmap-1                  | 2024-12-09T17:30:26.433742338Z 24-12-09 17:30 UTC [INFO]: Not map content. Skipping...                                                  
semantic-segmentation-1          | 2024-12-09T17:30:26.477368438Z INFO:root:Schemas loaded                                                                                 
semantic-segmentation-1          | 2024-12-09T17:30:27.228898809Z load checkpoint from local path: /app/upernet_beit-base_8x2_640x640_160k_ade20k-eead221d.pth             
depth-map-generator-1            | 2024-12-09T17:30:27.437861723Z ERROR:depth-map-generator:Exception on /preprocessor [POST]                                              
depth-map-generator-1            | 2024-12-09T17:30:27.437889065Z Traceback (most recent call last):                                                                       
depth-map-generator-1            | 2024-12-09T17:30:27.437893263Z   File "/opt/conda/lib/python3.9/site-packages/flask/app.py", line 2529, in wsgi_app                     
depth-map-generator-1            | 2024-12-09T17:30:27.437896930Z     response = self.full_dispatch_request()                                                              
depth-map-generator-1            | 2024-12-09T17:30:27.437900156Z   File "/opt/conda/lib/python3.9/site-packages/flask/app.py", line 1825, in full_dispatch_request        
depth-map-generator-1            | 2024-12-09T17:30:27.437903552Z     rv = self.handle_user_exception(e)                                                                   
depth-map-generator-1            | 2024-12-09T17:30:27.437906959Z   File "/opt/conda/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request        
depth-map-generator-1            | 2024-12-09T17:30:27.437910165Z     rv = self.dispatch_request()                                                                         
depth-map-generator-1            | 2024-12-09T17:30:27.437913551Z   File "/opt/conda/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request             
depth-map-generator-1            | 2024-12-09T17:30:27.437916788Z     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)                             
depth-map-generator-1            | 2024-12-09T17:30:27.437919954Z   File "/app/depth-map-generator.py", line 135, in depthgenerator                                        
depth-map-generator-1            | 2024-12-09T17:30:27.437923100Z     checkpoint = torch.load("/app/res101.pth")                                                           
depth-map-generator-1            | 2024-12-09T17:30:27.437926326Z   File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 789, in load                
depth-map-generator-1            | 2024-12-09T17:30:27.437929582Z     return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)                        
depth-map-generator-1            | 2024-12-09T17:30:27.437932688Z   File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 1131, in _load              
depth-map-generator-1            | 2024-12-09T17:30:27.437935914Z     result = unpickler.load()                                                                            
depth-map-generator-1            | 2024-12-09T17:30:27.437939050Z   File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 1101, in persistent_load    
depth-map-generator-1            | 2024-12-09T17:30:27.437952155Z     load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))                                       
depth-map-generator-1            | 2024-12-09T17:30:27.437954249Z   File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 1083, in load_tensor        
depth-map-generator-1            | 2024-12-09T17:30:27.437956253Z     wrap_storage=restore_location(storage, location),                                                    
depth-map-generator-1            | 2024-12-09T17:30:27.437958146Z   File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 215, in default_restore_location
depth-map-generator-1            | 2024-12-09T17:30:27.437960150Z     result = fn(storage, location)                                                                       
depth-map-generator-1            | 2024-12-09T17:30:27.437962014Z   File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 182, in _cuda_deserialize   
depth-map-generator-1            | 2024-12-09T17:30:27.437963957Z     device = validate_cuda_device(location)                                                              
depth-map-generator-1            | 2024-12-09T17:30:27.437965941Z   File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 166, in validate_cuda_device 
depth-map-generator-1            | 2024-12-09T17:30:27.437967985Z     raise RuntimeError('Attempting to deserialize object on a CUDA '                                     
depth-map-generator-1            | 2024-12-09T17:30:27.437970690Z RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
object-detection-1               | 2024-12-09T17:30:28.176741760Z DEBUG:root:Object Detected -person                                                                       
object-detection-1               | 2024-12-09T17:30:28.177643135Z DEBUG:root:Total number of Objects Detected - 1                                                          
object-detection-1               | 2024-12-09T17:30:28.177665037Z DEBUG:root:Sending response                                                                              
orchestrator-1                   | 2024-12-09T17:30:28.178407980Z Saving Response for ca.mcgill.a11y.image.preprocessor.objectDetection in cache with key 9748a649396a45489d2b5cdd84c0d93b4b07fb88
orchestrator-1                   | 2024-12-09T17:30:28.178420785Z storing data in memcache with key 9748a649396a45489d2b5cdd84c0d93b4b07fb88                               
orchestrator-1                   | 2024-12-09T17:30:30.380313270Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.
orchestrator-1                   | 2024-12-09T17:30:30.380331164Z Error setting response in the cache                                                                      
orchestrator-1                   | 2024-12-09T17:30:30.380335453Z Saved Response for ca.mcgill.a11y.image.preprocessor.objectDetection in cache with key 9748a649396a45489d2b5cdd84c0d93b4b07fb88
orchestrator-1                   | 2024-12-09T17:30:36.428220742Z Error occured fetching from semantic-segmentation                                                        
orchestrator-1                   | 2024-12-09T17:30:36.428642816Z AbortError: The user aborted a request.                                                                  
orchestrator-1                   | 2024-12-09T17:30:36.428649819Z     at abort (/usr/src/app/node_modules/node-fetch/lib/index.js:1448:16)                                 
orchestrator-1                   | 2024-12-09T17:30:36.428652314Z     at AbortSignal.abortAndFinalize (/usr/src/app/node_modules/node-fetch/lib/index.js:1463:4)           
orchestrator-1                   | 2024-12-09T17:30:36.428654288Z     at [nodejs.internal.kHybridDispatch] (node:internal/event_target:816:20)                             
orchestrator-1                   | 2024-12-09T17:30:36.428656191Z     at AbortSignal.dispatchEvent (node:internal/event_target:751:26)                                     
orchestrator-1                   | 2024-12-09T17:30:36.428658005Z     at abortSignal (node:internal/abort_controller:374:10)                                               
orchestrator-1                   | 2024-12-09T17:30:36.428659838Z     at AbortController.abort (node:internal/abort_controller:396:5)                                      
orchestrator-1                   | 2024-12-09T17:30:36.428661632Z     at Timeout._onTimeout (/usr/src/app/dist/server.js:133:32)                                           
orchestrator-1                   | 2024-12-09T17:30:36.428663445Z     at listOnTimeout (node:internal/timers:581:17)                                                       
orchestrator-1                   | 2024-12-09T17:30:36.428665299Z     at process.processTimers (node:internal/timers:519:7) {                                              
orchestrator-1                   | 2024-12-09T17:30:36.428667112Z   type: 'aborted'                                                                                        
orchestrator-1                   | 2024-12-09T17:30:36.428668855Z }
orchestrator-1                   | 2024-12-09T17:30:36.428757554Z Error occured on fetch from http://depth-map-generator:5000/preprocessor                                 
orchestrator-1                   | 2024-12-09T17:30:36.428798031Z FetchError: invalid json response body at http://depth-map-generator:5000/preprocessor reason: Unexpected token '<', "<!doctype "... is not valid JSON
orchestrator-1                   | 2024-12-09T17:30:36.428806568Z     at /usr/src/app/node_modules/node-fetch/lib/index.js:273:32                                          
orchestrator-1                   | 2024-12-09T17:30:36.428810044Z     at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {                      
orchestrator-1                   | 2024-12-09T17:30:36.428826335Z   type: 'invalid-json'                                                                                   
orchestrator-1                   | 2024-12-09T17:30:36.428829060Z }                                                                                                        
orchestrator-1                   | 2024-12-09T17:30:36.428840612Z Now on priority group 4                                                                                  
orchestrator-1                   | 2024-12-09T17:30:38.631855602Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.
orchestrator-1                   | 2024-12-09T17:30:38.631880199Z Error getting response from the cache                                                                    
orchestrator-1                   | 2024-12-09T17:30:38.631883836Z Sending to preprocessor "object-grouping"                                                                
object-grouping-1                | 2024-12-09T17:30:41.434526434Z [2024-12-09 17:30:41 +0000] [8] [DEBUG] POST /preprocessor                                               
object-grouping-1                | 2024-12-09T17:30:41.434755621Z DEBUG:root:Received request                                                                              
object-grouping-1                | 2024-12-09T17:30:41.436348374Z DEBUG:root:Number of groups 0                                                                            
object-grouping-1                | 2024-12-09T17:30:41.436358043Z DEBUG:root:Number of ungrouped objects 1                                                                 
object-grouping-1                | 2024-12-09T17:30:41.436510333Z DEBUG:root:Sending response                                                                              
orchestrator-1                   | 2024-12-09T17:30:41.436999865Z Saving Response for ca.mcgill.a11y.image.preprocessor.grouping in cache with key 875d2a7e27961f8b04c15c1915659d8b0e968d70
orchestrator-1                   | 2024-12-09T17:30:41.437007941Z storing data in memcache with key 875d2a7e27961f8b04c15c1915659d8b0e968d70                               
orchestrator-1                   | 2024-12-09T17:30:41.437097792Z Now on priority group 5                                                                                  
image-pegasus-cim-mcgill-ca-1    | 2024-12-09T17:30:42.998779485Z 50.17.185.102 - - [09/Dec/2024:17:30:42 +0000] "GET / HTTP/1.1" 200 10785 "-" "FreshpingBot/1.0 (+https://freshping.io/)"
orchestrator-1                   | 2024-12-09T17:30:43.638811937Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.
orchestrator-1                   | 2024-12-09T17:30:43.638830582Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.
orchestrator-1                   | 2024-12-09T17:30:43.638835461Z Error setting response in the cache                                                                      
orchestrator-1                   | 2024-12-09T17:30:43.638839459Z Error getting response from the cache                                                                    
orchestrator-1                   | 2024-12-09T17:30:43.638843126Z Saved Response for ca.mcgill.a11y.image.preprocessor.grouping in cache with key 875d2a7e27961f8b04c15c1915659d8b0e968d70
orchestrator-1                   | 2024-12-09T17:30:43.638847043Z Sending to preprocessor "object-sorting"                                                                 
object-sorting-1                 | 2024-12-09T17:30:46.442784758Z [2024-12-09 17:30:46 +0000] [8] [DEBUG] POST /preprocessor                                               
object-sorting-1                 | 2024-12-09T17:30:46.442998045Z DEBUG:root:Received request                                                                              
object-sorting-1                 | 2024-12-09T17:30:46.445036638Z DEBUG:root:Sending response                                                                              
orchestrator-1                   | 2024-12-09T17:30:46.445764445Z Saving Response for ca.mcgill.a11y.image.preprocessor.sorting in cache with key 3042e0502599dd755b9f2817dc982a7a46213b0a
orchestrator-1                   | 2024-12-09T17:30:46.445789633Z storing data in memcache with key 3042e0502599dd755b9f2817dc982a7a46213b0a                               
orchestrator-1                   | 2024-12-09T17:30:46.448157434Z Waiting for handlers...                                                                                  
map-tactile-svg-handler-1        | 2024-12-09T17:30:47.644066667Z [2024-12-09 17:30:47 +0000] [7] [DEBUG] POST /handler                                                    
map-tactile-svg-handler-1        | 2024-12-09T17:30:47.644241380Z DEBUG:root:Received request                                                                              
svg-od-handler-1                 | 2024-12-09T17:30:47.644297828Z [2024-12-09 17:30:47 +0000] [7] [DEBUG] POST /handler                                                    
svg-depth-map-1                  | 2024-12-09T17:30:47.644592669Z [2024-12-09 17:30:47 +0000] [7] [DEBUG] POST /handler                                                    
photo-tactile-svg-handler-1      | 2024-12-09T17:30:47.644846703Z [2024-12-09 17:30:47 +0000] [7] [DEBUG] POST /handler                                                    
svg-semantic-seg-handler-1       | 2024-12-09T17:30:47.644936875Z [2024-12-09 17:30:47 +0000] [7] [DEBUG] POST /handler                                                    
photo-tactile-svg-handler-1      | 2024-12-09T17:30:47.645139291Z DEBUG:root:Received request                                                                              
map-tactile-svg-handler-1        | 2024-12-09T17:30:47.645194125Z DEBUG:root:TactileSVG Renderer not supported                                                             
map-tactile-svg-handler-1        | 2024-12-09T17:30:47.645241796Z DEBUG:root:Missing 'ca.mcgill.a11y.image.renderer.TactileSVG'. Sending empty response.                   
photo-audio-haptics-handler-1    | 2024-12-09T17:30:47.645295819Z Received request                                                                                         
high-charts-handler-1            | 2024-12-09T17:30:47.645336136Z Received request                                                                                         
autour-handler-1                 | 2024-12-09T17:30:47.645378066Z Request received!                                                                                        
photo-audio-haptics-handler-1    | 2024-12-09T17:30:47.645412020Z Photo audio-haptic renderer not supported!                                                               
photo-audio-handler-1            | 2024-12-09T17:30:47.645456315Z Received request                                                                                         
autour-handler-1                 | 2024-12-09T17:30:47.645474069Z Not enough data to generate a rendering.                                                                 
high-charts-handler-1            | 2024-12-09T17:30:47.645526829Z No high charts data in this request. Skipping.                                                           
photo-audio-handler-1            | 2024-12-09T17:30:47.645557578Z Rendering title                                                                                          
photo-audio-handler-1            | 2024-12-09T17:30:47.645561746Z Constructing text renderings                                                                             
photo-audio-handler-1            | 2024-12-09T17:30:47.645564571Z Generating TTS Response                                                                                  
photo-audio-handler-1            | 2024-12-09T17:30:47.645607172Z Getting TTS in "en"                                                                                      
photo-tactile-svg-handler-1      | 2024-12-09T17:30:47.645761105Z DEBUG:root:Validating request schema                                                                     
photo-tactile-svg-handler-1      | 2024-12-09T17:30:47.646392609Z DEBUG:root:Checking whether renderer is supported                                                        
photo-tactile-svg-handler-1      | 2024-12-09T17:30:47.646398149Z DEBUG:root:TactileSVG Renderer not supported                                                             
photo-tactile-svg-handler-1      | 2024-12-09T17:30:47.646472010Z DEBUG:root:Sending response                                                                              
espnet-tts-1                     | 2024-12-09T17:30:47.646960140Z [2024-12-09 17:30:47 +0000] [9] [DEBUG] POST /service/tts/segments                                       
espnet-tts-1                     | 2024-12-09T17:30:47.647224383Z 2024-12-09 17:30:47,647 Received request                                                                 
espnet-tts-1                     | 2024-12-09T17:30:47.672206028Z 2024-12-09 17:30:47,672 RTF: 0.007834585223680557                                                        
espnet-tts-1                     | 2024-12-09T17:30:47.672223852Z 2024-12-09 17:30:47,672 Elapsed text2speech: 0.019118309020996094                                        
espnet-tts-1                     | 2024-12-09T17:30:47.672234702Z 2024-12-09 17:30:47,672 Elapsed vocoder: 0.005258798599243164                                            
espnet-tts-1                     | 2024-12-09T17:30:47.683202107Z 2024-12-09 17:30:47,683 RTF: 0.013925324875249792                                                        
espnet-tts-1                     | 2024-12-09T17:30:47.683212847Z 2024-12-09 17:30:47,683 Elapsed text2speech: 0.007828235626220703                                        
espnet-tts-1                     | 2024-12-09T17:30:47.683225201Z 2024-12-09 17:30:47,683 Elapsed vocoder: 0.003003835678100586                                            
espnet-tts-1                     | 2024-12-09T17:30:47.683314742Z 2024-12-09 17:30:47,683 Done performing TTS                                                              
espnet-tts-1                     | 2024-12-09T17:30:47.683918141Z 2024-12-09 17:30:47,683 Encoded                                                                          
espnet-tts-1                     | 2024-12-09T17:30:47.684680724Z 2024-12-09 17:30:47,684 Sending response                                                                 
photo-audio-handler-1            | 2024-12-09T17:30:47.695064257Z Forming OSC...                                                                                           
photo-audio-handler-1            | 2024-12-09T17:30:47.854083546Z Constructing segment audio rendering                                                                     
photo-audio-handler-1            | 2024-12-09T17:30:47.854186162Z Sending response                                                                                         
orchestrator-1                   | 2024-12-09T17:30:47.855732628Z Valid response generated.                                                                                
image-pegasus-cim-mcgill-ca-1    | 2024-12-09T17:30:47.856854015Z 174.88.167.186 - - [09/Dec/2024:17:30:47 +0000] "POST /render HTTP/1.1" 200 87949 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
orchestrator-1                   | 2024-12-09T17:30:47.857484977Z Wrote temporary files to /var/log/IMAGE/0de79424-6db7-470f-957c-714062e7e8b3                             
orchestrator-1                   | 2024-12-09T17:30:48.647827845Z MemJS: Server <memcached:11211> failed after (2) retries with error - socket timed out connecting to server.
orchestrator-1                   | 2024-12-09T17:30:48.647847202Z Error setting response in the cache                                                                      
orchestrator-1                   | 2024-12-09T17:30:48.647850889Z Saved Response for ca.mcgill.a11y.image.preprocessor.sorting in cache with key 3042e0502599dd755b9f2817dc982a7a46213b0a
semantic-segmentation-1          | 2024-12-09T17:30:56.482513518Z [2024-12-09 17:30:56 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:8)                                        
semantic-segmentation-1          | 2024-12-09T17:30:57.569414999Z [2024-12-09 17:30:57 +0000] [1] [ERROR] Worker (pid:8) was sent SIGKILL! Perhaps out of memory?          
semantic-segmentation-1          | 2024-12-09T17:30:57.570896673Z [2024-12-09 17:30:57 +0000] [29] [INFO] Booting worker with pid: 29                                      
semantic-segmentation-1          | 2024-12-09T17:30:58.039707981Z No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
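In the orchestrator log above, MemJS times out on memcached:11211 after the reboot. A plain TCP probe, run from a container on the same Docker network, is a quick way to check whether memcached is reachable at all. This is a minimal Python sketch. memcached_reachable is a hypothetical helper (not part of IMAGE), and the host and port defaults are copied from that log line.

```python
import socket


def memcached_reachable(host: str = "memcached", port: int = 11211, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical diagnostic helper: it only checks that something is listening
    on the port and does not speak the memcached protocol.
    """
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout subclasses OSError),
        # and name-resolution failures (socket.gaierror), e.g. when the memcached
        # container is not running and its service name does not resolve.
        return False
```

A True result only confirms that the port accepts connections. It says nothing about whether memcached is serving requests correctly.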
@jeffbl
Member Author

jeffbl commented Dec 9, 2024

Question: Is unless-stopped the best restart policy for IMAGE containers? It seems like a reboot should reset any manually stopped containers and make sure everything is running. Before the reboot I ran a system update that upgraded Docker. Maybe that stopped some containers?
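The question hinges on how each restart policy behaves when the Docker daemon starts after a host reboot. The sketch below is a simplified Python model based on Docker's documented semantics. It is not Docker's actual code, and the enum and function names are hypothetical. "on-failure" is deliberately left out because its behavior across daemon restarts has more edge cases than fit here.

```python
from enum import Enum


class RestartPolicy(Enum):
    NO = "no"                          # Compose/Docker default when `restart` is omitted
    ALWAYS = "always"
    UNLESS_STOPPED = "unless-stopped"


def restarted_when_daemon_starts(policy: RestartPolicy, manually_stopped: bool) -> bool:
    """Simplified model: does Docker bring the container back up when the
    daemon starts (e.g. after a host reboot)?

    `manually_stopped` means the container was stopped by hand (docker stop,
    docker compose stop, ...) before the daemon went down.
    """
    if policy is RestartPolicy.ALWAYS:
        # "always" comes back on daemon start even if it was stopped by hand.
        return True
    if policy is RestartPolicy.UNLESS_STOPPED:
        # "unless-stopped" stays down if it was stopped before the reboot.
        return not manually_stopped
    # "no" never restarts automatically.
    return False
```

Under this model, unless-stopped differs from always only for containers that were already stopped by hand before the reboot. That is exactly the case the question is about.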

@jeffbl
Member Author

jeffbl commented Dec 9, 2024

I don't think this is related to #778, but I'm linking it since that issue also involves reboots leaving the system in a bad state.

@jeffbl
Member Author

jeffbl commented Jan 10, 2025

I just set memcached to restart: unless-stopped, since it was not running after a reboot of unicorn and should be aligned with the other containers. @jaydeepsingh25, please ping me if there was a reason it did not already have a restart policy (the default is 'no').
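Because the default is 'no', any service that simply omits restart will not come back after a reboot. Here is a minimal sketch for finding such services. It assumes Docker Compose v2 (for docker compose config --format json) and Python 3.9+. Both function names are hypothetical and not part of IMAGE.

```python
import json
import subprocess
from typing import Any


def load_compose_config(project_dir: str) -> dict[str, Any]:
    """Return the fully resolved Compose configuration for `project_dir` as a dict."""
    out = subprocess.run(
        ["docker", "compose", "config", "--format", "json"],
        cwd=project_dir,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


def services_without_restart(config: dict[str, Any]) -> list[str]:
    """List services (sorted by name) whose restart policy is missing or "no"."""
    missing = []
    for name, svc in sorted(config.get("services", {}).items()):
        # An absent `restart` key behaves the same as restart: "no".
        if svc.get("restart", "no") == "no":
            missing.append(name)
    return missing
```

For example, print(services_without_restart(load_compose_config("/var/docker/image"))) on the host would have flagged memcached before this change.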

Leaving this open, since there were also problems with semseg, and more testing may turn up other issues.
