Deploy ms image region #430

Merged
7 commits merged on Sep 6, 2024

khaledk2
Contributor

This PR adds an Ansible playbook to deploy the MS image region micro-service. It uses the ansible-role-omero-ms-image-region role, which has been added to the requirements. The requirements entry should be updated once the role PRs are merged and the role is published.

Member

sbesson left a comment

A few suggestions to integrate the role into the existing deployment logic.

ansible/idr-ms-image-region.yml (outdated, resolved)
- name: get database password
  set_fact:
    database_user_password: "{{ idr_secret_postgresql_password_ro | default('omero') }}"
Member

Is there a reason to use set_fact rather than using "{{ idr_secret_postgresql_password_ro | default('omero') }}" directly in the variable?
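
A minimal sketch of the alternative raised in this comment: pass the expression straight into the role's variables instead of staging it with set_fact. The play layout, the host group, and the variable name `omero_ms_image_region_db_password` are hypothetical placeholders, not names confirmed by the role. Only the Jinja expression and the role name `ome.omero_ms_image_region` come from this PR.

```yaml
# Sketch only: the variable name below is a hypothetical placeholder;
# check the role's defaults for the real one.
- hosts: omeroreadonly-hosts
  roles:
    - role: ome.omero_ms_image_region
      vars:
        omero_ms_image_region_db_password: "{{ idr_secret_postgresql_password_ro | default('omero') }}"
```

Ansible templates variables lazily, so the expression is evaluated when the role reads it and no preliminary task is needed.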

ansible/idr-ms-image-region.yml (outdated, resolved)
# this should be updated after merging the PRs and publishing the role
- name: ome.omero_ms_image_region
  src: https://github.com/khaledk2/ansible-role-omero-ms-image-region/
  version: new_ms_release
Member

For the prod122 update we can certainly use the in-progress version, but for future deployments this role will need to be released before this step can be merged.
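
For illustration, once the role is released the requirements entry could take roughly the following shape. This is a sketch under the assumption that the role is published to Ansible Galaxy under the `ome` namespace; `x.y.z` is a placeholder, since no release exists at the time of this comment.

```yaml
# Sketch only: "x.y.z" stands in for a release that does not exist yet.
- src: ome.omero_ms_image_region
  version: x.y.z
```

Pinning a released version rather than a personal branch keeps future deployments reproducible.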

Member

sbesson commented Aug 15, 2024

Another thought here while working through the prod122 upgrade with @dominikl. Are we limiting the deployment of the micro-service to production environments or should it also be deployed on pilot environments? In the latter case, the role will also need to be configured to work against the omeroreadwrite hosts
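
If the micro-service should also run on pilot environments, one possible shape is to widen the host pattern of the play. This is a sketch only: it assumes an inventory group named `omeroreadwrite-hosts` exists alongside `omeroreadonly-hosts`, and it leaves out any per-group configuration the role might need.

```yaml
# Sketch only: assumes an inventory group named omeroreadwrite-hosts
# exists alongside omeroreadonly-hosts.
- hosts: omeroreadonly-hosts:omeroreadwrite-hosts
  roles:
    - role: ome.omero_ms_image_region
```

The colon in the pattern is Ansible's union operator, so the play targets every host that belongs to either group.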

ansible/group_vars/omeroreadonly-hosts.yml (outdated, resolved)
Contributor Author

khaledk2 commented Aug 26, 2024

The download URL for a task inside the ome.analysis_tools role (system packages | analysis utils) was broken. I have fixed it in a dedicated PR, i.e. ome/ansible-role-analysis-tools#7
I have edited the requirements file to use the branch for the ome.analysis_tools PR.

sbesson self-requested a review August 26, 2024 15:46
Member

sbesson left a comment

This was successfully deployed on test123 and is ready for a quick round of functional testing @pwalczysko @jburel @khaledk2 @dominikl @will-moore

The primary issue I ran into is that the initial data access via the micro-service failed with a FileNotFoundException / Permission denied when accessing the data

2024-08-27 16:16:48,376 [render-image-region-pool-8] ERROR ome.io.nio.PixelsService - Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2022-08/11/11-22-30.109/IMG_0006-0060 Diplophyllum albicans stature ventral side (2.5x).ome.tiff
java.lang.RuntimeException: java.io.FileNotFoundException: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2022-08/11/11-22-30.109/IMG_0006-0060 Diplophyllum albicans stature ventral side (2.5x).ome.tiff (Permission denied)
        at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:79)
        at ome.io.bioformats.BfPixelBuffer.setSeries(BfPixelBuffer.java:124)
        at ome.io.nio.PixelsService.createBfPixelBuffer(PixelsService.java:898)
        at ome.io.nio.PixelsService._getPixelBuffer(PixelsService.java:606)
        at com.glencoesoftware.omero.zarr.ZarrPixelsService.getPixelBuffer(ZarrPixelsService.java:373)
        at com.glencoesoftware.omero.ms.image.region.ImageRegionRequestHandler.getPixelBuffer(ImageRegionRequestHandler.java:419)
        at com.glencoesoftware.omero.ms.image.region.ImageRegionRequestHandler.render(ImageRegionRequestHandler.java:516)
        at com.glencoesoftware.omero.ms.image.region.ImageRegionRequestHandler.getRegion(ImageRegionRequestHandler.java:349)
        at com.glencoesoftware.omero.ms.image.region.ImageRegionRequestHandler.renderImageRegion(ImageRegionRequestHandler.java:326)
        at com.glencoesoftware.omero.ms.core.OmeroRequest.execute(OmeroRequest.java:109)
        at com.glencoesoftware.omero.ms.image.region.ImageRegionVerticle.renderImageRegion(ImageRegionVerticle.java:187)
        at com.glencoesoftware.omero.ms.image.region.ImageRegionVerticle.lambda$start$0(ImageRegionVerticle.java:133)
        at io.vertx.core.eventbus.impl.HandlerRegistration.deliver(HandlerRegistration.java:271)
        at io.vertx.core.eventbus.impl.HandlerRegistration.handle(HandlerRegistration.java:249)
        at io.vertx.core.eventbus.impl.EventBusImpl$InboundDeliveryContext.next(EventBusImpl.java:573)
        at io.vertx.core.eventbus.impl.EventBusImpl.lambda$deliverToHandler$5(EventBusImpl.java:532)
        at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
        at io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
        at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.FileNotFoundException: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2022-08/11/11-22-30.109/IMG_0006-0060 Diplophyllum albicans stature ventral side (2.5x).ome.tiff (Permission denied)
        at java.base/java.io.RandomAccessFile.open0(Native Method)
        at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:345)
        at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:259)
        at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:214)
        at loci.common.NIOFileHandle.<init>(NIOFileHandle.java:130)
        at loci.common.NIOFileHandle.<init>(NIOFileHandle.java:151)
        at loci.common.NIOFileHandle.<init>(NIOFileHandle.java:165)
        at loci.common.Location.getHandle(Location.java:522)
        at loci.common.Location.getHandle(Location.java:462)
        at loci.common.Location.getHandle(Location.java:443)
        at loci.common.Location.getHandle(Location.java:426)
        at loci.common.Location.checkValidId(Location.java:551)
        at loci.formats.ImageReader.getReader(ImageReader.java:183)
        at loci.formats.ImageReader.setId(ImageReader.java:859)
        at ome.io.nio.PixelsService$3.setId(PixelsService.java:869)
        at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
        at loci.formats.ChannelFiller.setId(ChannelFiller.java:258)
        at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
        at loci.formats.ChannelSeparator.setId(ChannelSeparator.java:317)
        at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
        at loci.formats.Memoizer.setId(Memoizer.java:726)
        at ome.io.bioformats.BfPixelsWrapper.<init>(BfPixelsWrapper.java:52)
        at ome.io.bioformats.BfPixelBuffer.reader(BfPixelBuffer.java:73)
        ... 22 common frames omitted
2024-08-27 16:16:48,377 [render-image-region-pool-8] INFO  c.g.omero.ms.core.LogSpanReporter - {"traceId":"76755e09a50af5a8","parentId":"23a63d7e555009f4","id":"74c40ee9d1857b13","name":"get_pixel_buffer","timestamp":1724775408355957,"duration":21133,"localEndpoint":{"serviceName":"omero-ms-image-region","ipv4":"10.35.199.90"},"tags":{"error":"Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2022-08/11/11-22-30.109/IMG_0006-0060 Diplophyllum albicans stature ventral side (2.5x).ome.tiff","omero.pixels_id":"14234493"}}
2024-08-27 16:16:48,377 [render-image-region-pool-8] INFO  c.g.omero.ms.core.LogSpanReporter - {"traceId":"76755e09a50af5a8","parentId":"80b226d96a9da92a","id":"23a63d7e555009f4","name":"render_as_packed_int","timestamp":1724775408355922,"duration":21635,"localEndpoint":{"serviceName":"omero-ms-image-region","ipv4":"10.35.199.90"},"tags":{"omero.pixels_id":"14234493"}}
2024-08-27 16:16:48,377 [render-image-region-pool-8] ERROR c.g.o.m.i.r.ImageRegionRequestHandler - Exception while retrieving image region
ome.conditions.ResourceError: Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2022-08/11/11-22-30.109/IMG_0006-0060 Diplophyllum albicans stature ventral side (2.5x).ome.tiff
        at ome.io.nio.PixelsService.createBfPixelBuffer(PixelsService.java:907)
        at ome.io.nio.PixelsService._getPixelBuffer(PixelsService.java:606)
        at com.glencoesoftware.omero.zarr.ZarrPixelsService.getPixelBuffer(ZarrPixelsService.java:373)
        at com.glencoesoftware.omero.ms.image.region.ImageRegionRequestHandler.getPixelBuffer(ImageRegionRequestHandler.java:419)
        at com.glencoesoftware.omero.ms.image.region.ImageRegionRequestHandler.render(ImageRegionRequestHandler.java:516)
        at com.glencoesoftware.omero.ms.image.region.ImageRegionRequestHandler.getRegion(ImageRegionRequestHandler.java:349)
        at com.glencoesoftware.omero.ms.image.region.ImageRegionRequestHandler.renderImageRegion(ImageRegionRequestHandler.java:326)
        at com.glencoesoftware.omero.ms.core.OmeroRequest.execute(OmeroRequest.java:109)
        at com.glencoesoftware.omero.ms.image.region.ImageRegionVerticle.renderImageRegion(ImageRegionVerticle.java:187)
        at com.glencoesoftware.omero.ms.image.region.ImageRegionVerticle.lambda$start$0(ImageRegionVerticle.java:133)
        at io.vertx.core.eventbus.impl.HandlerRegistration.deliver(HandlerRegistration.java:271)
        at io.vertx.core.eventbus.impl.HandlerRegistration.handle(HandlerRegistration.java:249)
        at io.vertx.core.eventbus.impl.EventBusImpl$InboundDeliveryContext.next(EventBusImpl.java:573)
        at io.vertx.core.eventbus.impl.EventBusImpl.lambda$deliverToHandler$5(EventBusImpl.java:532)
        at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
        at io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
        at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)

Restarting the micro-service via systemctl on all read-only servers was sufficient to fix the problem. We encountered something very similar in the past, with the omero-server service needing a restart at the end of the deployment. This was mitigated by introducing the monitoring playbook, which restarts the omero-server service.

The Permission denied error is likely related to the sequence of the IDR deployment:

  • the idr-01-install-idr.yml playbook creates the service user, installs the OMERO services and starts the omero-server, omero-web, omero-ms-image-region services
  • a later internal phase creates internal symlinks from /uod/idr/filesets to /nfs/bioimage/drop with the correct ownerships (omero-server) so that the services can access the in-place imported data

Listing possible options:

  • add instructions to restart the image-region micro-services by hand at the end of the deployment
  • add a step at the end of the monitoring playbook to restart the image-region micro-service
  • move the micro-service deployment to the idr-02-services.yml phase and ensure the links are created before their deployment
  • rearchitect the deployment phases to create the links before idr-01-install-idr.yml
  • investigate what causes the Permission denied in the first place e.g. does another service need to be restarted during the link creation phase
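
As an illustration of the second option, a restart step appended to a late-running playbook might look roughly like this. It is a sketch, not the change made in this PR: the host group and the placement of the play are assumptions, and the systemd unit name is assumed to match the `omero-ms-image-region` service name that appears in the logs above.

```yaml
# Sketch only: host group, play placement and unit name are assumptions.
- hosts: omeroreadonly-hosts
  tasks:
    - name: Restart the image-region micro-service once the data links exist
      become: true
      ansible.builtin.systemd:
        name: omero-ms-image-region
        state: restarted
```

Running the restart after the symlink phase mirrors the existing workaround for omero-server, at the cost of leaving the root cause of the Permission denied error uninvestigated.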

Contributor Author

khaledk2 commented Sep 5, 2024

If no one has an objection, I have followed option 2 from @sbesson's list and added a step to ansible/idr-09-monitoring.yml to restart the image-region micro-service.
We tested Idr-testing with the micro-services today and it works fine, so we think the micro-service deployment should be applied to Idr-next.
I have made some Nginx configuration changes on Idr-testing to return the OMERO read-only instance that handles each request and to improve performance. I will open a dedicated PR for them.

Member

sbesson commented Sep 5, 2024

Thanks @khaledk2, bb0a48f makes sense to me and should fix the issue I ran across while spinning up test123. This will be tested as part of the next full deployment and we can iterate if we encounter issues.

Code-wise, the only outstanding action is to release the underlying Ansible roles and update requirements.yml. Ideally, everything should be merged before upgrading prod123 but being mindful of timelines, we can certainly upgrade the software on prod123 from this branch like test123 /cc @francesw @jburel

Member

jburel commented Sep 5, 2024

@khaledk2 could you open a PR for the ansible roles?
The deployment is based on 2 of your branches and nothing has been opened

Contributor Author

khaledk2 commented Sep 5, 2024

@jburel There is an open PR to fix ansible-role-analysis-tools
ome/ansible-role-analysis-tools#7
@sbesson has already approved it, could you please check it?

I have made some changes in the ansible-role-omero-ms-image-region role, based on your PR
ome/ansible-role-omero-ms-image-region#4
I have already referenced that in the PR
ome/ansible-role-omero-ms-image-region#4 (comment)

Member

jburel commented Sep 5, 2024

If you have made changes based on my branch, please open a PR and we can close mine.

Contributor Author

khaledk2 commented Sep 5, 2024

@jburel I have opened the new ome/ansible-role-omero-ms-image-region#5 PR

sbesson merged commit 911bc8f into IDR:master Sep 6, 2024
3 checks passed
Member

sbesson commented Sep 6, 2024

Thanks all. Deployed on prod123
