feat: add OpenSearch #21 #31

Merged
merged 2 commits into from
Jul 12, 2023

Contributor

@cmltaWt0 cmltaWt0 commented Apr 11, 2023

OpenSearch integration as a Shared resource.

Testing instructions

  1. Install the harmony chart.
  2. Install and enable the k8s_harmony tutor plugin.
  3. Create an OpenSearch user:
     tutor harmony create-opensearch-user
  4. Verify the created user from an OpenSearch node:
     curl --insecure -u harmony:${HARMONY_PASSWORD} -XGET https://localhost:9200/_plugins/_security/api/internalusers/openedx-01

     result:

     {"openedx-01":{"hash":"","reserved":false,"hidden":false,"backend_roles":[],"attributes":{},"opendistro_security_roles":["openedx-01_role"],"static":false}}
  5. Verify the created role from an OpenSearch node:
     curl --insecure -u harmony:${HARMONY_PASSWORD} -XGET https://localhost:9200/_plugins/_security/api/roles/openedx-01_role

     result:

     {"openedx-01_role":{"reserved":false,"hidden":false,"cluster_permissions":[],"index_permissions":[{"index_patterns":["openedx-01-uwur-*"],"fls":[],"masked_fields":[],"allowed_actions":["all"]}],"tenant_permissions":[],"static":false}}
  6. Install openedx via tutor k8s.
  7. Open the CMS pod.
  8. Reindex the content library:
     ./manage.py cms reindex_content_library
  9. Open Studio > Create a Course > Reindex it.

Implements #21

Issues

  • edx-platform accesses Elasticsearch directly in several places, bypassing the edx-search library. Because each user's role is limited to its own index prefix, these unprefixed requests are rejected with an authorization error (403).
    For example:
File "/openedx/edx-platform/cms/djangoapps/contentstore/management/commands/reindex_course.py", line 81, in handle
    index_exists = searcher._es.indices.exists(index=index_name)  # pylint: disable=protected-access
    ....
elasticsearch.exceptions.AuthorizationException: AuthorizationException(403, '')
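
The 403 follows from how the role shown in the testing instructions is scoped. As a rough illustration only (this is not the OpenSearch security plugin's actual matching code), the role's index pattern covers prefixed index names, so a request for a bare index name falls outside it. The helper `is_allowed` is hypothetical, and `fnmatchcase` is a stand-in for the plugin's wildcard matching.

```python
from fnmatch import fnmatchcase

# Pattern taken from the openedx-01_role output in the testing instructions.
ROLE_INDEX_PATTERNS = ["openedx-01-uwur-*"]


def is_allowed(index_name: str, patterns=ROLE_INDEX_PATTERNS) -> bool:
    """Hypothetical helper: True if any role pattern covers the index name."""
    return any(fnmatchcase(index_name, pattern) for pattern in patterns)


# Index name with the per-instance prefix applied.
print(is_allowed("openedx-01-uwur-courseware_content"))
# Bare index name, as requested by code that bypasses the prefix.
print(is_allowed("courseware_content"))
```

The first call matches the role's pattern and the second does not, which is the situation the traceback above reflects.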

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Apr 11, 2023
@openedx-webhooks

openedx-webhooks commented Apr 11, 2023

Thanks for the pull request, @cmltaWt0! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@bradenmacdonald
Contributor

Awesome, thanks @cmltaWt0! Let us know when you want a review.

@itsjeyd itsjeyd added the waiting on author PR author needs to resolve review requests, answer questions, fix tests, etc. label Apr 13, 2023
@cmltaWt0 cmltaWt0 force-pushed the opensearch branch 7 times, most recently from 6f80b98 to 7d08f5d Compare April 25, 2023 12:41
@cmltaWt0 cmltaWt0 marked this pull request as ready for review April 27, 2023 08:00
@cmltaWt0
Contributor Author

@bradenmacdonald PR is ready for review.

Some concerns I want to emphasise:

  • Settings patching: I'm struggling to decide how to handle the case where shared Elasticsearch and OpenSearch are both enabled at the same time.
  • I updated the ELASTIC config with the OpenSearch credentials, but I'm not very familiar with the compatibility details between ES and OpenSearch, and I've only tested this briefly.
  • [Super minor] Naming: I named the OpenSearch cluster opensearch-cluster, so the full service name is a bit different from the Elasticsearch one (which has no "cluster" word; I added it to clarify the nature of the deployment).

@bradenmacdonald
Contributor

@cmltaWt0 Awesome, thanks! I probably can't get to this until next week but I'll take a look then.

how to deal with enabled shared Elastic and OpenSearch at the same time.

I don't think we need to support that? People should be using one or the other.

Contributor

@MoisesGSalas MoisesGSalas left a comment

I agree with Braden's comment about not supporting both opensearch and elasticsearch at the same time.

Would it be possible to use a single endpoint for the cluster using the clusterName value? Seems like it's available in both the opensearch and elasticsearch charts. Setting that to a common value in our values.yaml for both opensearch.clusterName and elasticsearch.clusterName should do the trick I would think. Or is there any concern if we do that?

On a similar note, can we use a single helper class to interact with both APIs? The code in both elasticsearch.py and opensearch.py seems quite similar, so maybe instead we can use a single class that receives additional parameters besides namespace (like the endpoint for example) and each command instantiates it accordingly.
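
A minimal sketch of what such a shared helper could look like, assuming the two backends differ only in service name and security API paths. The class name `SearchClusterAPI`, its fields, the namespace and the service name are all hypothetical; only the OpenSearch paths come from the testing instructions in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchClusterAPI:
    """Hypothetical helper shared by the elasticsearch and opensearch commands."""

    namespace: str
    service: str      # Kubernetes service name of the search cluster
    users_path: str   # security API path for users (backend specific)
    roles_path: str   # security API path for roles (backend specific)
    port: int = 9200

    @property
    def base_url(self) -> str:
        return f"https://{self.service}.{self.namespace}.svc.cluster.local:{self.port}"

    def user_url(self, username: str) -> str:
        return f"{self.base_url}/{self.users_path}/{username}"

    def role_url(self, role_name: str) -> str:
        return f"{self.base_url}/{self.roles_path}/{role_name}"


# Each command would instantiate the helper with its own parameters.
opensearch = SearchClusterAPI(
    namespace="harmony",                  # assumed namespace
    service="opensearch-cluster-master",  # assumed service name
    users_path="_plugins/_security/api/internalusers",
    roles_path="_plugins/_security/api/roles",
)
print(opensearch.role_url("openedx-01_role"))
```

The Elasticsearch command would pass its own service name and paths to the same class, so the request logic would live in one place.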

@cmltaWt0
Contributor Author

cmltaWt0 commented May 2, 2023

I agree. Looks like the current simple implementation is enough.

@cmltaWt0
Contributor Author

cmltaWt0 commented May 2, 2023

I agree with Braden's comment about not supporting both opensearch and elasticsearch at the same time.

Would it be possible to use a single endpoint for the cluster using the clusterName value? Seems like it's available in both the opensearch and elasticsearch charts. Setting that to a common value in our values.yaml for both opensearch.clusterName and elasticsearch.clusterName should do the trick I would think. Or is there any concern if we do that?

On a similar note, can we use a single helper class to interact with both APIs? The code in both elasticsearch.py and opensearch.py seems quite similar, so maybe instead we can use a single class that receives additional parameters besides namespace (like the endpoint for example) and each command instantiates it accordingly.

@MoisesGSalas Good idea, btw.
I'll unify the endpoint (cluster name) and try to add a check against accidentally enabling both elasticsearch and opensearch, which is what we want to prevent.
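
One way such a guard could look, as a sketch only. The thread does not say where the check would live (chart templates or the tutor plugin), and the `elasticsearch.enabled` and `opensearch.enabled` keys are assumed rather than taken from this PR.

```python
def check_single_search_backend(values: dict) -> None:
    """Raise if both search backends are enabled in the (hypothetical) values."""
    enabled = [
        name
        for name in ("elasticsearch", "opensearch")
        if values.get(name, {}).get("enabled", False)
    ]
    if len(enabled) > 1:
        raise ValueError(
            "Only one shared search backend may be enabled, got: " + ", ".join(enabled)
        )


# A configuration with a single backend passes silently.
check_single_search_backend({"opensearch": {"enabled": True}})
```

Failing early with a clear message keeps a misconfigured deployment from starting two clusters behind the same service name.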

@cmltaWt0
Contributor Author

cmltaWt0 commented May 2, 2023

The code in both elasticsearch.py and opensearch.py seems quite similar, so maybe instead we can use a single class that receives additional parameters besides namespace (like the endpoint for example) and each command instantiates it accordingly.

Definitely! Thanks.

@cmltaWt0
Contributor Author

@MoisesGSalas @bradenmacdonald I've added some sort of unification. Could we continue the review?

@itsjeyd itsjeyd removed the waiting on author PR author needs to resolve review requests, answer questions, fix tests, etc. label May 23, 2023
@itsjeyd itsjeyd requested a review from MoisesGSalas May 23, 2023 12:19
Contributor

@MoisesGSalas MoisesGSalas left a comment

Hi @cmltaWt0, while trying to deploy this in an EKS cluster I noticed a few errors at the start:

vm.max_map_count [65530] is too low, increase to at least [262144]

It seems that configuring this is disabled by default in the opensearch chart, while it is enabled by default in the elasticsearch chart. Maybe we should enable it in our chart defaults? I didn't manage to make it work in my first test so I copy-pasted this, but it was probably a mistake on my part.

@@ -21,7 +21,7 @@
# The workaround is to manually add a list of hosts to be routed to the caddy
# instance.
"INGRESS_HOST_LIST": [],
- "ENABLE_SHARED_ELASTICSEARCH": False,
+ "K8S_HARMONY_ENABLE_SHARED_HARMONY_SEARCH": False,
Contributor

This prefix (K8S_HARMONY_) shouldn't be included here because we add the prefix to each key while loading the configs in the hook.

hooks.Filters.CONFIG_DEFAULTS.add_items(
[(f"K8S_HARMONY_{key}", value) for key, value in config["defaults"].items()]
)
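
To make the effect concrete, here is a small sketch that runs the same comprehension on a stand-in `config` dict, without the tutor dependency. The dict contents are illustrative; the real plugin defines more keys.

```python
# Stand-in for the plugin's config dict; the real one has more keys.
config = {
    "defaults": {
        # Key written without the prefix, as the hook expects.
        "ENABLE_SHARED_HARMONY_SEARCH": False,
        # Key that already includes the prefix, as in the diff under review.
        "K8S_HARMONY_ENABLE_SHARED_HARMONY_SEARCH": False,
    }
}

# Same comprehension as in the hook above.
items = [(f"K8S_HARMONY_{key}", value) for key, value in config["defaults"].items()]

for name, _ in items:
    print(name)
```

The second key ends up as `K8S_HARMONY_K8S_HARMONY_ENABLE_SHARED_HARMONY_SEARCH`, so templates that read `K8S_HARMONY_ENABLE_SHARED_HARMONY_SEARCH` would not see it.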

Contributor Author

Good catch! Thanks.

Contributor Author

Fixed

ELASTIC_SEARCH_CONFIG = [{
"use_ssl": True,
- "host": "elasticsearch-master.{{K8S_HARMONY_NAMESPACE}}.svc.cluster.local",
+ "host": "harmony-search-cluster-master.{{K8S_HARMONY_NAMESPACE}}.svc.cluster.local",
Contributor

The service name that I got was actually harmony-search-cluster.{{K8S_HARMONY_NAMESPACE}}.svc.cluster.local instead of harmony-search-cluster-master.{{K8S_HARMONY_NAMESPACE}}.svc.cluster.local. After changing this I was able to index a course successfully.

Contributor Author

Thanks @MoisesGSalas. I tested successfully with this config. Will re-test.

Contributor Author

Missed change after unification.
Fixed.

@itsjeyd itsjeyd added the waiting on author PR author needs to resolve review requests, answer questions, fix tests, etc. label May 31, 2023
@itsjeyd

itsjeyd commented Jun 7, 2023

Hey @cmltaWt0, a friendly reminder to follow up on @MoisesGSalas's comments above.

@cmltaWt0
Contributor Author

cmltaWt0 commented Jun 7, 2023

Hey @cmltaWt0, a friendly reminder to follow up on @MoisesGSalas's comments above.

Thanks, @itsjeyd. I'm going to resolve the comments.

@cmltaWt0 cmltaWt0 force-pushed the opensearch branch 2 times, most recently from 5696010 to 5253c73 Compare June 13, 2023 17:01
@cmltaWt0
Contributor Author

cmltaWt0 commented Jun 13, 2023

@MoisesGSalas it's a bit hardcoded here to be able to test everything else. And I'm going to come up with a final configuration for certificates.

UPD: the issue is probably in the certificate generation template, for both elasticsearch and opensearch:

$ openssl x509 -subject -nameopt RFC2253 -noout -in tls.crt
subject= CN=harmony-search-cluster-master.{{ Release.Namespace }}.local

UPD2:
fixed by this

{{- $cn := printf "opensearch-master.%s.local" .Release.Namespace }}
{{- $cert := genSignedCert $cn nil (list $cn) 1825 $ca }}

Fixed for both ElasticSearch and OpenSearch.

@itsjeyd

itsjeyd commented Jun 15, 2023

@cmltaWt0 Thanks for the updates! Are the changes ready for another round of review or will you be making further changes to the code?

CC @MoisesGSalas

@cmltaWt0
Contributor Author

@itsjeyd The PR is ready for the second review.
I've committed all the changes I wanted. Thanks for the ping!

@itsjeyd itsjeyd removed the waiting on author PR author needs to resolve review requests, answer questions, fix tests, etc. label Jun 15, 2023
@itsjeyd

itsjeyd commented Jun 27, 2023

No problem @cmltaWt0.

@MoisesGSalas Could you please give this another look (or let us know when you think you'll have time for a second review pass)?

@itsjeyd itsjeyd added the waiting for eng review PR is ready for review. Review and merge it, or suggest changes. label Jun 27, 2023
Comment on lines 1 to 6
{% if K8S_HARMONY_ENABLE_SHARED_HARMONY_SEARCH %}
# This is needed otherwise the previously installed edx-search
# package doesn't get replaced. Once the below branch is merged
# upstream it will no longer be needed.
RUN pip uninstall -y edx-search
RUN pip install --upgrade git+https://github.com/open-craft/edx-search.git@keith/prefixed-index-names
RUN pip install edx-search==3.5.0
Contributor

This version of edx-search landed on Palm, same as the changes to forum. Maybe this is no longer necessary and we should bump the appVersion to v16.0.2

Contributor Author

@MoisesGSalas Updated and re-tested with Palm without any patching. Everything works pretty well.

@MoisesGSalas
Contributor

@cmltaWt0, I think the issue with the reindex shouldn't be a blocker considering the --setup flag seems to be only used while using the devstack. I think the rest of the pr looks fine.

@MoisesGSalas
Contributor

@cmltaWt0, actually I've noticed an error while recreating the opensearch cluster and testing again. Whenever I try to refresh a course index I get a 403 error on the cms logs and the following error on the opensearch logs:

[2023-06-27T22:11:22,024][INFO ][o.o.s.p.PrivilegesEvaluator] [harmony-search-cluster-master-1] No index-level perm match for User [name=openedx-01, backend_roles=[], requestedTenant=null] Resolved [aliases=[], allIndices=[courseware_content], types=[*], originalRequested=[courseware_content], remoteIndices=[]] [Action [indices:admin/get]] [RolesChecked [own_index, openedx-01_role]]
[2023-06-27T22:11:22,024][INFO ][o.o.s.p.PrivilegesEvaluator] [harmony-search-cluster-master-1] No permissions for [indices:admin/get]

I don't know if I missed something.

@cmltaWt0 cmltaWt0 self-assigned this Jun 28, 2023
@cmltaWt0
Contributor Author

cmltaWt0 commented Jun 28, 2023

./manage.py cms reindex_course --all --setup

@MoisesGSalas
You didn't miss anything. That's exactly the same issue I described here.

Your logs say that openedx-01 is trying to access the originalRequested=[courseware_content] index, which has no prefix.
So you have probably hit another place in edx-platform where Elasticsearch is accessed without edx-search, and which therefore asks for an index without a prefix.

Sample log with a prefixed index:

[2023-06-28T07:49:21,985][INFO ][o.o.p.PluginsService     ] [harmony-search-cluster-master-0] PluginService:onIndexModule index:[openedx-01-yts1-courseware_content/dLsvVeyATfWRIM1By-OibA]

Could you show exactly how you do the reindexing?

@MoisesGSalas
Contributor

@cmltaWt0 you are right, it's weird that it isn't adding the prefix. I tried both hitting the reindex button in Studio and running ./manage.py cms reindex_course <course-id> without any additional flags, and both throw the same error. I double-checked the image and the edx-search code, and it's indeed using the version with the changes.

@cmltaWt0
Contributor Author

cmltaWt0 commented Jul 1, 2023

@MoisesGSalas

Could you please check whether your image uses the original open-craft version or edx-search==3.5.0?
They use different setting names for some reason, so please make sure you are using edx-search==3.5.0 and that your Django setting is called ELASTIC_SEARCH_INDEX_PREFIX.

I changed the setting name here:

- ELASTICSEARCH_INDEX_PREFIX = "{{ELASTICSEARCH_INDEX_PREFIX}}"
+ ELASTIC_SEARCH_INDEX_PREFIX = "{{HARMONY_SEARCH_INDEX_PREFIX}}"

This is because the original work (open-craft/edx-search.git@keith/prefixed-index-names) uses ELASTICSEARCH_INDEX_PREFIX, so our early work with shared elasticsearch used the same naming.
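
A sketch of why the mismatch fails silently, assuming the library reads the prefix with a default of an empty string. The function `prefixed_index_name` is hypothetical and the exact lookup in edx-search may differ; `SimpleNamespace` stands in for Django settings, and the prefix value is illustrative.

```python
from types import SimpleNamespace


def prefixed_index_name(settings, index_name: str) -> str:
    """Hypothetical lookup: a missing setting falls back to no prefix."""
    prefix = getattr(settings, "ELASTIC_SEARCH_INDEX_PREFIX", "")
    return f"{prefix}{index_name}"


# Stand-ins for Django settings under the old and the new setting name.
old_name = SimpleNamespace(ELASTICSEARCH_INDEX_PREFIX="openedx-01-uwur-")
new_name = SimpleNamespace(ELASTIC_SEARCH_INDEX_PREFIX="openedx-01-uwur-")

print(prefixed_index_name(old_name, "courseware_content"))
print(prefixed_index_name(new_name, "courseware_content"))
```

With the old setting name the lookup finds nothing and the bare index name is used, which matches the unprefixed `courseware_content` request seen in the OpenSearch logs earlier in the thread.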

@cmltaWt0
Contributor Author

cmltaWt0 commented Jul 1, 2023

@MoisesGSalas

I've updated appVersion, removed all docker image patching and re-tested everything from scratch: I recreated the cluster and installed the Palm release from scratch too.

I'm able to:

  1. Create a new course and reindex it

image

  2. Create and reindex a content library

image

[2023-07-01T19:15:23,716][INFO ][o.o.p.PluginsService     ] [harmony-search-cluster-master-2] PluginService:onIndexModule index:[openedx-02-gz4m-library_index/xyBpxOiAQ0GwQqFN1NkGNQ]
[2023-07-01T19:15:23,720][INFO ][o.o.c.m.MetadataMappingService] [harmony-search-cluster-master-2] [openedx-02-gz4m-library_index/xyBpxOiAQ0GwQqFN1NkGNQ] update_mapping [_doc]


I'm still unable to get the importdemocourse task to work because of the edx-platform bug. It does import the course, but it can't index it:

File "/openedx/edx-platform/cms/djangoapps/contentstore/management/commands/reindex_course.py", line 81, in handle
    index_exists = searcher._es.indices.exists(index=index_name)  # pylint: disable=protected-access
    ....
elasticsearch.exceptions.AuthorizationException: AuthorizationException(403, '')

IMPORTANT: I've rebased the branch on top of latest main (project structure has been changed).

@MoisesGSalas
Contributor

@cmltaWt0, you are 100% right. I didn't even notice the name had changed. I tested again using a Palm image and everything seems to be working.

I think we can go ahead with what we have here.

@itsjeyd itsjeyd removed the waiting for eng review PR is ready for review. Review and merge it, or suggest changes. label Jul 11, 2023
Changes:
- set the same cluster and service name
- unify tutor plugin configuration variables
@cmltaWt0
Contributor Author

@felipemontoya

I've re-tested the ElasticSearch deployment and pushed one missed setting for the service name (I had tested opensearch but forgot to test elastic for regressions).

Retested again for ElasticSearch and for OpenSearch - everything is great.

I'm convinced we can merge it now.

@felipemontoya
Member

Thanks @cmltaWt0, this is wonderful.
I'll merge then.

@felipemontoya felipemontoya merged commit 1289b3a into openedx:main Jul 12, 2023
1 check passed
@openedx-webhooks

@cmltaWt0 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

@cmltaWt0
Contributor Author

Thanks @felipemontoya 🚀
