
Stampy chat module #325

Merged: 6 commits into master from stampy_chat_module, Nov 12, 2023

Conversation

@mruwnik (Collaborator) commented Oct 31, 2023:

A bunch of small refactors and little bug fixes, plus a new module that uses the https://chat.stampy.ai chat bot.

query = message.content
nlp = top_nlp_search(query)
if nlp.get('score', 0) > STAMPY_ANSWER_MIN_SCORE and nlp.get('status') == 'Live on site':
    return Response(confidence=5, text=f'Check out {nlp.get("url")} ({nlp.get("title")})')
@mruwnik (Collaborator, Author):

This will also get picked up by the semanticsearch module, which has a lower threshold (0.5) and a higher confidence (8).

if nlp.get('score', 0) > STAMPY_ANSWER_MIN_SCORE and nlp.get('status') == 'Live on site':
    return Response(confidence=5, text=f'Check out {nlp.get("url")} ({nlp.get("title")})')
if nlp.get('score', 0) > STAMPY_CHAT_MIN_SCORE:
    return Response(confidence=6, callback=self.query, args=[query, history, message])
@mruwnik (Collaborator, Author):

Is this the right confidence?

NLP_SEARCH_ENDPOINT = "https://stampy-nlp-t6p37v2uia-uw.a.run.app/"

STAMPY_ANSWER_MIN_SCORE = 0.75
STAMPY_CHAT_MIN_SCORE = 0.4
@mruwnik (Collaborator, Author):

NLP must return something with at least this score for the module to do anything. That should filter out most messages, but the value might need some fiddling. Or maybe it would also be worth having an explicit way of triggering this module, e.g. with a specific phrase?

@ccstan99 (Collaborator):

I suspect STAMPY_CHAT_MIN_SCORE = 0.4 can actually be an even lower value (0.2?), but you'll want to experiment.

utils = Utilities.get_instance()


LOG_MAX_MESSAGES = 15 # don't store more than X messages back
@mruwnik (Collaborator, Author):

These are Discord messages, counted per channel.
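
For illustration, a minimal sketch of how such a per-channel cap could be kept (the names message_logs and log_message are hypothetical, not from this PR):

from collections import defaultdict, deque

# one bounded deque per Discord channel; once LOG_MAX_MESSAGES entries
# are stored, appending a new message silently drops the oldest one
message_logs: dict[str, deque] = defaultdict(lambda: deque(maxlen=LOG_MAX_MESSAGES))

def log_message(channel_id: str, content: str) -> None:
    message_logs[channel_id].append(content)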

    yield chunk


def filter_citations(text, citations):
@mruwnik (Collaborator, Author):

The chatbot returns a whole bunch of potential citations, but not all of them end up referenced in the text. This removes the unused ones.
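
A minimal sketch of that filtering, assuming each citation carries a 'reference' marker that appears inline in the answer as [a], [b], etc. (the exact citation format here is an assumption):

def filter_citations(text: str, citations: list[dict]) -> list[dict]:
    # keep only the citations whose marker actually occurs in the answer text
    return [c for c in citations if f'[{c["reference"]}]' in text]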

    return items[0]


def chunk_text(text: str, chunk_limit=2000, delimiter='.'):
@mruwnik (Collaborator, Author):

Discord has a limit of 2000 characters per message (or at least other places in the code claim this), so this function splits the LLM's answer into smaller chunks, breaking on full stops so that sentences don't get chopped up. Though maybe newlines would be better, so it doesn't split on decimal points?
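
A rough sketch of that splitting logic (illustrative, not necessarily the exact implementation; it assumes no single sentence exceeds the limit):

def chunk_text(text: str, chunk_limit=2000, delimiter='.'):
    # accumulate sentences until adding one more would exceed the limit
    chunk = ''
    for sentence in text.split(delimiter):
        sentence += delimiter  # re-attach the delimiter lost in split()
        if len(chunk) + len(sentence) > chunk_limit:
            yield chunk
            chunk = ''
        chunk += sentence
    if chunk:
        yield chunk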

STAMPY_CHAT_MIN_SCORE = 0.4


def stream_lines(stream: Iterable):
@mruwnik (Collaborator, Author):

The chatbot server returns messages as server-sent events, so these two functions basically transform a requests stream into a generator of JSON objects ready for use.
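
A sketch of what that transformation could look like, reusing the DATA_HEADER constant from this diff (the helper names and exact payloads are assumptions):

import json
from typing import Any, Iterable, Iterator

DATA_HEADER = 'data: '

def stream_lines(stream: Iterable[bytes]) -> Iterator[str]:
    # SSE events arrive as lines like b'data: {...}'
    for line in stream:
        decoded = line.decode('utf8').strip()
        if decoded.startswith(DATA_HEADER):
            yield decoded[len(DATA_HEADER):]

def parse_data_items(stream: Iterable[bytes]) -> Iterator[Any]:
    # e.g. parse_data_items(requests.post(url, stream=True).iter_lines())
    for payload in stream_lines(stream):
        if payload:
            yield json.loads(payload)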

@mruwnik requested review from ProducerMatt, tayler6000 and Aprillion and removed request for ProducerMatt, October 31, 2023 14:40
@ccstan99 (Collaborator) left a comment:

The default for NLP search is to only return live-on-site questions. But if we're using it as a proxy to identify whether a question is alignment-related, we might want to add showLive=0, which uses the larger set of unpublished questions. Just make sure to check that a question is published before giving the link to the user.


DATA_HEADER = 'data: '

STAMPY_CHAT_ENDPOINT = "https://chat.stampy.ai:8443/chat"
NLP_SEARCH_ENDPOINT = "https://stampy-nlp-t6p37v2uia-uw.a.run.app/"
@ccstan99 (Collaborator):

Should we set this to "https://nlp.stampy.ai/" instead? If we add an additional service in the Europe region and a load balancer later, it won't be tied to this specific service, which is in the us-west1 region.

@mruwnik (Collaborator, Author):

Good catch - updated.

@ProducerMatt (Member) commented Oct 31, 2023:

If the new chat and ChatGPT are both enabled, what kind of message will get picked up by the new one?

@ProducerMatt (Member):

I'm trying to submit changes, but for some reason my local copy isn't matching what's on GitHub 🤔 I'm probably doing something wrong.

@ProducerMatt (Member) commented Oct 31, 2023:

Btw, you've done a very good job compressing my code, thank you :)



def top_nlp_search(query: str) -> Dict[str, Any]:
    resp = requests.get(NLP_SEARCH_ENDPOINT + '/api/search', params={'query': query, 'status': 'all'})
@mruwnik (Collaborator, Author) commented Nov 1, 2023:

@ccstan99 this 'status': 'all' does the same as showLive=0

@mruwnik (Collaborator, Author) commented Nov 1, 2023:

If both are enabled, whichever one claims the higher confidence will be chosen. If they both claim the same confidence, then I believe it's undefined which will be picked. In practice this means that the new chat will win, as it has the higher confidence.

The way this works is that all enabled modules get to answer all messages - they return a Response(text, confidence) instance, where text is either something to display or a callback. All the responses are then sorted by confidence, and whichever has the highest confidence is picked. If it has a text message, that is returned then and there; if it has a callback, the callback is executed and its result added to the list of responses. This happens in a loop. So a callback will generally return something with higher confidence if it has a valid answer, as that will then certainly be returned (if the previously picked response had confidence n and the new one has n+m, the new one is guaranteed to have the highest confidence). And if the callback couldn't come up with anything decent, it can return a response with very low confidence - this gives a different module a chance to reply.
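
A simplified sketch of that dispatch loop (illustrative only, not the actual Stampy code):

def pick_response(modules, message):
    # every enabled module gets a chance to answer
    responses = [module.process_message(message) for module in modules]
    while True:
        best = max(responses, key=lambda r: r.confidence)
        if best.text:
            return best  # a concrete answer wins
        # otherwise expand the callback; the new Response's confidence
        # decides whether it wins outright or cedes to another module
        responses.remove(best)
        responses.append(best.callback(*best.args))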

@Aprillion removed their request for review November 1, 2023 09:15
@mruwnik requested review from Aprillion and removed request for Aprillion November 1, 2023 11:02
@mruwnik (Collaborator, Author) commented Nov 4, 2023:

ping

@ProducerMatt (Member):

@mruwnik seen this? #325 (comment)

@mruwnik (Collaborator, Author) commented Nov 4, 2023:

It looks like something went wrong with that comment. Could you repost it, or plop it in Discord?

@ProducerMatt (Member):

@mruwnik Weird. config.py line 229: you added a default of None for the private channel, and I was arguing it should be a required parameter, as otherwise the only choice is to silently suppress error logging.

@mruwnik (Collaborator, Author) commented Nov 4, 2023:

Ah, that. Could you update the README to explain how to set it? I set the default because otherwise I couldn't get the bot to even start, and I didn't know what to put there.

@mruwnik (Collaborator, Author) commented Nov 9, 2023:

ping

@ProducerMatt (Member):

@mruwnik Sorry. Added the note to the README and made it required again.

@ProducerMatt (Member):

I think it's ready for merge?

@ProducerMatt (Member) left a comment:

Seems OK from my limited view.

config.py (outdated):

@@ -222,7 +226,7 @@ def getenv_unique_set(var_name: str, default: T = frozenset()) -> Union[frozense
 bot_dev_roles = getenv_unique_set("BOT_DEV_ROLES", frozenset())
 bot_dev_ids = getenv_unique_set("BOT_DEV_IDS", frozenset())
 bot_control_channel_ids = getenv_unique_set("BOT_CONTROL_CHANNEL_IDS", frozenset())
-bot_private_channel_id = getenv("BOT_PRIVATE_CHANNEL_ID")
+bot_private_channel_id = getenv("BOT_PRIVATE_CHANNEL_ID", None)
@ProducerMatt (Member):

I think this should be required (no default). Otherwise the bot can't communicate in private with admins.
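
For example, a sketch of that "required" behaviour, assuming a getenv helper along these lines (the project's actual helper may differ):

import os

_REQUIRED = object()  # sentinel, so None can still be an explicit default

def getenv(var_name: str, default=_REQUIRED):
    value = os.environ.get(var_name)
    if value is not None:
        return value
    if default is _REQUIRED:
        raise Exception(f"Environment variable {var_name} must be set")
    return default

# fails loudly at startup if BOT_PRIVATE_CHANNEL_ID is missing:
bot_private_channel_id = getenv("BOT_PRIVATE_CHANNEL_ID")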

@@ -44,12 +44,11 @@ def get_stampy_modules() -> dict[str, Module]:
loaded_module_filenames = set()

# filenames of modules that were skipped because not enabled
skipped_module_filenames = set(ALL_STAMPY_MODULES - enabled_modules)
@ProducerMatt (Member):

I'm not a Python expert, so correct me if I'm wrong. I think you should use frozenset anywhere a set won't be modified, because it'll be faster?

@mruwnik (Collaborator, Author):

Meh. It depends a lot on your use case. Either way, it'll be negligible here, with network latency taking most of the time.

@mruwnik merged commit 84cfd1c into master Nov 12, 2023
2 checks passed
@mruwnik deleted the stampy_chat_module branch November 12, 2023 19:41