wip: use our internal LLM + switch from JSON to Markdown
Simplify the LLM's job. Instead of requesting JSON output with a single key, instruct the LLM not to output any extra information. By simplifying its job, we make sure its output can actually be parsed.

I did a quick test with the Translate prompt. After a bunch of tests, adding instructions to output only the translated text seems to be enough. I did some light prompt engineering, using ChatGPT and Claude to generate a proper system prompt. It works quite okay, BUT there is room for improvement for sure. I haven't yet searched for open-source prompts we could reuse from a prompt library. Perfect translation seems to be a difficult job for an 8B model. Please note I haven't updated the other prompts yet; let's discuss it before I do.

I ran my experiment with our internal LLM, which is optimized for throughput rather than latency (there is a trade-off). I'll try fine-tuning a few of its parameters to see if I can reduce its latency. For 880 tokens (based on ChatGPT's online token counter) it takes roughly 17s, vs. ~40s for Albert CNRS 70B; for 180 tokens it takes roughly 3s. That works out to roughly 50-60 tokens/s for our internal model vs. ~22 tokens/s for the 70B. Without proper UX (e.g. a nicer loading animation, streaming tokens) it feels like a decade. However, asking ChatGPT to do the same job takes about the same amount of time, from submitting the request to the last token being generated.
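For concreteness, here is a minimal sketch of the new prompting approach against an OpenAI-compatible chat endpoint. The endpoint URL, model name, and exact prompt wording are placeholders for illustration, not the values actually used in this commit:

```python
import requests

# Hypothetical endpoint and model name; the real values for our
# internal LLM are not part of this commit message.
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "internal-8b"

# Old approach: ask for JSON with a single key, e.g. {"translation": "..."},
# then json.loads() the response. Fragile: the model often wraps the JSON
# in prose or code fences, which breaks parsing.
#
# New approach: a system prompt that forbids any extra output, so the
# response body *is* the translation and needs no parsing at all.
SYSTEM_PROMPT = (
    "You are a professional translator. Translate the user's text into "
    "English. Output only the translated text: no preamble, no quotes, "
    "no explanations, no markdown."
)

def translate(text: str) -> str:
    """Return the raw completion; with the prompt above it should be
    the translated text and nothing else."""
    resp = requests.post(
        API_URL,
        json={
            "model": MODEL,
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": text},
            ],
            "temperature": 0.2,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()

if __name__ == "__main__":
    print(translate("Bonjour tout le monde"))
```

The point is that the old JSON path needed `json.loads` plus a key lookup and failed whenever the model added prose around the JSON; here the raw completion is the result.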