support \(...\) and \[...\] style math formula #4186
This does nothing to help or fix the problem related to this issue:
Hi,

The issues you mentioned were caused by a dollar sign followed immediately by a number, which triggered the

This PR will not introduce additional bugs. Content between \(...\) and \[...\] can be confidently identified as a formula: if the content between the brackets were not a formula, OpenAI would not prepend "\" to the brackets, and the replacement in this PR would not be triggered. The PR also skips content in code blocks so that brackets there are not replaced by mistake.

Here are the results of my attempts to reproduce the issues you mentioned; everything is fine.

OpenAI now occasionally uses plain text to represent formulas, without involving LaTeX rendering at all.
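To make the rule described above concrete, here is a minimal sketch built on the `escapeBrackets` function from this PR. The sample inputs are made up for illustration and are not taken from the discussion.

```typescript
// Sketch of the PR's replacement rule. The first alternative matches code
// (fenced or inline) and returns it untouched; the other two rewrite
// \[...\] to $$...$$ and \(...\) to $...$.
function escapeBrackets(text: string): string {
  const pattern =
    /(```[\s\S]*?```|`.*?`)|\\\[([\s\S]*?[^\\])\\\]|\\\((.*?)\\\)/g;
  return text.replace(
    pattern,
    (match: string, codeBlock?: string, squareBracket?: string, roundBracket?: string) => {
      if (codeBlock) return codeBlock;
      if (squareBracket) return `$$${squareBracket}$$`;
      if (roundBracket) return `$${roundBracket}$`;
      return match;
    },
  );
}

// Illustrative inputs (assumptions, not from the PR).
const inline = escapeBrackets("Euler: \\(e^{i\\pi} + 1 = 0\\)");
const display = escapeBrackets("\\[\na^2 + b^2 = c^2\n\\]");
const inCode = escapeBrackets("Use `\\(x\\)` for inline math");

console.log(inline); // inline math now uses single dollar signs
console.log(display); // display math now uses double dollar signs
console.log(inCode); // unchanged, because the brackets sit inside inline code
```

Because the code alternative comes first in the pattern, a code span is consumed as a whole before the bracket alternatives get a chance to match inside it.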
It's so fucking bad if OpenAI literally uses
For example, look at how difficult handling that stupid pattern is:
Note: I am not going to cherry-pick this into my forks, because it seems unreasonable in this scenario. Trying to handle such a flawed, stupid pattern could lead to bugs or degrade performance.
Haha, Markdown and OpenAI make life very difficult. However, there doesn't seem to be any performance bottleneck: I have tested it on both a computer and a mobile phone without noticing additional latency. The previous issue did not occur either, so this might already be the most acceptable solution.
It is literally a model issue, and they don't fucking know how bad their model is: it can't handle LaTeX in Markdown.
It is interesting that the code for markdown.tsx also seems to have a rendering issue with Markdown.

```tsx
import ReactMarkdown from "react-markdown";
import "katex/dist/katex.min.css";
import RemarkMath from "remark-math";
import RemarkBreaks from "remark-breaks";
import RehypeKatex from "rehype-katex";
import RemarkGfm from "remark-gfm";
import RehypeHighlight from "rehype-highlight";
import { useRef, useState, RefObject, useEffect, useMemo } from "react";
import { copyToClipboard } from "../utils";
import mermaid from "mermaid";

import LoadingIcon from "../icons/three-dots.svg";
import React from "react";
import { useDebouncedCallback } from "use-debounce";
import { showImageModal } from "./ui-lib";

export function Mermaid(props: { code: string }) {
  const ref = useRef<HTMLDivElement>(null);
  const [hasError, setHasError] = useState(false);

  useEffect(() => {
    if (props.code && ref.current) {
      mermaid
        .run({
          nodes: [ref.current],
          suppressErrors: true,
        })
        .catch((e) => {
          setHasError(true);
          console.error("[Mermaid] ", e.message);
        });
    }
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [props.code]);

  function viewSvgInNewWindow() {
    const svg = ref.current?.querySelector("svg");
    if (!svg) return;
    const text = new XMLSerializer().serializeToString(svg);
    const blob = new Blob([text], { type: "image/svg+xml" });
    showImageModal(URL.createObjectURL(blob));
  }

  if (hasError) {
    return null;
  }

  return (
    <div
      className="no-dark mermaid"
      style={{
        cursor: "pointer",
        overflow: "auto",
      }}
      ref={ref}
      onClick={() => viewSvgInNewWindow()}
    >
      {props.code}
    </div>
  );
}

export function PreCode(props: { children: any }) {
  const ref = useRef<HTMLPreElement>(null);
  const refText = ref.current?.innerText;
  const [mermaidCode, setMermaidCode] = useState("");

  const renderMermaid = useDebouncedCallback(() => {
    if (!ref.current) return;
    const mermaidDom = ref.current.querySelector("code.language-mermaid");
    if (mermaidDom) {
      setMermaidCode((mermaidDom as HTMLElement).innerText);
    }
  }, 600);

  useEffect(() => {
    setTimeout(renderMermaid, 1);
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [refText]);

  return (
    <>
      {mermaidCode.length > 0 && (
        <Mermaid code={mermaidCode} key={mermaidCode} />
      )}
      <pre ref={ref}>
        <span
          className="copy-code-button"
          onClick={() => {
            if (ref.current) {
              const code = ref.current.innerText;
              copyToClipboard(code);
            }
          }}
        ></span>
        {props.children}
      </pre>
    </>
  );
}

// Escape "$" when it is directly followed by a digit (e.g. "$5"),
// so that prices are not parsed as math delimiters.
function escapeDollarNumber(text: string) {
  let escapedText = "";

  for (let i = 0; i < text.length; i += 1) {
    let char = text[i];
    const nextChar = text[i + 1] || " ";

    if (char === "$" && nextChar >= "0" && nextChar <= "9") {
      char = "\\$";
    }

    escapedText += char;
  }

  return escapedText;
}

// Rewrite \[...\] to $$...$$ and \(...\) to $...$, leaving code untouched.
function escapeBrackets(text: string) {
  const pattern =
    /(```[\s\S]*?```|`.*?`)|\\\[([\s\S]*?[^\\])\\\]|\\\((.*?)\\\)/g;
  return text.replace(
    pattern,
    (match, codeBlock, squareBracket, roundBracket) => {
      if (codeBlock) {
        return codeBlock;
      } else if (squareBracket) {
        return `$$${squareBracket}$$`;
      } else if (roundBracket) {
        return `$${roundBracket}$`;
      }
      return match;
    },
  );
}

function _MarkDownContent(props: { content: string }) {
  const escapedContent = useMemo(
    () => escapeBrackets(escapeDollarNumber(props.content)),
    [props.content],
  );

  return (
    <ReactMarkdown
      remarkPlugins={[RemarkMath, RemarkGfm, RemarkBreaks]}
      rehypePlugins={[
        RehypeKatex,
        [
          RehypeHighlight,
          {
            detect: false,
            ignoreMissing: true,
          },
        ],
      ]}
      components={{
        pre: PreCode,
        p: (pProps) => <p {...pProps} dir="auto" />,
        a: (aProps) => {
          const href = aProps.href || "";
          const isInternal = /^\/#/i.test(href);
          const target = isInternal ? "_self" : aProps.target ?? "_blank";
          return <a {...aProps} target={target} />;
        },
      }}
    >
      {escapedContent}
    </ReactMarkdown>
  );
}

export const MarkdownContent = React.memo(_MarkDownContent);

export function Markdown(
  props: {
    content: string;
    loading?: boolean;
    fontSize?: number;
    parentRef?: RefObject<HTMLDivElement>;
    defaultShow?: boolean;
  } & React.DOMAttributes<HTMLDivElement>,
) {
  const mdRef = useRef<HTMLDivElement>(null);

  return (
    <div
      className="markdown-body"
      style={{
        fontSize: `${props.fontSize ?? 14}px`,
      }}
      ref={mdRef}
      onContextMenu={props.onContextMenu}
      onDoubleClickCapture={props.onDoubleClickCapture}
      dir="auto"
    >
      {props.loading ? (
        <LoadingIcon />
      ) : (
        <MarkdownContent content={props.content} />
      )}
    </div>
  );
}
```
#4230 |
Hello. My PR has nothing to do with the "dollar sign + number" pattern. What it solves is the rendering of formulas written in the \(...\) and \[...\] styles. If GPT always used this style for formulas in the future, there would actually be far fewer problems, because backslash + bracket sequences are rare, almost never appear in other contexts, and so hardly ever conflict. However, I have not yet found any formula renderer that supports this style. OpenAI itself officially uses

I have been using this PR myself for a while and have not found any problems yet.
Simply giving up on $ is also a solution. After all, $ really is a big source of problems, and dropping it would be friendlier to English-speaking users and programmers. But giving up the current writing habit would probably be difficult too. Getting $ compatibility to work nearly fried my brain.
```diff
 function _MarkDownContent(props: { content: string }) {
   const escapedContent = useMemo(
-    () => escapeDollarNumber(props.content),
+    () => escapeBrackets(escapeDollarNumber(props.content)),
```
The original intention of the escapeDollarNumber function was to resolve LaTeX syntax conflicts, but unfortunately it failed to do so. The function can therefore be modified directly, rather than wrapped in another call.
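One way to read this suggestion is a single pass that handles both the dollar-plus-digit case and the bracket delimiters. The sketch below is an assumption about what that could look like: `escapeMathDelimiters` is a hypothetical name and is not part of the PR. Note one behavioral difference from the PR's two-step pipeline: here a "$" inside a code span or inside an already matched formula is left alone.

```typescript
// Hypothetical single-pass variant (not the PR's code). The last alternative
// matches a "$" that is directly followed by a digit, using a lookahead so
// the digit itself is not consumed.
function escapeMathDelimiters(text: string): string {
  const pattern =
    /(```[\s\S]*?```|`.*?`)|\\\[([\s\S]*?[^\\])\\\]|\\\((.*?)\\\)|\$(?=[0-9])/g;
  return text.replace(
    pattern,
    (match: string, codeBlock?: string, squareBracket?: string, roundBracket?: string) => {
      if (codeBlock) return codeBlock; // code is returned untouched
      if (squareBracket) return `$$${squareBracket}$$`;
      if (roundBracket) return `$${roundBracket}$`;
      if (match === "$") return "\\$"; // escape a price-like dollar sign
      return match;
    },
  );
}

// Made-up inputs for illustration.
const combined = escapeMathDelimiters("It costs $5, and \\(x < 5\\) holds.");
const codeOnly = escapeMathDelimiters("`$5`");
console.log(combined);
console.log(codeOnly);
```

Whether the changed handling of "$" inside code is desirable would need to be decided by the maintainers; the sketch only shows that the two steps can share one regular expression.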
```diff
@@ -116,9 +116,27 @@ function escapeDollarNumber(text: string) {
   return escapedText;
 }

+function escapeBrackets(text: string) {
+  const pattern =
+    /(```[\s\S]*?```|`.*?`)|\\\[([\s\S]*?[^\\])\\\]|\\\((.*?)\\\)/g;
```
The execution performance of this block of code is a little poor.
> The execution performance of this block of code is a little poor.

It's no wonder: regular expressions (regex) often perform poorly, especially in interpreted languages, unlike regex in compiled languages (e.g., regex in Golang, which performs better because it is compiled).
At first I was also worried about the performance, but later I found that this worry was unnecessary.
I used the following code to log the time consumed by each call, and the results showed that the function is fast enough.

```ts
function escapeBrackets(text: string) {
  const beginTime = performance.now();
  const pattern =
    /(```[\s\S]*?```|`.*?`)|\\\[([\s\S]*?[^\\])\\\]|\\\((.*?)\\\)/g;
  const res = text.replace(
    pattern,
    (match, codeBlock, squareBracket, roundBracket) => {
      if (codeBlock) {
        return codeBlock;
      } else if (squareBracket) {
        return `$$${squareBracket}$$`;
      } else if (roundBracket) {
        return `$${roundBracket}$`;
      }
      return match;
    },
  );
  const endTime = performance.now();
  console.log(
    `escapeBrackets, string length=${text.length}, time consumed=${endTime - beginTime} ms`,
  );
  return res;
}
```
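A single timed call can be noisy, so a measurement like the one above can also be averaged over many calls. The following is a minimal sketch under stated assumptions: `averageMs` is a hypothetical helper, the sample text is made up, and the numbers it prints will vary by machine and engine.

```typescript
// Compact copy of the PR's escapeBrackets, with the logging removed.
function escapeBrackets(text: string): string {
  const pattern =
    /(```[\s\S]*?```|`.*?`)|\\\[([\s\S]*?[^\\])\\\]|\\\((.*?)\\\)/g;
  return text.replace(
    pattern,
    (match: string, code?: string, square?: string, round?: string) => {
      if (code) return code;
      if (square) return `$$${square}$$`;
      if (round) return `$${round}$`;
      return match;
    },
  );
}

// Hypothetical helper: mean wall-clock time of `fn` over `runs` calls, in ms.
function averageMs(fn: () => void, runs: number): number {
  const start = performance.now();
  for (let i = 0; i < runs; i += 1) {
    fn();
  }
  return (performance.now() - start) / runs;
}

// Made-up input: a long message mixing inline and display formulas.
const sample = "Inline \\(a + b\\) and display \\[\nc = d\n\\]\n".repeat(500);

escapeBrackets(sample); // warm-up call so one-time costs are not measured
const avg = averageMs(() => {
  escapeBrackets(sample);
}, 50);
console.log(`length=${sample.length}, average=${avg.toFixed(3)} ms per call`);
```

Calling the function directly, outside any React component, also keeps framework-level caching out of the measurement.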
Modern JS engines compile and cache regular expressions at load time, so this function will not recompile the pattern on every call.
> At first I was also worried about the performance, but later I found that this worry was unnecessary. I used the following code to log the time consumed for each call, and the results showed that the function is fast enough.

That is because the call is handled by React's useMemo; that's why it looks faster.
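For readers unfamiliar with `useMemo`, the caching effect mentioned here can be sketched without React. The helper `memoizeLast` below is a hypothetical stand-in that recomputes only when its input changes, which is roughly what `useMemo(..., [props.content])` does in `_MarkDownContent`; it is an illustration, not React's implementation.

```typescript
// Hypothetical stand-in for useMemo with a single dependency: the wrapped
// function only runs when the input differs from the previous call.
function memoizeLast<A, R>(fn: (arg: A) => R): (arg: A) => R {
  let hasValue = false;
  let lastArg: A | undefined;
  let lastResult: R | undefined;
  return (arg: A): R => {
    if (!hasValue || !Object.is(arg, lastArg)) {
      lastResult = fn(arg);
      lastArg = arg;
      hasValue = true;
    }
    return lastResult as R;
  };
}

let calls = 0;
// Placeholder for an expensive transformation such as escapeBrackets.
const expensive = (text: string): string => {
  calls += 1; // count how often the real work runs
  return text.toUpperCase();
};

const memoized = memoizeLast(expensive);
memoized("a + b"); // computes
memoized("a + b"); // same input: cached result, no recomputation
memoized("a + c"); // input changed: computes again
console.log(calls); // 2
```

The cache only helps when the content is unchanged between renders; whenever the content changes, the wrapped function runs again.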
> Modern js engines compile and cache regexp at load time, so this function will not recompile it every time

That more likely depends on the front-end web framework (e.g., React).
support \(...\) and \[...\] style math formula
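Fixes the rendering problem caused by GPT replies that write formulas in the \(...\) and \[...\] formats: #3436 (comment)
This was discussed in remark-math, but the conclusion was that this format will not be supported: remarkjs/remark-math#39
I tried replacing rehype-katex with rehype-mathjax, but that did not render correctly either.
In the end I followed the implementation here and replaced the \(...\) and \[...\] formats with dollar signs: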
danny-avila/LibreChat#1585
https://github.com/danny-avila/LibreChat/blob/v0.6.10/client/src/utils/latex.ts#L36
Result with this PR:
![均方误差的公式](https://private-user-images.githubusercontent.com/28617777/309315231-37787e82-4acb-46bc-83d7-fcf9dc53328c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg4MzgyNDIsIm5iZiI6MTczODgzNzk0MiwicGF0aCI6Ii8yODYxNzc3Ny8zMDkzMTUyMzEtMzc3ODdlODItNGFjYi00NmJjLTgzZDctZmNmOWRjNTMzMjhjLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA2VDEwMzIyMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWE1ZDVkODk2ZGI2MjE5YjA2ZjdlOGRiYzZkMDg0NTFlZmUxZjg2ZTA2NDBjZDk2MGUxOTZlMGZmNGFiZGNlNTQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.e__yPKiBP0O6-Ayw9HjbJrb0M5kEGeetiQbpoPqXtyU)
Result before this PR:
![模型评估](https://private-user-images.githubusercontent.com/28617777/309315749-745bb7b3-238b-4de4-83fd-2958bcd0ccfa.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg4MzgyNDIsIm5iZiI6MTczODgzNzk0MiwicGF0aCI6Ii8yODYxNzc3Ny8zMDkzMTU3NDktNzQ1YmI3YjMtMjM4Yi00ZGU0LTgzZmQtMjk1OGJjZDBjY2ZhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA2VDEwMzIyMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM3YjU0Y2JjNjViN2RlNzYzMGI4ZjE0NzgzMTdjNDlhYTkxOWMxMTgxYTVjYjJmNTI4YTU3YWYzNGM5NGFjNjImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.n3bmRoodxpqamkV5x_1HEhfuDBq92G27HU0eoDqhdsc)