Citations not being rendered #2352

Open
moelk-atea opened this issue Feb 11, 2025 · 2 comments


moelk-atea commented Feb 11, 2025

Hi!

Why are my citations not being rendered properly? I only get the citation name in brackets in the answer, instead of a clickable citation index number that opens the document viewer when clicked.

Example:

bla bla bla .... [citationName1.docx] bla bla bla [citationName2.pdf] (see image)

Image

This is my deployment setup:

azd env set DEPLOYMENT_TARGET appservice
azd env set AZURE_USE_AUTHENTICATION true
azd env set AZURE_ENABLE_GLOBAL_DOCUMENT_ACCESS true
azd env set AZURE_AUTH_TENANT_ID <THE TENANT-ID>
azd env set USE_USER_UPLOAD true

azd up

python scripts/adlsgen2setup.py 'data' --data-access-control './scripts/acls.json' -v 
python ./scripts/manageacl.py  -v --acl-action enable_acls
python ./scripts/manageacl.py  -v --acl-action update_storage_urls --url https://<NAME_OF_MAIN_STORAGE_ACCOUNT>.blob.core.windows.net/content/

I have seen that the citationPath is fetched from the following path:
BACKENDUri/content/citationName

What am I missing or doing wrong here?

Note: I have tried documents uploaded through user uploads, with and without group ACLs, and documents uploaded to the content container in the main storage account. In both cases the citation name appears in brackets. This happens when I chat through the Azure-hosted App Service. I have also tried a new deployment where I only run azd env set DEPLOYMENT_TARGET appservice and then azd up, and I still get the same issue.

@moelk-atea moelk-atea changed the title Citations for user uploaded documents not working (using acls) Citations not being rendered (using acls) Feb 11, 2025
@moelk-atea moelk-atea changed the title Citations not being rendered (using acls) Citations not being rendered Feb 12, 2025
@pamelafox
Collaborator

If you are seeing it rendered in plain text in the answer, then something has gone wrong with the frontend answer parsing code, here:

const parts = parsedAnswer.split(/\[([^\]]+)\]/g);

You could either put a "debugger" statement in there or put console.log() statements to see where it goes wrong. It should first identify the citation as a fragment, then return true that it's a valid citation, then return it as a clickable anchor link.
If it's not doing that, I'm guessing that either it doesn't match the regular expression (which I can't verify myself, given it's blanked out in the screenshot) or that it doesn't match the filename found in the associated data points, perhaps due to a hallucination.

Please report back with what you find.
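
To make the first step concrete, here is a minimal sketch of what that split produces. The sample answer text and file names are made up for illustration. It assumes the parser treats the odd-indexed parts as citation candidates, which follows from splitting on a regular expression that has a capture group.

```typescript
// Splitting on a regex with a capture group keeps the captured text in the
// result, so even indexes hold plain text and odd indexes hold the bracket
// contents (the citation candidates).
// The answer string below is a hypothetical example, not real model output.
const sampleAnswer = "Salaries are paid monthly [Policy.pdf#page=1] and reviewed yearly [Handbook.docx].";
const sampleParts = sampleAnswer.split(/\[([^\]]+)\]/g);

sampleParts.forEach((part, index) => {
    const kind = index % 2 === 0 ? "text" : "citation candidate";
    console.log(`${index} (${kind}): ${JSON.stringify(part)}`);
});
```

If a bracketed name shows up as a citation candidate here but still renders as plain text, the problem is more likely in the validity check than in the split.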

Author

moelk-atea commented Feb 13, 2025

Sorry for the redacted content!

I've looked into the isCitationValid function in AnswerParser, and it returns false. Looking at the console.log output, I don't understand why it doesn't return true; see below:

function isCitationValid(contextDataPoints: any, citationCandidate: string): boolean {
    const regex = /.+\.\w{1,}(?:#\S*)?$/;
    if (!regex.test(citationCandidate)) {
        return false;
    }

    // Check if contextDataPoints is an object with a text property that is an array
    let dataPointsArray: string[];
    if (Array.isArray(contextDataPoints)) {
        dataPointsArray = contextDataPoints;
    } else if (contextDataPoints && Array.isArray(contextDataPoints.text)) {
        dataPointsArray = contextDataPoints.text;
    } else {
        return false;
    }

    const isValidCitation = dataPointsArray.some(dataPoint => {
        console.log("This is datapoint: ", dataPoint);
        console.log("This is citation candidate: ", citationCandidate);
        console.log("Startswith: ", dataPoint.startsWith(citationCandidate));
        return dataPoint.startsWith(citationCandidate);
    });
    return isValidCitation;
}
This is citation candidate:  Policy_Lön.pdf
Startswith:  false
This is datapoint:  Policy_Lön.pdf#page=1:  .........
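
One possible explanation for output like this, sketched under an assumption: two strings can look identical on screen but differ in Unicode normalization form. The sketch assumes the "ö" is stored as a single precomposed code point in one string and as "o" plus a combining diaeresis in the other. Which form each side actually uses in this deployment is not shown in the logs.

```typescript
// "ö" as one precomposed code point (NFC form).
const composedName = "Policy_L\u00F6n.pdf";
// "ö" as "o" followed by U+0308 COMBINING DIAERESIS (NFD form).
const decomposedName = "Policy_Lo\u0308n.pdf";
// Hypothetical data point text that uses the decomposed form.
const sampleDataPoint = decomposedName + "#page=1: ...";

// Both names render the same, but they are different code unit sequences.
console.log(composedName === decomposedName); // false
console.log(composedName.length, decomposedName.length); // 14 15

// A raw prefix check fails; normalizing both sides first makes it succeed.
console.log(sampleDataPoint.startsWith(composedName)); // false
console.log(sampleDataPoint.normalize("NFC").startsWith(composedName.normalize("NFC"))); // true
```

Logging the length of both strings, or their code points, is a quick way to confirm whether this is what is happening.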

EDIT:

I have now managed to fix the citation rendering by doing the following:

function isCitationValid(contextDataPoints: any, citationCandidate: string): boolean {
    const regex = /.+\.\w{1,}(?:#\S*)?$/;
    if (!regex.test(citationCandidate)) {
        return false;
    }

    // Check if contextDataPoints is an object with a text property that is an array
    let dataPointsArray: string[];
    if (Array.isArray(contextDataPoints)) {
        dataPointsArray = contextDataPoints;
    } else if (contextDataPoints && Array.isArray(contextDataPoints.text)) {
        dataPointsArray = contextDataPoints.text;
    } else {
        return false;
    }

    const normalizedCitation = citationCandidate.normalize("NFC");

    const isValidCitation = dataPointsArray.some(dataPoint => {
        const normalizedDataPoint = dataPoint.normalize("NFC");

        // Check if the dataPoint starts with citationCandidate OR is an exact match
        return normalizedDataPoint.startsWith(normalizedCitation) || normalizedDataPoint.split("#")[0] === normalizedCitation;
    });

    console.log(isValidCitation);

    // const isValidCitation = dataPointsArray.some(dataPoint => {
    //     console.log("This is datapoint: ", dataPoint);
    //     console.log("This is citation candidate: ", citationCandidate);
    //     console.log("Startswith: ", dataPoint.startsWith(citationCandidate));
    //     return dataPoint.startsWith(citationCandidate);
    // });
    return isValidCitation;
}

But now there is another issue: I'm not allowed to access the document. The error is thrown in AnalysisPanel.tsx, in the following fetch request:

const response = await fetch(activeCitation, {
    method: "GET",
    headers: await getHeaders(token)
});

In the function below, I have verified that the access token is retrieved correctly.

export const AnalysisPanel = ({ answer, activeTab, activeCitation, citationHeight, className, onActiveTabChanged }: Props) => {
    const isDisabledThoughtProcessTab: boolean = !answer.context.thoughts;
    const isDisabledSupportingContentTab: boolean = !answer.context.data_points;
    const isDisabledCitationTab: boolean = !activeCitation;
    const [citation, setCitation] = useState("");

    const client = useLogin ? useMsal().instance : undefined;
    const { t } = useTranslation();

    const fetchCitation = async () => {
        const token = client ? await getToken(client) : undefined;
        if (activeCitation) {
            // Get hash from the URL as it may contain #page=N
            // which helps browser PDF renderer jump to correct page N
            // includes() is used because indexOf() returns -1 (truthy) when "#" is absent
            const originalHash = activeCitation.includes("#") ? activeCitation.split("#")[1] : "";
            const response = await fetch(activeCitation, {
                method: "GET",
                headers: await getHeaders(token)
            });
            const citationContent = await response.blob();
            let citationObjectUrl = URL.createObjectURL(citationContent);
            // Add hash back to the new blob URL
            if (originalHash) {
                citationObjectUrl += "#" + originalHash;
            }
            setCitation(citationObjectUrl);
        }
    };
    useEffect(() => {
        fetchCitation();
    }, []);

Image
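
A small debugging aid for the access error, offered as a sketch rather than as the project's approach: fetchCitation as quoted above calls response.blob() without checking whether the request succeeded, so a 401 or 403 body would be turned into a blob URL like any other response. The helper name ensureOk and the ResponseLike interface are hypothetical and not part of the repository.

```typescript
// Hypothetical helper (not part of the repository): reject non-2xx responses
// before their body is converted into a blob URL.
interface ResponseLike {
    ok: boolean;
    status: number;
    statusText: string;
}

function ensureOk<T extends ResponseLike>(response: T): T {
    if (!response.ok) {
        // Surface the HTTP status so an authorization failure is visible.
        throw new Error(`Citation request failed: ${response.status} ${response.statusText}`);
    }
    return response;
}

// Intended use inside fetchCitation (sketch, names taken from the snippet above):
//   const response = ensureOk(
//       await fetch(activeCitation, { method: "GET", headers: await getHeaders(token) })
//   );
//   const citationContent = await response.blob();
```

With a check like this, the status code shows whether the backend rejected the token (401) or the user lacks access to the document (403), which narrows down where to look next.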