feat: generate stream response for chat API #23

Merged · 8 commits · Mar 28, 2024
44 changes: 13 additions & 31 deletions docs/development/04-session.md
@@ -17,10 +17,10 @@ Open the `chat.ts` file and let's make some significant changes to this code.
- `packages/packages/api/functions/chat.ts`:

```typescript
import { HttpRequest, HttpResponseInit, InvocationContext } from '@azure/functions';
import { badRequest, serviceUnavailable } from '../utils';
import { AzureChatOpenAI, AzureOpenAIEmbeddings } from '@langchain/azure-openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';

import 'dotenv/config';

@@ -41,27 +41,21 @@ export async function testChat(request: HttpRequest, context: InvocationContext)
const model = new AzureChatOpenAI();

const questionAnsweringPrompt = ChatPromptTemplate.fromMessages([
['system', "Answer the user's questions based on the below context:\n\n{context}"],
['human', '{input}'],
]);

return {
status: 200,
body: 'Testing chat function.',
};
} catch (error: unknown) {
const error_ = error as Error;
context.error(`Error when processing request: ${error_.message}`);

return serviceUnavailable(new Error('Service temporarily unavailable. Please try again later.'));
}
}
```

Let's understand what we did here:
@@ -112,18 +106,18 @@ Now that we have created a more dynamic chat, let's implement the `chain` so that…

Let's go through, line by line, what we did:

We created a `combineDocsChain` using the `createStuffDocumentsChain` function.

This function creates a chain that passes a list of documents into a prompt template. It takes a few parameters, including `llm`, which is the language model we are using, and `prompt`, which is the prompt that drives the conversation with the model. We will use them to create the chain.

Just as we did in the `upload` API, we need to store the vectors in the database. To do this, we created a variable called `store` to instantiate the `AzureCosmosDBVectorStore` class. This class provides a vector store backed by Azure Cosmos DB, which is used to store and retrieve the embedding vectors.

We created the `chain` using the `createRetrievalChain` function. This function creates a retrieval chain that retrieves the relevant documents and then passes them on to the chat. That's why it takes two parameters:

- `retriever`: returns a list of relevant documents.
- `combineDocsChain`: produces a string output from those documents.

Finally, we invoked the `chain` using the `invoke` method. This method runs the chain with the input question and returns the response from the language model.
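
To make this easier to follow, here is a rough sketch of how those pieces fit together in the handler (variable names such as `model`, `embeddings`, `questionAnsweringPrompt`, and `question` follow the earlier snippets; the complete file appears in the next session's diff):

```typescript
import { createStuffDocumentsChain } from 'langchain/chains/combine_documents';
import { createRetrievalChain } from 'langchain/chains/retrieval';
import { AzureCosmosDBVectorStore } from '@langchain/community/vectorstores/azure_cosmosdb';

// Sketch of the pieces described above (non-streaming version, ending with invoke()).
// Assumes `model`, `embeddings`, `questionAnsweringPrompt`, and `question` are defined
// earlier in the handler, as in the previous snippets.
const combineDocsChain = await createStuffDocumentsChain({
  llm: model,                      // the AzureChatOpenAI instance
  prompt: questionAnsweringPrompt, // the ChatPromptTemplate defined above
});

// Vector store backed by Azure Cosmos DB, reusing the embeddings instance.
const store = new AzureCosmosDBVectorStore(embeddings, {});

// Retrieval chain: fetch the relevant documents, then stuff them into the prompt.
const chain = await createRetrievalChain({
  retriever: store.asRetriever(),
  combineDocsChain,
});

// Run the chain with the user's question and wait for the complete answer.
const response = await chain.invoke({ input: question });
```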

Wow! We have completed our `chat` API. Now, let's test it together with the `upload` API.

@@ -157,18 +151,6 @@ You will see the exact response requested in the `chat` request. If you want to…

![chat API](./images/chat-final-result.gif)

We haven't finished our project yet. We still have one more very important item that we mustn't forget to implement in a chat: the `stream` response. We'll learn how to do this in the next session.

▶ **[Next Step: Generate `stream` response](./05-session.md)**

239 changes: 238 additions & 1 deletion docs/development/05-session.md
@@ -1,3 +1,240 @@
# Generate a stream response in the `chat` API

In this session, we will learn how to generate a stream response in the `chat` API using LangChain.js, together with the new HTTP streaming support that is also available in v4 of the Azure Functions programming model.

## What is streaming?

Streaming is crucial for large language models (LLMs) for several reasons:

- **Efficient memory management**: it allows models to process long texts without overloading memory.
- **Better scalability**: it makes it easier to process inputs of virtually unlimited size.
- **Lower latency in real-time interactions**: it provides faster responses in virtual assistants and dialog systems.
- **Easier training and inference on large data sets**: it makes the use of LLMs more practical and efficient.
- **Potentially higher-quality generated text**: it helps models focus on smaller pieces of text for greater cohesion and contextual relevance.
- **Support for distributed workflows**: it allows models to be scaled to meet intense processing demands.

For these reasons, some large language models (LLMs) can send their responses sequentially. This means you don't need to wait for the full response to be received before you start working with it. This is especially useful if you want to show the response to the user as it is produced, or if you need to analyze and use the response while it is still being formed.

LangChain.js supports streaming through the `.stream()` method. If you want to know more about it, see the **[official LangChain.js documentation](https://js.langchain.com/docs/use_cases/question_answering/streaming#chain-with-sources)**.
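
As a quick illustration, here is a minimal sketch of streaming directly from a chat model (it assumes an `AzureChatOpenAI` instance configured through environment variables, as in the previous sessions, and an async context such as an ES module with top-level await):

```typescript
import { AzureChatOpenAI } from '@langchain/azure-openai';
import 'dotenv/config';

// .stream() returns an async iterable of message chunks instead of a single,
// complete response.
const model = new AzureChatOpenAI();
const stream = await model.stream('Tell me a short joke.');

for await (const chunk of stream) {
  // Each chunk carries a partial piece of the answer as it is generated.
  process.stdout.write(chunk.content as string);
}
```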

## Support for HTTP Streams in Azure Functions

The Azure Functions product team recently announced the availability of support for HTTP Streams in version 4 of the Azure Functions programming model. With this, it is now possible to return stream responses in HTTP APIs, which is especially useful for real-time data streaming scenarios.

To find out more about streaming support in Azure Functions v4, you can visit Microsoft's Tech Community blog by clicking **[here](https://techcommunity.microsoft.com/t5/apps-on-azure-blog/azure-functions-support-for-http-streams-in-node-js-is-now-in/ba-p/4066575)**.

## Enabling HTTP Streams support in Azure Functions

Well, now that we've understood the importance of using streaming in a chat and how useful it can be, let's learn how we can introduce it into the `chat` API.

The first thing we need to do is enable the new Azure Functions feature, which is streaming support. To do this, open the file `index.ts` and include the following code:

- `index.ts`

```typescript
app.setup({ enableHttpStream: true });
```

So the `index.ts` file will look like this:

- `index.ts`

```typescript
import { app } from '@azure/functions';
import { chat } from './functions/chat';
import { upload } from './functions/upload';

app.setup({ enableHttpStream: true });
app.post('chat', {
route: 'chat',
authLevel: 'anonymous',
handler: chat,
});

app.post('upload', {
route: 'upload',
authLevel: 'anonymous',
handler: upload,
});
```

And that's it! Azure Functions is now enabled to support streaming.

## Generating a stream response in the `chat` API

Now, let's move on and create the logic to generate a stream response in the `chat` API.

Open the `chat.ts` file and let's make some significant changes:

- `chat.ts`

```typescript
import { IterableReadableStream } from '@langchain/core/dist/utils/stream';
import { Document } from '@langchain/core/documents';
import { HttpRequest, InvocationContext, HttpResponseInit } from '@azure/functions';
import { AzureOpenAIEmbeddings, AzureChatOpenAI } from '@langchain/azure-openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { createStuffDocumentsChain } from 'langchain/chains/combine_documents';
import { AzureCosmosDBVectorStore } from '@langchain/community/vectorstores/azure_cosmosdb';
import { createRetrievalChain } from 'langchain/chains/retrieval';
import 'dotenv/config';
import { badRequest, serviceUnavailable, okStreamResponse } from '../utils';
import { Readable } from 'stream';

export async function chat(request: HttpRequest, context: InvocationContext): Promise<HttpResponseInit> {
context.log(`Http function processed request for url "${request.url}"`);

try {
const requestBody: any = await request.json();

if (!requestBody?.question) {
return badRequest(new Error('No question provided'));
}

const { question } = requestBody;

const embeddings = new AzureOpenAIEmbeddings();

const prompt = `Question: ${question}`;
context.log(`Sending prompt to the model: ${prompt}`);

const model = new AzureChatOpenAI();

const questionAnsweringPrompt = ChatPromptTemplate.fromMessages([
['system', "Answer the user's questions based on the below context:\n\n{context}"],
['human', '{input}'],
]);

const combineDocsChain = await createStuffDocumentsChain({
llm: model,
prompt: questionAnsweringPrompt,
});

const store = new AzureCosmosDBVectorStore(embeddings, {});

const chain = await createRetrievalChain({
retriever: store.asRetriever(),
combineDocsChain,
});

const response = await chain.stream({
input: question,
});

return {
body: createStream(response),
headers: {
'Content-Type': 'text/plain',
},
};
} catch (error: unknown) {
const error_ = error as Error;
context.error(`Error when processing chat request: ${error_.message}`);

return serviceUnavailable(new Error('Service temporarily unavailable. Please try again later.'));
}
}

function createStream(
chunks: IterableReadableStream<
{
context: Document[];
answer: string;
} & {
[key: string]: unknown;
}
>,
) {
const buffer = new Readable({
read() {},
});

const stream = async () => {
for await (const chunk of chunks) {
buffer.push(chunk.answer);
}

buffer.push(null);
};

stream();

return buffer;
}
```

Quite a few changes, right? Let's go through what has been changed and added here:

```typescript
const response = await chain.stream({
input: question,
});
```

Before, we called the `chain` using the `invoke()` method. However, since we now want to generate a stream response, we use the `stream()` method instead, passing the `input` parameter with the question the user asked.

After that, we're returning the stream response, using the `createStream()` function.

```typescript
function createStream(
chunks: IterableReadableStream<
{
context: Document[];
answer: string;
} & {
[key: string]: unknown;
}
>,
) {
const buffer = new Readable({
read() {},
});

const stream = async () => {
for await (const chunk of chunks) {
buffer.push(chunk.answer);
}

buffer.push(null);
};

stream();

return buffer;
}
```

The `createStream()` function is responsible for building the response stream. It receives `chunks` as a parameter, an `IterableReadableStream` containing the chunks of the response. Each chunk's `answer` is pushed into `buffer`, a readable stream, and once all chunks have been consumed the stream is closed by pushing `null`. Finally, the buffer is returned.

Note that we are importing:

- `IterableReadableStream` from `@langchain/core/dist/utils/stream`: an iterable stream type used here to type the chunks of the streamed response.
- `Document` from `@langchain/core/documents`: the interface representing a retrieved document.
- `Readable` from Node.js's built-in `stream` module: a class that represents a readable stream of data, which we use to build the response body.

```typescript
return {
headers: { 'Content-Type': 'text/plain' },
body: createStream(response),
};
```

Finally, we return the stream created by `createStream()` as the response body, setting the `Content-Type` header to `text/plain`.
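
As a side note on the design: instead of manually pushing chunks into a `Readable` and closing it with `push(null)`, Node's built-in `Readable.from()` can wrap an async iterable directly. A minimal sketch of that alternative, assuming the same `chunks` shape, is shown below (`createStreamFromIterable` is a hypothetical name, not part of the project):

```typescript
import { Readable } from 'stream';

// Alternative sketch: wrap the chain's async iterable in a Readable directly,
// avoiding the manual push()/push(null) bookkeeping of createStream().
function createStreamFromIterable(chunks: AsyncIterable<{ answer: string }>): Readable {
  async function* answers() {
    for await (const chunk of chunks) {
      yield chunk.answer; // forward only the answer fragment of each chunk
    }
  }

  // objectMode: false keeps the stream emitting raw text chunks,
  // matching the behavior of the original implementation.
  return Readable.from(answers(), { objectMode: false });
}
```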

And that's it! Now the `chat` API is ready to generate stream responses.

Let's test the `chat` API and see how it behaves when generating a stream response. To do this, open the terminal again in the `api` folder and run the command:

```bash
npm run start
```

Then open the `api.http` file and send the `chat` API request; you can see the streamed response in the gif below:

![chat-stream](./images/stream-response.gif)

Note that when we send the request, the response headers include `Transfer-Encoding: chunked`, which indicates that the response is being sent in chunks. The answer is displayed incrementally, i.e. it appears as it is generated.

![chat-stream-response](./images/stream-response.png)
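
If you prefer to exercise the endpoint from code rather than from the `api.http` file, a hypothetical client sketch could consume the chunked response like this (it assumes Node.js 18+ with the global `fetch`, the Functions host running locally on the default `http://localhost:7071` address, and a question of your choice):

```typescript
// Hypothetical client sketch: read the chat response chunk by chunk as it arrives.
const response = await fetch('http://localhost:7071/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ question: 'Summarize the uploaded document.' }),
});

const reader = response.body!.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Each decoded piece is a fragment of the answer, printed as soon as it arrives.
  process.stdout.write(decoder.decode(value, { stream: true }));
}
```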

And that's it! You've now learned how to generate a stream response in the `chat` API using LangChain.js and the new stream feature that is also available for v4 of the Azure Functions programming model.
Binary file added docs/development/images/chat-stream-response.png
Binary file added docs/development/images/stream-response.gif