Merge branch 'main' into logan/merge_next
logan-markewich authored Feb 15, 2024
2 parents 0859241 + b5e96d4 commit 9e84e4f
Showing 450 changed files with 15,384 additions and 2,961 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -6,3 +6,4 @@ venv/
.__pycache__
dev_notebooks/
llamaindex_registry.txt
+packages_to_bump_deduped.txt
2 changes: 1 addition & 1 deletion docs/examples/llm/llama_2_llama_cpp.ipynb
@@ -55,7 +55,7 @@
"source": [
"from llama_index.core import SimpleDirectoryReader, VectorStoreIndex\n",
"from llama_index.llms.llama_cpp import LlamaCPP\n",
-"from llama_index.core.llms.llama_utils import (\n",
+"from llama_index.llms.llama_cpp.llama_utils import (\n",
" messages_to_prompt,\n",
" completion_to_prompt,\n",
")"
139 changes: 51 additions & 88 deletions docs/examples/multi_modal/multi_modal_video_RAG.ipynb
@@ -39,7 +39,8 @@
"outputs": [],
"source": [
"%pip install llama-index-multi-modal-llms-openai\n",
-"%pip install llama-index-vector-stores-lancedb"
+"%pip install llama-index-vector-stores-lancedb\n",
+"%pip install llama-index-embeddings-clip"
]
},
{
@@ -49,6 +50,7 @@
"outputs": [],
"source": [
"%pip install llama_index ftfy regex tqdm\n",
+"%pip install -U openai-whisper\n",
"%pip install git+https://github.com/openai/CLIP.git\n",
"%pip install torch torchvision\n",
"%pip install matplotlib scikit-image\n",
@@ -57,7 +59,8 @@
"%pip install pytube\n",
"%pip install pydub\n",
"%pip install SpeechRecognition\n",
-"%pip install ffmpeg-python"
+"%pip install ffmpeg-python\n",
+"%pip install soundfile"
]
},
{
@@ -67,7 +70,6 @@
"outputs": [],
"source": [
"from moviepy.editor import VideoFileClip\n",
-"from pydub import AudioSegment\n",
"from pathlib import Path\n",
"import speech_recognition as sr\n",
"from pytube import YouTube\n",
@@ -230,46 +232,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"Moviepy - Writing frames ./mixed_data/frame%04d.png.\n"
-]
-},
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-" \r"
-]
-},
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"Moviepy - Done writing frames ./mixed_data/frame%04d.png.\n",
-"MoviePy - Writing audio in ./mixed_data/output_audio.wav\n"
-]
-},
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-" \r"
-]
-},
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"MoviePy - Done.\n",
-"Text data saved to file\n",
-"Audio file removed\n"
-]
-}
-],
+"outputs": [],
"source": [
"try:\n",
" metadata_vid = download_video(video_url, output_video_path)\n",
@@ -420,7 +383,7 @@
{
"data": {
"text/markdown": [
-"**Node ID:** 2b0dab05-1469-48a2-9f36-2103458ba252<br>**Similarity:** 0.742850661277771<br>**Text:** The basic function underlying a normal distribution, aka a Gaussian, is e to the negative x squared. But you might wonder why this function? Of all the expressions we could dream up that give you s...<br>"
+"**Node ID:** bda08ef1-137c-4d69-9bcc-b7005a41a13c<br>**Similarity:** 0.7431071996688843<br>**Text:** The basic function underlying a normal distribution, aka a Gaussian, is e to the negative x squared. But you might wonder why this function? Of all the expressions we could dream up that give you s...<br>"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -432,7 +395,7 @@
{
"data": {
"text/markdown": [
-"**Node ID:** 6856d43b-9978-4882-a1ff-09d11913157b<br>**Similarity:** 0.7337126731872559<br>**Text:** This step is actually pretty technical, it goes a little beyond what I want to talk about here. Often use these objects called moment generating functions, that gives you a very abstract argument t...<br>"
+"**Node ID:** 7d6d0f32-ce16-461b-be54-883241252e50<br>**Similarity:** 0.7335695028305054<br>**Text:** This step is actually pretty technical, it goes a little beyond what I want to talk about here. Often use these objects called moment generating functions, that gives you a very abstract argument t...<br>"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -444,7 +407,7 @@
{
"data": {
"text/markdown": [
-"**Node ID:** 27e3d1f6-4b30-4087-a93d-5484edc814d3<br>**Similarity:** 0.7068501114845276<br>**Text:** This is the important point. All of the stuff that's involving s is now entirely separate from the integrated variable. This remaining integral is a little bit tricky. I did a whole video on it. It...<br>"
+"**Node ID:** 519fb788-3927-4842-ad5c-88be61deaf65<br>**Similarity:** 0.7069740295410156<br>**Text:** The essence of what we want to compute is what the convolution between two copies of this function looks like. If you remember, in the last video, we had two different ways to visualize convolution...<br>"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -456,7 +419,7 @@
{
"data": {
"text/markdown": [
-"**Node ID:** 320ed2fc-eeaa-48a3-a216-b90e0316b1a8<br>**Similarity:** 0.7065496444702148<br>**Text:** The essence of what we want to compute is what the convolution between two copies of this function looks like. If you remember, in the last video, we had two different ways to visualize convolution...<br>"
+"**Node ID:** f265c3fb-3c9f-4f36-aa2a-fb15efff9783<br>**Similarity:** 0.706935465335846<br>**Text:** This is the important point. All of the stuff that's involving s is now entirely separate from the integrated variable. This remaining integral is a little bit tricky. I did a whole video on it. It...<br>"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
@@ -503,63 +466,63 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"('The video titled \"A pretty reason why Gaussian + Gaussian = Gaussian\" by '\n",
-" '3Blue1Brown covers several aspects of the Gaussian function, also known as '\n",
-" 'the normal distribution. Here are the key points discussed in the video:\\n'\n",
+"('The video by 3Blue1Brown, titled \"A pretty reason why Gaussian + Gaussian = '\n",
+" 'Gaussian,\" covers several aspects of the Gaussian function, also known as '\n",
+" \"the normal distribution. Here's a summary of the key points discussed in the \"\n",
+" 'video:\\n'\n",
" '\\n'\n",
" '1. **Central Limit Theorem**: The video begins by discussing the central '\n",
" 'limit theorem, which states that the sum of multiple copies of a random '\n",
-" 'variable tends to look like a normal distribution. As the number of summed '\n",
+" 'variable tends to look like a normal distribution. As the number of '\n",
" 'variables increases, the approximation to a normal distribution becomes '\n",
" 'better.\\n'\n",
" '\\n'\n",
" '2. **Convolution of Random Variables**: The process of adding two random '\n",
" 'variables is mathematically represented by a convolution of their respective '\n",
-" 'distributions. The video explains how to visualize the convolution operation '\n",
-" 'using two methods, with a focus on the second method involving diagonal '\n",
-" 'slices.\\n'\n",
+" 'distributions. The video explains the concept of convolution and how it is '\n",
+" 'used to find the distribution of the sum of two random variables.\\n'\n",
" '\\n'\n",
" '3. **Gaussian Function**: The Gaussian function is more complex than just '\n",
-" '\\\\(e^{-x^2}\\\\). The full formula includes a normalization factor to ensure '\n",
-" 'the area under the curve is one, making it a valid probability distribution. '\n",
-" 'The standard deviation (\\\\(\\\\sigma\\\\)) is used to describe the spread of the '\n",
-" 'distribution, and the mean (\\\\(\\\\mu\\\\)) can be included to shift the center '\n",
-" 'of the distribution.\\n'\n",
+" '\\\\( e^{-x^2} \\\\). The full formula includes a scaling factor to ensure the '\n",
+" 'area under the curve is 1 (making it a valid probability distribution), a '\n",
+" 'standard deviation parameter \\\\( \\\\sigma \\\\) to describe the spread, and a '\n",
+" 'mean parameter \\\\( \\\\mu \\\\) to shift the center. However, the video focuses '\n",
+" 'on centered distributions with \\\\( \\\\mu = 0 \\\\).\\n'\n",
" '\\n'\n",
-" '4. **Convolution of Two Gaussians**: The video explores what happens when '\n",
-" 'you add two normally distributed random variables, which is equivalent to '\n",
-" 'computing the convolution of two Gaussian functions. The author presents a '\n",
-" 'visual approach to understand this calculation by exploiting the rotational '\n",
-" 'symmetry of the graph of \\\\(e^{-x^2}\\\\).\\n'\n",
+" '4. **Visualizing Convolution**: The video presents a visual method to '\n",
+" 'understand the convolution of two Gaussian functions using diagonal slices '\n",
+" 'on the xy-plane. This method involves looking at the probability density of '\n",
+" 'landing on a point (x, y) as \\\\( f(x) \\\\times g(y) \\\\), where f and g are '\n",
+" 'the two distributions being convolved.\\n'\n",
" '\\n'\n",
-" '5. **Rotational Symmetry and Slices**: The video demonstrates that the graph '\n",
-" 'of \\\\(e^{-x^2} \\\\cdot e^{-y^2}\\\\) is rotationally symmetric around the '\n",
-" 'origin, which is a unique property of Gaussian functions. By examining '\n",
-" 'diagonal slices of this graph, the author shows how to compute the area '\n",
-" 'under these slices, which corresponds to the convolution of the two '\n",
-" 'functions.\\n'\n",
+" '5. **Rotational Symmetry**: A key property of the Gaussian function is its '\n",
+" 'rotational symmetry, which is unique to bell curves. This symmetry is '\n",
+" 'exploited in the video to simplify the calculation of the convolution. By '\n",
+" 'rotating the graph 45 degrees, the computation becomes easier because the '\n",
+" 'integral only involves one variable.\\n'\n",
" '\\n'\n",
-" '6. **Resulting Distribution**: The convolution of two Gaussian functions is '\n",
-" 'shown to be another Gaussian function. This is a special property because '\n",
-" 'convolutions typically result in a different kind of function. The video '\n",
-" 'explains that this property is key to understanding why the Gaussian '\n",
-" 'function is the universal shape approached by the central limit theorem.\\n'\n",
+" '6. **Result of Convolution**: The video demonstrates that the convolution of '\n",
+" 'two Gaussian functions is another Gaussian function. This is a special '\n",
+" 'property because convolutions typically result in a different kind of '\n",
+" 'function. The standard deviation of the resulting Gaussian is \\\\( \\\\sqrt{2} '\n",
+" '\\\\times \\\\sigma \\\\) if the original Gaussians had the same standard '\n",
+" 'deviation.\\n'\n",
" '\\n'\n",
-" '7. **Standard Deviation of the Result**: When reintroducing the constants '\n",
-" 'for a normal distribution, the video concludes that the convolution of two '\n",
-" 'normal distributions with mean 0 and standard deviation \\\\(\\\\sigma\\\\) '\n",
-" 'results in another normal distribution with a standard deviation of '\n",
-" '\\\\(\\\\sqrt{2} \\\\cdot \\\\sigma\\\\).\\n'\n",
+" '7. **Proof of Central Limit Theorem**: The video explains that the '\n",
+" 'convolution of two Gaussians being another Gaussian is a crucial step in '\n",
+" 'proving the central limit theorem. It shows that the Gaussian function is a '\n",
+" 'fixed point in the space of distributions, and since all distributions with '\n",
+" 'finite variance tend towards a single universal shape, that shape must be '\n",
+" 'the Gaussian.\\n'\n",
" '\\n'\n",
-" '8. **Implications for the Central Limit Theorem**: The video emphasizes that '\n",
-" 'the computation of the convolution of two Gaussians is fundamental to the '\n",
-" 'central limit theorem. It shows that the Gaussian distribution is a fixed '\n",
-" 'point in the space of distributions, which is why it is the shape that '\n",
-" 'emerges from the central limit theorem.\\n'\n",
+" '8. **Connection to Pi**: The video also touches on the connection between '\n",
+" 'the Gaussian function and the number Pi, which appears in the formula for '\n",
+" 'the normal distribution.\\n'\n",
" '\\n'\n",
-" 'Throughout the video, the author provides visual examples and explanations '\n",
-" 'to help viewers understand the mathematical concepts involved in the '\n",
-" 'Gaussian function and its properties related to probability and statistics.')\n"
+" 'The video aims to provide an intuitive geometric argument for why the sum of '\n",
+" 'two normally distributed random variables is also normally distributed, and '\n",
+" 'how this relates to the central limit theorem and the special properties of '\n",
+" 'the Gaussian function.')\n"
]
}
],
104 changes: 104 additions & 0 deletions docs/examples/node_postprocessor/PII.ipynb
@@ -279,6 +279,110 @@
"new_nodes[0].node.metadata[\"__pii_node_info__\"]"
]
},
+{
+"cell_type": "markdown",
+"id": "6cc87ed4",
+"metadata": {},
+"source": [
+"### Option 3: Use Presidio for PII Masking\n",
+"\n",
+"Use presidio to identify and anonymize PII"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "ac215117",
+"metadata": {},
+"outputs": [],
+"source": [
+"# load documents\n",
+"text = \"\"\"\n",
+"Hello Paulo Santos. The latest statement for your credit card account \\\n",
+"4095-2609-9393-4932 was mailed to Seattle, WA 98109. \\\n",
+"IBAN GB90YNTU67299444055881 and social security number is 474-49-7577 were verified on the system. \\\n",
+"Further communications will be sent to [email protected] \n",
+"\"\"\"\n",
+"presidio_node = TextNode(text=text)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "8a745520",
+"metadata": {},
+"outputs": [],
+"source": [
+"from llama_index.postprocessor.presidio import PresidioPIINodePostprocessor\n",
+"\n",
+"processor = PresidioPIINodePostprocessor()"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "89cb17ed",
+"metadata": {},
+"outputs": [],
+"source": [
+"from llama_index.core.schema import NodeWithScore\n",
+"\n",
+"presidio_new_nodes = processor.postprocess_nodes(\n",
+" [NodeWithScore(node=presidio_node)]\n",
+")"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "b8fe9cef",
+"metadata": {},
+"outputs": [
+{
+"data": {
+"text/plain": [
+"'\\nHello <PERSON_1>. The latest statement for your credit card account <CREDIT_CARD_1> was mailed to <LOCATION_2>, <LOCATION_1>. IBAN <IBAN_CODE_1> and social security number is <US_SSN_1> were verified on the system. Further communications will be sent to <EMAIL_ADDRESS_1> \\n'"
+]
+},
+"execution_count": null,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
+"source": [
+"# view redacted text\n",
+"presidio_new_nodes[0].node.get_text()"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "80203af0",
+"metadata": {},
+"outputs": [
+{
+"data": {
+"text/plain": [
+"{'<EMAIL_ADDRESS_1>': '[email protected]',\n",
+" '<US_SSN_1>': '474-49-7577',\n",
+" '<IBAN_CODE_1>': 'GB90YNTU67299444055881',\n",
+" '<LOCATION_1>': 'WA 98109',\n",
+" '<LOCATION_2>': 'Seattle',\n",
+" '<CREDIT_CARD_1>': '4095-2609-9393-4932',\n",
+" '<PERSON_1>': 'Paulo Santos'}"
+]
+},
+"execution_count": null,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
+"source": [
+"# get mapping in metadata\n",
+"# NOTE: this is not sent to the LLM!\n",
+"presidio_new_nodes[0].node.metadata[\"__pii_node_info__\"]"
+]
+},
{
"attachments": {},
"cell_type": "markdown",
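Note on the PII.ipynb change: the new cells show masked text with placeholders such as `<CREDIT_CARD_1>` and a placeholder-to-original mapping stored under `__pii_node_info__`. The sketch below illustrates that general idea in plain Python so it can be read without installing anything. It is not Presidio's or LlamaIndex's implementation: `PATTERNS`, `mask_pii`, and `unmask` are hypothetical names, the two regexes are deliberately simplistic, and the placeholder numbering order may differ from what Presidio produces.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# Real PII detection (e.g. Presidio) relies on much richer recognizers.
PATTERNS = {
    "CREDIT_CARD": re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_pii(text):
    """Replace each match with <ENTITY_N>; return (masked_text, mapping)."""
    mapping = {}
    for entity, pattern in PATTERNS.items():
        counter = 0

        def substitute(match, entity=entity):
            # Number placeholders per entity type, left to right.
            nonlocal counter
            counter += 1
            placeholder = f"<{entity}_{counter}>"
            mapping[placeholder] = match.group(0)
            return placeholder

        text = pattern.sub(substitute, text)
    return text, mapping


def unmask(text, mapping):
    """Restore the original values from the placeholder mapping."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


if __name__ == "__main__":
    masked, mapping = mask_pii("Card 4095-2609-9393-4932, SSN 474-49-7577.")
    print(masked)
    print(mapping)
    print(unmask(masked, mapping))
```

Keeping the mapping separate from the masked text is what allows the redacted text to be sent onward while the original values stay local, which matches the "this is not sent to the LLM" comment in the notebook cell.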