diff --git a/notebooks/meta-learning-results.csv b/notebooks/meta-learning-results.csv new file mode 100644 index 0000000..ad70961 --- /dev/null +++ b/notebooks/meta-learning-results.csv @@ -0,0 +1,21 @@ +Question Number,Question,Manual State +1,Which courses are currently open for enrollment?,"To find courses open for enrollment, follow these steps: 1) Access the database interface and go to the course section. 2) Enter the query ""Which courses are currently open for enrollment?"" to filter by enrollment status. 3) The results will list courses with details like title, description, duration, and enrollment status. 4) Ensure the 'currentlyEnrolling' field is 'True' to confirm availability. 5) Review course descriptions for content and focus areas. 6) Check course duration to plan your schedule. This process helps you select courses efficiently and ensures you receive accurate results." +2,Show me all courses where the total duration is less than 10 hours.,"To find courses open for enrollment, access the database interface and navigate to the course section. Enter the query ""Which courses are currently open for enrollment?"" to filter by enrollment status. The results will list courses with details like title, description, duration, and enrollment status. Ensure the 'currentlyEnrolling' field is 'True' to confirm availability. Review course descriptions for content and focus areas, and check course duration to plan your schedule. For specific queries, such as finding courses with a total duration of less than 10 hours, adjust the query to include duration filters. This process helps you select courses efficiently and ensures you receive accurate results." +3,Which courses mention advanced Python programming in their description?,"To find courses open for enrollment, access the database interface and navigate to the course section. Enter the query ""Which courses are currently open for enrollment?"" to filter by enrollment status. 
Ensure the 'currentlyEnrolling' field is 'True' to confirm availability. The results will list courses with details like title, description, duration, and enrollment status. Review course descriptions for content and focus areas, and check course duration to plan your schedule. For specific queries, such as finding courses with a total duration of less than 10 hours, adjust the query to include duration filters. To find courses mentioning specific topics, modify the query to search within course descriptions. This process helps you select courses efficiently and ensures you receive accurate results." +4,What are the key learning outcomes for the course titled 'Data Structures 101'?,"To find courses open for enrollment, access the database interface and navigate to the course section. Enter the query ""Which courses are currently open for enrollment?"" and ensure the 'currentlyEnrolling' field is 'True'. The results will list courses with details like title, description, duration, and enrollment status. Review course descriptions and check course duration to plan your schedule. For courses under 10 hours, adjust the query to include duration filters. To find courses on specific topics, modify the query to search within course descriptions. For key learning outcomes, ensure the course title is correctly entered. If no results appear, verify the course title and check the database for course descriptions. This process helps you select courses efficiently and ensures accurate results." +5,List all courses that have a duration of 40 hours or more.,"To find courses open for enrollment, access the database interface and navigate to the course section. Enter the query ""Which courses are currently open for enrollment?"" and ensure the 'currentlyEnrolling' field is 'True'. The results will list courses with details like title, description, duration, and enrollment status. Review course descriptions and check course duration to plan your schedule. 
For courses under 10 hours, adjust the query to include duration filters. To find courses on specific topics, modify the query to search within course descriptions. Ensure the course title is correctly entered for key learning outcomes. If no results appear, verify the course title and check the database for course descriptions. When searching for courses with specific durations, such as 40 hours or more, include a duration filter in your query. This process helps you select courses efficiently and ensures accurate results." +6,Which instructors have over 10 years of teaching experience?,"To find courses open for enrollment, access the database and navigate to the course section. Enter the query ""Which courses are currently open for enrollment?"" and ensure the 'currentlyEnrolling' field is 'True'. The results will list courses with details like title, description, duration, and enrollment status. For courses under 10 hours, adjust the query to include duration filters. To find courses on specific topics, modify the query to search within course descriptions. Ensure the course title is correctly entered for key learning outcomes. If no results appear, verify the course title and check the database for course descriptions. To find courses with specific durations, such as 40 hours or more, include a duration filter in your query. To find instructors with over 10 years of teaching experience, query the database with ""Which instructors have over 10 years of teaching experience?"" and review the 'yearsOfTeaching' field in the results." +7,Show me the biography of the instructor named 'Dr. Jane Doe'.,"To find courses open for enrollment, access the database and navigate to the course section. Use the query ""Which courses are currently open for enrollment?"" and ensure the 'currentlyEnrolling' field is 'True'. For courses under 10 hours, add a duration filter. To search for courses on specific topics, modify the query to include keywords in course descriptions. 
Verify course titles for key learning outcomes if no results appear. To find courses with durations of 40 hours or more, include a duration filter in your query. For instructors with over 10 years of experience, query ""Which instructors have over 10 years of teaching experience?"" and check the 'yearsOfTeaching' field. To view an instructor's biography, use ""Show me the biography of the instructor named [Instructor's Name]"" and ensure correct spelling." +8,Which instructors are tenured?,"To find courses open for enrollment, navigate to the course section in the database and use the query ""Which courses are currently open for enrollment?"" Ensure the 'currentlyEnrolling' field is 'True'. For courses under 10 hours, apply a duration filter. To search for specific topics, include keywords in course descriptions. If no results appear, verify course titles for key learning outcomes. To find courses with durations of 40 hours or more, add a duration filter. For instructors with over 10 years of experience, query ""Which instructors have over 10 years of teaching experience?"" and check the 'yearsOfTeaching' field. To view an instructor's biography, use ""Show me the biography of the instructor named [Instructor's Name]"" with correct spelling. To identify tenured instructors, query ""Which instructors are tenured?"" and ensure the 'tenured' field is 'True'." +9,Who has the longest teaching career among all instructors in the database?,"To find courses open for enrollment, query ""Which courses are currently open for enrollment?"" and ensure the 'currentlyEnrolling' field is 'True'. For courses under 10 hours, apply a duration filter. To search by topic, include keywords in course descriptions. If no results appear, verify course titles for key learning outcomes. For courses 40 hours or more, add a duration filter. 
To find instructors with over 10 years of experience, query ""Which instructors have over 10 years of teaching experience?"" and check the 'yearsOfTeaching' field. To view an instructor's biography, use ""Show me the biography of the instructor named [Instructor's Name]"" with correct spelling. To identify tenured instructors, query ""Which instructors are tenured?"" and ensure the 'tenured' field is 'True'. To find the instructor with the longest teaching career, use ""Who has the longest teaching career among all instructors?"" and check the 'yearsOfTeaching' field for the maximum value." +10,Find instructors who mention a 'hands-on learning' philosophy in their biography.,"To find courses open for enrollment, query ""Which courses are currently open for enrollment?"" and ensure the 'currentlyEnrolling' field is 'True'. For courses under 10 hours or 40 hours and more, apply a duration filter. To search by topic, include keywords in course descriptions. If no results appear, verify course titles for key learning outcomes. To find instructors with over 10 years of experience, query ""Which instructors have over 10 years of teaching experience?"" and check the 'yearsOfTeaching' field. To view an instructor's biography, use ""Show me the biography of the instructor named [Instructor's Name]"" with correct spelling. To identify tenured instructors, query ""Which instructors are tenured?"" and ensure the 'tenured' field is 'True'. To find the instructor with the longest teaching career, use ""Who has the longest teaching career among all instructors?"" and check the 'yearsOfTeaching' field for the maximum value. To find instructors who emphasize a 'hands-on learning' philosophy, query ""Which instructors mention a 'hands-on learning' philosophy in their biography?"" and search for the phrase in the 'biography' field." 
+11,List all students who are enrolled full-time.,"To find courses open for enrollment, query ""Which courses are currently open for enrollment?"" and ensure the 'currentlyEnrolling' field is 'True'. Use a duration filter for courses under 10 hours or 40 hours and more. Include keywords in course descriptions to search by topic. Verify course titles for key learning outcomes if no results appear. To find instructors with over 10 years of experience, query ""Which instructors have over 10 years of teaching experience?"" and check the 'yearsOfTeaching' field. View an instructor's biography by querying ""Show me the biography of the instructor named [Instructor's Name]"" with correct spelling. Identify tenured instructors by querying ""Which instructors are tenured?"" and ensuring the 'tenured' field is 'True'. To find the instructor with the longest teaching career, use ""Who has the longest teaching career among all instructors?"" and check the 'yearsOfTeaching' field for the maximum value. For instructors emphasizing a 'hands-on learning' philosophy, query ""Which instructors mention a 'hands-on learning' philosophy in their biography?"" and search for the phrase in the 'biography' field. List all full-time enrolled students by querying ""List all students who are enrolled full-time"" and ensuring the 'enrolledFullTime' field is 'True'." +12,Which students have completed more than 30 credits?,"To find open courses, query ""Which courses are currently open for enrollment?"" and ensure 'currentlyEnrolling' is 'True'. Use duration filters for courses under 10 hours or 40 hours and more. Include keywords in descriptions to search by topic. Verify course titles for key outcomes if no results appear. To find experienced instructors, query ""Which instructors have over 10 years of teaching experience?"" and check 'yearsOfTeaching'. View an instructor's biography by querying ""Show me the biography of the instructor named [Instructor's Name]"" with correct spelling. 
Identify tenured instructors by querying ""Which instructors are tenured?"" and ensure 'tenured' is 'True'. To find the instructor with the longest career, use ""Who has the longest teaching career among all instructors?"" and check 'yearsOfTeaching' for the maximum value. For instructors with a 'hands-on learning' philosophy, query ""Which instructors mention a 'hands-on learning' philosophy in their biography?"" and search the 'biography' field. List full-time students by querying ""List all students who are enrolled full-time"" and ensure 'enrolledFullTime' is 'True'. To find students with over 30 credits, query ""Which students have completed more than 30 credits?"" and check 'completedCredits' for values over 30." +13,Show me the research interests of the student named 'Alex Johnson'.,"To find open courses, query ""Which courses are currently open for enrollment?"" and ensure 'currentlyEnrolling' is 'True'. Use duration filters for courses under 10 hours or 40 hours and more. Include keywords in descriptions to search by topic. Verify course titles for key outcomes if no results appear. To find experienced instructors, query ""Which instructors have over 10 years of teaching experience?"" and check 'yearsOfTeaching'. View an instructor's biography by querying ""Show me the biography of the instructor named [Instructor's Name]"" with correct spelling. Identify tenured instructors by querying ""Which instructors are tenured?"" and ensure 'tenured' is 'True'. To find the instructor with the longest career, use ""Who has the longest teaching career among all instructors?"" and check 'yearsOfTeaching' for the maximum value. For instructors with a 'hands-on learning' philosophy, query ""Which instructors mention a 'hands-on learning' philosophy in their biography?"" and search the 'biography' field. List full-time students by querying ""List all students who are enrolled full-time"" and ensure 'enrolledFullTime' is 'True'. 
To find students with over 30 credits, query ""Which students have completed more than 30 credits?"" and check 'completedCredits' for values over 30. To find a student's research interests, query ""Show me the research interests of the student named [Student's Name]"" and ensure the name is spelled correctly." +14,Find students with research interests in machine learning or data science.,"To find open courses, query ""Which courses are currently open for enrollment?"" and ensure 'currentlyEnrolling' is 'True'. Use duration filters for courses under 10 hours or 40 hours and more. Include keywords in descriptions to search by topic. Verify course titles for key outcomes if no results appear. To find experienced instructors, query ""Which instructors have over 10 years of teaching experience?"" and check 'yearsOfTeaching'. View an instructor's biography by querying ""Show me the biography of the instructor named [Instructor's Name]"" with correct spelling. Identify tenured instructors by querying ""Which instructors are tenured?"" and ensure 'tenured' is 'True'. To find the instructor with the longest career, use ""Who has the longest teaching career among all instructors?"" and check 'yearsOfTeaching' for the maximum value. For instructors with a 'hands-on learning' philosophy, query ""Which instructors mention a 'hands-on learning' philosophy in their biography?"" and search the 'biography' field. List full-time students by querying ""List all students who are enrolled full-time"" and ensure 'enrolledFullTime' is 'True'. To find students with over 30 credits, query ""Which students have completed more than 30 credits?"" and check 'completedCredits' for values over 30. To find a student's research interests, query ""Show me the research interests of the student named [Student's Name]"" and ensure the name is spelled correctly. 
To find students with specific research interests, such as machine learning or data science, query ""Find students with research interests in [specific topic]"" and search the 'researchInterests' field for relevant keywords." +15,Which students are pursuing studies in artificial intelligence?,"To find open courses, query ""Which courses are currently open for enrollment?"" and ensure 'currentlyEnrolling' is 'True'. Use duration filters for courses under 10 hours or 40 hours and more. Include keywords in descriptions to search by topic. Verify course titles for key outcomes if no results appear. To find experienced instructors, query ""Which instructors have over 10 years of teaching experience?"" and check 'yearsOfTeaching'. View an instructor's biography by querying ""Show me the biography of the instructor named [Instructor's Name]"" with correct spelling. Identify tenured instructors by querying ""Which instructors are tenured?"" and ensure 'tenured' is 'True'. To find the instructor with the longest career, use ""Who has the longest teaching career among all instructors?"" and check 'yearsOfTeaching' for the maximum value. For instructors with a 'hands-on learning' philosophy, query ""Which instructors mention a 'hands-on learning' philosophy in their biography?"" and search the 'biography' field. List full-time students by querying ""List all students who are enrolled full-time"" and ensure 'enrolledFullTime' is 'True'. To find students with over 30 credits, query ""Which students have completed more than 30 credits?"" and check 'completedCredits' for values over 30. To find a student's research interests, query ""Show me the research interests of the student named [Student's Name]"" and ensure the name is spelled correctly. 
To find students with specific research interests, such as machine learning, data science, or artificial intelligence, query ""Find students with research interests in [specific topic]"" and search the 'researchInterests' field for relevant keywords." +16,Which instructors have a biography mentioning 'online teaching methods' and have been teaching for more than 5 years?,"To find open courses, query ""Which courses are currently open for enrollment?"" and ensure 'currentlyEnrolling' is 'True'. Use duration filters for courses under 10 hours or 40 hours and more. Include keywords in descriptions to search by topic. Verify course titles for key outcomes if no results appear. For experienced instructors, query ""Which instructors have over 10 years of teaching experience?"" and check 'yearsOfTeaching'. View an instructor's biography by querying ""Show me the biography of the instructor named [Instructor's Name]"" with correct spelling. Identify tenured instructors by querying ""Which instructors are tenured?"" and ensure 'tenured' is 'True'. To find the instructor with the longest career, use ""Who has the longest teaching career among all instructors?"" and check 'yearsOfTeaching' for the maximum value. For instructors with a 'hands-on learning' philosophy, query ""Which instructors mention a 'hands-on learning' philosophy in their biography?"" and search the 'biography' field. To find instructors with specific teaching methods, query ""Which instructors have a biography mentioning '[specific teaching method]' and have been teaching for more than [number] years?"" and check 'yearsOfTeaching' for the specified duration. List full-time students by querying ""List all students who are enrolled full-time"" and ensure 'enrolledFullTime' is 'True'. To find students with over 30 credits, query ""Which students have completed more than 30 credits?"" and check 'completedCredits' for values over 30. 
To find a student's research interests, query ""Show me the research interests of the student named [Student's Name]"" and ensure the name is spelled correctly. For students with specific research interests, query ""Find students with research interests in [specific topic]"" and search the 'researchInterests' field for relevant keywords." +17,Which courses currently open for enrollment have a duration under 20 hours?,"To find open courses, query ""Which courses are currently open for enrollment?"" and ensure 'currentlyEnrolling' is 'True'. Use duration filters for courses under 10, 20, or 40 hours. Include keywords in descriptions to search by topic. Verify course titles for key outcomes if no results appear. For experienced instructors, query ""Which instructors have over 10 years of teaching experience?"" and check 'yearsOfTeaching'. View an instructor's biography by querying ""Show me the biography of the instructor named [Instructor's Name]"" with correct spelling. Identify tenured instructors by querying ""Which instructors are tenured?"" and ensure 'tenured' is 'True'. To find the instructor with the longest career, use ""Who has the longest teaching career among all instructors?"" and check 'yearsOfTeaching' for the maximum value. For instructors with a 'hands-on learning' philosophy, query ""Which instructors mention a 'hands-on learning' philosophy in their biography?"" and search the 'biography' field. To find instructors with specific teaching methods, query ""Which instructors have a biography mentioning '[specific teaching method]' and have been teaching for more than [number] years?"" and check 'yearsOfTeaching' for the specified duration. List full-time students by querying ""List all students who are enrolled full-time"" and ensure 'enrolledFullTime' is 'True'. To find students with over 30 credits, query ""Which students have completed more than 30 credits?"" and check 'completedCredits' for values over 30. 
To find a student's research interests, query ""Show me the research interests of the student named [Student's Name]"" and ensure the name is spelled correctly. For students with specific research interests, query ""Find students with research interests in [specific topic]"" and search the 'researchInterests' field for relevant keywords." +18,Show me all students who have completed at least 20 credits but are not enrolled full-time.,"To find open courses, query ""Which courses are currently open for enrollment?"" and ensure 'currentlyEnrolling' is 'True'. Use duration filters for courses under 10, 20, or 40 hours. Include keywords in descriptions to search by topic. Verify course titles for key outcomes if no results appear. For experienced instructors, query ""Which instructors have over 10 years of teaching experience?"" and check 'yearsOfTeaching'. View an instructor's biography by querying ""Show me the biography of the instructor named [Instructor's Name]"" with correct spelling. Identify tenured instructors by querying ""Which instructors are tenured?"" and ensure 'tenured' is 'True'. To find the instructor with the longest career, use ""Who has the longest teaching career among all instructors?"" and check 'yearsOfTeaching' for the maximum value. For instructors with a 'hands-on learning' philosophy, query ""Which instructors mention a 'hands-on learning' philosophy in their biography?"" and search the 'biography' field. To find instructors with specific teaching methods, query ""Which instructors have a biography mentioning '[specific teaching method]' and have been teaching for more than [number] years?"" and check 'yearsOfTeaching' for the specified duration. List full-time students by querying ""List all students who are enrolled full-time"" and ensure 'enrolledFullTime' is 'True'. To find students with over 30 credits, query ""Which students have completed more than 30 credits?"" and check 'completedCredits' for values over 30. 
To find students with at least 20 credits but not enrolled full-time, query ""Show me all students who have completed at least 20 credits but are not enrolled full-time"" and ensure 'completedCredits' is 20 or more and 'enrolledFullTime' is 'False'. To find a student's research interests, query ""Show me the research interests of the student named [Student's Name]"" and ensure the name is spelled correctly. For students with specific research interests, query ""Find students with research interests in [specific topic]"" and search the 'researchInterests' field for relevant keywords." +19,Which instructors are tenured and have a teaching philosophy related to 'project-based learning'?,"To find open courses, query ""Which courses are currently open for enrollment?"" and ensure 'currentlyEnrolling' is 'True'. Use duration filters for courses under 10, 20, or 40 hours. Include keywords in descriptions to search by topic. Verify course titles for key outcomes if no results appear. For experienced instructors, query ""Which instructors have over 10 years of teaching experience?"" and check 'yearsOfTeaching'. View an instructor's biography by querying ""Show me the biography of the instructor named [Instructor's Name]"" with correct spelling. Identify tenured instructors by querying ""Which instructors are tenured?"" and ensure 'tenured' is 'True'. To find the instructor with the longest career, use ""Who has the longest teaching career among all instructors?"" and check 'yearsOfTeaching' for the maximum value. For instructors with a 'hands-on learning' philosophy, query ""Which instructors mention a 'hands-on learning' philosophy in their biography?"" and search the 'biography' field. To find instructors with specific teaching methods, query ""Which instructors have a biography mentioning '[specific teaching method]' and have been teaching for more than [number] years?"" and check 'yearsOfTeaching' for the specified duration. 
To find tenured instructors with a specific teaching philosophy, query ""Which tenured instructors have a teaching philosophy related to '[specific teaching method]'?"" and ensure 'tenured' is 'True' and search the 'biography' field. List full-time students by querying ""List all students who are enrolled full-time"" and ensure 'enrolledFullTime' is 'True'. To find students with over 30 credits, query ""Which students have completed more than 30 credits?"" and check 'completedCredits' for values over 30. To find students with at least 20 credits but not enrolled full-time, query ""Show me all students who have completed at least 20 credits but are not enrolled full-time"" and ensure 'completedCredits' is 20 or more and 'enrolledFullTime' is 'False'. To find a student's research interests, query ""Show me the research interests of the student named [Student's Name]"" and ensure the name is spelled correctly. For students with specific research interests, query ""Find students with research interests in [specific topic]"" and search the 'researchInterests' field for relevant keywords." +20,Which courses specifically mention 'capstone project' in their description and are currently enrolling?,"To find open courses, query ""Which courses are currently open for enrollment?"" and ensure 'currentlyEnrolling' is 'True'. Use duration filters for courses under 10, 20, or 40 hours. Include keywords in descriptions to search by topic, such as ""capstone project"". Verify course titles for key outcomes if no results appear. For experienced instructors, query ""Which instructors have over 10 years of teaching experience?"" and check 'yearsOfTeaching'. View an instructor's biography by querying ""Show me the biography of the instructor named [Instructor's Name]"" with correct spelling. Identify tenured instructors by querying ""Which instructors are tenured?"" and ensure 'tenured' is 'True'. 
To find the instructor with the longest career, use ""Who has the longest teaching career among all instructors?"" and check 'yearsOfTeaching' for the maximum value. For instructors with a 'hands-on learning' philosophy, query ""Which instructors mention a 'hands-on learning' philosophy in their biography?"" and search the 'biography' field. To find instructors with specific teaching methods, query ""Which instructors have a biography mentioning '[specific teaching method]' and have been teaching for more than [number] years?"" and check 'yearsOfTeaching'. To find tenured instructors with a specific teaching philosophy, query ""Which tenured instructors have a teaching philosophy related to '[specific teaching method]'?"" and ensure 'tenured' is 'True' and search the 'biography' field. List full-time students by querying ""List all students who are enrolled full-time"" and ensure 'enrolledFullTime' is 'True'. To find students with over 30 credits, query ""Which students have completed more than 30 credits?"" and check 'completedCredits'. To find students with at least 20 credits but not enrolled full-time, query ""Show me all students who have completed at least 20 credits but are not enrolled full-time"" and ensure 'completedCredits' is 20 or more and 'enrolledFullTime' is 'False'. To find a student's research interests, query ""Show me the research interests of the student named [Student's Name]"" and ensure the name is spelled correctly. For students with specific research interests, query ""Find students with research interests in [specific topic]"" and search the 'researchInterests' field." 
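Each row of the CSV above carries the full "Manual State" text as it stood after that question, so the state can be inspected row by row. A minimal sketch, assuming only the three-column header shown in the diff; the helper name `manual_state_lengths` and the inline sample rows are hypothetical and are not part of this change:

```python
import csv
import io


def manual_state_lengths(csv_text: str) -> list[tuple[int, int]]:
    """Return (question number, word count of 'Manual State') for each row."""
    # csv.DictReader handles the quoted fields and doubled quotes ("") used in the file.
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (int(row["Question Number"]), len(row["Manual State"].split()))
        for row in reader
    ]


# Tiny inline sample in the same three-column layout (illustrative, not the real data).
sample = (
    "Question Number,Question,Manual State\n"
    '1,First question?,"Step one. Step two."\n'
    '2,Second question?,"Step one. Step two. Step three."\n'
)

for number, words in manual_state_lengths(sample):
    print(number, words)
```

To run it on the committed file, read `notebooks/meta-learning-results.csv` with `open(path, newline="")` and pass the text to the same function.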
diff --git a/notebooks/meta-learning-with-natural-language.ipynb b/notebooks/meta-learning-with-natural-language.ipynb
new file mode 100644
index 0000000..c5f434d
--- /dev/null
+++ b/notebooks/meta-learning-with-natural-language.ipynb
@@ -0,0 +1,1061 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "\n",
+    "# Read the collection schemas from JSON file\n",
+    "with open(\"../data/3-collection-schemas-with-search-property.json\", \"r\") as f:\n",
+    "    collection_schemas = json.load(f)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{\"weaviate_collections\":[{\"name\":\"Courses\",\"properties\":[{\"name\":\"courseTitle\",\"data_type\":[\"string\"],\"description\":\"The title of the course.\"},{\"name\":\"courseDescription\",\"data_type\":[\"string\"],\"description\":\"A detailed summary of the course, including coverage topics and learning outcomes.\"},{\"name\":\"courseDuration\",\"data_type\":[\"number\"],\"description\":\"The total number of hours required to complete the course.\"},{\"name\":\"currentlyEnrolling\",\"data_type\":[\"boolean\"],\"description\":\"Indicates whether the course is currently open for enrollment.\"}],\"envisioned_use_case_overview\":\"This schema helps users find courses based on subject matter, duration, and enrollment status. Semantic search enhances discovery of courses by learning outcomes and topics covered.\"},{\"name\":\"Instructors\",\"properties\":[{\"name\":\"instructorName\",\"data_type\":[\"string\"],\"description\":\"The full name of the instructor.\"},{\"name\":\"biography\",\"data_type\":[\"string\"],\"description\":\"A detailed biography of the instructor, including professional background and teaching philosophy.\"},{\"name\":\"yearsOfTeaching\",\"data_type\":[\"number\"],\"description\":\"The number of years the instructor has been teaching.\"},{\"name\":\"tenured\",\"data_type\":[\"boolean\"],\"description\":\"Indicates whether the instructor holds a tenured position.\"}],\"envisioned_use_case_overview\":\"This schema allows students and administrators to search for instructors based on experience and background. Rich biographies help in matching students with instructors who align with their learning style and academic goals.\"},{\"name\":\"Students\",\"properties\":[{\"name\":\"studentName\",\"data_type\":[\"string\"],\"description\":\"The full name of the student.\"},{\"name\":\"researchInterests\",\"data_type\":[\"string\"],\"description\":\"Detailed information on the student's academic interests and research focus.\"},{\"name\":\"completedCredits\",\"data_type\":[\"number\"],\"description\":\"The number of academic credits the student has completed.\"},{\"name\":\"enrolledFullTime\",\"data_type\":[\"boolean\"],\"description\":\"Indicates whether the student is enrolled full-time.\"}],\"envisioned_use_case_overview\":\"This schema is designed to help institutions manage student data and preferences. Semantic search allows deeper insights into student research interests and progression paths.\"}]}\n"
+     ]
+    }
+   ],
+   "source": [
+    "courses_collection = collection_schemas[2]\n",
+    "print(courses_collection)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Connecting to Weaviate...\n",
+      "Successfully connected to Weaviate...\n"
+     ]
+    }
+   ],
+   "source": [
+    "import os\n",
+    "import weaviate\n",
+    "\n",
+    "WEAVIATE_URL = os.getenv(\"WEAVIATE_URL\")\n",
+    "WEAVIATE_API_KEY = os.getenv(\"WEAVIATE_API_KEY\")\n",
+    "OPENAI_API_KEY = os.getenv(\"OPENAI_API_KEY\")\n",
+    "\n",
+    "print(\"Connecting to Weaviate...\")\n",
+    "weaviate_client = weaviate.connect_to_weaviate_cloud(\n",
+    "    cluster_url=WEAVIATE_URL,\n",
+    "    auth_credentials=weaviate.auth.AuthApiKey(WEAVIATE_API_KEY),\n",
+    "    headers={\"X-OpenAI-Api-Key\": OPENAI_API_KEY},\n",
+    ")\n",
+    "print(\"Successfully connected to Weaviate...\")\n",
+    "\n",
+    "# Delete existing collections if they exist\n",
+    "for collection_name in [\"Courses\", \"Instructors\", \"Students\"]:\n",
+    "    if weaviate_client.collections.exists(collection_name):\n",
+    "        weaviate_client.collections.delete(collection_name)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Created 'Courses', 'Instructors', and 'Students' collections successfully!\n"
+     ]
+    }
+   ],
+   "source": [
+    "import weaviate.classes as wvc\n",
+    "\n",
+    "# 1. Create the \"Courses\" collection\n",
+    "courses_collection = weaviate_client.collections.create(\n",
+    "    name=\"Courses\",\n",
+    "    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),\n",
+    "    properties=[\n",
+    "        wvc.config.Property(\n",
+    "            name=\"courseTitle\",\n",
+    "            data_type=wvc.config.DataType.TEXT,\n",
+    "            description=\"The title of the course.\"\n",
+    "        ),\n",
+    "        wvc.config.Property(\n",
+    "            name=\"courseDescription\",\n",
+    "            data_type=wvc.config.DataType.TEXT,\n",
+    "            description=\"A detailed summary of the course, including coverage topics and learning outcomes.\"\n",
+    "        ),\n",
+    "        wvc.config.Property(\n",
+    "            name=\"courseDuration\",\n",
+    "            data_type=wvc.config.DataType.INT,\n",
+    "            description=\"The total number of hours required to complete the course.\"\n",
+    "        ),\n",
+    "        wvc.config.Property(\n",
+    "            name=\"currentlyEnrolling\",\n",
+    "            data_type=wvc.config.DataType.BOOL,\n",
+    "            description=\"Indicates whether the course is currently open for enrollment.\"\n",
+    "        ),\n",
+    "    ],\n",
+    ")\n",
+    "\n",
+    "# 2. Create the \"Instructors\" collection\n",
+    "instructors_collection = weaviate_client.collections.create(\n",
+    "    name=\"Instructors\",\n",
+    "    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),\n",
+    "    properties=[\n",
+    "        wvc.config.Property(\n",
+    "            name=\"instructorName\",\n",
+    "            data_type=wvc.config.DataType.TEXT,\n",
+    "            description=\"The full name of the instructor.\"\n",
+    "        ),\n",
+    "        wvc.config.Property(\n",
+    "            name=\"biography\",\n",
+    "            data_type=wvc.config.DataType.TEXT,\n",
+    "            description=\"A detailed biography of the instructor, including professional background and teaching philosophy.\"\n",
+    "        ),\n",
+    "        wvc.config.Property(\n",
+    "            name=\"yearsOfTeaching\",\n",
+    "            data_type=wvc.config.DataType.INT,\n",
+    "            description=\"The number of years the instructor has been teaching.\"\n",
+    "        ),\n",
+    "        wvc.config.Property(\n",
+    "            name=\"tenured\",\n",
+    "            data_type=wvc.config.DataType.BOOL,\n",
+    "            description=\"Indicates whether the instructor holds a tenured position.\"\n",
+    "        ),\n",
+    "    ],\n",
+    ")\n",
+    "\n",
+    "# 3. 
Create the \"Students\" collection\n", + "students_collection = weaviate_client.collections.create(\n", + " name=\"Students\",\n", + " vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),\n", + " properties=[\n", + " wvc.config.Property(\n", + " name=\"studentName\",\n", + " data_type=wvc.config.DataType.TEXT,\n", + " description=\"The full name of the student.\"\n", + " ),\n", + " wvc.config.Property(\n", + " name=\"researchInterests\",\n", + " data_type=wvc.config.DataType.TEXT,\n", + " description=\"Detailed information on the student's academic interests and research focus.\"\n", + " ),\n", + " wvc.config.Property(\n", + " name=\"completedCredits\",\n", + " data_type=wvc.config.DataType.INT,\n", + " description=\"The number of academic credits the student has completed.\"\n", + " ),\n", + " wvc.config.Property(\n", + " name=\"enrolledFullTime\",\n", + " data_type=wvc.config.DataType.BOOL,\n", + " description=\"Indicates whether the student is enrolled full-time.\"\n", + " ),\n", + " ],\n", + ")\n", + "\n", + "print(\"Created 'Courses', 'Instructors', and 'Students' collections successfully!\")" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Successfully inserted all data into collections!\n" + ] + } + ], + "source": [ + "# Read the CSV files into pandas DataFrames\n", + "import pandas as pd\n", + "\n", + "courses_df = pd.read_csv(\"../data/data-for-use-cases/Courses.csv\")\n", + "instructors_df = pd.read_csv(\"../data/data-for-use-cases/Instructors.csv\")\n", + "students_df = pd.read_csv(\"../data/data-for-use-cases/Students.csv\")\n", + "\n", + "# Convert DataFrames to list of dictionaries with \"properties\" key\n", + "courses_data = [{\"properties\": row.to_dict()} for _, row in courses_df.iterrows()]\n", + "instructors_data = [{\"properties\": row.to_dict()} for _, row in instructors_df.iterrows()]\n", + "students_data = 
[{\"properties\": row.to_dict()} for _, row in students_df.iterrows()]\n", + "\n", + "# Insert data into respective collections\n", + "for course in courses_data:\n", + " courses_collection.data.insert(properties=course[\"properties\"])\n", + "\n", + "for instructor in instructors_data:\n", + " instructors_collection.data.insert(properties=instructor[\"properties\"])\n", + "\n", + "for student in students_data:\n", + " students_collection.data.insert(properties=student[\"properties\"])\n", + "\n", + "print(\"Successfully inserted all data into collections!\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from pydantic import BaseModel\n", + "from typing import Literal, Optional, List\n", + "\n", + "class IntPropertyFilter(BaseModel):\n", + " property_name: str\n", + " operator: Literal[\"=\", \"<\", \">\", \"<=\", \">=\"]\n", + " value: int | float\n", + "\n", + "\n", + "class TextPropertyFilter(BaseModel):\n", + " property_name: str\n", + " operator: Literal[\"=\", \"LIKE\"]\n", + " value: str\n", + "\n", + "\n", + "class BooleanPropertyFilter(BaseModel):\n", + " property_name: str\n", + " operator: Literal[\"=\", \"!=\"]\n", + " value: bool\n", + "\n", + "\n", + "class IntAggregation(BaseModel):\n", + " property_name: str\n", + " metrics: Literal[\"MIN\", \"MAX\", \"MEAN\", \"MEDIAN\", \"MODE\", \"SUM\"]\n", + "\n", + "\n", + "class TextAggregation(BaseModel):\n", + " property_name: str\n", + " metrics: Literal[\"TOP_OCCURRENCES\"]\n", + " top_occurrences_limit: Optional[int] = None\n", + "\n", + "\n", + "class BooleanAggregation(BaseModel):\n", + " property_name: str\n", + " metrics: Literal[\"TOTAL_TRUE\", \"TOTAL_FALSE\", \"PERCENTAGE_TRUE\", \"PERCENTAGE_FALSE\"]\n", + "\n", + "\n", + "class WeaviateQuery(BaseModel):\n", + " target_collection: str\n", + " search_query: Optional[str] = None\n", + "\n", + " integer_property_filter: Optional[IntPropertyFilter] = None\n", + " 
text_property_filter: Optional[TextPropertyFilter] = None\n", + " boolean_property_filter: Optional[BooleanPropertyFilter] = None\n", + "\n", + " limit: Optional[int] = 5\n", + "\n", + " integer_property_aggregation: Optional[IntAggregation] = None\n", + " text_property_aggregation: Optional[TextAggregation] = None\n", + " boolean_property_aggregation: Optional[BooleanAggregation] = None\n", + "\n", + " groupby_property: Optional[str] = None" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "from weaviate.collections import Collection\n", + "from weaviate.collections.classes.filters import _FilterByProperty\n", + "from typing import Any, Union\n", + "\n", + "def _build_filter(\n", + " f: Union[IntPropertyFilter, TextPropertyFilter, BooleanPropertyFilter]\n", + ") -> _FilterByProperty:\n", + " \"\"\"Build a Weaviate filter from a property filter.\"\"\"\n", + " operators = {\n", + " \"=\": \"equal\",\n", + " \"!=\": \"not_equal\",\n", + " \"<\": \"less_than\",\n", + " \">\": \"greater_than\",\n", + " \"<=\": \"less_or_equal\",\n", + " \">=\": \"greater_or_equal\",\n", + " \"LIKE\": \"like\",\n", + " }\n", + "\n", + " method_name = operators.get(f.operator)\n", + " if not method_name:\n", + " raise ValueError(f\"Unsupported operator: {f.operator}\")\n", + "\n", + " value = f.value\n", + " if isinstance(f, BooleanPropertyFilter):\n", + " value = bool(value)\n", + "\n", + " filter_prop = wvc.query.Filter.by_property(f.property_name)\n", + " return getattr(filter_prop, method_name)(value)\n", + "\n", + "\n", + "def _build_numeric_metric(agg: IntAggregation) -> wvc.query.Metrics:\n", + " \"\"\"Build numeric metrics for aggregation.\"\"\"\n", + " metric = wvc.query.Metrics(agg.property_name)\n", + "\n", + " metric_map = {\n", + " \"MIN\": lambda m: m.number(minimum=True),\n", + " \"MAX\": lambda m: m.number(maximum=True),\n", + " \"MEAN\": lambda m: m.number(mean=True),\n", + " \"SUM\": lambda m: 
m.number(sum_=True),\n", + " }\n", + "\n", + " return metric_map[agg.metrics](metric)\n", + "\n", + "\n", + "def _build_text_metric(agg: TextAggregation) -> wvc.query.Metrics:\n", + " \"\"\"Build text metrics for aggregation.\"\"\"\n", + " metric = wvc.query.Metrics(agg.property_name)\n", + "\n", + " if agg.metrics == \"COUNT\":\n", + " return metric.count()\n", + " elif agg.metrics == \"TOP_OCCURRENCES\":\n", + " return metric.text(\n", + " top_occurrences_count=True,\n", + " top_occurrences_value=True,\n", + " )\n", + " return metric.count() # default\n", + "\n", + "\n", + "def _build_boolean_metric(agg: BooleanAggregation) -> wvc.query.Metrics:\n", + " \"\"\"Build boolean metrics for aggregation.\"\"\"\n", + " metric = wvc.query.Metrics(agg.property_name)\n", + "\n", + " metric_map = {\n", + " \"TOTAL_TRUE\": lambda m: m.boolean(total_true=True),\n", + " \"TOTAL_FALSE\": lambda m: m.boolean(total_false=True),\n", + " \"PERCENTAGE_TRUE\": lambda m: m.boolean(percentage_true=True),\n", + " \"PERCENTAGE_FALSE\": lambda m: m.boolean(percentage_false=True),\n", + " }\n", + "\n", + " return metric_map[agg.metrics](metric)\n", + "\n", + "\n", + "def _format_query_result(result: Any) -> str:\n", + " \"\"\"Format query results into a readable string.\"\"\"\n", + "\n", + " # Handle QueryReturn objects (regular search/filter queries)\n", + " if hasattr(result, \"objects\"):\n", + " formatted = \"Found objects:\\n\"\n", + " for obj in result.objects:\n", + " formatted += \"-\" * 40 + \"\\n\"\n", + " for key, value in obj.properties.items():\n", + " formatted += f\"{key}: {value}\\n\"\n", + " return formatted\n", + "\n", + " # Handle AggregateReturn objects (simple aggregations)\n", + " elif hasattr(result, \"properties\"):\n", + " formatted = \"Aggregation results:\\n\"\n", + " formatted += \"-\" * 40 + \"\\n\"\n", + " for prop_name, metrics in result.properties.items():\n", + " formatted += f\"Property: {prop_name}\\n\"\n", + " for metric_name, value in 
metrics.__dict__.items():\n", + " if value is not None:\n", + " if metric_name == \"top_occurrences\":\n", + " formatted += f\" Most common values:\\n\"\n", + " for occurrence in value:\n", + " formatted += f\" - {occurrence.value} (count: {occurrence.count})\\n\"\n", + " else:\n", + " formatted += f\" {metric_name}: {value}\\n\"\n", + " if hasattr(result, \"total_count\"):\n", + " formatted += f\"Total count: {result.total_count}\\n\"\n", + " return formatted\n", + "\n", + " # Handle AggregateGroupByReturn objects (grouped aggregations)\n", + " elif hasattr(result, \"groups\"):\n", + " formatted = \"Grouped aggregation results:\\n\"\n", + " for group in result.groups:\n", + " formatted += \"-\" * 40 + \"\\n\"\n", + " formatted += f\"Group: {group.grouped_by.prop} = {group.grouped_by.value}\\n\"\n", + " for prop_name, metrics in group.properties.items():\n", + " formatted += f\"Property: {prop_name}\\n\"\n", + " for metric_name, value in metrics.__dict__.items():\n", + " if value is not None:\n", + " if metric_name == \"top_occurrences\":\n", + " formatted += f\" Most common values:\\n\"\n", + " for occurrence in value:\n", + " formatted += f\" - {occurrence.value} (count: {occurrence.count})\\n\"\n", + " else:\n", + " formatted += f\" {metric_name}: {value}\\n\"\n", + " formatted += f\"Group count: {group.total_count}\\n\"\n", + " return formatted\n", + "\n", + " return str(result)\n", + "\n", + "\n", + "def execute_weaviate_query(\n", + " collection,\n", + " query: WeaviateQuery,\n", + " return_properties: list[str] | None = None,\n", + ") -> str:\n", + " # Build filters if any exist\n", + " filters = None\n", + " if query.integer_property_filter:\n", + " filters = _build_filter(query.integer_property_filter)\n", + " elif query.text_property_filter:\n", + " filters = _build_filter(query.text_property_filter)\n", + " elif query.boolean_property_filter:\n", + " filters = _build_filter(query.boolean_property_filter)\n", + "\n", + " # Handle aggregations if they 
exist\n", + " if any([\n", + " query.integer_property_aggregation,\n", + " query.text_property_aggregation,\n", + " query.boolean_property_aggregation,\n", + " ]):\n", + " metrics = []\n", + " if query.integer_property_aggregation:\n", + " metrics.append(_build_numeric_metric(query.integer_property_aggregation))\n", + " if query.text_property_aggregation:\n", + " metrics.append(_build_text_metric(query.text_property_aggregation))\n", + " if query.boolean_property_aggregation:\n", + " metrics.append(_build_boolean_metric(query.boolean_property_aggregation))\n", + "\n", + " group_by = None\n", + " if query.groupby_property:\n", + " group_by = wvc.aggregate.GroupByAggregate(prop=query.groupby_property)\n", + "\n", + " if query.search_query:\n", + " result = collection.aggregate.near_text(\n", + " query=query.search_query,\n", + " object_limit=query.limit,\n", + " total_count=True,\n", + " group_by=group_by,\n", + " return_metrics=metrics,\n", + " filters=wvc.query.Filter.all_of([filters]) if filters else None,\n", + " )\n", + " else:\n", + " result = collection.aggregate.over_all(\n", + " total_count=True,\n", + " group_by=group_by,\n", + " return_metrics=metrics,\n", + " filters=wvc.query.Filter.all_of([filters]) if filters else None,\n", + " )\n", + " else:\n", + " # Handle regular queries - use hybrid only when there's a search query\n", + " if query.search_query:\n", + " result = collection.query.hybrid(\n", + " query=query.search_query,\n", + " filters=wvc.query.Filter.all_of([filters]) if filters else None,\n", + " limit=query.limit,\n", + " return_properties=return_properties,\n", + " )\n", + " else:\n", + " # Use fetch_objects() for filter-only queries\n", + " result = collection.query.fetch_objects(\n", + " filters=wvc.query.Filter.all_of([filters]) if filters else None,\n", + " limit=query.limit,\n", + " return_properties=return_properties,\n", + " )\n", + "\n", + " return _format_query_result(result)\n", + "\n", + "\n", + "def query_collection(weaviate_client, 
query: WeaviateQuery) -> str:\n", + " \"\"\"Query Weaviate, Return Search Results or Aggregations.\"\"\"\n", + " collection = weaviate_client.collections.get(query.target_collection)\n", + " return execute_weaviate_query(\n", + " collection=collection,\n", + " query=query,\n", + " return_properties=None\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Testing: Simple Search\n", + "--------------------------------------------------\n", + "Success!\n", + "Result:\n", + "Found objects:\n", + "----------------------------------------\n", + "courseDescription: Deep dive into neural networks, reinforcement learning, and deep learning architectures. Includes hands-on projects with real-world datasets and implementation of state-of-the-art algorithms. Focus on both theoretical foundations and practical applications.\n", + "courseDuration: 48\n", + "currentlyEnrolling: True\n", + "courseTitle: Advanced Machine Learning\n", + "----------------------------------------\n", + "courseDescription: In-depth study of Mathematics Linear Algebra. includes hands-on projects and features case studies. Prepares students for professional practice.\n", + "courseDuration: 34\n", + "currentlyEnrolling: True\n", + "courseTitle: Linear Algebra II\n", + "----------------------------------------\n", + "courseDescription: Interactive learning experience focusing on Computer Science Data Structures. combines theoretical and practical elements and incorporates real-world applications. Enables application of theoretical knowledge to real-world scenarios.\n", + "courseDuration: 40\n", + "currentlyEnrolling: True\n", + "courseTitle: Data Structures II\n", + "----------------------------------------\n", + "courseDescription: Advanced analysis of Computer Science Software Engineering. combines theoretical and practical elements and integrates modern methodologies. 
Enables application of theoretical knowledge to real-world scenarios.\n", + "courseDuration: 24\n", + "currentlyEnrolling: True\n", + "courseTitle: Software Engineering II\n", + "----------------------------------------\n", + "courseDescription: Interactive learning experience focusing on Chemistry Analytical Methods. incorporates real-world applications and incorporates real-world applications. Develops critical thinking and analytical skills.\n", + "courseDuration: 45\n", + "currentlyEnrolling: False\n", + "courseTitle: Analytical Methods II\n", + "\n", + "\n", + "Testing: Numeric Filter\n", + "--------------------------------------------------\n", + "Success!\n", + "Result:\n", + "Found objects:\n", + "----------------------------------------\n", + "courseDescription: Analysis of global historical events from 1750 to present, examining social movements, technological revolutions, and geopolitical changes. Incorporates primary source analysis and comparative historical methods.\n", + "courseDuration: 42\n", + "currentlyEnrolling: False\n", + "courseTitle: Modern World History\n", + "----------------------------------------\n", + "courseDescription: In-depth study of Chemistry Organic Chemistry. features case studies and emphasizes problem-solving techniques. Provides comprehensive understanding of core concepts.\n", + "courseDuration: 42\n", + "currentlyEnrolling: True\n", + "courseTitle: Organic Chemistry I\n", + "----------------------------------------\n", + "courseDescription: Advanced analysis of Mathematics Calculus. emphasizes problem-solving techniques and combines theoretical and practical elements. 
Builds practical expertise in the field.\n", + "courseDuration: 41\n", + "currentlyEnrolling: True\n", + "courseTitle: Calculus II\n", + "\n", + "\n", + "Testing: Text Filter with Aggregation\n", + "--------------------------------------------------\n", + "Success!\n", + "Result:\n", + "Aggregation results:\n", + "----------------------------------------\n", + "Property: researchInterests\n", + " Most common values:\n", + " - Analyzing renewable energy storage solutions with a focus on battery efficiency and grid management. (count: 1)\n", + " - Analyzing the efficacy of mindfulness-based interventions in managing chronic stress. (count: 1)\n", + " - Assessing the impact of social and mobile games on user behavior and engagement. (count: 1)\n", + " - Developing machine learning algorithms for personalized dietary recommendations. (count: 1)\n", + " - Examining climate adaptation policies in coastal regions to mitigate flood risks. (count: 1)\n", + "Total count: 35\n", + "\n", + "\n", + "Testing: Complex Query\n", + "--------------------------------------------------\n", + "Success!\n", + "Result:\n", + "Grouped aggregation results:\n", + "----------------------------------------\n", + "Group: tenured = true\n", + "Property: yearsOfTeaching\n", + " mean: 13.634146341463415\n", + "Group count: 41\n", + "\n" + ] + } + ], + "source": [ + "# Test case 1: Simple search query\n", + "simple_search = WeaviateQuery(\n", + " target_collection=\"Courses\", \n", + " search_query=\"machine learning\",\n", + " limit=5\n", + ")\n", + "\n", + "# Test case 2: Query with integer filter\n", + "numeric_filter_query = WeaviateQuery(\n", + " target_collection=\"Courses\",\n", + " integer_property_filter=IntPropertyFilter(\n", + " property_name=\"courseDuration\",\n", + " operator=\">\",\n", + " value=40\n", + " ),\n", + " limit=3\n", + ")\n", + "\n", + "# Test case 3: Query with text filter and aggregation\n", + "text_filter_and_agg_query = WeaviateQuery(\n", + " 
target_collection=\"Students\",\n", + " boolean_property_filter=BooleanPropertyFilter(\n", + " property_name=\"enrolledFullTime\",\n", + " operator=\"=\",\n", + " value=True\n", + " ),\n", + " text_property_aggregation=TextAggregation(\n", + " property_name=\"researchInterests\",\n", + " metrics=\"TOP_OCCURRENCES\",\n", + " top_occurrences_limit=5\n", + " )\n", + ")\n", + "\n", + "# Test case 4: Query with boolean filter and aggregation with groupby\n", + "complex_query = WeaviateQuery(\n", + " target_collection=\"Instructors\",\n", + " boolean_property_filter=BooleanPropertyFilter(\n", + " property_name=\"tenured\",\n", + " operator=\"=\",\n", + " value=True\n", + " ),\n", + " integer_property_aggregation=IntAggregation(\n", + " property_name=\"yearsOfTeaching\",\n", + " metrics=\"MEAN\"\n", + " ),\n", + " groupby_property=\"tenured\"\n", + ")\n", + "\n", + "# Function to run all test cases\n", + "def run_test_cases(weaviate_client):\n", + " test_cases = [\n", + " (\"Simple Search\", simple_search),\n", + " (\"Numeric Filter\", numeric_filter_query),\n", + " (\"Text Filter with Aggregation\", text_filter_and_agg_query),\n", + " (\"Complex Query\", complex_query)\n", + " ]\n", + " \n", + " for test_name, query in test_cases:\n", + " print(f\"\\nTesting: {test_name}\")\n", + " print(\"-\" * 50)\n", + " try:\n", + " result = query_collection(weaviate_client, query)\n", + " print(\"Success!\")\n", + " print(\"Result:\")\n", + " print(result)\n", + " except Exception as e:\n", + " print(f\"Error occurred: {str(e)}\")\n", + " print(f\"Query details: {query.model_dump()}\")\n", + "\n", + "run_test_cases(weaviate_client)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pydantic/_internal/_config.py:295: PydanticDeprecatedSince20: Support for class-based `config` is deprecated, use 
ConfigDict instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.10/migration/\n", + " warnings.warn(DEPRECATION_MESSAGE, DeprecationWarning)\n" + ] + }, + { + "data": { + "text/plain": [ + "['Hello! How can I assist you today?']" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import dspy\n", + "\n", + "lm = dspy.LM(model=\"openai/gpt-4o\", api_key=os.getenv(\"OPENAI_API_KEY\"))\n", + "dspy.settings.configure(lm=lm)\n", + "lm(\"say hello\")" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "class SearchOrRespond(dspy.Signature):\n", + " \"\"\"Either perform a Weaviate search or respond directly based on the query.\"\"\"\n", + " \n", + " user_query: str = dspy.InputField()\n", + " \n", + " database_schema: str = dspy.InputField(desc=\"The database schema you can query to gain necessary information to provide the most accurate possible to the user.\")\n", + " \n", + " database_querying_manual: str = dspy.InputField(desc=\"A collection of notes about how to query the database and what kind of information you can access from its operators and the data contained in the collection described by the `database_schema` input.\")\n", + "\n", + " query_execution_history: str = dspy.InputField(desc=\"History of previous queries and their responses to provide context for the current query\")\n", + "\n", + " should_search: bool = dspy.OutputField(desc=\"True if we need to search Weaviate, False if we can answer directly\")\n", + " search_parameters: Optional[WeaviateQuery] = dspy.OutputField(desc=\"Parameters for WeaviateQuery if should_search is True\")\n", + " response_to_user: Optional[str] = dspy.OutputField(desc=\"A succinct and information dense response to the User.\")\n", + "\n", + "class UpdateQueryingManual(dspy.Signature):\n", + " \"\"\"Determine if and how to update the 
querying manual based on recent experience. \n", + " IMPORTANT: The manual MUST NOT exceed 16 sentences total!\"\"\"\n", + " \n", + " current_manual: str = dspy.InputField()\n", + " query: str = dspy.InputField()\n", + " result: str = dspy.InputField()\n", + " updated_manual: str = dspy.OutputField(desc=\"The updated manual incorporating new insights. CRITICAL: Must be 16 sentences or fewer!\")\n", + "\n", + "class TrimQueryingManual(dspy.Signature):\n", + " \"\"\"Trim and organize the querying manual to ensure it stays within length limits.\"\"\"\n", + " \n", + " current_manual: str = dspy.InputField()\n", + " trimmed_manual: str = dspy.OutputField(desc=\"A concise, well-organized version of the manual containing only the most essential information in 16 sentences or fewer.\")\n", + "\n", + "class FormatResponse(dspy.Signature):\n", + " \"\"\"Format the raw response into a natural, conversational reply.\"\"\"\n", + " \n", + " raw_response: str = dspy.InputField()\n", + " user_query: str = dspy.InputField()\n", + " formatted_response: str = dspy.OutputField(desc=\"A naturally worded, friendly response that presents the information clearly and engagingly.\")\n", + "\n", + "class WeaviateFunctionCallingAgent(dspy.Module):\n", + " def __init__(self, weaviate_client, collections_description):\n", + " self.predict = dspy.Predict(SearchOrRespond)\n", + " self.update_manual = dspy.Predict(UpdateQueryingManual)\n", + " self.trim_manual = dspy.Predict(TrimQueryingManual)\n", + " self.format_response = dspy.Predict(FormatResponse)\n", + " self.weaviate_client = weaviate_client\n", + " self.collections_description = collections_description\n", + " self.query_history = \"\"\n", + " self.database_querying_manual = \"Initial manual for querying the database.\"\n", + " \n", + " def get_database_querying_manual(self) -> str:\n", + " \"\"\"Inspect learned database manual.\"\"\"\n", + " return self.database_querying_manual\n", + "\n", + " def forward(self, user_query: str) -> 
str:\n", + " # Determine whether to search or respond directly\n", + " prediction = self.predict(\n", + " user_query=user_query, \n", + " database_schema=self.collections_description,\n", + " database_querying_manual=self.database_querying_manual,\n", + " query_execution_history=self.query_history\n", + " )\n", + " \n", + " result = None\n", + " if prediction.should_search:\n", + " query = WeaviateQuery(\n", + " target_collection=prediction.search_parameters.target_collection,\n", + " search_query=prediction.search_parameters.search_query,\n", + " integer_property_filter=prediction.search_parameters.integer_property_filter,\n", + " text_property_filter=prediction.search_parameters.text_property_filter,\n", + " boolean_property_filter=prediction.search_parameters.boolean_property_filter,\n", + " limit=prediction.search_parameters.limit,\n", + " integer_property_aggregation=prediction.search_parameters.integer_property_aggregation,\n", + " text_property_aggregation=prediction.search_parameters.text_property_aggregation,\n", + " boolean_property_aggregation=prediction.search_parameters.boolean_property_aggregation,\n", + " groupby_property=prediction.search_parameters.groupby_property\n", + " )\n", + " result = query_collection(self.weaviate_client, query)\n", + " # Update query history\n", + " self.query_history += f\"\\nQuery: {user_query}\\nResult: {result}\\n\"\n", + " \n", + " # Update manual with default if none returned\n", + " manual_update = self.update_manual(\n", + " current_manual=self.database_querying_manual,\n", + " query=user_query,\n", + " result=str(result)\n", + " )\n", + " if manual_update and manual_update.updated_manual:\n", + " self.database_querying_manual = manual_update.updated_manual\n", + " # Trim manual if needed\n", + " trimmed = self.trim_manual(current_manual=self.database_querying_manual)\n", + " if trimmed and trimmed.trimmed_manual:\n", + " self.database_querying_manual = trimmed.trimmed_manual\n", + " \n", + " # Format the 
response\n", + " formatted = self.format_response(raw_response=str(result), user_query=user_query)\n", + " return formatted.formatted_response if formatted else result\n", + " else:\n", + " # Update manual with default if none returned\n", + " manual_update = self.update_manual(\n", + " current_manual=self.database_querying_manual,\n", + " query=user_query,\n", + " result=prediction.response_to_user\n", + " )\n", + " if manual_update and manual_update.updated_manual:\n", + " self.database_querying_manual = manual_update.updated_manual\n", + " # Trim manual if needed\n", + " trimmed = self.trim_manual(current_manual=self.database_querying_manual)\n", + " if trimmed and trimmed.trimmed_manual:\n", + " self.database_querying_manual = trimmed.trimmed_manual\n", + " \n", + " # Format the direct response\n", + " formatted = self.format_response(raw_response=prediction.response_to_user, user_query=user_query)\n", + " return dspy.Prediction(response=formatted.formatted_response if formatted else prediction.response_to_user)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Initial manual for querying the database.\n" + ] + } + ], + "source": [ + "schema = '''{\"weaviate_collections\":[{\"name\":\"Courses\",\"properties\":[{\"name\":\"courseTitle\",\"data_type\":[\"string\"],\"description\":\"The title of the course.\"},{\"name\":\"courseDescription\",\"data_type\":[\"string\"],\"description\":\"A detailed summary of the course, including coverage topics and learning outcomes.\"},{\"name\":\"courseDuration\",\"data_type\":[\"number\"],\"description\":\"The total number of hours required to complete the course.\"},{\"name\":\"currentlyEnrolling\",\"data_type\":[\"boolean\"],\"description\":\"Indicates whether the course is currently open for enrollment.\"}],\"envisioned_use_case_overview\":\"This schema helps users find courses based on subject matter, duration, and 
enrollment status. Semantic search enhances discovery of courses by learning outcomes and topics covered.\"},{\"name\":\"Instructors\",\"properties\":[{\"name\":\"instructorName\",\"data_type\":[\"string\"],\"description\":\"The full name of the instructor.\"},{\"name\":\"biography\",\"data_type\":[\"string\"],\"description\":\"A detailed biography of the instructor, including professional background and teaching philosophy.\"},{\"name\":\"yearsOfTeaching\",\"data_type\":[\"number\"],\"description\":\"The number of years the instructor has been teaching.\"},{\"name\":\"tenured\",\"data_type\":[\"boolean\"],\"description\":\"Indicates whether the instructor holds a tenured position.\"}],\"envisioned_use_case_overview\":\"This schema allows students and administrators to search for instructors based on experience and background. Rich biographies help in matching students with instructors who align with their learning style and academic goals.\"},{\"name\":\"Students\",\"properties\":[{\"name\":\"studentName\",\"data_type\":[\"string\"],\"description\":\"The full name of the student.\"},{\"name\":\"researchInterests\",\"data_type\":[\"string\"],\"description\":\"Detailed information on the student's academic interests and research focus.\"},{\"name\":\"completedCredits\",\"data_type\":[\"number\"],\"description\":\"The number of academic credits the student has completed.\"},{\"name\":\"enrolledFullTime\",\"data_type\":[\"boolean\"],\"description\":\"Indicates whether the student is enrolled full-time.\"}],\"envisioned_use_case_overview\":\"This schema is designed to help institutions manage student data and preferences. 
Semantic search allows deeper insights into student research interests and progression paths.\"}]}'''\n", + "\n", + "weaviate_search_module = WeaviateFunctionCallingAgent(\n", + " weaviate_client=weaviate_client,\n", + " collections_description=schema\n", + ")\n", + "\n", + "print(weaviate_search_module.get_database_querying_manual())" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Starting processing of 20 questions...\n", + "\n", + "Processing question 1/20: Which courses are currently open for enrollment?\n", + "✓ Question 1 processed successfully\n", + " Response length: 1547 characters\n", + " Manual length: 600 characters\n", + "\n", + "Processing question 2/20: Show me all courses where the total duration is less than 10 hours.\n", + "✓ Question 2 processed successfully\n", + " Response length: 171 characters\n", + " Manual length: 709 characters\n", + "\n", + "Processing question 3/20: Which courses mention advanced Python programming in their description?\n", + "✓ Question 3 processed successfully\n", + " Response length: 250 characters\n", + " Manual length: 808 characters\n", + "\n", + "Processing question 4/20: What are the key learning outcomes for the course titled 'Data Structures 101'?\n", + "✓ Question 4 processed successfully\n", + " Response length: 222 characters\n", + " Manual length: 811 characters\n", + "\n", + "Processing question 5/20: List all courses that have a duration of 40 hours or more.\n", + "✓ Question 5 processed successfully\n", + " Response length: 2876 characters\n", + " Manual length: 929 characters\n", + "\n", + "Processing question 6/20: Which instructors have over 10 years of teaching experience?\n", + "✓ Question 6 processed successfully\n", + " Response length: 1277 characters\n", + " Manual length: 957 characters\n", + "\n", + "Processing question 7/20: Show me the biography of the instructor named 'Dr. 
Jane Doe'.\n", + "✓ Question 7 processed successfully\n", + " Response length: 202 characters\n", + " Manual length: 820 characters\n", + "\n", + "Processing question 8/20: Which instructors are tenured?\n", + "✓ Question 8 processed successfully\n", + " Response length: 1289 characters\n", + " Manual length: 873 characters\n", + "\n", + "Processing question 9/20: Who has the longest teaching career among all instructors in the database?\n", + "✓ Question 9 processed successfully\n", + " Response length: 222 characters\n", + " Manual length: 972 characters\n", + "\n", + "Processing question 10/20: Find instructors who mention a 'hands-on learning' philosophy in their biography.\n", + "✓ Question 10 processed successfully\n", + " Response length: 340 characters\n", + " Manual length: 1147 characters\n", + "\n", + "Processing question 11/20: List all students who are enrolled full-time.\n", + "✓ Question 11 processed successfully\n", + " Response length: 1101 characters\n", + " Manual length: 1293 characters\n", + "\n", + "Processing question 12/20: Which students have completed more than 30 credits?\n", + "✓ Question 12 processed successfully\n", + " Response length: 1661 characters\n", + " Manual length: 1287 characters\n", + "\n", + "Processing question 13/20: Show me the research interests of the student named 'Alex Johnson'.\n", + "✓ Question 13 processed successfully\n", + " Response length: 224 characters\n", + " Manual length: 1446 characters\n", + "\n", + "Processing question 14/20: Find students with research interests in machine learning or data science.\n", + "✓ Question 14 processed successfully\n", + " Response length: 546 characters\n", + " Manual length: 1669 characters\n", + "\n", + "Processing question 15/20: Which students are pursuing studies in artificial intelligence?\n", + "✓ Question 15 processed successfully\n", + " Response length: 239 characters\n", + " Manual length: 1695 characters\n", + "\n", + "Processing question 16/20: Which 
instructors have a biography mentioning 'online teaching methods' and have been teaching for more than 5 years?\n", + "✓ Question 16 processed successfully\n", + " Response length: 239 characters\n", + " Manual length: 1863 characters\n", + "\n", + "Processing question 17/20: Which courses currently open for enrollment have a duration under 20 hours?\n", + "✓ Question 17 processed successfully\n", + " Response length: 179 characters\n", + " Manual length: 1853 characters\n", + "\n", + "Processing question 18/20: Show me all students who have completed at least 20 credits but are not enrolled full-time.\n", + "✓ Question 18 processed successfully\n", + " Response length: 314 characters\n", + " Manual length: 2101 characters\n", + "\n", + "Processing question 19/20: Which instructors are tenured and have a teaching philosophy related to 'project-based learning'?\n", + "✓ Question 19 processed successfully\n", + " Response length: 307 characters\n", + " Manual length: 2333 characters\n", + "\n", + "Processing question 20/20: Which courses specifically mention 'capstone project' in their description and are currently enrolling?\n", + "✓ Question 20 processed successfully\n", + " Response length: 201 characters\n", + " Manual length: 2293 characters\n", + "\n", + "Processing complete. 
Results saved to meta-learning-results.csv\n" + ] + } + ], + "source": [ + "import csv\n", + "import os\n", + "from datetime import datetime\n", + "\n", + "# Define questions list\n", + "questions = [\n", + " # About Courses\n", + " \"Which courses are currently open for enrollment?\",\n", + " \"Show me all courses where the total duration is less than 10 hours.\",\n", + " \"Which courses mention advanced Python programming in their description?\", \n", + " \"What are the key learning outcomes for the course titled 'Data Structures 101'?\",\n", + " \"List all courses that have a duration of 40 hours or more.\",\n", + "\n", + " # About Instructors\n", + " \"Which instructors have over 10 years of teaching experience?\",\n", + " \"Show me the biography of the instructor named 'Dr. Jane Doe'.\",\n", + " \"Which instructors are tenured?\",\n", + " \"Who has the longest teaching career among all instructors in the database?\",\n", + " \"Find instructors who mention a 'hands-on learning' philosophy in their biography.\",\n", + "\n", + " # About Students\n", + " \"List all students who are enrolled full-time.\",\n", + " \"Which students have completed more than 30 credits?\",\n", + " \"Show me the research interests of the student named 'Alex Johnson'.\",\n", + " \"Find students with research interests in machine learning or data science.\",\n", + " \"Which students are pursuing studies in artificial intelligence?\",\n", + "\n", + " # Combining Criteria\n", + " \"Which instructors have a biography mentioning 'online teaching methods' and have been teaching for more than 5 years?\",\n", + " \"Which courses currently open for enrollment have a duration under 20 hours?\",\n", + " \"Show me all students who have completed at least 20 credits but are not enrolled full-time.\",\n", + " \"Which instructors are tenured and have a teaching philosophy related to 'project-based learning'?\",\n", + " \"Which courses specifically mention 'capstone project' in their description and are 
currently enrolling?\"\n", + "]\n", + "\n", + "csv_filename = \"meta-learning-results.csv\"\n", + "\n", + "print(f\"Starting processing of {len(questions)} questions...\")\n", + "weaviate_search_module = WeaviateFunctionCallingAgent(\n", + " weaviate_client=weaviate_client,\n", + " collections_description=schema\n", + ")\n", + "\n", + "with open(csv_filename, 'w', newline='') as csvfile:\n", + " writer = csv.writer(csvfile)\n", + " writer.writerow(['Question Number', 'Question', 'Manual State'])\n", + " \n", + " # Process each question and track manual evolution\n", + " for i, question in enumerate(questions, 1):\n", + " print(f\"\\nProcessing question {i}/{len(questions)}: {question}\")\n", + " try:\n", + " prediction = weaviate_search_module(question)\n", + " manual_state = weaviate_search_module.get_database_querying_manual()\n", + " writer.writerow([i, question, manual_state])\n", + " print(f\"✓ Question {i} processed successfully\")\n", + " print(f\" Response length: {len(str(prediction))} characters\")\n", + " print(f\" Manual length: {len(manual_state)} characters\")\n", + " except Exception as e:\n", + " print(f\"✗ Error processing question {i}: {str(e)}\")\n", + "\n", + "print(f\"\\nProcessing complete. Results saved to {csv_filename}\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "To find open courses, query \"Which courses are currently open for enrollment?\" and ensure 'currentlyEnrolling' is 'True'. Use duration filters for courses under 10, 20, or 40 hours. Include keywords in descriptions to search by topic, such as \"capstone project\". Verify course titles for key outcomes if no results appear. For experienced instructors, query \"Which instructors have over 10 years of teaching experience?\" and check 'yearsOfTeaching'. 
View an instructor's biography by querying \"Show me the biography of the instructor named [Instructor's Name]\" with correct spelling. Identify tenured instructors by querying \"Which instructors are tenured?\" and ensure 'tenured' is 'True'. To find the instructor with the longest career, use \"Who has the longest teaching career among all instructors?\" and check 'yearsOfTeaching' for the maximum value. For instructors with a 'hands-on learning' philosophy, query \"Which instructors mention a 'hands-on learning' philosophy in their biography?\" and search the 'biography' field. To find instructors with specific teaching methods, query \"Which instructors have a biography mentioning '[specific teaching method]' and have been teaching for more than [number] years?\" and check 'yearsOfTeaching'. To find tenured instructors with a specific teaching philosophy, query \"Which tenured instructors have a teaching philosophy related to '[specific teaching method]'?\" and ensure 'tenured' is 'True' and search the 'biography' field. List full-time students by querying \"List all students who are enrolled full-time\" and ensure 'enrolledFullTime' is 'True'. To find students with over 30 credits, query \"Which students have completed more than 30 credits?\" and check 'completedCredits'. To find students with at least 20 credits but not enrolled full-time, query \"Show me all students who have completed at least 20 credits but are not enrolled full-time\" and ensure 'completedCredits' is 20 or more and 'enrolledFullTime' is 'False'. To find a student's research interests, query \"Show me the research interests of the student named [Student's Name]\" and ensure the name is spelled correctly. 
For students with specific research interests, query \"Find students with research interests in [specific topic]\" and search the 'researchInterests' field.\n" + ] + } + ], + "source": [ + "print(weaviate_search_module.get_database_querying_manual())" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'We have two students currently involved in machine learning-related research. Christopher Perez is focusing on using machine learning to detect financial fraud and optimize risk assessment, while Aubrey Bennett is working on developing algorithms for personalized dietary recommendations. Christopher is enrolled part-time, having completed 72 credits, and Aubrey is enrolled full-time with 33 credits completed.'" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "weaviate_search_module(\"How many students are enrolled in machine learning courses?\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.10" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/weaviate_query_manual.md b/notebooks/weaviate_query_manual.md new file mode 100644 index 0000000..3569a71 --- /dev/null +++ b/notebooks/weaviate_query_manual.md @@ -0,0 +1,109 @@ +# Course Management System Query Guide + +## Course Queries + +### Finding Open Courses +To search for currently available courses: +```sql +"Which courses are currently open for enrollment?" +``` +Verify that 'currentlyEnrolling' is set to 'True'. 
+ +### Duration Filters +Filter courses by duration thresholds: +- Under 10 hours +- Under 20 hours +- Under 40 hours + +### Topic-Based Search +- Include relevant keywords in the query to search course descriptions by topic +- If a key-outcomes lookup returns no results, verify the course title +- Use specific terms like "capstone project" in search queries + +## Instructor Queries + +### Experience and Qualifications +For experienced faculty: +```sql +"Which instructors have over 10 years of teaching experience?" +``` +Reference the 'yearsOfTeaching' field. + +To view biographical information: +```sql +"Show me the biography of the instructor named [Instructor's Name]" +``` +Note: Exact spelling is required. + +### Tenure Status +To identify tenured faculty: +```sql +"Which instructors are tenured?" +``` +Verify 'tenured' is set to 'True'. + +### Career Length +To find the most experienced instructor: +```sql +"Who has the longest teaching career among all instructors?" +``` +Check 'yearsOfTeaching' for the maximum value. + +### Teaching Philosophy +For specific teaching approaches: +```sql +"Which instructors mention a 'hands-on learning' philosophy in their biography?" +``` +Search the 'biography' field for relevant terms. + +### Combined Queries +For complex instructor searches: +```sql +"Which instructors have a biography mentioning '[specific teaching method]' and have been teaching for more than [number] years?" +``` +Check both the 'biography' and 'yearsOfTeaching' fields. + +For tenured instructors with specific methods: +```sql +"Which tenured instructors have a teaching philosophy related to '[specific teaching method]'?" +``` +Ensure: +- 'tenured' is 'True' +- The 'biography' field mentions the teaching method + +## Student Queries + +### Enrollment Status +To view full-time students: +```sql +"List all students who are enrolled full-time" +``` +Verify 'enrolledFullTime' is 'True'. 
+ +### Credit Status +For credit-based searches: +```sql +"Which students have completed more than 30 credits?" +``` +Check 'completedCredits' field. + +For part-time students with significant credits: +```sql +"Show me all students who have completed at least 20 credits but are not enrolled full-time" +``` +Ensure: +- 'completedCredits' ≥ 20 +- 'enrolledFullTime' is 'False' + +### Research Interests +To view individual research interests: +```sql +"Show me the research interests of the student named [Student's Name]" +``` +Note: Exact spelling is required. + +For topic-specific searches: +```sql +"Find students with research interests in [specific topic]" +``` +Search the 'researchInterests' field for relevant keywords. \ No newline at end of file
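
The credit and enrollment checks in the student sections above can be illustrated with a minimal Python sketch. It assumes student records shaped like the `Students` schema (`studentName`, `researchInterests`, `completedCredits`, `enrolledFullTime`). The sample records and both helper functions are hypothetical and are not part of the notebook or of any Weaviate API. The plain substring match is only a stand-in for the semantic search that the schema description mentions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Student:
    # Field names mirror the 'Students' schema properties.
    studentName: str
    researchInterests: str
    completedCredits: int
    enrolledFullTime: bool


def part_time_with_min_credits(students: List[Student], min_credits: int = 20) -> List[Student]:
    """Students with 'completedCredits' >= min_credits and 'enrolledFullTime' False."""
    return [
        s for s in students
        if s.completedCredits >= min_credits and not s.enrolledFullTime
    ]


def with_research_keyword(students: List[Student], keyword: str) -> List[Student]:
    """Case-insensitive substring match on 'researchInterests' (not semantic search)."""
    needle = keyword.lower()
    return [s for s in students if needle in s.researchInterests.lower()]


# Hypothetical sample data, for illustration only.
students = [
    Student("Student A", "Machine learning for fraud detection", 72, False),
    Student("Student B", "Algorithms for dietary recommendations", 33, True),
    Student("Student C", "Medieval history", 12, False),
]

for s in part_time_with_min_credits(students):
    print(s.studentName, s.completedCredits)

for s in with_research_keyword(students, "machine learning"):
    print(s.studentName, s.researchInterests)
```

Keeping the two conditions in one predicate mirrors the "Ensure" checklist for the part-time query: a record has to satisfy both the credit threshold and the enrollment flag to be returned.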