#16 OOP Upgrade Feature Engineering Agent Example
mdancho84 committed Jan 10, 2025
1 parent 9b9aba3 commit 5e97589
Showing 2 changed files with 54,493 additions and 587 deletions.
126 changes: 0 additions & 126 deletions examples/data_cleaning_agent.ipynb
@@ -1187,132 +1187,6 @@
"data_cleaning_agent.get_recommended_cleaning_steps(markdown=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Explore the agent documentation for more information"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[0;31mType:\u001b[0m DataCleaningAgent\n",
"\u001b[0;31mString form:\u001b[0m <ai_data_science_team.agents.data_cleaning_agent.DataCleaningAgent object at 0x7f86d9036290>\n",
"\u001b[0;31mFile:\u001b[0m ~/Desktop/course_code/ai-data-science-team/ai_data_science_team/agents/data_cleaning_agent.py\n",
"\u001b[0;31mDocstring:\u001b[0m \n",
"Creates a data cleaning agent that can process datasets based on user-defined instructions or default cleaning steps. \n",
"The agent generates a Python function to clean the dataset, performs the cleaning, and logs the process, including code \n",
"and errors. It is designed to facilitate reproducible and customizable data cleaning workflows.\n",
"\n",
"The agent performs the following default cleaning steps unless instructed otherwise:\n",
"\n",
"- Removing columns with more than 40% missing values.\n",
"- Imputing missing values with the mean for numeric columns.\n",
"- Imputing missing values with the mode for categorical columns.\n",
"- Converting columns to appropriate data types.\n",
"- Removing duplicate rows.\n",
"- Removing rows with missing values.\n",
"- Removing rows with extreme outliers (values 3x the interquartile range).\n",
"\n",
"User instructions can modify, add, or remove any of these steps to tailor the cleaning process.\n",
"\n",
"Parameters\n",
"----------\n",
"model : langchain.llms.base.LLM\n",
" The language model used to generate the data cleaning function.\n",
"n_samples : int, optional\n",
" Number of samples used when summarizing the dataset. Defaults to 30. Reducing this number can help \n",
" avoid exceeding the model's token limits.\n",
"log : bool, optional\n",
" Whether to log the generated code and errors. Defaults to False.\n",
"log_path : str, optional\n",
" Directory path for storing log files. Defaults to None.\n",
"file_name : str, optional\n",
" Name of the file for saving the generated response. Defaults to \"data_cleaner.py\".\n",
"overwrite : bool, optional\n",
" Whether to overwrite the log file if it exists. If False, a unique file name is created. Defaults to True.\n",
"human_in_the_loop : bool, optional\n",
" Enables user review of data cleaning instructions. Defaults to False.\n",
"bypass_recommended_steps : bool, optional\n",
" If True, skips the default recommended cleaning steps. Defaults to False.\n",
"bypass_explain_code : bool, optional\n",
" If True, skips the step that provides code explanations. Defaults to False.\n",
"\n",
"Methods\n",
"-------\n",
"update_params(**kwargs)\n",
" Updates the agent's parameters and rebuilds the compiled state graph.\n",
"ainvoke(user_instructions: str, data_raw: pd.DataFrame, max_retries=3, retry_count=0)\n",
" Cleans the provided dataset asynchronously based on user instructions.\n",
"invoke(user_instructions: str, data_raw: pd.DataFrame, max_retries=3, retry_count=0)\n",
" Cleans the provided dataset synchronously based on user instructions.\n",
"explain_cleaning_steps()\n",
" Returns an explanation of the cleaning steps performed by the agent.\n",
"get_log_summary()\n",
" Retrieves a summary of logged operations if logging is enabled.\n",
"get_state_keys()\n",
" Returns a list of keys from the state graph response.\n",
"get_state_properties()\n",
" Returns detailed properties of the state graph response.\n",
"get_data_cleaned()\n",
" Retrieves the cleaned dataset as a pandas DataFrame.\n",
"get_data_raw()\n",
" Retrieves the raw dataset as a pandas DataFrame.\n",
"get_data_cleaner_function()\n",
" Retrieves the generated Python function used for cleaning the data.\n",
"get_recommended_cleaning_steps()\n",
" Retrieves the agent's recommended cleaning steps.\n",
"\n",
"Examples\n",
"--------\n",
"```python\n",
"import pandas as pd\n",
"from langchain_openai import ChatOpenAI\n",
"from ai_data_science_team.agents import DataCleaningAgent\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-4o-mini\")\n",
"\n",
"data_cleaning_agent = DataCleaningAgent(\n",
" model=llm, n_samples=50, log=True, log_path=\"logs\", human_in_the_loop=True\n",
")\n",
"\n",
"df = pd.read_csv(\"https://raw.githubusercontent.com/business-science/ai-data-science-team/refs/heads/master/data/churn_data.csv\")\n",
"\n",
"data_cleaning_agent.invoke(\n",
" user_instructions=\"Don't remove outliers when cleaning the data.\",\n",
" data_raw=df,\n",
" max_retries=3,\n",
" retry_count=0\n",
")\n",
"\n",
"cleaned_data = data_cleaning_agent.get_data_cleaned()\n",
"\n",
"response = data_cleaning_agent.response\n",
"```\n",
"\n",
"Returns\n",
"--------\n",
"DataCleaningAgent : langchain.graphs.CompiledStateGraph \n",
" A data cleaning agent implemented as a compiled state graph. \n",
"\u001b[0;31mInit docstring:\u001b[0m\n",
"Initialize the agent with provided parameters.\n",
"\n",
"Parameters:\n",
" **params: Arbitrary keyword arguments representing the agent's parameters."
]
}
],
"source": [
"?data_cleaning_agent"
]
},
{
"cell_type": "markdown",
"metadata": {},
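The deleted docstring above lists "Removing rows with extreme outliers (values 3x the interquartile range)" among the agent's default cleaning steps. The agent generates its own cleaning code at runtime via the LLM, so there is no fixed implementation to cite; the sketch below is only a plausible illustration of that IQR rule in plain pandas, with the function name and the 3× multiplier chosen to mirror the docstring.

```python
import pandas as pd

def remove_extreme_outliers(df: pd.DataFrame, multiplier: float = 3.0) -> pd.DataFrame:
    """Drop rows where any numeric value falls outside
    [Q1 - multiplier*IQR, Q3 + multiplier*IQR] for its column.

    Illustrative only -- the DataCleaningAgent generates its own
    cleaning function, which may differ from this sketch.
    """
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    # Keep a row only if every numeric value is within bounds.
    mask = ((numeric >= lower) & (numeric <= upper)).all(axis=1)
    return df[mask]

df = pd.DataFrame({"x": [1, 2, 3, 4, 1000]})
cleaned = remove_extreme_outliers(df)
# The 1000 lies far above Q3 + 3*IQR (= 4 + 3*2 = 10) and is dropped.
```

Passing `user_instructions="Don't remove outliers when cleaning the data."` to `invoke()`, as in the example above, tells the agent to omit this step entirely.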
