Version 1.2 #534

gventuri · 2023-09-07T08:33:32Z

Summary by CodeRabbit

Release Notes

New Feature: Introduced DfValidator for validating dataframes using Pydantic schemas.
New Feature: Added flexibility to the configuration handling by allowing either a Config object or a dictionary in various classes and methods.
New Feature: Added an optional sample_head parameter to the SmartDataframe class.
New Feature: Updated chart saving functionality to allow custom directories.
Deprecation: Marked Starcoder and Falcon classes as deprecated with warnings suggesting alternatives.
Test: Added new test cases for dataframe validation and chart saving functionality.
Documentation: Updated documentation with deprecation notices and minor changes.

* fix: directory for saving charts issue (#513) * (fix): update template (remove hardcode of directory) in `GeneratePythonCodePrompt` class * (fix): add passing "save_charts_path" from the config object to `get_prompt()` * tests: custom path for charts (#513) * (tests): add test method for a case passing custom path with `save_charts_path` parameter * test: add test for the custom chart prompt --------- Co-authored-by: Gabriele Venturi <[email protected]>

* feat[DFValidator]: Add return types * fix pre-commit errors * add return value * refactor: minor changes in validation --------- Co-authored-by: arslan-agory <[email protected]> Co-authored-by: Gabriele Venturi <[email protected]>

* fix: update type hint for config parameter (#526) * (fix): add `dict` as a possible type of object to pass in methods and functions where `config` parameter occurs * (feat): add logging of an exception to `load_config()` instead of silently suppressing an error * fix: use Union for config hints (#526) * (fix): replace "bitwise or" (`|`) usage with `Union` in type hints for `config` object

* feat(llm): deprecate falcon and starcoder * docs: mention Starcoder and Falcon being deprecated --------- Co-authored-by: Gabriele Venturi <[email protected]>

coderabbitai · 2023-09-07T08:33:38Z

Walkthrough

This update introduces flexibility in configuration handling, deprecates certain classes, enhances dataframe validation with Pydantic schemas, and improves chart saving functionality. It also includes necessary test updates to validate these changes.

Changes

File(s)	Summary
`docs/API/llms.md`, `pandasai/llm/falcon.py`, `pandasai/llm/starcoder.py`	Deprecated `Starcoder` and `Falcon` classes with alternative LLMs suggested.
`docs/LLMs/llms.md`	Updated documentation with image link addition and OpenAI models section removal.
`pandasai/__init__.py`, `pandasai/config.py`, `pandasai/helpers/code_manager.py`, `pandasai/smart_dataframe/__init__.py`, `pandasai/smart_datalake/__init__.py`	Introduced flexibility in configuration handling by allowing `Config` instances or dictionaries.
`pandasai/helpers/df_validator.py`, `tests/test_smartdataframe.py`	Added dataframe validation using Pydantic schemas and corresponding tests.
`pandasai/prompts/generate_python_code.py`, `tests/prompts/test_generate_python_code_prompt.py`	Improved chart saving functionality with custom directory support and updated tests.

🐇💻

Code changes here, code changes there,

Making our software fair and square.

Deprecations, validations, and more,

That's what this update has in store.

So let's hop to it, without any delay,

Celebrating the improvements we made today! 🎉🥕

Tips

Chat with CodeRabbit Bot (`@coderabbitai`)

Mention @coderabbitai in any review comment for bot assistance.
Note: Review comments are made on code diffs or files, not on the PR overview.

Pause Incremental Reviews

Insert @coderabbitai: ignore in the PR description to halt the bot's ongoing reviews. Remove the line to resume.

coderabbitai

Review Status

Actionable comments generated: 3

Commits

Files that changed from the base of the PR and between 3a92d33 and 9e6a71a commits.

Files ignored due to filter (2)

mkdocs.yml
pyproject.toml

Files selected for processing (13)

docs/API/llms.md (1 hunks)
docs/LLMs/llms.md (2 hunks)
pandasai/init.py (4 hunks)
pandasai/config.py (2 hunks)
pandasai/helpers/code_manager.py (2 hunks)
pandasai/helpers/df_validator.py (1 hunks)
pandasai/llm/falcon.py (2 hunks)
pandasai/llm/starcoder.py (2 hunks)
pandasai/prompts/generate_python_code.py (1 hunks)
pandasai/smart_dataframe/init.py (5 hunks)
pandasai/smart_datalake/init.py (5 hunks)
tests/prompts/test_generate_python_code_prompt.py (2 hunks)
tests/test_smartdataframe.py (6 hunks)

Files skipped from review due to trivial changes (5)

docs/API/llms.md
docs/LLMs/llms.md
pandasai/llm/falcon.py
pandasai/llm/starcoder.py
pandasai/prompts/generate_python_code.py

Additional comments (Suppressed): 23

pandasai/config.py (1)

12-15: The type hint for override_config has been updated to accept either a Config object or a dictionary. This is a good change as it provides more flexibility in how the configuration can be provided.

pandasai/smart_dataframe/__init__.py (3)

56-62: The function signature has been updated to accept either a Config object or a dictionary for configuration. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

65-71: The docstring has been updated to reflect the changes in the function signature. The description of the config parameter now correctly indicates that it can be either a Config object or a dictionary.

238-253: A new method validate has been added which uses the DfValidator class to validate the dataframe rows based on a Pydantic schema input. This is a good addition as it allows for robust data validation using Pydantic's powerful and flexible validation capabilities.

pandasai/__init__.py (5)

103-117: The __init__ method of the PandasAI class has been updated to accept either a Config object or a dictionary for configuration. This change enhances flexibility and allows users to pass in configurations in a more convenient way. However, ensure that all calls to this function throughout the codebase have been updated to match the new signature.

146-148: Deprecation warning added for the PandasAI class, suggesting users to use SmartDataframe instead. This is a good practice as it informs users about the upcoming changes and gives them time to adapt their code accordingly.

150-150: The _config attribute of the PandasAI class is now initialized with a Config object. This is consistent with the change made in the __init__ method where it can now accept either a Config object or a dictionary for configuration.

163-172: No significant changes detected in the run method of the PandasAI class. The method signature and functionality remain the same.

200-209: No significant changes detected in the __call__ method of the PandasAI class. The method signature and functionality remain the same.

pandasai/helpers/df_validator.py (2)

1-52: The DfValidationResult class is well implemented. It encapsulates the result of a validation operation, including whether it passed and any errors that occurred. The use of properties for passed and errors ensures that these attributes are read-only, which is good for immutability. The add_error method allows adding an error message to the results and automatically sets passed to False, which is a logical behavior. The __bool__ method is also correctly implemented to return the value of passed, allowing a DfValidationResult object to be used in a boolean context.

54-126: The DfValidator class is responsible for validating a dataframe using a Pydantic schema. The _validate_batch method validates a batch of rows from the dataframe against the provided schema and returns any errors. The _df_to_list_of_dict method converts the dataframe to a list of dictionaries, which is the format required for validation. The validate method orchestrates the validation process by determining the type of the dataframe, converting it to a list of dictionaries, validating it, and returning the results. This class is well designed and implemented.

However, there's a potential issue with the _df_to_list_of_dict method. If the dataframe_type is neither "pandas" nor "polars", it returns an empty list. This could lead to silent failures if an unsupported dataframe type is used. It would be better to raise an exception in this case.

tests/prompts/test_generate_python_code_prompt.py (1)

26-27: The new variable save_charts_path is being set but not used in the assertion. Please ensure that the test case checks for the correct usage of this variable.

pandasai/helpers/code_manager.py (2)

21-27: The type hint for _config has been updated to accept either a Config object or a dictionary. This change enhances the flexibility of configuration, but it also introduces potential issues. If a dictionary is passed instead of a Config object, there might be missing keys that the Config object would have guaranteed. Ensure that all usages of _config in this class handle both Config objects and dictionaries correctly.

30-35: The __init__ method now accepts either a Config object or a dictionary for the config parameter. As mentioned above, ensure that all usages of config in this class handle both Config objects and dictionaries correctly.

pandasai/smart_datalake/__init__.py (5)

44-50: The type of _config has been changed from Config to Union[Config, dict]. This change allows for more flexibility in the configuration of the SmartDatalake class. However, it's important to ensure that all parts of the code that interact with _config can handle both Config objects and dictionaries.

60-72: The __init__ method now accepts either a Config object or a dictionary for the config parameter. This is a good change as it enhances the flexibility of the SmartDatalake class. However, make sure that the rest of the codebase is updated to reflect this change where necessary.

138-154: The _load_config method has been updated to accept either a Config object or a dictionary. The load_config function is then used to load the configuration. This is a good change as it enhances the flexibility of the SmartDatalake class. However, make sure that the load_config function can handle both Config objects and dictionaries.

164-170: The check for whether an LLM is a PandasAI LLM has been simplified by checking if the LLM has the _llm_type attribute. This is a cleaner way of performing the check and should improve readability.

262-268: The default_values dictionary now includes the save_charts_path key. This is a good addition as it allows the path for saving charts to be customized. However, make sure that the rest of the codebase is updated to reflect this change where necessary.

tests/test_smartdataframe.py (4)

14-14: The import of BaseModel and Field from pydantic is new. Ensure that the pydantic package is included in your project dependencies.

125-127: The test case test_init_without_llm has been updated to pass a configuration dictionary with llm set to None. This change aligns with the updates made to the SmartDataframe class to accept a configuration dictionary. The test case should fail as expected if no LLM is provided.

244-309: This is a new test case for saving charts in a non-default directory. It checks whether the chart is saved in the correct directory and whether the path to the saved chart is correctly returned. The test case seems well-written and covers all necessary assertions.

728-824: These are new test cases for validating dataframes using Pydantic schemas. They cover various scenarios, including validation passing, validation failing due to incorrect types, validation with Polars dataframes, validation failing for one record, and validation with complex Pydantic schemas. These tests are important for ensuring the correct functioning of the new dataframe validation feature.

coderabbitai · 2023-09-07T08:36:38Z

pandasai/config.py

            if config.get("callback") and not override_config.get("callback"):
                config["callback"] = getattr(callbacks, config["callback"])()
    except Exception:
-        pass
+        logger.error("Could not load configuration", exc_info=True)

    if override_config:
        config.update(override_config)

-    config = Config(**config)
-
    return config


The error handling has been improved by logging the exception details instead of silently passing it. However, the code no longer converts the final config dictionary into a Config object before returning it. This could potentially lead to issues if the rest of the codebase expects a Config object rather than a dictionary. Please verify that this change is compatible with the rest of the codebase.

- return config + config = Config(**config) + return config

coderabbitai · 2023-09-07T08:36:38Z

pandasai/helpers/df_validator.py

+        """
+        Create list of dict of dataframe rows on basis of dataframe type
+        Supports only polars and pandas dataframe
+
+        Args:
+            df: dataframe to be converted
+            dataframe_type: type of dataframe
+
+        Returns:
+            list of dict of dataframe rows
+        """
+        if dataframe_type == "pandas":
+            return df.to_dict(orient="records")
+        elif dataframe_type == "polars":
+            return df.to_dicts()
+        else:
+            return []


- else: - return [] + else: + raise ValueError(f"Unsupported DataFrame type: {dataframe_type}")

coderabbitai · 2023-09-07T08:36:39Z

tests/prompts/test_generate_python_code_prompt.py

+    def test_str_with_custom_save_charts_path(self):
+        """Test that the __str__ method is implemented"""
+
+        llm = FakeLLM("plt.show()")
+        dfs = [
+            SmartDataframe(
+                pd.DataFrame({"a": [1], "b": [4]}),
+                config={"llm": llm},
+            )
+        ]
+
+        prompt = GeneratePythonCodePrompt()
+        prompt.set_var("dfs", dfs)
+        prompt.set_var("conversation", "Question")
+        prompt.set_var("save_charts_path", "custom_path")
+
+        assert (
+            prompt.to_string()
+            == """
+You are provided with the following pandas DataFrames with the following metadata:
+
+Dataframe dfs[0], with 1 rows and 2 columns.
+This is the metadata of the dataframe dfs[0]:
+a,b
+1,4
+
+
+This is the initial python code to be updated:
+```python
+# TODO import all the dependencies required
+import pandas as pd
+
+# Analyze the data
+# 1. Prepare: Preprocessing and cleaning data if necessary
+# 2. Process: Manipulating data for analysis (grouping, filtering, aggregating, etc.)
+# 3. Analyze: Conducting the actual analysis (if the user asks to create a chart save it to an image in custom_path/temp_chart.png and do not show the chart.)
+# 4. Output: return a dictionary of:
+# - type (possible values "text", "number", "dataframe", "plot")
+# - value (can be a string, a dataframe or the path of the plot, NOT a dictionary)
+# Example output: { "type": "text", "value": "The average loan amount is $15,000." }
+def analyze_data(dfs: list[pd.DataFrame]) -> dict:
+   # Code goes here (do not add comments)
+
+
 # Declare a result variable
 result = analyze_data(dfs)
 ```


A new test case test_str_with_custom_save_charts_path has been added to verify the functionality of setting a custom path for saving charts. The test case seems well-structured and covers the necessary assertions. However, it would be beneficial to add an assertion to check if the chart is actually saved at the specified location.

+ assert os.path.exists("custom_path/temp_chart.png")

mspronesti · 2023-09-07T08:44:21Z

@gventuri Are we planning to also discontinue the HuggingfaceLLM class? After the suggestions we got in this thread #525, we could consider to keep that class, parametrizing the model name (to append it to the api) in order to allow users to use starchat-beta, codellama-7b, codellama-13b, llama-2-7b and any future fine-tune but without "promising" any performances, as we would with a dedicated class.
Does it make sense? :)

gventuri · 2023-09-07T08:50:16Z

@mspronesti I suppose we should basically change the implementation so that we go for the "official" way (not limited to 10 tokens with the "hacky" thing of running multiple calls to get the whole code generated

gventuri and others added 8 commits September 4, 2023 17:48

fix: langchain models wrapper

63ced9a

Release v1.1.1

74ec223

docs: add video explanation for LLM

167a15a

feat: validate dataframe with Pydantic schema (#522)

c986050

* feat[DFValidator]: Add return types * fix pre-commit errors * add return value * refactor: minor changes in validation --------- Co-authored-by: arslan-agory <[email protected]> Co-authored-by: Gabriele Venturi <[email protected]>

feat(llm): deprecate falcon and starcoder (#530)

7232d9b

* feat(llm): deprecate falcon and starcoder * docs: mention Starcoder and Falcon being deprecated --------- Co-authored-by: Gabriele Venturi <[email protected]>

Release v1.1.2

9e6a71a

coderabbitai bot reviewed Sep 7, 2023

View reviewed changes

gventuri merged commit 9955000 into feature/v1.2 Sep 7, 2023
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Version 1.2 #534

Version 1.2 #534

gventuri commented Sep 7, 2023 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 7, 2023 •

edited

Loading

Chat with CodeRabbit Bot (`@coderabbitai`)

Pause Incremental Reviews

coderabbitai bot left a comment

coderabbitai bot Sep 7, 2023

coderabbitai bot Sep 7, 2023

coderabbitai bot Sep 7, 2023

mspronesti commented Sep 7, 2023

gventuri commented Sep 7, 2023

Version 1.2 #534

Version 1.2 #534

Conversation

gventuri commented Sep 7, 2023 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Sep 7, 2023 • edited Loading

Walkthrough

Changes

Chat with CodeRabbit Bot (@coderabbitai)

Pause Incremental Reviews

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Sep 7, 2023

Choose a reason for hiding this comment

coderabbitai bot Sep 7, 2023

Choose a reason for hiding this comment

coderabbitai bot Sep 7, 2023

Choose a reason for hiding this comment

mspronesti commented Sep 7, 2023

gventuri commented Sep 7, 2023

gventuri commented Sep 7, 2023 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 7, 2023 •

edited

Loading

Chat with CodeRabbit Bot (`@coderabbitai`)