-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Version 1.2 #534
Version 1.2 #534
Conversation
* fix: directory for saving charts issue (#513) * (fix): update template (remove hardcode of directory) in `GeneratePythonCodePrompt` class * (fix): add passing "save_charts_path" from the config object to `get_prompt()` * tests: custom path for charts (#513) * (tests): add test method for a case passing custom path with `save_charts_path` parameter * test: add test for the custom chart prompt --------- Co-authored-by: Gabriele Venturi <[email protected]>
* feat[DFValidator]: Add return types * fix pre-commit errors * add return value * refactor: minor changes in validation --------- Co-authored-by: arslan-agory <[email protected]> Co-authored-by: Gabriele Venturi <[email protected]>
* fix: update type hint for config parameter (#526) * (fix): add `dict` as a possible type of object to pass in methods and functions where `config` parameter occurs * (feat): add logging of an exception to `load_config()` instead of silently suppressing an error * fix: use Union for config hints (#526) * (fix): replace "bitwise or" (`|`) usage with `Union` in type hints for `config` object
* feat(llm): deprecate falcon and starcoder * docs: mention Starcoder and Falcon being deprecated --------- Co-authored-by: Gabriele Venturi <[email protected]>
WalkthroughThis update introduces flexibility in configuration handling, deprecates certain classes, enhances dataframe validation with Pydantic schemas, and improves chart saving functionality. It also includes necessary test updates to validate these changes. Changes
TipsChat with CodeRabbit Bot (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 3
Files ignored due to filter (2)
- mkdocs.yml
- pyproject.toml
Files selected for processing (13)
- docs/API/llms.md (1 hunks)
- docs/LLMs/llms.md (2 hunks)
- pandasai/init.py (4 hunks)
- pandasai/config.py (2 hunks)
- pandasai/helpers/code_manager.py (2 hunks)
- pandasai/helpers/df_validator.py (1 hunks)
- pandasai/llm/falcon.py (2 hunks)
- pandasai/llm/starcoder.py (2 hunks)
- pandasai/prompts/generate_python_code.py (1 hunks)
- pandasai/smart_dataframe/init.py (5 hunks)
- pandasai/smart_datalake/init.py (5 hunks)
- tests/prompts/test_generate_python_code_prompt.py (2 hunks)
- tests/test_smartdataframe.py (6 hunks)
Files skipped from review due to trivial changes (5)
- docs/API/llms.md
- docs/LLMs/llms.md
- pandasai/llm/falcon.py
- pandasai/llm/starcoder.py
- pandasai/prompts/generate_python_code.py
Additional comments (Suppressed): 23
pandasai/config.py (1)
- 12-15: The type hint for
override_config
has been updated to accept either aConfig
object or a dictionary. This is a good change as it provides more flexibility in how the configuration can be provided.pandasai/smart_dataframe/__init__.py (3)
56-62: The function signature has been updated to accept either a
Config
object or a dictionary for configuration. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.65-71: The docstring has been updated to reflect the changes in the function signature. The description of the
config
parameter now correctly indicates that it can be either aConfig
object or a dictionary.238-253: A new method
validate
has been added which uses theDfValidator
class to validate the dataframe rows based on a Pydantic schema input. This is a good addition as it allows for robust data validation using Pydantic's powerful and flexible validation capabilities.pandasai/__init__.py (5)
103-117: The
__init__
method of thePandasAI
class has been updated to accept either aConfig
object or a dictionary for configuration. This change enhances flexibility and allows users to pass in configurations in a more convenient way. However, ensure that all calls to this function throughout the codebase have been updated to match the new signature.146-148: Deprecation warning added for the
PandasAI
class, suggesting users to useSmartDataframe
instead. This is a good practice as it informs users about the upcoming changes and gives them time to adapt their code accordingly.150-150: The
_config
attribute of thePandasAI
class is now initialized with aConfig
object. This is consistent with the change made in the__init__
method where it can now accept either aConfig
object or a dictionary for configuration.163-172: No significant changes detected in the
run
method of thePandasAI
class. The method signature and functionality remain the same.200-209: No significant changes detected in the
__call__
method of thePandasAI
class. The method signature and functionality remain the same.pandasai/helpers/df_validator.py (2)
1-52: The
DfValidationResult
class is well implemented. It encapsulates the result of a validation operation, including whether it passed and any errors that occurred. The use of properties forpassed
anderrors
ensures that these attributes are read-only, which is good for immutability. Theadd_error
method allows adding an error message to the results and automatically setspassed
toFalse
, which is a logical behavior. The__bool__
method is also correctly implemented to return the value ofpassed
, allowing aDfValidationResult
object to be used in a boolean context.54-126: The
DfValidator
class is responsible for validating a dataframe using a Pydantic schema. The_validate_batch
method validates a batch of rows from the dataframe against the provided schema and returns any errors. The_df_to_list_of_dict
method converts the dataframe to a list of dictionaries, which is the format required for validation. Thevalidate
method orchestrates the validation process by determining the type of the dataframe, converting it to a list of dictionaries, validating it, and returning the results. This class is well designed and implemented.However, there's a potential issue with the
_df_to_list_of_dict
method. If thedataframe_type
is neither "pandas" nor "polars", it returns an empty list. This could lead to silent failures if an unsupported dataframe type is used. It would be better to raise an exception in this case.tests/prompts/test_generate_python_code_prompt.py (1)
- 26-27: The new variable
save_charts_path
is being set but not used in the assertion. Please ensure that the test case checks for the correct usage of this variable.pandasai/helpers/code_manager.py (2)
21-27: The type hint for
_config
has been updated to accept either aConfig
object or a dictionary. This change enhances the flexibility of configuration, but it also introduces potential issues. If a dictionary is passed instead of aConfig
object, there might be missing keys that theConfig
object would have guaranteed. Ensure that all usages of_config
in this class handle bothConfig
objects and dictionaries correctly.30-35: The
__init__
method now accepts either aConfig
object or a dictionary for theconfig
parameter. As mentioned above, ensure that all usages ofconfig
in this class handle bothConfig
objects and dictionaries correctly.pandasai/smart_datalake/__init__.py (5)
44-50: The type of
_config
has been changed fromConfig
toUnion[Config, dict]
. This change allows for more flexibility in the configuration of theSmartDatalake
class. However, it's important to ensure that all parts of the code that interact with_config
can handle bothConfig
objects and dictionaries.60-72: The
__init__
method now accepts either aConfig
object or a dictionary for theconfig
parameter. This is a good change as it enhances the flexibility of theSmartDatalake
class. However, make sure that the rest of the codebase is updated to reflect this change where necessary.138-154: The
_load_config
method has been updated to accept either aConfig
object or a dictionary. Theload_config
function is then used to load the configuration. This is a good change as it enhances the flexibility of theSmartDatalake
class. However, make sure that theload_config
function can handle bothConfig
objects and dictionaries.164-170: The check for whether an LLM is a PandasAI LLM has been simplified by checking if the LLM has the
_llm_type
attribute. This is a cleaner way of performing the check and should improve readability.262-268: The
default_values
dictionary now includes thesave_charts_path
key. This is a good addition as it allows the path for saving charts to be customized. However, make sure that the rest of the codebase is updated to reflect this change where necessary.tests/test_smartdataframe.py (4)
14-14: The import of
BaseModel
andField
frompydantic
is new. Ensure that thepydantic
package is included in your project dependencies.125-127: The test case
test_init_without_llm
has been updated to pass a configuration dictionary withllm
set toNone
. This change aligns with the updates made to theSmartDataframe
class to accept a configuration dictionary. The test case should fail as expected if no LLM is provided.244-309: This is a new test case for saving charts in a non-default directory. It checks whether the chart is saved in the correct directory and whether the path to the saved chart is correctly returned. The test case seems well-written and covers all necessary assertions.
728-824: These are new test cases for validating dataframes using Pydantic schemas. They cover various scenarios, including validation passing, validation failing due to incorrect types, validation with Polars dataframes, validation failing for one record, and validation with complex Pydantic schemas. These tests are important for ensuring the correct functioning of the new dataframe validation feature.
if config.get("callback") and not override_config.get("callback"): | ||
config["callback"] = getattr(callbacks, config["callback"])() | ||
except Exception: | ||
pass | ||
logger.error("Could not load configuration", exc_info=True) | ||
|
||
if override_config: | ||
config.update(override_config) | ||
|
||
config = Config(**config) | ||
|
||
return config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling has been improved by logging the exception details instead of silently passing it. However, the code no longer converts the final config
dictionary into a Config
object before returning it. This could potentially lead to issues if the rest of the codebase expects a Config
object rather than a dictionary. Please verify that this change is compatible with the rest of the codebase.
- return config
+ config = Config(**config)
+ return config
""" | ||
Create list of dict of dataframe rows on basis of dataframe type | ||
Supports only polars and pandas dataframe | ||
|
||
Args: | ||
df: dataframe to be converted | ||
dataframe_type: type of dataframe | ||
|
||
Returns: | ||
list of dict of dataframe rows | ||
""" | ||
if dataframe_type == "pandas": | ||
return df.to_dict(orient="records") | ||
elif dataframe_type == "polars": | ||
return df.to_dicts() | ||
else: | ||
return [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- else:
- return []
+ else:
+ raise ValueError(f"Unsupported DataFrame type: {dataframe_type}")
def test_str_with_custom_save_charts_path(self): | ||
"""Test that the __str__ method is implemented""" | ||
|
||
llm = FakeLLM("plt.show()") | ||
dfs = [ | ||
SmartDataframe( | ||
pd.DataFrame({"a": [1], "b": [4]}), | ||
config={"llm": llm}, | ||
) | ||
] | ||
|
||
prompt = GeneratePythonCodePrompt() | ||
prompt.set_var("dfs", dfs) | ||
prompt.set_var("conversation", "Question") | ||
prompt.set_var("save_charts_path", "custom_path") | ||
|
||
assert ( | ||
prompt.to_string() | ||
== """ | ||
You are provided with the following pandas DataFrames with the following metadata: | ||
|
||
Dataframe dfs[0], with 1 rows and 2 columns. | ||
This is the metadata of the dataframe dfs[0]: | ||
a,b | ||
1,4 | ||
|
||
|
||
This is the initial python code to be updated: | ||
```python | ||
# TODO import all the dependencies required | ||
import pandas as pd | ||
|
||
# Analyze the data | ||
# 1. Prepare: Preprocessing and cleaning data if necessary | ||
# 2. Process: Manipulating data for analysis (grouping, filtering, aggregating, etc.) | ||
# 3. Analyze: Conducting the actual analysis (if the user asks to create a chart save it to an image in custom_path/temp_chart.png and do not show the chart.) | ||
# 4. Output: return a dictionary of: | ||
# - type (possible values "text", "number", "dataframe", "plot") | ||
# - value (can be a string, a dataframe or the path of the plot, NOT a dictionary) | ||
# Example output: { "type": "text", "value": "The average loan amount is $15,000." } | ||
def analyze_data(dfs: list[pd.DataFrame]) -> dict: | ||
# Code goes here (do not add comments) | ||
|
||
|
||
# Declare a result variable | ||
result = analyze_data(dfs) | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new test case test_str_with_custom_save_charts_path
has been added to verify the functionality of setting a custom path for saving charts. The test case seems well-structured and covers the necessary assertions. However, it would be beneficial to add an assertion to check if the chart is actually saved at the specified location.
+ assert os.path.exists("custom_path/temp_chart.png")
@gventuri Are we planning to also discontinue the |
@mspronesti I suppose we should basically change the implementation so that we go for the "official" way (not limited to 10 tokens with the "hacky" thing of running multiple calls to get the whole code generated |
Summary by CodeRabbit
Release Notes
DfValidator
for validating dataframes using Pydantic schemas.Config
object or a dictionary in various classes and methods.sample_head
parameter to theSmartDataframe
class.Starcoder
andFalcon
classes as deprecated with warnings suggesting alternatives.