
Merge branch 'main' of https://github.com/gventuri/pandas-ai
gventuri committed Nov 21, 2023
2 parents ac89b97 + abd1538 commit eda9bb7
Showing 25 changed files with 295 additions and 60 deletions.
6 changes: 5 additions & 1 deletion .github/workflows/ci.yml
@@ -24,7 +24,11 @@ jobs:
- name: Install dependencies
run: poetry install --all-extras
- name: Lint with ruff
run: poetry run ruff pandasai examples
run: |
poetry run ruff pandasai examples
poetry run ruff format pandasai examples --diff
- name: Spellcheck
run: poetry run codespell pandasai docs examples tests
- name: Run tests
run: poetry run pytest
- name: Run code coverage
11 changes: 10 additions & 1 deletion CONTRIBUTING.md
@@ -48,9 +48,18 @@ Make sure that the linter does not report any errors or warnings before submitti
We use `ruff` to reformat the code by running the following command:

```bash
ruff format pandasai
ruff format pandasai examples
```

### Spell check

We use `codespell` to check the spelling of our code. You can run codespell with the following command:

```bash
codespell pandasai docs examples -w
```


### 🧪 Testing

We use `pytest` to test our code. You can run the tests by running the following command:
2 changes: 1 addition & 1 deletion docs/LLMs/llms.md
@@ -178,7 +178,7 @@ langchain_llm = OpenAI(openai_api_key="my-openai-api-key")
df = SmartDataframe("data.csv", {"llm": langchain_llm})
```

PandasAI will automatically detect that you are using a LangChain llm and will convert it to a PandasAI llm.
PandasAI will automatically detect that you are using a LangChain LLM and will convert it to a PandasAI LLM.

### More information

2 changes: 1 addition & 1 deletion docs/building_docs.md
@@ -46,7 +46,7 @@ Below is the rundown of documentation structure for `pandasai`, you need to know
2. copy `mkdocs.yml`, `.readthedocs.yaml` and the `docs/` folder into your project root.
3. `docs/API` contains the API documentation created using `docstring`. For any new module, add the links here
4. Project is using standard Google Docstring Style.
5. Rebuild the documenation locally to see that it works.
5. Rebuild the documentation locally to see that it works.
6. Documentation are hosted on [Read the Docs tutorial](https://docs.readthedocs.io/en/stable/tutorial/)

> Define the release version in `mkdocs.yml` file.
2 changes: 1 addition & 1 deletion docs/custom-prompts.md
@@ -25,7 +25,7 @@ class MyCustomPrompt(AbstractPrompt):
return """This is your custom text for your prompt with custom {my_custom_value}"""

def setup(self, kwargs):
# This method is called before the prompt is intialized
# This method is called before the prompt is initialized
# You can use it to setup your prompt and pass any additional
# variables to the template
self.set_var("my_custom_value", kwargs["my_custom_value"])
22 changes: 11 additions & 11 deletions docs/getting-started.md
@@ -2,7 +2,7 @@

## Installation

To use `pandasai`, first install it
To use `pandasai`, first install it:

```console
# Using poetry (recommended)
@@ -12,11 +12,11 @@ poetry add pandasai
pip install pandasai
```

> Before you install it, we recommended to create a Virtual environment using your preffred choice of Environment Managers e.g [Poetry](https://python-poetry.org/), [Pipenv](https://pipenv.pypa.io/en/latest/), [Conda](https://docs.conda.io/en/latest/), [Virtualenv](https://virtualenv.pypa.io/en/latest/), [Venv](https://docs.python.org/3/library/venv.html) etc.
> Before installation, we recommend you create a virtual environment using your preferred choice of environment manager e.g [Poetry](https://python-poetry.org/), [Pipenv](https://pipenv.pypa.io/en/latest/), [Conda](https://docs.conda.io/en/latest/), [Virtualenv](https://virtualenv.pypa.io/en/latest/), [Venv](https://docs.python.org/3/library/venv.html) etc.
### Optional Installs
### Optional dependencies

To keep the package size small, we have decided to make some dependencies that are not required by default. These dependencies are required for some features of `pandasai`. To install `pandasai` with these extra dependencies, run
To keep the package size small, we have decided to make some dependencies optional. To install `pandasai` with these extra dependencies, run:

```console
pip install pandasai[extra-dependency-name]
@@ -61,20 +61,20 @@ df.chat('Which are the 5 happiest countries?')
# Output: United Kingdom, Canada, Australia, United States, Germany
```

If you want to get to know more about the `SmartDataframe` class, check out this video:
If you want to learn more about the `SmartDataframe` class, check out this video:

[![Intro to SmartDataframe](https://cdn.loom.com/sessions/thumbnails/1ec1b8fbaa0e4ae0ab99b728b8b05fdb-00001.jpg)](https://www.loom.com/embed/1ec1b8fbaa0e4ae0ab99b728b8b05fdb?sid=7370854b-57c3-4f00-801b-69811a98d970 "Intro to SmartDataframe")

### How to generate OpenAI API Token
### How to generate an OpenAI API Token

Users are required to generate `YOUR_API_TOKEN`. Follow below simple steps to generate your API_TOKEN with
Users are required to generate `YOUR_API_TOKEN`. Follow these simple steps to generate `YOUR_API_TOKEN` with
[openai](https://platform.openai.com/overview).

1. Go to https://openai.com/api/ and signup with your email address or connect your Google Account.
2. Go to View API Keys on left side of your Personal Account Settings
3. Select Create new Secret key
2. Go to View API Keys on left side of your Personal Account Settings.
3. Select Create new Secret key.

> The API access to openai is a paid service. You have to set up billing.
> The API access to OpenAI is a paid service. You have to set up billing.
> Read the [Pricing](https://platform.openai.com/docs/quickstart/pricing) information before experimenting.
### Passing name and description
@@ -106,7 +106,7 @@ df3 = "data/Loan payments data.xlsx"
dl = SmartDatalake([df1, df2, df3])
```

Then, you can use the `SmartDatalake` as follows, similar to how you would use a `SmartDataframe`:
Then, similar to how you would use a `SmartDataframe`, you can use the `SmartDatalake` as follows:

```python
dl.chat('Which are the 5 happiest countries?')
2 changes: 1 addition & 1 deletion examples/from_sql.py
@@ -38,7 +38,7 @@
}
)

# With a Sqlite databse
# With a Sqlite database

invoice_connector = SqliteConnector(
config={
2 changes: 1 addition & 1 deletion examples/with_multiple_dataframes.py
@@ -25,6 +25,6 @@
[employees_df, salaries_df],
config={"llm": llm, "verbose": True},
)
response = dl.chat("Plot salaries againt name")
response = dl.chat("Plot salaries against name")
print(response)
# Output: <displays the plot>
2 changes: 1 addition & 1 deletion pandasai/agent/__init__.py
@@ -159,7 +159,7 @@ def explain(self) -> str:
)
response = self._call_llm_with_prompt(prompt)
self._logger.log(
f"""Explaination: {response}
f"""Explanation: {response}
"""
)
return response
2 changes: 1 addition & 1 deletion pandasai/connectors/airtable.py
@@ -213,7 +213,7 @@ def head(self):
Returns :
DatFrameType: The head of the data source
that the conector is connected to .
that the connector is connected to .
"""
data = self._request_api(params={"maxRecords": 5})
return pd.DataFrame(
4 changes: 2 additions & 2 deletions pandasai/connectors/sql.py
@@ -112,7 +112,7 @@ def _validate_column_name(self, column_name):
def _build_query(self, limit=None, order=None):
base_query = select("*").select_from(text(self._config.table))
if self._config.where or self._additional_filters:
# conditions is the list of wher + additional filters
# conditions is the list of where + additional filters
conditions = []
if self._config.where:
conditions += self._config.where
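
The hunk above merges the user-supplied `where` clauses with `_additional_filters` into a single `conditions` list before building the query. Conceptually, the assembly looks like this plain-string sketch (a hypothetical helper for illustration; the real `_build_query` composes SQLAlchemy `select`/`text` expressions, not raw strings):

```python
def build_query(table, where=None, additional_filters=None, limit=None):
    """Plain-string sketch of the connector's query assembly.

    Hypothetical illustration only; the real connector builds
    SQLAlchemy expressions rather than concatenated SQL strings.
    """
    conditions = []
    if where:
        conditions += where                # user-supplied filters
    if additional_filters:
        conditions += additional_filters   # filters injected internally
    query = f"SELECT * FROM {table}"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    if limit is not None:
        query += f" LIMIT {limit}"
    return query

print(build_query("loans", where=["status = 'open'"], limit=5))
# → SELECT * FROM loans WHERE status = 'open' LIMIT 5
```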
@@ -412,7 +412,7 @@ class SqliteConnector(SQLConnector):

def __init__(self, config: Union[SqliteConnectorConfig, dict]):
"""
Intialize the Sqlite connector with the given configuration.
Initialize the Sqlite connector with the given configuration.
Args:
config (ConnectorConfig) : The configuration for the MySQL connector.
8 changes: 4 additions & 4 deletions pandasai/exceptions.py
@@ -8,7 +8,7 @@
class InvalidRequestError(Exception):

"""
Raised when the request is not succesfull.
Raised when the request is not successful.
Args :
Exception (Exception): InvalidRequestError
@@ -80,7 +80,7 @@ def __init__(self, model_name):

class MissingModelError(Exception):
"""
Raised when deployment name is not passed to azure as it's a required paramter
Raised when deployment name is not passed to azure as it's a required parameter
Args:
Exception (Exception): MissingModelError
@@ -166,7 +166,7 @@ class InvalidWorkspacePathError(Exception):

class InvalidConfigError(Exception):
"""
Raised when config value is not appliable
Raised when config value is not applicable
Args:
Exception (Exception): InvalidConfigError
"""
@@ -176,5 +176,5 @@ class MaliciousQueryError(Exception):
"""
Raise error if malicious query is generated
Args:
Exception (Excpetion): MaliciousQueryError
Exception (Exception): MaliciousQueryError
"""
4 changes: 2 additions & 2 deletions pandasai/helpers/code_manager.py
@@ -433,7 +433,7 @@ def _get_nearest_func_call(current_lineno, calls, func_name):
@staticmethod
def _tokenize_operand(operand_node: ast.expr) -> Generator[str, None, None]:
"""
Utility generator function to get subscript slice contants.
Utility generator function to get subscript slice constants.
Args:
operand_node (ast.expr):
@@ -467,7 +467,7 @@ def _get_df_id_by_nearest_assignment(
current_lineno: int, assignments: list[ast.Assign], target_name: str
):
"""
Utility function to get df label by finding the nearest assigment.
Utility function to get df label by finding the nearest assignment.
Sort assignment nodes list (copy of the list) by line number.
Iterate over the assignment nodes list. If the assignment node's value
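
The corrected docstring above describes finding a dataframe label via the nearest preceding assignment. A self-contained sketch of that idea using the standard `ast` module (hypothetical names; not the library's actual implementation):

```python
import ast

def nearest_assignment_lineno(src: str, target_name: str, current_lineno: int):
    """Return the line number of the nearest assignment to `target_name`
    at or before `current_lineno`, or None if there is none.

    Sketch of the approach the docstring describes: collect Assign nodes,
    sort them by line number, and keep the last one not past the current line.
    """
    assignments = [n for n in ast.walk(ast.parse(src)) if isinstance(n, ast.Assign)]
    best = None
    for node in sorted(assignments, key=lambda n: n.lineno):
        if node.lineno > current_lineno:
            break
        if any(isinstance(t, ast.Name) and t.id == target_name for t in node.targets):
            best = node.lineno
    return best

src = "df = dfs[0]\nx = 1\ndf = dfs[1]\nresult = df.head()\n"
print(nearest_assignment_lineno(src, "df", current_lineno=4))
# → 3 (the reassignment on line 3 is nearest to the use on line 4)
```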
2 changes: 1 addition & 1 deletion pandasai/llm/base.py
@@ -68,7 +68,7 @@ def _polish_code(self, code: str) -> str:
removing the imports and removing trailing spaces and new lines.
Args:
code (str): A sting of Python code.
code (str): A string of Python code.
Returns:
str: Polished code.
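
The `_polish_code` docstring above says generated code is polished by removing imports and stripping trailing spaces and newlines. A minimal sketch of that behavior (a hypothetical stand-in, not pandasai's actual implementation):

```python
import re

def polish_code(code: str) -> str:
    """Drop import lines and strip trailing spaces/newlines from a code string.

    Sketch of the behavior the `_polish_code` docstring describes; the
    library's real logic may handle more cases.
    """
    kept = []
    for line in code.split("\n"):
        # Skip top-level `import x` / `from x import y` lines.
        if re.match(r"^(import|from)\s+\w", line):
            continue
        kept.append(line.rstrip())
    return "\n".join(kept).strip()

print(polish_code("import pandas as pd\nresult = dfs[0].head()  \n\n"))
# → result = dfs[0].head()
```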
4 changes: 2 additions & 2 deletions pandasai/pipelines/logic_units/prompt_execution.py
@@ -7,7 +7,7 @@
class PromptExecution(BaseLogicUnit):
def execute(self, input: FileBasedPrompt, **kwargs) -> Any:
config = kwargs.get("config")
if config is None or getattr(config, 'llm', None) is None:
if config is None or getattr(config, "llm", None) is None:
raise LLMNotFoundError()
llm = getattr(config, 'llm')
llm = getattr(config, "llm")
return llm.call(input)
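
The change above normalizes quoting around the same guard: execution fails fast when no LLM is present in the config. The pattern in isolation (hypothetical stand-in classes for illustration, not pandasai's actual ones):

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

class LLMNotFoundError(Exception):
    """Raised when no LLM is available in the pipeline config (sketch)."""

@dataclass
class Config:
    llm: Optional[Callable[[str], Any]] = None

def execute_prompt(prompt: str, config: Optional[Config]) -> Any:
    # Same guard as in the diff: require a configured LLM before calling it.
    if config is None or getattr(config, "llm", None) is None:
        raise LLMNotFoundError()
    return config.llm(prompt)

print(execute_prompt("hello", Config(llm=lambda p: p.upper())))
# → HELLO
```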
2 changes: 1 addition & 1 deletion pandasai/pipelines/pipeline.py
@@ -27,7 +27,7 @@ def __init__(
logger: Optional[Logger] = None,
):
"""
Intialize the pipeline with given context and configuration
Initialize the pipeline with given context and configuration
parameters.
Args :
context (Context) : Context is required for ResponseParsers.
2 changes: 1 addition & 1 deletion pandasai/pipelines/smart_datalake_chat/result_parsing.py
@@ -41,7 +41,7 @@ def _add_result_to_memory(self, result: dict, context: PipelineContext):
Args:
result (dict): The result to add to the memory
context (PipelineContext) : Pipleline Context
context (PipelineContext) : Pipeline Context
"""
if result is None:
return
2 changes: 1 addition & 1 deletion pandasai/prompts/generate_python_code.py
@@ -29,7 +29,7 @@ def setup(self, **kwargs) -> None:
if kwargs.get("dfs_declared", False):
self.set_var(
"dfs_declared_message",
"The variable `dfs: list[pd.DataFrame]` is already decalared.",
"The variable `dfs: list[pd.DataFrame]` is already declared.",
)
else:
self.set_var("dfs_declared_message", "")
2 changes: 1 addition & 1 deletion pandasai/smart_datalake/__init__.py
@@ -410,7 +410,7 @@ def prepare_context_for_smart_datalake_pipeline(
self, query: str, output_type: Optional[str] = None
) -> PipelineContext:
"""
Prepare Pipeline Context to intiate Smart Data Lake Pipeline.
Prepare Pipeline Context to initiate Smart Data Lake Pipeline.
Args:
query (str): Query to run on the dataframe