diff --git a/.github/workflows/codespell.yml b/.github/workflows/codespell.yml
new file mode 100644
index 00000000..20dfb849
--- /dev/null
+++ b/.github/workflows/codespell.yml
@@ -0,0 +1,23 @@
+# Codespell configuration is within setup.cfg
+---
+name: Codespell
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+permissions:
+  contents: read
+
+jobs:
+  codespell:
+    name: Check for spelling errors
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Codespell
+        uses: codespell-project/actions-codespell@v2
diff --git a/docs/_config.yml b/docs/_config.yml
index f31aa6f2..344dac56 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -10,7 +10,7 @@ exclude_patterns : [ README.md, _build, Thumbs.db, .DS_Store,
 
 execute:
   execute_notebooks : off # Whether to execute notebooks at build time. Must be one of ("auto", "force", "cache", "off")
-  cache : "" # A path to the jupyter cache that will be used to store execution artifacs. Defaults to `_build/.jupyter_cache/`
+  cache : "" # A path to the jupyter cache that will be used to store execution artifacts. Defaults to `_build/.jupyter_cache/`
   # exclude_patterns : [content/Download_Data.ipynb] # A list of patterns to *skip* in execution (e.g. a notebook that takes a really long time)
   timeout : 30 # The maximum time (in seconds) each notebook cell is allowed to run.
   run_in_temp : true # If `True`, then a temporary directory will be created and used as the command working directory (cwd),
diff --git a/docs/basic_tutorials/01_basics.ipynb b/docs/basic_tutorials/01_basics.ipynb
index 1312bd58..0e630794 100644
--- a/docs/basic_tutorials/01_basics.ipynb
+++ b/docs/basic_tutorials/01_basics.ipynb
@@ -32,7 +32,7 @@
     "\n",
     "A detector is a swiss-army-knife class that \"glues\" together a particular combination of a Face, Landmark, Action Unit, and Emotion detection model into a single object. This allows us to provide a very easy-to-use high-level API, e.g. `detector.detect_image('my_image.jpg')`, which will automatically make use of the correct underlying model to solve the sub-tasks of identifying face locations, getting landmarks, extracting action units, etc. \n",
     "\n",
-    "The first time you initialize a `Detector` instance on your computer will take a moment as Py-Feat will automatically download required pretrained model weights for you and save them to disk. Everytime after that it will use existing model weights:\n"
+    "The first time you initialize a `Detector` instance on your computer will take a moment as Py-Feat will automatically download required pretrained model weights for you and save them to disk. Every time after that it will use existing model weights:\n"
    ]
   },
   {
diff --git a/docs/basic_tutorials/02_detector_imgs.ipynb b/docs/basic_tutorials/02_detector_imgs.ipynb
index 9470a976..8a15d5f4 100644
--- a/docs/basic_tutorials/02_detector_imgs.ipynb
+++ b/docs/basic_tutorials/02_detector_imgs.ipynb
@@ -590,7 +590,7 @@
    "source": [
     "#### Loading detection results from a saved file\n",
     "\n",
-    "We can load this output using the `read_feat()` function, which behaves just like `pd.read_csv` from Pandas, but returns a `Fex` data class instead of a DataFrame. This gives you the full suite of Fex funcionality right away."
+    "We can load this output using the `read_feat()` function, which behaves just like `pd.read_csv` from Pandas, but returns a `Fex` data class instead of a DataFrame. This gives you the full suite of Fex functionality right away."
    ]
   },
   {
@@ -1386,7 +1386,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "However, it's easy to use pandas slicing sytax to just grab predictions for the image you want. For example you can use `.loc` and chain it to `.plot_detections()`:"
+    "However, it's easy to use pandas slicing syntax to just grab predictions for the image you want. For example you can use `.loc` and chain it to `.plot_detections()`:"
    ]
   },
   {
diff --git a/docs/basic_tutorials/04_plotting.ipynb b/docs/basic_tutorials/04_plotting.ipynb
index 0985fbcd..d53f7fd1 100644
--- a/docs/basic_tutorials/04_plotting.ipynb
+++ b/docs/basic_tutorials/04_plotting.ipynb
@@ -125,7 +125,7 @@
    "source": [
     "### Adding muscle heatmaps to the plot\n",
     "\n",
-    "We can also visualize how AU intensity affects the underyling facial muscle movement by passing in a dictionary of facial muscle names and colors (or the value `'heatmap'`) to `plot_face()`. \n",
+    "We can also visualize how AU intensity affects the underlying facial muscle movement by passing in a dictionary of facial muscle names and colors (or the value `'heatmap'`) to `plot_face()`. \n",
     "\n",
     "Below we activate 2 AUs and use the key `'all'` with the value `'heatmap'` to overlay muscle movement intensities affected by these specific AUs:"
    ]
@@ -172,7 +172,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "But it's also possibile to arbitrarily highlight any facial muscle by setting it to a color instead. This ignores the AU intensity and useful for highlighting specific facial muscles. Below we higlight two different muscles on a neutral face:"
+    "But it's also possible to arbitrarily highlight any facial muscle by setting it to a color instead. This ignores the AU intensity and useful for highlighting specific facial muscles. Below we highlight two different muscles on a neutral face:"
    ]
   },
   {
@@ -504,7 +504,7 @@
     "\n",
     "While `animate_face()` is useful for animating a single facial expression, sometimes you might want to make more complex multi-face animations. We can do that using `plot_face()` along with the `interpolate_aus()` helper function which will generate intermediate AU intensity values between two arrays in a manner that creates graceful animations ([cubic bezier easing function](https://easings.net/)).\n",
     "\n",
-    "We can easily make a grid of all 20 AUs and animate their intensity changes one at a time from a netural facial expression. To generate the animation from matplotlib plots, we use the [`celluloid`](https://github.com/jwkvam/celluloid) library that makes it a bit easier to work with matplotlib animations. It's also what `animate_face` uses under the hood: "
+    "We can easily make a grid of all 20 AUs and animate their intensity changes one at a time from a neutral facial expression. To generate the animation from matplotlib plots, we use the [`celluloid`](https://github.com/jwkvam/celluloid) library that makes it a bit easier to work with matplotlib animations. It's also what `animate_face` uses under the hood: "
    ]
   },
   {
diff --git a/docs/basic_tutorials/05_fex_analysis.ipynb b/docs/basic_tutorials/05_fex_analysis.ipynb
index 375334c5..4d094e98 100644
--- a/docs/basic_tutorials/05_fex_analysis.ipynb
+++ b/docs/basic_tutorials/05_fex_analysis.ipynb
@@ -13,14 +13,14 @@
     "\n",
     "In the original paper the authors had 3 speakers deliver *good* or *bad* news while filming their facial expressions. They found that could accurately \"decode\" each condition based on participants' facial expressions extracted either using a custom multi-chanel-gradient model or action units (AUs) extracted using [Open Face](https://github.com/TadasBaltrusaitis/OpenFace). \n",
     "\n",
-    "In this tutorial we'll show how easiy it is to not only reproduce their decoding analysis with py-feat, but just as easily perform additional analyses. Specifically we'll:\n",
+    "In this tutorial we'll show how easily it is to not only reproduce their decoding analysis with py-feat, but just as easily perform additional analyses. Specifically we'll:\n",
     "\n",
     "1. Download 20 of the first subject's videos (the full dataset is available on [OSF](https://osf.io/6tbwj/)\n",
     "2. Extract facial features using the `Detector`\n",
     "3. Aggregate and summarize detections per video using `Fex`\n",
     "2. Train and test a decoder to classify *good* vs *bad* news using extracted emotions, AUs, and poses\n",
     "3. Run a fMRI style \"mass-univariate\" comparison across all AUs between conditions\n",
-    "4. Run a time-series analysis comparing videos based on the time-courses of extracted facial fatures "
+    "4. Run a time-series analysis comparing videos based on the time-courses of extracted facial features "
    ]
   },
   {
@@ -40,7 +40,7 @@
    "source": [
     "# 5.1 Download the data\n",
     "\n",
-    "Here's we'll download and save the first 20 video files and their corresponding attributes from OSF. The next cell should run quickly on Google Collab, but will depend on your own internet conection if you're executing this notebook locally. You can rerun this cell in case the download fails for any reason, as it should skip downloading existing files:"
+    "Here's we'll download and save the first 20 video files and their corresponding attributes from OSF. The next cell should run quickly on Google Collab, but will depend on your own internet connection if you're executing this notebook locally. You can rerun this cell in case the download fails for any reason, as it should skip downloading existing files:"
    ]
   },
   {
@@ -1204,7 +1204,7 @@
     "    )\n",
     ")\n",
     "\n",
-    "# Update sesssions to group by condition, compute means (per condition), and make a\n",
+    "# Update sessions to group by condition, compute means (per condition), and make a\n",
     "# barplot of the mean AUs for each condition\n",
     "ax = (\n",
     "    by_video.update_sessions(video2condition)\n",
@@ -1319,7 +1319,7 @@
     "    X=\"sessions\", y=\"aus\", fit_intercept=True\n",
     ")\n",
     "\n",
-    "# We can perform bonferroni correction for multiple comparisions:\n",
+    "# We can perform bonferroni correction for multiple comparisons:\n",
     "p_bonf = p / p.shape[1]\n",
     "\n",
     "results = pd.concat(\n",
diff --git a/docs/extra_tutorials/06_trainAUvisModel.ipynb b/docs/extra_tutorials/06_trainAUvisModel.ipynb
index 44b28a3f..505e68d3 100755
--- a/docs/extra_tutorials/06_trainAUvisModel.ipynb
+++ b/docs/extra_tutorials/06_trainAUvisModel.ipynb
@@ -151,7 +151,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We can examine the correlation between AU occurences across all the datasets to get a sense of what AU's tend to co-occur:"
+    "We can examine the correlation between AU occurrences across all the datasets to get a sense of what AU's tend to co-occur:"
    ]
   },
   {
@@ -254,7 +254,7 @@
    "source": [
     "## Balance AU-occurences by sub-sampling\n",
     "\n",
-    "Because datasets differ in which AUs they contain and because AUs differ greatly in their occurence across samples, we sub-sample the aggregated data to generate a new dataset that contains at least 650 occurences of each AU. This number was chosen because it is the largest number of positive samples (samples where the AU was present) for the AU with the fewest positive samples (AU43). This helps balance the features out a bit:"
+    "Because datasets differ in which AUs they contain and because AUs differ greatly in their occurrence across samples, we sub-sample the aggregated data to generate a new dataset that contains at least 650 occurrences of each AU. This number was chosen because it is the largest number of positive samples (samples where the AU was present) for the AU with the fewest positive samples (AU43). This helps balance the features out a bit:"
    ]
   },
   {
@@ -327,7 +327,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We can see that our resampled dataset contains signficantly higher proportions of each AU, which will make it a bit easier to train the model."
+    "We can see that our resampled dataset contains significantly higher proportions of each AU, which will make it a bit easier to train the model."
    ]
   },
   {
@@ -399,7 +399,7 @@
     "        _X = poly.fit_transform(_X)\n",
     "\n",
     "    # It can also be helpful to scale AUs within each sample such that they reflect\n",
-    "    # z-scores relative to the mean/std AU occurences within that sample, rather than\n",
+    "    # z-scores relative to the mean/std AU occurrences within that sample, rather than\n",
     "    # values between 0-1. This can be helpful if you use a polynomial degree > 1\n",
     "    # But we don't do this by default\n",
     "    if scale_across_features:\n",
diff --git a/docs/pages/changelog.md b/docs/pages/changelog.md
index dea0090a..7ed1f37a 100644
--- a/docs/pages/changelog.md
+++ b/docs/pages/changelog.md
@@ -143,7 +143,7 @@ This is a large overhaul and refactor of some of the core testing and API functi
 
 ### Breaking Changes
 
-- `Detector` no longer support unintialized models, e.g. `any_model = None`
+- `Detector` no longer support uninitialized models, e.g. `any_model = None`
   - This is is also true for `Detector.change_model`
 - Columns of interest on `Fex` data classes were previously accessed like class _methods_, i.e. `fex.aus()`. These have now been changed to class _attributes_, i.e. `fex.aus`
 - Remove support for `DRML` AU detector
diff --git a/docs/pages/models.md b/docs/pages/models.md
index 0b7f2294..e02f7528 100644
--- a/docs/pages/models.md
+++ b/docs/pages/models.md
@@ -39,7 +39,7 @@ Models names are case-insensitive: `'resmasknet' == 'ResMaskNet'`
 - `svm`: SVM model trained on Histogram of Oriented Gradients\*\* extracted from BP4D, DISFA, CK+, UNBC-McMaster shoulder pain, and AFF-Wild2 datasets
 
 ```{note}
-\*For AU07, our `xbg` detector was trained with hinge-loss instead of cross-entropy loss like other AUs as this yielded substantially better detection peformance given the labeled data available for this AU. This means that while it returns continuous probability predictions, these are more likely to appear binary in practice (i.e. be 0 or 1) and should be interpreted as *proportion of decision-trees with a detection* rather than *average decision-tree confidence* like other AU values.
+\*For AU07, our `xbg` detector was trained with hinge-loss instead of cross-entropy loss like other AUs as this yielded substantially better detection performance given the labeled data available for this AU. This means that while it returns continuous probability predictions, these are more likely to appear binary in practice (i.e. be 0 or 1) and should be interpreted as *proportion of decision-trees with a detection* rather than *average decision-tree confidence* like other AU values.
 ```
 
 ```{note}
diff --git a/feat/data.py b/feat/data.py
index b0eb63d6..add05e20 100644
--- a/feat/data.py
+++ b/feat/data.py
@@ -941,7 +941,7 @@ def ttest_1samp(self, popmean=0):
 
         Args:
             popmean (int, optional): Population mean to test against. Defaults to 0.
-            threshold_dict ([type], optional): Dictonary for thresholding. Defaults to None. [NOT IMPLEMENTED]
+            threshold_dict ([type], optional): Dictionary for thresholding. Defaults to None. [NOT IMPLEMENTED]
 
         Returns:
             t, p: t-statistics and p-values
@@ -999,7 +999,7 @@ def predict(
 
         mX, my = self._parse_features_labels(X, y)
 
-        # user passes an unintialized class, e.g. LogisticRegression
+        # user passes an uninitialized class, e.g. LogisticRegression
         if isinstance(model, type):
             clf = model(*args, **kwargs)
         else:
@@ -1042,7 +1042,7 @@ def isc(self, col, index="frame", columns="input", method="pearson"):
             method (str, optional): Method to use for correlation pearson, kendall, or spearman. Defaults to "pearson".
 
         Returns:
-            DataFrame: Correlation matrix with index as colmns
+            DataFrame: Correlation matrix with index as columns
         """
         if index is None:
             index = "frame"
diff --git a/feat/detector.py b/feat/detector.py
index 0abc54b6..e0d7582d 100644
--- a/feat/detector.py
+++ b/feat/detector.py
@@ -43,7 +43,7 @@
 from tqdm import tqdm
 import torchvision.transforms as transforms
 
-# Supress sklearn warning about pickled estimators and diff sklearn versions
+# Suppress sklearn warning about pickled estimators and diff sklearn versions
 warnings.filterwarnings("ignore", category=UserWarning, module="sklearn")
 
 
@@ -77,12 +77,12 @@ def __init__(
             info (dict):
             n_jobs (int): Number of jobs to be used in parallel.
             face_model (str, default=retinaface): Name of face detection model
face_model (str, default=retinaface): Name of face detection model - landmark_model (str, default=mobilenet): Nam eof landmark model + landmark_model (str, default=mobilenet): Name eof landmark model au_model (str, default=svm): Name of Action Unit detection model emotion_model (str, default=resmasknet): Path to emotion detection model. facepose_model (str, default=img2pose): Name of headpose detection model. identity_model (str, default=facenet): Name of identity detection model. - face_detection_columns (list): Column names for face detection ouput (x, y, w, h) + face_detection_columns (list): Column names for face detection output (x, y, w, h) face_landmark_columns (list): Column names for face landmark output (x0, y0, x1, y1, ...) emotion_model_columns (list): Column names for emotion model output emotion_model_columns (list): Column names for emotion model output @@ -170,7 +170,7 @@ def _init_detectors( # Initialize model instances and any additional post init setup # Only initialize a model if the currently initialized model is diff than the - # requested one. Lets us re-use this with .change_model + # requested one. Lets us reuse this with .change_model # FACE MODEL if self.info["face_model"] != face: diff --git a/feat/face_detectors/FaceBoxes/readme.md b/feat/face_detectors/FaceBoxes/readme.md index 3b447beb..6d4e3459 100644 --- a/feat/face_detectors/FaceBoxes/readme.md +++ b/feat/face_detectors/FaceBoxes/readme.md @@ -1,4 +1,4 @@ -## Liscense: +## License: # MIT License # Copyright (c) 2017 Max deGroot, Ellis Brown diff --git a/feat/facepose_detectors/img2pose/deps/rpn.py b/feat/facepose_detectors/img2pose/deps/rpn.py index 59a626c1..7b457e7e 100644 --- a/feat/facepose_detectors/img2pose/deps/rpn.py +++ b/feat/facepose_detectors/img2pose/deps/rpn.py @@ -362,7 +362,7 @@ def filter_proposals( # -> Tuple[List[Tensor], List[Tensor]] num_images = proposals.shape[0] device = proposals.device - # do not backprop throught objectness + # do not backprop through objectness objectness = objectness.detach() objectness = objectness.reshape(num_images, -1) diff --git a/feat/facepose_detectors/img2pose/img2pose_test.py b/feat/facepose_detectors/img2pose/img2pose_test.py index 3671b117..7cbce805 100644 --- a/feat/facepose_detectors/img2pose/img2pose_test.py +++ b/feat/facepose_detectors/img2pose/img2pose_test.py @@ -38,7 +38,7 @@ def __init__( Args: device (str): device to execute code. can be ['auto', 'cpu', 'cuda', 'mps'] - contrained (bool): whether to run constrained (default) or unconstrained mode + constrained (bool): whether to run constrained (default) or unconstrained mode Returns: Img2Pose object diff --git a/feat/utils/io.py b/feat/utils/io.py index 9f6ecdb3..888b4a5f 100644 --- a/feat/utils/io.py +++ b/feat/utils/io.py @@ -71,7 +71,7 @@ def validate_input(inputFname): def download_url(*args, **kwargs): """By default just call download_url from torch vision, but we pass a verbose = False keyword argument, then call download_url with a special context manager than - supresses the print messages""" + suppresses the print messages""" verbose = kwargs.pop("verbose", True) if verbose: diff --git a/feat/utils/stats.py b/feat/utils/stats.py index 5bc9cd10..a4ee770e 100644 --- a/feat/utils/stats.py +++ b/feat/utils/stats.py @@ -17,7 +17,7 @@ def wavelet(freq, num_cyc=3, sampling_freq=30.0): Creates a complex Morlet wavelet by windowing a cosine function by a Gaussian. 
All formulae taken from Cohen, 2014 Chaps 12 + 13 Args: - freq: (float) desired frequence of wavelet + freq: (float) desired frequency of wavelet num_cyc: (float) number of wavelet cycles/gaussian taper. Note that smaller cycles give greater temporal precision and that larger values give greater frequency precision; (default: 3) sampling_freq: (float) sampling frequency of original signal. diff --git a/setup.cfg b/setup.cfg index 2262d1ff..56e6a564 100644 --- a/setup.cfg +++ b/setup.cfg @@ -19,3 +19,10 @@ universal = 1 [aliases] test = pytest + +[codespell] +# Ref: https://github.com/codespell-project/codespell#using-a-config-file +skip = .git +check-hidden = true +ignore-regex = ^\s*"image/\S+": ".* +ignore-words-list = ists,gaus
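Note on the `[codespell]` section added to `setup.cfg` above: the `ignore-regex` entry is meant to keep codespell from flagging the base64-encoded image outputs embedded in the notebooks' JSON, since text matched by `ignore-regex` is treated as whitespace during checking. Below is a minimal sketch of what the pattern matches; the two sample lines are hypothetical and Python's `re` module is used only to illustrate the match, not to reproduce codespell's internals.

```python
import re

# Pattern copied from the [codespell] section above. Lines it matches from the
# start (e.g. notebook image outputs) are effectively skipped by codespell.
IGNORE = re.compile(r'^\s*"image/\S+": ".*')

lines = [
    # Hypothetical notebook output line carrying base64 image data -> matched, ignored
    '    "image/png": "iVBORw0KGgoAAAANSUhEUgAA...",',
    # Hypothetical notebook markdown source line -> not matched, still spell-checked
    '    "The first time you initialize a `Detector` instance...",',
]

for line in lines:
    print(bool(IGNORE.match(line)), line.strip()[:40])
```

The `ignore-words-list` entries (`ists`, `gaus`) are presumably short tokens that occur legitimately somewhere in the sources (for example inside identifiers or abbreviations) and would otherwise be reported as misspellings.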