Performing a linear regression produces an 'IndexError: tuple index out of range' result #66

kburchfiel · 2025-01-31T20:41:07Z

Hi there,

I am attempting to perform a weighted linear regression using version 0.4.34 of Samplics. Here is the entirety of my code (which should be reproducible on your end):

import pandas as pd
from samplics.regression import SurveyGLM

df_car_survey = pd.read_csv(
    'https://raw.githubusercontent.com/ifstudies/\
carsurveydata/refs/heads/main/car_survey.csv')

df_car_survey['Enjoy_Driving_Fast_Int'] = (
    df_car_survey['Enjoy_Driving_Fast'].map(
    {'Strongly Agree':5, 'Agree':4, 'Slightly Agree':3, 
     'Slightly Disagree':2, 'Disagree':1, 'Strongly Disagree':0}))

df_car_survey = pd.concat([df_car_survey, pd.get_dummies(
    df_car_survey['Car_Color'], dtype = 'int')],
         axis = 1)
print(df_car_survey.head())

# The following code was based on Samplics' GLM source code, available at
# https://github.com/samplics-org/samplics/blob/main/src/
# samplics/regression/glm.py

slr = SurveyGLM()
slr.estimate(y = df_car_survey['Enjoy_Driving_Fast_Int'],
             x = df_car_survey['Black'],
            samp_weight = df_car_survey['Weight'])

Here's the output of print(df_car_survey.head()) for reference:

  Car_Color    Weight Enjoy_Driving_Fast  Count  Response_Sort_Map  \
0       Red  1.975884     Strongly Agree      1                  0   
1       Red  0.943725     Strongly Agree      1                  0   
2       Red  1.342593     Strongly Agree      1                  0   
3       Red  1.704274     Strongly Agree      1                  0   
4       Red  0.348622     Strongly Agree      1                  0   

   Enjoy_Driving_Fast_Int  Black  Red  White  
0                       5      0    1      0  
1                       5      0    1      0  
2                       5      0    1      0  
3                       5      0    1      0  
4                       5      0    1      0

When I try to run slr.estimate(), I receive the following error and traceback:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[1], line 23
     18 # The following code was based on Samplics' GLM source code, available at
     19 # https://github.com/samplics-org/samplics/blob/main/src/
     20 # samplics/regression/glm.py
     22 slr = SurveyGLM()
---> 23 slr.estimate(y = df_car_survey['Enjoy_Driving_Fast_Int'],
     24              x = df_car_survey['Black'],
     25             samp_weight = df_car_survey['Weight'])

File ~/miniforge3/envs/ifs/lib/python3.13/site-packages/samplics/regression/glm.py:100, in SurveyGLM.estimate(self, y, x, samp_weight, stratum, psu, fpc, remove_nan)
     97 glm_model = sm.GLM(endog=_y, exog=_x, var_weights=_samp_weight)
     98 glm_results = glm_model.fit()
--> 100 g = self._calculate_g(
    101     samp_weight=_samp_weight,
    102     resid=glm_results.resid_response,
    103     x=_x,
    104     stratum=_stratum,
    105     psu=_psu,
    106     fpc=self.fpc,
    107     glm_scale=glm_results.scale,
    108 )
    110 d = glm_results.cov_params()
    112 self.beta = glm_results.params

File ~/miniforge3/envs/ifs/lib/python3.13/site-packages/samplics/regression/glm.py:55, in SurveyGLM._calculate_g(self, samp_weight, resid, x, stratum, psu, fpc, glm_scale)
     53     psu = np.arange(e.shape[0])
     54 if stratum.shape in ((), (0,)):
---> 55     e_h, n_h = self._residuals(e=e, psu=psu, nb_vars=x.shape[1])
     56     return fpc * (n_h / (n_h - 1)) * e_h
     57 else:

IndexError: tuple index out of range

It appears that the code is attempting to access the second element of df_car_survey['Black'].shape. However, this shape equals (1059,), so there is no second element.
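To illustrate the shape issue in isolation, here is a minimal sketch using a small made-up DataFrame (df_toy and its values are hypothetical, not the survey data). It only shows that selecting a column with single brackets gives a one-dimensional Series, while double brackets give a two-dimensional DataFrame; I have not verified whether passing the two-dimensional form lets slr.estimate() run to completion.

```python
import pandas as pd

# Hypothetical toy data standing in for the survey file
df_toy = pd.DataFrame({"Black": [0, 1, 0, 1],
                       "Weight": [1.2, 0.8, 1.0, 1.5]})

x_series = df_toy["Black"]    # single brackets: 1-D Series
x_frame = df_toy[["Black"]]   # double brackets: 2-D DataFrame with one column

print(x_series.shape)  # (4,)
print(x_frame.shape)   # (4, 1)

# Indexing the second element of a 1-D shape raises the same error
# that appears in the traceback above.
try:
    x_series.shape[1]
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)  # tuple index out of range
```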

Thanks in advance for your assistance! Also, I imagine your time is very limited, but adding a documentation page on linear regressions would be a huge help.


MamadouSDiallo · 2025-01-31T23:22:24Z

Regression and SAE are not yet ready for use. I should add a "not implemented" tag or something similar until they are.

kburchfiel · 2025-02-03T14:52:44Z

Understood! Thank you for the heads up. I imagine your time is quite limited, but if you could let me know when the regression package is ready (perhaps by commenting within this thread), that would be great.

kburchfiel · 2025-02-04T18:48:33Z

This update will be especially exciting because I'm not aware of any other Python library that can easily report p-values and test statistics for logistic regressions on data with sample weights. (Statsmodels has a 'freq_weights' parameter, but frequency weights are a different concept from the sample weights that Samplics uses.) I can use R's survey and srvyr packages via rpy2 in the meantime, but being able to do all of my weighted survey analyses directly in Python would be great!
