I am attempting to perform a weighted linear regression using version 0.4.34 of Samplics. Here is the entirety of my code (which should be reproducible on your end):
import pandas as pd
from samplics.regression import SurveyGLM
df_car_survey = pd.read_csv(
    'https://raw.githubusercontent.com/ifstudies/\
carsurveydata/refs/heads/main/car_survey.csv')
df_car_survey['Enjoy_Driving_Fast_Int'] = (
    df_car_survey['Enjoy_Driving_Fast'].map(
        {'Strongly Agree':5, 'Agree':4, 'Slightly Agree':3,
         'Slightly Disagree':2, 'Disagree':1, 'Strongly Disagree':0}))
df_car_survey = pd.concat([df_car_survey, pd.get_dummies(
    df_car_survey['Car_Color'], dtype = 'int')],
    axis = 1)
print(df_car_survey.head())
# The following code was based on Samplics' GLM source code, available at
# https://github.com/samplics-org/samplics/blob/main/src/
# samplics/regression/glm.py
slr = SurveyGLM()
slr.estimate(y = df_car_survey['Enjoy_Driving_Fast_Int'],
             x = df_car_survey['Black'],
             samp_weight = df_car_survey['Weight'])
Here's the output of print(df_car_survey.head()) for reference:
  Car_Color    Weight Enjoy_Driving_Fast  Count  Response_Sort_Map  \
0       Red  1.975884     Strongly Agree      1                  0
1       Red  0.943725     Strongly Agree      1                  0
2       Red  1.342593     Strongly Agree      1                  0
3       Red  1.704274     Strongly Agree      1                  0
4       Red  0.348622     Strongly Agree      1                  0
   Enjoy_Driving_Fast_Int  Black  Red  White
0                       5      0    1      0
1                       5      0    1      0
2                       5      0    1      0
3                       5      0    1      0
4                       5      0    1      0
When I try to run slr.estimate(), I receive the following error and traceback:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[1], line 23
18 # The following code was based on Samplics' GLM source code, available at
19 # https://github.com/samplics-org/samplics/blob/main/src/
20 # samplics/regression/glm.py
22 slr = SurveyGLM()
---> 23 slr.estimate(y = df_car_survey['Enjoy_Driving_Fast_Int'],
24 x = df_car_survey['Black'],
25 samp_weight = df_car_survey['Weight'])
File ~/miniforge3/envs/ifs/lib/python3.13/site-packages/samplics/regression/glm.py:100, in SurveyGLM.estimate(self, y, x, samp_weight, stratum, psu, fpc, remove_nan)
97 glm_model = sm.GLM(endog=_y, exog=_x, var_weights=_samp_weight)
98 glm_results = glm_model.fit()
--> 100 g = self._calculate_g(
101 samp_weight=_samp_weight,
102 resid=glm_results.resid_response,
103 x=_x,
104 stratum=_stratum,
105 psu=_psu,
106 fpc=self.fpc,
107 glm_scale=glm_results.scale,
108 )
110 d = glm_results.cov_params()
112 self.beta = glm_results.params
File ~/miniforge3/envs/ifs/lib/python3.13/site-packages/samplics/regression/glm.py:55, in SurveyGLM._calculate_g(self, samp_weight, resid, x, stratum, psu, fpc, glm_scale)
53 psu = np.arange(e.shape[0])
54 if stratum.shape in ((), (0,)):
---> 55 e_h, n_h = self._residuals(e=e, psu=psu, nb_vars=x.shape[1])
56 return fpc * (n_h / (n_h - 1)) * e_h
57 else:
IndexError: tuple index out of range
It appears that line 55 of glm.py reads x.shape[1], i.e. the second element of df_car_survey['Black'].shape. However, this shape equals (1059,), so there is no second element.
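To illustrate the shape mismatch in isolation, here is a minimal sketch that uses only pandas and a small made-up DataFrame (not the real car_survey.csv). It shows that selecting a column with single brackets gives a 1-D Series whose shape has no second element, while double brackets give a 2-D DataFrame with one column. The names df, x_series and x_frame are hypothetical and only serve the example.

```python
import pandas as pd

# Made-up stand-in for the survey data (assumption: not the real car_survey.csv).
df = pd.DataFrame({"Black": [0, 1, 0, 1], "Weight": [1.2, 0.8, 1.5, 0.5]})

x_series = df["Black"]    # single brackets -> 1-D Series
x_frame = df[["Black"]]   # double brackets -> 2-D DataFrame with one column

print(x_series.shape)  # (4,)
print(x_frame.shape)   # (4, 1)

# Indexing the second element of a 1-D shape reproduces the error message
# seen at the bottom of the traceback.
try:
    nb_vars = x_series.shape[1]
except IndexError as err:
    error_message = str(err)
    print("IndexError:", error_message)

# With the 2-D selection, the same lookup succeeds.
nb_vars = x_frame.shape[1]
print(nb_vars)  # 1
```

Whether SurveyGLM.estimate() in version 0.4.34 accepts a DataFrame or other 2-D input for x is an assumption I have not verified. If it does, passing df_car_survey[['Black']] instead of df_car_survey['Black'] might avoid the failing x.shape[1] lookup.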
Thanks in advance for your assistance! Also, I imagine your time is very limited, but adding a documentation page on linear regressions would be a huge help.
Understood! Thank you for the heads up. I imagine your time is quite limited, but if you could let me know when the regression package is ready (perhaps by commenting within this thread), that would be great.
This update will be especially exciting because I'm not aware of any other Python library that can easily report p-values and test statistics for logistic regressions on data with sample weights. (Statsmodels has a 'freq_weights' argument, but frequency weights are a different concept from the sample weights that Samplics uses.) I can use R's survey and srvyr packages via rpy2 in the meantime, but being able to do all of my weighted survey analyses directly in Python would be great!