mg object failed to be created when executing dsBase::glmerSLMADS2 #267

Open
Adelca2018 opened this issue Feb 4, 2022 · 19 comments

@Adelca2018

While running an analysis with ds.glmerSLMA, I received a DataSHIELD error saying that the object mg in the function dsBase::glmerSLMADS2 was not found.
When I checked the source code of dsBase::glmerSLMADS2 on GitHub, I realized that this object is created inside a try at line 392 (see code below) of this page: https://github.com/datashield/dsBase/blob/master/R/glmerSLMADS2.R

At line 392: iterations <- utils::capture.output(try(mg <- lme4::glmer(formula2use, offset=offset, weights=weights, data=dataDF,
  family = family, nAGQ=nAGQ,verbose = verbose, control=control.obj, start = start)))

It seems that in my case the call inside the try fails, so the mg object is never created.

DataSHIELD error message I got:
Error while evaluating 'dsBase::glmerSLMADS2(flag_pavk~sex_cd+age+flag_nicotine+flag_aat+yyy1xxxpatient_numzzz, NULL, NULL, "D", "binomial", NULL, NULL, 1L, 0, NULL, NULL)' -> Error in summary(mg) : object 'mg' not found\n"
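
For illustration, the same kind of error can be reproduced in plain R without DataSHIELD or lme4. This is only a sketch: failing_fit and demo_try are hypothetical names, failing_fit stands in for a model fit that errors (it is not the real lme4::glmer call), and the example does not show why the fit fails in this particular case.

```r
# Sketch: an assignment wrapped in try() never happens if the right-hand side errors.
demo_try <- function() {
  # Hypothetical stand-in for a model fit that throws an error.
  failing_fit <- function() stop("model fit failed")

  # Same shape as the quoted line: capture.output(try(mg <- ...)).
  # try() catches the error, so execution continues without mg being assigned.
  iterations <- utils::capture.output(
    try(mg <- failing_fit(), silent = TRUE)
  )

  # mg was never created in this function's environment.
  mg_created <- exists("mg", inherits = FALSE)

  # Using mg afterwards is what raises the "not found" error.
  msg <- tryCatch(summary(mg), error = function(e) conditionMessage(e))

  list(mg_created = mg_created, message = msg)
}

res <- demo_try()
print(res$mg_created)
print(res$message)
```

Because try() swallows the original error, the only message that reaches the caller is the later one from summary(mg), which says nothing about what went wrong in the fit itself.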

Could you explain when or why the try can fail to create the mg object?

Thanks in advance for your explanations.

@StuartWheater StuartWheater added this to the v6.2 milestone Feb 4, 2022
@StuartWheater
Member

@Adelca2018 Thank you for the information; we will investigate.

@tombisho
Contributor

tombisho commented Feb 4, 2022

Hi @Adelca2018, are you able to provide the client-side call that you made, please? It would also be very helpful if you could provide a set of dummy data that is representative of your real data, so that we can recreate the issue. If dummy data are not possible, please could you provide a statistical description of each variable? For example: flag_pavk is a binary variable, count0 = 200 and count1 = 340.

@Adelca2018
Author

Hi @tombisho .

The client side call I used was: ds.glmerSLMA(formula=flag_pavk ~ sex_cd + age + flag_nicotine + (1 | patient_num) ,dataName='D', datasources=connections, family="binomial")

Unfortunately I cannot provide a set of dummy data.
But for more information:

  • flag_pavk is a dummy variable specifying whether or not a patient was diagnosed with PAVK during a visit. count0=22984 and count1=2093
  • sex_cd is the gender as a factor. count0=15649 and count1=9428
  • age is the age as a continuous variable
  • flag_nicotine is a dummy variable specifying whether or not the patient is a smoker. count0=20799 and count1=4278
  • patient_num is a unique numerical code attributed to each patient.

Best regards

@tombisho
Contributor

tombisho commented Feb 8, 2022 via email

@davraam
Member

davraam commented Feb 8, 2022

Maybe the error occurs because there are no quote marks around the formula?

@Adelca2018
Author

maybe the error is created because there are no quote marks around the formula?

No. The error is not the formula because I use it in a loop and the formula works well for other variables. So I am pretty sure the error does not come from the formula

@Adelca2018
Author

That's great, thanks. How many data points are there per patient approximately? Tom

The average number of visits per patient is 2.35.

@tombisho
Contributor

tombisho commented Feb 8, 2022

maybe the error is created because there are no quote marks around the formula?

No. The error is not the formula because I use it in a loop and the formula works well for other variables. So I am pretty sure the error does not come from the formula

I agree. I tested this, and ds.glmerSLMA does the string conversion for the user.

@davraam
Member

davraam commented Feb 8, 2022

OK, thanks. That's good to know.

@tombisho
Contributor

tombisho commented Feb 8, 2022

Hi @Adelca2018, I think that to recreate this I need more information about each of the input variables. Please could you provide the output from ds.summary for each of them?

Thanks

@Adelca2018
Author

Hi @tombisho .

I forgot to mention one variable in the formula in my previous message.

The correct client-side call I used was: ds.glmerSLMA(formula=flag_pavk ~ sex_cd + age + flag_nicotine + flag_aat + (1 | patient_num) ,dataName='D', datasources=connections, family="binomial")

Summaries:

flag_pavk is a dummy variable specifying whether or not a patient was diagnosed with PAVK during a visit. count0=22984 and count1=2093
sex_cd is the gender as a factor. count0=15649 and count1=9428
age is the age as a continuous variable
flag_nicotine is a dummy variable specifying whether or not the patient is a smoker. count0=20799 and count1=4278
flag_aat is also a dummy variable. count0=24832 and count1=245
patient_num is a unique numerical code attributed to each patient.

You asked me to provide more details using ds.summary. For the categorical variables, ds.summary gives exactly the counts I provided above. Since age is the only continuous variable, here is the summary for age:

Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
0.00   61.00    69.00   67.96  77.00    101.00

Best regards

@tombisho
Contributor

tombisho commented Feb 11, 2022 via email

@Adelca2018
Author

patient_num is an integer.

@tombisho
Contributor

You could try running an ordinary glm to see whether that works, which may give an indication of what the problem is:

ds.glmSLMA(formula="flag_pavk ~ sex_cd + age + flag_nicotine + flag_aat + patient_num" ,dataName='D', datasources=connections, family="binomial")

@Adelca2018
Author

ds.glmSLMA(formula="flag_pavk ~ sex_cd + age + flag_nicotine + flag_aat + patient_num" ,dataName='D', datasources=connections, family="binomial") produces a result without any error.

@tombisho
Contributor

OK, thank you. I have not been able to reproduce your error with a synthetic dataset that simulates your data. If the fitting has any problems, an error message is normally returned. I can't recreate a situation where mg is simply not created.

Could you try running ds.glmerSLMA on a small subset (e.g. 250 participants) and see whether you get the same error? You could create the subset using patient_num, i.e. take only rows with patient_num below a certain value.

@StuartWheater
Member

Hi, sorry if you have indicated this earlier, but could you tell us which versions of dsBase and dsBaseClient you are using?

Stuart

@tombisho
Contributor

tombisho commented Mar 1, 2022

Hi @Adelca2018, further investigation shows that this error can occur when there is only one observation for each patient_num. I know that you previously said there are ~2 visits per patient. Can you check that, when you try to fit the model, there is more than one observation per patient_num?

For example, are you sure that the data you ran the summaries on are exactly the same as the data you fit the model to? Or do you do some processing before you fit the model?
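
As a rough sketch of that check, the number of rows per patient can be tabulated in plain R. This assumes direct access to a local data frame, which is not how data are accessed through DataSHIELD; the data frame D below is made-up toy data, and obs_per_patient, singletons and all_singletons are illustrative names only.

```r
# Toy stand-in for the real data: one row per visit.
D <- data.frame(patient_num = c(1L, 1L, 2L, 3L, 3L, 3L, 4L))

# Number of rows (visits) for each patient.
obs_per_patient <- table(D$patient_num)

# Patients who appear only once.
singletons <- names(obs_per_patient)[obs_per_patient == 1]

# TRUE only if every patient has exactly one row.
all_singletons <- all(obs_per_patient == 1)

print(obs_per_patient)
print(singletons)
print(all_singletons)

# The average alone can hide single-visit patients.
print(mean(obs_per_patient))
```

An average above one visit per patient does not rule out patients with a single row, so the per-patient counts are more informative than the mean here.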

@Adelca2018
Author

Dear @tombisho, thanks for your feedback.

In my data there are indeed some patients who came only once, so there is just one row with their details.

If this can be an issue, then, as you say, the problem may come from there.

So should mixed-effects models only be applied when there are at least two rows of observations for each level of the variable used as the random factor?

What I find strange is that I ran the same model on other dependent variables as well (with the same formula for the independent variables and the same data) and did not get this error. If the cause were patient_num values with only one observation, those models should have produced errors too.

Regarding your question: the data I ran the summaries on are exactly the same as the data I fit the model to, but note that the visit figures I gave were averages.

Regards.

@tombisho tombisho mentioned this issue Mar 3, 2022