Enhanced Anatomy plots #159
@andkov, I like those graphs, and all the context/stat info you packed in, without trampling on the data patterns.
@wibeasley, thanks for the input. In reaction to your suggestion, I've split the categories into five quantile groups (quintiles), which provides additional information about the distribution of this variable. I've implemented all but suggestion 4, because that would have required restructuring the complex graph assembly. But good point.
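A minimal sketch of a quintile split, written in base R so it runs without extra packages. The ages are simulated stand-ins (hypothetical, not the study data). `cut()` with `right = FALSE` and `include.lowest = TRUE` produces interval labels in the same style as the `age_group_bl` levels that appear later in this thread; whether the project code built its groups this way is an assumption.

```r
# Hypothetical example: simulated baseline ages stand in for the real data.
set.seed(42)
age_bl <- runif(321, min = 58, max = 98)

# Quintile break points at the 0%, 20%, ..., 100% sample quantiles
breaks <- quantile(age_bl, probs = seq(0, 1, by = 0.2))

# right = FALSE gives left-closed intervals such as [a,b);
# include.lowest = TRUE then closes the top interval, e.g. [c,d]
age_group_bl <- cut(age_bl, breaks = breaks,
                    right = FALSE, include.lowest = TRUE)

table(age_group_bl)
```

Each group ends up with roughly 321 / 5 ≈ 64 people, which is what makes quintiles convenient: every panel of the plot rests on a similar amount of data.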
@andkov: these are really great!! Can you create these for one of the situations where the slope-slope correlation is high? In this example the correlation is -0.04, with a p value of 0.925.
@ampiccinin, yes, here's one for
@ampiccinin, here's the same model with scaled fev. We begin to run into the difficulty of too many graphs: the anatomy reports would be very large and might lose the reader. I'm thinking about a solution. So for now, if you have a specific model you'd like to examine like this, just let me know. I'll think of something over the weekend to organize these images into an accessible form.
@andkov, great example. Maybe perfect. See how the estimated correlation is 0.8, yet the graph suggests zero or negative correlation? This suggests that the correlations do in fact represent the partial correlation, which they should. If you plot the different age quintiles separately, we should see more clearly that the one that includes age 70 (wherever baseline age is centred) looks more like r = +0.80. Can you do this (just for this one example is fine)? In fact, the estimated correlation should be for the 70-year-old male non-smoker of average education and height with no diagnosed cardio or diabetes issues. I'm betting that the reason they often are not statistically "significant" is that the n for this subgroup is so small compared to its variability. ...OK, now I'm feeling a bit of déjà vu. At any rate, it may well be a power issue. I'm keen to see the separate plots for the age groups!!
Cool. What do you see (in particular for age and smoking = no)? Is it difficult to go one more step and make a single plot that contains only the age group you have plotted here, but also restricts the other variables to their reference values? We don't need this for all studies. It would just be good to check a couple, in particular those whose correlation/SE/p values seemed odd and mismatched, just to satisfy ourselves that we actually understand what is going on so we can explain it in the paper. Since the phys-cog papers will not have a lot of space, it might actually be better to make this point in the phys-phys paper (and possibly reference it in the phys-cog paper). But since you have the graphs running for phys-cog, we can examine the problem there, because the same principle should apply.
Two other options are (a) online supplemental material and (b) hosting them on an IALSA site (similar to how this report is hosted on a project-specific website).
@ampiccinin, OK, let me work on these modifications.
@andkov @wibeasley At this point I am thinking of the graphs as serving our own understanding of what the models are telling us, particularly when we see odd combinations (high but non-significant correlations). I suppose we could imagine including one as an example in a paper, but for me this is just internal consumption. I don't think most people will be interested in sifting through these.
Technically, this is not difficult. However, there are not enough data points to find such a restrictive combination. To illustrate, when I further restrict this group of 65 individuals:

```r
library(dplyr) # attaches the %>% pipe used below

ds <- ds %>%
  dplyr::filter(
    edu == 7
    ,height > 169.5 & height < 170.5
    ,smoke == "no"
    ,diabetes == "no"
    ,cardio == "no"
  )
```

I simply run out of data points: I find no such combination. Or did I misunderstand your question?
I'm with you, @ampiccinin. Telling the story is not the same as finding the story. I am fully cognizant that most of the graphs and reports produced are not going to end up in a publication, but that's OK, because they are aimed at a different goal: to empower the writers to say what would be interesting to read. It's quite tricky, as I'm finding out, to switch from a scientist's mind to a journalist's mind, and I'm gaining more and more appreciation for this knack.
:) Exactly. We can't just select one value for these, just like with age. How about selecting 165-180 (or 170-175) for height and 7-12 for education (remind me where it is centered? Is it really 7 years?). With respect to scientist vs. journalist, I was thinking of it more as exploratory vs. confirmatory, or observation vs. interpretation.
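To make the trade-off concrete, here is a minimal base-R sketch that counts how many people survive three restrictions: single values for education and height, the suggested windows (7-12 for education, 165-180 for height), and the binary covariates alone. The data frame is simulated and hypothetical; only the variable names follow the `dplyr::filter()` call earlier in the thread, and the education scale is made up.

```r
# Hypothetical, simulated covariates for one age group of 65 people;
# variable names mirror the thread's filter, values are invented.
set.seed(1)
n <- 65
ds <- data.frame(
  edu      = sample(0:20, n, replace = TRUE),
  height   = round(rnorm(n, mean = 170, sd = 9), 1),
  smoke    = sample(c("yes", "no"), n, replace = TRUE),
  diabetes = sample(c("yes", "no"), n, replace = TRUE),
  cardio   = sample(c("yes", "no"), n, replace = TRUE)
)

# Reference categories for the binary covariates
is_ref <- with(ds, smoke == "no" & diabetes == "no" & cardio == "no")

# Point restriction: single values for edu and height
point_n <- sum(is_ref & ds$edu == 7 &
               ds$height > 169.5 & ds$height < 170.5)

# Window restriction: ranges around the reference values
window_n <- sum(is_ref & ds$edu >= 7 & ds$edu <= 12 &
                ds$height >= 165 & ds$height <= 180)

# Binary covariates only (dropping edu and height altogether)
binary_only_n <- sum(is_ref)

c(point = point_n, window = window_n, binary_only = binary_only_n)
```

Because each restriction is nested inside the next, the counts can only grow as the windows widen; if even the binary-only count is tiny, widening the education and height windows cannot rescue the subgroup.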
We frequently called them "internal reports" and "external reports" to make a similar distinction. Another related characteristic is development time. An internal report takes ~20 minutes to develop. An external report takes hours, because all the defaults are usually tweaked to make a big impact quickly.
@ampiccinin Unfortunately, even as I expand the ranges, there aren't enough people. Let me try to find a similar case for females; there are more of them in the set.
If we can't find enough people, even taking a wide swath, then I'd say this is part of the problem with the models. Another option is to just not select based on education and height. That will be close enough.
The contingency tables are pretty sparse. When I take the initial 321 individuals (even before stratifying on age group at baseline), there are just not enough people in the reference category (11 here):

```r
library(dplyr) # attaches the %>% pipe used below

ds %>%
  dplyr::group_by(smoke, cardio, diabetes) %>%
  dplyr::distinct(id) %>%
  dplyr::count()
```

| smoke | cardio | diabetes | n |
|---|---|---|---|
| yes | yes | yes | 99 |
| yes | yes | no | 35 |
| yes | no | yes | 22 |
| yes | no | no | 6 |
| no | yes | yes | 84 |
| no | yes | no | 30 |
| no | no | yes | 34 |
| no | no | no | 11 |

And when I add `age_group_bl`:

| age_group_bl | smoke | cardio | diabetes | n |
|---|---|---|---|---|
| [85.4,98.5] | yes | yes | yes | 23 |
| [85.4,98.5] | yes | yes | no | 2 |
| [85.4,98.5] | yes | no | yes | 8 |
| [85.4,98.5] | yes | no | no | 2 |
| [85.4,98.5] | no | yes | yes | 14 |
| [85.4,98.5] | no | yes | no | 6 |
| [85.4,98.5] | no | no | yes | 8 |
| [85.4,98.5] | no | no | no | 1 |
| [81.9,85.4) | yes | yes | yes | 25 |
| [81.9,85.4) | yes | yes | no | 6 |
| [81.9,85.4) | yes | no | yes | 4 |
| [81.9,85.4) | no | yes | yes | 12 |
| [81.9,85.4) | no | yes | no | 6 |
| [81.9,85.4) | no | no | yes | 9 |
| [81.9,85.4) | no | no | no | 2 |
| [78.5,81.9) | yes | yes | yes | 21 |
| [78.5,81.9) | yes | yes | no | 12 |
| [78.5,81.9) | yes | no | yes | 3 |
| [78.5,81.9) | yes | no | no | 2 |
| [78.5,81.9) | no | yes | yes | 12 |
| [78.5,81.9) | no | yes | no | 8 |
| [78.5,81.9) | no | no | yes | 4 |
| [78.5,81.9) | no | no | no | 2 |
| [73.4,78.5) | yes | yes | yes | 14 |
| [73.4,78.5) | yes | yes | no | 6 |
| [73.4,78.5) | yes | no | yes | 4 |
| [73.4,78.5) | yes | no | no | 2 |
| [73.4,78.5) | no | yes | yes | 22 |
| [73.4,78.5) | no | yes | no | 5 |
| [73.4,78.5) | no | no | yes | 8 |
| [73.4,78.5) | no | no | no | 3 |
| [57.9,73.4) | yes | yes | yes | 16 |
| [57.9,73.4) | yes | yes | no | 9 |
| [57.9,73.4) | yes | no | yes | 3 |
| [57.9,73.4) | no | yes | yes | 24 |
| [57.9,73.4) | no | yes | no | 5 |
| [57.9,73.4) | no | no | yes | 5 |
| [57.9,73.4) | no | no | no | 3 |
...it's the last one, with n = 3. You could try age quartiles instead of quintiles. I am not at all surprised; this is what I've been asking about from the start.
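One rough way to check whether quartiles help is to count the reference cell (`smoke`, `cardio`, and `diabetes` all `"no"`) within each equal-frequency age bin. This base-R sketch uses simulated, hypothetical data of the same overall size (321 people) with invented proportions; the helper names `bin_equal()` and `ref_cell_sizes()` are illustrative and not part of the project code.

```r
# Hypothetical, simulated data: 321 people with baseline age and three
# binary covariates. Proportions are invented, not taken from the study.
set.seed(7)
n <- 321
ds <- data.frame(
  age_bl   = runif(n, 58, 98),
  smoke    = sample(c("yes", "no"), n, replace = TRUE, prob = c(0.5, 0.5)),
  cardio   = sample(c("yes", "no"), n, replace = TRUE, prob = c(0.75, 0.25)),
  diabetes = sample(c("yes", "no"), n, replace = TRUE, prob = c(0.75, 0.25))
)

# Bin a numeric vector into k equal-frequency groups
bin_equal <- function(x, k) {
  cut(x, breaks = quantile(x, probs = seq(0, 1, length.out = k + 1)),
      right = FALSE, include.lowest = TRUE)
}

# Size of the reference cell in each age bin (empty bins show as 0,
# because subsetting a factor keeps all of its levels)
ref_cell_sizes <- function(ds, k) {
  grp <- bin_equal(ds$age_bl, k)
  is_ref <- with(ds, smoke == "no" & cardio == "no" & diabetes == "no")
  table(grp[is_ref])
}

quintiles <- ref_cell_sizes(ds, 5)
quartiles <- ref_cell_sizes(ds, 4)
quintiles
quartiles
```

Coarser bins spread the same reference cases over fewer groups, so each bin gets more of them on average, at the cost of a wider age range around the centering point.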
The investigation into the effect size/significance of the association between two processes (quantified through the covariance and correlation between terms of the bivariate growth curve model) can benefit from several modifications to the existing "anatomy" plots:
- `a`, `ae`, `aeh`, `aehplus` progression (incremental addition of predictors)
- Spaghetti
- Scatters
@ampiccinin