Forest plot : Pulmonary - Cognitive track #167

Open
andkov opened this issue Jan 25, 2017 · 19 comments
@andkov
Member

andkov commented Jan 25, 2017

Need to develop a graphing function to create forest plots from catalog type of data structures.

Reading sources:

See documentation for rmeta::forestplot arguments

@andkov andkov self-assigned this Jan 25, 2017
@ampiccinin ampiccinin changed the title from "Forrest plot : Pulmonary - Cognitive track" to "Forest plot : Pulmonary - Cognitive track" Jan 26, 2017
andkov added a commit that referenced this issue Feb 2, 2017
added a script that replicates and studies the examples from
https://cran.r-project.org/web/packages/forestplot/vignettes/forestplot.html

Please refer to this script for re-study
@andkov
Member Author

andkov commented Feb 7, 2017

@ampiccinin @eduggan

The first draft of the forest plot graph has been stabilized. I have incorporated it into the report with a dynamic table: this way you can both view the summary and explore the numbers.

However, I am not pleased with the performance of forestplot::forestplot(). I have accumulated sufficient grievances with it to discourage its use in the future, opting instead to create our own similar graphing machinery using the ggplot2 system. I think forestplot::forestplot() has its uses (for quick and unassuming displays), but it's poorly suited for our case, where multiple plots of various sizes need to be produced dynamically. That, and it looks kinda ugly. But maybe I'm just letting my pretentiousness get the better of me.

However, to be frugal, I'd like to see if we can get away with this for now. I'll start building a more robust and flexible graphing machinery with ggplot (@wibeasley made a good start for me) at a slower pace, but in the meantime let's see if we can make it work. Please take a look at the report (also available from ./project/pulmonary-cognitive/; separate pngs can be found in ./reports/correlation-3/forest-plot-pulmonary) and comment on what you would like to change about it.

Please leave your comments and suggestions in this thread. I'll try to implement them, but I may not be able to due to the limitations of forestplot::forestplot(). If I can, great; if not, we'll have to live with this solution for now until the ggplot-based machinery is developed.

NOTE: the average effect is calculated using sample sizes as weights, basically following this recipe. Please let me know if this is incorrect, or if you prefer a different method.
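
A minimal sketch of the sample-size weighting mentioned in the note, assuming the simplest reading: a weighted arithmetic mean of the study-level estimates. The linked recipe is not reproduced here, so the function name `weighted_effect` and the example values are hypothetical. Averaging on a transformed scale (for example Fisher's z) would be a different method.

```r
# Sample-size-weighted average of study-level estimates.
# `estimate` and `n` are hypothetical inputs: one value per study.
weighted_effect <- function(estimate, n) {
  stopifnot(length(estimate) == length(n), all(n > 0))
  sum(estimate * n) / sum(n)   # same result as stats::weighted.mean(estimate, n)
}

# Made-up numbers, for illustration only
r <- c(0.10, 0.30, 0.20)
n <- c(100, 300, 600)
avg <- weighted_effect(r, n)
```

With this weighting, the largest study dominates the average, which is worth keeping in mind when sample sizes differ a lot between studies.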

A few specific questions:

  • how should the rows be sorted?
  • should I produce these forest plots for intercepts and residuals as well, or would that be an unnecessary overload?

@ampiccinin
Member

@andkov - We should produce the I and R forest plots, although we may only present them in GitHub and not in the publication.

It looks like you have differently sized points (squares), and the size differences are more manageable than the ones from Sean's plots, but they are not clearly associated with sample size differences, so I am uncertain what they mean.

Can we invert TrailsB scores/values so that it reflects correct responses (like everything else) rather than errors?

With respect to sorting: I realise (remember) now with your plots that in fact we have both different Cognitive AND different Pulmonary measures. Good to remind readers about the variability of both. I am happy with the sorting the way it is.

I like how Sean's, with subgroupings rather than separate plots, is more compact, but I appreciate the flexibility that separate plots may afford (just seems like there are so many of them and it means so many more titles and headings!).

Can you remind me why you use "process" rather than "variable"?

Wednesday morning between 9 and 11:30 Pacific works for me.

@ampiccinin
Member

@andkov - also noticed that "Ravens" in MAP is labelled "Matrices". Can we please change it to Raven's so it will match LASA? Many thanks! I'll give a quick check for whether there are any other instances like this.

...actually,

LASA is RavenColourAB

and

MAP is RavenStandard

If this is too complicated, just label them both "raven", and we will include the other details in the Method section.

@andkov
Member Author

andkov commented Feb 15, 2017

Agenda:

  • interpretation of differently sized points (squares)
  • invert TrailsB scores/values
  • rename Matrices into RavenStandard (MAP) and Raven into RavenColorAB (LASA); alternatively, rename all into raven
  • Can you remind me why you use "process" rather than "variable"?

@andkov
Member Author

andkov commented Feb 15, 2017

@ampiccinin

Can you remind me why you use "process" rather than "variable"?

Yes. We wanted a word to distinguish the role of a variable in the model (outcome vs predictor). However, outcome (at least in my understanding) is tied to the specific measure, as in the outcome of the mmse measure. Given that various pairings of outcomes may be inserted into a bivariate growth curve model (BGCM), we decided to call these "slots" in BGCM "processes". Hence we can refer to components of BGCM more generally as process A and process B, each of which can be represented by a variety of outcomes/measures.

However, this choice of words was made to help us navigate the scripts more easily. There is no reason why we cannot change the legend to reflect a more appropriate meaning.

@andkov
Member Author

andkov commented Feb 15, 2017

@ampiccinin

interpretation of differently sized points (squares)

Good point. Here's what I've uncovered. According to this discussion, the point sizes are drawn proportional to the precision of the estimates, which is the reciprocal of the variance. This mapping agrees with my interpretation of the graphs: the smaller the sd, the larger the box. Do you see any cases that contradict this?

However, there is an option to make the sizes equal, so we can always resort to that if mapping the precision onto the box size is undesirable:

image
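
A rough sketch of the precision-to-size mapping described above, assuming precision is taken as 1 / se^2 and rescaled so the most precise row gets the largest box. The exact scaling that forestplot applies internally is not reproduced here; `box_size_from_se` and the values are hypothetical.

```r
# Relative box sizes from precision (1 / variance).
# `se` holds hypothetical standard errors, one per plotted row.
box_size_from_se <- function(se, max_size = 1) {
  stopifnot(all(se > 0))
  precision <- 1 / se^2
  max_size * precision / max(precision)   # most precise row gets max_size
}

se <- c(0.05, 0.10, 0.20)             # made-up values
sizes <- box_size_from_se(se)          # smaller se -> larger box
equal_sizes <- rep(0.25, length(se))   # a constant size removes the mapping
```

A vector like `equal_sizes` could then be supplied to the plotting function's box-size option (`boxsize` in forestplot, to my knowledge) when the precision mapping is not wanted.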

andkov added a commit that referenced this issue Feb 15, 2017
rename Matrices into RavenStandard (MAP) and Raven into RabevColorAB
(LASA)
@andkov
Member Author

andkov commented Feb 15, 2017

@ampiccinin

rename Matrices into RavenStandard (MAP) and Raven into RavenColorAB (LASA).

Done (commit: 01354aa). For right now, I've kept the renaming in the script (as opposed to a csv, which we typically use for more robust and traceable renaming).

@andkov
Member Author

andkov commented Feb 15, 2017

@ampiccinin

invert TrailsB scores/values

I'm not sure what operation to perform to accomplish that. Do you mean taking the reciprocal of the point estimate for the slope and the residual? That should be very easy to do. But what about CI? Perhaps we can discuss this when we meet.

@ampiccinin
Member

@andkov - re TrailsB: sorry - wrong terminology - just need to reverse the sign. thanks!
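
On the earlier question about the CI: a small sketch of a sign reversal, assuming the estimate and its limits are on a scale where negation is meaningful (a correlation or slope, not a variance). The estimate is negated, and the negated limits swap places so that lower stays below upper. `reverse_sign` and the numbers are hypothetical.

```r
# Reverse the direction of a measure: negate the estimate and swap
# the negated confidence limits so that lower <= upper still holds.
reverse_sign <- function(est, lower, upper) {
  data.frame(est = -est, lower = -upper, upper = -lower)
}

# Made-up values for illustration
flipped <- reverse_sign(est = 0.25, lower = 0.10, upper = 0.40)
```

The width of the interval is unchanged by this operation; only its position relative to zero is mirrored.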

@ampiccinin
Member

@andkov IJE Instructions:

FIGURES

• Figures should be submitted in editable image formats (such as jpg or tiff, not pdf).
• Illustrations should be numbered and given suitable legends.
• They should be kept separate from the text.
• As standard figures appear in black and white in print, and in colour online. There is no charge for this.
• Authors will be expected to pay if they want their figures reproduced in colour in the print version of the Journal (£350/figure).
• Please state your preferred option (i.e. agreement to pay £350/figure for print and online colour or preference for online-only colour with no charge) upon submission via the online submission system.
• Please ensure that the prepared electronic image files print at a legible size and are of a high quality for publication (600dpi for line drawings; 300dpi for colour and half-tone artwork).
• For useful information on preparing your figures for publication, please see http://cpc.cadmus.com/da.

andkov added a commit that referenced this issue Feb 15, 2017
removed trailsb, changed table headings, restricted width of forest to
-1 and 1
andkov added a commit that referenced this issue Feb 15, 2017
andkov added a commit that referenced this issue Feb 15, 2017
andkov added a commit that referenced this issue Feb 15, 2017
@andkov
Member Author

andkov commented Feb 15, 2017

@ampiccinin ,

The adjustments we discussed have been implemented. Please access the latest version of the report from the stable serving point for the pulmonary track.

  • trailsb is a bit finicky, so I'm opting to remove it for now so as not to stall progress.
  • I don't see an immediate solution to output as JPEG instead of PNG, but that will NOT be an issue; I just need to find the trick that does it.

@ampiccinin
Member

@andkov
IJE says JPEG OK

@ampiccinin
Member

@andkov - forest plots look great! Can you adjust the spacing for Visualization? the forest plots themselves seem narrower than for the other domains. since the plot is the focus, we should give it as much space as possible. For publication, actually, I think the font of all headings etc should still be 12pt, so the other columns could all be narrower. Sad, I know - you prefer html - but for the png one we should just use standard (12pt, black) formatting. You can keep the fancy version for online viewing.
Thanks!!!!

@andkov
Member Author

andkov commented Feb 20, 2017

TODO:

  • expand the area of the plot relative to text
  • change font size to 12 pt
  • use black color in graphs
  • save as JPEG instead of PNG
  • address the appearance of visualization domain

andkov added a commit that referenced this issue Feb 20, 2017
also added dynamic sizing of the graphs
andkov added a commit that referenced this issue Feb 20, 2017
andkov added a commit that referenced this issue Feb 20, 2017
@andkov
Member Author

andkov commented Feb 20, 2017

@ampiccinin,
I've addressed the appearance issues of the forest plots. You can view them here or find them in the `./reports/correlation-3/forest-plot-pulmonary/jpeg/` folder after syncing the repository.

I'm still working on pulling them together as a single image, as we discussed. The current solution distorts the images by stretching them to be of the same height, which is a deal breaker. I'm pretty confident I'll find a solution, but just FYI, in the worst case I can pull them together in Photoshop. @wibeasley, any advice on how to concatenate multiple jpegs vertically while respecting their ratios? I've started doing it here, but they come out distorted. To clarify, we need a single jpeg: putting them into an html table would have worked otherwise, but alas.
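
One way to think about the stacking problem, sketched under the assumption that the images should share a common width and keep their own aspect ratios: scale each height by the same factor as its width, then place the panels one below another. This only computes the layout; the actual compositing with an image library is left out, and `stack_layout` and the pixel sizes are hypothetical.

```r
# Layout for stacking images vertically at a common width while
# keeping each image's aspect ratio. Inputs are pixel dimensions.
stack_layout <- function(width, height, target_width = max(width)) {
  stopifnot(length(width) == length(height), all(width > 0))
  scale <- target_width / width
  new_height <- round(height * scale)
  data.frame(
    width  = target_width,
    height = new_height,
    y_top  = cumsum(c(0, head(new_height, -1)))  # top edge of each panel
  )
}

# Made-up sizes: three plots of differing height and width
layout <- stack_layout(width = c(800, 800, 400), height = c(300, 500, 100))
canvas_height <- sum(layout$height)
```

Because heights are never forced to be equal, taller plots simply take more of the canvas instead of being stretched or squashed.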

@ampiccinin
Member

@andkov : forest plots are getting there! I have some picky and not-so-picky requests & queries -

  1. To be comparable, ALL the forest plots should range from -1 to 1. Notice that the R-R default to some scale that "fills the space" - which ends up being different for each variable combination.

  2. I'm OK with saying "Male participants", but not "males" (sorry - pet peeve) ditto for women of course

  3. The "memory" plot should be labeled "immediate and recognition memory", and ideally (same for delayed) should be sorted by these within the plot (if I'm being picky, I would rather have the "rec" at the bottom; for delayed, the digit_tot and logic_tot [but not the digit_b_tot] would go together at the end of the _de)

  4. since the I-I and S-S correlations are technically between their residual correlations (after accounting for the covariates), I wonder whether we need to call the R-R plots something more specific just to be clear. (we can just call the I-I and S-S Intercept and Slope correlations [i.e., leaving out any mention of residual])...or maybe just label all "Correlations between xxx (intercepts, slopes, residuals)" which avoids use of the term "residual correlations" which is I think what is triggering my discomfort.

  5. can you show me what the files we would actually work with are like? Web version looks great. (If you end up needing more space (or making whole thing narrower) you could shift columns 2,3&4 over to the left a bit...)

@andkov
Member Author

andkov commented Feb 24, 2017

@ampiccinin ,
Thanks for the detailed suggestions. I agree with all of them. However, some of them may not be possible with forestplot::forestplot(), the function that is currently used to produce these graphics. I'm researching the options, but so far I don't see a way to make the x scale the same or adjust the widths of columns. Sad, but packages like these offer some simplicity of use at the cost of these fine-tuning options. But I'm not giving up yet.

The labeling and sorting, however, should be no problem. Let's see the scope of what forestplot::forestplot() can adjust and then decide whether it will satisfy our needs. I have already made peace with the prospect of re-doing these graphs in ggplot: learning new things comes with a price. At least I learned what forestplot::forestplot() has to offer, so I can critique it with confidence.

re: 5. Do you mean the jpg files or the data files? I assume you are talking about the jpgs. Yes. In order to accommodate the journal requirements I had to resort to a different (than normal) practice: instead of letting the .Rmd files (the dynamic report) print the graphs and then save a copy on disk, I explicitly produce the graphs first (see lines 143-190 in ./reports/correlation-3/correlation-3-pc.R) and then place them onto the html canvas. In this way I get to dynamically control the height of each graph depending on the number of lines it contains (forestplot prints them all at the same size, which is why the previous version looked ugly).
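
A hedged sketch of the "produce first, then embed" approach with a row-dependent height. This is not the code from correlation-3-pc.R; the function names, the per-row and header sizes, and the 300 dpi default are assumptions chosen for illustration. It relies only on the base `grDevices::jpeg()` device.

```r
# Height grows with the number of rows so that row spacing stays constant.
# The per-row and header sizes (in inches) are hypothetical defaults.
forest_height_in <- function(n_rows, per_row = 0.3, header = 1) {
  header + per_row * n_rows
}

# Draw first, save to disk, then embed the saved file in the report.
# `draw_fn` is any function that draws the plot on the active device.
save_forest_jpeg <- function(draw_fn, path, n_rows, width_in = 8, res = 300) {
  grDevices::jpeg(path, width = width_in, height = forest_height_in(n_rows),
                  units = "in", res = res)
  on.exit(grDevices::dev.off())   # close the device even if drawing fails
  draw_fn()
  invisible(path)
}

# Example call (not run here):
# save_forest_jpeg(function() plot(1:10), "example.jpg", n_rows = 10)
```

Keeping the height calculation in its own function makes it easy to reuse the same value when sizing the image on the html canvas.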

The graphs for each of the tracks are saved in corresponding folders:

Each file has a unique name in the form:

track-domain-subgroup-index.jpg

For example:

pulmonary-memory-female-residual.jpg
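
The naming pattern above could be generated with a small helper along these lines. `forest_file_name` is a hypothetical name, not a function from the repository.

```r
# Build a file name of the form track-domain-subgroup-index.jpg
forest_file_name <- function(track, domain, subgroup, index, ext = "jpg") {
  paste0(paste(track, domain, subgroup, index, sep = "-"), ".", ext)
}

name <- forest_file_name("pulmonary", "memory", "female", "residual")
```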

@ampiccinin
Member

@andkov : Great!! Let's try to get by with forestplot::forestplot. The plots are purely a tool for communication (though of course that is the whole point of writing an article!) so we get as close as we can without driving ourselves crazy. Once reviews come back, if anyone comments on them (I sort of doubt the details that bug me will be what stands out to reviewers) we can deal with it then.

andkov added a commit that referenced this issue Mar 1, 2017
andkov added a commit that referenced this issue Mar 1, 2017
into immediate and recognition. also corrected `delayed or working` into
`delayed and working`
andkov added a commit that referenced this issue Mar 1, 2017
andkov added a commit that referenced this issue Mar 1, 2017
@andkov
Member Author

andkov commented Mar 1, 2017

@ampiccinin ,
I've incorporated the tweaks you've proposed and produced the forest plots for all three tracks. I have also created the stable access points at ./projects/.../README.md.

However, two things are hard to implement:

I've started it, but it needs a few values filled in. After that is done, we can sort by the type of response in memory tasks.
