Refactored tcplPlot and auxiliary functions to support any number of curves on a comparison plot #325

Open: wants to merge 45 commits into base: dev

@cthunes (Contributor) commented Jan 24, 2025

Removed the compare.val parameter from tcplPlot and replaced it with compare. Use compare to choose a field from the loaded plot data to compare on. For example, compare = "dsstox_substance_id" will match all samples of the same chemical across the loaded data (say, fld = "aeid", val = c(<list of 4 endpoints>)). If all 4 endpoints test the same chemicals, then the output (for 'pdf') would be a list of plots, one per chemical, each containing 4 curves/point sets. By default compare = "m4id", which means all samples will be plotted individually, since every m4id/s2id is unique.
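As a rough illustration of the grouping described above, the base R sketch below partitions a toy table by a compare field. The toy data and the helper `split_by_compare` are hypothetical and are not part of tcpl; they only mimic the idea that each distinct value of the compare field becomes one plot.

```r
# Toy stand-in for loaded plot data: 2 chemicals tested in 4 endpoints.
# Column names mirror fields mentioned above; the values are made up.
dat <- data.frame(
  m4id = 1:8,
  aeid = rep(c(101, 102, 103, 104), times = 2),
  dsstox_substance_id = rep(c("DTXSID_A", "DTXSID_B"), each = 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one group of rows per distinct value of `compare`.
split_by_compare <- function(dat, compare = "m4id") {
  split(dat, dat[[compare]])
}

by_chemical <- split_by_compare(dat, compare = "dsstox_substance_id")
by_sample <- split_by_compare(dat)  # default: every m4id is its own plot

length(by_chemical)        # 2 plots, one per chemical
sapply(by_chemical, nrow)  # 4 curves on each plot
length(by_sample)          # 8 individual plots
```

With a database connection, the equivalent call would presumably be along the lines of tcplPlot(type = "mc", fld = "aeid", val = c(<list of 4 endpoints>), compare = "dsstox_substance_id", output = "pdf"), going by the description in this PR.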

Use dat to pass preloaded, and potentially manipulated, data for more flexibility. For example, one could load plot data using tcplPlotLoadData the same way tcplPlot would, then add a custom column for a user-defined comparison grouping and name it in compare, rather than relying only on the fields available by default. If dat is instead a list of data.tables, no compare field is needed and each list item will be interpreted by tcplPlot as a separate comparison plot.
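The two ways of preparing dat could be sketched as follows. This uses a toy base R data frame in place of the data.table that tcplPlotLoadData would return, and `my_group` is a hypothetical column name chosen for illustration.

```r
# Toy stand-in for the table returned by tcplPlotLoadData (values made up).
dat <- data.frame(
  m4id = 1:6,
  aeid = c(101, 101, 102, 102, 103, 103),
  chnm = c("A", "B", "A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Option 1: add a custom grouping column, then name it in `compare`.
# `my_group` is a hypothetical column name.
dat$my_group <- ifelse(dat$aeid %in% c(101, 102), "early", "late")

# Option 2: build the comparison plots by hand as a list, one item per plot.
dat_list <- list(
  dat[dat$chnm == "A", ],
  dat[dat$chnm == "B" & dat$aeid != 103, ]
)

table(dat$my_group)     # rows per custom group
sapply(dat_list, nrow)  # curves per hand-built plot
```

Per the description above, the first form would be passed as dat = dat with compare = "my_group", and the second as dat = dat_list with no compare argument, alongside the usual tcplPlot arguments.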

If you have an especially large number of curves to plot on each comparison plot, the new type of comparison plot could be useful. tcplPlot has two new parameters, with defaults group.fld = NULL and group.threshold = 9. At the specified group.threshold value (9 by default), curves on comparison plots are grouped differently by color and in the legend, and the verbose table is no longer printed because it becomes excessively large. If the number of curves on a plot exceeds group.threshold and group.fld is not set, the default group.fld is modl for mc and hitc for sc (open to suggestions on better defaults). Both parameters are fully customizable: group.fld can be any field available in the data, including a custom field if dat is supplied, and group.threshold can be any number, small or large, as the minimum at which to switch over to the other style. Set group.threshold to a large number to effectively disable this functionality. The most common use case for this functionality currently is plotting an entire endpoint on one comparison plot. Extensive stress testing has not been done to find the limits, but so far, using a Tox21 endpoint, I have been successful up to about 3000 curves before running into "node stack overflow" errors. I think ggplot may have some limits on the number of layers.
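The switch-over logic described above can be sketched roughly as below. `choose_group_fld` is a hypothetical helper, not a tcpl function, and it assumes grouping turns on once the curve count is at or above group.threshold; the exact boundary used in the PR may differ.

```r
# Hypothetical helper mirroring the defaults described above.
# Assumption: grouping switches on once n_curves >= group.threshold.
choose_group_fld <- function(n_curves, type = "mc",
                             group.fld = NULL, group.threshold = 9) {
  if (n_curves < group.threshold) {
    return(NULL)  # few curves: each curve keeps its own color and legend entry
  }
  if (!is.null(group.fld)) {
    return(group.fld)  # a user-supplied field takes precedence
  }
  if (type == "mc") "modl" else "hitc"  # defaults named in the description
}

choose_group_fld(4)                            # NULL
choose_group_fld(12)                           # "modl"
choose_group_fld(12, type = "sc")              # "hitc"
choose_group_fld(12, group.fld = "chnm")       # "chnm"
choose_group_fld(3000, group.threshold = 1e6)  # NULL, grouping effectively disabled
```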

Note - verbose = TRUE is the new default for tcplPlot!

248 new unit tests passing. devtools::check() successful. Closes #293. Closes #215. Closes #280. Closes #249. Closes #228. Closes #175. Closes #311.

LOEC plotting is also included in this pull request.
tcplPlot PR.pptx

also halved the 'number of curves' in the error comparison test cases that had too many plots for console plotting
@cthunes cthunes requested a review from a team January 24, 2025 21:20
@cthunes cthunes self-assigned this Jan 24, 2025