Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge_samples checks for single value before coercing to numeric #893

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

benjjneb
Copy link
Contributor

@benjjneb benjjneb commented Mar 5, 2018

Currently merge_samples forces everything to a numeric before the
(parameterized) merge function is applied to the resulting vectors.
This loses all character information (since characters are coerced to
NA) and usable factor information (since the mapping between values and
levels are lost).

This update first checks to see if there is just a single value in each
group, and if so returns that single value, before applying the current
coerce/mean approach. This preserves character values when they are the
same within the merged group.

It also gives users more power to provide a custom funciton to handle
merging, as the numeric coersion is no longer hardcoded into the
method, but is not inside the fun argument.

Currently merge_samples forces everything to a numeric before the
(parameterized) merge function is applied to the resulting vectors.
This loses all character information (since characters are coerced to
NA) and usable factor information (since the mapping between values and
levels are lost).

This update first checks to see if there is just a single value in each
group, and if so returns that single value, before applying the current
coerce/mean approach. This preserves character values when they are the
same within the merged group.

It also gives users more power to provide a custom funciton to handle
merging, as the numeric coersion is no longer hardcoded into the
method, but is not inside the `fun` argument.
@joey711
Copy link
Owner

joey711 commented Apr 1, 2018

Hi @benjjneb ... I think fixing this might take a bit more. Dropping columns (sample variables in this context) is probably not a good general solution because it will be somewhat opaque to users. Not that the current behavior is very good, either.

There is a hackathon going on right now, developing this:

https://github.com/HCBravoLab/MicrobiomeExperiment

if it is possible for phyloseq to migrate over to depend on it, then it should imbibe conventional BioC behavior that might help with operations like the one here. Example might be mergeReplicates

Also, let me know if I'm missing the point...

Another alternative to what you've proposed is to include a default "collapse" function for discrete variables (e.g. paste0(unique(x), collapse = ", "))...

@roey-angel
Copy link

Hi, this seems to be unresolved yet.
What about applying a mean to all numeric variables in sample_data() and something like summarise_all(list(~paste(unique(.), collapse = ","))) to all other?
This will retain all unique chr and fct values.

In the meantime, I use this function to replace merge_samples()

@lvclark
Copy link

lvclark commented Sep 25, 2020

Stumbled over this as well, since I wanted to merge samples but still have categorical covariates to use in ggplot. What about, for categorical (character, factor) variables, checking if there is one unique value, and returning that value if so but otherwise returning NA? I'm looking for things to do for Hacktoberfest, so I could set up a pull request if you think it would be useful.

@lvclark
Copy link

lvclark commented Sep 28, 2020

On closer examination, I won't do a pull request for this, since the changes I would make are very similar to the changes made here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants