Input file problem #124
Hi Alla,

After reviewing your error messages, I think the issue likely stems from your TSV file format. The output shows that the first line of the file is "# Constructed from biom file", which is what causes the parsing issues. Could you try:

This should resolve the parsing error and allow ggpicrust2 to properly read your feature table. Let me know if you need any further assistance.

Best regards,
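As an illustration only, and not necessarily the exact steps meant above, here is a minimal base-R sketch of one way to deal with such a leading comment line. The toy file contents, sample names, and variable names are assumptions made up for the example; only the "# Constructed from biom file" line comes from the error output in this thread.

```r
# Hypothetical demo file that mimics a biom-to-TSV export (contents are made up).
raw_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "# Constructed from biom file",
  "#OTU ID\tS1\tS2",
  "K00001\t10\t0",
  "K00002\t3\t7"
), raw_path)

# Drop the leading comment line so that the real header becomes line 1.
lines <- readLines(raw_path)
if (length(lines) > 0 && startsWith(lines[1], "# Constructed from biom file")) {
  lines <- lines[-1]
}
clean_path <- tempfile(fileext = ".tsv")
writeLines(lines, clean_path)

# comment.char = "" keeps the "#OTU ID" header from being treated as a comment;
# check.names = FALSE keeps the column names exactly as written in the file.
ko_table <- read.delim(clean_path, check.names = FALSE, comment.char = "")
print(dim(ko_table))
```

The cleaned file at `clean_path` could then be passed on in place of the original path. Whether the first column header also needs renaming depends on the reader used downstream, so that part is left out of the sketch.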
Hi,

I still can't run the analysis; I got more errors after that:
"""
Run ggpicrust2 with input file path
Starting the ggpicrust2 analysis... Converting KO to KEGG... Loading data from file... ℹ Use Sample names extracted. Starting pathway annotation... The number of statistically significant pathways exceeds the database's query limit. Please consider breaking down the analysis into smaller queries or selecting a subset of pathways for further investigation. Returning DAA results filtered annotation data frame... The following pathways are missing annotations and have been excluded: ko05340, ko00564, ko00562, ko00563, ko03030, ko00561, ko00440, ko00250, ko04062, ko00740, ko00195, ko04650, ko03450, ko00920, ko00311, ko00310, ko04146, ko00600, ko04140, ko04142, ko00604, ko04260, ko05142, ko04540, ko04710, ko04712, ko00909, ko00513, ko05110, ko04974, ko04976, ko00450, ko01051, ko00565, ko00904, ko00524, ko00300, ko00905, ko00402, ko03440, ko00750, ko00950, ko05140, ko00592, ko00591, ko00590, ko00062, ko04662, ko03070, ko00253, ko03060, ko04370, ko04730, ko04740, ko00380, ko00500, ko05120, ko04666, ko04966, ko05322, ko04964, ko05320, ko04962, ko04960, ko04660, ko00625, ko00624, ko00627, ko00626, ko00623, ko00622, ko00270, ko04380, ko00941, ko00943, ko00100, ko00945, ko01057, ko01056, ko05016, ko01058, ko04145, ko00071, ko00072, ko04360, ko05219, ko05218, ko05216, ko05215, ko05213, ko05211, ko01055, ko00902, ko05330, ko00534, ko04910, ko00531, ko04916, ko00533, ko00532, ko00360, ko00633, ko00363, ko00364, ko05130, ko00121, ko04914, ko00130, ko03050, ko00361, ko00040, ko00730, ko00362, ko01040, ko00603, ko03018, ko04270, ko00281, ko00280, ko03013, ko04626, ko05200, ko00601, ko03015, ko00312, ko05143, ko00523, ko00520, ko00521, ko05146, ko00052, ko00051, ko00400, ko04020, ko00350, ko00480, ko00643, ko00640, ko00720, ko00120, ko00965, ko04614, ko04340, ko00980, ko00410, ko00983, ko05150, ko00791, ko05131, ko04711, ko00020, ko00710, ko00196, ko02060, ko00340, ko00785, ko00550, ko00650, ko03320, ko04744, ko04745, ko00522, ko04612, ko04621, ko04620, ko04623, ko04622, ko04971, 
ko00460, ko04970, ko00830, ko00780, ko00511, ko00970, ko00030, ko00232, ko00230, ko04120, ko04350, ko00540, ko03022, ko03020, ko00982, ko04630, ko03010, ko05100, ko00331, ko05310, ko00908, ko04930, ko04320, ko03430, ko00906, ko00901, ko04520, ko00903, ko00471, ko00472, ko00473, ko04510, ko00942, ko04810, ko04210, ko00240, ko04012, ko04011, ko00944, ko04113, ko04640, ko04310, ko03420, ko04912, ko00670, ko04672, ko04920, ko05160, ko04144, ko00930, ko04112, ko04720, ko04722, ko04075
"""
I checked the metadata file and it looks OK to me: metadata$

Thank you,
Hi Alla,

I see you've resolved the first issue but are now encountering errors with pathway annotation and visualization. From the error messages, it seems the all-in-one ggpicrust2() function is running into problems. I suggest using our step-by-step pipeline instead, which gives you more control over each stage of the analysis. You can follow these steps:
kegg_abundance <- ko2kegg_abundance(file = abundance_file)

daa_results_df <- pathway_daa(abundance = kegg_abundance,
                              metadata = metadata,
                              group = "Sample type",
                              daa_method = "LinDA",
                              select.taxa = NULL,
                              reference = "Controls")

pathway_annotation_df <- pathway_annotation(pathway = "KO",
                                            daa_results_df = daa_results_df,
                                            ko_to_kegg = TRUE)

pathway_errorbar_plot <- pathway_errorbar(abundance = kegg_abundance,
                                          daa_results_df = daa_results_df,
                                          pathway_annotation_df = pathway_annotation_df,
                                          group = "Sample type",
                                          p_values_threshold = 0.05,
                                          order = "pathway_class",
                                          select_pathway = NULL,
                                          p_value_bar = TRUE,
                                          x_lab = "pathway_name")

This approach will help you identify exactly where the analysis is failing and give you more flexibility in adjusting parameters at each step. Let me know if you need any clarification or run into other issues.

Best regards,
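Since this issue also reports a `'ref' must be an existing level` error from `relevel()`, it may be worth confirming that the `reference` value really occurs in the group column before running the differential abundance step. The sketch below uses only base R. The toy metadata, its values, and the variable names are assumptions for illustration, not the data from this thread.

```r
# Hypothetical metadata; the column name and values are assumptions.
metadata <- data.frame(
  SampleID = c("S1", "S2", "S3", "S4"),
  `Sample type` = c("Control", "Control", "Treated", "Treated"),
  check.names = FALSE
)

group <- "Sample type"
reference <- "Controls"  # note the trailing "s": not a level in this toy data

# Trim whitespace so values such as "Controls " do not hide a match.
group_levels <- levels(factor(trimws(metadata[[group]])))
reference_ok <- reference %in% group_levels

if (!reference_ok) {
  message("Reference '", reference, "' not found. Available levels: ",
          paste(group_levels, collapse = ", "))
}
```

If the check fails, either the spelling of `reference` or the contents of the group column would need adjusting. A mismatch between sample names in the abundance table and the metadata could also leave the group column empty, which this sketch does not cover.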
Hello Chen,

I tried to follow your instructions and ran into the following problems:
"""
Error in pathway_daa(abundance = kegg_abundance, metadata = metadata, :
"""
Sample names extracted.
Starting pathway annotation...
The number of statistically significant pathways exceeds the database's query limit. Please consider breaking down the analysis into smaller queries or selecting a subset of pathways for further investigation.
Returning DAA results filtered annotation data frame...
"""
Starting pathway annotation...
The number of statistically significant pathways exceeds the database's query limit. Please consider breaking down the analysis into smaller queries or selecting a subset of pathways for further investigation.
Returning DAA results filtered annotation data frame...
"""

If I understand correctly, I have too many pathways in the data frame and need to reduce them. Also, in pathway_annotation_df the pathway_name, description, and other columns contain N/A, so I can't even show those names on a heatmap, for example. Is there a way to just take the pathway names and groups and visualise them against the sample names, without any statistics, for example by keeping only the top few pathways?

Thank you for your time
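One possible way to get a statistics-free overview is to rank pathways by mean relative abundance and plot only the top few against the samples. The following is a minimal base-R sketch under stated assumptions: the abundance table, its values, and the cutoff `top_n` are made up for illustration, and it does not use any ggpicrust2 function.

```r
# Hypothetical pathway-by-sample abundance table (rows = pathways); values are made up.
kegg_abundance <- data.frame(
  S1 = c(120, 5, 60, 0),
  S2 = c(100, 8, 40, 2),
  S3 = c(90, 2, 75, 1),
  row.names = c("ko00010", "ko00020", "ko00030", "ko00040")
)

top_n <- 2  # how many pathways to keep; an arbitrary choice for the example

# Convert each sample (column) to relative abundance so samples are comparable.
rel_abundance <- sweep(as.matrix(kegg_abundance), 2, colSums(kegg_abundance), "/")

# Rank pathways by mean relative abundance and keep the top_n.
ranked <- sort(rowMeans(rel_abundance), decreasing = TRUE)
top_pathways <- names(ranked)[seq_len(min(top_n, length(ranked)))]
top_table <- rel_abundance[top_pathways, , drop = FALSE]

# Base-R heatmap of pathways against samples, with no clustering or rescaling.
heatmap(top_table, Rowv = NA, Colv = NA, scale = "none", margins = c(6, 8))
```

The row labels here are pathway IDs. Replacing them with pathway names would need an annotation table that actually contains those names, which is the part that returned N/A in the output above.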
Hello,
I have the output file from an analysis of a 16S dataset. The PICRUSt2 plugin was used in QIIME 2, and I then converted the biom file to TSV format. Now I am trying to visualise the results and have run into some issues. I have attached a picture of the TSV table; I think something is wrong with it.
First I tried:
"""
library(readr)
library(ggpicrust2)
library(tibble)
library(tidyverse)
library(ggprism)
library(patchwork)

# Load necessary data: abundance data and metadata
abundance_file <- "/home/output/path_exported/ko_feature_table.biom.tsv"
metadata <- read_delim(
  "/home/sample-metadata.tsv",
  delim = "\t",
  escape_double = FALSE,
  trim_ws = TRUE
)

# Run ggpicrust2 with input file path
results_file_input <- ggpicrust2(file = abundance_file,
                                 metadata = metadata,
                                 group = "Sample type", # For example dataset, group = "Environment"
                                 reference = "Controls",
                                 pathway = "KO",
                                 daa_method = "LinDA",
                                 ko_to_kegg = TRUE,
                                 order = "pathway_class",
                                 p_values_bar = TRUE,
                                 x_lab = "pathway_name")

metadata$`Sample type` <- as.factor(metadata$`Sample type`)
levels(metadata$`Sample type`)
"""
I got the following errors:
"""
Starting the ggpicrust2 analysis...
Converting KO to KEGG...
Loading data from file...
Rows: 10556 Columns: 1
── Column specification ────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): # Constructed from biom file
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Loading KEGG reference data. This might take a while...
Performing KO to KEGG conversion. Please be patient, this might take a while...
|======================================================================================| 100%
KO to KEGG conversion completed. Time elapsed: 0.01 seconds.
Removing KEGG pathways with zero abundance across all samples...
KEGG abundance calculation completed successfully.
Performing pathway differential abundance analysis...
Sample names extracted.
Identifying matching columns in metadata...
Matching columns identified: #SampleID . This is important for ensuring data consistency.
Using all columns in abundance.
Converting abundance to a matrix...
Reordering metadata...
Converting metadata to a matrix and data frame...
Extracting group information...
Running LinDA analysis...
Error in relevel.factor(LinDA_metadata_df$Group_group_nonsense_, ref = reference) :
'ref' must be an existing level
In addition: Warning message:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)
"""
I am not sure what this is about.
When I checked with problems(), it reported no issues, yet the input file does not seem to be read, even though it exists and the path is written correctly:
"""
problems(metadata)
A tibble: 0 × 5
ℹ 5 variables: row, col, expected, actual, file
"""
metadata <- read_delim("/home/sample-metadata.tsv", delim = "\t", escape_double = FALSE, trim_ws = TRUE)
Rows: 24 Columns: 6
── Column specification ────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (4): #SampleID, Composition, Freezing time, Sample type
dbl (2): count_reads, Layers
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
"""
Then:
"""
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Loading KEGG reference data. This might take a while...
Performing KO to KEGG conversion. Please be patient, this might take a while...
|======================================================================================| 100%
KO to KEGG conversion completed. Time elapsed: 0.01 seconds.
Removing KEGG pathways with zero abundance across all samples...
KEGG abundance calculation completed successfully.
Warning message:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)
"""
Next I tried:
"""
ko_abundance_file <- "/home/output/path_exported/ko_feature_table.biom.tsv"
Loading data from file...
Rows: 10556 Columns: 1
── Column specification ────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): # Constructed from biom file
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Loading KEGG reference data. This might take a while...
Performing KO to KEGG conversion. Please be patient, this might take a while...
|======================================================================================| 100%
KO to KEGG conversion completed. Time elapsed: 0.01 seconds.
Removing KEGG pathways with zero abundance across all samples...
KEGG abundance calculation completed successfully.
Warning message:
One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)
"""
At this point I have an empty "kegg_abundance" variable in RStudio.
I think something is wrong with the input TSV table, but I can't work out what it is or how to fix it.
I would appreciate any help.
Thank you for your time.
Best,
Alla