Topic/repetitive measurements2 #5233

Open: wants to merge 112 commits into base: master

Commits (112):
4a9d453
retrieve more term props; refactor into get_term_props function.
lukasmueller Feb 27, 2024
9cf7ba3
add some comments how the system should work with multiple phenotypic…
lukasmueller Feb 28, 2024
a8cf000
add trait_repeat_type to the load_trait_props.pl script and its backend.
lukasmueller Mar 1, 2024
a8d6236
adjust storing logic to trait_repeat_type parameter.
lukasmueller Mar 1, 2024
343188c
fix some issues with variable declarations.
lukasmueller Mar 1, 2024
b441938
fix issue with query parameters.
lukasmueller Mar 3, 2024
a03f71f
fix typo in variable name
lukasmueller Mar 5, 2024
21c7f4d
add support for multiple measurements.
lukasmueller Mar 5, 2024
b57d07b
add support for multiple observations.
lukasmueller Mar 5, 2024
ad1be34
fix typo in variable name.
lukasmueller Mar 5, 2024
3570705
implement multiple values features.
lukasmueller Mar 8, 2024
66132e9
add support for multiple measurements for native search in phenotype …
lukasmueller Mar 13, 2024
aab1ed3
comment out debug info.
lukasmueller Mar 13, 2024
479501f
add missing Perl module CXGN::Trial.
lukasmueller Mar 13, 2024
4d4dde8
remove debug messages.
lukasmueller Mar 15, 2024
7277658
Merge branch 'master' into topic/repetitive_measurements
lukasmueller Mar 25, 2024
24b7f2a
add phenotype matrix test.
lukasmueller Mar 29, 2024
1a97caf
add trait_repeat_type to trait properties add function.
lukasmueller Mar 31, 2024
822dcc8
Merge branch 'master' into topic/repetitive_measurements
lukasmueller Mar 31, 2024
d7b849b
add repetitive_measurements key and tweak pod.
lukasmueller Apr 1, 2024
30260e0
prevent errors when there are no observations.
lukasmueller Apr 1, 2024
ceb1fc7
fix typo in assignment.
lukasmueller Apr 1, 2024
bab60c9
add new accessor for repetitive measurements.
lukasmueller Apr 1, 2024
917c998
rename multiple_observations_treatment to repetitive_measurements; ad…
lukasmueller Apr 1, 2024
4e79541
add trait repeat prop cvterm and transfer timestamps to correct field.
lukasmueller Apr 2, 2024
4b3a0b5
add time based queries to phenotype download in the wizard
lukasmueller May 23, 2024
25a058d
add date related queries to downloads.
lukasmueller May 24, 2024
e5d96fc
resolve conflicts.
lukasmueller Nov 1, 2024
6bc876b
add separate section for repetitve measurements in trial detail page
Sri-2023 Nov 1, 2024
07752bb
fix the ID names; and fix indentation in repetitive_measurement options
Sri-2023 Nov 1, 2024
f74c9d4
add value attributes to reptitive values in search page
Sri-2023 Nov 1, 2024
ff1fb0c
Fix ID names for repetitive options in download phenotypes dialog
Sri-2023 Nov 1, 2024
20f9d1e
swap 'phenotype_raw_data' and 'phenotype_summary_statistics' sections…
Sri-2023 Nov 1, 2024
c5d93d5
add date range option and slider to phenotype summary stats section t…
Sri-2023 Nov 2, 2024
6c43e6c
add start_date, and end_date params to the trait_pheno and pheno_summ…
Sri-2023 Nov 2, 2024
cc88f3b
add trial_collect_date_range endpoint to retrieve trial start and end…
Sri-2023 Nov 3, 2024
66c6264
add start_date and end_date to search query
Sri-2023 Nov 3, 2024
0da16d5
refactor to use the consistent trial_id
Sri-2023 Nov 3, 2024
0dc88f9
add small line graph to see trend, and zoomable large graph to view
Sri-2023 Nov 4, 2024
57b9872
fix typo
Sri-2023 Nov 4, 2024
1eaf2e0
add title to line_graph
Sri-2023 Nov 4, 2024
afb829f
correct repetitive_params in download_phenotypes_action
Sri-2023 Nov 4, 2024
bdad8ea
add seperate section to show # of phenotypes recorded in a trial; and…
Sri-2023 Nov 4, 2024
fbcff9c
Merge remote-tracking branch 'origin/master' into topic/repetitive_me…
Sri-2023 Nov 4, 2024
539e153
add 'sum' option to the repetitive_values
Sri-2023 Nov 4, 2024
5dae445
handle sum values
Sri-2023 Nov 4, 2024
dc9442a
comment out alter messages
Sri-2023 Nov 5, 2024
5ebe930
two options to retrieve all values: single line and multi line
Sri-2023 Nov 5, 2024
caf981f
fix BrApi phenotyping test
Sri-2023 Nov 5, 2024
2c79bd3
parse the time-series repetitive values
Sri-2023 Nov 5, 2024
4a7ffd9
handle collect_date in Native search
Sri-2023 Nov 5, 2024
5da8d4b
add package for storing phenotype additional info and references
Sri-2023 Nov 5, 2024
4f0a173
add accessors to max, min, duplicates, value_count
Sri-2023 Nov 5, 2024
4fd9b51
modify POD
Sri-2023 Nov 7, 2024
7f863de
missing true value
Sri-2023 Nov 7, 2024
42e1f9b
add date params get_phennotypes_for_trait function
Sri-2023 Nov 8, 2024
bedbe71
refactor phenotypes in-progress
Sri-2023 Nov 8, 2024
15771e4
add the sum_observations method repetitive_values; fix indent
Sri-2023 Nov 11, 2024
07f0303
fix phenotype_multi_categories test
Sri-2023 Nov 11, 2024
23cfc93
update PhenotypeMatrix test with all_values_single_line and sum params
Sri-2023 Nov 11, 2024
0b90a30
refactor phenotypes to store the multiple measurement values
Sri-2023 Nov 11, 2024
6908c3e
fix the BrAPI phenotying test
Sri-2023 Nov 12, 2024
c8d4183
refactor phenotypes
Sri-2023 Nov 12, 2024
bc189ff
Merge remote-tracking branch 'origin/master' into topic/repetitive_me…
Sri-2023 Nov 12, 2024
8d46891
fix ValidateNIRS test.
lukasmueller Nov 12, 2024
898bfbd
exlude notes from trait check.
lukasmueller Nov 13, 2024
2b5da46
update the default end date to the current date
Sri-2023 Nov 13, 2024
4571f3c
fix indent
Sri-2023 Nov 13, 2024
24c4eb4
exclude notes from normal phenotype storage.
lukasmueller Nov 13, 2024
dd9c667
make test labels distinct.
lukasmueller Nov 13, 2024
ced594f
fix a test, one more to go.
lukasmueller Nov 14, 2024
2b3c491
fix typo (oops) and fix last test.
lukasmueller Nov 14, 2024
83c8c59
remove duplicated repeat_type
Sri-2023 Nov 15, 2024
77f5907
filter the values based on the timestamp
Sri-2023 Nov 15, 2024
2488cf9
make the values of end_date inclusive, when they select the last date…
Sri-2023 Nov 19, 2024
61557ec
comment the sorting values by date from line_graph package
Sri-2023 Nov 19, 2024
a169caf
sort observations by date; and use latest(last) date as timestamp for…
Sri-2023 Nov 21, 2024
8899f08
Merge remote-tracking branch 'origin/master' into topic/repetitive_me…
Sri-2023 Nov 21, 2024
da35fb7
add timestamp check for the avg and sum options
Sri-2023 Nov 22, 2024
ac38f76
Merge remote-tracking branch 'origin/master' into topic/repetitive_me…
Sri-2023 Nov 22, 2024
23f68c2
modify the phenotype_matrix function to retrieve reptitive values in …
Sri-2023 Dec 3, 2024
3809678
modify note field to store multiple entries for repetitive_measurements
Sri-2023 Dec 3, 2024
ac590c9
typo
Sri-2023 Dec 3, 2024
5e90e90
use clean_up_db to clean up some tests; in SGN::Test::Fixture, also c…
lukasmueller Dec 3, 2024
4efa217
clean up db after some tests and also re-run matviews so they are cor…
lukasmueller Dec 4, 2024
600276b
try to fix tests.
lukasmueller Dec 4, 2024
d092c16
hmmm... changing the order of the values may not help in the long run...
lukasmueller Dec 6, 2024
10b6046
more sustainable approach is to sort the returned data before checking.
lukasmueller Dec 6, 2024
31d80c3
more attempts to make tests pass...
lukasmueller Dec 6, 2024
d5ea2c8
Merge branch 'master' into topic/repetitive_measurements2
lukasmueller Dec 6, 2024
a6f8ba0
fix upload_phenotype tests to handle potential delays
Sri-2023 Dec 12, 2024
46ff305
add strict and warnings options
Sri-2023 Dec 12, 2024
ed4cc06
add ability to load multiple phenotypic values by pushing on an array…
Dec 12, 2024
4479cc8
Merge remote-tracking branch 'origin/master' into topic/repetitive_me…
Sri-2023 Dec 12, 2024
2a6aea6
Merge remote-tracking branch 'origin/topic/repetitive_measurements2' …
Sri-2023 Dec 12, 2024
666d1bb
fix lint issues
Sri-2023 Dec 12, 2024
a32aa30
make Phenotype_with_multi_categories.t test pass more reliably by che…
lukasmueller Dec 12, 2024
487986a
fix all linting issues
Sri-2023 Dec 13, 2024
55dca30
update branch
Sri-2023 Dec 13, 2024
66076d9
fix this last lint issue
Sri-2023 Dec 13, 2024
39e79b9
update the manual with repetitive_measurements
Sri-2023 Dec 16, 2024
406823c
update bookdown docs
invalid-email-address Dec 16, 2024
ad285fc
make trait values dynamic as trait option changes
Sri-2023 Dec 17, 2024
1356040
update bookdown docs
invalid-email-address Dec 17, 2024
55af6b1
requested to make a defualt option - as Average on UI under repetitiv…
Sri-2023 Dec 17, 2024
47ebabf
typo in passing the type param
Sri-2023 Dec 17, 2024
4bb21d0
update bookdown docs
invalid-email-address Dec 17, 2024
873a49c
add a conversion file for ncsu excel format.
lukasmueller Dec 22, 2024
7fdb481
update bookdown docs
invalid-email-address Dec 22, 2024
66e8f49
Merge remote-tracking branch 'origin/master' into topic/repetitive_me…
Sri-2023 Jan 9, 2025
539f65f
Merge remote-tracking branch 'origin/topic/repetitive_measurements2' …
Sri-2023 Jan 9, 2025
ad65532
update bookdown docs
invalid-email-address Jan 9, 2025
336 changes: 336 additions & 0 deletions bin/convert_ncsu_excel_to_obo.pl
@@ -0,0 +1,336 @@

=head1 NAME

convert_ncsu_excel_to_obo.pl - a script to convert a spreadsheet-based representation of an ontology to the OBO file format

=head1 DESCRIPTION

Based on CXGN::File::Parse, this script can parse tab-delimited or Excel formats (xls or xlsx) as follows:

perl convert_ncsu_excel_to_obo.pl -n CO_999 -i file.xlsx -o ontology.obo

=head1 AUTHOR

Lukas Mueller <[email protected]>

October 2024

=cut

use strict;
use warnings;

use utf8;
use Getopt::Std;
use Data::Dumper;
use CXGN::File::Parse;

our ($opt_n, $opt_i, $opt_o, $opt_h);

getopts('n:i:o:h');

my $file = $opt_i;
my $ontology_name = $opt_n || "GENERIC";

if (!$file) {
die "Please provide a file using the -i parameter.";
}

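# honor the documented -o option; fall back to <input>.obo when it is not given
my $outfile = $opt_o || $file.".obo";
my $cvpropfile = $file.".props";

open(my $F, ">", $outfile) || die "Can't open file $outfile for writing: $!\n";
open(my $G, ">", $cvpropfile) || die "Can't open cvprop file $cvpropfile for writing: $!\n";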

#Curation Variable ID Variable name Variable label Variable description Variable synonyms Context of use Growth stage Variable status Variable Xref Institution Scientist Date Language Crop Trait ID Entity Attribute Trait name Trait class Trait description Trait synonyms Main trait abbreviation Alternative trait abbreviations Trait status Trait Xref Method ID Method name Method class Method description Method Abbreviation Formula Method reference Scale ID Scale name Scale Abbreviation Scale class Scale Xref Cat 1 code Cat 1 description Cat 2 code Cat 2 description Cat 3 code Cat 3 description Cat 4 code Cat 4 description Cat 5 code Cat 5 description Cat 6 code Cat 6 description Cat 7 code Cat 7 description Cat 8 code Cat 8 description Cat 9 code Cat 9 description Cat 10 code Cat 10 description

my @col_headers = ("Variable"," Term Name - BB", "Trait class", "Term Definition", "Variable Full Name", "Synonyms", "Trait - CO", "Main trait abbreviation", "Entity", "Attribute", "Method Name", "Method class", "Method description", "Method Abbreviation", "Formula", "Scale name", "Scale abbreviation", "Scale class", "Category 1", "Category 2", "Category 3", "Category 4", "Category 5", "Category 6", "Category 7", "Category 8", "Category 9", "Category 10", "Category 11", "Category 12" );

# column labels
#
my $trait_class = "Trait class";
my $trait_name = "Trait name";
my $trait_definition = "Trait description";
my $trait_synonyms = "Trait synonyms";
my $variable_synonyms = "Variable synonyms";
my $trait_id = "Trait ID";
my $variable_name = "Variable name";
my $variable_definition = "Variable description";
my $variable_label = "Variable label";
my $variable_id = "Variable ID";
my $method_id = "Method ID";
my $method_name = "Method name";
my $method_class = "Method class";
my $method_description = "Method description";
my $scale_id = "Scale ID";
my $scale_name = "Scale name";
my $scale_class = "Scale class";
my $scale_description = "Scale description";
my $scale_abbreviation = "Scale abbreviation";
my $entity = "Entity";
my $attribute = "Attribute";
my $categories = "Categories";
my $class_id = "Class ID";
my $class_name = "Class name";

my $parser = CXGN::File::Parse->new( file => $file );

my $parsed = $parser->parse();

if ($parsed->{errors}) {
warn "The following errors occurred while parsing file $file: ".Dumper($parsed->{errors})."\n";
}

my $data = $parsed->{data};

# get all the trait classes
#
my %trait_classes;
my %traits;
my %variables;

foreach my $d (@$data) {
$trait_classes{$d->{$trait_class}}->{count}++;
}
print STDERR "TRAIT CLASSES: ".Dumper(\%trait_classes);

foreach my $d (@$data) {
my $tn = $d->{$trait_name};
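# report the parsed trait name ($tn), not the column label ($trait_name)
print STDERR "Parsing TRAIT NAME $tn\n";
if (! $tn) { next; }
$traits{$tn}->{$trait_id} = $d->{$trait_id};
$traits{$tn}->{$trait_class} = $d->{$trait_class};

print STDERR "TRAIT NAME $tn has TRAIT CLASS $d->{$trait_class}\n";

$traits{$tn}->{$trait_definition} = $d->{$trait_definition};
# keep the synonyms as well; they are passed to format_trait() later
$traits{$tn}->{$trait_synonyms} = $d->{$trait_synonyms};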
}

print STDERR "TRAITS: ".Dumper(\%traits);



foreach my $d (@$data) {
my $vn = $d->{$variable_name};
if (! $vn) { next; }
$variables{$vn}->{$variable_id} = $d->{$variable_id};
$variables{$vn}->{$variable_synonyms} = $d->{$variable_synonyms};
$variables{$vn}->{$trait_name} = $d->{$trait_name};
$variables{$vn}->{$trait_definition} = $d->{$trait_definition};
$variables{$vn}->{$entity} = $d->{$entity};
$variables{$vn}->{$attribute} = $d->{$attribute};
$variables{$vn}->{$method_name} = $d->{$method_name};
$variables{$vn}->{$scale_abbreviation} = $d->{$scale_abbreviation};
$variables{$vn}->{$variable_label} = $d->{$variable_label};
$variables{$vn}->{$scale_name} = $d->{$scale_name};
$variables{$vn}->{$scale_class} = $d->{$scale_class};
$variables{$vn}->{$categories} = $d->{$categories};
print STDERR "TRAIT NAME in variable $vn = $d->{$trait_name}\n";
# the trait name is already stored above; also keep the variable description
$variables{$vn}->{$variable_definition} = $d->{$variable_definition};
}
print STDERR "VARIABLES: ".Dumper(\%variables);


my $count = 0; # numeric accession counter; the root term gets accession 0
my $root_id = format_ontology_id($ontology_name, $count);
my $acc = sprintf "%07d", $count; # the number after the ontology name and a colon

print STDERR "Starting at term $ontology_name:$acc ...\n";

# write obo header
#
print $F <<HEADER;
format-version: 1.2
date: 10:03:2024 17:10
saved-by: Lukas_Mueller
default-namespace: $ontology_name
ontology: CO_365

HEADER

# write cvprops header
#
print $G join("\t", "trait_name", "trait_format", "trait_default_value", "trait_minimum", "trait_maximum", "trait_categories", "trait_details")."\n";


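# note: the spreadsheet header row is handled by CXGN::File::Parse;
# $F is opened for writing only, so nothing is read from it here
#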

my $root_acc = $acc;
my $root_name = "ROOT";

print $F <<TERM;

[Term]
id: $ontology_name:$acc
name: ROOT
namespace: $ontology_name

TERM


$count++;

foreach my $k (sort keys %trait_classes) {

print $F format_trait(
$ontology_name,
$count, # accession assigned to this trait class
$k, # trait class name
$k, # the class name doubles as its description
undef,
0, # parent id: the ROOT term
$root_name,
)."\n";

$trait_classes{$k}->{acc} = $count;
#$trait_classes{$k}->{name} = $k;

$count++;

}

foreach my $k (sort keys %traits) {
print $F format_trait(
$ontology_name,
$traits{$k}->{$trait_id},
$k, # trait name (the key of %traits)
$traits{$k}->{$trait_definition},
$traits{$k}->{$trait_synonyms},
$trait_classes{ $traits{$k}->{$trait_class} }->{acc}, # parent id
$traits{$k}->{$trait_class}, # parent trait
)."\n";

$traits{$k}->{name} = $traits{$k}->{$trait_name};
$traits{$k}->{acc} = $traits{$k}->{$trait_id};
$count++;
}



foreach my $k (sort keys %variables) {

my $parent_trait = $variables{$k}->{$trait_name};
# look up the parent via the stored trait name, not the old 'Trait - CO' column
my $parent_trait_id = $traits{$parent_trait}->{acc};
my $parent_trait_name = $traits{$parent_trait}->{name};

print STDERR "VARIABLE: $k. PARENT TRAIT: $parent_trait\n";

print $F format_variable(
$ontology_name,
$count,
$k, ###$variables{$k}->{'Variable Full Name'},
$variables{$k}->{$variable_definition},
$variables{$k}->{$variable_synonyms},
$parent_trait_id, # parent trait id
$parent_trait_name, # parent trait

)."\n";

print $G format_props(
$k, # variable name
$ontology_name,
$count,
$variables{$k}->{'Scale class'},
$variables{$k}->{Categories},
);

$count++;

}

close($F);
close($G);

print STDERR "Script completed.\n";

sub format_props {
my $trait_name = shift;
my $ontology_name = shift;
my $count = shift;
my $trait_format = shift;
my $categories = shift;

my $trait_default_value = shift;
my $trait_minimum = shift;
my $trait_maximum = shift;
my $trait_details = shift;

return join ("\t", $trait_name."|".format_ontology_id($ontology_name, $count), $trait_format, $trait_default_value, $trait_minimum, $trait_maximum, $categories, $trait_details)."\n";


}


sub format_ontology_id {
my $ontology_name = shift;
my $acc = shift;

return $ontology_name.":".sprintf "%07d", $acc;
}

sub format_trait {
my $ontology_code = shift;
my $id = shift;
my $name = shift;
my $description = shift;
my $synonyms = shift;
my $parent_class_id = shift;
my $parent_trait = shift;

my $trait_id = format_ontology_id($ontology_code, $id);
my $parent_trait_id = format_ontology_id($ontology_code, $parent_class_id);

my %record = (
"[Term]" => "",
"id:" => $trait_id,
"name:" => $name,
"def:" => "\"$description\" []",
"synonym:" => $synonyms,
"namespace:" => $ontology_code, # use the argument, not the file-scoped $ontology_name
"is_a:" => "$parent_trait_id ! $parent_trait",
);

my $data = "";
foreach my $k ("[Term]", "id:", "name:", "def:", "synonym:", "namespace:", "is_a:") {
if (defined($record{$k})) {
$data .= "$k $record{$k}\n";
}
}

return $data;
}


sub format_variable {
my $ontology_code = shift;
my $id = shift;
my $name = shift;
my $description = shift;
my $synonyms = shift;
my $parent_trait_id = shift;
my $parent_trait_name = shift;

#print STDERR "Parent trait name: $parent_trait_name\n";

my $variable_id = format_ontology_id($ontology_code, $id);
# use a new name instead of redeclaring $parent_trait_id in the same scope
my $parent_id = format_ontology_id($ontology_code, $parent_trait_id);
my %record = (
"[Term]" => "",
"id:" => $variable_id,
"name:" => $name,
"def:"=> "\"$description\" []",
"synonym:" => $synonyms,
"namespace:" => $ontology_code,
"relationship:" => "variable_of $parent_id ! $parent_trait_name",
);

my $data = "";
foreach my $k ("[Term]", "id:", "name:", "def:", "synonym:", "namespace:", "relationship:") {
if (defined($record{$k})) {
$data .= "$k $record{$k}\n";
}
}

return $data;
}
7 changes: 5 additions & 2 deletions bin/load_trait_props.pl
@@ -28,18 +28,21 @@ =head2 DESCRIPTION
trait_maximum
trait_categories
trait_details
trait_repeat_type

trait_name: the name of the variable in human-readable form (e.g., "plant height in cm")
trait_format: can be numeric, qualitative, date or boolean
trait_default_value: is the value if no value is given
trait_categories: are the different possible names of the categories, separated by /, for example "1/2/3/4/5"
trait_details: string describing the trait categories
trait_repeat_type: one of 'single', 'multiple', 'time_series'

=head2 AUTHOR

Jeremy D. Edwards ([email protected])
Jeremy D. Edwards ([email protected]) - initial script, April 2014
Lukas Mueller ([email protected]) - added trait_repeat_type, Feb 2024


April 2014

=head2 TODO
