2016.01.22
Docs for PeakSegJointError.
2015.11.03
Step6 outputs PeakSegJoint-predictions-viz/index.html, which shows peak
predictions with links to the UCSC genome browser. If an Input sample
group is present, then we make a faceted scatterplot of Input versus
other groups. If no Input sample group is present, then we make a bar
plot which shows only the other groups. If there are labels for Input
samples, then we only report "specific" peaks in the output bed
files. Specific peaks are those present in at most a threshold number
of Input samples (threshold picked by minimizing the number of
incorrect regions where Input samples are labeled as either up or
down).
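The threshold choice for "specific" peaks can be sketched as follows.
This is an illustrative Python sketch, not the package's R code. It
assumes that an "up" label means a peak is expected in Input (so the
region should be non-specific), that a "down" label means no Input peak
is expected, and that a peak is specific when the number of Input
samples with a peak is at most the threshold. The function name and the
input format are hypothetical.

```python
def pick_input_threshold(regions):
    """Pick the Input-count threshold with the fewest incorrect regions.

    regions: list of (n_input_with_peak, label) pairs, where label is
    "up" or "down" (assumed meaning: peak expected / not expected in
    Input). Returns (threshold, n_incorrect).
    """
    max_count = max(n for n, _ in regions)
    best = None
    for thresh in range(max_count + 1):
        errors = 0
        for n_input, label in regions:
            # Assumption: specific means few enough Input samples have a peak.
            predicted_specific = n_input <= thresh
            if label == "up" and predicted_specific:
                errors += 1
            elif label == "down" and not predicted_specific:
                errors += 1
        # Keep the smallest threshold among ties.
        if best is None or errors < best[1]:
            best = (thresh, errors)
    return best


# Toy labeled regions: (number of Input samples with a peak, label).
regions = [(0, "down"), (1, "down"), (3, "up"), (4, "up"), (2, "up")]
threshold, n_incorrect = pick_input_threshold(regions)
```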
2015.09.21
Run test error estimation in parallel on chunk.order.seed and
test.fold.
2015.09.15
Plot test fold distribution across chromosomes on test error summary
page.
peak.or.null returns a model with 1 peak (not S peaks, as before).
Bugfix for the train error animint on data sets with many (>100)
samples.
2015.09.14
exec/Step*.R scripts re-organized so that the genome prediction
problem may be split into any number of jobs, rather than just one job
per chromosome. The number of jobs can be specified as the environment
variable JOBS or the R variable n.jobs when running
00_AllSteps_qsub.R.
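Splitting the prediction problems into a user-chosen number of jobs
could look like the sketch below. It is illustrative Python, not the
package's R scripts. The round-robin assignment, the function names, and
the default of 1 job are assumptions; only the JOBS environment variable
name comes from the entry above.

```python
import os


def n_jobs_from_env(default=1):
    """Read the number of jobs from the JOBS environment variable."""
    return int(os.environ.get("JOBS", default))


def assign_jobs(problem_names, n_jobs):
    """Distribute problems over n_jobs lists, round-robin (an assumption)."""
    jobs = [[] for _ in range(n_jobs)]
    for i, name in enumerate(problem_names):
        jobs[i % n_jobs].append(name)
    return jobs


# Five hypothetical genomic problems split into two jobs.
problems = ["chr1:0-100", "chr1:100-200", "chr2:0-100", "chr2:100-200", "chrX:0-100"]
jobs = assign_jobs(problems, 2)
```

With this scheme the number of jobs is independent of the number of
chromosomes, which is the point of the change.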
readBigWig is faster since we now use stdout + fread, rather than
writing bedGraph to an intermediate file.
2015.09.11
Refactor much of the code so that we can handle an individual that
has samples for more than one cell type. The inst/examples/ data set
now includes bcell/McGill0322.bigwig and tcell/McGill0322.bigwig to
simulate this case (although these two samples do not in fact come
from the same individual). "bcell" and "tcell" are now called the
sample.group rather than the cell.type, since sometimes, e.g. when
analyzing a set of H3K27ac samples, we want to include an "Input"
negative control sample group (which is not a cell type but a
different experiment type).
Use facet_grid(sample.group + sample.id ~ .) to show the sample group
in the data viz output.
Bugfix for test error computation in the output generated by
exec/Step3e-estimate-test-error.R.
IntervalRegressionProblems(factor.regularization=NULL) means compute
just one model with initial.regularization (rather than a sequence of
increasingly more regularized models). This is useful after CV has
been used to choose the best regularization parameter, and we just
want to fit the model with that parameter to the entire
train+validation set.
2015.09.04
In pipeline, after training, for every test fold, plot test error as a
function of number of training chunks, in order to see if the model
could be improved by adding more labels.
Use maxjobs.mclapply, a thin wrapper around mclapply which limits its
memory usage, and avoids getting jobs killed on the cluster.
2015.08.06
Bugfix for feature computation when one or more samples have zero
coverage data.
2015.08.05
Support for data sets with two or more label files.
Support for making predictions on samples with no labels at all --
these samples should be used in both the fitting and prediction steps.
2015.07.30
test-qsub-pipeline.R makes sure the exec/*.R pipeline scripts run with
no errors.
PeakSegJoint.c changed so that the likelihood function is defined from
the first base with data to the last base with data, on any sample
(before, it required data on ALL samples, which is problematic now
that we assume the input is a sparse profile).
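The difference between the two ranges can be sketched in Python
(illustrative only, not the C code in PeakSegJoint.c). The sketch
assumes "ALL samples" meant the intersection of the per-sample data
ranges and "any sample" means their union. Function names and the input
format are hypothetical.

```python
def sample_range(rows):
    """First and last base with data for one sample.

    rows: list of (chromStart, chromEnd, count) tuples.
    """
    return min(s for s, _, _ in rows), max(e for _, e, _ in rows)


def range_any(samples):
    """Range covered by data on at least one sample (union)."""
    ranges = [sample_range(rows) for rows in samples.values()]
    return min(s for s, _ in ranges), max(e for _, e in ranges)


def range_all(samples):
    """Range covered by data on every sample (intersection, old behavior
    under the assumption stated above)."""
    ranges = [sample_range(rows) for rows in samples.values()]
    return max(s for s, _ in ranges), min(e for _, e in ranges)


# Two sparse toy profiles whose first and last data rows differ.
samples = {
    "A": [(10, 20, 3), (30, 40, 1)],
    "B": [(15, 25, 2), (50, 60, 4)],
}
new_range = range_any(samples)
old_range = range_all(samples)
```

With sparse profiles the intersection can shrink arbitrarily, or even
become empty, whenever one sample happens to have zero coverage near
the edges.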
2015.07.14
Remove "read one sample coverage file and save the coverage profile
for each labeled chunk" step from exec/ pipeline, since reading
coverage from bigwig files is now fast enough to make it unnecessary.
2015.06.19
To support sparse bigwig data (created with -bg rather than -bga
option to coverageBed, resulting in no rows with 0 coverage), binSum
no longer returns an error code when chromEnd[i] !=
chromStart[i+1]. Instead, gaps in the coverage profile are properly
treated as bases with 0 coverage.
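The gap handling can be illustrated with a small Python sketch. It is
not the C implementation of binSum and does not use its signature. The
function name and arguments are hypothetical. Bases not covered by any
row simply contribute nothing to a bin total, which is equivalent to
treating them as 0 coverage.

```python
def bin_sum_sparse(chrom_start, chrom_end, count, bin_start, bin_size, n_bins):
    """Total coverage in each of n_bins consecutive bins.

    Rows are half-open intervals [chromStart, chromEnd) with a constant
    count. Gaps between rows are allowed and count as 0 coverage.
    """
    totals = [0] * n_bins
    for start, end, value in zip(chrom_start, chrom_end, count):
        for b in range(n_bins):
            lo = bin_start + b * bin_size
            hi = lo + bin_size
            overlap = min(end, hi) - max(start, lo)
            if overlap > 0:
                totals[b] += value * overlap
    return totals


# Sparse profile: bases 3 and 4 have no row (chromEnd[0] != chromStart[1]).
sparse = bin_sum_sparse([0, 5], [3, 8], [2, 1], 0, 4, 2)
# Dense profile: the same data with an explicit zero-coverage row.
dense = bin_sum_sparse([0, 3, 5], [3, 5, 8], [2, 0, 1], 0, 4, 2)
```

Both calls give the same bin totals, so profiles created with -bg and
with -bga are handled identically in this sketch.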
2015.05.22
ConvertModelList returns modelSelection.
2015.05.15
peak1.infeasible data set.
Step3 looks for a peak in the same place as previous model, but does
not enforce the seg1 < seg2 > seg3 constraint.
ConvertModelList returns segments for model with 0 peaks.
PeakSegJointSeveral function for running the C solver using several
suboptimality parameters, and taking the model with the min Poisson
loss for each model size.
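The selection rule used by PeakSegJointSeveral, as described above, can
be sketched in Python (illustrative only; the real function calls the C
solver). The input format, the function name, and the toy loss values
are hypothetical.

```python
def best_per_size(runs):
    """Keep the min Poisson loss model for each model size.

    runs: dict mapping a suboptimality parameter to a dict
    {n_peaks: poisson_loss} returned by one solver run.
    Returns {n_peaks: (parameter, loss)}; ties keep the first run seen.
    """
    best = {}
    for param, losses in runs.items():
        for n_peaks, loss in losses.items():
            if n_peaks not in best or loss < best[n_peaks][1]:
                best[n_peaks] = (param, loss)
    return best


# Toy losses from two runs with different suboptimality parameters.
runs = {
    2: {0: 100.0, 1: 80.0, 2: 75.0},
    3: {0: 100.0, 1: 78.5, 2: 76.0},
}
selected = best_per_size(runs)
```

No single parameter is best for every model size here, which is why the
minimum is taken separately for each size.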
Rename seg_start_end to bin_start_end.
2015.05.14
Real data sets where the buggy heuristic does not recover visually
optimal segmentations.
Bugfix for the heuristic optimization in cases where a solution with
non-zero index (which writes a new cumsum vec) comes before a better
solution with zero index (which does not write a new cumsum vec, so
the solver was stuck with the old cumsum vec).
2015.05.05
Squared hinge loss FISTA implementation.
2015.04.17
Bugfixes for C memory issues.
2015.04.14
Fast C implementation of PeakSegJointHeuristic.
2015.04.02
binSum, multi* copied from tdhock/PeakSegDP.