algorithm logic needs tweaking #2

itang1 · 2020-06-26T11:24:35Z

Logic regarding the total_listened_time algorithm

Need to tweak a bit.

The formula is:

total_time = subregion_time + extra_time + makeup_time + surplus_time − silence_time − skip_time
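For reference, the formula can be written as a small function. This is only a sketch: the function name and keyword arguments are made up here to mirror the column names in the tables below, and all values are assumed to be in hours.

```python
def total_listen_time(subregion_time, extra_time=0.0, makeup_time=0.0,
                      surplus_time=0.0, silence_time=0.0, skip_time=0.0):
    """Combine the components (all in hours) exactly as in the formula."""
    return (subregion_time + extra_time + makeup_time + surplus_time
            - silence_time - skip_time)


# Values from the 21_11_sparse_code.cha row reported in this issue.
print(round(total_listen_time(3, makeup_time=0.98, skip_time=0.93), 2))  # 3.05
```

The formula itself is not the issue; the problems below are about which chunks end up in each term.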

The logic is currently:

subregion_time = total length of subregions that:
1. have annotations in them -- PROBLEMS 1 and 2
2. don't have makeup/surplus etc. inside them
extra_time = time between 'extra time' flags
makeup_time = time between 'make up' flags
silence_time = time between 'silence' flags, counted only when they fall within a subregion
1. if a silent region is not embedded in a subregion, it is not included in silence_time -- PROBLEM 3
skip_time = time between 'skip' flags
surplus_time = N/A (not considered at this time)

Problem 1: Some chunks get discounted twice

If a subregion has zero words coded (for whatever reason), its hour does NOT get added into subregion_time -- even if it was the top subregion and someone did listen to it in its entirety as intended.

And if that same subregion contains a skip or silence, those chunks get subtracted out again -- even though the algo already discounted that subregion.

Example: 21_11_sparse_code.cha

Original coders coded SR1, SR2, SR4, SR5
Turns out nearly all of SR4 was basically one giant skip
Original coders coded nearly all of SR3 as a make-up for that
So, total listened time is ~4hrs as intended
However, it is erroneously computed to be ~3hrs

Computed variables

filename	month	extra_time	makeup_time	surplus_time	silence_time	subregion_time	skip_time	end_time	total_listen_time	silence_raw	subregion_raw	num_raw_subregion
21_11_sparse_code.cha	11	0	0.98	0	0	3	0.93	10.43	3.05	1.5	5	5

Number of coded words in each subregion

SR1	SR2	SR3	SR4	SR5	total
20	38	44	0	24	126
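One possible way to avoid the double discount, sketched under assumptions: skips and silences are subtracted only where they overlap a subregion that was actually added into subregion_time. The interval representation (`(start, end)` tuples in hours), the function names, and the timestamps are all hypothetical and not taken from the actual script or file.

```python
def overlap(a, b):
    """Length of the overlap between two (start, end) intervals; 0 if disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def discounted_time(flag_intervals, counted_subregions):
    """Sum skip or silence time, but only the parts inside counted subregions.

    Assumes the counted subregions do not overlap one another.
    """
    return sum(overlap(f, sr) for f in flag_intervals for sr in counted_subregions)


# Hypothetical timestamps in hours (not taken from the real file).
counted = [(0.0, 1.0), (2.0, 3.0), (8.0, 9.0)]  # subregions added to subregion_time
skips = [(5.0, 5.75),   # inside an uncounted, zero-word subregion
         (2.25, 2.5)]   # inside a counted subregion

skip_time = discounted_time(skips, counted)
print(skip_time)  # 0.25
```

With this rule, the giant skip in SR4 of 21_11_sparse_code.cha would no longer be subtracted, since SR4 never contributed to subregion_time in the first place.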

Problem 2: Some chunks get added in twice

If a subregion has one or more words coded, an hour gets added into subregion_time -- even if the coded words were also part of a make-up chunk.

And if that same subregion contains a make-up chunk, that chunk gets added in again -- even though the algo already added it as part of subregion_time

Example: 32_13_sparse_code.cha

background
SR5 is basically entirely a make-up region:

SR5 begins
begin make up
blah blah
end make up
SR5 ends

the problem

Because SR5 (like all of the other SRs) contains coded words, subregion_time is 5 ---> therefore listened_time += 5
Then, because the script sees 1 hr worth of make-up time ---> therefore listened_time += 1
In effect, the algorithm accidentally adds the SR5/make-up chunk twice.

Computed variables

filename	month	extra_time	makeup_time	surplus_time	silence_time	subregion_time	skip_time	end_time	total_listen_time	silence_raw	subregion_raw	num_raw_subregion
32_13_sparse_code.cha	13	0.42	1	0	0.25	5	0.82	16	5.34	3.61	5	5

Number of coded words in each subregion

SR1	SR2	SR3	SR4	SR5	total
35	43	23	54	78	307
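One possible way to avoid the double add, again only a sketch: count make-up time only where it is not already covered by a subregion that went into subregion_time. The tuple representation, the function names, and the timestamps are hypothetical.

```python
def overlap(a, b):
    """Length of the overlap between two (start, end) intervals; 0 if disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def makeup_outside(makeup_intervals, counted_subregions):
    """Make-up time that is not already covered by a counted subregion.

    Assumes the counted subregions do not overlap one another.
    """
    total = 0.0
    for m in makeup_intervals:
        covered = sum(overlap(m, sr) for sr in counted_subregions)
        total += (m[1] - m[0]) - covered
    return total


# Hypothetical timestamps in hours (not taken from the real file).
counted = [(0.0, 1.0), (3.0, 4.0), (6.0, 7.0), (9.0, 10.0), (12.0, 13.0)]
makeup = [(12.0, 13.0)]  # the last subregion is wrapped entirely in make-up flags

subregion_time = sum(end - start for start, end in counted)
makeup_time = makeup_outside(makeup, counted)
print(subregion_time, makeup_time)  # 5.0 0.0
```

The alternative is to enforce rule 2 strictly and drop SR5 from subregion_time while keeping the full make-up hour. Either way the chunk should be counted once.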

(Maybe) Problem 3: Silences in irrelevant subregions get added in (?)

For example, consider that SR5 is the lowest-ranked, and it is not needed for any sort of make-up or extra time.
Consider also that SR5 contains a silence.
Does that silence get added in to silence_time, even though SR5 is irrelevant? If so, can the script be changed so that silent chunks are counted if and only if they fall within a RELEVANT subregion, and not just ANY subregion?
(This problem may or may not exist. That is, the algorithm may already be doing this; I did not finish investigating.)
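To check whether this problem exists, a per-subregion breakdown of silence time might help. The sketch below is hypothetical: the function name, the dict/tuple representation, and the timestamps are invented, and which subregions count as "relevant" is assumed to be decided elsewhere.

```python
def silence_by_subregion(silences, subregions):
    """Map each subregion name to the silence time (hours) that falls inside it."""
    result = {}
    for name, (start, end) in subregions.items():
        result[name] = sum(
            max(0.0, min(end, s_end) - max(start, s_start))
            for s_start, s_end in silences
        )
    return result


# Hypothetical timestamps in hours (not taken from a real file).
subregions = {"SR1": (0.0, 1.0), "SR5": (9.0, 10.0)}
silences = [(0.25, 0.5), (9.5, 10.0)]
relevant = {"SR1"}  # assumption: SR5 was not needed for make-up or extra time

per_sr = silence_by_subregion(silences, subregions)
relevant_silence = sum(t for name, t in per_sr.items() if name in relevant)
print(per_sr)            # {'SR1': 0.25, 'SR5': 0.5}
print(relevant_silence)  # 0.25
```

If the script's silence_time for a file matches the sum over all subregions rather than the relevant-only sum, then the problem is real.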


itang1 assigned sarpu Jun 26, 2020
