algorithm logic needs tweaking #2

Open
itang1 opened this issue Jun 26, 2020 · 0 comments

Logic regarding the total_listened_time algorithm

The current logic needs some tweaking. Two confirmed problems and one possible problem are described below.

The formula is:

total_time = subregion_time + extra_time + makeup_time + surplus_time − silence_time − skip_time

The logic is currently:

  • subregion_time = total length of the subregions that:
    1. have annotations in them -- PROBLEMS 1 and 2
    2. don't have make-up, surplus, etc. chunks inside them
  • extra_time = time between 'extra time' flags
  • makeup_time = time between 'make up' flags
  • silence_time = time between 'silence' flags, counted only within a subregion
    1. if a silent region is not embedded in a subregion, it is not included in silence_time -- PROBLEM 3
  • skip_time = time between 'skip' flags
  • surplus_time = N/A (not considered at this time)
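
As a point of reference, the formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the project's actual script: the `ListenTimes` class and the `total_listened_time` function are hypothetical names, and all values are assumed to be in hours.

```python
from dataclasses import dataclass


@dataclass
class ListenTimes:
    """Per-file components of the formula, in hours (hypothetical container)."""
    subregion_time: float
    extra_time: float = 0.0
    makeup_time: float = 0.0
    surplus_time: float = 0.0  # not considered at this time, so normally 0
    silence_time: float = 0.0
    skip_time: float = 0.0


def total_listened_time(t: ListenTimes) -> float:
    """Apply the formula: add the listened components, subtract silence and skips."""
    return (t.subregion_time + t.extra_time + t.makeup_time + t.surplus_time
            - t.silence_time - t.skip_time)
```

With the values reported for 21_11_sparse_code.cha in Problem 1 (subregion_time 3, makeup_time 0.98, skip_time 0.93, everything else 0), this comes out to about 3.05 hours, matching the total_listen_time column.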

Problem 1: Some chunks get discounted twice

If a subregion has zero words coded (for whatever reason), its hour does NOT get added to subregion_time -- even if it was the top-ranked subregion and someone did listen to it in its entirety, as intended.

And if that same subregion contains a skip or silence, those chunks get subtracted again -- even though the algorithm already discounted the whole subregion.

Example: 21_11_sparse_code.cha

  • Original coders coded SR1, SR2, SR4, SR5
  • Turns out nearly all of SR4 was basically one giant skip
  • Original coders coded nearly all of SR3 as a make-up for that
  • So, total listened time is ~4hrs as intended
  • However, it is erroneously computed to be ~3hrs

Computed variables

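| filename | month | extra_time | makeup_time | surplus_time | silence_time | subregion_time | skip_time | end_time | total_listen_time | silence_raw | subregion_raw | num_raw_subregion |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21_11_sparse_code.cha | 11 | 0 | 0.98 | 0 | 0 | 3 | 0.93 | 10.43 | 3.05 | 1.5 | 5 | 5 |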

Number of coded words in each subregion

| SR1 | SR2 | SR3 | SR4 | SR5 | total |
| --- | --- | --- | --- | --- | --- |
| 20 | 38 | 44 | 0 | 24 | 126 |
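
The double discount can be checked with the numbers above. The sketch below is only an arithmetic illustration: it assumes each subregion is one hour and that the 0.93 h of skip lies entirely inside SR4 (the issue says "nearly all"), and the `total` helper and the two "options" are hypothetical ways the logic might be adjusted, not a decided fix.

```python
def total(subregion, makeup, skip, extra=0.0, surplus=0.0, silence=0.0):
    """Formula from this issue; all values in hours."""
    return subregion + extra + makeup + surplus - silence - skip


# Values reported for 21_11_sparse_code.cha
subregion_time = 3.0  # SR4 has zero coded words, so its hour is not counted
makeup_time = 0.98
skip_time = 0.93      # assumption: this skip lies entirely inside SR4

# Current behaviour: SR4 is left out AND its skip is subtracted.
current = total(subregion_time, makeup_time, skip_time)

# Option A (hypothetical): count SR4's hour, keep subtracting its skip.
option_a = total(subregion_time + 1.0, makeup_time, skip_time)

# Option B (hypothetical): keep SR4 out, but do not subtract the skip inside it.
option_b = total(subregion_time, makeup_time, 0.0)

print(round(current, 2), round(option_a, 2), round(option_b, 2))
```

Either option lands close to the intended ~4 hrs, whereas the current behaviour gives ~3 hrs.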

Problem 2: Some chunks get added in twice

If a subregion has one or more words coded, an hour gets added to subregion_time -- even if the coded words were also part of a make-up chunk.

And if that same subregion contains a make-up chunk, that chunk gets added in again -- even though the algorithm already added it as part of subregion_time.

Example: 32_13_sparse_code.cha

Background

SR5 is basically entirely a make-up region:

SR5 begins
begin make up
blah blah
end make up
SR5 ends

The problem

  • Because SR5 (like all of the other SRs) contains coded words, all five subregions are counted and subregion_time is 5 ---> listened_time += 5
  • Then the script sees 1 hr worth of make-up time ---> listened_time += 1
  • In effect, the algorithm adds the SR5/make-up chunk in twice

Computed variables

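| filename | month | extra_time | makeup_time | surplus_time | silence_time | subregion_time | skip_time | end_time | total_listen_time | silence_raw | subregion_raw | num_raw_subregion |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 32_13_sparse_code.cha | 13 | 0.42 | 1 | 0 | 0.25 | 5 | 0.82 | 16 | 5.34 | 3.61 | 5 | 5 |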

Number of coded words in each subregion

| SR1 | SR2 | SR3 | SR4 | SR5 | total |
| --- | --- | --- | --- | --- | --- |
| 35 | 43 | 23 | 54 | 78 | 307 |
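
One possible way to avoid the double count is to add only the make-up time that is not already covered by a counted subregion. The sketch below assumes regions are available as (start, end) pairs in hours and that counted subregions do not overlap each other. The function names and the SR5 timestamps are hypothetical, chosen only to mirror the example above.

```python
def overlap(a, b):
    """Length of the overlap between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def makeup_outside_counted(makeup_intervals, counted_subregions):
    """Make-up time not already inside a subregion counted in subregion_time.

    Assumes the counted subregions do not overlap each other.
    """
    remaining = 0.0
    for m in makeup_intervals:
        covered = sum(overlap(m, sr) for sr in counted_subregions)
        remaining += (m[1] - m[0]) - covered
    return remaining


# Hypothetical timestamps: SR5 and its make-up chunk span the same hour.
sr5 = (14.0, 15.0)
makeup = [(14.0, 15.0)]
extra_makeup = makeup_outside_counted(makeup, [sr5])

# Reported values for 32_13_sparse_code.cha, with make-up counted only once.
corrected = 5 + 0.42 + extra_makeup + 0 - 0.25 - 0.82
print(round(corrected, 2))
```

Under these assumptions the make-up chunk contributes nothing extra, so the total drops by the double-counted hour. The alternative of excluding SR5 from subregion_time and keeping makeup_time at 1 gives the same result.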

(Maybe) Problem 3: Silences in irrelevant subregions get added in (?)

  • For example, consider that SR5 is the lowest-ranked, and it is not needed for any sort of make-up or extra time.
  • Consider also that SR5 contains a silence.
  • Does that silence get added to silence_time, despite SR5 being irrelevant? If so, can the script be changed such that silent chunks are considered iff (if and only if) they are within a RELEVANT subregion, and not just ANY subregion?
  • (This problem may or may not exist. That is, the algorithm may or may not already behave this way; I did not finish investigating.)
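
If the problem does exist, the requested behaviour could look roughly like the sketch below: silence is summed only where it overlaps a subregion marked as relevant. The data layout (a dict of named subregions, a set of relevant names, all times in hours) and the function name are assumptions for illustration. How the script decides which subregions are relevant is not covered here.

```python
def silence_in_relevant(silences, subregions, relevant):
    """Sum silence time that falls inside relevant subregions only.

    silences:   list of (start, end) pairs
    subregions: dict mapping a name such as "SR5" to its (start, end) pair
    relevant:   set of subregion names that count toward listened time
    """
    total = 0.0
    for s_start, s_end in silences:
        for name in relevant:
            r_start, r_end = subregions[name]
            # Add only the part of the silence inside this subregion.
            total += max(0.0, min(s_end, r_end) - max(s_start, r_start))
    return total


# Hypothetical layout: SR5 is the lowest-ranked subregion and is not needed.
subregions = {"SR1": (0.0, 1.0), "SR5": (8.0, 9.0)}
silences = [(0.25, 0.5), (8.0, 8.5)]
print(silence_in_relevant(silences, subregions, {"SR1"}))
```

Here the silence inside SR5 is ignored because SR5 is not in the relevant set.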