Skip to content

Commit

Permalink
Fix seed generation
Browse files Browse the repository at this point in the history
Extra seeds were being generated in two ways:
- If the seed generator threw out all of the seeds, the next
  non-dependency would start the process over instead of doing nothing.
  This is because using a non-dependency to initiate the seed set should
  only happen on the first non-dependency, but the paper's version of
  the condition was "when seed list is zero". Adding handling for
  determinant size limits means the seed list can be empty while
  iterating through the non-dependencies, making this assumed condition
  not work as intended.
- A seed is chosen from the seed list using sample(). This doesn't work
  as intended if the list is length 1: instead it effectively passes the
  single seed into seq_len, and samples from that, resulting in choosing
  invalid seeds. This is fixed by using sample.int for an intermediate
  result, similary to the example given in sample's man page.
  • Loading branch information
CharnelMouse committed Nov 27, 2024
1 parent f423691 commit 563e1f8
Show file tree
Hide file tree
Showing 2 changed files with 31 additions and 4 deletions.
14 changes: 10 additions & 4 deletions R/discover.r
Original file line number Diff line number Diff line change
Expand Up @@ -403,7 +403,7 @@ find_LHSs_dfd <- function(
)

while (length(seeds) != 0) {
node <- sample(seeds, 1)
node <- seeds[sample.int(length(seeds), 1)]
while (!is.na(node)) {
if (nodes$visited[node]) {
if (nodes$category[node] == 3) { # dependency
Expand Down Expand Up @@ -660,11 +660,17 @@ generate_next_seeds <- function(max_non_deps, min_deps, lhs_attr_nodes, nodes, d
seeds <- attrs_not_in_min_deps[candidate_categories >= 0]
}else{
seeds <- integer()
for (nfd in max_non_deps) {
for (n in seq_along(max_non_deps)) {
nfd <- max_non_deps[[n]]
max_non_dep_c <- remove_pruned_subsets(lhs_attr_nodes, nfd, nodes$bits)
if (length(seeds) == 0)
# paper condition is "seeds is empty", trimming cross-intersections
# by detset_limit means seeds can be empty before the end,
# which would cause the remaining nfds to start from scratch.
# we therefore just initiate the seeds on the first nfd, as
# intended anyway.
if (n == 1) {
seeds <- max_non_dep_c
else {
}else{
seeds <- cross_intersection(seeds, max_non_dep_c, nodes, detset_limit)
}
}
Expand Down
21 changes: 21 additions & 0 deletions tests/testthat/test-discover.r
Original file line number Diff line number Diff line change
Expand Up @@ -693,3 +693,24 @@ describe("discover", {
# )
# })
})

describe("generate_next_seeds", {
it("generates correctly under a detset limit", {
# example: three attributes, min. dep is whole set {1, 2, 3},
# max. non-deps are the two-element subsets
# generated seed set is empty, regardless of limit
nodes <- nonempty_powerset(3, use_visited = TRUE)
nodes$category <- as.integer(c(0, -1, -2, 0, -2, -2, 2))
nodes$visited <- c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
expect_identical(
generate_next_seeds(
max_non_deps = c(3L, 5L, 6L),
min_deps = 7L,
lhs_attr_nodes = c(1L, 2L, 4L),
nodes = nodes,
detset_limit = 1
),
integer()
)
})
})

0 comments on commit 563e1f8

Please sign in to comment.