Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pattern De-Duplication based on Subsequence Detection #1031

Merged
merged 3 commits into from
Oct 23, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 18 additions & 1 deletion __tests__/util/state.js
Original file line number Diff line number Diff line change
@@ -1,9 +1,26 @@
/* globals describe, expect, it */

import '../test-utils/mock-window-url'
import { queryIsValid } from '../../lib/util/state'
import { isValidSubsequence, queryIsValid } from '../../lib/util/state'

describe('util > state', () => {
describe('isValidSubsequence', () => {
it('should handle edge cases correctly', () => {
expect(isValidSubsequence([0], [0])).toBe(true)
expect(isValidSubsequence([0], [1])).toBe(false)
expect(isValidSubsequence([], [])).toBe(true)
expect(isValidSubsequence([], [9])).toBe(false)
expect(isValidSubsequence([9], [])).toBe(true)
expect(isValidSubsequence([9], [9, 9])).toBe(false)
expect(isValidSubsequence([9, 9, 9], [9, 9])).toBe(true)
})
it('should handle normal cases correctly', () => {
expect(isValidSubsequence([1, 2, 3, 4, 5], [5, 6, 3])).toBe(false)
expect(isValidSubsequence([1, 2, 3, 4, 5], [2, 3, 4])).toBe(true)
expect(isValidSubsequence([1, 2, 4, 4, 3], [2, 3, 4])).toBe(false)
expect(isValidSubsequence([1, 2, 3, 4, 5], [1, 3, 4])).toBe(false)
})
})
describe('queryIsValid', () => {
const fakeFromLocation = {
lat: 12,
Expand Down
29 changes: 28 additions & 1 deletion lib/actions/apiV2.js
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ import { generateModeSettingValues } from '../util/api'
import {
getActiveItineraries,
getActiveItinerary,
isValidSubsequence,
getRouteOperator,
queryIsValid
} from '../util/state'
Expand Down Expand Up @@ -655,7 +656,33 @@ export const findRoute = (params) =>

const newRoute = clone(route)
const routePatterns = {}
newRoute.patterns.forEach((pattern) => {

// Sort patterns by length to make algorithm below more efficient
const patternsSortedByLength = newRoute.patterns.sort(
(a, b) => a.stops.length - b.stops.length
)

// Remove all patterns that are subsets of larger patterns
const filteredPatterns = patternsSortedByLength
// Start with the largest for performance
.reverse()
.filter((pattern) => {
// Compare to all other patterns TODO: make this beat O(n^2)
return !patternsSortedByLength.find((p) => {
// Don't compare against ourself
if (p.id === pattern.id) return false

// If our pattern is longer, it's not a subset
if (p.stops.length < pattern.stops.length) return false

return isValidSubsequence(
p.stops.map((s) => s.id),
pattern.stops.map((s) => s.id)
)
})
})
Comment on lines +669 to +683
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about

Suggested change
.filter((pattern) => {
// Compare to all other patterns TODO: make this beat O(n^2)
return !patternsSortedByLength.find((p) => {
// Don't compare against ourself
if (p.id === pattern.id) return false
// If our pattern is longer, it's not a subset
if (p.stops.length < pattern.stops.length) return false
return isValidSubsequence(
p.stops.map((s) => s.id),
pattern.stops.map((s) => s.id)
)
})
})
.filter((pattern, patternIndex) => {
// Compare to all other patterns larger than the current pattern
return !patternsSortedByLength.find((p, pIndex) => {
if (pIndex >= patternIndex) return false
return isValidSubsequence(
p.stops.map((s) => s.id),
pattern.stops.map((s) => s.id)
)
})
})

You're already sorting by length so checking anything above a certain index number feels like a waste of time, and you could remove the if (p.id === pattern.id) block because p and pattern should have the same index. This gets the same result for 1-line patterns but might need more testing!!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to check the pattern id because the indexes don't always line up. I agree the length check is not required but to do that we need to start checking the second array at the right index and that logic is currently not present


filteredPatterns.forEach((pattern) => {
const patternStops = pattern.stops.map((stop) => {
const color =
stop.routes?.length > 0 &&
Expand Down
21 changes: 21 additions & 0 deletions lib/util/state.js
Original file line number Diff line number Diff line change
Expand Up @@ -886,3 +886,24 @@ export function getOperatorAndRoute(routeObject, transitOperators, intl) {
)
)
}

/**
* Helper method returns true if an array is a subsequence of another.
*
* More efficient than comparing strings as we don't need to look at the entire
* array.
*/
export function isValidSubsequence(array, sequence) {
// Find starting point
let i = 0
let j = 0
while (array[i] !== sequence[j] && i < array.length) {
i = i + 1
}
// We've found the starting point, now we test to see if the rest of the sequence is matched
while (array[i] === sequence[j] && i < array.length && j < sequence.length) {
i = i + 1
j = j + 1
}
return j === sequence.length
}
Loading