Twofishes indexer does not handle MongoDB query failures properly. #53
Subsequent execution of parse.py with the same input data, and without deleting the output directory, completed successfully, but the index appears to be corrupted. Twofishes starts and responds to queries without crashing, but throws exceptions on each query:
Preface this 'solution' by saying that I'm not a Scala programmer ... so there may be a better way to handle this. However, I was able to temporarily get past this build error, and avoid the corrupted index, by editing `fsqio/src/jvm/io/fsq/twofishes/indexer/output/PrefixIndexer.scala` to add a try/catch block around the section of code where the error occurs (example below). In the current version of this file, that means enclosing lines 115 through 140.

```scala
for {
  (prefix, index) <- sortedPrefixes.zipWithIndex
} {
  if (index % 1000 == 0) {
    log.info("done with %d of %d prefixes".format(index, numPrefixes))
  }
  try {
    val records = getRecordsByPrefix(prefix, PrefixIndexer.MaxNamesToConsider)

    val (woeMatches, woeMismatches) = records.partition(r =>
      bestWoeTypes.contains(r.woeTypeOrThrow))

    val (prefSortedRecords, unprefSortedRecords) =
      sortRecordsByNames(woeMatches.toList)

    val fids = new HashSet[StoredFeatureId]
    //roundRobinByCountryCode(prefSortedRecords).foreach(f => {
    prefSortedRecords.foreach(f => {
      if (fids.size < PrefixIndexer.MaxFidsToStorePerPrefix) {
        fids.add(f.fidAsFeatureId)
      }
    })

    if (fids.size < PrefixIndexer.MaxFidsWithPreferredNamesBeforeConsideringNonPreferred) {
      //roundRobinByCountryCode(unprefSortedRecords).foreach(f => {
      unprefSortedRecords.foreach(f => {
        if (fids.size < PrefixIndexer.MaxFidsToStorePerPrefix) {
          fids.add(f.fidAsFeatureId)
        }
      })
    }

    prefixWriter.append(prefix, fidsToCanonicalFids(fids.toList))
  } catch {
    case e: Exception => println("Skipping due to error processing prefixes")
  }
}
```

If you already built the Mongo DB, you'll obviously want to throw that out and start from scratch.
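The skip-on-error pattern used in that patch can be illustrated in isolation. The sketch below is a generic reconstruction, not the Twofishes code: `processPrefix` is a hypothetical stand-in for the per-prefix indexing work, and the loop counts and skips failing items so that one bad record cannot abort the whole run.

```scala
object SkipOnError {
  // Hypothetical stand-in for the per-prefix indexing work; throws on "bad" input.
  def processPrefix(prefix: String): Int =
    if (prefix.isEmpty) throw new IllegalArgumentException("empty prefix")
    else prefix.length

  // Process every prefix, skipping (and counting) the ones that fail.
  def processAll(prefixes: Seq[String]): (Seq[Int], Int) = {
    var skipped = 0
    val results = prefixes.flatMap { p =>
      try {
        Some(processPrefix(p))
      } catch {
        case e: Exception =>
          println(s"Skipping prefix due to error: ${e.getMessage}")
          skipped += 1
          None
      }
    }
    (results, skipped)
  }
}
```

The trade-off is the same as in the patch above: errors are silenced rather than fixed, so any prefix that fails is simply missing from the output.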
To address issue foursquare#53, I have wrapped prefix processing in a try/catch. I am not a Scala programmer, so this may not be the best implementation. However, if we do not contain regex errors related to prefix processing, the processing fails and corrupts the database/indices. Therefore, I feel it's worthwhile to trap the error here and simply skip over the failing prefixes.
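For a reviewer looking for a more idiomatic Scala alternative to the explicit try/catch, `scala.util.Try` turns each failure into a value that can be logged and dropped. This is only a sketch of the idea, again using a hypothetical `processPrefix` in place of the real indexing work.

```scala
import scala.util.{Failure, Success, Try}

object TrySkip {
  // Hypothetical per-prefix work; fails on empty input.
  def processPrefix(prefix: String): Int =
    if (prefix.isEmpty) throw new IllegalArgumentException("empty prefix")
    else prefix.length

  // Try wraps each call; failures become values we can log and filter out.
  def processAll(prefixes: Seq[String]): Seq[Int] =
    prefixes.flatMap { p =>
      Try(processPrefix(p)) match {
        case Success(n) => Some(n)
        case Failure(e) =>
          println(s"Skipping prefix '$p': ${e.getMessage}")
          None
      }
    }
}
```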
While importing the data into the DB with ``./src/jvm/io/fsq/twofishes/scripts/parse.py -w `pwd`/data/``, the indexer crashed with the following error message. After the crash the indexer was stuck doing nothing, and no more records were processed. The processed data was downloaded with `./src/jvm/io/fsq/twofishes/scripts/download-world.sh`.