Audio Transcription Error - AbstractTranscriptTask - SQLITE_CANTOPEN_ISDIR #2267
Hi @gfd2020, I think the synchronization in OCRParser was used just to avoid a race condition that could create more than one connection to the OCR results DB per process. If enableExternalParsing == true, several connections to the same OCR DB are created, one per parsing process. I suggest searching the processing log for the same SQLite error coming from the OCRParser; it would be logged as WARN, not as ERROR, so it goes only to the log, not to the console. The approach used in AbstractTranscriptTask is to create one connection per worker thread. I think that is fine and shouldn't cause the error you hit. I think the error you got was caused by the unreliable SMB protocol, or maybe by a temporary network problem. If you find the same error in the log coming from the OCRParser, that would be a good confirmation that synchronization wouldn't help. But if you can consistently reproduce the error, and changing AbstractTranscriptTask to use a single static connection fixes it, I agree to change it to a single static connection.
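The two connection strategies contrasted above (one connection per worker thread vs. one lazily created connection per process, guarded by synchronization as described for OCRParser) can be sketched roughly as follows. This is not IPED's code: the class and method names are hypothetical, and a generic `Supplier<T>` stands in for opening a SQLite connection so the sketch runs without a JDBC driver.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch; T stands in for java.sql.Connection.
public class ConnectionStrategies {

    /** One resource per worker thread. */
    static final class PerThread<T> {
        private final ThreadLocal<T> local;

        PerThread(Supplier<T> factory) {
            this.local = ThreadLocal.withInitial(factory);
        }

        T get() {
            return local.get(); // the first call on each thread invokes the factory
        }
    }

    /** One lazily created resource per process; synchronized so it is created only once. */
    static final class SharedSingle<T> {
        private final Supplier<T> factory;
        private T instance;

        SharedSingle(Supplier<T> factory) {
            this.factory = factory;
        }

        synchronized T get() {
            if (instance == null) {
                instance = factory.get();
            }
            return instance;
        }
    }

    /** Starts `threads` workers that each call get() once; returns how many resources were created. */
    static int countCreated(boolean shared, int threads) {
        AtomicInteger created = new AtomicInteger();
        Supplier<Object> factory = () -> {
            created.incrementAndGet(); // stand-in for opening a SQLite connection
            return new Object();
        };
        PerThread<Object> perThread = new PerThread<>(factory);
        SharedSingle<Object> single = new SharedSingle<>(factory);

        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                if (shared) {
                    single.get();
                } else {
                    perThread.get();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println("per-thread connections: " + countCreated(false, 8)); // 8
        System.out.println("shared connections:     " + countCreated(true, 8));  // 1
    }
}
```

With a real driver, the shared variant gives up per-thread connections in exchange for a single one; whether that avoids SQLITE_CANTOPEN_ISDIR on an SMB share is exactly what the reproduction test above would show.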
Hi @lfcnassif, thanks for the help.
I have now turned on enableExternalParsing, and the WARN (SQLITE_CANTOPEN_ISDIR) seems to happen more often; I'll check further.
Yes, OCRParser logged the same message (SQLITE_CANTOPEN_ISDIR), but only as WARN, not as ERROR.
I processed this case several times and the application crashed in every run. OCRParser does not raise an ERROR, just a WARN (SQLITE_CANTOPEN_ISDIR). Just thinking out loud: if the SQLite database were in the IPED temporary processing folder, this error would probably be mitigated, right?
I'll have to change the source code to make the task use just one static connection, right?
Hi @gfd2020,
With enableExternalParsing = false, right?
I think so. Not the same SQLite error, but other errors related to sleuth.db being populated on a network share were reported in the past. I have thought about using the temp folder for this before. Besides TSK, OCR and the transcription modules, thumbnails and subitems extracted from containers are also stored in SQLite DBs in the output folder. Theoretically, all of them could be affected and ideally should use the same approach (created in the output or in the temporary folder). And when using --append, the DBs would need to be moved from the output to the temp folder (moving the subitem DBs would take a considerable amount of time) to append the next evidence, and then moved back to the output folder again...
Yes.
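The temp-folder idea discussed above (work on the DB in a local temp folder, then move it back to the output folder, e.g. around an --append run) might look roughly like this. It is only a sketch: `moveDb` and the file name `transcriptions.db` are hypothetical, it assumes every connection to the DB is closed before moving, and it also carries SQLite's `-journal`, `-wal` and `-shm` companion files if they exist.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class DbStaging {

    // Main DB file plus the companion files SQLite may create next to it.
    private static final String[] SUFFIXES = {"", "-journal", "-wal", "-shm"};

    /** Moves a closed SQLite DB (and any companion files) into targetDir; returns the new DB path. */
    static Path moveDb(Path dbFile, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        String name = dbFile.getFileName().toString();
        for (String suffix : SUFFIXES) {
            Path src = dbFile.resolveSibling(name + suffix);
            if (Files.exists(src)) {
                // Across file stores (local temp vs. network share) this may fall back to copy + delete.
                Files.move(src, targetDir.resolve(name + suffix), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return targetDir.resolve(name);
    }

    /** Demo round trip: output -> temp before processing, temp -> output afterwards. */
    static boolean roundTrip() {
        try {
            Path output = Files.createTempDirectory("iped-output");
            Path temp = Files.createTempDirectory("iped-temp");
            Path db = Files.write(output.resolve("transcriptions.db"), new byte[] {1, 2, 3});

            Path working = moveDb(db, temp);
            // ... open SQLite on `working`, process items, close the connection ...
            Path back = moveDb(working, output);

            return Files.exists(back) && !Files.exists(working) && Files.size(back) == 3;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // true
    }
}
```

As noted in the thread, doing this for the large subitem DBs on every --append run would be slow, so it would only pay off if it actually avoids the network-share errors.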
Hi @lfcnassif,
Yes.
Wow, that would take a lot of work. I'll keep running some tests...
From my tests, IPED sometimes crashes and sometimes doesn't... I tried other settings for the SQLite database connection parameters (WAL mode). Source: https://www.sqlite.org/useovernet.html
If this configuration mitigates the problem of using a network connection, I would suggest adding a configuration option to activate this mode.
First test case, with approximately 4,686 candidate audio files for transcription:
Transcriptions, IPED version 4.1.5 (IPED did not crash): [results not preserved]
Transcriptions, IPED master with the WAL mode modification: [results not preserved]
Second test case, with approximately 23,079 candidate audio files for transcription:
Transcriptions, IPED version 4.1.5 (IPED did not crash): [results not preserved]
Transcriptions, IPED master with the WAL mode modification: [results not preserved]
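Putting WAL behind an opt-in setting, as suggested above, could be sketched like this. The `useWal` flag is hypothetical (not an existing IPED option). As far as we know, sqlite-jdbc applies pragma names passed as connection properties (here `journal_mode`) when the connection is opened; `enableWal` shows the equivalent explicit `PRAGMA` statement. Both should be checked against the sqlite-jdbc version actually in use.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class SqliteConnectionOptions {

    /** Builds connection properties; useWal is a hypothetical, opt-in config flag. */
    static Properties transcriptionDbProperties(boolean useWal) {
        Properties props = new Properties();
        if (useWal) {
            // Assumption: sqlite-jdbc turns this property into "PRAGMA journal_mode=WAL" on open.
            props.setProperty("journal_mode", "WAL");
        }
        return props;
    }

    /** Opens the DB with the options above (needs sqlite-jdbc on the classpath at runtime). */
    static Connection open(String dbPath, boolean useWal) throws SQLException {
        return DriverManager.getConnection("jdbc:sqlite:" + dbPath, transcriptionDbProperties(useWal));
    }

    /** Alternative: switch an already open connection to WAL with an explicit PRAGMA. */
    static void enableWal(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("PRAGMA journal_mode=WAL");
        }
    }

    public static void main(String[] args) {
        System.out.println(transcriptionDbProperties(true).getProperty("journal_mode"));  // WAL
        System.out.println(transcriptionDbProperties(false).getProperty("journal_mode")); // null
    }
}
```

Keeping the default at the current journal mode leaves the other, stable tasks untouched, which matches the concern raised in the replies.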
Hi @gfd2020. Your test results are very interesting. I had considered using WAL in the sleuthkit SQLite DB in the past, but abandoned the idea (without testing) because of a statement on the official SQLite site saying: [...]
Have you tried using just one static connection?
Hi @lfcnassif. I also ran this test with the OCR task, and the result in WAL mode was worse. I believe that is because its implementation is slightly different. I think it's best not to change the other tasks, since they already work and are stable. For the audio transcription task only, adding this as a configurable option would be interesting.
I got the error below while transcribing audio from a UFDR. Versions 4.1.5 and master show the same error.
Computer: 32 cores (threads). The UFDR and the output directory are on a network drive.
This image was also being processed with OCR and did not show any connection errors with the OCR database.
Looking at the OCRParser code, I noticed that it has synchronized connection control.
Could this error be caused by the audio transcription class lacking the same concurrency handling?
The error does not seem to be exactly what it reports (SQLITE_CANTOPEN_ISDIR), because the audio transcription SQLite file is properly connected. I believe the exception was raised because several connections were writing to the database.
2024-07-25 10:23:21 [ERROR] [task.transcript.AbstractTranscriptTask] Unexpected exception while transcribing: 0000786-AUDIO.opus
java.io.IOException: org.sqlite.SQLiteException: [SQLITE_CANTOPEN_ISDIR] The file is really a directory (unable to open database file)
at iped.engine.task.transcript.AbstractTranscriptTask.storeTextInDb(AbstractTranscriptTask.java:169) ~[iped-engine-4.1.5.jar:?]
at iped.engine.task.transcript.AbstractTranscriptTask.process(AbstractTranscriptTask.java:404) [iped-engine-4.1.5.jar:?]
at iped.engine.task.transcript.AudioTranscriptTask.process(AudioTranscriptTask.java:41) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.processMonitorTimeout(AbstractTask.java:277) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:192) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.sendToNextTask(AbstractTask.java:225) [iped-engine-4.1.5.jar:?]
at iped.engine.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:205) [iped-engine-4.1.5.jar:?]
at iped.engine.core.Worker.process(Worker.java:177) [iped-engine-4.1.5.jar:?]
at iped.engine.core.Worker.run(Worker.java:265) [iped-engine-4.1.5.jar:?]
Caused by: org.sqlite.SQLiteException: [SQLITE_CANTOPEN_ISDIR] The file is really a directory (unable to open database file)
at org.sqlite.core.DB.newSQLException(DB.java:1012) ~[sqlite-jdbc-3.34.0.jar:?]
at org.sqlite.core.DB.newSQLException(DB.java:1024) ~[sqlite-jdbc-3.34.0.jar:?]
at org.sqlite.core.DB.execute(DB.java:866) ~[sqlite-jdbc-3.34.0.jar:?]
at org.sqlite.core.DB.executeUpdate(DB.java:904) ~[sqlite-jdbc-3.34.0.jar:?]
at org.sqlite.jdbc3.JDBC3PreparedStatement.executeUpdate(JDBC3PreparedStatement.java:98) ~[sqlite-jdbc-3.34.0.jar:?]
at iped.engine.task.transcript.AbstractTranscriptTask.storeTextInDb(AbstractTranscriptTask.java:167) ~[iped-engine-4.1.5.jar:?]
... 26 more
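The "synchronized connection control" idea mentioned above (serializing all writes to the transcription DB within one process) can be sketched as below. `DB_LOCK`, `Sink` and `SerializedWriter` are hypothetical names, not OCRParser's actual code; `Sink` stands in for the `PreparedStatement.executeUpdate` call made in `storeTextInDb`, so the sketch runs without a driver. Per the discussion in this thread, serializing writes inside one process would not necessarily help if the root cause is the SMB share itself.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializedWriter {

    /** Stand-in for the JDBC insert done in storeTextInDb. */
    interface Sink {
        void write(String id, String text);
    }

    // One lock per process (JVM), shared by all writer instances.
    private static final Object DB_LOCK = new Object();

    private final Sink sink;

    SerializedWriter(Sink sink) {
        this.sink = sink;
    }

    /** Only one thread at a time reaches the database. */
    void store(String id, String text) {
        synchronized (DB_LOCK) {
            sink.write(id, text);
        }
    }

    /** Returns the maximum number of threads observed inside the sink at once. */
    static int maxConcurrentWriters(int threads, int writesPerThread) {
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        SerializedWriter writer = new SerializedWriter((id, text) -> {
            int now = inside.incrementAndGet();
            max.accumulateAndGet(now, Math::max);
            inside.decrementAndGet();
        });
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int worker = t;
            pool.execute(() -> {
                for (int i = 0; i < writesPerThread; i++) {
                    writer.store(worker + "-" + i, "transcribed text");
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println(maxConcurrentWriters(8, 1000)); // 1
    }
}
```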