
fast OCR in CLI arg #296

Open
louis030195 opened this issue Sep 9, 2024 · 1 comment

Comments

@louis030195
Collaborator

Certainly! We'll need to modify the Swift code, the Rust bindings, and the CLI arguments to allow changing the OCR accuracy. Here's how we can do that:

  1. First, let's modify the Swift code to accept an accuracy parameter:
@available(macOS 10.15, *)
@_cdecl("perform_ocr")
public func performOCR(imageData: UnsafePointer<UInt8>, length: Int, width: Int, height: Int, useFastRecognition: Bool)
  -> UnsafeMutablePointer<CChar>? {
  // ... existing code ...

  let textRequest = VNRecognizeTextRequest { request, error in
    // ... existing code ...
  }

  // Vision's fast recognition path trades accuracy for speed; accurate keeps the current behavior.
  textRequest.recognitionLevel = useFastRecognition ? .fast : .accurate

  // ... rest of the existing code ...
}
  2. Now, let's update the Rust bindings in apple.rs:
use std::ffi::CStr;
use std::os::raw::{c_char, c_uchar};

use image::DynamicImage;

#[cfg(target_os = "macos")]
#[link(name = "screenpipe")]
extern "C" {
    fn perform_ocr(
        image_data: *const c_uchar,
        length: usize,
        width: i32,
        height: i32,
        use_fast_recognition: bool,
    ) -> *mut c_char;
    fn free_string(ptr: *mut c_char);
}

#[cfg(target_os = "macos")]
pub fn perform_ocr_apple(image: &DynamicImage, use_fast_recognition: bool) -> String {
    let rgba = image.to_rgba8();
    let (width, height) = rgba.dimensions();
    let raw_data = rgba.as_raw();

    unsafe {
        let result_ptr = perform_ocr(
            raw_data.as_ptr(),
            raw_data.len(),
            width as i32,
            height as i32,
            use_fast_recognition,
        );
        let _guard = OcrResultGuard(result_ptr);
        let result = CStr::from_ptr(result_ptr).to_string_lossy().into_owned();
        result
    }
}
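
Note that perform_ocr_apple above relies on an OcrResultGuard to release the Swift-allocated string. If that helper doesn't already exist in apple.rs, a minimal sketch of what it could look like (the name and exact shape here are assumptions, not existing screenpipe code):

#[cfg(target_os = "macos")]
struct OcrResultGuard(*mut c_char);

#[cfg(target_os = "macos")]
impl Drop for OcrResultGuard {
    fn drop(&mut self) {
        // Free the string allocated on the Swift side exactly once, even if
        // perform_ocr_apple returns early after the FFI call.
        unsafe {
            if !self.0.is_null() {
                free_string(self.0);
            }
        }
    }
}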
  3. Finally, let's update the CLI arguments in cli.rs:
use clap::{Parser, ValueEnum};

#[derive(Clone, Debug, ValueEnum, PartialEq)]
pub enum CliOcrAccuracy {
    Fast,
    Accurate,
}

#[derive(Parser)]
pub struct Cli {
    // ... existing fields ...

    /// OCR accuracy level (only applicable for AppleNative OCR engine)
    #[arg(long, value_enum, default_value_t = CliOcrAccuracy::Accurate)]
    pub ocr_accuracy: CliOcrAccuracy,

    // ... rest of the existing fields ...
}

Now you can use the ocr_accuracy field in your main application logic to determine whether to use fast or accurate recognition when calling perform_ocr_apple. For example:

let use_fast_recognition = matches!(cli.ocr_accuracy, CliOcrAccuracy::Fast);
let ocr_result = perform_ocr_apple(&image, use_fast_recognition);

This change allows users to specify the OCR accuracy level using a command-line argument like --ocr-accuracy fast or --ocr-accuracy accurate. The default is set to "accurate" to maintain the existing behavior.

Remember to rebuild the Swift library so the updated perform_ocr signature is exported, and make sure to propagate this change through your application wherever the OCR function is called.
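
As a rough sketch of that propagation (the OcrEngine enum, the run_other_ocr_engine helper, and this dispatch point are assumptions about the call site, not actual screenpipe code):

// Hypothetical call site in the capture loop: only the Apple-native backend
// understands the accuracy flag; other engines keep their existing paths.
let use_fast_recognition = matches!(cli.ocr_accuracy, CliOcrAccuracy::Fast);
let text = match ocr_engine {
    OcrEngine::AppleNative => perform_ocr_apple(&image, use_fast_recognition),
    _ => run_other_ocr_engine(&image), // hypothetical fallback for the other engines
};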


linear bot commented Sep 9, 2024
