2023 Solution Challenge Demo Video of ImReader
ImReader is a solution aimed at the 10th of the UN's SDGs, Reduced Inequalities. It is a service for people, such as those with visual impairments, who rely on a screen-reading voice assistant (VoiceOver on iPhone, Voice Assistant on Android).
A voice assistant is especially helpful for blind people using smartphones. However, some images cannot be read and are simply announced as "detailed images." As a result, these users have more difficulty acquiring information than others.
To solve this problem, we developed a service called ImReader! When the voice assistant that we developed ourselves encounters an image, it does not announce it as a "detailed image." Instead, it extracts the text from the image by passing the image to a deep learning OCR model. The text is then read to the user through the TTS API. As a result, the user can access not only plain text but also the text inside images.
We tried hard to complete the service, but unfortunately we did not finish it in time. So we will show you a prototype of the UI and the results of the communication between the server and the model.
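The decision described above can be sketched as follows. This is only an illustration, not ImReader's actual client code: `ScreenElement`, `announcement_for`, and the `ocr` callable are hypothetical names, and falling back to "detailed image" when no text is recognized is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScreenElement:
    """Hypothetical on-screen element that the reader has focused."""
    text: Optional[str] = None            # plain text, if the element has any
    image_bytes: Optional[bytes] = None   # raw image data, if it is an image


def announcement_for(element: ScreenElement, ocr: Callable[[bytes], str]) -> str:
    """Return the string that would be handed to TTS for this element."""
    if element.text:
        # Plain text is read as-is.
        return element.text
    if element.image_bytes is not None:
        # Images go through OCR instead of being announced generically.
        recognized = ocr(element.image_bytes).strip()
        if recognized:
            return recognized
    # Assumed fallback when nothing readable was found.
    return "detailed image"
```

Passing the OCR step in as a callable keeps the sketch independent of any particular model or server.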
Prototype (Situation: When ordering)
- Turn off the system's built-in voice assistant in order to use ImReader.
- The user browses the app to order the food they want.
- If an image contains text, the text recognized by the model is read aloud to the user via TTS.
- The user proceeds with the remaining steps, including payment.
Communication Result (Server and the model)
You can test it by following the steps!
- Please go to Postman.
- Please set the request URL to http://35.234.33.62/img-src .
- Please send the base64-encoded image in the request body in the following format: { "base64": "base64-string" }
- Press Send to get the corresponding results.
- The client, implemented in Kotlin, sends the image to the server as a base64 string.
- The server then uses a virtual machine in GCP to run the OCR model on that image.
- The result is then sent back to the client.
- The client reads the text aloud to the user through the TTS API.
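The request and the round trip described above can be sketched in Python as a stand-in for Postman. Several things here are assumptions rather than facts from the project: that the endpoint expects an HTTP POST with a JSON content type, that `handle_request` resembles what the server does, and that the reply uses a `"text"` field. The real response format is not documented here, so `send_image` returns the raw response text.

```python
import base64
import json
import urllib.request
from typing import Callable

ENDPOINT = "http://35.234.33.62/img-src"  # URL given in the steps above


def build_body(image_bytes: bytes) -> bytes:
    """Client side: wrap the image as {"base64": "..."} JSON."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"base64": encoded}).encode("utf-8")


def send_image(image_bytes: bytes, url: str = ENDPOINT, timeout: float = 30.0) -> str:
    """Client side: send the body and return the raw response text.

    Assumes an HTTP POST with a JSON content type.
    """
    request = urllib.request.Request(
        url,
        data=build_body(image_bytes),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def handle_request(raw_body: bytes, ocr: Callable[[bytes], str]) -> str:
    """Server side (hypothetical): decode the image, run OCR, return JSON."""
    payload = json.loads(raw_body)
    image_bytes = base64.b64decode(payload["base64"])
    # The "text" key is an assumed response field, not the real server's format.
    return json.dumps({"text": ocr(image_bytes)}, ensure_ascii=False)
```

For example, `send_image(open("menu.png", "rb").read())` would send a local image (the file name is only illustrative), provided the server is still reachable. The returned text is what the client would pass to TTS.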
| Android | Back-End | Deep Learning | Deep Learning |
| --- | --- | --- | --- |
| Park Jaeyoung | Park Injae | Lee Seulbi | Jeon Junseok |