Feat/Add latest national ID card version in OCR process #2353

Merged (18 commits, Dec 15, 2023)

Merkur39
Member

This PR adds the ability to handle several versions of the same document during OCR.

Together with @paultranvan, this change led us to rethink some implementations, both in the papersDefinitions.json configuration file and in a few existing functions.

On the UI/UX side, we add a new screen right after the OCR processing screen, asking the user to confirm the recognized version.
Only after this confirmation do we extract the right data and display the rest of the process.

The rotated file is saved in the "formData" state only
once we move on to the next step, and not at each rotation.
The state of the "formData" is therefore not up to date if
the last file added has been rotated.
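A minimal sketch of how the rotated file could be written into `formData` on every rotation rather than only when moving to the next step. The `formData` shape (`{ data: [{ stepIndex, file }] }`) and the `replaceRotatedFile` helper are hypothetical, not the library's actual code.

```javascript
// Hypothetical helper: returns a new formData object in which the file
// attached to `stepIndex` is replaced by its rotated version.
// The { data: [{ stepIndex, file }] } shape is an assumption.
const replaceRotatedFile = (formData, stepIndex, rotatedFile) => ({
  ...formData,
  data: formData.data.map(entry =>
    entry.stepIndex === stepIndex ? { ...entry, file: rotatedFile } : entry
  )
})

// Possible usage in a rotation handler (React-style setter, assumed):
// onRotate = rotatedFile =>
//   setFormData(prev => replaceRotatedFile(prev, stepIndex, rotatedFile))
```

Because the helper never mutates its input, it fits naturally into a functional state update, so the state is up to date even if the user leaves the step right after rotating.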
@Merkur39 force-pushed the feat/697 branch 3 times, most recently from 777f598 to c7f49e7, on December 13, 2023 15:59
In view of future developments, it is preferable
to separate the responsibilities of each helper.

We also discussed with @paultranvan changing the
type of the `ocrAttributes` attribute in the config file,
for more flexibility.
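One way to read "more flexibility" is an `ocrAttributes` array with one entry per document version, from which the attributes of the user-confirmed version are picked. The field names below (`version`, `frontSide`, `backSide`, `attributes`) and the helper `getOcrAttributesForVersion` are hypothetical and do not reproduce the real papersDefinitions.json schema.

```javascript
// Hypothetical paper definition: `ocrAttributes` is an array with one
// entry per document version instead of a single object.
const nationalIdCardDefinition = {
  label: 'national_id_card',
  ocrAttributes: [
    { version: 'next', frontSide: { attributes: ['number', 'expirationDate'] } },
    {
      version: 'legacy',
      frontSide: { attributes: ['number'] },
      backSide: { attributes: ['expirationDate'] }
    }
  ]
}

// Once the user has confirmed the detected version, pick the matching
// attribute set; returns null when the version is unknown.
const getOcrAttributesForVersion = (paperDefinition, version) =>
  paperDefinition.ocrAttributes.find(attrs => attrs.version === version) ?? null
```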
This modal comes after the animated OCR processing modal,
to ask the user to confirm the recognized version.
paultranvan and others added 5 commits December 14, 2023 12:05
This introduces a new 'stripChars' method in validationRules,
useful to remove extra characters from the value matched by the regex.
Those characters help detection, but should be
removed afterwards.
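A minimal sketch of the idea, assuming a rule holds a `regex` string plus a `stripChars` string listing the characters to drop from the match. The rule shape and `applyRule` are hypothetical, not the actual validationRules implementation.

```javascript
// Remove every character listed in `chars` from `value`.
const stripChars = (value, chars) =>
  [...value].filter(char => !chars.includes(char)).join('')

// Hypothetical rule shape: `regex` locates the value in the OCR text,
// `stripChars` lists characters that help detection but must not be kept.
const applyRule = (ocrText, { regex, stripChars: chars = '' }) => {
  const match = ocrText.match(new RegExp(regex))
  return match ? stripChars(match[0], chars) : null
}

// The leading ':' anchors the date next to its label, then gets stripped.
const expirationRule = { regex: ':\\s*\\d{2}/\\d{2}/\\d{4}', stripChars: ': ' }
```

With the text `Date d'expiration: 12/03/2031`, the rule matches `: 12/03/2031` and returns `12/03/2031`.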
This is the French identity card format introduced in March 2021.
By @paultranvan
This introduces the possibility to detect a paper version,
based on the OCR of both sides plus reference rules.
Those rules can be applied to the front and/or back side.
All the rules must be respected in order to detect a version.
For now, the rules are simple regexes, evaluated against the
OCR text.
By @paultranvan
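The detection described above can be sketched as follows: each version carries reference rules, each rule is a regex tested against the OCR text of one side, and a version is detected only if every one of its rules matches. The version list, rule shape and `detectVersion` name are hypothetical, and the `IDFRA` rules are placeholders rather than the project's real rules.

```javascript
// Hypothetical version definitions with placeholder reference rules.
const idCardVersions = [
  { version: 'next', referenceRules: [{ side: 'back', regex: '^IDFRA' }] },
  { version: 'legacy', referenceRules: [{ side: 'front', regex: '^IDFRA' }] }
]

// Returns the first version whose rules ALL match the OCR text of their
// side, or null if no version matches. The 'm' flag lets '^' match at
// the start of any OCR line.
const detectVersion = (ocrBySide, versions) => {
  const found = versions.find(({ referenceRules }) =>
    referenceRules.every(({ side, regex }) =>
      new RegExp(regex, 'm').test(ocrBySide[side] ?? '')
    )
  )
  return found ? found.version : null
}
```

Since `every` returns true for an empty array, a version declared without any rule would always be detected, so each version needs at least one rule; order in the list also matters when several versions could match.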
@Merkur39 force-pushed the feat/697 branch 2 times, most recently from 56e2350 to a977bba, on December 14, 2023 12:35
@JF-Cozy (Contributor) left a comment

Nice job splitting the commits 👍

packages/cozy-mespapiers-lib/src/helpers/findAttributes.js (outdated, resolved)
```
@@ -6,6 +6,17 @@ import log from 'cozy-logger'
const MAX_TEXT_SHIFT_THRESHOLD = 5 // in %
const MAX_LINE_SHIFT_THRESHOLD = 5 // in px

const normalizeText = text => {
  // TODO: more normalization might be necessary
```
Contributor

Is there actually something left to do here, or should the comment be made more explicit?

Contributor

The TODO is there precisely for later: additional normalization may be needed, but I prefer to add it as needs arise rather than doing too much up front and running into uncontrolled side effects.

@Merkur39 marked this pull request as ready for review December 14, 2023 14:36
@Merkur39 requested a review from paultranvan December 14, 2023 14:36
Because the validation rules were too strict, recognition sometimes
failed, typically when a '0' was read as an 'O'.
We are now more flexible on the post-validation date format, using the
date-fns parse method.
We also wrap this call in a try/catch to avoid crashes when the date
format is wrong.
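A dependency-free sketch of lenient date post-validation wrapped in a try/catch. The PR uses date-fns `parse`; this sketch swaps in a small hand-written parser so it runs on its own. The accepted separators, the dd/mm/yyyy order and the O-to-0 substitution are assumptions, and `parseOcrDate` is a hypothetical name.

```javascript
// Lenient OCR date parser (sketch standing in for date-fns `parse`).
const parseOcrDate = raw => {
  try {
    // Assumption: OCR may read the digit 0 as the letter O.
    const cleaned = raw.trim().replace(/[Oo]/g, '0')
    // Accept dd/mm/yyyy with '/', '.', space or '-' as separator.
    const match = cleaned.match(/^(\d{2})[\/. -](\d{2})[\/. -](\d{4})$/)
    if (!match) return null
    const [day, month, year] = match.slice(1).map(Number)
    const date = new Date(Date.UTC(year, month - 1, day))
    // Reject overflowing dates such as 31/02/2030 (rolled over to March).
    if (date.getUTCDate() !== day || date.getUTCMonth() !== month - 1) return null
    return date
  } catch (error) {
    // e.g. `raw` is not a string: fail soft instead of crashing the OCR flow.
    return null
  }
}
```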
These additional regex rules are especially useful for the number and
expiration date attributes: the passport is hard to scan correctly,
which leads to box detection failures.
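A minimal sketch of a regex fallback when box detection fails: the value read inside the detected box is tried first, then the same regex is run over the full OCR text. The passport number format (2 digits, 2 letters, 5 digits) is an assumption about French passports, and `extractWithFallback` is a hypothetical helper, not the library's API.

```javascript
// Assumed French passport number format: 2 digits, 2 letters, 5 digits.
const PASSPORT_NUMBER_REGEX = /\b\d{2}[A-Z]{2}\d{5}\b/

// Try the text read inside the detected box first; if box detection
// failed or produced an unusable value, scan the full OCR text instead.
const extractWithFallback = (boxText, fullText, regex) => {
  const fromBox = boxText ? boxText.match(regex) : null
  if (fromBox) return fromBox[0]
  const fromFullText = fullText ? fullText.match(regex) : null
  return fromFullText ? fromFullText[0] : null
}
```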
@Merkur39 merged commit 0c68d11 into master on Dec 15, 2023
3 checks passed
@Merkur39 deleted the feat/697 branch December 15, 2023 10:58
3 participants