ocr done #6
Merged
Pull request overview
This PR adds server-side passport OCR to the check-in flow by accepting an uploaded passport image, running a Python MRZ reader, and saving extracted passport fields to the existing CHECKIN record.
Changes:
- Adds `POST /checkin/:id/passport/ocr` with `multer` upload handling and Python subprocess OCR execution.
- Adds `mrz_reader.py` for MRZ extraction and field normalization.
- Adds `multer` and its lockfile dependencies.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| routes/checkin.js | Adds OCR upload endpoint and persistence of extracted passport data. |
| mrz_reader.py | Adds PassportEye/Tesseract-based MRZ parsing script. |
| package.json | Adds multer dependency for multipart uploads. |
| package-lock.json | Locks multer and transitive dependencies. |
Comment on lines +7 to +8

    pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
    os.environ['TESSDATA_PREFIX'] = r'C:\Program Files\Tesseract-OCR\tessdata'
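
The two lines above pin the script to one Windows install location. A minimal sketch of a more portable lookup follows, assuming an optional override variable; the name `TESSERACT_CMD` and the helper `resolve_tesseract_cmd` are hypothetical and not part of this PR. It falls back to whatever `tesseract` binary is on `PATH`.

```python
import os
import shutil


def resolve_tesseract_cmd(env=None):
    """Return a path to the tesseract binary, or None if none is found.

    TESSERACT_CMD is a hypothetical override variable chosen for this sketch.
    """
    env = os.environ if env is None else env
    override = env.get("TESSERACT_CMD")
    if override:
        return override
    # shutil.which searches PATH and returns None when nothing matches.
    return shutil.which("tesseract")


cmd = resolve_tesseract_cmd()
if cmd is None:
    print("tesseract not found; set TESSERACT_CMD or add it to PATH")
else:
    # In mrz_reader.py this value could be assigned to
    # pytesseract.pytesseract.tesseract_cmd instead of the literal path.
    print("using tesseract at", cmd)
```

With this approach `TESSDATA_PREFIX` would probably only need to be set when the language data lives outside the install's default location, so it may be better left to the deployment environment than to the script.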
Comment on lines +22 to +27

    def parse_dob(raw):
        yy = ''.join(c for c in raw[:2] if c.isdigit())
        if not yy:
            return None
        yyyy = f"19{yy}" if int(yy) >= 30 else f"20{yy}"
        return f"{yyyy}-01-01"
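
As written, `parse_dob` reads only the two-digit year and always returns January 1. The MRZ date-of-birth field is six digits in `YYMMDD` order, so month and day can be recovered too. The sketch below is one possible rewrite, not the author's code: it keeps the PR's century pivot of 30 as an assumption (exposed as a hypothetical `pivot` parameter) and returns `None` for anything that is not a valid calendar date.

```python
from datetime import date


def parse_dob(raw, pivot=30):
    """Parse an MRZ YYMMDD birth date into ISO format, or return None.

    pivot=30 mirrors the cutoff used in the PR; it is an assumption,
    not a rule from the MRZ format itself.
    """
    digits = raw[:6]
    if len(digits) != 6 or not (digits.isascii() and digits.isdigit()):
        return None
    yy, mm, dd = int(digits[:2]), int(digits[2:4]), int(digits[4:6])
    year = 1900 + yy if yy >= pivot else 2000 + yy
    try:
        # date() rejects impossible values such as month 13 or February 30.
        return date(year, mm, dd).isoformat()
    except ValueError:
        return None


print(parse_dob("850412"))  # 1985-04-12
print(parse_dob("120101"))  # 2012-01-01
print(parse_dob("050230"))  # None (February 30 does not exist)
```

Returning `None` on a misread field lets the caller treat the date as missing rather than storing a plausible but wrong value.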
Comment on lines +15 to +20

    def fix_last_name(raw):
        # H between two vowel-adjacent consonants is likely M
        # More targeted: SLI_ANI pattern — the _ is M
        import re
        raw = re.sub(r'(?<=[A-Z])H(?=[A-Z])', 'M', raw)
        return raw
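
The regular expression above replaces every `H` that has an uppercase letter on both sides, which is broader than the comment suggests. The short check below reuses the same pattern on a few illustrative surnames (chosen for this example, not taken from the PR) to show how valid names would be altered.

```python
import re


def fix_last_name(raw):
    # Same substitution as in the PR: any interior H becomes M.
    return re.sub(r'(?<=[A-Z])H(?=[A-Z])', 'M', raw)


for name in ("JOHNSON", "SHAH", "MEHTA"):
    print(name, "->", fix_last_name(name))
# JOHNSON -> JOMNSON
# SHAH -> SMAH
# MEHTA -> MEMTA
```

Because the correction cannot tell an OCR misread from a genuine `H`, it may be safer to leave the surname untouched and surface low-confidence reads for manual confirmation.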
    }
    });
    });
    console.log("[OCR] PassportEye result:", parsed);

    try {
      resolve(JSON.parse(stdout));
    } catch (e) {
      reject(new Error("Failed to parse PassportEye output: " + stdout));

    return res.status(404).json({ message: "Check-in not found" });
    }
    const parsed = await new Promise((resolve, reject) => {
    execFile("python", ["mrz_reader.py", filePath], (err, stdout, stderr) => {

    });
    });
    console.log("[OCR] PassportEye result:", parsed);
    if (parsed.error || !parsed.passportNumber) {

    if (allowed.includes(file.mimetype)) {
      cb(null, true);
    } else {
      cb(new Error("Only JPEG and PNG images are allowed"));
Comment on lines +4 to +5

    import pytesseract
    from passporteye import read_mrz
No description provided.