Real-Time Risk Analysis Pipeline for Mortgage Lending
Automated, serverless AWS pipeline that OCRs Grundbuchauszüge, searches high-value sections for risk terms, and alerts the credit team within minutes.
The Challenge: Slow, Manual Risk Detection in a Critical Process
In German mortgage lending, the Grundbuchauszug (land registry extract) is a critical legal document. It contains the official history of a property, including ownership, rights, and most importantly any existing liens or encumbrances that could pose a risk to the bank.
The traditional workflow involved credit department specialists manually downloading these multi-page, often scanned, PDF documents for each loan application. They would then spend valuable time reading through them to find a specific list of "red flag" keywords (e.g., notices of insolvency, compulsory auctions). This process was:
- Slow: Taking hours or even days, delaying the entire loan approval process.
- Error-Prone: A critical keyword missed by human review could lead to significant financial risk.
- Inefficient: Specialists were spending time on a repetitive, low-value task instead of complex case analysis.
The goal was to create a system that could automate this entire process, delivering critical risk alerts to the credit department the moment a document enters the bank's system.
The Solution: An End-to-End Automated Analysis Pipeline
I designed and built a fully automated, serverless solution that handles the entire workflow from document ingestion to final risk alert, requiring zero manual intervention. The process unfolds in several intelligent steps:
- Automated Document Ingestion: The system continuously monitors third-party lending platforms for new loan applications. When a new Grundbuchauszug is submitted, it is automatically downloaded.
- Cloud-Based Text Extraction: Each downloaded PDF is sent to a cloud-based OCR (Optical Character Recognition) service. The service analyzes the document, even if it's a low-quality scan, and returns a structured JSON file containing all the extracted text, lines, and their page locations.
- Intelligent Keyword Search: The core of the system is a targeted search algorithm. Instead of naively searching the entire document, the code first identifies the most critical sections, Sections II and III (Abteilung II and III), where encumbrances and liens are listed. The keyword search is then restricted to these high-value pages, which reduces the amount of text to scan and avoids false hits from irrelevant parts of the document.
- Metadata Extraction: To ensure every finding is traceable, the system also identifies and extracts the unique land registry sheet number from the document, allowing multiple extracts within a single application to be distinguished.
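The section-focused search and the sheet-number extraction described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the production code: it assumes the OCR result has already been flattened into (page, text) pairs, that every page of Sections II and III carries a header such as "Zweite Abteilung" or "Abteilung III", and that the sheet number appears as "Blatt <number>". The function names, regular expressions, and example keywords are hypothetical.

```python
import re

# Assumed header patterns for Sections II and III (real extracts may differ).
SECTION_HEADER = re.compile(
    r"\b(Zweite|Dritte)\s+Abteilung\b|\bAbteilung\s+(II|III)\b",
    re.IGNORECASE,
)
# Assumed format of the land registry sheet number, e.g. "Blatt 1234".
SHEET_NUMBER = re.compile(r"\bBlatt\s+(\d+)\b", re.IGNORECASE)


def critical_pages(lines):
    """Return the set of pages that carry a Section II or III header."""
    return {page for page, text in lines if SECTION_HEADER.search(text)}


def find_red_flags(lines, keywords):
    """Search only the critical pages and report keyword, page, and line."""
    pages = critical_pages(lines)
    hits = []
    for page, text in lines:
        if page not in pages:
            continue  # skip everything outside Sections II and III
        lowered = text.casefold()
        for keyword in keywords:
            if keyword.casefold() in lowered:
                hits.append({"keyword": keyword, "page": page, "line": text})
    return hits


def sheet_number(lines):
    """Return the first sheet number found in the document, or None."""
    for _, text in lines:
        match = SHEET_NUMBER.search(text)
        if match:
            return match.group(1)
    return None


# Illustrative input; real lines would come from the OCR service's JSON.
sample_lines = [
    (1, "Grundbuch von Musterstadt Blatt 1234"),
    (2, "Erste Abteilung"),
    (3, "Zweite Abteilung"),
    (3, "Die Zwangsversteigerung ist angeordnet"),
    (4, "Dritte Abteilung"),
    (4, "Grundschuld ohne Brief"),
]
print(sheet_number(sample_lines))
print(find_red_flags(sample_lines, ["Zwangsversteigerung", "Insolvenz"]))
```

Plain substring matching is used here for simplicity. With noisy scans, a tolerant matcher (for example, one that allows a small edit distance) may be worth considering, though the source does not say which approach was used.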
The Technology Stack & Implementation in AWS
- Compute: AWS Lambda was used for all business logic. The process was split into two distinct functions: a Fetcher Lambda, triggered by a scheduled Amazon EventBridge rule every hour to check for new documents; and a Processor Lambda, automatically triggered by an Amazon S3 event the moment the Fetcher uploads a new document.
- Storage: Amazon S3 acts as the central hub. It stores the incoming PDF documents, the keywords.csv configuration file, and the final JSON analysis results.
- Security: All sensitive credentials (API keys, passwords) are securely stored and managed in AWS Secrets Manager, completely separating them from the application code.
- Dependencies: All Python libraries, including custom internal modules and complex packages (pandas, cryptography), are managed as AWS Lambda Layers, built in a Linux-based cloud environment (AWS CloudShell) for runtime compatibility.
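As a rough illustration of the Processor Lambda's entry point, the sketch below shows how an S3 event notification could be unpacked and how keywords.csv could be parsed. It is only a sketch: the handler name, the key `config/keywords.csv`, and the one-keyword-per-row CSV layout are assumptions, not details taken from the actual implementation. boto3 is imported inside the handler so the two helper functions can be exercised without AWS access.

```python
import csv
import io
from urllib.parse import unquote_plus

KEYWORDS_KEY = "config/keywords.csv"  # hypothetical location of the config file


def s3_objects(event):
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


def parse_keywords(csv_text):
    """Read one keyword per row from the first column, skipping blank rows."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in reader if row and row[0].strip()]


def handler(event, context):
    """Entry point invoked by the S3 'object created' trigger."""
    import boto3  # provided by the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    for bucket, key in s3_objects(event):
        if not key.lower().endswith(".pdf"):
            continue  # ignore the config file and JSON results in the same bucket
        body = s3.get_object(Bucket=bucket, Key=KEYWORDS_KEY)["Body"].read()
        keywords = parse_keywords(body.decode("utf-8"))
        # OCR, the section-focused search, and the result upload would follow here.
        print(f"Processing s3://{bucket}/{key} with {len(keywords)} keywords")
```

Filtering on the `.pdf` suffix matters when PDFs, configuration, and results share one bucket, because writing a JSON result would otherwise re-trigger the function. Scoping the S3 trigger to a prefix or suffix in the bucket notification configuration achieves the same effect.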
The Impact: From Days to Minutes
- Real-Time Alerts: The credit department can receive an automated notification of any "red flag" findings, including the specific keyword found and its page number, within an hour of the document arriving at the bank.
- Increased Efficiency: The system completely eliminated the manual review time for this task, freeing up specialists to focus on higher-value credit analysis.
- Reduced Risk: Automating the search removed the risk of a reviewer overlooking a critical term during manual reading, strengthening the bank's risk management posture.