visionshield with codex

2026-04-25

Project

Today I mainly used codex to review and reorganize VisionShield.

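VisionShield is my capstone project. Its goal is real-time sensitive data detection and masking during screen sharing. The current version is a macOS prototype, and the core flow is: the user selects a screen region, the app continuously captures frames, runs OCR on them with Apple Vision, and passes the OCR lines to a privacy engine that decides which content needs to be blurred.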
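The capture → OCR → privacy engine → blur loop can be sketched as follows. This is a minimal, synchronous illustration, not the app's actual code: the `ocr` callable is a stand-in for the Apple Vision call, and `OCRLine`, `Box`, and `mask_stream` are hypothetical names chosen for this example (the real pipeline is asynchronous).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical box format: (x, y, width, height) inside the selected screen region.
Box = Tuple[int, int, int, int]


@dataclass
class OCRLine:
    """One recognized text line and its bounding box (hypothetical type)."""
    text: str
    box: Box


def mask_stream(
    frames: Iterable[object],
    ocr: Callable[[object], List[OCRLine]],
    is_sensitive: Callable[[str], bool],
) -> Iterator[List[Box]]:
    """For each captured frame, yield the boxes the overlay should blur."""
    for frame in frames:
        lines = ocr(frame)  # stand-in for the Apple Vision OCR step
        # The privacy engine decides line by line; only flagged boxes are blurred.
        yield [line.box for line in lines if is_sensitive(line.text)]


# Usage with fake frames: each "frame" is already a list of OCR lines.
frames = [
    [OCRLine("Meeting notes", (0, 0, 200, 20)), OCRLine("password: hunter2", (0, 30, 200, 20))],
    [OCRLine("Nothing to hide", (0, 0, 200, 20))],
]
masks = list(mask_stream(frames, ocr=lambda f: f, is_sensitive=lambda t: "password" in t))
```

Keeping OCR and detection as injected callables makes it easy to swap the OCR backend or the detector without touching the loop.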

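Before using codex, the project's problem was obvious: I relied mainly on regex. Regex works reasonably well for structured data such as email addresses, passwords, API keys, salaries, TFNs, and student IDs, but it is weak on contextual sensitive data, such as business confidential notes, academic records, and location context, which do not necessarily follow a fixed format. In other words, it can catch things that "look like sensitive data", but it does not really understand "why this line should be sensitive".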
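To make the "looks like sensitive data" point concrete, here is a small regex rule set. The patterns below are illustrative assumptions, not VisionShield's actual rules; the TFN pattern, for instance, only checks a nine-digit shape and would also match other nine-digit numbers.

```python
import re

# Hypothetical patterns for illustration only; not the project's real rule set.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "password": re.compile(r"(?i)\bpass(?:word|wd)?\s*[:=]\s*\S+"),
    "api_key": re.compile(r"\b(?:sk|pk)_(?:live|test)_[A-Za-z0-9]{16,}\b"),
    "tfn": re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b"),  # shape only, no checksum
}


def regex_labels(line: str) -> list[str]:
    """Return every category whose pattern appears in the line."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(line)]
```

A line like "Quarterly revenue is confidential" gets no label at all, which is exactly the contextual gap described above.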

This time, codex mainly helped me reorganize and improve the project in five areas.

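First, the detection architecture. The original deterministic rules were not removed, because regex is still the most reliable baseline for content with a stable format. With codex, I reorganized that part into a REGEX source family and introduced GLiNER2 as a local NLP path for categories that depend more on semantics, such as Business, Education, and Location. The project's positioning is now clearer: it is not a pure regex scanner but a hybrid REGEX + NLP privacy detector.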
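One way to combine the two source families is to run both on every line and tag each hit with its source, while only trusting the NLP path for the semantic categories. This is a sketch under assumptions: the detectors are plain callables, GLiNER2 is not called here (a stub stands in for it), and `Detection` / `hybrid_detect` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

# A detector returns the categories it finds in one OCR line.
Detector = Callable[[str], List[str]]


@dataclass(frozen=True)
class Detection:
    line_index: int
    category: str
    source: str  # "REGEX" or "NLP"


def hybrid_detect(
    lines: List[str],
    regex_detector: Detector,
    nlp_detector: Detector,
    nlp_categories: Set[str],
) -> List[Detection]:
    """Run both source families on every line and tag each hit with its source."""
    results: List[Detection] = []
    for i, line in enumerate(lines):
        # Deterministic baseline: every regex hit is kept.
        for category in regex_detector(line):
            results.append(Detection(i, category, "REGEX"))
        # Semantic path: only accept categories the NLP model is trusted for.
        for category in nlp_detector(line):
            if category in nlp_categories:
                results.append(Detection(i, category, "NLP"))
    return results
```

Filtering NLP output to an allow-list such as {"Business", "Education", "Location"} keeps the model from overriding regex on categories where regex is already the better tool.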

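Second, the analysis pipeline. The privacy_engine.py module now does more than simple pattern matching: it also handles OCR line normalization, adjacent-line merging, custom rules, semantic candidate filtering, and temporal persistence. This structure matters for a real-time app because raw OCR output is fragmented, duplicated, incomplete, and delayed. If detection looked only at individual raw OCR lines, the results would be very unstable.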
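Three of these stages can be sketched in a few lines each. The heuristics here are assumptions for illustration, not what privacy_engine.py actually does: normalization uses Unicode NFKC plus whitespace collapsing, merging joins a line ending in ":" with the next line (a label split from its value), and persistence keeps a key masked for a fixed number of frames after it was last detected. `merge_adjacent`, `TemporalPersistence`, and `hold_frames` are hypothetical names.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Fold compatibility characters (e.g. full-width letters) and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def merge_adjacent(lines: list[str]) -> list[str]:
    """Join a label line ending in ':' with the line after it (assumed heuristic)."""
    merged, i = [], 0
    while i < len(lines):
        if lines[i].endswith(":") and i + 1 < len(lines):
            merged.append(f"{lines[i]} {lines[i + 1]}")
            i += 2
        else:
            merged.append(lines[i])
            i += 1
    return merged


class TemporalPersistence:
    """Keep a key masked for `hold_frames` frames after it was last detected."""

    def __init__(self, hold_frames: int = 3):
        self.hold_frames = hold_frames
        self.last_seen: dict[str, int] = {}
        self.frame = 0

    def update(self, detected: set[str]) -> set[str]:
        self.frame += 1
        for key in detected:
            self.last_seen[key] = self.frame
        # Forget keys that have not been seen within the hold window.
        self.last_seen = {
            k: f for k, f in self.last_seen.items() if self.frame - f < self.hold_frames
        }
        return set(self.last_seen)
```

The persistence step is what stops a mask from flickering off when OCR misses a line for a single frame.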

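Third, false positive handling. A big problem before was that the system treated documents that merely discuss sensitive data as if they contained real sensitive data. For example, a sentence saying that the system should detect passwords or student IDs does not mean that a password or student ID is actually exposed on screen. With codex's help, I added explanatory text filtering, so the system tries to distinguish documentation/requirement language from actually exposed data. This is very useful for the capstone demo, because with too many false positives the whole system looks crude.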
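A rough version of explanatory text filtering: treat a line as documentation if it contains requirement-style cue phrases and does not also contain a concrete label/value pair. The cue list and label list below are hypothetical and only meant to show the idea; the project's actual filter is not reproduced here.

```python
import re

# Hypothetical cue phrases typical of documentation / requirement language.
EXPLANATORY_CUES = re.compile(
    r"(?i)\b(should|must|will|needs? to)\s+(detect|mask|blur|flag|identify)\b"
    r"|\bsuch as\b|\bfor example\b|\be\.g\."
)
# A label followed by an actual value, e.g. "password: hunter2".
LABEL_VALUE = re.compile(r"(?i)\b(password|student id|tfn)\s*[:=]\s*\S+")


def is_explanatory(line: str) -> bool:
    """True if the line talks about sensitive data rather than exposing it."""
    return bool(EXPLANATORY_CUES.search(line)) and not LABEL_VALUE.search(line)
```

Requiring the absence of a label/value pair means an example that actually leaks a value ("For example, password: hunter2") is still masked.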

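Fourth, the live UI overlay. The debug badges used to be messy: lots of [REGEX|...] labels stacked on the screen, so it did not look like a stable prototype. The overlay is now geared toward presentation: a status panel shows the sensitive line count, the REGEX/NLP split, and the OCR status, and each masked block keeps only a small source marker, for example R for regex and N for NLP. This makes it easier to explain during a demo why the system is covering a particular block.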
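The status panel and per-block markers boil down to a small summary over the current masked blocks. This sketch assumes each block carries a line index and a source tag; `MaskedBlock`, `status_summary`, and the exact panel text are hypothetical, and a line flagged by both sources counts once in the line total but once per source in the split.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MaskedBlock:
    line_index: int
    source: str  # "REGEX" or "NLP"


MARKERS = {"REGEX": "R", "NLP": "N"}


def status_summary(blocks: list[MaskedBlock], ocr_state: str) -> str:
    """Text for the status panel: sensitive line count, source split, OCR state."""
    counts = Counter(b.source for b in blocks)
    sensitive_lines = {b.line_index for b in blocks}
    return (
        f"sensitive lines: {len(sensitive_lines)} | "
        f"REGEX {counts['REGEX']} / NLP {counts['NLP']} | OCR: {ocr_state}"
    )


def block_marker(block: MaskedBlock) -> str:
    """Small marker drawn on a masked block ('?' for an unknown source)."""
    return MARKERS.get(block.source, "?")
```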

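Fifth, evaluation. This was actually the most important gain this time. With codex, I pushed the project from "it can run a demo" toward "it can quantify how well it works". The repo now has 40 manually written text fixtures, gold labels, a clean-text ablation, a styled-screenshot OCR test, and a video replay test. This makes it possible to compare: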

REGEX only
NLP only (GLiNER2)
REGEX + NLP

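The current conclusions are fairly honest: regex has high precision but insufficient recall; NLP covers the contextual categories but introduces more false positives; the hybrid version is currently the most reasonable trade-off. That is far more solid than simply saying "I used AI to detect sensitive information".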
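Comparing the three configurations comes down to scoring each prediction set against the gold labels. The sketch below assumes predictions and gold labels are sets of (fixture id, line index) pairs; the labels in the usage example are made-up toy values, not VisionShield's measured results.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Line-level precision, recall, and F1 of one configuration."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def compare(configs: dict[str, set], gold: set) -> dict[str, tuple[float, float, float]]:
    """Score every configuration against the same gold labels."""
    return {name: precision_recall_f1(pred, gold) for name, pred in configs.items()}


# Toy labels for illustration only.
gold = {("f1", 0), ("f1", 2), ("f2", 1), ("f2", 3)}
regex_only = {("f1", 0), ("f2", 1)}
nlp_only = {("f1", 2), ("f2", 3), ("f2", 0)}
scores = compare(
    {"REGEX only": regex_only, "NLP only": nlp_only, "REGEX + NLP": regex_only | nlp_only},
    gold,
)
```

Taking the hybrid as the union of both prediction sets is the simplest combination rule; it raises recall at the cost of inheriting every NLP false positive.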

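The documentation also got a round of cleanup. The README, testing report, repo structure, and next plan now fit better into the capstone's technical narrative. In particular, the next plan splits future work into local AI feasibility and Windows OCR backend exploration, which keeps the project's scope clear and avoids making it look as if I suddenly want to build an oversized commercial product.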

Overall, codex did not turn VisionShield into a finished product, but it helped me lay out the project's technical story: the current version is Apple Vision OCR + an asynchronous pipeline + REGEX/NLP hybrid detection + motion-aware masking + repeatable evaluation. What remains is to make semantic detection more stable, reduce false positives, and look into an OCR backend for Windows.

Note: this post was compiled with codex from my project notes; it was not written directly by me.