ENGINEERING
DATA SOVEREIGNTY.
Reverse-engineering the Notion Internal API for high-performance data migration.
Recursive crawling, checkpoint resume, document merging, and fully automated upload to Confluence.
// README.md: THE_WHY
"In the era of AI & Vibe Coding, building tools is faster than ever. But Token costs and Time are real."
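We find the problem first, estimate whether automating it pays off, and then eliminate it.
Manually migrating 500+ Notion pages = 120+ hours (human cost, paid every time)
Building a crawler and automation pipeline = one-time development cost + 1 hour (runtime)
That is why we use AI to write tooling quickly: it frees people from repetitive work so they can focus on harder problems.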
PERFORMANCE_METRICS
> Data Integrity: 100%
> Memory Usage: <150MB (Streaming)
Exponential Efficiency
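In a real migration test of 500+ pages, manual copying proved extremely time-consuming (estimated 9 …/person), and the content contained deeply nested structures. Notion Crawler uses concurrent, fully automated conversion to cut weeks of work down to about 1 hour.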
SYSTEM_ARCHITECTURE
Recursive Crawler Engine
- Reverse Internal API: calls `loadPageChunk` directly to fetch structured Block data, about 10× faster than Playwright.
- Recursive Traversal: automatically resolves sub-pages and database rows, faithfully reconstructing arbitrarily deep hierarchies.
- Anti-Detection: implements exponential backoff with jittered random delays to avoid 429 rate limits.
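The backoff-with-jitter idea above can be sketched as follows. This is a minimal illustration assuming the "full jitter" variant (each delay is drawn uniformly between 0 and an exponentially growing cap); `RateLimited`, `backoff_delays`, and `call_with_backoff` are hypothetical names, not the project's actual code.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a fetcher raises when the server answers HTTP 429."""


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter backoff: delay n is uniform(0, min(cap, base * 2**n))."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(retries)]


def call_with_backoff(fetch: Callable[[], T], retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `fetch` on RateLimited, sleeping a jittered, growing delay between tries."""
    for delay in backoff_delays(retries):
        try:
            return fetch()
        except RateLimited:
            sleep(delay)
    # Final attempt: if it is still rate limited, the exception reaches the caller.
    return fetch()
```

Randomizing the delay keeps many concurrent workers from retrying in lockstep, and injecting `sleep` lets tests verify the retry logic without real waits.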
Markdown Transpiler
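- AST Transformation: converts raw JSON into an abstract syntax tree so complex elements such as tables, callouts, and code blocks render accurately.
- Knowledge Stitcher: automatically identifies scattered API documentation pages (Input/Output/Schema) and stitches them into a single source of truth.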
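A tree-to-Markdown renderer of the kind described above could look roughly like this. It is a sketch over a hypothetical, heavily simplified node shape (`type`, `text`, `level`, `rows`, `children`), not Notion's real block schema or the project's transpiler.

```python
from typing import Any, Dict, List

# Hypothetical simplified node shape, not Notion's real block schema.
Block = Dict[str, Any]
FENCE = "`" * 3  # built at runtime so this example nests inside Markdown cleanly


def render(block: Block) -> str:
    """Render one node (recursing into children) as Markdown."""
    kind = block["type"]
    if kind == "heading":
        return "#" * block["level"] + " " + block["text"]
    if kind == "paragraph":
        return block["text"]
    if kind == "code":
        return f"{FENCE}{block.get('language', '')}\n{block['text']}\n{FENCE}"
    if kind == "callout":
        # Callouts become blockquotes; the icon prefixes the first line.
        body = f"{block.get('icon', '')} {render_all(block.get('children', []))}".strip()
        return "\n".join("> " + line if line else ">" for line in body.split("\n"))
    if kind == "table":
        header, *rows = block["rows"]
        lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
        lines += ["| " + " | ".join(row) + " |" for row in rows]
        return "\n".join(lines)
    raise ValueError(f"unsupported block type: {kind}")


def render_all(blocks: List[Block]) -> str:
    """Render sibling nodes separated by blank lines."""
    return "\n\n".join(render(b) for b in blocks)
```

For example, a table node with `rows=[["k", "v"], ["a", "1"]]` renders as a header row, a `|---|---|` separator, and one body row. Raising on unknown types makes unsupported blocks visible instead of silently dropping them.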
Resilience & Failover
- Dual-Domain Failover: prefers `notion.so` and automatically falls back to the public `notion.site` pages on failure, ensuring 99.9% availability.
- Connection Pooling: uses `requests.Session` to maintain a TCP connection pool, reducing TLS handshake overhead and speeding up large crawls.
- Granular Checkpoints: records each Page ID's status in SQLite/JSON, enabling 100% resume after interruption.
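The dual-domain failover above amounts to trying each base URL in order. This is a minimal sketch assuming an injected `fetch(url)` callable and a hypothetical `FetchError`; `https://example.notion.site` is a placeholder for a workspace's public domain, not a real value from the project.

```python
from typing import Callable, List, Sequence


class FetchError(Exception):
    """Hypothetical error a fetcher raises when a domain fails (403, timeout, ...)."""


# Placeholder domains: private workspace first, then a hypothetical public site.
DEFAULT_BASES = ("https://www.notion.so", "https://example.notion.site")


def fetch_with_failover(path: str, fetch: Callable[[str], bytes],
                        bases: Sequence[str] = DEFAULT_BASES) -> bytes:
    """Try each base domain in order; return the first successful response."""
    errors: List[str] = []
    for base in bases:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except FetchError as exc:
            errors.append(f"{base}: {exc}")
    raise FetchError("all domains failed; " + "; ".join(errors))
```

In practice `fetch` could wrap a shared `requests.Session` (as in the connection-pooling bullet) so both domains reuse pooled connections.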
Confluence Integrator
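- BFS Traversal: uses breadth-first search so parent pages are always created before their children, preventing orphan pages.
- Smart Transform: automatically converts Mermaid blocks into Confluence macros and repairs links between pages (internal links).
- Auto-Root Management: automatically creates `Notion_KB` at the space root and supports `--clean` for recursive deletion, enabling clean redeployments.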
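The parent-before-child guarantee of BFS can be sketched like this. It assumes a hypothetical input `parent_of` mapping each page ID to its parent ID (`None` for top-level pages under the root); it is an illustration, not the project's uploader.

```python
from collections import deque
from typing import Dict, List, Optional


def creation_order(parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Order pages breadth-first from the roots so every parent precedes its children."""
    children: Dict[Optional[str], List[str]] = {}
    for page, parent in parent_of.items():
        children.setdefault(parent, []).append(page)

    order: List[str] = []
    queue = deque(children.get(None, []))  # start from top-level pages
    while queue:
        page = queue.popleft()
        order.append(page)
        queue.extend(children.get(page, []))  # children are queued only after their parent
    return order
```

Note that a page whose parent is missing from the map is never reached, which makes would-be orphans easy to detect by comparing `len(order)` with `len(parent_of)`.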
Legacy Mode (Fallback)
- Playwright Renderer: browser-emulation mode that parses the breadcrumb from the DOM to determine page paths, handling edge cases the API cannot.
- Interactive Crawling: auto-scrolls to trigger lazy loading and automatically expands toggles and databases.
- Stealth Mode: uses headed mode with random delays (3-10 s) to get past Cloudflare verification.
Test Suite (Quality Gate)
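- 130 Unit Tests: pytest covers parsing, rendering logic, and merge strategy across the three core modules, ensuring changes never break existing behavior.
- Zero-IO Pure Testing: core logic such as RichText conversion, block rendering, and heading disambiguation is tested as pure functions, with no need to mock external services.
- Filesystem Isolation: merge-tree construction tests use the pytest `tmp_path` fixture so they never pollute the real filesystem.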
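Both testing styles can be illustrated in one small module. `disambiguate` and `write_merged` are hypothetical helpers standing in for the project's heading-disambiguation and merge steps; the first test is pure, and the second writes only under pytest's built-in `tmp_path` fixture.

```python
import re
from pathlib import Path
from typing import Dict, List


def disambiguate(headings: List[str]) -> List[str]:
    """Turn headings into unique anchor slugs; repeats get -1, -2, ... suffixes."""
    seen: Dict[str, int] = {}
    out: List[str] = []
    for heading in headings:
        slug = re.sub(r"\W+", "-", heading.lower()).strip("-")
        n = seen.get(slug, 0)
        out.append(slug if n == 0 else f"{slug}-{n}")
        seen[slug] = n + 1
    return out


def write_merged(out_dir: Path, name: str, parts: List[str]) -> Path:
    """Hypothetical merge step: join page fragments into one Markdown file."""
    target = out_dir / f"{name}.md"
    target.write_text("\n\n".join(parts), encoding="utf-8")
    return target


def test_disambiguate_repeats():
    # Pure function: no I/O, no mocks.
    assert disambiguate(["Input", "Output", "Input"]) == ["input", "output", "input-1"]


def test_write_merged(tmp_path):
    # pytest injects a fresh temporary directory, so the real filesystem is untouched.
    path = write_merged(tmp_path, "users_api", ["# Input", "# Output"])
    assert path.parent == tmp_path
    assert path.read_text(encoding="utf-8") == "# Input\n\n# Output"
```

Because `\w` matches Unicode word characters in Python 3, non-ASCII headings keep their letters in the slug.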
DEV_EXPERIENCE (DX)
Developer-friendly mechanisms for fast iteration.
> Mocking API responses...
> Ready. (0ms latency)
Offline Replay
While developing conversion logic, the tool reads local snapshots directly, with no network access at all, speeding up iteration by about 100×.
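A snapshot-first loader along these lines would give the offline replay behavior. This is a sketch assuming one JSON file per page ID in a hypothetical `snapshot_dir` and an injected `fetch` callable; neither is the project's confirmed layout.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def load_page(page_id: str, snapshot_dir: Path,
              fetch: Callable[[str], Dict[str, Any]],
              offline: bool = True) -> Dict[str, Any]:
    """Return a page's JSON from its local snapshot; hit the network only if allowed."""
    snap = snapshot_dir / f"{page_id}.json"
    if snap.exists():
        return json.loads(snap.read_text(encoding="utf-8"))
    if offline:
        raise FileNotFoundError(f"no snapshot for {page_id} in offline mode")
    data = fetch(page_id)
    # Record the response so later runs can replay it without the network.
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    snap.write_text(json.dumps(data), encoding="utf-8")
    return data
```

Run once with `offline=False` to record snapshots, then iterate on the conversion logic with the default `offline=True`.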
> [SKIP] POST /wiki/rest/api/content
> No changes applied.
Dry Run Mode
Simulates the conversion and upload flow, emitting logs only without actually writing anything, so it cannot conflict with existing Confluence data.
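A dry-run switch can sit at the single point where writes happen. This sketch assumes a hypothetical `create_page` helper and an injected `post(endpoint, payload)` callable; the endpoint path is the one shown in the `[SKIP]` log line.

```python
import logging
from typing import Any, Callable, Dict, Optional

log = logging.getLogger("upload")

# Endpoint path taken from the [SKIP] log line in the text.
CONTENT_ENDPOINT = "/wiki/rest/api/content"


def create_page(payload: Dict[str, Any],
                post: Callable[[str, Dict[str, Any]], Any],
                dry_run: bool = False) -> Optional[Any]:
    """Create a Confluence page, or only log the request when dry_run is set."""
    if dry_run:
        log.info("[SKIP] POST %s (title=%r)", CONTENT_ENDPOINT, payload.get("title"))
        return None
    return post(CONTENT_ENDPOINT, payload)
```

Keeping the check at the write boundary means the full conversion pipeline still runs in dry-run mode; only the final request is suppressed.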
> Failed: 3 pages (Rate Limited)
> Resuming from last success...
Smart Resume
On restart, the program automatically reads the checkpoint, skips pages already marked Success, and retries only the failed ones.
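The resume logic can be sketched with the standard-library `sqlite3` module. The `Checkpoint` class, the `pages` table schema, and the `SUCCESS`/`FAILED` status strings are assumptions for illustration, not the project's actual schema.

```python
import sqlite3
from typing import Iterable, List


class Checkpoint:
    """Hypothetical per-page status store backed by SQLite."""

    def __init__(self, path: str = "checkpoint.db") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pages "
            "(page_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
        )

    def mark(self, page_id: str, status: str) -> None:
        # INSERT OR REPLACE lets a retried page's FAILED row become SUCCESS.
        with self.conn:  # commits the transaction on success
            self.conn.execute(
                "INSERT OR REPLACE INTO pages (page_id, status) VALUES (?, ?)",
                (page_id, status),
            )

    def pending(self, page_ids: Iterable[str]) -> List[str]:
        """Return the pages that still need work: failed or never attempted."""
        done = {row[0] for row in self.conn.execute(
            "SELECT page_id FROM pages WHERE status = 'SUCCESS'")}
        return [p for p in page_ids if p not in done]
```

Marking each page right after it finishes keeps the checkpoint granular, so a crash loses at most the page in flight.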
CLI_COMMANDS
Crawl quickly through the internal API without opening a browser. Suited to large-batch migrations.
python crawl_notion_api.py --token $NOTION_TOKEN_V2 --page $ROOT_PAGE_ID
Merge scattered files into API documents and start a local MkDocs server for preview.
python build_knowledge_base.py && mkdocs serve
Recursively read the output directory and upload it to the target space. `--source all` uploads everything.
python upload_to_confluence.py --source all --space ENGINEERING
FALLBACK_STRATEGY
Why We Need a Fallback
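Notion's internal API (`loadPageChunk`) is a non-public endpoint that may change at any time with no stability guarantee. If the crawler is identified as an automated tool, API requests will simply return 403 or 429.
For this reason the project includes a Playwright browser mode as a complete fallback. It renders pages in a real browser and bypasses the API layer entirely, so the migration can be completed in any environment.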
API vs Playwright Comparison
| | API Mode | Playwright |
|---|---|---|
| Speed | ~10 min | ~3 hrs |
| Anti-detection | Header spoofing | Real browser |
| API dependency | Non-public API | None |
| Cloudflare | May be blocked | Fully bypassed |
| Resume | ✓ | ✓ |
| Memory | <50MB | ~500MB |
TEST_SUITE
pytest: 130 tests across 3 core modules
MODULE COVERAGE
TEST CATEGORIES
pip install pytest && pytest -v --tb=short