Secondbrain project, based on Tân's design

Yuki

A knowledge management system designed as a "Second Brain" for web content.

┌─────────────────────────────────────────────────────────────────────────┐
│                        KNOWLEDGE MANAGEMENT SYSTEM                       │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                      nginx (Reverse Proxy)                       │   │
│   │                          Port 80                                 │   │
│   │             /api/* → yuki-core    /* → yuki-reader              │   │
│   └───────────────────────────────┬─────────────────────────────────┘   │
│                                   │                                      │
│           ┌───────────────────────┼───────────────────────┐             │
│           │                       │                       │             │
│           ▼                       ▼                       ▼             │
│   ┌─────────────────┐     ┌─────────────────┐     ┌─────────────┐      │
│   │   yuki-core     │     │   yuki-reader   │     │    Ollama   │      │
│   │ (Unified        │     │   (Web UI)      │     │ (LLM)       │      │
│   │  Backend)       │     │   Port 3000     │     │ Port 11434  │      │
│   │  Port 8000      │     │   (internal)    │     │ (External)  │      │
│   └─────────────────┘     └─────────────────┘     └─────────────┘      │
└─────────────────────────────────────────────────────────────────────────┘
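
The routing rule in the diagram can be illustrated with a small Python sketch. This is not the project's nginx configuration: the function name `pick_upstream`, the upstream labels, and the handling of a bare `/api` path are assumptions made for illustration only.

```python
def pick_upstream(path: str) -> str:
    """Mirror the diagram's rule: /api/* goes to yuki-core, everything else to yuki-reader."""
    # Assumption: a bare "/api" is treated like "/api/"; the real nginx config may differ.
    if path == "/api" or path.startswith("/api/"):
        return "yuki-core:8000"   # hypothetical upstream label (ports taken from the diagram)
    return "yuki-reader:3000"     # hypothetical upstream label (ports taken from the diagram)


print(pick_upstream("/api/items"))
print(pick_upstream("/reader/today"))
```

Matching on the `/api/` prefix rather than on `/api` alone keeps unrelated paths such as `/apiary` on the web UI side.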

Features

Content Collection

  • Multi-client fetching: httpx → cloudscraper → nodriver fallback chain
  • Smart extraction: Site-specific extractors (VnExpress, Wikipedia, Medium, etc.)
  • Background crawling: Async jobs with progress tracking
  • Scheduled crawling: Cron-based automatic re-crawl
  • Content change detection: Track updates to previously crawled content
  • Event system: Webhooks for crawl/item events
  • Admin dashboard: Monitor jobs and system status
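
The fallback chain in the first bullet can be sketched in a few lines of Python. This is an illustration of the general pattern, not yuki-core's fetcher code: `fetch_with_fallback`, `AllClientsFailed`, and the stand-in fetchers are hypothetical names, and the real clients would be wrapped behind the same one-argument callable.

```python
from typing import Callable, Sequence

Fetcher = Callable[[str], str]  # takes a URL, returns the response body


class AllClientsFailed(Exception):
    """Raised when every client in the chain failed for a URL."""


def fetch_with_fallback(url: str, clients: Sequence[tuple[str, Fetcher]]) -> tuple[str, str]:
    """Try each (name, fetcher) in order; return (client_name, body) from the first success."""
    errors: list[str] = []
    for name, fetch in clients:
        try:
            return name, fetch(url)
        except Exception as exc:  # broad on purpose: any failure moves on to the next client
            errors.append(f"{name}: {exc}")
    raise AllClientsFailed("; ".join(errors))


# Stand-in fetchers so the sketch runs without network access.
def blocked(url: str) -> str:
    raise RuntimeError("403 Forbidden")


def browser_like(url: str) -> str:
    return f"<html>content of {url}</html>"


chain = [("httpx", blocked), ("cloudscraper", blocked), ("nodriver", browser_like)]
used, body = fetch_with_fallback("https://example.com/article", chain)
print(used)
```

Ordering the chain from cheapest to most expensive client means the heavier browser-based option is only paid for when the lighter ones fail.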

Knowledge Processing

  • Embeddings: Generate vector embeddings via Ollama
  • Named Entity Recognition: Extract people, places, organizations
  • Knowledge graph: Build entity relationships
  • Semantic search: Find content by meaning, not just keywords
  • Entity deduplication: Merge duplicate entities automatically
  • Real-time updates: WebSocket for processing status

Quick Start

# Clone the repository
git clone https://github.com/user/yuki.git
cd yuki

# Start all services
docker compose up -d

# View logs
docker compose logs -f

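Services available at (through nginx on port 80, per the diagram above):

  • Web UI: http://localhost/
  • API: http://localhost/api/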

Manual Setup

# yuki-core
cd packages/yuki-core
uv sync
uv run uvicorn app.main:app --reload

Note: yuki-core requires Ollama to be running locally, with these models pulled:

ollama pull nomic-embed-text
ollama pull qwen2.5:7b  # for NER

API Examples

Extract Content

# Single URL
curl -X POST http://localhost/api/extract \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

# Batch URLs
curl -X POST http://localhost/api/extract/batch \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com/1", "https://example.com/2"]}'
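
The same single-URL call can be made from Python with only the standard library. This is a sketch: the helper name `build_extract_request` is hypothetical, and the shape of the response is not documented here, so the example stops at building the request.

```python
import json
import urllib.request


def build_extract_request(url: str, base_url: str = "http://localhost") -> urllib.request.Request:
    """Build the POST /api/extract request shown in the curl example above."""
    payload = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/extract",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_extract_request("https://example.com/article")
print(req.get_method(), req.full_url)

# To actually send it (requires the stack to be running):
# with urllib.request.urlopen(req, timeout=30) as resp:
#     result = json.load(resp)
```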

Background Crawl

# Start crawl job
curl -X POST http://localhost/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/series", "mode": "collection"}'

# Check status
curl http://localhost/api/crawl/{job_id}

# Cancel job
curl -X DELETE http://localhost/api/crawl/{job_id}
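
Because crawl jobs run in the background, a client typically polls the status endpoint until the job finishes. The sketch below shows that loop under stated assumptions: the `status` field and the terminal state names are hypothetical (check the real responses), and `wait_for_job` takes any callable that returns the job as a dict, so it can wrap a GET to `/api/crawl/{job_id}`.

```python
import time
from typing import Callable

# Assumption: these state names are hypothetical, not taken from the yuki-core API.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(get_status: Callable[[], dict], poll_seconds: float = 2.0,
                 max_polls: int = 100) -> dict:
    """Call get_status() until the job reports a terminal state, then return that response."""
    for _ in range(max_polls):
        job = get_status()
        if job.get("status") in TERMINAL_STATES:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"job still running after {max_polls} polls")


# Simulated responses so the sketch runs offline.
responses = iter([{"status": "running"}, {"status": "running"}, {"status": "completed"}])
final = wait_for_job(lambda: next(responses), poll_seconds=0)
print(final["status"])
```

Capping the number of polls keeps a stuck job from blocking the caller forever.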

Semantic Search

# Search by meaning
curl -X POST http://localhost/api/search/semantic \
  -H "Content-Type: application/json" \
  -d '{"query": "machine learning applications", "limit": 10}'

Knowledge Graph

# Get entities
curl "http://localhost/api/entities?type=PERSON&limit=20"

# Get entity relationships
curl http://localhost/api/graph/entity/{entity_id}/relations

Data Access

# List collections
curl http://localhost/api/collections

# Get items in collection
curl "http://localhost/api/items?collection_id={id}"

# Get plain text content (optimized for AI)
curl http://localhost/api/items/{id}/text

Configuration

Environment Variables

Variable                    Default                   Description
YUKI_PORT                   8000                      API server port
YUKI_DB_PATH                ./data/yuki-core.lance    Database path
YUKI_DEFAULT_CLIENT         auto                      httpx/cloudscraper/nodriver/auto
YUKI_MIN_DELAY_MS           1000                      Min delay between requests (ms)
YUKI_MAX_CONCURRENT_JOBS    10                        Max concurrent crawl jobs
YUKI_OLLAMA_BASE_URL        http://localhost:11434    Ollama API URL
YUKI_OLLAMA_EMBED_MODEL     nomic-embed-text          Embedding model
YUKI_OLLAMA_LLM_MODEL       qwen2.5:7b                LLM for NER
YUKI_WORKER_ENABLED         true                      Enable background processing
YUKI_MAX_CONCURRENT_TASKS   3                         Concurrent processing tasks
YUKI_API_KEY_ENABLED        false                     Enable API key auth
YUKI_RATE_LIMIT_ENABLED     false                     Enable rate limiting
YUKI_WEBHOOKS_ENABLED       false                     Enable webhook events
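
As a rough illustration of how these variables and defaults fit together, the sketch below reads a handful of them into a typed settings object. It is not yuki-core's settings module (which may well use a dedicated settings library): `Settings`, `load_settings`, and `_env_bool` are hypothetical names, and only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Interpret common truthy spellings; fall back to the default when unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    port: int
    db_path: str
    ollama_base_url: str
    max_concurrent_jobs: int
    api_key_enabled: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from the environment, using the defaults listed in the table."""
    env = os.environ if env is None else env
    return Settings(
        port=int(env.get("YUKI_PORT", "8000")),
        db_path=env.get("YUKI_DB_PATH", "./data/yuki-core.lance"),
        ollama_base_url=env.get("YUKI_OLLAMA_BASE_URL", "http://localhost:11434"),
        max_concurrent_jobs=int(env.get("YUKI_MAX_CONCURRENT_JOBS", "10")),
        api_key_enabled=_env_bool(env, "YUKI_API_KEY_ENABLED", False),
    )


defaults = load_settings({})
custom = load_settings({"YUKI_PORT": "9000", "YUKI_API_KEY_ENABLED": "true"})
print(defaults.port, custom.port, custom.api_key_enabled)
```

Passing the environment in as a mapping keeps the loader easy to test without touching the real process environment.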

Project Structure

yuki/
├── packages/
│   ├── yuki-core/             # Unified Backend (port 8000)
│   │   ├── app/
│   │   │   ├── api/           # REST endpoints
│   │   │   ├── fetcher/       # HTTP clients (httpx, cloudscraper, nodriver)
│   │   │   ├── processor/     # Content extraction by domain
│   │   │   ├── processors/    # Embedder, NER, Relations
│   │   │   ├── services/      # Business logic
│   │   │   ├── storage/       # LanceDB (12 tables)
│   │   │   └── worker/        # Background processing pipeline
│   │   └── tests/
│   │
│   └── yuki-reader/           # Web UI (port 3000)
│       └── ...
│
├── nginx/                     # Reverse proxy config
├── docs/                      # Documentation
├── docker-compose.yml
└── data/                      # Runtime data (gitignored)
    └── yuki-core/

API Security

When deploying publicly, enable API key authentication:

# Generate secure API key
python -c "import secrets; print(secrets.token_urlsafe(32))"

# Set environment variables
YUKI_API_KEY_ENABLED=true
YUKI_API_KEY=your-generated-key-here

Include the API key in requests:

curl http://localhost/api/items \
  -H "X-API-Key: your-api-key-here"

Public endpoints (no API key required):

  • GET / - API info
  • GET /health - Health check
  • GET /docs - Swagger documentation
  • GET /metrics - Prometheus metrics
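
The rule described above (public endpoints pass, everything else needs a matching `X-API-Key`) can be sketched as a single function. This is an assumption-laden illustration rather than the project's middleware: `is_authorized` and `PUBLIC_PATHS` are hypothetical names, and the header lookup here is a plain dict lookup, whereas real HTTP header names are case-insensitive.

```python
import hmac

# Paths taken from the list of public endpoints above.
PUBLIC_PATHS = {"/", "/health", "/docs", "/metrics"}


def is_authorized(method: str, path: str, headers: dict[str, str], expected_key: str) -> bool:
    """Allow public GET endpoints; otherwise require a matching X-API-Key header."""
    if method.upper() == "GET" and path in PUBLIC_PATHS:
        return True
    if not expected_key:
        return False  # refuse everything if no key is configured, rather than match an empty header
    supplied = headers.get("X-API-Key", "")
    # Constant-time comparison avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))


print(is_authorized("GET", "/health", {}, "s3cret"))
print(is_authorized("GET", "/api/items", {"X-API-Key": "s3cret"}, "s3cret"))
print(is_authorized("GET", "/api/items", {}, "s3cret"))
```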

Testing

cd packages/yuki-core && uv run pytest

# With coverage
uv run pytest --cov=app --cov-report=html

Documentation

License

MIT