# Yuki

A knowledge management system designed as a "Second Brain" for web content.
```
┌─────────────────────────────────────────────────────────────────────────┐
│                      KNOWLEDGE MANAGEMENT SYSTEM                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                     nginx (Reverse Proxy)                       │   │
│   │                           Port 80                               │   │
│   │           /api/* → yuki-core      /* → yuki-reader              │   │
│   └───────────────────────────────┬─────────────────────────────────┘   │
│                                   │                                     │
│           ┌───────────────────────┼───────────────────────┐             │
│           │                       │                       │             │
│           ▼                       ▼                       ▼             │
│   ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐         │
│   │   yuki-core     │    │   yuki-reader   │    │   Ollama    │         │
│   │   (Unified      │    │   (Web UI)      │    │   (LLM)     │         │
│   │    Backend)     │    │   Port 3000     │    │ Port 11434  │         │
│   │   Port 8000     │    │   (internal)    │    │ (External)  │         │
│   └─────────────────┘    └─────────────────┘    └─────────────┘         │
└─────────────────────────────────────────────────────────────────────────┘
```
## Features

### Content Collection
- Multi-client fetching: httpx → cloudscraper → nodriver fallback chain
- Smart extraction: Site-specific extractors (VnExpress, Wikipedia, Medium, etc.)
- Background crawling: Async jobs with progress tracking
- Scheduled crawling: Cron-based automatic re-crawl
- Content change detection: Track updates to previously crawled content
- Event system: Webhooks for crawl/item events
- Admin dashboard: Monitor jobs and system status
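The multi-client fallback chain above can be sketched as trying each fetcher in order until one succeeds. This is a minimal illustration, not the project's actual implementation; the stub lambdas stand in for the real httpx, cloudscraper, and nodriver clients:

```python
# Illustrative sketch of a fetch-with-fallback chain. The stub clients
# below simulate: httpx raising a timeout, cloudscraper returning nothing,
# and nodriver succeeding -- forcing the chain to fall through in order.
from typing import Callable, Optional

def fetch_with_fallback(url: str, clients: list) -> tuple:
    """Try each (name, fetcher) in order; return (name, html) on first success."""
    errors = []
    for name, fetch in clients:
        try:
            html = fetch(url)
            if html:  # empty/None responses count as failure
                return name, html
            errors.append(f"{name}: empty response")
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError(f"all clients failed for {url}: {'; '.join(errors)}")

# Stub clients: the first two fail, so the chain falls back to the last.
clients = [
    ("httpx", lambda url: (_ for _ in ()).throw(TimeoutError("blocked"))),
    ("cloudscraper", lambda url: None),
    ("nodriver", lambda url: "<html>rendered page</html>"),
]
name, html = fetch_with_fallback("https://example.com", clients)
print(name)  # nodriver
```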
### Knowledge Processing
- Embeddings: Generate vector embeddings via Ollama
- Named Entity Recognition: Extract people, places, organizations
- Knowledge graph: Build entity relationships
- Semantic search: Find content by meaning, not just keywords
- Entity deduplication: Merge duplicate entities automatically
- Real-time updates: WebSocket for processing status
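At its core, semantic search ranks stored items by vector similarity to the query's embedding. A minimal sketch, using hand-made vectors in place of the real Ollama embeddings and LanceDB storage:

```python
# Toy semantic search: cosine similarity between a query vector and
# stored item vectors. In the real system, vectors would come from the
# nomic-embed-text model and live in LanceDB; here they are invented.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def semantic_search(query_vec, items, limit=10):
    """Rank (item_id, vector) pairs by similarity to the query vector."""
    scored = [(item_id, cosine(query_vec, vec)) for item_id, vec in items]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:limit]

items = [("doc-ml", [0.9, 0.1, 0.0]), ("doc-cooking", [0.0, 0.2, 0.9])]
results = semantic_search([1.0, 0.0, 0.0], items, limit=1)
print(results[0][0])  # doc-ml ranks first
```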
## Quick Start

### Using Docker (Recommended)

```bash
# Clone the repository
git clone https://github.com/user/yuki.git
cd yuki

# Start all services
docker compose up -d

# View logs
docker compose logs -f
```
Services available at:
- Web UI: http://localhost/
- API: http://localhost/api/
- Health: http://localhost/api/health
### Manual Setup

```bash
# yuki-core
cd packages/yuki-core
uv sync
uv run uvicorn app.main:app --reload
```

Note: yuki-core requires Ollama running locally:

```bash
ollama pull nomic-embed-text
ollama pull qwen2.5:7b  # for NER
```
## API Examples

### Extract Content

```bash
# Single URL
curl -X POST http://localhost/api/extract \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

# Batch URLs
curl -X POST http://localhost/api/extract/batch \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com/1", "https://example.com/2"]}'
```
### Background Crawl

```bash
# Start crawl job
curl -X POST http://localhost/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/series", "mode": "collection"}'

# Check status
curl http://localhost/api/crawl/{job_id}

# Cancel job
curl -X DELETE http://localhost/api/crawl/{job_id}
```
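The three endpoints above map to a start / status / cancel lifecycle for crawl jobs. A toy in-memory sketch of that lifecycle (purely illustrative — the real service runs jobs asynchronously with progress tracking):

```python
# Minimal in-memory model of the crawl-job lifecycle: start a job,
# query its status, cancel it. Job ids are random UUIDs, mirroring the
# {job_id} placeholder in the endpoints above.
import uuid

class CrawlJobs:
    def __init__(self):
        self.jobs = {}

    def start(self, url: str, mode: str = "collection") -> str:
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = {"url": url, "mode": mode,
                             "status": "running", "progress": 0}
        return job_id

    def status(self, job_id: str) -> dict:
        return self.jobs[job_id]

    def cancel(self, job_id: str) -> None:
        self.jobs[job_id]["status"] = "cancelled"

jobs = CrawlJobs()
jid = jobs.start("https://example.com/series")
jobs.cancel(jid)
print(jobs.status(jid)["status"])  # cancelled
```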
### Semantic Search

```bash
# Search by meaning
curl -X POST http://localhost/api/search/semantic \
  -H "Content-Type: application/json" \
  -d '{"query": "machine learning applications", "limit": 10}'
```
### Knowledge Graph

```bash
# Get entities
curl "http://localhost/api/entities?type=PERSON&limit=20"

# Get entity relationships
curl http://localhost/api/graph/entity/{entity_id}/relations
```
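Conceptually, the relations lookup above is an adjacency-list query: given an entity id, return its outgoing edges. A sketch with invented entity ids and relation labels (the real graph schema is not shown in this README):

```python
# Toy adjacency-list model of an entity-relations lookup, in the spirit
# of /api/graph/entity/{entity_id}/relations. All ids and relation
# labels here are made up for illustration.
from collections import defaultdict

graph = defaultdict(list)

def add_relation(source: str, relation: str, target: str) -> None:
    graph[source].append({"relation": relation, "target": target})

def relations(entity_id: str) -> list:
    # .get avoids creating an empty entry for unknown entities
    return graph.get(entity_id, [])

add_relation("person:ada-lovelace", "worked_with", "person:charles-babbage")
add_relation("person:ada-lovelace", "wrote_about", "concept:analytical-engine")
print(len(relations("person:ada-lovelace")))  # 2
```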
### Data Access

```bash
# List collections
curl http://localhost/api/collections

# Get items in collection
curl "http://localhost/api/items?collection_id={id}"

# Get plain text content (optimized for AI)
curl http://localhost/api/items/{id}/text
```
## Configuration

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `YUKI_PORT` | `8000` | API server port |
| `YUKI_DB_PATH` | `./data/yuki-core.lance` | Database path |
| `YUKI_DEFAULT_CLIENT` | `auto` | `httpx` / `cloudscraper` / `nodriver` / `auto` |
| `YUKI_MIN_DELAY_MS` | `1000` | Min delay between requests |
| `YUKI_MAX_CONCURRENT_JOBS` | `10` | Max concurrent crawl jobs |
| `YUKI_OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API URL |
| `YUKI_OLLAMA_EMBED_MODEL` | `nomic-embed-text` | Embedding model |
| `YUKI_OLLAMA_LLM_MODEL` | `qwen2.5:7b` | LLM for NER |
| `YUKI_WORKER_ENABLED` | `true` | Enable background processing |
| `YUKI_MAX_CONCURRENT_TASKS` | `3` | Concurrent processing tasks |
| `YUKI_API_KEY_ENABLED` | `false` | Enable API key auth |
| `YUKI_RATE_LIMIT_ENABLED` | `false` | Enable rate limiting |
| `YUKI_WEBHOOKS_ENABLED` | `false` | Enable webhook events |
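A sketch of how these variables and their defaults might be read at startup. The names and defaults come from the table above; the plain `os.environ`-style parsing is an assumption (the actual app may use a settings library instead):

```python
# Read a subset of the documented environment variables, falling back to
# the table's defaults. Boolean flags are parsed from "true"/"false".
import os

def load_config(env=None):
    env = os.environ if env is None else env
    as_bool = lambda v: v.lower() == "true"
    return {
        "port": int(env.get("YUKI_PORT", "8000")),
        "db_path": env.get("YUKI_DB_PATH", "./data/yuki-core.lance"),
        "default_client": env.get("YUKI_DEFAULT_CLIENT", "auto"),
        "min_delay_ms": int(env.get("YUKI_MIN_DELAY_MS", "1000")),
        "max_concurrent_jobs": int(env.get("YUKI_MAX_CONCURRENT_JOBS", "10")),
        "worker_enabled": as_bool(env.get("YUKI_WORKER_ENABLED", "true")),
        "api_key_enabled": as_bool(env.get("YUKI_API_KEY_ENABLED", "false")),
    }

cfg = load_config({"YUKI_PORT": "9000"})
print(cfg["port"], cfg["worker_enabled"])  # 9000 True
```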
## Project Structure

```
yuki/
├── packages/
│   ├── yuki-core/            # Unified Backend (port 8000)
│   │   ├── app/
│   │   │   ├── api/          # REST endpoints
│   │   │   ├── fetcher/      # HTTP clients (httpx, cloudscraper, nodriver)
│   │   │   ├── processor/    # Content extraction by domain
│   │   │   ├── processors/   # Embedder, NER, Relations
│   │   │   ├── services/     # Business logic
│   │   │   ├── storage/      # LanceDB (12 tables)
│   │   │   └── worker/       # Background processing pipeline
│   │   └── tests/
│   │
│   └── yuki-reader/          # Web UI (port 3000)
│       └── ...
│
├── nginx/                    # Reverse proxy config
├── docs/                     # Documentation
├── docker-compose.yml
└── data/                     # Runtime data (gitignored)
    └── yuki-core/
```
## API Security
When deploying publicly, enable API key authentication:
```bash
# Generate secure API key
python -c "import secrets; print(secrets.token_urlsafe(32))"

# Set environment variables
YUKI_API_KEY_ENABLED=true
YUKI_API_KEY=your-generated-key-here
```
Include API key in requests:
```bash
curl http://localhost/api/items \
  -H "X-API-Key: your-api-key-here"
```
Public endpoints (no API key required):
- `GET /` - API info
- `GET /health` - Health check
- `GET /docs` - Swagger documentation
- `GET /metrics` - Prometheus metrics
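The check described above amounts to: allow public paths through, otherwise require a matching `X-API-Key` header. A minimal sketch (the public paths come from the list above; the standalone function is illustrative — a real deployment would do this in server middleware):

```python
# Illustrative API-key gate: public paths pass, everything else must
# present the correct X-API-Key header. Uses a constant-time comparison
# to avoid timing side channels.
import hmac

PUBLIC_PATHS = {"/", "/health", "/docs", "/metrics"}

def is_authorized(path: str, headers: dict, api_key: str,
                  enabled: bool = True) -> bool:
    if not enabled or path in PUBLIC_PATHS:
        return True
    supplied = headers.get("X-API-Key", "")
    return hmac.compare_digest(supplied, api_key)

print(is_authorized("/health", {}, "secret"))                           # True
print(is_authorized("/api/items", {}, "secret"))                        # False
print(is_authorized("/api/items", {"X-API-Key": "secret"}, "secret"))   # True
```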
## Testing

```bash
cd packages/yuki-core && uv run pytest

# With coverage
uv run pytest --cov=app --cov-report=html
```
## Documentation
- Architecture - System design and data flow
- API Reference - Complete API documentation
- Getting Started - Setup guide
- User Guide - Usage examples
- Roadmap - Planned features
## License

MIT