# Architecture

## Pipeline Overview
OpenSift follows a four-stage pipeline:
```mermaid
graph TD
    A[User Query] --> B[Query Planner]
    B -->|search queries + criteria| C[Search Adapters]
    C -->|raw results| D[Evidence Verifier]
    D -->|assessments| E[Result Classifier]
    E --> F[Structured Response]
    W[WisModel LLM] -.->|powers| B
    W -.->|powers| D
```
### Stage 1: Query Planner

Takes a natural-language question and uses the LLM to generate:

- Search queries: 2–4 precise keyword phrases for the search backend
- Screening criteria: 1–4 quantified rules, each with a type, description, and weight
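The planner's output can be pictured as a small typed structure. The sketch below is a minimal illustration, not OpenSift's actual model: it assumes the LLM returns JSON with hypothetical `search_queries` and `criteria` keys and uses stdlib dataclasses in place of the project's Pydantic models. The class names, field names, and the `"time"`/`"content"` type values are all assumptions.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One screening rule (hypothetical field names)."""
    type: str          # assumed values such as "content" or "time"
    description: str
    weight: float


@dataclass
class QueryPlan:
    """Planner output: keyword queries plus screening criteria (hypothetical)."""
    search_queries: list[str]
    criteria: list[Criterion]

    def __post_init__(self) -> None:
        # Enforce the documented ranges: 2-4 queries and 1-4 criteria.
        if not 2 <= len(self.search_queries) <= 4:
            raise ValueError("expected 2-4 search queries")
        if not 1 <= len(self.criteria) <= 4:
            raise ValueError("expected 1-4 screening criteria")


def parse_plan(raw: str) -> QueryPlan:
    """Parse a JSON plan as an LLM might emit it (the JSON shape is an assumption)."""
    data = json.loads(raw)
    criteria = [Criterion(**c) for c in data["criteria"]]
    return QueryPlan(search_queries=list(data["search_queries"]), criteria=criteria)


raw = """{
  "search_queries": ["sulfide solid electrolyte", "ionic conductivity sulfide electrolyte"],
  "criteria": [
    {"type": "content", "description": "Reports room-temperature ionic conductivity", "weight": 0.7},
    {"type": "time", "description": "Published in 2020 or later", "weight": 0.3}
  ]
}"""
plan = parse_plan(raw)
```

Validating the ranges at construction time means a malformed LLM response fails fast, before any search backend is queried.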
### Stage 2: Search Adapters
Dispatches the generated queries to one or more search backends via the adapter pattern. Results are normalized to a standard schema.
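A minimal sketch of the adapter pattern described above follows. It assumes a hypothetical `SearchAdapter` abstract base class and a hypothetical `SearchResult` schema; the real interface lives in `adapters/base/` and may differ. The toy in-memory backend stores documents under its own keys (`doc_id`, `heading`, `body`) and maps them onto the standard schema, which is the normalization step.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    """Standard result schema shared by all adapters (hypothetical fields)."""
    id: str
    title: str
    content: str
    source: str


class SearchAdapter(ABC):
    """Abstract backend interface: every adapter returns normalized results."""

    name: str = "base"

    @abstractmethod
    def search(self, query: str, limit: int = 10) -> list[SearchResult]:
        ...


class InMemoryAdapter(SearchAdapter):
    """Toy backend whose native document keys differ from the standard schema."""

    name = "memory"

    def __init__(self, docs: list[dict]) -> None:
        self.docs = docs

    def search(self, query: str, limit: int = 10) -> list[SearchResult]:
        terms = query.lower().split()
        hits = [d for d in self.docs if any(t in d["body"].lower() for t in terms)]
        # Normalize backend-specific keys to the standard SearchResult schema.
        return [
            SearchResult(id=d["doc_id"], title=d["heading"], content=d["body"], source=self.name)
            for d in hits[:limit]
        ]


def dispatch(queries: list[str], adapters: list[SearchAdapter]) -> list[SearchResult]:
    """Send every query to every adapter and de-duplicate by (source, id)."""
    seen: set[tuple[str, str]] = set()
    merged: list[SearchResult] = []
    for adapter in adapters:
        for query in queries:
            for result in adapter.search(query):
                key = (result.source, result.id)
                if key not in seen:
                    seen.add(key)
                    merged.append(result)
    return merged
```

Because downstream stages only see `SearchResult`, adding a backend means writing one new subclass; the verifier and classifier stay unchanged.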
### Stage 3: Evidence Verifier
Verifies each search result against each criterion using the LLM:
| Assessment | Meaning |
|---|---|
| Support | Criterion clearly met, with cited evidence |
| Somewhat Support | Partially relevant but not fully met |
| Reject | Clearly does not meet the criterion |
| Insufficient Information | Not enough information to judge |
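Verification produces one assessment per (result, criterion) pair, i.e. a results × criteria matrix. The sketch below assumes a hypothetical `judge` callable standing in for the LLM call; `keyword_judge` is a deterministic toy used only so the example runs offline and is not how OpenSift judges evidence. The enum values mirror the labels in the table above.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable


class Assessment(str, Enum):
    SUPPORT = "Support"
    SOMEWHAT_SUPPORT = "Somewhat Support"
    REJECT = "Reject"
    INSUFFICIENT_INFORMATION = "Insufficient Information"


# Hypothetical judge signature: (result text, criterion description) -> Assessment.
# In OpenSift this role is played by the LLM.
Judge = Callable[[str, str], Assessment]


def verify(results: list[str], criteria: list[str], judge: Judge) -> list[list[Assessment]]:
    """Return one row per result and one column per criterion."""
    return [[judge(result, criterion) for criterion in criteria] for result in results]


def keyword_judge(result: str, criterion: str) -> Assessment:
    """Toy stand-in for the LLM: substring-matches the criterion's words."""
    if not result.strip():
        return Assessment.INSUFFICIENT_INFORMATION
    words = criterion.lower().split()
    matched = sum(w in result.lower() for w in words)
    if matched == len(words):
        return Assessment.SUPPORT
    if matched > 0:
        return Assessment.SOMEWHAT_SUPPORT
    return Assessment.REJECT
```

For example, `verify(["sulfide electrolyte conductivity", "oxide cathode"], ["sulfide electrolyte", "cathode"], keyword_judge)` yields `[[SUPPORT, REJECT], [REJECT, SUPPORT]]`.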
### Stage 4: Result Classifier

Assigns each result a classification based on its verification assessments:
| Classification | Rule |
|---|---|
| Perfect | All criteria are Support |
| Partial | At least one non-time criterion is Support or Somewhat Support |
| Reject | All criteria are Reject, or only time criteria pass |
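The rule table above can be written as a short function. This is a sketch under two assumptions: time criteria are identified by a criterion type of `"time"`, and any combination the table does not name (for example, every criterion returning Insufficient Information) falls back to Reject.

```python
from __future__ import annotations

# Assessment labels as listed in the Stage 3 table.
SUPPORT = "Support"
SOMEWHAT_SUPPORT = "Somewhat Support"


def classify(assessments: list[tuple[str, str]]) -> str:
    """Classify one result from (criterion_type, assessment) pairs.

    The criterion type "time" marking time criteria is an assumption.
    """
    # Perfect: every criterion is Support.
    if assessments and all(a == SUPPORT for _, a in assessments):
        return "Perfect"
    # Partial: at least one non-time criterion is Support or Somewhat Support.
    if any(ctype != "time" and a in (SUPPORT, SOMEWHAT_SUPPORT) for ctype, a in assessments):
        return "Partial"
    # Reject: all criteria rejected, or only time criteria pass.
    # Combinations the table does not name also land here (assumed fallback).
    return "Reject"
```

Checking Perfect first matters: a result whose criteria are all Support would also satisfy the Partial rule.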
## Project Structure

```text
opensift/
├── src/opensift/
│ ├── core/ # Core AI pipeline
│ │ ├── engine.py # Orchestrator (Plan → Search → Verify → Classify)
│ │ ├── planner/planner.py # Query planning
│ │ ├── verifier/verifier.py # Result verification
│ │ ├── classifier.py # Classification
│ │ └── llm/ # LLM client + prompt templates
│ ├── adapters/ # Search backend adapters (pluggable)
│ │ ├── base/ # Abstract interface
│ │ ├── atomwalker/ # AtomWalker academic search
│ │ ├── elasticsearch/ # Elasticsearch
│ │ ├── opensearch/ # OpenSearch
│ │ ├── solr/ # Apache Solr
│ │ ├── meilisearch/ # MeiliSearch
│ │ └── wikipedia/ # Wikipedia
│ ├── models/ # Data models (Pydantic)
│ ├── client/ # Python SDK
│ ├── api/ # REST API (FastAPI)
│ ├── config/ # Config management
│ └── observability/ # Logging
├── tests/
│ ├── unit/ # Unit tests (mocked)
│ └── integration/ # Integration tests (Docker)
├── deployments/docker/ # Docker Compose files
├── docs/ # Documentation (this site)
└── pyproject.toml
```
## Data Flow

```mermaid
sequenceDiagram
    participant U as User
    participant API as REST API
    participant E as Engine
    participant P as Planner
    participant S as Search Adapter
    participant V as Verifier
    participant C as Classifier
    U->>API: POST /v1/search
    API->>E: execute(query, options)
    E->>P: plan(query)
    P-->>E: search_queries + criteria
    E->>S: search(queries)
    S-->>E: raw results
    E->>V: verify(results, criteria)
    V-->>E: assessments
    E->>C: classify(assessments)
    C-->>E: perfect / partial / reject
    E-->>API: structured response
    API-->>U: JSON response
```
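The engine's role in this sequence is plain orchestration: call each stage in order and group results by classification. The sketch below is synchronous for brevity and uses hypothetical `Protocol` interfaces and stub stages; it is not the code in `core/engine.py`. `max_results` is an assumed stand-in for the `options` argument, and `simple_classify` deliberately ignores the time-criterion rule to keep the stubs short.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


class Planner(Protocol):
    def plan(self, query: str) -> tuple[list[str], list[str]]: ...


class Searcher(Protocol):
    def search(self, queries: list[str]) -> list[str]: ...


class Verifier(Protocol):
    def verify(self, results: list[str], criteria: list[str]) -> list[list[str]]: ...


@dataclass
class Engine:
    """Orchestrates Plan -> Search -> Verify -> Classify (hypothetical names)."""
    planner: Planner
    searcher: Searcher
    verifier: Verifier
    classify: Callable[[list[str]], str]

    def execute(self, query: str, max_results: int = 10) -> dict[str, list[str]]:
        search_queries, criteria = self.planner.plan(query)            # Stage 1
        results = self.searcher.search(search_queries)[:max_results]  # Stage 2
        matrix = self.verifier.verify(results, criteria)               # Stage 3
        response: dict[str, list[str]] = {"perfect": [], "partial": [], "reject": []}
        for result, row in zip(results, matrix):                       # Stage 4
            response[self.classify(row)].append(result)
        return response


# --- Stub stages so the sketch runs without an LLM or search backend ---
class StubPlanner:
    def plan(self, query: str) -> tuple[list[str], list[str]]:
        return [query, query + " review"], ["mentions sulfide"]


class StubSearcher:
    def search(self, queries: list[str]) -> list[str]:
        return ["sulfide electrolyte paper", "oxide cathode paper"]


class StubVerifier:
    def verify(self, results: list[str], criteria: list[str]) -> list[list[str]]:
        return [["Support" if "sulfide" in r else "Reject" for _ in criteria] for r in results]


def simple_classify(row: list[str]) -> str:
    if row and all(a == "Support" for a in row):
        return "perfect"
    if any(a in ("Support", "Somewhat Support") for a in row):
        return "partial"
    return "reject"


engine = Engine(StubPlanner(), StubSearcher(), StubVerifier(), simple_classify)
response = engine.execute("solid-state electrolytes")
# response == {"perfect": ["sulfide electrolyte paper"], "partial": [], "reject": ["oxide cathode paper"]}
```

Injecting each stage through a narrow interface is what lets the unit tests run with mocks while integration tests swap in real backends.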