# Contributing to The DETERMINATOR

Thank you for your interest in contributing to The DETERMINATOR! This guide will help you get started.

## Table of Contents

- [Git Workflow](#git-workflow)
- [Getting Started](#getting-started)
- [Development Commands](#development-commands)
- [MCP Integration](#mcp-integration)
- [Common Pitfalls](#common-pitfalls)
- [Key Principles](#key-principles)
- [Pull Request Process](#pull-request-process)

> **Note**: Additional sections (Code Style, Error Handling, Testing, Implementation Patterns, Code Quality, and Prompt Engineering) are available as separate pages in the [documentation](https://deepcritical.github.io/GradioDemo/contributing/).

> **Note on Project Names**: "The DETERMINATOR" is the product name, "DeepCritical" is the organization/project name, and "determinator" is the Python package name.

## Repository Information

- **GitHub Repository**: [`DeepCritical/GradioDemo`](https://github.com/DeepCritical/GradioDemo) (source of truth, PRs, code review)
- **HuggingFace Space**: [`DataQuests/DeepCritical`](https://huggingface.co/spaces/DataQuests/DeepCritical) (deployment/demo)
- **Package Name**: `determinator` (Python package name in `pyproject.toml`)

## Git Workflow

- `main`: Production-ready (GitHub)
- `dev`: Development integration (GitHub)
- Use feature branches: `yourname-dev`
- **NEVER** push directly to `main` or `dev` on HuggingFace
- GitHub is the source of truth; HuggingFace is for deployment

### Dual Repository Setup

This project uses a dual repository setup:

- **GitHub (`DeepCritical/GradioDemo`)**: Source of truth for code, PRs, and code review
- **HuggingFace (`DataQuests/DeepCritical`)**: Deployment target for the Gradio demo

#### Remote Configuration

When cloning, set up remotes as follows:

```bash
# Clone from GitHub
git clone https://github.com/DeepCritical/GradioDemo.git
cd GradioDemo

# Add HuggingFace remote (optional, for deployment)
git remote add huggingface-upstream https://huggingface.co/spaces/DataQuests/DeepCritical
```

**Important**: Never push directly to `main` or `dev` on HuggingFace. Always work through GitHub PRs. GitHub is the source of truth; HuggingFace is for deployment/demo only.

## Getting Started

1. **Fork the repository** on GitHub: [`DeepCritical/GradioDemo`](https://github.com/DeepCritical/GradioDemo)
2. **Clone your fork**:
   ```bash
   git clone https://github.com/yourusername/GradioDemo.git
   cd GradioDemo
   ```
3. **Install dependencies**:
   ```bash
   uv sync --all-extras
   uv run pre-commit install
   ```
4. **Create a feature branch**:
   ```bash
   git checkout -b yourname-feature-name
   ```
5. **Make your changes** following the guidelines below
6. **Run checks**:
   ```bash
   uv run ruff check src tests
   uv run mypy src
   uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire
   ```
7. **Commit and push**:
   ```bash
   git commit -m "Description of changes"
   git push origin yourname-feature-name
   ```
8. **Create a pull request** on GitHub

## Package Manager

This project uses [`uv`](https://github.com/astral-sh/uv) as the package manager. All commands should be prefixed with `uv run` to ensure they run in the correct environment.
### Installation

```bash
# Install uv if you haven't already (recommended: standalone installer)
# Unix/macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell):
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Alternative: pipx install uv
# Or: pip install uv

# Sync all dependencies including dev extras
uv sync --all-extras

# Install pre-commit hooks
uv run pre-commit install
```

## Development Commands

```bash
# Installation
uv sync --all-extras              # Install all dependencies including dev
uv run pre-commit install         # Install pre-commit hooks

# Code Quality Checks (run all before committing)
uv run ruff check src tests       # Lint with ruff
uv run ruff format src tests      # Format with ruff
uv run mypy src                   # Type checking
uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire  # Tests with coverage

# Testing Commands
uv run pytest tests/unit/ -v -m "not openai" -p no:logfire  # Run unit tests (excludes OpenAI tests)
uv run pytest tests/ -v -m "huggingface" -p no:logfire      # Run HuggingFace tests
uv run pytest tests/ -v -p no:logfire                       # Run all tests
uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire  # Tests with terminal coverage
uv run pytest --cov=src --cov-report=html -p no:logfire     # Generate HTML coverage report (then open htmlcov/index.html)

# Documentation Commands
uv run mkdocs build               # Build documentation
uv run mkdocs serve               # Serve documentation locally (http://127.0.0.1:8000)
```

### Test Markers

The project uses pytest markers to categorize tests; see [Testing Guidelines](docs/contributing/testing.md) for details, and the short usage sketch after the note below:

- `unit`: Unit tests (mocked, fast)
- `integration`: Integration tests (real APIs)
- `slow`: Slow tests
- `openai`: Tests requiring an OpenAI API key
- `huggingface`: Tests requiring a HuggingFace API key
- `embedding_provider`: Tests requiring API-based embedding providers
- `local_embeddings`: Tests using local embeddings

**Note**: The `-p no:logfire` flag disables the logfire plugin to avoid conflicts during testing.
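Markers are applied with standard `pytest.mark` decorators. A minimal sketch, where only the marker names come from the list above — the test names and bodies are hypothetical:

```python
import pytest


@pytest.mark.unit
def test_parse_identifier() -> None:
    # Fast, fully mocked test: included in the default unit run
    assert "metformin".upper() == "METFORMIN"


@pytest.mark.integration
@pytest.mark.huggingface
async def test_hf_inference_roundtrip() -> None:
    # Hits a real API, so it needs a HuggingFace key;
    # selected explicitly with `-m "huggingface"`
    ...
```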
## Code Style & Conventions ### Type Safety - **ALWAYS** use type hints for all function parameters and return types - Use `mypy --strict` compliance (no `Any` unless absolutely necessary) - Use `TYPE_CHECKING` imports for circular dependencies: [TYPE_CHECKING Import Pattern](../src/utils/citation_validator.py) start_line:8 end_line:11 ### Pydantic Models - All data exchange uses Pydantic models (`src/utils/models.py`) - Models are frozen (`model_config = {"frozen": True}`) for immutability - Use `Field()` with descriptions for all model fields - Validate with `ge=`, `le=`, `min_length=`, `max_length=` constraints ### Async Patterns - **ALL** I/O operations must be async (`async def`, `await`) - Use `asyncio.gather()` for parallel operations - CPU-bound work (embeddings, parsing) must use `run_in_executor()`: ```python loop = asyncio.get_running_loop() result = await loop.run_in_executor(None, cpu_bound_function, args) ``` - Never block the event loop with synchronous I/O ### Linting - Ruff with 100-char line length - Ignore rules documented in `pyproject.toml`: - `PLR0913`: Too many arguments (agents need many params) - `PLR0912`: Too many branches (complex orchestrator logic) - `PLR0911`: Too many return statements (complex agent logic) - `PLR2004`: Magic values (statistical constants) - `PLW0603`: Global statement (singleton pattern) - `PLC0415`: Lazy imports for optional dependencies ### Pre-commit - Pre-commit hooks run automatically on commit - Must pass: lint + typecheck + test-cov - Install hooks with: `uv run pre-commit install` - Note: `uv sync --all-extras` installs the pre-commit package, but you must run `uv run pre-commit install` separately to set up the git hooks ## Error Handling & Logging ### Exception Hierarchy Use custom exception hierarchy (`src/utils/exceptions.py`): [Exception Hierarchy](../src/utils/exceptions.py) start_line:4 end_line:31 ### Error Handling Rules - Always chain exceptions: `raise SearchError(...) from e` - Log errors with context using `structlog`: ```python logger.error("Operation failed", error=str(e), context=value) ``` - Never silently swallow exceptions - Provide actionable error messages ### Logging - Use `structlog` for all logging (NOT `print` or `logging`) - Import: `import structlog; logger = structlog.get_logger()` - Log with structured data: `logger.info("event", key=value)` - Use appropriate levels: DEBUG, INFO, WARNING, ERROR ### Logging Examples ```python logger.info("Starting search", query=query, tools=[t.name for t in tools]) logger.warning("Search tool failed", tool=tool.name, error=str(result)) logger.error("Assessment failed", error=str(e)) ``` ### Error Chaining Always preserve exception context: ```python try: result = await api_call() except httpx.HTTPError as e: raise SearchError(f"API call failed: {e}") from e ``` ## Testing Requirements ### Test Structure - Unit tests in `tests/unit/` (mocked, fast) - Integration tests in `tests/integration/` (real APIs, marked `@pytest.mark.integration`) - Use markers: `unit`, `integration`, `slow` ### Mocking - Use `respx` for httpx mocking - Use `pytest-mock` for general mocking - Mock LLM calls in unit tests (use `MockJudgeHandler`) - Fixtures in `tests/conftest.py`: `mock_httpx_client`, `mock_llm_response` ### TDD Workflow 1. Write failing test in `tests/unit/` 2. Implement in `src/` 3. Ensure test passes 4. 
### TDD Workflow

1. Write failing test in `tests/unit/`
2. Implement in `src/`
3. Ensure test passes
4. Run checks: `uv run ruff check src tests && uv run mypy src && uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire`

### Test Examples

```python
@pytest.mark.unit
async def test_pubmed_search(mock_httpx_client):
    tool = PubMedTool()
    results = await tool.search("metformin", max_results=5)
    assert len(results) > 0
    assert all(isinstance(r, Evidence) for r in results)


@pytest.mark.integration
async def test_real_pubmed_search():
    tool = PubMedTool()
    results = await tool.search("metformin", max_results=3)
    assert len(results) <= 3
```

### Test Coverage

- Run `uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire` for a terminal coverage report
- Run `uv run pytest --cov=src --cov-report=html -p no:logfire` for an HTML coverage report (then open `htmlcov/index.html`)
- Aim for >80% coverage on critical paths
- Exclude: `__init__.py`, `TYPE_CHECKING` blocks

## Implementation Patterns

### Search Tools

All tools implement the `SearchTool` protocol (`src/tools/base.py`):

- Must have a `name` property
- Must implement `async def search(query, max_results) -> list[Evidence]`
- Use the `@retry` decorator from tenacity for resilience
- Rate limiting: Implement `_rate_limit()` for APIs with limits (e.g., PubMed)
- Error handling: Raise `SearchError` or `RateLimitError` on failures

Example pattern:

```python
class MySearchTool:
    @property
    def name(self) -> str:
        return "mytool"

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(...))
    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        # Implementation
        return evidence_list
```

### Judge Handlers

- Implement `JudgeHandlerProtocol` (`async def assess(question, evidence) -> JudgeAssessment`)
- Use a pydantic-ai `Agent` with `output_type=JudgeAssessment`
- System prompts in `src/prompts/judge.py`
- Support fallback handlers: `MockJudgeHandler`, `HFInferenceJudgeHandler`
- Always return a valid `JudgeAssessment` (never raise exceptions)

### Agent Factory Pattern

- Use factory functions for creating agents (`src/agent_factory/`)
- Lazy initialization for optional dependencies (e.g., embeddings, Modal)
- Check requirements before initialization; see [check Magentic requirements](../src/utils/llm_factory.py) (lines 152–170)

### State Management

- **Magentic Mode**: Use `ContextVar` for thread-safe state (`src/agents/state.py`)
- **Simple Mode**: Pass state via function parameters
- Never use global mutable state (except singletons via `@lru_cache`)

### Singleton Pattern

Use `@lru_cache(maxsize=1)` for singletons; see the [singleton pattern example](../src/services/statistical_analyzer.py) (lines 252–255) and the sketch below.

- Lazy initialization to avoid requiring dependencies at import time
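A minimal sketch of the pattern, assuming a `StatisticalAnalyzer` class and the module path implied by the file reference above (both names may differ from the actual code):

```python
from __future__ import annotations

from functools import lru_cache
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from src.services.statistical_analyzer import StatisticalAnalyzer


@lru_cache(maxsize=1)
def get_statistical_analyzer() -> StatisticalAnalyzer:
    # Import inside the function so the heavy dependency is only loaded
    # when the singleton is first requested, not at module import time
    from src.services.statistical_analyzer import StatisticalAnalyzer

    return StatisticalAnalyzer()
```

Because `lru_cache(maxsize=1)` caches the return value of the zero-argument call, every caller receives the same instance.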
## Code Quality & Documentation

### Docstrings

- Google-style docstrings for all public functions
- Include Args, Returns, Raises sections
- Use type hints in docstrings only if needed for clarity

Example: see the [search method docstring example](../src/tools/pubmed.py) (lines 51–58).

### Code Comments

- Explain WHY, not WHAT
- Document non-obvious patterns (e.g., why `requests` rather than `httpx` for ClinicalTrials)
- Mark critical sections: `# CRITICAL: ...`
- Document rate limiting rationale
- Explain async patterns when non-obvious

## Prompt Engineering & Citation Validation

### Judge Prompts

- System prompt in `src/prompts/judge.py`
- Format evidence with truncation (1500 chars per item)
- Handle the empty evidence case separately
- Always request structured JSON output
- Use the `format_user_prompt()` and `format_empty_evidence_prompt()` helpers

### Hypothesis Prompts

- Use diverse evidence selection (MMR algorithm)
- Sentence-aware truncation (`truncate_at_sentence()`)
- Format: Drug → Target → Pathway → Effect
- System prompt emphasizes mechanistic reasoning
- Use `format_hypothesis_prompt()` with embeddings for diversity

### Report Prompts

- Include full citation details for validation
- Use diverse evidence selection (n=20)
- **CRITICAL**: Emphasize citation validation rules
- Format hypotheses with support/contradiction counts
- System prompt includes explicit JSON structure requirements

### Citation Validation

- **ALWAYS** validate references before returning reports
- Use `validate_references()` from `src/utils/citation_validator.py`
- Remove hallucinated citations (URLs not in evidence)
- Log warnings for removed citations
- Never trust LLM-generated citations without validation

### Citation Validation Rules

1. Every reference URL must EXACTLY match a provided evidence URL
2. Do NOT invent, fabricate, or hallucinate any references
3. Do NOT modify paper titles, authors, dates, or URLs
4. If unsure about a citation, OMIT it rather than guess
5. Copy URLs exactly as provided - do not create similar-looking URLs

### Evidence Selection

- Use `select_diverse_evidence()` for MMR-based selection
- Balance relevance vs. diversity (lambda=0.7 by default)
- Sentence-aware truncation preserves meaning
- Limit evidence per prompt to avoid context overflow

## MCP Integration

### MCP Tools

- Functions in `src/mcp_tools.py` for Claude Desktop
- Full type hints required
- Google-style docstrings with Args/Returns sections
- Formatted string returns (markdown)

### Gradio MCP Server

- Enable with `mcp_server=True` in `demo.launch()`
- Endpoint: `/gradio_api/mcp/`
- Use `ssr_mode=False` to fix hydration issues in HF Spaces (see the sketch after this list)
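A minimal sketch of how these flags fit together, with a placeholder `gr.Blocks()` UI — only `mcp_server=True`, `ssr_mode=False`, and the endpoint come from the notes above; the rest is illustrative:

```python
import gradio as gr

with gr.Blocks() as demo:
    # Placeholder UI; the real app wires the full agent pipeline into the demo
    gr.Markdown("# The DETERMINATOR")

if __name__ == "__main__":
    # mcp_server=True exposes the MCP endpoint at /gradio_api/mcp/;
    # ssr_mode=False avoids hydration issues on HF Spaces
    demo.launch(mcp_server=True, ssr_mode=False)
```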
## Common Pitfalls

1. **Blocking the event loop**: Never use sync I/O in async functions
2. **Missing type hints**: All functions must have complete type annotations
3. **Hallucinated citations**: Always validate references
4. **Global mutable state**: Use `ContextVar` or pass state via parameters
5. **Import errors**: Lazy-load optional dependencies (magentic, modal, embeddings)
6. **Rate limiting**: Always implement it for external APIs
7. **Error chaining**: Always use `from e` when raising exceptions

## Key Principles

1. **Type Safety First**: All code must pass `mypy --strict`
2. **Async Everything**: All I/O must be async
3. **Test-Driven**: Write tests before implementation
4. **No Hallucinations**: Validate all citations
5. **Graceful Degradation**: Support the free tier (HF Inference) when no API keys are set
6. **Lazy Loading**: Don't require optional dependencies at import time
7. **Structured Logging**: Use structlog, never `print()`
8. **Error Chaining**: Always preserve exception context

## Pull Request Process

1. Ensure all checks pass: `uv run ruff check src tests && uv run mypy src && uv run pytest --cov=src --cov-report=term-missing tests/unit/ -v -m "not openai" -p no:logfire`
2. Update documentation if needed
3. Add tests for new features
4. Update the CHANGELOG if applicable
5. Request review from maintainers
6. Address review feedback
7. Wait for approval before merging

## Project Structure

- `src/`: Main source code
- `tests/`: Test files (`unit/` and `integration/`)
- `docs/`: Documentation source files (MkDocs)
- `examples/`: Example usage scripts
- `pyproject.toml`: Project configuration and dependencies
- `.pre-commit-config.yaml`: Pre-commit hook configuration

## Questions?

- Open an issue on [GitHub](https://github.com/DeepCritical/GradioDemo)
- Check the existing [documentation](https://deepcritical.github.io/GradioDemo/)
- Review code examples in the codebase

Thank you for contributing to The DETERMINATOR!