# AI Comic Factory - Project Documentation
## Project Overview
**AI Comic Factory** is a Next.js application that generates AI-powered comic strips using Large Language Models (LLMs) and image generation APIs. Users input a prompt, select a comic style, and the system generates a complete comic with panels, dialog, and artwork.
**Key Features:**
- Generate complete comics from a single text prompt
- Multiple comic art styles and fonts
- Support for multiple LLM providers (OpenAI, Anthropic, Groq, Hugging Face)
- Multiple image generation engines (SDXL, OpenAI DALL-E, Replicate)
- Interactive comic editor with speech bubbles and captions
- Export to CLAP format (Cinematic Language and Audio Protocol)
- Community sharing features (optional)
- OAuth integration with Hugging Face
## Technology Stack
**Frontend:**
- Next.js 14.2.7 with App Router
- React 18.3.1 with TypeScript 5.4.5
- Tailwind CSS 3.4.1 with custom comic fonts
- shadcn/ui component library (Radix UI primitives)
- Zustand for state management
- React Konva for canvas-based comic editing
- Tailwind CSS animations (used in place of Framer Motion)
**Backend/API:**
- Next.js Server Actions (9 server functions identified)
- Multiple LLM integrations: OpenAI, Anthropic Claude, Groq, Hugging Face
- Multiple rendering engines: SDXL, Replicate, VideoChain API, OpenAI DALL-E
- Image processing with Sharp, HTML2Canvas
- Docker containerization
**Key Dependencies:**
- `@aitube/clap` - CLAP format support for multimedia projects
- `@anthropic-ai/sdk` - Claude AI integration
- `@huggingface/inference` - Hugging Face model access
- `groq-sdk` - Groq API integration
- `openai` - OpenAI API integration
- `replicate` - Replicate.com API integration
- Custom font handling with 13 different comic fonts
## Project Structure
```
src/
├── app/                          # Next.js app router
│   ├── engine/                   # Core business logic
│   │   ├── presets.ts            # Comic style presets (678 lines, 4 main presets)
│   │   ├── render.ts             # Image generation engine
│   │   ├── caption.ts            # Caption generation
│   │   └── censorship.ts         # Content filtering
│   ├── interface/                # UI components (22 directories)
│   │   ├── page/                 # Comic page layout
│   │   ├── panel/                # Individual comic panels
│   │   ├── bottom-bar/           # Controls and actions
│   │   ├── settings-dialog/      # Configuration UI
│   │   └── ...
│   ├── queries/                  # Server-side data fetching (13 files)
│   │   ├── predict.ts            # LLM prediction orchestration
│   │   ├── predictNextPanels.ts  # Panel generation logic
│   │   ├── predictWith*.ts       # Provider-specific implementations
│   │   └── ...
│   ├── store/                    # Zustand state management
│   │   └── index.ts              # Main app state (21KB)
│   ├── layouts/                  # Comic layout definitions
│   └── main.tsx                  # Main application component
├── components/
│   ├── ui/                       # shadcn/ui components (27 components)
│   └── icons/                    # Custom icons
├── lib/                          # Utility functions (49 files)
│   ├── fonts.ts                  # Comic font definitions
│   ├── bubble/                   # Speech bubble utilities
│   └── [various utilities for image processing, parsing, etc.]
├── fonts/                        # 13 custom comic fonts
└── types.ts                      # TypeScript type definitions (217 lines)
```
## Development Commands
```bash
# Development
npm run dev # Start development server
npm run build # Production build
npm run start # Start production server
npm run lint # ESLint checking
# Node version
nvm use # Uses Node v20.17.0 (specified in .nvmrc)
```
## Environment Configuration
The application requires extensive environment configuration in `.env.local`:
**Core Engines:**
- `LLM_ENGINE`: "INFERENCE_API" | "INFERENCE_ENDPOINT" | "OPENAI" | "GROQ" | "ANTHROPIC"
- `RENDERING_ENGINE`: "INFERENCE_API" | "INFERENCE_ENDPOINT" | "REPLICATE" | "VIDEOCHAIN" | "OPENAI"
**Authentication (configure only what you use):**
- `AUTH_HF_API_TOKEN` - Hugging Face API token
- `AUTH_OPENAI_API_KEY` - OpenAI API key
- `AUTH_GROQ_API_KEY` - Groq API key
- `AUTH_ANTHROPIC_API_KEY` - Anthropic/Claude API key
- `AUTH_REPLICATE_API_TOKEN` - Replicate.com token
- `AUTH_VIDEOCHAIN_API_TOKEN` - VideoChain API token
**LLM Configuration:**
- `LLM_HF_INFERENCE_API_MODEL` - Default: "HuggingFaceH4/zephyr-7b-beta"
- `LLM_OPENAI_API_MODEL` - Default: "gpt-4-turbo"
- `LLM_GROQ_API_MODEL` - Default: "mixtral-8x7b-32768"
- `LLM_ANTHROPIC_API_MODEL` - Default: "claude-3-opus-20240229"
**Rendering Configuration:**
- `RENDERING_HF_INFERENCE_API_BASE_MODEL` - Default: "stabilityai/stable-diffusion-xl-base-1.0"
- `RENDERING_REPLICATE_API_MODEL` - Default: "stabilityai/sdxl"
- `MAX_NB_PAGES` - Default: 6
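As an illustration of how these variables could be validated at startup, here is a minimal sketch. The engine names, `MAX_NB_PAGES`, and the `gpt-4-turbo` default come from the lists above; `EngineConfig`, `pickEngine`, and `loadEngineConfig` are hypothetical names and are not taken from the project's code.

```typescript
// Engine names copied from the documentation above.
const LLM_ENGINES = ["INFERENCE_API", "INFERENCE_ENDPOINT", "OPENAI", "GROQ", "ANTHROPIC"] as const;
const RENDERING_ENGINES = ["INFERENCE_API", "INFERENCE_ENDPOINT", "REPLICATE", "VIDEOCHAIN", "OPENAI"] as const;

type LLMEngine = (typeof LLM_ENGINES)[number];
type RenderingEngine = (typeof RENDERING_ENGINES)[number];

// Hypothetical config shape, for illustration only.
interface EngineConfig {
  llmEngine: LLMEngine;
  renderingEngine: RenderingEngine;
  openaiModel: string;
  maxNbPages: number;
}

// The environment is passed in explicitly so the function is easy to test.
type Env = Record<string, string | undefined>;

// Returns the value if it is one of the allowed engine names, otherwise throws.
function pickEngine<T extends string>(name: string, value: string | undefined, allowed: readonly T[]): T {
  const match = allowed.find((candidate) => candidate === value);
  if (match === undefined) {
    throw new Error(`${name} must be one of: ${allowed.join(", ")}`);
  }
  return match;
}

function loadEngineConfig(env: Env): EngineConfig {
  const maxNbPages = Number.parseInt(env["MAX_NB_PAGES"] ?? "6", 10);
  return {
    llmEngine: pickEngine("LLM_ENGINE", env["LLM_ENGINE"], LLM_ENGINES),
    renderingEngine: pickEngine("RENDERING_ENGINE", env["RENDERING_ENGINE"], RENDERING_ENGINES),
    // Fall back to the documented defaults when a variable is unset.
    openaiModel: env["LLM_OPENAI_API_MODEL"] ?? "gpt-4-turbo",
    maxNbPages: Number.isNaN(maxNbPages) ? 6 : maxNbPages,
  };
}
```

In a server context this could be called as `loadEngineConfig(process.env)`, so a typo in an engine name fails at startup rather than on the first request.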
## Architecture Patterns
**State Management:**
- Zustand store with typed selectors and actions
- Complex state includes: panels, speeches, captions, renderedScenes, layouts
- Real-time panel generation status tracking
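To make the state shape more concrete, the sketch below models a subset of the fields named above together with per-panel status tracking. It is dependency-free on purpose: `ComicState`, `PanelStatus`, and the helper functions are hypothetical and do not mirror the real store in `store/index.ts`. In a Zustand store, updates like `setPanelStatus` would typically be wrapped in a `set((state) => ...)` call.

```typescript
// Hypothetical status values; the real store may track generation differently.
type PanelStatus = "idle" | "generating" | "done" | "error";

// A subset of the state fields named in the documentation.
interface ComicState {
  panels: string[];      // panel descriptions sent to the renderer
  speeches: string[];    // speech bubble text, one entry per panel
  captions: string[];    // caption text, one entry per panel
  panelStatus: PanelStatus[];
}

function createInitialState(nbPanels: number): ComicState {
  const blank = (): string[] => Array.from({ length: nbPanels }, () => "");
  return {
    panels: blank(),
    speeches: blank(),
    captions: blank(),
    panelStatus: Array.from({ length: nbPanels }, (): PanelStatus => "idle"),
  };
}

// Immutable update: returns a new state object and leaves the input untouched.
function setPanelStatus(state: ComicState, index: number, status: PanelStatus): ComicState {
  return {
    ...state,
    panelStatus: state.panelStatus.map((current, i) => (i === index ? status : current)),
  };
}

// True while at least one panel is still being generated.
function isGenerating(state: ComicState): boolean {
  return state.panelStatus.some((status) => status === "generating");
}
```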
**LLM Integration Pattern:**
- Abstracted provider interface through `predict()` function
- Provider-specific implementations in separate files
- Standardized prompt templates and response parsing
- Support for multiple prompt formats (Zephyr, Llama, etc.)
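The provider abstraction can be pictured as a lookup from engine name to a function with a shared signature. The sketch below is an assumption about the general shape, not the project's actual `predict()` implementation; `PredictParams`, `Predictor`, and `createPredict` are hypothetical names.

```typescript
type LLMEngine = "INFERENCE_API" | "INFERENCE_ENDPOINT" | "OPENAI" | "GROQ" | "ANTHROPIC";

// Hypothetical request shape shared by every provider.
interface PredictParams {
  systemPrompt: string;
  userPrompt: string;
  nbMaxNewTokens: number;
}

// Every provider-specific implementation exposes the same signature.
type Predictor = (params: PredictParams) => Promise<string>;

// Selects the implementation for the configured engine, failing loudly
// if nothing has been registered for it.
function createPredict(
  engine: LLMEngine,
  providers: Partial<Record<LLMEngine, Predictor>>,
): Predictor {
  const provider = providers[engine];
  if (provider === undefined) {
    throw new Error(`No predictor registered for LLM engine "${engine}"`);
  }
  return provider;
}
```

Callers then depend only on the `Predictor` signature, so adding a provider means registering one more function rather than touching call sites.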
**Image Generation Flow:**
1. User provides prompt + selects preset
2. LLM generates panel descriptions, speech, and captions
3. Each panel description is sent to rendering engine
4. Images are generated and cached
5. User can edit speech bubbles and captions
6. Final comic can be exported as image or CLAP file
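Steps 1 to 4 of this flow can be sketched as a single async function. The LLM and rendering calls are injected as parameters, so nothing here claims to match the real `predictNextPanels` or `render.ts` signatures; all names are hypothetical, and the in-memory cache is only a stand-in for whatever caching the application performs.

```typescript
interface PanelScript {
  instructions: string; // description sent to the rendering engine
  speech: string;
  caption: string;
}

interface RenderedPanel extends PanelScript {
  imageUrl: string;
}

// Injected dependencies (hypothetical signatures).
type GenerateScript = (prompt: string, preset: string, nbPanels: number) => Promise<PanelScript[]>;
type RenderImage = (imagePrompt: string) => Promise<string>;

async function generateComic(
  prompt: string,
  preset: string,
  nbPanels: number,
  generateScript: GenerateScript,
  renderImage: RenderImage,
): Promise<RenderedPanel[]> {
  // Step 2: the LLM turns the user prompt into per-panel descriptions.
  const script = await generateScript(prompt, preset, nbPanels);

  // Step 4: cache by image prompt so identical descriptions render only once.
  const cache = new Map<string, Promise<string>>();
  const renderCached = (imagePrompt: string): Promise<string> => {
    let pending = cache.get(imagePrompt);
    if (pending === undefined) {
      pending = renderImage(imagePrompt);
      cache.set(imagePrompt, pending);
    }
    return pending;
  };

  // Step 3: every panel description goes to the rendering engine, in parallel.
  return Promise.all(
    script.map(async (panel) => ({
      ...panel,
      imageUrl: await renderCached(`${preset}, ${panel.instructions}`),
    })),
  );
}
```

Editing and export (steps 5 and 6) happen in the UI and are left out of this sketch.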
**Server Actions Architecture:**
- 9 server actions for LLM predictions and rendering
- Clean separation between UI and server logic
- Error handling and fallbacks for API failures
**Comic Preset System:**
- 4 main preset categories with 678 lines of configuration
- Each preset defines: art style, color scheme, font, LLM prompts, image prompts
- Extensible system for adding new comic styles
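A preset entry might look roughly like the following. The field names are assumptions derived from the list above (art style, color scheme, font, LLM prompt, image prompt), and the `example_noir` preset is invented for illustration; neither reflects the actual contents of `presets.ts`.

```typescript
// Hypothetical preset shape based on the fields listed above.
interface ComicPreset {
  id: string;
  label: string;
  artStyle: string;
  colorScheme: "color" | "monochrome";
  font: string;
  llmPrompt: string;
  // Builds the final image prompt from a panel description.
  imagePrompt: (panelDescription: string) => string;
}

// Invented example entry; adding a style means adding one more key here.
const examplePresets: Record<string, ComicPreset> = {
  example_noir: {
    id: "example_noir",
    label: "Example Noir",
    artStyle: "high-contrast ink drawing",
    colorScheme: "monochrome",
    font: "example-font",
    llmPrompt: "a hard-boiled detective comic",
    imagePrompt: (panelDescription) =>
      `high-contrast ink drawing, monochrome, ${panelDescription}`,
  },
};

function getPreset(id: string): ComicPreset {
  const preset = examplePresets[id];
  if (preset === undefined) {
    throw new Error(`Unknown preset: ${id}`);
  }
  return preset;
}
```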
**Font System:**
- 13 comic fonts bundled with the project and loaded as local fonts
- Includes fonts that are also available from Google Fonts (Indie Flower, The Girl Next Door) alongside custom fonts
- Proper CSS variable integration for consistent typography
## Key Business Logic
**Panel Generation (`predictNextPanels`):**
- Generates multiple comic panels from a single prompt
- Handles continuation of existing stories
- Parses LLM responses into structured panel data (instructions, speech, captions)
- Error recovery and retry logic
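The parsing and retry steps could look something like the sketch below, assuming the LLM is asked to answer with a JSON array of panels. That response format, and the names `PanelData`, `parsePanels`, and `parseWithRetry`, are assumptions for illustration. The retry helper is synchronous for brevity; a real LLM call would be async.

```typescript
interface PanelData {
  instructions: string;
  speech: string;
  caption: string;
}

// Extracts the first "[" to the last "]" so that chatter around the JSON
// (for example "Sure! Here is your comic:") does not break parsing.
function parsePanels(raw: string): PanelData[] {
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) {
    throw new Error("No JSON array found in LLM response");
  }
  const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
  if (!Array.isArray(parsed)) {
    throw new Error("LLM response is not an array");
  }
  return parsed.map((item: unknown) => {
    const record = (typeof item === "object" && item !== null ? item : {}) as Record<string, unknown>;
    // Missing fields become empty strings instead of failing the whole page.
    return {
      instructions: String(record["instructions"] ?? ""),
      speech: String(record["speech"] ?? ""),
      caption: String(record["caption"] ?? ""),
    };
  });
}

// Asks for a new response until one parses or the attempts run out.
function parseWithRetry(getResponse: (attempt: number) => string, maxAttempts: number): PanelData[] {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return parsePanels(getResponse(attempt));
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError instanceof Error ? lastError : new Error("No attempts were made");
}
```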
**Rendering Pipeline (`render.ts`):**
- Multi-provider image generation (Replicate, HF, OpenAI, VideoChain)
- Automatic fallbacks between providers
- Image caching and optimization
- Support for different aspect ratios and resolutions
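Automatic fallback between providers can be expressed as trying a list of renderers in order. This is a generic sketch of that idea rather than the logic in `render.ts`; `Renderer` and `renderWithFallback` are hypothetical names.

```typescript
// A renderer takes an image prompt and resolves to an image URL (or data URI).
type Renderer = (prompt: string) => Promise<string>;

// Tries each renderer in order and returns the first successful result.
// If every renderer fails, the collected error messages are reported together.
async function renderWithFallback(prompt: string, renderers: Renderer[]): Promise<string> {
  const errors: string[] = [];
  for (const render of renderers) {
    try {
      return await render(prompt);
    } catch (err) {
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  throw new Error(`All renderers failed: ${errors.join("; ")}`);
}
```

The order of the array encodes the preference, for example the configured `RENDERING_ENGINE` first and a backup second.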
**State Persistence:**
- LocalStorage integration for user settings
- CLAP file format support for project serialization
- OAuth state management with Hugging Face
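For the LocalStorage part, a small wrapper that tolerates missing or corrupted values is usually enough. The sketch below takes the storage object as a parameter (the browser's `window.localStorage` satisfies the interface), and the settings fields and key name are hypothetical.

```typescript
// Minimal subset of the Web Storage API used here.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical settings; the real application stores its own set of values.
interface UserSettings {
  preset: string;
  font: string;
}

const SETTINGS_KEY = "example.comic.settings"; // assumed key name

function saveSettings(store: KeyValueStore, settings: UserSettings): void {
  store.setItem(SETTINGS_KEY, JSON.stringify(settings));
}

function loadSettings(store: KeyValueStore, defaults: UserSettings): UserSettings {
  const raw = store.getItem(SETTINGS_KEY);
  if (raw === null) {
    return defaults;
  }
  try {
    // Merge over the defaults so newly added settings still get a value.
    const parsed = JSON.parse(raw) as Partial<UserSettings>;
    return { ...defaults, ...parsed };
  } catch {
    // Corrupted value: fall back to defaults rather than crash the UI.
    return defaults;
  }
}
```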
## Development Patterns
**Component Organization:**
- Feature-based component structure in `app/interface/`
- Reusable UI components in `components/ui/`
- Custom hooks in `lib/` for complex logic
**Type Safety:**
- Comprehensive TypeScript definitions in `types.ts`
- Strict typing for LLM engines, rendering engines, and data flows
- Generic interfaces for extensible provider support
**Error Handling:**
- Graceful degradation for API failures
- User feedback through toast notifications
- Fallback content for missing images/data
**Performance Considerations:**
- Image optimization with Sharp
- Lazy loading of comic panels
- Efficient state updates with Zustand
- Canvas-based rendering for complex layouts
## Testing & Quality
- **Linting**: ESLint with Next.js configuration
- **No test files found** - this is an area for improvement
- **Type checking**: Strict TypeScript configuration
- **Docker**: Production containerization available
## Deployment
- Designed for Hugging Face Spaces deployment
- Docker containerization with Node.js Alpine
- Standalone Next.js output for containerized deployment
- Environment-based configuration for different deployment targets
## Community & Contributions
- Open source project on Hugging Face
- Community contributions documented in `CONTRIBUTORS.md`
- Optional community sharing features
- OAuth integration for user management
## Development Notes
- **No API routes found** - uses Server Actions exclusively
- **Canvas-based editing** with React Konva for interactive panels
- **Multi-provider architecture** allows switching between AI services
- **Extensive font library** for authentic comic typography
- **CLAP format integration** for multimedia project export
- **Rate limiting** configurable for production usage
## Quick Start for Developers
1. Copy `.env` to `.env.local` and configure your API keys
2. Choose your `LLM_ENGINE` and `RENDERING_ENGINE`
3. Install dependencies: `npm install`
4. Run development server: `npm run dev`
5. The app will guide you through first-time setup
Most common development setup:
- `LLM_ENGINE="OPENAI"` with an OpenAI API key (`AUTH_OPENAI_API_KEY`)
- `RENDERING_ENGINE="REPLICATE"` with a Replicate token (`AUTH_REPLICATE_API_TOKEN`)
- This provides reliable, high-quality results for testing