| # AI Comic Factory - Project Documentation | |
| ## Project Overview | |
| **AI Comic Factory** is a Next.js application that generates AI-powered comic strips using Large Language Models (LLMs) and image generation APIs. Users input a prompt, select a comic style, and the system generates a complete comic with panels, dialog, and artwork. | |
| **Key Features:** | |
| - Generate complete comics from a single text prompt | |
| - Multiple comic art styles and fonts | |
| - Support for multiple LLM providers (OpenAI, Anthropic, Groq, Hugging Face) | |
| - Multiple image generation engines (SDXL, OpenAI DALL-E, Replicate) | |
| - Interactive comic editor with speech bubbles and captions | |
| - Export to CLAP format (Cinematic Language and Audio Protocol) | |
| - Community sharing features (optional) | |
| - OAuth integration with Hugging Face | |
| ## Technology Stack | |
| **Frontend:** | |
| - Next.js 14.2.7 with App Router | |
| - React 18.3.1 with TypeScript 5.4.5 | |
| - Tailwind CSS 3.4.1 with custom comic fonts | |
| - shadcn/ui component library (Radix UI primitives) | |
| - Zustand for state management | |
| - React Konva for canvas-based comic editing | |
| - Tailwind CSS animations (used in place of Framer Motion) | |
| **Backend/API:** | |
| - Next.js Server Actions (9 server functions identified) | |
| - Multiple LLM integrations: OpenAI, Anthropic Claude, Groq, Hugging Face | |
| - Multiple rendering engines: SDXL, Replicate, VideoChain API, OpenAI DALL-E | |
| - Image processing with Sharp and html2canvas | |
| - Docker containerization | |
| **Key Dependencies:** | |
| - `@aitube/clap` - CLAP format support for multimedia projects | |
| - `@anthropic-ai/sdk` - Claude AI integration | |
| - `@huggingface/inference` - Hugging Face model access | |
| - `groq-sdk` - Groq API integration | |
| - `openai` - OpenAI API integration | |
| - `replicate` - Replicate.com API integration | |
| - Custom font handling with 13 different comic fonts | |
| ## Project Structure | |
| ``` | |
| src/ | |
| ├── app/  # Next.js app router | |
| │   ├── engine/  # Core business logic | |
| │   │   ├── presets.ts  # Comic style presets (678 lines, 4 main presets) | |
| │   │   ├── render.ts  # Image generation engine | |
| │   │   ├── caption.ts  # Caption generation | |
| │   │   └── censorship.ts  # Content filtering | |
| │   ├── interface/  # UI components (22 directories) | |
| │   │   ├── page/  # Comic page layout | |
| │   │   ├── panel/  # Individual comic panels | |
| │   │   ├── bottom-bar/  # Controls and actions | |
| │   │   ├── settings-dialog/  # Configuration UI | |
| │   │   └── ... | |
| │   ├── queries/  # Server-side data fetching (13 files) | |
| │   │   ├── predict.ts  # LLM prediction orchestration | |
| │   │   ├── predictNextPanels.ts  # Panel generation logic | |
| │   │   ├── predictWith*.ts  # Provider-specific implementations | |
| │   │   └── ... | |
| │   ├── store/  # Zustand state management | |
| │   │   └── index.ts  # Main app state (21KB) | |
| │   ├── layouts/  # Comic layout definitions | |
| │   └── main.tsx  # Main application component | |
| ├── components/ | |
| │   ├── ui/  # shadcn/ui components (27 components) | |
| │   └── icons/  # Custom icons | |
| ├── lib/  # Utility functions (49 files) | |
| │   ├── fonts.ts  # Comic font definitions | |
| │   ├── bubble/  # Speech bubble utilities | |
| │   └── [various utilities for image processing, parsing, etc.] | |
| ├── fonts/  # 13 custom comic fonts | |
| └── types.ts  # TypeScript type definitions (217 lines) | |
| ``` | |
| ## Development Commands | |
| ```bash | |
| # Development | |
| npm run dev # Start development server | |
| npm run build # Production build | |
| npm run start # Start production server | |
| npm run lint # ESLint checking | |
| # Node version | |
| nvm use # Uses Node v20.17.0 (specified in .nvmrc) | |
| ``` | |
| ## Environment Configuration | |
| The application requires extensive environment configuration in `.env.local`: | |
| **Core Engines:** | |
| - `LLM_ENGINE`: one of `"INFERENCE_API"`, `"INFERENCE_ENDPOINT"`, `"OPENAI"`, `"GROQ"`, `"ANTHROPIC"` | |
| - `RENDERING_ENGINE`: one of `"INFERENCE_API"`, `"INFERENCE_ENDPOINT"`, `"REPLICATE"`, `"VIDEOCHAIN"`, `"OPENAI"` | |
| **Authentication (configure only what you use):** | |
| - `AUTH_HF_API_TOKEN` - Hugging Face API token | |
| - `AUTH_OPENAI_API_KEY` - OpenAI API key | |
| - `AUTH_GROQ_API_KEY` - Groq API key | |
| - `AUTH_ANTHROPIC_API_KEY` - Anthropic/Claude API key | |
| - `AUTH_REPLICATE_API_TOKEN` - Replicate.com token | |
| - `AUTH_VIDEOCHAIN_API_TOKEN` - VideoChain API token | |
| **LLM Configuration:** | |
| - `LLM_HF_INFERENCE_API_MODEL` - Default: "HuggingFaceH4/zephyr-7b-beta" | |
| - `LLM_OPENAI_API_MODEL` - Default: "gpt-4-turbo" | |
| - `LLM_GROQ_API_MODEL` - Default: "mixtral-8x7b-32768" | |
| - `LLM_ANTHROPIC_API_MODEL` - Default: "claude-3-opus-20240229" | |
| **Rendering Configuration:** | |
| - `RENDERING_HF_INFERENCE_API_BASE_MODEL` - Default: "stabilityai/stable-diffusion-xl-base-1.0" | |
| - `RENDERING_REPLICATE_API_MODEL` - Default: "stabilityai/sdxl" | |
| - `MAX_NB_PAGES` - Default: 6 | |
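
The engine settings listed above can be validated once at startup so that a typo in `.env.local` fails early. The following is a minimal sketch, not the project's actual loader: `loadEngineConfig`, `parseOption`, and `EngineConfig` are hypothetical names, and only the variable names, allowed values, and the `MAX_NB_PAGES` default come from this document.

```typescript
// Allowed engine values, as listed in this section.
const LLM_ENGINES = ["INFERENCE_API", "INFERENCE_ENDPOINT", "OPENAI", "GROQ", "ANTHROPIC"] as const;
const RENDERING_ENGINES = ["INFERENCE_API", "INFERENCE_ENDPOINT", "REPLICATE", "VIDEOCHAIN", "OPENAI"] as const;

type LLMEngine = (typeof LLM_ENGINES)[number];
type RenderingEngine = (typeof RENDERING_ENGINES)[number];

// Minimal shape of an environment object; `process.env` satisfies it.
type Env = Record<string, string | undefined>;

// Hypothetical config shape, for illustration only.
interface EngineConfig {
  llmEngine: LLMEngine;
  renderingEngine: RenderingEngine;
  maxNbPages: number;
}

// Hypothetical helper: returns the raw value if it is one of the allowed options.
function parseOption<T extends string>(name: string, raw: string | undefined, allowed: readonly T[]): T {
  const match = allowed.find((option) => option === raw);
  if (match === undefined) {
    throw new Error(`${name} must be one of ${allowed.join(", ")} (got ${String(raw)})`);
  }
  return match;
}

function loadEngineConfig(env: Env): EngineConfig {
  const maxNbPages = Number(env.MAX_NB_PAGES ?? "6"); // documented default: 6
  if (!Number.isInteger(maxNbPages) || maxNbPages < 1) {
    throw new Error("MAX_NB_PAGES must be a positive integer");
  }
  return {
    llmEngine: parseOption("LLM_ENGINE", env.LLM_ENGINE, LLM_ENGINES),
    renderingEngine: parseOption("RENDERING_ENGINE", env.RENDERING_ENGINE, RENDERING_ENGINES),
    maxNbPages,
  };
}
```

On the server this would be called as `loadEngineConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.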
| ## Architecture Patterns | |
| **State Management:** | |
| - Zustand store with typed selectors and actions | |
| - Complex state includes: panels, speeches, captions, renderedScenes, layouts | |
| - Real-time panel generation status tracking | |
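
To illustrate per-panel status tracking, here is a dependency-free sketch of an immutable status update. It is an assumption-laden illustration rather than the store's real code: the status values, the `statuses` field, and the function names are hypothetical, and the field types are guesses based on the state names listed above.

```typescript
// Hypothetical status values; the real store may track progress differently.
type PanelStatus = "idle" | "generating" | "done" | "failed";

// Simplified slice of the state fields named above (types are assumptions).
interface ComicState {
  panels: string[];
  speeches: string[];
  captions: string[];
  statuses: PanelStatus[];
}

// Builds an array of `count` copies of `value`.
function repeat<T>(count: number, value: T): T[] {
  const out: T[] = [];
  for (let i = 0; i < count; i++) out.push(value);
  return out;
}

function createInitialState(nbPanels: number): ComicState {
  return {
    panels: repeat(nbPanels, ""),
    speeches: repeat(nbPanels, ""),
    captions: repeat(nbPanels, ""),
    statuses: repeat<PanelStatus>(nbPanels, "idle"),
  };
}

// Returns a new state object instead of mutating, so subscribers can detect the change.
function setPanelStatus(state: ComicState, index: number, status: PanelStatus): ComicState {
  if (index < 0 || index >= state.statuses.length) return state; // ignore out-of-range panels
  const statuses = state.statuses.slice();
  statuses[index] = status;
  return { ...state, statuses };
}

// True while at least one panel is still being generated.
function isGenerating(state: ComicState): boolean {
  return state.statuses.some((status) => status === "generating");
}
```

Inside a Zustand store, a pure updater like this could be applied with `set((state) => setPanelStatus(state, index, status))`.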
| **LLM Integration Pattern:** | |
| - Abstracted provider interface through `predict()` function | |
| - Provider-specific implementations in separate files | |
| - Standardized prompt templates and response parsing | |
| - Support for multiple prompt formats (Zephyr, Llama, etc.) | |
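
The provider abstraction can be pictured as a lookup table from engine name to a function with a shared signature. The sketch below uses stub providers and hypothetical names (`PredictParams`, `makeStubProvider`, `providers`); the real `predictWith*.ts` files call the vendor SDKs and their parameter shape may differ.

```typescript
type LLMEngine = "INFERENCE_API" | "INFERENCE_ENDPOINT" | "OPENAI" | "GROQ" | "ANTHROPIC";

// Hypothetical parameter shape shared by every provider.
interface PredictParams {
  systemPrompt: string;
  userPrompt: string;
  nbMaxNewTokens: number;
}

type PredictFunction = (params: PredictParams) => Promise<string>;

// Stand-in provider that echoes the prompt; a real one would call the vendor SDK.
function makeStubProvider(label: string): PredictFunction {
  return async ({ userPrompt }) => `[${label}] ${userPrompt}`;
}

// One entry per engine; the Record type makes the compiler flag a missing provider.
const providers: Record<LLMEngine, PredictFunction> = {
  INFERENCE_API: makeStubProvider("INFERENCE_API"),
  INFERENCE_ENDPOINT: makeStubProvider("INFERENCE_ENDPOINT"),
  OPENAI: makeStubProvider("OPENAI"),
  GROQ: makeStubProvider("GROQ"),
  ANTHROPIC: makeStubProvider("ANTHROPIC"),
};

// Single entry point: callers never need to know which provider is active.
async function predict(engine: LLMEngine, params: PredictParams): Promise<string> {
  return providers[engine](params);
}
```

With this shape, adding a provider means adding one union member and one table entry, while call sites stay unchanged.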
| **Image Generation Flow:** | |
| 1. User provides prompt + selects preset | |
| 2. LLM generates panel descriptions, speech, and captions | |
| 3. Each panel description is sent to rendering engine | |
| 4. Images are generated and cached | |
| 5. User can edit speech bubbles and captions | |
| 6. Final comic can be exported as image or CLAP file | |
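
Steps 1 to 4 of the flow above can be sketched as a small orchestration function. This is an illustration under assumptions: `generateComic`, `PanelDraft`, and the injected `generatePanels` and `renderImage` callbacks are hypothetical, and the cache here is a plain in-memory `Map` keyed by panel description, which may not match how the app caches images.

```typescript
interface PanelDraft {
  instructions: string; // description sent to the rendering engine
  speech: string;
  caption: string;
}

interface RenderedPanel extends PanelDraft {
  imageUrl: string;
}

// Hypothetical callbacks standing in for the LLM and the rendering engine.
type GeneratePanels = (prompt: string, preset: string) => Promise<PanelDraft[]>;
type RenderImage = (instructions: string) => Promise<string>;

async function generateComic(
  prompt: string,
  preset: string,
  generatePanels: GeneratePanels,
  renderImage: RenderImage,
  cache: Map<string, string>
): Promise<RenderedPanel[]> {
  // Step 2: the LLM turns the prompt and preset into panel drafts.
  const drafts = await generatePanels(prompt, preset);
  const rendered: RenderedPanel[] = [];
  for (const draft of drafts) {
    // Steps 3 and 4: render each description once, then reuse the cached image.
    let imageUrl = cache.get(draft.instructions);
    if (imageUrl === undefined) {
      imageUrl = await renderImage(draft.instructions);
      cache.set(draft.instructions, imageUrl);
    }
    rendered.push({ ...draft, imageUrl });
  }
  return rendered;
}
```

Panels are rendered sequentially here for clarity; rendering them concurrently is a possible variation if the provider's rate limits allow it.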
| **Server Actions Architecture:** | |
| - 9 server actions for LLM predictions and rendering | |
| - Clean separation between UI and server logic | |
| - Error handling and fallbacks for API failures | |
| **Comic Preset System:** | |
| - 4 main preset categories with 678 lines of configuration | |
| - Each preset defines: art style, color scheme, font, LLM prompts, image prompts | |
| - Extensible system for adding new comic styles | |
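
As a rough picture of what adding a comic style involves, the sketch below defines a preset shape and a registry. The field names, the `registerPreset` and `buildImagePrompt` helpers, and the `example_noir` preset are all hypothetical; consult `presets.ts` for the real structure.

```typescript
// Hypothetical preset shape covering the aspects listed above.
interface Preset {
  id: string;
  label: string;
  font: string; // name of one of the comic fonts
  colorScheme: string;
  llmPrompt: string; // style hint passed to the LLM
  imagePrompt: (panelPrompt: string) => string[]; // prompt fragments for the renderer
}

const presets: Record<string, Preset> = {};

function hasPreset(id: string): boolean {
  return Object.prototype.hasOwnProperty.call(presets, id);
}

function registerPreset(preset: Preset): void {
  if (hasPreset(preset.id)) {
    throw new Error(`duplicate preset id: ${preset.id}`);
  }
  presets[preset.id] = preset;
}

// Joins the preset's prompt fragments into a single image prompt.
function buildImagePrompt(presetId: string, panelPrompt: string): string {
  if (!hasPreset(presetId)) {
    throw new Error(`unknown preset: ${presetId}`);
  }
  return presets[presetId].imagePrompt(panelPrompt).join(", ");
}

// Illustrative preset, not one of the project's real presets.
registerPreset({
  id: "example_noir",
  label: "Example Noir",
  font: "Indie Flower",
  colorScheme: "monochrome",
  llmPrompt: "noir detective comic",
  imagePrompt: (panelPrompt) => ["film noir comic panel", panelPrompt, "high contrast ink"],
});
```

Keeping the image prompt as a function of the panel description lets each style wrap the same description in its own keywords.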
| **Font System:** | |
| - 13 custom comic fonts loaded as local fonts | |
| - Includes both Google Fonts (Indie Flower, The Girl Next Door) and custom fonts | |
| - Proper CSS variable integration for consistent typography | |
| ## Key Business Logic | |
| **Panel Generation (`predictNextPanels`):** | |
| - Generates multiple comic panels from a single prompt | |
| - Handles continuation of existing stories | |
| - Parses LLM responses into structured panel data (instructions, speech, captions) | |
| - Error recovery and retry logic | |
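
The parsing and retry behaviour can be sketched as follows. This assumes the LLM is asked to answer with a JSON array of objects carrying `instructions`, `speech`, and `caption` keys, which is an assumption about the response format; `parsePanels` and `predictPanelsWithRetry` are hypothetical names, not the functions in `predictNextPanels.ts`.

```typescript
interface Panel {
  instructions: string;
  speech: string;
  caption: string;
}

// Normalizes an unknown field to a trimmed string, or "" when it is not a string.
function asText(value: unknown): string {
  return typeof value === "string" ? value.trim() : "";
}

// Extracts the outermost JSON array from a reply that may include extra chatter.
function parsePanels(raw: string): Panel[] {
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) {
    throw new Error("no JSON array found in LLM response");
  }
  const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
  if (!Array.isArray(parsed)) {
    throw new Error("LLM response is not a JSON array");
  }
  const panels: Panel[] = [];
  for (const item of parsed as unknown[]) {
    if (typeof item !== "object" || item === null) continue;
    const record = item as Record<string, unknown>;
    const instructions = asText(record.instructions);
    if (instructions === "") continue; // a panel without a description cannot be rendered
    panels.push({ instructions, speech: asText(record.speech), caption: asText(record.caption) });
  }
  if (panels.length === 0) {
    throw new Error("LLM response contained no usable panels");
  }
  return panels;
}

// Calls the LLM again when the reply cannot be parsed, up to maxAttempts times.
async function predictPanelsWithRetry(callLLM: () => Promise<string>, maxAttempts: number = 3): Promise<Panel[]> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return parsePanels(await callLLM());
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

Slicing between the first `[` and the last `]` is a simple heuristic for replies that wrap the JSON in prose; it will still fail, and trigger a retry, if the surrounding text itself contains brackets.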
| **Rendering Pipeline (`render.ts`):** | |
| - Multi-provider image generation (Replicate, HF, OpenAI, VideoChain) | |
| - Automatic fallbacks between providers | |
| - Image caching and optimization | |
| - Support for different aspect ratios and resolutions | |
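
Automatic fallback between providers amounts to trying each renderer in order until one succeeds. The sketch below shows that pattern with hypothetical names (`NamedRenderer`, `renderWithFallback`); it is not taken from `render.ts`, and the real provider order and error handling may differ.

```typescript
// Hypothetical renderer signature: returns an image URL for a prompt and size.
type RenderFunction = (prompt: string, width: number, height: number) => Promise<string>;

interface NamedRenderer {
  name: string;
  render: RenderFunction;
}

interface RenderResult {
  provider: string;
  imageUrl: string;
}

// Tries each renderer in order and returns the first successful result.
async function renderWithFallback(
  renderers: NamedRenderer[],
  prompt: string,
  width: number,
  height: number
): Promise<RenderResult> {
  const errors: string[] = [];
  for (const { name, render } of renderers) {
    try {
      const imageUrl = await render(prompt, width, height);
      return { provider: name, imageUrl };
    } catch (err) {
      // Record the failure and move on to the next provider.
      errors.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all renderers failed (${errors.join("; ")})`);
}
```

Collecting the individual error messages makes the final failure easier to diagnose than reporting only the last provider's error.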
| **State Persistence:** | |
| - LocalStorage integration for user settings | |
| - CLAP file format support for project serialization | |
| - OAuth state management with Hugging Face | |
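
Persisting user settings in LocalStorage typically means serializing to JSON on save and merging with defaults on load. The following sketch assumes a hypothetical settings shape, storage key, and default values; only the use of LocalStorage comes from this document.

```typescript
// Subset of the Web Storage API used here; `window.localStorage` satisfies it.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical settings shape, key, and defaults, for illustration only.
interface UserSettings {
  preset: string;
  font: string;
  nbPages: number;
}

const SETTINGS_KEY = "example-comic-settings";
const DEFAULT_SETTINGS: UserSettings = { preset: "random", font: "Indie Flower", nbPages: 1 };

function saveSettings(storage: KeyValueStorage, settings: UserSettings): void {
  storage.setItem(SETTINGS_KEY, JSON.stringify(settings));
}

// Falls back to defaults when nothing is stored, the JSON is corrupt, or a field has the wrong type.
function loadSettings(storage: KeyValueStorage): UserSettings {
  const raw = storage.getItem(SETTINGS_KEY);
  if (raw === null) return { ...DEFAULT_SETTINGS };
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return { ...DEFAULT_SETTINGS };
    const saved = parsed as Record<string, unknown>;
    return {
      preset: typeof saved.preset === "string" ? saved.preset : DEFAULT_SETTINGS.preset,
      font: typeof saved.font === "string" ? saved.font : DEFAULT_SETTINGS.font,
      nbPages: typeof saved.nbPages === "number" ? saved.nbPages : DEFAULT_SETTINGS.nbPages,
    };
  } catch (err) {
    return { ...DEFAULT_SETTINGS };
  }
}

// In-memory stand-in for LocalStorage, useful on the server or in tests.
function createMemoryStorage(): KeyValueStorage {
  const data: Record<string, string> = {};
  return {
    getItem: (key) => (Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null),
    setItem: (key, value) => {
      data[key] = value;
    },
  };
}
```

Validating each field on load keeps settings written by an older version of the app from breaking a newer one.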
| ## Development Patterns | |
| **Component Organization:** | |
| - Feature-based component structure in `app/interface/` | |
| - Reusable UI components in `components/ui/` | |
| - Custom hooks in `lib/` for complex logic | |
| **Type Safety:** | |
| - Comprehensive TypeScript definitions in `types.ts` | |
| - Strict typing for LLM engines, rendering engines, and data flows | |
| - Generic interfaces for extensible provider support | |
| **Error Handling:** | |
| - Graceful degradation for API failures | |
| - User feedback through toast notifications | |
| - Fallback content for missing images/data | |
| **Performance Considerations:** | |
| - Image optimization with Sharp | |
| - Lazy loading of comic panels | |
| - Efficient state updates with Zustand | |
| - Canvas-based rendering for complex layouts | |
| ## Testing & Quality | |
| - **Linting**: ESLint with Next.js configuration | |
| - **No test files found** - this is an area for improvement | |
| - **Type checking**: Strict TypeScript configuration | |
| - **Docker**: Production containerization available | |
| ## Deployment | |
| - Designed for Hugging Face Spaces deployment | |
| - Docker containerization with Node.js Alpine | |
| - Standalone Next.js output for containerized deployment | |
| - Environment-based configuration for different deployment targets | |
| ## Community & Contributions | |
| - Open source project on Hugging Face | |
| - Community contributions documented in `CONTRIBUTORS.md` | |
| - Optional community sharing features | |
| - OAuth integration for user management | |
| ## Development Notes | |
| - **No API routes found** - uses Server Actions exclusively | |
| - **Canvas-based editing** with React Konva for interactive panels | |
| - **Multi-provider architecture** allows switching between AI services | |
| - **Extensive font library** for authentic comic typography | |
| - **CLAP format integration** for multimedia project export | |
| - **Rate limiting** configurable for production usage | |
| ## Quick Start for Developers | |
| 1. Copy `.env` to `.env.local` and configure your API keys | |
| 2. Choose your `LLM_ENGINE` and `RENDERING_ENGINE` | |
| 3. Install dependencies: `npm install` | |
| 4. Run development server: `npm run dev` | |
| 5. The app will guide you through first-time setup | |
| A common development setup: | |
| - `LLM_ENGINE`: "OPENAI" with an OpenAI API key (`AUTH_OPENAI_API_KEY`) | |
| - `RENDERING_ENGINE`: "REPLICATE" with a Replicate token (`AUTH_REPLICATE_API_TOKEN`) | |
| - This provides reliable, high-quality results for testing |