Scrapling - Advanced Web Scraping API

A powerful web scraping API with AI-powered content extraction, session management, and multiple scraping modes (HTTP, JavaScript rendering, and stealthy browser automation).

Features

🚀 REST API - FastAPI-based endpoints for programmatic access
🤖 AI-Powered Extraction - Natural language queries for content extraction
🔐 Session Management - Persistent sessions for efficient batch processing
🌐 Multiple Scraping Modes:
- Standard HTTP (fast; best for sites with little or no bot protection)
- Dynamic fetching (JavaScript support)
- Stealthy browser (anti-bot bypass)
📊 Structured Output - Returns data in JSON, Markdown, HTML, or Text formats
🎨 Gradio UI - Interactive web interface for testing

API Endpoints

Base URL

https://grazieprego-scrapling.hf.space

Quick Reference

Endpoint	Method	Description
`/health`	GET	Check API status
`/api/scrape`	POST	Stateless scrape request
`/api/session`	POST	Create persistent session
`/api/session/{id}/scrape`	POST	Scrape using session
`/api/session/{id}`	DELETE	Close session
`/docs`	GET	API documentation (HTML)
`/api-docs`	GET	API documentation (JSON)
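The endpoint paths in the table can be assembled from the base URL with a couple of small helpers. This is a minimal sketch: `endpoint` and `session_url` are hypothetical names introduced here for illustration, not part of the API, and percent-encoding the session ID is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote

# Base URL as listed in this README.
BASE_URL = "https://grazieprego-scrapling.hf.space"


def endpoint(path: str) -> str:
    """Join the base URL and an endpoint path with exactly one slash."""
    return BASE_URL.rstrip("/") + "/" + path.lstrip("/")


def session_url(session_id: str, scrape: bool = False) -> str:
    """Build /api/session/{id} or /api/session/{id}/scrape.

    The ID is percent-encoded so that unexpected characters cannot
    change the path structure (an assumption, not an API requirement).
    """
    url = endpoint("/api/session/" + quote(session_id, safe=""))
    return url + "/scrape" if scrape else url
```

For example, `endpoint("/health")` gives the status-check URL, and `session_url(session_id, scrape=True)` gives the URL for scraping within an existing session.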

Usage Examples

1. Stateless Scrape (One-off requests)

curl -X POST https://grazieprego-scrapling.hf.space/api/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "query": "Extract all product prices",
    "model_name": "alias-fast"
  }'

2. Session-Based Scraping (Multiple requests)

import requests

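# Create a session; the response body carries the session ID
response = requests.post(
    'https://grazieprego-scrapling.hf.space/api/session',
    json={'model_name': 'alias-fast'}
)
response.raise_for_status()  # fail early if the session could not be created
session_id = response.json()['session_id']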

try:
    # Multiple scrapes using the same session
    urls = [
        'https://example.com/page1',
        'https://example.com/page2',
        'https://example.com/page3'
    ]
    
    for url in urls:
        result = requests.post(
            f'https://grazieprego-scrapling.hf.space/api/session/{session_id}/scrape',
            json={'url': url, 'query': 'Extract product data'}
        )
        print(f"Scraped {url}: {result.json()}")
finally:
    # Always close the session
    requests.delete(f'https://grazieprego-scrapling.hf.space/api/session/{session_id}')
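The try/finally pattern above can also be wrapped in a context manager so the session is closed even when a scrape raises. The following is a sketch under assumptions: `scraping_session` is a hypothetical helper, and `http` stands for any object that exposes `post(url, json=...)` and `delete(url)`, such as the `requests` module itself or a `requests.Session()`.

```python
from contextlib import contextmanager

BASE_URL = "https://grazieprego-scrapling.hf.space"


@contextmanager
def scraping_session(http, model_name="alias-fast"):
    """Create a session, yield its ID, and always close it afterwards.

    `http` is any object with post(url, json=...) and delete(url);
    passing it in keeps the helper independent of a specific client.
    """
    created = http.post(f"{BASE_URL}/api/session", json={"model_name": model_name})
    session_id = created.json()["session_id"]
    try:
        yield session_id
    finally:
        # Runs on normal exit and on exceptions alike.
        http.delete(f"{BASE_URL}/api/session/{session_id}")
```

With `requests` installed, usage would look like `with scraping_session(requests) as session_id: ...`, posting each scrape to `/api/session/{session_id}/scrape` inside the block.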

3. Using the Gradio UI

Visit the space URL and use the interactive interface:

Fetch (HTTP) tab: For standard HTTP scraping
Stealthy Fetch (Browser) tab: For sites with bot protection

API Documentation

HTML Docs: https://grazieprego-scrapling.hf.space/docs
JSON Docs: https://grazieprego-scrapling.hf.space/api-docs

Request Parameters

`/api/scrape` & `/api/session/{id}/scrape`

{
  "url": "https://example.com",
  "query": "Extract all headings and prices",
  "model_name": "alias-fast"
}

Parameters:

url (string, required): The URL to scrape
query (string, required): Natural language extraction instruction
model_name (string, optional): AI model to use (default: "alias-fast")
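A request body matching the parameters above can be built and checked before it is sent. This is a minimal sketch: `build_scrape_payload` is a hypothetical helper, and the client-side validation (http/https scheme, non-empty query) is an assumption for illustration; the server may accept or reject inputs differently. `model_name` is left out when not given, so the documented server default applies.

```python
from typing import Optional
from urllib.parse import urlparse


def build_scrape_payload(url: str, query: str, model_name: Optional[str] = None) -> dict:
    """Return a JSON-serializable body for /api/scrape or /api/session/{id}/scrape."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {url!r}")
    if not query or not query.strip():
        raise ValueError("query must be a non-empty string")

    payload = {"url": url, "query": query.strip()}
    # Only send model_name when the caller chose one; otherwise the
    # API falls back to its documented default ("alias-fast").
    if model_name is not None:
        payload["model_name"] = model_name
    return payload
```

The resulting dict can be passed directly as `json=` to `requests.post`.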

`/api/session`

{
  "model_name": "alias-fast"
}

Response Format

{
  "url": "https://example.com",
  "query": "Extract prices",
  "response": {
    "status": 200,
    "content": ["# Product 1: $19.99", "# Product 2: $29.99"],
    "url": "https://example.com"
  }
}
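A response in this shape can be unpacked into a small typed object. The sketch below assumes only the fields shown in the example; `ScrapeResult` and `parse_scrape_response` are hypothetical names, and treating a bare string in `content` as a one-item list is a defensive assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScrapeResult:
    url: str
    query: str
    status: int
    content: List[str]


def parse_scrape_response(body: dict) -> ScrapeResult:
    """Flatten the nested response object into a ScrapeResult."""
    inner = body.get("response") or {}
    content = inner.get("content") or []
    if isinstance(content, str):
        # Assumption: tolerate a single string instead of a list.
        content = [content]
    return ScrapeResult(
        url=body.get("url", ""),
        query=body.get("query", ""),
        status=int(inner.get("status", 0)),
        content=list(content),
    )
```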

Best Practices

Use stateless endpoints for one-off requests
Use sessions for batch processing multiple URLs
Always close sessions when finished to free resources
Implement error handling - 500 errors may occur on complex sites
Add retry logic for production use
Respect rate limits - use responsibly
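The retry advice above could be implemented with exponential backoff. This is a sketch, not the API's prescribed approach: `retry` is a hypothetical helper, and the attempt count and delays are illustrative values. The `sleep` parameter is injectable so the waiting can be replaced in tests.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on failure with delays of base_delay * 2**n."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A scrape request wrapped in a function that raises on a 500 response can be passed as `call`. Spacing retries out this way also helps keep request volume within rate limits.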

Error Handling

404: Session not found
500: Internal server error (check detail field for specifics)
Common issues:
- URL unreachable or timeout
- JavaScript-heavy sites may need the stealthy browser mode (stealthy_fetch)
- Bot protection may block requests

Deployment

This space uses Docker with:

Python 3.11
FastAPI + Uvicorn
Gradio 5.x
Playwright for browser automation
Scrapling for advanced scraping

License

MIT License - See LICENSE file for details

Credits

Built with Scrapling - Advanced web scraping library

Note: This is a demonstration space. For production use, consider self-hosting with appropriate rate limiting and authentication.
