Scrapling - Advanced Web Scraping API
A powerful web scraping API with AI-powered content extraction, session management, and multiple scraping modes (HTTP, JavaScript rendering, and stealthy browser automation).
Features
- REST API - FastAPI-based endpoints for programmatic access
- AI-Powered Extraction - Natural language queries for content extraction
- Session Management - Persistent sessions for efficient batch processing
- Multiple Scraping Modes:
  - Standard HTTP (fast; suited to sites with little or no bot protection)
  - Dynamic fetching (JavaScript rendering)
  - Stealthy browser (anti-bot bypass)
- Structured Output - Returns data in JSON, Markdown, HTML, or Text formats
- Gradio UI - Interactive web interface for testing
API Endpoints
Base URL
https://grazieprego-scrapling.hf.space
Quick Reference
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Check API status |
| /api/scrape | POST | Stateless scrape request |
| /api/session | POST | Create a persistent session |
| /api/session/{id}/scrape | POST | Scrape using an existing session |
| /api/session/{id} | DELETE | Close a session |
| /docs | GET | API documentation (HTML) |
| /api-docs | GET | API documentation (JSON) |
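Client code can mirror the table as a small lookup, so that paths and methods are defined in one place. This is a sketch: the ENDPOINTS dict and the endpoint helper are hypothetical names and are not part of the API itself.

```python
from urllib.parse import quote

BASE_URL = "https://grazieprego-scrapling.hf.space"

# Hypothetical lookup mirroring the endpoint table: name -> (HTTP method, path template).
ENDPOINTS = {
    "health": ("GET", "/health"),
    "scrape": ("POST", "/api/scrape"),
    "create_session": ("POST", "/api/session"),
    "session_scrape": ("POST", "/api/session/{id}/scrape"),
    "close_session": ("DELETE", "/api/session/{id}"),
    "docs_html": ("GET", "/docs"),
    "docs_json": ("GET", "/api-docs"),
}


def endpoint(name: str, **params: str) -> tuple[str, str]:
    """Return (method, absolute URL) for a named endpoint, filling path parameters."""
    method, template = ENDPOINTS[name]
    # Percent-encode path parameters such as the session id before substituting them.
    path = template.format(**{key: quote(value, safe="") for key, value in params.items()})
    return method, BASE_URL + path
```

For example, `endpoint("session_scrape", id=session_id)` yields the POST URL used in the session example.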
Usage Examples
1. Stateless Scrape (One-off requests)
curl -X POST https://grazieprego-scrapling.hf.space/api/scrape \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"query": "Extract all product prices",
"model_name": "alias-fast"
}'
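The same stateless request can also be sent from Python using only the standard library. This is a minimal sketch: build_scrape_request and scrape are hypothetical helper names, and the 60-second timeout is an arbitrary assumption, not a documented server limit.

```python
import json
import urllib.request

BASE_URL = "https://grazieprego-scrapling.hf.space"


def build_scrape_request(url: str, query: str, model_name: str = "alias-fast") -> urllib.request.Request:
    """Build (but do not send) the POST /api/scrape request from the curl example."""
    body = json.dumps({"url": url, "query": query, "model_name": model_name}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(url: str, query: str, model_name: str = "alias-fast", timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (timeout value is an assumption)."""
    request = build_scrape_request(url, query, model_name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect or log. Note that urlopen raises urllib.error.HTTPError for 4xx/5xx responses.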
2. Session-Based Scraping (Multiple requests)
import requests

BASE_URL = 'https://grazieprego-scrapling.hf.space'

# Create a session (the 60 s timeout is an arbitrary choice)
session = requests.post(f'{BASE_URL}/api/session', json={'model_name': 'alias-fast'}, timeout=60)
session.raise_for_status()
session_id = session.json()['session_id']

try:
    # Multiple scrapes using the same session
    urls = [
        'https://example.com/page1',
        'https://example.com/page2',
        'https://example.com/page3',
    ]
    for url in urls:
        result = requests.post(
            f'{BASE_URL}/api/session/{session_id}/scrape',
            json={'url': url, 'query': 'Extract product data'},
            timeout=60,
        )
        result.raise_for_status()  # stop on 404/500 instead of printing an error body
        print(f"Scraped {url}: {result.json()}")
finally:
    # Always close the session, even if a scrape fails
    requests.delete(f'{BASE_URL}/api/session/{session_id}', timeout=60)
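To get the same always-close guarantee without repeating the try/finally each time, the session lifecycle can be wrapped in a context manager. This sketch assumes a hypothetical http(method, url, body) callable that returns the decoded JSON reply, which keeps it independent of any particular HTTP client; scrapling_session is also a hypothetical name.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional

BASE_URL = "https://grazieprego-scrapling.hf.space"

# Hypothetical transport: (method, url, json_body) -> decoded JSON response.
Transport = Callable[[str, str, Optional[dict]], dict]


@contextmanager
def scrapling_session(http: Transport, model_name: str = "alias-fast") -> Iterator[Callable[[str, str], dict]]:
    """Create a session, yield a scrape function bound to it, and always close it."""
    created = http("POST", f"{BASE_URL}/api/session", {"model_name": model_name})
    session_id = created["session_id"]

    def scrape(url: str, query: str) -> dict:
        # Same body as the session example: a URL plus a natural-language query.
        return http("POST", f"{BASE_URL}/api/session/{session_id}/scrape",
                    {"url": url, "query": query})

    try:
        yield scrape
    finally:
        # Runs on normal exit and when the with-block raises.
        http("DELETE", f"{BASE_URL}/api/session/{session_id}", None)
```

Injecting the transport keeps the lifecycle logic testable offline. In practice it could wrap requests.request(method, url, json=body, timeout=...) and decode the reply, bearing in mind that the DELETE reply may not carry a JSON body.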
3. Using the Gradio UI
Visit the space URL and use the interactive interface:
- Fetch (HTTP) tab: For standard HTTP scraping
- Stealthy Fetch (Browser) tab: For sites with bot protection
API Documentation
- HTML Docs: https://grazieprego-scrapling.hf.space/docs
- JSON Docs: https://grazieprego-scrapling.hf.space/api-docs
Request Parameters
/api/scrape & /api/session/{id}/scrape
{
"url": "https://example.com",
"query": "Extract all headings and prices",
"model_name": "alias-fast"
}
Parameters:
- url (string, required): The URL to scrape
- query (string, required): Natural-language extraction instruction
- model_name (string, optional): AI model to use (default: "alias-fast")
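A client-side check of these parameters can catch mistakes before a request reaches the server. This is a sketch: validate_scrape_params is a hypothetical name, and requiring an absolute http(s) URL is an assumption rather than a documented rule.

```python
from urllib.parse import urlparse

DEFAULT_MODEL = "alias-fast"


def validate_scrape_params(params: dict) -> dict:
    """Check a scrape body against the documented parameters and fill in the default model."""
    for field in ("url", "query"):
        value = params.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{field}' is required and must be a non-empty string")
    # Assumption: the server expects an absolute http(s) URL.
    if urlparse(params["url"]).scheme not in ("http", "https"):
        raise ValueError("'url' must be an absolute http(s) URL")
    model_name = params.get("model_name", DEFAULT_MODEL)
    if not isinstance(model_name, str):
        raise ValueError("'model_name' must be a string")
    # Return only the three documented fields.
    return {"url": params["url"], "query": params["query"], "model_name": model_name}
```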
/api/session
{
"model_name": "alias-fast"
}
Response Format
{
"url": "https://example.com",
"query": "Extract prices",
"response": {
"status": 200,
"content": ["# Product 1: $19.99", "# Product 2: $29.99"],
"url": "https://example.com"
}
}
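Given the response shape above, a caller usually only needs the content list. This is a minimal sketch that assumes response.status is the HTTP status of the fetched page and that content is a list of strings; extract_content is a hypothetical name.

```python
def extract_content(payload: dict) -> list[str]:
    """Return the extracted items from a scrape response shaped like the example above."""
    inner = payload.get("response") or {}
    status = inner.get("status")
    # Assumption: response.status is the HTTP status of the fetched page.
    if status != 200:
        raise RuntimeError(f"fetch of {payload.get('url')!r} returned status {status}")
    content = inner.get("content") or []
    # Wrap a bare string so callers can always iterate over items.
    return [content] if isinstance(content, str) else list(content)
```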
Best Practices
- Use stateless endpoints for one-off requests
- Use sessions for batch processing multiple URLs
- Always close sessions when finished to free resources
- Implement error handling - 500 errors may occur on complex sites
- Add retry logic for production use
- Respect rate limits - use responsibly
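The retry advice above is commonly implemented as exponential backoff with jitter, which spaces out repeated attempts instead of hammering the server. This is a sketch: with_retries and RetryableError are hypothetical names, and the caller decides which failures count as retryable (for example a 500 or a timeout). The sleep parameter is injectable so the delays can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. an HTTP 500 or a timeout."""


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call`, retrying RetryableError with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return call()
        except RetryableError:
            # Wait base_delay * 2**attempt (1x, 2x, 4x, ...) plus up to 10% random jitter.
            sleep(base_delay * 2 ** attempt * (1 + random.uniform(0, 0.1)))
    # Final attempt: any error propagates to the caller.
    return call()
```

Errors that are not RetryableError (for example a 404 for an unknown session) propagate immediately, because retrying them would not help.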
Error Handling
- 404: Session not found
- 500: Internal server error (check the detail field for specifics)
- Common issues:
  - URL unreachable or timeout
  - JavaScript-heavy sites may need stealthy_fetch
  - Bot protection may block requests
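These status codes can be turned into actionable messages on the client. This is a sketch that assumes error bodies are JSON objects with a detail field, as the 500 entry suggests; explain_error is a hypothetical name.

```python
from typing import Optional


def explain_error(status: int, body: Optional[dict] = None) -> str:
    """Map an error status (and optional JSON body) to a readable message."""
    detail = str((body or {}).get("detail", ""))
    if status == 404:
        return "Session not found; it may already have been closed. Create a new session."
    if status == 500:
        reason = f" ({detail})" if detail else ""
        return (f"Internal server error{reason}. Possible causes: the URL is unreachable "
                "or timed out, the site needs JavaScript rendering (try stealthy_fetch), "
                "or bot protection blocked the request.")
    return f"Unexpected status {status}" + (f": {detail}" if detail else "")
```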
Deployment
This space uses Docker with:
- Python 3.11
- FastAPI + Uvicorn
- Gradio 5.x
- Playwright for browser automation
- Scrapling for advanced scraping
License
MIT License - See LICENSE file for details
Credits
Built with Scrapling - Advanced web scraping library
Note: This is a demonstration space. For production use, consider self-hosting with appropriate rate limiting and authentication.