WebCrawler v2.0

🪏 WebCrawler (v2.0)

Turn the Web into Data.
A premium, AI-ready web crawler that extracts clean Markdown and detects technology stacks.

✨ Core Features

🚀 Deep Extraction Engine

Pure Markdown: Converts chaotic HTML into clean, semantic Markdown suitable for LLM training or notes.
Smart Cleaning: Automatically removes ads, popups, navigation bars, and footers.
Relative Link Resolution: Ensures all images and links work by converting relative paths to absolute URLs.
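
As a rough illustration of relative link resolution, the sketch below collects href and src values from an HTML fragment and resolves them against the page URL using only the Python standard library. The names LinkResolver and resolve_links are hypothetical and are not taken from crawler.py; the project's actual implementation may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkResolver(HTMLParser):
    """Collect href/src attribute values, resolved against a base URL.

    Hypothetical helper for illustration; not part of the project's API.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resolved = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # urljoin leaves absolute URLs untouched and resolves
                # relative ones against the base URL.
                self.resolved.append(urljoin(self.base_url, value))


def resolve_links(html, base_url):
    """Return every link and image URL in `html` as an absolute URL."""
    parser = LinkResolver(base_url)
    parser.feed(html)
    parser.close()
    return parser.resolved


if __name__ == "__main__":
    sample = '<a href="/about">About</a><img src="img/logo.png">'
    for url in resolve_links(sample, "https://example.com/blog/post.html"):
        print(url)
```

Root-relative paths such as /about resolve against the site origin, while document-relative paths such as img/logo.png resolve against the directory of the page being crawled.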

🧠 Tech Stack Intelligence

Heuristic Detection: Identifies frontend frameworks (React, Vue, Next.js, Tailwind) even when obfuscated.
Server-Side Analysis: Detects underlying server technologies via builtwith.
Visual Grid: Results are displayed in a modern, "Nexus" style grid card layout.
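
The heuristic detection described above can be sketched as a table of regular-expression signatures matched against the raw HTML. This is a minimal sketch under stated assumptions: the SIGNATURES table and the detect_stack function are hypothetical, the markers chosen are common fingerprints rather than the project's actual rules, and a real detector would likely need more signals to avoid false positives.

```python
import re

# Hypothetical signature table (an assumption, not the project's rule set).
# Each framework maps to patterns that commonly appear in its output.
SIGNATURES = {
    "Next.js": [r'id="__NEXT_DATA__"', r"/_next/static/"],
    "React": [r"data-reactroot", r"react-dom"],
    "Vue": [r"data-v-[0-9a-f]{8}"],
    "Tailwind": [r"cdn\.tailwindcss\.com", r"tailwind(?:\.min)?\.css"],
}


def detect_stack(html):
    """Return a sorted list of frameworks whose signatures appear in `html`."""
    found = []
    for name, patterns in SIGNATURES.items():
        # One matching pattern is enough to report the framework.
        if any(re.search(p, html, re.IGNORECASE) for p in patterns):
            found.append(name)
    return sorted(found)


if __name__ == "__main__":
    page = '<script id="__NEXT_DATA__" type="application/json">{}</script>'
    print(detect_stack(page))
```

Keeping the signatures in a plain dictionary makes it easy to add or tune patterns without touching the matching logic.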

🎨 Premium "Nexus" UI

Tailwind CSS: Custom-built dark theme with noise textures and glassmorphism.
Responsive Design: Fully optimized for Mobile, Tablet, and Desktop.
Interactive Editor: Integrated QuillJS editor to refine your content before export.

🛠️ Developer Extensions

Vercel Ready: Configured for instant serverless deployment.
API Structure: A modular crawler.py service that is easy to integrate into other apps.
Live Docs: Documentation served directly within the app at /docs.

🚀 Quick Start

Local Development

Clone the Repository

    git clone https://github.com/raksitbell/webcrawler.git
    cd webcrawler

Install Dependencies

    pip install -r requirements.txt

Run the App

    python run.py

Access the app at http://127.0.0.1:5001.

☁️ Deploy to Vercel

This project is configured for Vercel.

Install the Vercel CLI:

    npm i -g vercel

Run vercel in the project directory:

    vercel

📚 Documentation

Detailed documentation is available in the docs folder or via the Documentation link in the app.

Tech Stack Deep Dive: Architecture, libraries, and design patterns.
Changelog: History of version updates.

📄 License

MIT License.