WebCrawler (v2.0)
Turn the Web into Data.
A premium, AI-ready web crawler that extracts clean Markdown and detects technology stacks.
✨ Core Features
🚀 Deep Extraction Engine
- Pure Markdown: Converts chaotic HTML into clean, semantic Markdown suitable for LLM training or notes.
- Smart Cleaning: Automatically removes ads, popups, navigation bars, and footers.
- Relative Link Resolution: Ensures all images and links work by converting relative paths to absolute URLs.
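The relative link resolution described above can be illustrated with a minimal sketch. It assumes the links have already been converted to Markdown and uses only the Python standard library; the function name `absolutize_markdown_links` and the regular expression are hypothetical and not taken from the project's `crawler.py`.

```python
import re
from urllib.parse import urljoin

# Matches the opening of a Markdown link or image, "[text](" or "![alt](",
# followed by the link target (everything up to whitespace or ")").
_LINK_RE = re.compile(r'(!?\[[^\]]*\]\()([^)\s]+)')


def absolutize_markdown_links(markdown: str, base_url: str) -> str:
    """Rewrite relative link and image targets against base_url (hypothetical helper)."""

    def _fix(match: re.Match) -> str:
        prefix, target = match.group(1), match.group(2)
        # urljoin leaves targets that are already absolute unchanged.
        return prefix + urljoin(base_url, target)

    return _LINK_RE.sub(_fix, markdown)


if __name__ == "__main__":
    page = "![logo](/img/logo.png) and [docs](guide/intro.html)"
    print(absolutize_markdown_links(page, "https://example.com/blog/post"))
```

Root-relative targets such as `/img/logo.png` resolve against the site origin, while path-relative targets such as `guide/intro.html` resolve against the directory of the page that was crawled.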
🧠 Tech Stack Intelligence
- Heuristic Detection: Identifies frontend frameworks (React, Vue, Next.js, Tailwind) even when obfuscated.
- Server-Side Analysis: Detects underlying server technologies via builtwith.
- Visual Grid: Results are displayed in a modern, "Nexus"-style grid of cards.
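As a rough illustration of heuristic frontend detection, the sketch below scans raw HTML for markers that these frameworks commonly leave behind. The signature table and the `detect_frontend` function are assumptions made for this example; the project's actual rules may differ and are likely more extensive.

```python
import re

# Hypothetical signature table: each framework maps to patterns that
# commonly appear in pages built with it. Not taken from the project.
SIGNATURES = {
    "Next.js": [r"__NEXT_DATA__", r"/_next/static/"],
    "React": [r"data-reactroot", r"react(?:-dom)?(?:\.production)?(?:\.min)?\.js"],
    "Vue": [r"data-v-[0-9a-f]{8}", r"vue(?:\.runtime)?(?:\.global)?(?:\.prod)?(?:\.min)?\.js"],
    "Tailwind CSS": [r"tailwind(?:css)?(?:\.min)?\.css", r"cdn\.tailwindcss\.com"],
}


def detect_frontend(html: str) -> list[str]:
    """Return the sorted names of frameworks whose markers appear in html."""
    found = []
    for name, patterns in SIGNATURES.items():
        # One matching pattern is enough to report the framework.
        if any(re.search(p, html, re.IGNORECASE) for p in patterns):
            found.append(name)
    return sorted(found)


if __name__ == "__main__":
    sample = (
        '<script id="__NEXT_DATA__" type="application/json">{}</script>'
        '<div data-v-1a2b3c4d></div>'
    )
    print(detect_frontend(sample))
```

Matching on markers in the page body rather than on script file names alone is what lets this kind of check keep working when bundles are minified or renamed, although any single marker can produce false positives.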
🎨 Premium "Nexus" UI
- Tailwind CSS: Custom-built dark theme with noise textures and glassmorphism.
- Responsive Design: Fully optimized for Mobile, Tablet, and Desktop.
- Interactive Editor: Integrated QuillJS editor to refine your content before export.
🛠️ Developer Extensions
- Vercel Ready: Configured for instant serverless deployment.
- API Structure: Modular crawler.py service that is easy to integrate into other apps.
- Live Docs: Documentation served directly within the app at /docs.
🚀 Quick Start
Local Development
- Clone the Repository
      git clone https://github.com/raksitbell/webcrawler.git
      cd webcrawler
- Install Dependencies
      pip install -r requirements.txt
- Run the App
      python run.py
  Access the app at http://127.0.0.1:5001.
☁️ Deploy to Vercel
This project is configured for Vercel.
- Install Vercel CLI: npm i -g vercel
- Run vercel in the project directory.
📚 Documentation
Detailed documentation is available in the docs folder or via the Documentation link in the app.
- Tech Stack Deep Dive: Architecture, libraries, and design patterns.
- Changelog: History of version updates.
📄 License
MIT License.