How to Use Trellian SiteSpider for Efficient Website Crawling
Overview
Trellian SiteSpider is a desktop website crawler that scans sites to map pages, find broken links, gather metadata (titles, meta descriptions), and identify SEO issues. Use it to audit structure, locate errors, and generate crawl reports.
Quick setup
- Download & install: Get the installer for Windows and run it.
- Create a new project: Enter the site URL and a project name; set a local save folder.
- Set crawl limits: Choose maximum pages to crawl and depth (recommended: start with depth 3).
- Robots and authentication: Enable obey robots.txt by default; add HTTP auth or cookies if crawling protected areas.
- User-agent & rate: Set a descriptive, polite user-agent string and cap the request rate to avoid overloading the server (0.5–2 requests per second is a safe starting range).
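SiteSpider applies these settings through its GUI, but the underlying behavior is easy to reason about in code. The sketch below shows the two politeness mechanisms from the list above, robots.txt checks and request-rate limiting, using only Python's standard library; the class name, user-agent string, and parameters are illustrative, not part of SiteSpider.

```python
import time
from urllib import robotparser


class PoliteCrawler:
    """Illustrative sketch of polite-crawl settings (not SiteSpider's API)."""

    def __init__(self, user_agent="MyAuditBot/1.0", max_rps=1.0):
        self.user_agent = user_agent
        self.min_interval = 1.0 / max_rps   # seconds between requests
        self._last_request = 0.0
        self.robots = robotparser.RobotFileParser()

    def load_robots(self, robots_txt: str):
        # In a real crawl this text comes from https://site/robots.txt;
        # here it is passed as a string for illustration.
        self.robots.parse(robots_txt.splitlines())

    def allowed(self, url: str) -> bool:
        # Honor robots.txt for our user-agent.
        return self.robots.can_fetch(self.user_agent, url)

    def wait_turn(self):
        # Sleep just long enough that we never exceed max_rps.
        elapsed = time.monotonic() - self._last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_request = time.monotonic()
```

Calling `wait_turn()` before each fetch keeps the crawler inside the 0.5–2 req/s budget recommended above.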
Recommended crawl settings for efficiency
- Start with sitemap (if available): Import sitemap.xml to target important pages first.
- Follow internal links only: Disable external domain crawling to save time.
- Adjust thread count: Use 4–8 threads depending on your machine and server tolerance.
- Exclude query strings: Ignore URL parameters that create duplicate content unless needed.
- Canonical handling: Respect canonical tags to avoid redundant URLs.
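Two of the settings above, "internal links only" and "exclude query strings", amount to URL normalization before a URL is queued. A minimal stand-alone sketch (the function and its parameters are hypothetical, not a SiteSpider feature):

```python
from urllib.parse import urlparse, urlunparse


def normalize(url: str, site_netloc: str, strip_query: bool = True):
    """Return a canonical URL for dedup, or None for external links."""
    parts = urlparse(url)
    # "Internal links only": skip anything on another domain.
    if parts.netloc and parts.netloc != site_netloc:
        return None
    # "Exclude query strings": drop parameters that create duplicate content.
    query = "" if strip_query else parts.query
    return urlunparse((parts.scheme or "https",
                       parts.netloc or site_netloc,
                       parts.path or "/",
                       "",      # params
                       query,
                       ""))     # fragment always dropped
```

With this in place, `/page?utm_source=x` and `/page` collapse to one crawl target, which is exactly the duplicate-content saving the setting buys you.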
Prioritize useful checks
- Broken links (404s): Identify and export a list for fixes.
- Redirect chains: Find 3xx chains causing crawl inefficiency.
- Duplicate titles/meta descriptions: Spot and consolidate duplicative SEO tags.
- Page depth & orphan pages: Map depth to prioritize high-value shallow pages; find pages not linked from anywhere.
- Page size & load time: Flag very large resources slowing crawls.
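Redirect chains are easy to spot once you have a crawl export of 3xx responses. Assuming a mapping of source URL to redirect target (the shape of the input dict is an assumption about your export, not SiteSpider's format), chains longer than a threshold can be found like this:

```python
def redirect_chains(redirects: dict, max_hops: int = 2):
    """Given {url: redirect_target} for 3xx responses, return chains
    longer than max_hops. Illustrative helper, not SiteSpider's API."""
    chains = []
    for start in redirects:
        hops, seen, url = [], set(), start
        while url in redirects and url not in seen:
            seen.add(url)          # guard against redirect loops
            url = redirects[url]
            hops.append(url)
        if len(hops) > max_hops:
            chains.append([start] + hops)
    return chains
```

Each reported chain lists every hop, so you can repoint the first URL directly at the final destination.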
Running the crawl
- Run a small test crawl (100–500 URLs) to validate settings.
- Review errors and unexpected exclusions (robots, auth).
- Run full crawl with chosen limits and export periodic snapshots if long-running.
Reporting & exports
- Export CSVs: Pages, links, errors, and metadata for spreadsheet analysis.
- Use filters: Filter by status code, depth, content type before exporting.
- Generate summary: Create an executive overview of top issues (broken links, large pages, duplicate tags).
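If you prefer to filter after exporting, the CSVs load cleanly into a few lines of Python. This sketch filters a pages export down to error rows; the `url`/`status` column names are an assumption about your export, so adjust them to match the actual header:

```python
import csv
import io


def filter_by_status(csv_text: str, codes=(404, 410)):
    """Return rows of a pages export whose status code is in `codes`.
    Assumes 'url' and 'status' columns; adapt to your export's header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row["status"]) in codes]
```

The same pattern works for the depth or content-type filters mentioned above: swap the column name and the membership test.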
Workflow tips
- Iterative approach: Fix high-impact issues, then re-crawl to verify.
- Schedule recurring audits: Monthly or after major site changes.
- Combine tools: Use SiteSpider outputs with Google Search Console, Screaming Frog, or site analytics for deeper insights.
- Document changes: Track fixes in a spreadsheet or ticketing system and note crawl dates.
Troubleshooting
- If pages are missing, check robots.txt and authentication settings.
- If the crawl is slow or the server shows strain, reduce the thread count and request rate before trying again.