Extract speaker names, titles, companies, and bios from conference websites. Supports direct HTML scraping and Apify web scraper fallback for JS-heavy sites. Use for pre-event research and outreach targeting.
## Installation

```shell
npx goose-skills install conference-speaker-scraper --claude
# Installs to: ~/.claude/skills/conference-speaker-scraper/
```
Extract speaker names, titles, companies, and bios from conference website /speakers pages. Supports direct HTML scraping with multiple extraction strategies, plus Apify fallback for JS-heavy sites.
The only dependency is `requests` (`pip install requests`). No API key is needed for direct scraping mode.
## Usage

```shell
# Scrape speakers from a conference page
python3 skills/conference-speaker-scraper/scripts/scrape_speakers.py \
  --url "https://example.com/speakers"

# Use Apify for JS-heavy sites
python3 skills/conference-speaker-scraper/scripts/scrape_speakers.py \
  --url "https://example.com/speakers" --mode apify

# Custom conference name (otherwise inferred from URL)
python3 skills/conference-speaker-scraper/scripts/scrape_speakers.py \
  --url "https://example.com/speakers" --conference "Sage Future 2026"

# Output formats
python3 skills/conference-speaker-scraper/scripts/scrape_speakers.py --url URL --output json     # default
python3 skills/conference-speaker-scraper/scripts/scrape_speakers.py --url URL --output csv
python3 skills/conference-speaker-scraper/scripts/scrape_speakers.py --url URL --output summary
```

## How it works

### Direct mode (default)

Fetches the page HTML and tries multiple extraction strategies in order, using whichever returns the most results:
- `<h2>`/`<h3>` + `<p>` structures
- `<script type="application/ld+json">` with speaker data

### Apify mode

Uses the `apify/cheerio-scraper` actor with a custom page function that targets common speaker card selectors. Standard POST/poll/GET dataset pattern.
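To illustrate the strategy-ranking approach, here is a minimal sketch (not the skill's actual code) of one strategy -- pulling schema.org `Person` entries out of a `<script type="application/ld+json">` blob -- plus the "keep whichever returns the most results" selection step. The sample payload and helper names are hypothetical:

```python
import json

# Hypothetical JSON-LD payload as it might appear in a speakers page.
LD_JSON = """
{
  "@context": "https://schema.org",
  "@graph": [
    {"@type": "Person", "name": "Jane Smith", "jobTitle": "VP of Finance",
     "worksFor": {"@type": "Organization", "name": "Acme Corp"}},
    {"@type": "Event", "name": "Sage Future 2026"}
  ]
}
"""

def extract_persons(ld_text):
    """Pull Person entries out of a JSON-LD blob, tolerating both a
    top-level object and an @graph array of nodes."""
    data = json.loads(ld_text)
    nodes = data.get("@graph", [data]) if isinstance(data, dict) else data
    speakers = []
    for node in nodes:
        if isinstance(node, dict) and node.get("@type") == "Person":
            works_for = node.get("worksFor") or {}
            speakers.append({
                "name": node.get("name", ""),
                "title": node.get("jobTitle", ""),
                "company": works_for.get("name", "")
                           if isinstance(works_for, dict) else str(works_for),
            })
    return speakers

def best_of(html, strategies):
    """Run every strategy and keep whichever returns the most records."""
    results = [s(html) for s in strategies]
    return max(results, key=len, default=[])
```

Ranking by result count is what makes the multi-strategy approach degrade gracefully: a strategy that does not match a given site simply returns an empty list and loses to whichever one does.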
## Flags

| Flag | Default | Description |
|---|---|---|
| `--url` | required | Conference speakers page URL |
| `--conference` | inferred | Conference name (otherwise inferred from the URL domain) |
| `--mode` | `direct` | `direct` (HTML scraping) or `apify` (Apify cheerio scraper) |
| `--output` | `json` | Output format: `json`, `csv`, or `summary` |
| `--token` | env var | Apify token (only needed for `apify` mode) |
| `--timeout` | `300` | Max seconds for an Apify run |
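The "standard POST/poll/GET dataset pattern" used in Apify mode could look roughly like this. This is a sketch against Apify's public v2 REST API; the helper names and the shape of `run_input` are illustrative assumptions, not the skill's actual code:

```python
import time
import requests

API = "https://api.apify.com/v2"
ACTOR = "apify~cheerio-scraper"  # "~" separates user and actor in Apify API paths

def start_run(token, run_input):
    # POST starts the actor run and returns immediately with run metadata
    r = requests.post(f"{API}/acts/{ACTOR}/runs",
                      params={"token": token}, json=run_input)
    r.raise_for_status()
    return r.json()["data"]  # contains "id" and "defaultDatasetId"

def wait_for_run(token, run_id, timeout=300):
    # Poll the run status until it reaches a terminal state
    deadline = time.time() + timeout
    while time.time() < deadline:
        r = requests.get(f"{API}/actor-runs/{run_id}", params={"token": token})
        r.raise_for_status()
        status = r.json()["data"]["status"]
        if status in ("SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"):
            return status
        time.sleep(5)
    raise TimeoutError("Apify run did not finish within the timeout")

def fetch_items(token, dataset_id):
    # GET the scraped records from the run's default dataset
    r = requests.get(f"{API}/datasets/{dataset_id}/items",
                     params={"token": token, "format": "json"})
    r.raise_for_status()
    return r.json()
```

A caller would chain these: start the run, wait for `SUCCEEDED`, then fetch items from `defaultDatasetId`. The `--timeout` flag maps naturally onto the `timeout` argument of the polling step.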
## Output

Each speaker record looks like:

```json
{
  "name": "Jane Smith",
  "title": "VP of Finance",
  "company": "Acme Corp",
  "bio": "Jane leads the finance transformation at...",
  "linkedin_url": "https://linkedin.com/in/janesmith",
  "image_url": "https://...",
  "conference": "Sage Future 2026",
  "source_url": "https://sagefuture2026.com/speakers"
}
```

## Cost

Apify mode uses the `apify/cheerio-scraper` actor -- minimal Apify credits. Direct mode is free.

## Limitations

HTML scraping is inherently fragile across conference sites. The multi-strategy approach maximizes coverage, but JS-heavy sites will require Apify mode. When direct scraping returns 0 results, try `--mode apify`.
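Since the default output is JSON, downstream outreach tooling can consume it directly. A small hypothetical example (the sample records are made up) of grouping scraped speakers by company:

```python
import json

# Hypothetical sample of the scraper's JSON output: a list of speaker records.
OUTPUT = """
[
  {"name": "Jane Smith", "title": "VP of Finance", "company": "Acme Corp",
   "conference": "Sage Future 2026"},
  {"name": "Ada Doe", "title": "CFO", "company": "Globex",
   "conference": "Sage Future 2026"}
]
"""

speakers = json.loads(OUTPUT)

# Group speaker names by company for outreach targeting
by_company = {}
for s in speakers:
    by_company.setdefault(s["company"], []).append(s["name"])

print(by_company)
```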