Phrasit



Link Extractor

Extract URLs from pasted text, deduplicate them, summarize domains, and copy results as a list or CSV.


Finding URLs in messy copy

Use the link extractor when a document, email thread, transcript, or pasted HTML contains more links than you want to collect by hand. Deduplicated output is useful for source lists, QA checks, redirect reviews, and turning notes into a clean reading queue.

The domain summary helps spot repeated hosts and accidental tracking links. For public pages, still open important URLs before publishing so you catch redirects, expired pages, or links that require authentication.

About the Link Extractor

The link extractor scans text and lifts out every URL it can find: absolute http and https links, www addresses, and root-relative paths that start with a slash. It removes duplicates, trims trailing punctuation that often clings to a link in prose, and groups the results by domain so you can see at a glance where they all point.
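
For readers curious about the mechanics, here is a minimal Python sketch of that kind of scan. It is not the tool's own code. The regular expression, the names `LINK_RE` and `extract_links`, and the rule that a root-relative path must sit at the start of the text or after whitespace, a quote, an opening parenthesis, or an equals sign are all assumptions chosen for illustration.

```python
import re

# Assumed pattern, for illustration only. Three alternatives, tried in order:
#   1. absolute http:// or https:// links
#   2. bare www. addresses
#   3. root-relative paths: a "/" with no preceding character, or one preceded
#      by whitespace, a quote, "(" or "=" (so href="/a" matches but "</a>" does not)
LINK_RE = re.compile(
    r"https?://[^\s\"'<>]+"
    r"|\bwww\.[^\s\"'<>]+"
    r"|(?<![^\s\"'(=])/[^\s\"'<>]+"
)

# Punctuation that tends to cling to the end of a link in prose.
TRAILING_PUNCTUATION = ".,;:!?"


def extract_links(text: str) -> list[str]:
    """Return unique links in first-seen order, minus trailing punctuation."""
    seen: set[str] = set()
    links: list[str] = []
    for match in LINK_RE.finditer(text):
        # rstrip only touches the end, so dots inside a path survive
        link = match.group(0).rstrip(TRAILING_PUNCTUATION)
        if link not in seen:
            seen.add(link)
            links.append(link)
    return links


print(extract_links("See https://example.com/docs, then /tools/word-counter."))
```

Deduplicating with a set plus a list keeps the original order, which matters if you later turn the output into a reading queue.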

It is built for the moment you have a wall of text, an email, a Markdown file, or a chunk of scraped HTML, and you just want the links. Use it to audit outbound links in a draft, collect every reference from a document, or pull a quick site map from copied page source. It works entirely in the browser, with no upload and no link following.

How to use it

  1. Paste the text or markup containing links into the input area.
  2. Decide whether to keep relative paths: turn on Ignore relative to drop entries that start with a slash.
  3. Turn on the HTTP/HTTPS only option to exclude bare www and relative matches and keep just fully qualified links.
  4. Keep Sort by domain on to cluster links from the same site together, which makes duplicates and patterns obvious.
  5. Check the domain summary for per-site counts, then Copy list for plain URLs or Copy CSV for a url,domain table.
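
The three toggles can be pictured as simple filters applied to the extracted list. This Python sketch is hypothetical: `apply_options`, `host_of`, and their parameter names are illustrative stand-ins for Ignore relative, HTTP/HTTPS only, and Sort by domain, not the tool's actual implementation.

```python
from urllib.parse import urlsplit


def host_of(link: str) -> str:
    """Sort key: the link's host, or "" for root-relative paths."""
    if link.startswith("/"):
        return ""
    if link.startswith("www."):
        link = "https://" + link  # assumption: read bare www links as https
    try:
        return urlsplit(link).hostname or ""
    except ValueError:  # e.g. an unbalanced "[" in the host
        return ""


def apply_options(
    links: list[str],
    ignore_relative: bool = False,
    http_only: bool = False,
    sort_by_domain: bool = True,
) -> list[str]:
    """Hypothetical stand-ins for the Ignore relative, HTTP/HTTPS only and
    Sort by domain toggles."""
    result = list(links)
    if ignore_relative:
        result = [u for u in result if not u.startswith("/")]
    if http_only:
        # keep only fully qualified links; drops bare www and relative matches
        result = [u for u in result if u.startswith(("http://", "https://"))]
    if sort_by_domain:
        # sorted() is stable, so links on the same host keep their input order
        result = sorted(result, key=host_of)
    return result
```

Because Python's `sorted` is stable, clustering by host never reshuffles links within a site, so the first-seen order from extraction survives inside each group.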

Examples

Auditing outbound links in a draft

Paste an article that links to https://example.com/docs twice, to another page on example.com once, and to a partner site once. The extractor keeps a single copy of the repeated link, and the domain summary shows example.com at 2 and the partner domain at 1, so you instantly see which sites your draft sends readers to most.
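
The tally behind a per-site summary can be sketched in a few lines of Python. This assumes the summary counts unique links per host after deduplication. `domain_summary` is a hypothetical name, and the draft below (a repeated link, a second page on example.com, and a link to partner.example) is made-up sample input.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_summary(links: list[str]) -> Counter:
    """Count unique links per host (a simplified, hypothetical sketch)."""
    counts: Counter = Counter()
    for link in dict.fromkeys(links):  # dedupe, keeping first-seen order
        if link.startswith("/"):
            counts["(relative)"] += 1
            continue
        if link.startswith("www."):
            link = "https://" + link  # assumption: bare www read as https
        try:
            host = urlsplit(link).hostname
        except ValueError:
            host = None
        counts[host or "(invalid)"] += 1
    return counts


draft_links = [
    "https://example.com/docs",
    "https://partner.example/",
    "https://example.com/docs",   # repeated link: counted once
    "https://example.com/guide",  # hypothetical second page on example.com
]
print(domain_summary(draft_links).most_common())
# [('example.com', 2), ('partner.example', 1)]
```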

Separating relative paths from real URLs

Copy a navigation block containing /tools/word-counter and https://phrasit.example/tools. With Ignore relative on, the slash-prefixed path drops out and only the absolute URL remains. Leave it off and the relative path stays in the results, counted under (relative) in the domain summary.

Frequently asked questions

How does it handle punctuation stuck to a link?
Trailing dots, commas, semicolons, colons, exclamation marks, and question marks are stripped, so a sentence ending in 'see https://example.com.' yields https://example.com without the full stop. Punctuation inside the path is kept.
Does it follow links or check if they work?
No. It only extracts the text of each URL. It never sends a request, so it cannot tell you whether a link is live, redirected, or broken. Pair it with a link checker if you need status codes.
What does (relative) or (invalid) mean in the summary?
Relative links that begin with a slash are grouped under (relative) because they have no host. Anything that cannot be parsed into a valid URL is grouped under (invalid), which usually flags a malformed or truncated match.
Will it catch links written as www.example.com without http?
Yes, bare www addresses are matched. For the domain summary they are treated as https so the host can be read. If you would rather exclude them and keep only fully qualified links, turn on the HTTP/HTTPS only option.
Can I get the results split into a spreadsheet?
Yes. Copy CSV outputs two quoted columns, url and domain, on each line. The quoting keeps the file valid even when a URL contains a comma, and the domain column lets you filter or pivot by site.
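
The labelling rules from these answers and the quoted CSV format can be sketched together in Python. This is an illustration, not the tool's code: `domain_label` and `to_csv` are hypothetical names, treating any hostless or unparseable match as (invalid) is an assumption about where the tool draws that line, and the sketch writes data rows only, with no header row.

```python
import csv
import io
from urllib.parse import urlsplit


def domain_label(link: str) -> str:
    """Hypothetical summary label for one extracted link."""
    if link.startswith("/"):
        return "(relative)"  # root-relative path: no host to report
    if link.lower().startswith("www."):
        link = "https://" + link  # bare www address: read it as https
    try:
        parts = urlsplit(link)
        host = parts.hostname
    except ValueError:  # e.g. an unbalanced "[" in the host part
        return "(invalid)"
    if parts.scheme not in ("http", "https") or not host:
        return "(invalid)"  # malformed or truncated match
    return host


def to_csv(links: list[str]) -> str:
    """Render one "url","domain" row per link with every field quoted."""
    buffer = io.StringIO()
    # QUOTE_ALL wraps each field in double quotes and doubles any embedded
    # quote, so a comma inside a URL cannot split the row.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for link in links:
        writer.writerow([link, domain_label(link)])
    return buffer.getvalue()


print(to_csv(["/tools/word-counter", "www.example.com/page", "https://example.com/search?q=a,b"]), end="")
# "/tools/word-counter","(relative)"
# "www.example.com/page","www.example.com"
# "https://example.com/search?q=a,b","example.com"
```

Quoting every field, rather than only the fields that need it, keeps each row predictable for spreadsheets and scripts that read the file back.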

Good to know

The matcher is deliberately broad so it catches links even when they are not wrapped in anchor tags, which is common in plain text, email, and Markdown. The trade-off is that it can occasionally grab a path-like fragment that was never meant to be a link, so a quick skim of the output is worth it on noisy input.

Because it reads only what is in the box, it sees links exactly as written. If a page builds URLs with JavaScript or hides them behind link shorteners, the final destinations will not appear unless the resolved URL is already present in the text. When collecting references from a document or tidying a list of citations, keep Sort by domain on: clustering by host is the quickest way to spot duplicates and to confirm that every link points where you expect.
