Remove (Delete) Duplicate Email Addresses in Text Files — 5 Simple Ways


1) Use sort + uniq (Linux/macOS)

  • Command: sort emails.txt | uniq > deduped.txt
  • Preserves one instance of each exact line. Use sort -u to combine steps.
  • To keep original order, use awk/perl methods below.
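A quick self-contained sanity check of the `sort -u` form (file names are illustrative; the demo writes its own sample input):

```shell
# Three lines, one exact duplicate; sort -u keeps one copy of each,
# in sorted order rather than original order.
printf 'bob@example.com\nalice@example.com\nbob@example.com\n' > emails.txt
sort -u emails.txt > deduped.txt
cat deduped.txt   # alice@example.com, then bob@example.com
```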

2) awk to preserve first occurrence order

  • Command: awk '!seen[$0]++' emails.txt > deduped.txt
  • Keeps the first appearance of each exact line and removes later duplicates.
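A small illustration of the order-preserving behavior, with a throwaway input file (names are illustrative):

```shell
# carol appears first in the file and stays first in the output,
# unlike the sort-based approach, which would reorder the lines.
printf 'carol@example.com\nalice@example.com\ncarol@example.com\n' > emails.txt
awk '!seen[$0]++' emails.txt > deduped.txt
cat deduped.txt   # carol@example.com, then alice@example.com
```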

3) Python script for flexible parsing

  • Example (handles emails within larger text and normalizes case):

Code

import re

with open('emails.txt') as f:
    text = f.read()

# \. matches a literal dot before the top-level domain
emails = re.findall(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}', text)

seen, out = set(), []
for e in emails:
    k = e.lower()          # compare case-insensitively
    if k not in seen:
        seen.add(k)
        out.append(e)      # keep the casing of the first occurrence

with open('deduped.txt', 'w') as f:
    f.write('\n'.join(out))

4) PowerShell (Windows)

  • Command: Get-Content emails.txt | Sort-Object -Unique | Set-Content deduped.txt
  • To preserve first occurrence order:

Code

$seen = @{}
Get-Content emails.txt | ForEach-Object {
    if (-not $seen.ContainsKey($_)) { $seen[$_] = $true; $_ }
} | Set-Content deduped.txt

5) Text editors / spreadsheet tools

  • Use editors with regex find/replace (e.g., VS Code) or import into Excel/Sheets and use “Remove duplicates”.
  • Good for small files and visual review; prone to manual error on large files.

Tips & considerations

  • Normalization: lowercase emails, trim whitespace, remove surrounding punctuation before deduping.
  • Email parsing: use robust regex or libraries for complex text; avoid naive patterns that capture invalid strings.
  • Large files: use streaming approaches (awk, Python iterator, or external tools) to avoid high memory use.
  • Back up original file before changes.
  • If you need a ready-to-run script for your platform or want handling for emails embedded in paragraphs, tell me your OS and file sample.
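One way to sketch the normalization tips above in a single awk pass (the punctuation set and file names are illustrative assumptions; the demo writes its own sample input):

```shell
# Sample input: the same address with varying case, spacing, and brackets.
printf ' Alice@Example.com \n<alice@example.com>\nbob@example.com\n' > emails.txt

# Trim whitespace, strip simple surrounding punctuation, lowercase,
# then keep the first occurrence of each normalized address.
awk '{
    gsub(/^[ \t]+|[ \t]+$/, "")      # trim leading/trailing whitespace
    gsub(/^[<(]+|[>),.;]+$/, "")     # strip simple surrounding punctuation
    line = tolower($0)               # normalize case before comparing
    if (!seen[line]++) print line
}' emails.txt > deduped.txt
cat deduped.txt   # alice@example.com, then bob@example.com
```

Because awk processes the file line by line, memory use grows only with the number of unique addresses, which also addresses the large-file tip.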
