uniq - Count and Remove Duplicates
Learn uniq for counting duplicates and finding repeated patterns. Essential for frequency analysis.
uniq: Find What Repeats
You want to know what appears most often in a file. Or remove duplicate lines. Or find lines that only appear once. That's uniq.
But here's the catch: uniq only detects adjacent duplicates. If identical lines aren't next to each other, uniq treats them as different. That's why uniq almost always comes after sort.
The Golden Pattern
sort file.txt | uniq
Sort first, then uniq. This removes all duplicates.
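A quick sketch of the pattern, using printf to fabricate a small sample file (fruits.txt is just an illustrative name):

```shell
# Sample file with non-adjacent duplicates
printf 'apple\nbanana\napple\ncherry\nbanana\n' > fruits.txt

# Sort first so duplicates become adjacent, then collapse them
sort fruits.txt | uniq
# apple
# banana
# cherry

# sort -u does both steps in one command
sort -u fruits.txt
```

Note that `sort -u` is equivalent here, but only uniq gives you the counting and filtering flags below.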
Counting Occurrences
This is uniq's killer feature:
sort file.txt | uniq -c
Output:
3 apple
1 banana
5 cherry
Now you know cherry appears 5 times.
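To reproduce that output yourself, here is a fabricated sample file that matches the counts above (note that uniq right-pads the count column, so exact spacing varies by implementation):

```shell
# apple x3, banana x1, cherry x5
printf 'apple\ncherry\napple\ncherry\nbanana\ncherry\napple\ncherry\ncherry\n' > fruits.txt

sort fruits.txt | uniq -c
#   3 apple
#   1 banana
#   5 cherry
```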
The Full Frequency Analysis Pipeline
sort file.txt | uniq -c | sort -rn
1. Sort (so duplicates are adjacent)
2. Count duplicates
3. Sort by count, highest first
This answers "what are the most common items?" in any dataset.
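A runnable sketch of the full pipeline, again using fabricated sample data:

```shell
# apple x3, banana x1, cherry x5
printf 'apple\ncherry\napple\ncherry\nbanana\ncherry\napple\ncherry\ncherry\n' > fruits.txt

sort fruits.txt | uniq -c | sort -rn
#   5 cherry
#   3 apple
#   1 banana
```

The final `sort -rn` sorts numerically (-n) in reverse (-r), so the biggest count lands on top.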
Only Show Duplicates
sort file.txt | uniq -d
Prints one copy of each line that appears more than once.
Only Show Unique Lines
sort file.txt | uniq -u
Lines that appear exactly once.
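The two flags partition the data neatly. A small demonstration with fabricated input:

```shell
# apple and cherry are duplicated; banana is not
printf 'apple\napple\nbanana\ncherry\ncherry\n' > fruits.txt

sort fruits.txt | uniq -d
# apple
# cherry

sort fruits.txt | uniq -u
# banana
```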
Case-Insensitive
sort -f file.txt | uniq -i
"Apple" and "apple" count as the same. The -f makes sort fold case so differently-cased duplicates land next to each other; -i makes uniq compare them case-insensitively.
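For example (which capitalization survives in the output depends on how sort breaks ties between case-folded lines, so don't rely on it):

```shell
printf 'Apple\napple\nAPPLE\nbanana\n' > fruits.txt

# Three apples collapse into one entry with count 3
sort -f fruits.txt | uniq -c -i
```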
Real Examples
Most common IP addresses in a log:
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head
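To see it end to end, here are a few fabricated log lines standing in for a real access.log (real logs have more fields, but awk only needs the first):

```shell
# Fabricated sample: first field is the client IP
printf '10.0.0.1 GET /\n10.0.0.2 GET /login\n10.0.0.1 POST /api\n10.0.0.1 GET /about\n' > access.log

awk '{print $1}' access.log | sort | uniq -c | sort -rn | head
#   3 10.0.0.1
#   1 10.0.0.2
```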
Find duplicate entries:
sort names.txt | uniq -d
Find entries that appear exactly once:
sort data.txt | uniq -u
Why Sort First?
uniq compares each line to the one before it. Given this unsorted input:
apple
banana
apple
uniq sees "apple", then "banana" (different!), then "apple" (different from banana!). It won't catch that apple appears twice.
After sorting:
apple
apple
banana
Now uniq sees the duplicate.
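You can verify this behavior directly (fruits.txt is an illustrative name):

```shell
printf 'apple\nbanana\napple\n' > fruits.txt

# Without sort: the apples aren't adjacent, so both survive
uniq fruits.txt | wc -l
# 3

# With sort: the apples become adjacent and collapse
sort fruits.txt | uniq | wc -l
# 2
```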
Quick Reference
| What you want | Command |
|---------------|---------|
| Remove duplicates | sort file \| uniq |
| Count occurrences | sort file \| uniq -c |
| Top frequencies | sort file \| uniq -c \| sort -rn |
| Only duplicates | sort file \| uniq -d |
| Only unique | sort file \| uniq -u |
Practice
uniq is essential in CTF challenges for frequency analysis: cracking simple ciphers, finding patterns in data, and identifying anomalies.
Remember: sort first, then uniq. They're a package deal.