uniq - Count and Remove Duplicates
Learn uniq for counting duplicates and finding repeated patterns. Essential for frequency analysis.
uniq: Find What Repeats
You want to know what appears most often in a file. Or remove duplicate lines. Or find lines that only appear once. That's uniq.
But here's the catch: uniq only detects adjacent duplicates. If identical lines aren't next to each other, uniq treats them as different. That's why uniq almost always comes after sort.
The Golden Pattern
sort file.txt | uniq
Sort first, then uniq. This removes all duplicates.
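A quick sketch of the pattern, using printf to fabricate a small sample file (fruits.txt is just an illustrative name):

```shell
# Sample file with non-adjacent duplicates
printf 'apple\nbanana\napple\ncherry\nbanana\n' > fruits.txt

# Sort first so duplicates become adjacent, then collapse them
sort fruits.txt | uniq
# apple
# banana
# cherry

# sort -u does both steps in one command
sort -u fruits.txt
```

Note that `sort -u` is equivalent here, but only uniq gives you the counting and filtering flags below.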
Counting Occurrences
This is uniq's killer feature:
sort file.txt | uniq -c
Output:
3 apple
1 banana
5 cherry
Now you know cherry appears 5 times.
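To reproduce that output yourself, here is a fabricated sample file that matches the counts above (note that uniq right-pads the count column, so exact spacing varies by implementation):

```shell
# apple x3, banana x1, cherry x5
printf 'apple\ncherry\napple\ncherry\nbanana\ncherry\napple\ncherry\ncherry\n' > fruits.txt

sort fruits.txt | uniq -c
#   3 apple
#   1 banana
#   5 cherry
```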
The Full Frequency Analysis Pipeline
sort file.txt | uniq -c | sort -rn
1. Sort (so duplicates are adjacent)
2. Count duplicates
3. Sort by count, highest first
This answers "what are the most common items?" in any dataset.
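A runnable sketch of the full pipeline, again using fabricated sample data:

```shell
# apple x3, banana x1, cherry x5
printf 'apple\ncherry\napple\ncherry\nbanana\ncherry\napple\ncherry\ncherry\n' > fruits.txt

sort fruits.txt | uniq -c | sort -rn
#   5 cherry
#   3 apple
#   1 banana
```

The final `sort -rn` sorts numerically (-n) in reverse (-r), so the biggest count lands on top.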
Only Show Duplicates
sort file.txt | uniq -d
Prints one copy of each line that appears more than once.
Only Show Unique Lines
sort file.txt | uniq -u
Lines that appear exactly once.
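The two flags partition the data neatly. A small demonstration with fabricated input:

```shell
# apple and cherry are duplicated; banana is not
printf 'apple\napple\nbanana\ncherry\ncherry\n' > fruits.txt

sort fruits.txt | uniq -d
# apple
# cherry

sort fruits.txt | uniq -u
# banana
```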
Case-Insensitive
sort -f file.txt | uniq -i
"Apple" and "apple" count as the same. The -f makes sort fold case so differently-cased duplicates land next to each other; -i makes uniq compare them case-insensitively.
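For example (which capitalization survives in the output depends on how sort breaks ties between case-folded lines, so don't rely on it):

```shell
printf 'Apple\napple\nAPPLE\nbanana\n' > fruits.txt

# Three apples collapse into one entry with count 3
sort -f fruits.txt | uniq -c -i
```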
Real Examples
Most common IP addresses in a log:
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head
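To see it end to end, here are a few fabricated log lines standing in for a real access.log (real logs have more fields, but awk only needs the first):

```shell
# Fabricated sample: first field is the client IP
printf '10.0.0.1 GET /\n10.0.0.2 GET /login\n10.0.0.1 POST /api\n10.0.0.1 GET /about\n' > access.log

awk '{print $1}' access.log | sort | uniq -c | sort -rn | head
#   3 10.0.0.1
#   1 10.0.0.2
```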
Find duplicate entries:
sort names.txt | uniq -d
Find entries that appear exactly once:
sort data.txt | uniq -u
Why Sort First?
uniq compares each line to the one before it. Given this unsorted input:
apple
banana
apple
uniq sees "apple", then "banana" (different!), then "apple" (different from banana!). It won't catch that apple appears twice.
After sorting:
apple
apple
banana
Now uniq sees the duplicate.
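You can verify this behavior directly (fruits.txt is an illustrative name):

```shell
printf 'apple\nbanana\napple\n' > fruits.txt

# Without sort: the apples aren't adjacent, so both survive
uniq fruits.txt | wc -l
# 3

# With sort: the apples become adjacent and collapse
sort fruits.txt | uniq | wc -l
# 2
```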
Quick Reference
| What you want | Command |
|---------------|---------|
| Remove duplicates | sort file \| uniq |
| Count occurrences | sort file \| uniq -c |
| Top frequencies | sort file \| uniq -c \| sort -rn |
| Only duplicates | sort file \| uniq -d |
| Only unique | sort file \| uniq -u |
Practice
uniq is essential in CTF challenges for frequency analysis: cracking simple ciphers, finding patterns in data, and identifying anomalies.
Remember: sort first, then uniq. They're a package deal.