grep - searching the haystack made easy
When you need to search for a piece of text in a very large source, the grep command is the answer. grep accepts plain strings and regular expressions, and it can produce output in various formats.
The most basic syntax to search a piece of text in a file is:
grep "pattern" file_name
If we need to search multiple files for a pattern, we can do the following:
grep "pattern" file1 file2 file3 ...
grep can also be used with standard input.
echo "linux is awesome" | grep linux
If no file or directory is specified, grep reads from standard input (STDIN).
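To see the three basic forms side by side, here is a minimal sketch, assuming a POSIX shell and a standard grep. The file names notes.txt and distros.txt are made up for illustration.

```shell
# Build two throwaway sample files in a temporary directory.
workdir=$(mktemp -d)
printf 'linux is awesome\nwindows is fine\nLinux rocks\n' > "$workdir/notes.txt"
printf 'arch linux\n' > "$workdir/distros.txt"

# One file: prints only the lines that contain "linux" (case-sensitive).
file_hits=$(grep "linux" "$workdir/notes.txt")
echo "$file_hits"

# Several files: each matching line is prefixed with its file name.
multi_hits=$(cd "$workdir" && grep "linux" notes.txt distros.txt)
echo "$multi_hits"

# No file argument: grep reads whatever arrives on standard input.
stdin_hits=$(printf 'one\ntwo linux\n' | grep "linux")
echo "$stdin_hits"

rm -rf "$workdir"
```

Note that "Linux rocks" is not reported, because the search is case-sensitive by default.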
# Highlight the matched text
One major feature of grep is its ability to highlight the matched string in its output. There are three color options: auto, always and never. To pick one, we pass the additional --color flag with the option we want.
grep "pattern" file_name --color=always
As you might guess, the --color=always option always colors the matched string in the output. With --color=auto, grep colors the output only when it is written to a terminal, not when it is piped to another command or redirected to a file. The last option, --color=never, turns coloring off.
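The difference between the three settings is easiest to see when the output does not go to a terminal. The following is a small sketch, assuming GNU grep; it captures the output in variables (so stdout is a pipe, not a terminal) and uses cat -v to make the escape characters visible.

```shell
# --color=always: the match is wrapped in escape sequences even though
# the output goes into a command substitution rather than a terminal.
colored=$(echo "linux is awesome" | grep --color=always "linux")

# --color=auto: stdout is not a terminal here, so no color is added.
auto=$(echo "linux is awesome" | grep --color=auto "linux")

# --color=never: plain text in every situation.
plain=$(echo "linux is awesome" | grep --color=never "linux")

# cat -v shows the otherwise invisible escape character as ^[.
printf '%s\n' "$colored" | cat -v
printf '%s\n' "$auto" | cat -v
printf '%s\n' "$plain" | cat -v
```

Only the first line printed should contain ^[ sequences around "linux"; the other two are the plain sentence.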
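If you want coloring for every grep operation but feel too lazy to type --color=always with each command (which would be me), older versions of GNU grep let you set GREP_OPTIONS in your environment.
export GREP_OPTIONS='--color=always'
This turns on coloring for all grep commands. Be aware that GNU grep has deprecated GREP_OPTIONS and recent releases ignore it; a shell alias such as alias grep='grep --color=auto' is the usual replacement. The auto setting is also the safer default, because always sends color escape codes into pipes and redirected files as well.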
# Limit what to output
Normally, grep outputs only the lines that contain the matched pattern. But there are circumstances where we might want to see the contents of the entire file with the matched pattern highlighted. We can do the following for this:
grep --color 'pattern\|' file_name
This outputs the lines containing the pattern as well as every other line, because the empty alternative after the last \| matches any line. The result is the complete contents of the file with only the matched part colored. Of course, we can pass multiple patterns to be matched.
grep --color 'pattern_one\|pattern_two\|' file_name
Notice the \ before |. By default grep uses basic regular expressions, in which a plain | is an ordinary character and alternation has to be written as \| (a GNU extension). However, we can also write the command without the \.
grep -E --color 'pattern_one|pattern_two|' file_name
The -E option instructs grep to use extended regular expressions, where characters such as |, + and ? are special without a backslash. You will also come across egrep, which behaves the same as grep -E. Current GNU grep treats egrep as obsolescent and prints a warning, so grep -E is the safer choice in scripts, but the command above can also be written like this.
egrep --color 'pattern_one|pattern_two|' file_name
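Here is a small sketch of the "show everything, highlight the matches" trick, assuming GNU grep. It uses $ (end of line) as the always-matching alternative instead of the empty alternative shown above; that substitution is my own, made because some non-GNU grep implementations may reject an empty alternative. The file name app.log is made up.

```shell
workdir=$(mktemp -d)
printf 'alpha error\nbeta ok\ngamma warning\n' > "$workdir/app.log"

# Default behaviour: only the two matching lines come back.
matching_only=$(grep -E 'error|warning' "$workdir/app.log" | wc -l | tr -d ' ')

# The extra alternative $ matches the end of every line, so all three lines
# are printed; on a terminal only "error" and "warning" would be colored.
whole_file=$(grep -E --color=auto 'error|warning|$' "$workdir/app.log" | wc -l | tr -d ' ')

# The same idea in basic regular expressions, with \| as the alternation.
whole_file_bre=$(grep --color=auto 'error\|warning\|$' "$workdir/app.log" | wc -l | tr -d ' ')

echo "$matching_only $whole_file $whole_file_bre"
rm -rf "$workdir"
```

The counts are only there to show that the second and third commands return every line of the file; run the grep commands on their own in a terminal to see the highlighting.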
But what if we need neither the complete line that contains the matching text nor the complete contents, but just the matching text itself?
For example, if we are searching for all the email addresses in a file, we do not need the lines that contain them, just the addresses that appear in the file.
For that we can use the -o option, which stands for only matching and limits the output to the matched text.
grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" file_name
Neat, isn't it?
What if we need to see all the lines except the ones where the given pattern appears?
grep -v "pattern" file_name
The -v option inverts the results, so the above command outputs the whole file except the lines that match the pattern.
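To make -o and -v concrete, here is a minimal sketch, assuming GNU grep (for \b and -o). The file contacts.txt and the addresses in it are invented sample data.

```shell
workdir=$(mktemp -d)
printf 'contact: alice@example.com, bob@example.org\nno address here\nadmin@example.net\n' \
  > "$workdir/contacts.txt"

# Same email pattern as in the text, kept in a variable for readability.
email_re='\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b'

# -o prints each match on its own line, without the rest of the line.
emails=$(grep -E -o "$email_re" "$workdir/contacts.txt")
echo "$emails"

# -v prints only the lines that do NOT contain an @ sign.
no_email_lines=$(grep -v "@" "$workdir/contacts.txt")
echo "$no_email_lines"

rm -rf "$workdir"
```

The first line of the sample file contains two addresses, and -o reports them as two separate lines of output.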
# Count the appearances
If we need to count how many times a pattern appears in a file, we can use the -c flag. Going back to the email pattern example, to see how many email addresses there are in a file, we could do the following.
grep -E -c "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" file_name
But wait, there is a gotcha.
The -c flag counts the number of lines that contain the pattern, not the number of matches. So if there are multiple matches on a single line, that line is counted only once!
We can overcome this with a little tweak.
Remember how we printed only the matching portions with the -o flag above? Each match is printed on its own line, so if we pipe that output into wc -l,
grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" file_name | wc -l
it will give us the exact number of times a particular pattern appears in a file.
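The gap between the two counts shows up as soon as one line holds more than one match. A small sketch, assuming GNU grep and made-up sample data in contacts.txt:

```shell
workdir=$(mktemp -d)
printf 'alice@example.com, bob@example.org\nno address here\nadmin@example.net\n' \
  > "$workdir/contacts.txt"

email_re='\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b'

# -c counts matching LINES: the first line holds two addresses but counts once.
line_count=$(grep -E -c "$email_re" "$workdir/contacts.txt")

# -o prints one match per line, so wc -l counts MATCHES.
# tr strips the padding some wc implementations add.
match_count=$(grep -E -o "$email_re" "$workdir/contacts.txt" | wc -l | tr -d ' ')

echo "lines with a match: $line_count"
echo "total matches:      $match_count"

rm -rf "$workdir"
```

With this sample file the first command reports 2 and the second reports 3.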
# Extract additional information
If we want to print the line numbers where the searched pattern appears, we can use the -n flag.
grep -n --color "pattern" file_name
If we are searching in multiple files, grep also prefixes each matching line with the name of the file. This happens with or without -n.
If we want to see the offset where the pattern starts, we can use the -b flag.
echo "linux is the answer" | grep -b -o "the"
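This prints 9:the because the searched word "the" starts at byte offset 9, counting from 0 at the beginning of the input. The -b flag is usually combined with -o; without -o, grep reports the offset of the start of each matching line rather than of the match itself.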
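Here is a short sketch of -n and -b on a multi-line file, assuming GNU grep. The file sample.txt is invented; the point is that byte offsets are counted from the start of the whole input, not from the start of each line.

```shell
workdir=$(mktemp -d)
printf 'first line\nlinux is the answer\nthe end\n' > "$workdir/sample.txt"

# -n prefixes each matching line with its line number.
numbered=$(grep -n "the" "$workdir/sample.txt")
echo "$numbered"

# -b with -o: byte offset of each match, counted from the start of the file.
match_offsets=$(grep -b -o "the" "$workdir/sample.txt")
echo "$match_offsets"

# -b without -o: byte offset of the start of each matching line.
line_offsets=$(grep -b "the" "$workdir/sample.txt")
echo "$line_offsets"

rm -rf "$workdir"
```

The first line is 11 bytes long including its newline, which is why the match on the second line is reported at offset 20 rather than 9.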
# Search within directories
To see which files contain a particular pattern, we can use the -l flag.
grep -l "pattern" file1 file2 file3 ...
This outputs only the names of the files that contain the given pattern. If we want the names of the files that do not contain the pattern, we can use the -L flag (note that flags are case-sensitive). It basically inverts the result we get with the -l flag.
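A quick sketch of -l next to -L, assuming GNU grep (-L is not part of POSIX). The three log files are made up for the example.

```shell
workdir=$(mktemp -d)
printf 'error: disk full\n' > "$workdir/a.log"
printf 'all good\n'         > "$workdir/b.log"
printf 'error: timeout\n'   > "$workdir/c.log"

# -l lists the files that contain at least one match.
with_match=$(cd "$workdir" && grep -l "error" a.log b.log c.log)
echo "$with_match"

# -L lists the files that contain no match at all.
without_match=$(cd "$workdir" && grep -L "error" a.log b.log c.log)
echo "$without_match"

rm -rf "$workdir"
```

Here -l reports a.log and c.log, while -L reports b.log.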
So far we have only seen how to search in a single file or in a set of files specified in the command. However, we can also search all the files in a directory and its subdirectories recursively.
grep -r "pattern" .
Here . specifies the current directory. The -r flag means that the search is done recursively in all subdirectories. This search is case-sensitive.
To match the pattern regardless of case, we can add the -i flag.
grep -ir "pattern" .
This outputs all lines that match the pattern, irrespective of case.
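The effect of -r and -i together can be sketched on a small directory tree, assuming GNU grep. The directory layout and file names are invented, and sort is only there because the order in which grep walks directories is not guaranteed.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/docs/deep"
printf 'Linux kernel\n' > "$workdir/top.txt"
printf 'linux distro\n' > "$workdir/docs/deep/nested.txt"
printf 'nothing here\n' > "$workdir/docs/other.txt"

# Recursive, case-sensitive: only the lowercase "linux" is found.
case_sensitive=$(cd "$workdir" && grep -r "linux" . | sort)
echo "$case_sensitive"

# Recursive, case-insensitive: "Linux" in top.txt is found as well.
case_insensitive=$(cd "$workdir" && grep -ir "linux" . | sort)
echo "$case_insensitive"

rm -rf "$workdir"
```

Each result is prefixed with the path of the file relative to the directory that was searched.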
# The pattern file
If we have many patterns or strings to search for, joining them all with | gets very tedious. In such scenarios, we can use a pattern file that contains all the strings/patterns we want to search for, one per line. Then we pass that file to grep with the -f flag.
grep -ir -f pattern_file .
This prints every line that matches at least one of the patterns in the pattern file.
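A minimal sketch of a pattern file in action, assuming a standard grep. Both patterns.txt and app.log are hypothetical names with invented contents.

```shell
workdir=$(mktemp -d)

# One pattern per line.
printf 'error\nwarning\n' > "$workdir/patterns.txt"
printf 'ERROR: disk full\ninfo: started\nWarning: low memory\n' > "$workdir/app.log"

# -f reads the patterns from the file; -i ignores case.
# A line is printed if it matches ANY of the patterns.
hits=$(grep -i -f "$workdir/patterns.txt" "$workdir/app.log")
echo "$hits"

rm -rf "$workdir"
```

When combining -f with -r, it helps to keep the pattern file outside the directory being searched, otherwise its own lines show up as matches.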
ヽ(´▽`)/
(You can find Part 1 of this series here )