The UNIX commands grep and sed tend to come up whenever basic text search and manipulation are discussed. Grep is a line-based search utility, used primarily to return the lines of a file (or files) that match a particular search term. Sed is similar in that it also works line by line, but it is meant more for replacing strings within lines of text. Many power users combine the search functionality of grep with the replacement abilities of sed for efficient text manipulation that, in some cases, can run much faster than, say, an equivalent Python script.

##Grep

Grep is a line-by-line search utility which will look at each line of a file and check for a match against a given search term.

Say you have an unsorted file (or multiple files) of names…

Kumail Nanjiani
Thomas Middleditch
Amanda Crew
Josh Brener
T.J. Miller
Zach Woods
Martin Starr
Josh Chang

Then say you’d like to list only those people whose first name is “Josh”. You could write a simple grep command to do this. The basic syntax of a grep command is a matcher (i.e. the search term) followed by the search space (i.e. a file or files).

grep 'Josh .*' names.txt
Josh Brener
Josh Chang

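In the example above, grep matches any line of names.txt that contains “Josh” followed by a space character and then any sequence of characters (the .* part). The pattern is not anchored, so the match could begin anywhere in a line; it returns only the two Joshes here because no other line contains “Josh ”. To require that the line start with “Josh”, prefix the pattern with ^, as in '^Josh .*'.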
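To try this yourself, the sketch below recreates names.txt from the list above and compares an unanchored pattern with an anchored one. The extra name “Mary Josh Smith” is hypothetical, added only to show where the two patterns differ.

```shell
# Recreate the sample file from the list above.
printf '%s\n' \
  'Kumail Nanjiani' 'Thomas Middleditch' 'Amanda Crew' 'Josh Brener' \
  'T.J. Miller' 'Zach Woods' 'Martin Starr' 'Josh Chang' > names.txt

# Anchored with ^: only lines that start with "Josh " match.
anchored=$(grep '^Josh ' names.txt)
echo "$anchored"

# Hypothetical line where "Josh" is a middle name.
extra='Mary Josh Smith'

# -c prints the number of matching lines instead of the lines themselves.
# grep exits non-zero when nothing matches, hence the "|| true".
loose=$(printf '%s\n' "$extra" | grep -c 'Josh ' || true)
strict=$(printf '%s\n' "$extra" | grep -c '^Josh ' || true)

echo "unanchored matches: $loose"   # 1
echo "anchored matches:   $strict"  # 0
```

The unanchored pattern picks up the middle name, while the anchored one does not.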

##Sed

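Sed is also a line-based tool, most commonly used to replace occurrences of one set of characters with another in a specified file. A contrived example is replacing the word “Josh” with “JOSH” in a file. Note that sed prints every line of its input, changed or not, and that without the g flag the s command replaces only the first match on each line.

sed 's/Josh/JOSH/' names.txt
Kumail Nanjiani
Thomas Middleditch
Amanda Crew
JOSH Brener
T.J. Miller
Zach Woods
Martin Starr
JOSH Chang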
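Two behaviours of the s command are easy to check in a short sketch: the effect of the g flag, and printing only the lines that actually changed with -n and the p flag. The input strings below are hypothetical and exist only for illustration.

```shell
# Hypothetical one-line input with two occurrences of "Josh".
line='Josh met Josh'

# Without g, only the first occurrence on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/Josh/JOSH/')

# With g, every occurrence on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/Josh/JOSH/g')

# -n suppresses sed's default output; the p flag then prints only
# the lines on which a substitution was made.
changed=$(printf '%s\n' 'Amanda Crew' 'Josh Brener' | sed -n 's/Josh/JOSH/p')

echo "$first_only"  # JOSH met Josh
echo "$all"         # JOSH met JOSH
echo "$changed"     # JOSH Brener
```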

Now let’s say we’d like to take our list of names from the grep example and manipulate it so the names are in the format Last, First. We could do this with sed by surrounding parts of the matcher with escaped parentheses, \( and \), to remember them for use later.

sed 's/\(.*\) \(.*\)/\2, \1/' names.txt

We surround our first .* matcher in parentheses to remember the sequence of characters before the space character, then use a second set of parentheses to remember all the characters after the space. We replay them in the replacement part of the command, with \1 indicating the first remembered pattern and \2 the second.

Nanjiani, Kumail
Middleditch, Thomas
Crew, Amanda
Brener, Josh
Miller, T.J.
Woods, Zach
Starr, Martin
Chang, Josh
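The same swap can also be written with extended regular expressions, where plain parentheses create the groups. This is a sketch assuming a sed that supports the -E option (GNU and BSD sed both do); the three-word name is hypothetical and shows that the greedy first .* runs up to the last space on the line.

```shell
# Basic regex, as in the command above: groups are written \( and \).
basic=$(printf '%s\n' 'Kumail Nanjiani' | sed 's/\(.*\) \(.*\)/\2, \1/')

# Extended regex (-E): unescaped parentheses create the groups.
extended=$(printf '%s\n' 'Kumail Nanjiani' | sed -E 's/(.*) (.*)/\2, \1/')

# Hypothetical three-word name: the first group is greedy, so it
# captures everything up to the LAST space.
greedy=$(printf '%s\n' 'Mary Ann Smith' | sed -E 's/(.*) (.*)/\2, \1/')

echo "$basic"     # Nanjiani, Kumail
echo "$extended"  # Nanjiani, Kumail
echo "$greedy"    # Smith, Mary Ann
```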

##Combining Grep and Sed

The real power of grep and sed comes from combining grep’s line matching with sed’s replacement. So in our previous names example, if we wanted to search through a file, grab only the people whose first name is Josh, and then format them as Last, First, we could take the output of the grep command and use it as input to the sed command with the UNIX pipe character (|):

grep '^Josh .*$' names.txt | sed 's/\(.*\) \(.*\)/\2, \1/'
Brener, Josh
Chang, Josh
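For a filter this simple, sed can also do the selection itself: a /pattern/ address in front of the s command restricts it to matching lines. The sketch below, which assumes the same names.txt contents as earlier, runs both forms so they can be compared.

```shell
# Recreate the sample file from the list of names.
printf '%s\n' \
  'Kumail Nanjiani' 'Thomas Middleditch' 'Amanda Crew' 'Josh Brener' \
  'T.J. Miller' 'Zach Woods' 'Martin Starr' 'Josh Chang' > names.txt

# grep selects the lines, sed reformats them.
piped=$(grep '^Josh ' names.txt | sed 's/\(.*\) \(.*\)/\2, \1/')

# sed alone: the /^Josh / address limits the substitution to matching
# lines, and -n with the p flag prints only those lines.
sed_only=$(sed -n '/^Josh /s/\(.*\) \(.*\)/\2, \1/p' names.txt)

echo "$piped"
echo "$sed_only"
```

Both commands print “Brener, Josh” and “Chang, Josh”. The piped form is often easier to read and to build up step by step, while the sed-only form saves a process.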

###A note about regular expressions

Both grep and sed use regular expressions (regex for short) to define which sequence of characters to match and the rules for how to match them. Writing regex matchers is a subject of its own, but certainly a skill needed at some level. More information about regular expressions can be found in this article, and if you prefer a more interactive introduction, check out RegexOne.com.
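As a small taste of those rules, the sketch below shows three common regex features with grep: the . wildcard, escaping it with a backslash, and the $ end-of-line anchor. The name “TAJB Miller” is hypothetical and is there only to show what an unescaped . will match.

```shell
# Sample lines; "TAJB Miller" is a made-up entry for illustration.
names=$(printf '%s\n' 'T.J. Miller' 'TAJB Miller' 'Zach Woods')

# An unescaped . matches any single character, so both "T" lines match.
any_char=$(printf '%s\n' "$names" | grep -c 'T.J.')

# \. matches a literal period only.
literal=$(printf '%s\n' "$names" | grep -c 'T\.J\.')

# $ anchors the match to the end of the line.
ends_in_s=$(printf '%s\n' "$names" | grep 's$')

echo "$any_char"   # 2
echo "$literal"    # 1
echo "$ends_in_s"  # Zach Woods
```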