tr

Translate, squeeze, and/or delete characters.

Syntax
      tr [options]... SET1 [SET2]

Options

   -c, -C, --complement
         Use the complement of SET1.

   -d, --delete
         Delete characters in SET1, do not translate.

   -s, --squeeze-repeats
         Replace each input sequence of a repeated character that is listed in the last
         specified SET (SET1 when only squeezing, SET2 when translating or deleting)
         with a single occurrence of that character.

   -t, --truncate-set1
         First truncate SET1 to length of SET2

   --help
         Display this help and exit.

   --version
         Output version information and exit.

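'tr' copies standard input to standard output, performing one of the following operations: translating (and optionally squeezing repeated characters in the result), squeezing repeated characters, deleting characters, or deleting characters and then squeezing repeats in what remains.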

The SET1 and (if given) SET2 arguments define ordered sets of characters, referred to below as SET1 and SET2. These sets are the characters of the input that 'tr' operates on. The '--complement' ('-c') option replaces SET1 with its complement (all of the characters that are not in SET1).
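As a small illustration of '--complement', the sketch below combines '-c' with '-d' to keep only the digits of a line. The sample input and the variable name are made up for the example.

```shell
# -c replaces SET1 (the digits) with its complement, and -d deletes that complement.
# The trailing newline is also in the complement, so it is removed as well.
digits=$(printf 'Order #42, item 7\n' | tr -cd '0-9')
echo "$digits"
```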

Specifying sets of characters

The format of the SET1 and SET2 arguments resembles the format of regular expressions; however, they are not regular expressions, only lists of characters. Most characters just represent themselves in these strings, but the strings can contain the shorthands listed below, for convenience. Some of them can be used only in SET1 or SET2, as noted below.

Backslash escapes
A backslash followed by a character not listed below causes an error message.

   \a          Audible BEL Control-G.

   \b          Backspace  Control-H.

   \f          Form feed  Control-L.

   \n          New line   Control-J.

   \r          Return     Control-M.

   \t          Horizontal tab  Control-I.

   \v          Vertical tab    Control-K.

   \OOO        The character with the value given by OOO, which is 1 to 3 octal digits.

   \\          A backslash.
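A minimal sketch of two of these escapes in use: '\t' names a tab in SET1 and '\040' (octal 40) names a space in SET2. The input string and variable name are illustrative only.

```shell
# Turn every tab into a single space.
spaced=$(printf 'a\tb\tc\n' | tr '\t' '\040')
echo "$spaced"
```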

Ranges

The notation M-N expands to all of the characters from M through N, in ascending order. M should collate before N; if it doesn't, an error results. As an example, '0-9' is the same as '0123456789'.

Although GNU 'tr' does not support the System V syntax that uses square brackets to enclose ranges, translations specified in that format will still work as long as the brackets in SET1 correspond to identical brackets in SET2.
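The following sketch shows a plain range and the bracketed System V style side by side. In the second command the brackets are ordinary characters that simply map to themselves. Sample inputs and variable names are assumptions for the example.

```shell
# '0-9' expands to 0123456789 and 'a-j' to abcdefghij, so 0->a, 1->b, 2->c, ...
letters=$(printf '2024\n' | tr '0-9' 'a-j')
echo "$letters"

# System V style: '[' maps to '[' and ']' maps to ']', with the range in between.
upper=$(printf 'abc\n' | tr '[a-c]' '[A-C]')
echo "$upper"
```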

Repeated characters

The notation [C*N] in SET2 expands to N copies of character C. Thus, '[y*6]' is the same as 'yyyyyy'.

The notation [C*] in SET2 expands to as many copies of C as are needed to make SET2 as long as SET1.

If N begins with '0', it is interpreted in octal, otherwise in decimal.
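Both repeat forms can be tried on short inputs. This is only a sketch; the inputs and variable names are made up.

```shell
# '[x*3]' in SET2 expands to xxx, so a, b, and c all map to x.
fixed=$(printf 'abcdef\n' | tr 'abc' '[x*3]')
echo "$fixed"

# '[#*]' pads SET2 with as many '#' as SET1 needs, so every digit maps to '#'.
masked=$(printf 'tel 555 0199\n' | tr '0-9' '[#*]')
echo "$masked"
```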

Character classes

The notation [:CLASS:] expands to all of the characters in the (predefined) class CLASS.

The characters expand in no particular order, except for the 'upper' and 'lower' classes, which expand in ascending order. When the --delete (-d) and --squeeze-repeats (-s) options are both given, any character class can be used in SET2. Otherwise, only the character classes 'lower' and 'upper' are accepted in SET2, and then only if the corresponding character class ('upper' and 'lower', respectively) is specified in the same relative position in SET1. Doing this specifies case conversion.

The class names are given below; an error results when an invalid class name is given.

    'alnum'    Letters and digits.

    'alpha'    Letters.

    'blank'    Horizontal whitespace.

    'cntrl'    Control characters.

    'digit'    Digits.

    'graph'    Printable characters, not including space.

    'lower'    Lowercase letters.

    'print'    Printable characters, including space.

    'punct'    Punctuation characters.

    'space'    Horizontal or vertical whitespace.

    'upper'    Uppercase letters.

    'xdigit'   Hexadecimal digits.
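A short sketch of character classes in SET1, using invented sample text. The second command uses '[#*]' in SET2 so that it does not depend on how a short SET2 is padded.

```shell
# Delete every punctuation character.
no_punct=$(printf 'Hello, world! (test)\n' | tr -d '[:punct:]')
echo "$no_punct"

# Replace each digit with '#'.
pin=$(printf 'PIN 1234\n' | tr '[:digit:]' '[#*]')
echo "$pin"
```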

Equivalence classes

The syntax [=C=] expands to all of the characters that are equivalent to C, in no particular order. Equivalence classes are a relatively recent invention intended to support non-English alphabets. But there seems to be no standard way to define them or determine their contents. Therefore, they are not fully implemented in GNU 'tr'; each character's equivalence class consists only of that character, which is of no particular use.

Translating

'tr' performs translation when SET1 and SET2 are both given and the '--delete' ('-d') option is not given.
'tr' translates each character of its input that is in SET1 to the corresponding character in SET2.

Characters not in SET1 are passed through unchanged. When a character appears more than once in SET1 and the corresponding characters in SET2 are not all the same, only the final one is used.
For example, these two commands are equivalent:

     tr aaa xyz
     tr a z
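The equivalence can be checked on a sample word. The word and the variable names are arbitrary.

```shell
# 'a' appears three times in SET1, and only the last mapping (a -> z) takes effect.
first=$(printf 'banana\n' | tr aaa xyz)
second=$(printf 'banana\n' | tr a z)
echo "$first $second"
```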

A common use of 'tr' is to convert lowercase characters to uppercase. This can be done in many ways. Here are three of them:

     tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
     tr a-z A-Z
     tr '[:lower:]' '[:upper:]'

When 'tr' is performing translation, SET1 and SET2 typically have the same length. If SET1 is shorter than SET2, the extra characters at the end of SET2 are ignored.

On the other hand, making SET1 longer than SET2 is not portable; POSIX.2 says that the result is undefined. In this situation, BSD 'tr' pads SET2 to the length of SET1 by repeating the last character of SET2 as many times as necessary. System V 'tr' truncates SET1 to the length of SET2.

By default, GNU 'tr' handles this case like BSD 'tr'. When the '--truncate-set1' ('-t') option is given, GNU 'tr' handles this case like the System V 'tr' instead. This option is ignored for operations other than translation.

Acting like System V 'tr' in this case breaks the relatively common BSD idiom:

     tr -cs A-Za-z0-9 '\012'

because it converts only zero bytes (the first element in the complement of SET1), rather than all non-alphanumerics, to newlines.
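The difference between the default padding and '-t' can be seen with a SET1 that is longer than SET2. This sketch assumes GNU 'tr', since '-t' is not available in every implementation. The input is made up.

```shell
# Default (BSD-like): SET2 'x' is padded to 'xxx', so a, b, and c all become x.
padded=$(printf 'aabbcc\n' | tr 'abc' 'x')
echo "$padded"

# With -t (System V-like): SET1 is truncated to 'a', so only a is translated.
truncated=$(printf 'aabbcc\n' | tr -t 'abc' 'x')
echo "$truncated"
```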

Squeezing repeats and deleting

When given just the '--delete' ('-d') option, 'tr' removes any input characters that are in SET1.

When given just the '--squeeze-repeats' ('-s') option, 'tr' replaces each input sequence of a repeated character that is in SET1 with a single occurrence of that character.

When given both '--delete' and '--squeeze-repeats', 'tr' first performs any deletions using SET1, then squeezes repeats from any remaining characters using SET2.

The '--squeeze-repeats' option can also be used when translating, in which case 'tr' first performs translation, then squeezes repeats from any remaining characters using SET2.
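The three '-s' combinations described above can be compared on small, invented inputs. This is a sketch rather than an exhaustive test.

```shell
# -s alone: collapse each run of spaces (SET1) into one space.
squeezed=$(printf 'a   b    c\n' | tr -s ' ')
echo "$squeezed"

# -d with -s: delete digits (SET1), then squeeze repeated spaces (SET2).
cleaned=$(printf 'a1  2 b33   c\n' | tr -ds '0-9' ' ')
echo "$cleaned"

# Translate with -s: turn commas into newlines, then squeeze repeated newlines.
lines=$(printf 'one,,two,,,three\n' | tr -s ',' '\n')
echo "$lines"
```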

Here are some examples to illustrate various combinations of options:

Remove all zero bytes:

   tr -d '\000'

Put all words on lines by themselves. This converts all non-alphanumeric characters to newlines, then squeezes each string of repeated newlines into a single newline:

   tr -cs 'a-zA-Z0-9' '[\n*]'

Convert each sequence of repeated newlines to a single newline:

   tr -s '\n'

Find doubled occurrences of words in a document. For example, people often write "the the" with the duplicated words separated by a newline. The Bourne shell script below works first by converting each sequence of punctuation and blank characters to a single newline. That puts each "word" on a line by itself. Next it maps all uppercase characters to lowercase, and finally it runs 'uniq' with the '-d' option to print out only the words that were adjacent duplicates.

          #!/bin/sh
          cat "$@" \
            | tr -s '[:punct:][:blank:]' '\n' \
            | tr '[:upper:]' '[:lower:]' \
            | uniq -d
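To see the pipeline at work without a file, the same three stages can be fed a short sample from 'printf'. The sample text is invented, and the first stage relies on SET2 being padded to the length of SET1, as GNU and BSD 'tr' do.

```shell
# "the" is doubled across a line break; "It it" is doubled with different case.
dupes=$(printf 'This is the\nthe test. It it works.\n' \
  | tr -s '[:punct:][:blank:]' '\n' \
  | tr '[:upper:]' '[:lower:]' \
  | uniq -d)
echo "$dupes"
```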

Warning messages

Setting the environment variable 'POSIXLY_CORRECT' turns off the following warning and error messages, for strict compliance with POSIX.2. Otherwise, the following diagnostics are issued:

1. When the --delete option is given but --squeeze-repeats is not, and SET2 is given, GNU 'tr' by default prints a usage message and exits, because SET2 would not be used. The POSIX specification says that SET2 must be ignored in this case. Silently ignoring arguments is a bad idea.

2. When an ambiguous octal escape is given. For example, \400 is actually \40 followed by the digit 0, because the value 400 octal does not fit into a single byte.

GNU 'tr' does not provide complete BSD or System V compatibility. For example, it is impossible to disable interpretation of the POSIX constructs '[:alpha:]', '[=c=]', and '[c*10]'. Also, GNU 'tr' does not delete zero bytes automatically, unlike traditional Unix versions, which provide no way to preserve zero bytes.

Examples

Swap the case of a string:

$ echo "Hello World" | tr "A-Za-z" "a-zA-Z"
hELLO wORLD

Make an entire file uppercase:

$ cat file_of_lower_case_text | tr "[a-z]" "[A-Z]"
or
$ tr "[:lower:]" "[:upper:]" < file_of_lower_case_text

Make a string lower case:

$ echo "Hello World" | tr "[:upper:]" "[:lower:]"
hello world

As a function:

$ toLower() {
echo "$1" | tr "[:upper:]" "[:lower:]"
}

$ toLower SomeMixEDCaseText
somemixedcasetext

Split the path into its elements, one per line:

$ echo "$PATH" | tr ":" "\n" | sort

Swap braces with parentheses and vice versa:

$ echo "brackets demo(){}swap" | tr '{}()' '(){}'
brackets demo{}()swap

ROT13 a string (the 13th letter of the alphabet is 'm'):

$ echo 'Hello world' | tr 'A-Za-z' 'N-ZA-Mn-za-m'
Uryyb jbeyq

$ echo 'Uryyb jbeyq' | tr 'A-Za-z' 'N-ZA-Mn-za-m'
Hello world

If the string is all lower case then the ROT13 transform can be simplified:

$ echo 'hello world' | tr 'a-z' 'n-za-m'
uryyb jbeyq

Create an alias to perform ROT13:

$ alias rot13="tr '[A-Za-z]' '[N-ZA-Mn-za-m]'"

"Chance is always powerful. - Let your hook be always cast; in the pool where you least expect it, there will be a fish" ~ Ovid

Related Linux commands

gawk - Find and Replace text within file(s).
grep - Search file(s) for lines that match a given pattern.
Equivalent Windows command: FINDSTR - Search for strings in files.


 
Copyright © 1999-2024 SS64.com
Some rights reserved