Cut Command Examples

Source: Internet | Editor: 程序博客网 | Date: 2024/06/06 10:56

                                               

About cut

The cut command extracts selected bytes, characters, or fields from each line of a file. It can be used to display only specific columns of a text file or of another command's output.

Syntax

[skypeGNU@localhost ~]$ cut --help                                                

Usage: cut OPTION... [FILE]...
Print selected parts of lines from each FILE to standard output.

Mandatory arguments to long options are mandatory for short options too.
  -b, --bytes=LIST        select only these bytes
  -c, --characters=LIST   select only these characters
  -d, --delimiter=DELIM   use DELIM instead of TAB for field delimiter
  -f, --fields=LIST       select only these fields;  also print any line
                            that contains no delimiter character, unless
                            the -s option is specified
  -n                      with -b: don't split multibyte characters
      --complement        complement the set of selected bytes, characters
                            or fields
  -s, --only-delimited    do not print lines not containing delimiters
      --output-delimiter=STRING  use STRING as the output delimiter
                            the default is to use the input delimiter

Use one, and only one of -b, -c or -f.  Each LIST is made up of one
range, or many ranges separated by commas.  Selected input is written
in the same order that it is read, and is written exactly once.
Each range is one of:

  N     N'th byte, character or field, counted from 1
  N-    from N'th byte, character or field, to end of line
  N-M   from N'th to M'th (included) byte, character or field
  -M    from first to M'th (included) byte, character or field

With no FILE, or when FILE is -, read standard input.

-b list    The list following -b specifies byte positions (for instance, -b1-72 would pass the first 72 bytes of each line). When -b and -n are used together, list is adjusted so that no multi-byte character is split. If -b is used, the input line should contain 1023 bytes or less.

 

-c list    The list following -c specifies character positions (for instance, -c1-72 would pass the first 72 characters of each line). 


-f list    The list following -f is a list of fields assumed to be separated in the file by a delimiter character (see -d ); for instance, -f1,7 copies the first and seventh field only. Lines with no field delimiters will be passed through intact (useful for table subheadings), unless -s is specified. If -f is used, the input line should contain 1023 characters or less. 


list       A comma-separated or blank-character-separated list of integer field numbers (in increasing order), with optional - to indicate ranges (for instance, 1,4,7; 1-3,8; -5,10 (short for 1-5,10); or 3- (short for third through last field)).


-n         Do not split characters. When -b list and -n are used together, list is adjusted so that no multi-byte character is split.

-d delim   The character following -d is the field delimiter (-f option only). Default is tab. Space or other characters with special meaning to the shell must be quoted. delim can be a multi-byte character.


-s         Suppresses lines with no delimiter characters in case of -f option. Unless specified, lines with no delimiters will be passed through untouched.

file       A path name of an input file. If no file operands are specified, or if a file operand is -, the standard input will be used.

Examples

The following are some examples.

For most of the examples, we'll be using the following test file.

[skypeGNU@localhost ~]$ cat test.txt

cat command for file oriented operations.
cp command for copy files or directories.
ls command to list out files and directories with its attributes.

1. Select Column of Characters

To extract only a desired column from a file, use the -c option.

The following example displays the 2nd character of each line of the file test.txt.

[skypeGNU@localhost ~]$ cut -c2 test.txt

a
p
s

As seen above, the characters a, p, s are the second character from each line of the test.txt file.

2. Select Column of Characters using Range

A range of characters can also be extracted from a file by specifying the start and end positions separated by -. The following example extracts the first 3 characters and the 5th character of each line of test.txt. (The command uses -b, which selects bytes; in this ASCII-only file, bytes and characters coincide.)

[skypeGNU@localhost ~]$ cut -b1-3,5 test.txt

catc
cp o
ls o

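Note: when the -b, -c, or -f option is used, cut first sorts all the listed ranges in ascending order and only then extracts them. So you cannot rearrange characters (or bytes or fields) by listing the ranges in a different order.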

[skypeGNU@localhost ~]$ cut -b5,1-3 test.txt

catc
cp o
ls o
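The ordering rule can be checked on a one-line input. The sketch below uses a made-up string (not the test file above), and the awk call is only one possible workaround when a different output order is really needed; it is not part of cut.

```shell
# Hypothetical input line, used only for illustration.
line='abcde'

# cut ignores the order in which positions are listed.
cut_out=$(printf '%s\n' "$line" | cut -c5,1-3)

# One possible workaround: build the order explicitly with awk's substr().
awk_out=$(printf '%s\n' "$line" | awk '{ print substr($0, 5, 1) substr($0, 1, 3) }')

echo "$cut_out"   # abce
echo "$awk_out"   # eabc
```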

3. Select Column of Characters using either Start or End Position

Either the start position or the end position alone can be passed to cut with the -c option. The following specifies only the start position, before the '-'. This example extracts from the 4th character to the end of each line of test.txt.

[skypeGNU@localhost ~]$ cut -c4- test.txt 

 command for file oriented operations.
command for copy files or directories.
command to list out files and directories with its attributes.

The following specifies only the end position, after the '-'. This example extracts the first 5 characters of each line of test.txt.

[skypeGNU@localhost ~]$ cut -c-5 test.txt

cat c
cp co
ls co
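As a compact recap of the range forms used in sections 1 to 3, here is a small sketch on a made-up eight-letter string (an assumption for illustration, not the test file above), where the positions are easy to count by eye.

```shell
# Hypothetical input: position 1 is 'a', position 8 is 'h'.
line='abcdefgh'

from_third=$(printf '%s\n' "$line" | cut -c3-)     # N-  : 3rd character to end of line
first_three=$(printf '%s\n' "$line" | cut -c-3)    # -M  : 1st through 3rd character
mixed=$(printf '%s\n' "$line" | cut -c2-4,7)       # N-M combined with a single position

echo "$from_third"    # cdefgh
echo "$first_three"   # abc
echo "$mixed"         # bcdg
```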

4. Select a Specific Field from a File

Instead of selecting a number of characters, you can extract a whole field by combining the options -f and -d. The option -f specifies which field to extract, and the option -d specifies the field delimiter used in the input file. The following example displays only the first field of each line of the /etc/passwd file, using : (colon) as the field delimiter. In this case, the 1st field is the username.

[skypeGNU@localhost ~]$ head -5 /etc/passwd | cut -d':' -f1

root
bin
daemon
adm
lp

5. Select Multiple Fields from a File

You can also extract more than one field from a file or from stdin. The example below displays the username and home directory of the users whose login shell is "/bin/bash".

[skypeGNU@localhost ~]$ grep '/bin/bash' /etc/passwd | cut -d':' -f1,6

root:/root
skypeGNU:/home/skypeGNU

To display a range of fields, specify the start field and the end field as shown below. In this example, we select fields 1 through 4, 6, and 7.

[skypeGNU@localhost ~]$ grep '/bin/bash' /etc/passwd | cut -d':' -f1-4,6,7

root:x:0:0:/root:/bin/bash
skypeGNU:x:500:500:/home/skypeGNU:/bin/bash

6. Select Fields Only When a Line Contains the Delimiter

In our /etc/passwd example, if you pass a delimiter other than : (colon), cut just displays the whole line. In the following example, we've specified | (pipe) as the delimiter. No line contains a | (pipe), so cut prints every line in full.

[skypeGNU@localhost ~]$ head -3 /etc/passwd | cut -d'|' -f1

root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin

However, the -s option makes cut display only the lines that contain the specified delimiter. The following example produces no output, because cut did not find any line with | (pipe) as a delimiter in the /etc/passwd file.

[skypeGNU@localhost ~]$ head -3 /etc/passwd | cut -d'|' -s -f1

(no output)
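The effect of -s is easier to see when only some lines lack the delimiter. The sketch below uses made-up input (a heading line followed by two colon-separated records) rather than /etc/passwd; make_sample is a hypothetical helper defined only for this example.

```shell
# Hypothetical input: the first line has no ':' delimiter, the others do.
make_sample() { printf 'Users\nroot:x:0\nbin:x:1\n'; }

# Without -s, the delimiter-free line is passed through intact.
without_s=$(make_sample | cut -d: -f1)

# With -s, lines that contain no delimiter are suppressed.
with_s=$(make_sample | cut -d: -s -f1)

printf '%s\n' "$without_s"   # Users, root, bin (one per line)
printf '%s\n' "$with_s"      # root, bin (one per line)
```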

7. Select All Fields Except the Specified Fields

To complement the selected field list, use the option --complement. The following example displays all the fields of the /etc/passwd file except field 7.

[skypeGNU@localhost ~]$ head -3 /etc/passwd | cut -d':' -f7

/bin/bash
/sbin/nologin
/sbin/nologin

[skypeGNU@localhost ~]$ head -3 /etc/passwd | cut -d':' --complement -f7

root:x:0:0:root:/root
bin:x:1:1:bin:/bin
daemon:x:2:2:daemon:/sbin
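The --complement option is a GNU extension, so it may be missing from other cut implementations. As a hedged sketch on a single made-up passwd-style record, the same result can be obtained by listing the wanted fields explicitly, which should work with any POSIX cut.

```shell
# Hypothetical record in /etc/passwd format, used only for illustration.
record='root:x:0:0:root:/root:/bin/bash'

# GNU cut: everything except field 7.
gnu_way=$(printf '%s\n' "$record" | cut -d: --complement -f7)

# Equivalent without --complement: name the fields to keep.
portable_way=$(printf '%s\n' "$record" | cut -d: -f1-6)

echo "$gnu_way"        # root:x:0:0:root:/root
echo "$portable_way"   # root:x:0:0:root:/root
```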

8. Change Output Delimiter for Display

By default, the output delimiter is the same as the input delimiter specified with the cut -d option. To change the output delimiter, use the option --output-delimiter as shown below. In this example, the input delimiter is : (colon), but the output delimiter is # (hash).

[skypeGNU@localhost ~]$ head -3 /etc/passwd | cut -d':' -f1,6,7 --output-delimiter='#' 

root#/root#/bin/bash
bin#/bin#/sbin/nologin
daemon#/sbin#/sbin/nologin

9. Change Output Delimiter to Newline

In this example, every field of the cut output is displayed on a separate line. We still use --output-delimiter, but the value is $'\n', which makes a newline the output delimiter.

[skypeGNU@localhost ~]$ grep '^root:' /etc/passwd | cut -d':' -f1,6

root:/root 

[skypeGNU@localhost ~]$ grep '^root:' /etc/passwd | cut -d':' -f1,6 --output-delimiter=$'\n'

root
/root

10. Combine Cut with Other Unix Command Output

The power of the cut command shows when you combine it with the stdout of other Unix commands. Once you have mastered the basic usage explained above, you can use cut to solve many of your text manipulation tasks.

(1) Display the Unix login names of the users on the system (head -3 limits the output to the first three).

[skypeGNU@localhost ~]$ cut -d':' -f1 /etc/passwd | head -3

root
bin
daemon


(2) Display the total amount of memory on the system (the second column of the Mem line printed by free).

[skypeGNU@localhost ~]$ free | tr -s ' ' | sed '/^Mem/!d' | cut -d' ' -f2

1021060 


A discussion of characters (-c) versus bytes (-b):

[skypeGNU@localhost ~]$ cat test_cn.txt

复旦大学
上海交通大学
南京大学
中国人民大学
香港科技大学

[skypeGNU@localhost ~]$ cut -c2 test_cn.txt

旦
海
京
国
港


[skypeGNU@localhost ~]$ cut -b2 test_cn.txt

�
�
�
�
�

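See what happened above? With -c, cut counts in characters and the output is correct; -b blindly counts in bytes (8 bits each), so the output is garbage. (Whether -c really counts characters depends on the cut implementation and the locale; some builds treat -c exactly like -b.) Since this topic has come up, here is a little more background.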


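In a computer, all data is stored and processed in binary (because the hardware represents 1 and 0 with high and low voltage levels). The 52 letters a, b, c, d and so on (uppercase included), the digits 0, 1 and so on, and common symbols such as *, # and @ must all be stored as binary numbers. Which binary number stands for which symbol is something anyone could define for themselves (this is called an encoding), but to communicate without confusion everyone has to use the same encoding rules. A US standards body therefore issued the ASCII encoding, which fixes the binary numbers used for these common symbols.

ASCII uses 7-bit or 8-bit binary combinations to represent 128 or 256 possible characters. Standard ASCII, also called basic ASCII, uses 7 bits to represent all uppercase and lowercase letters, the digits 0 to 9, punctuation marks, and the special control characters used in American English. So for English text, one character per byte (8 bits) works without any problem. Chinese, however, has a huge number of characters with complex shapes, and a single byte cannot represent them, so several bytes must be used per character. Common Chinese character sets include: GB2312-80 (the national standard character set), Big-5 (大五码), and GBK (the extended national standard character set).

There is also ISO/IEC 10646 / Unicode, in which a character is represented with 16 bits. (This holds for the 2-byte UCS-2 form; other Unicode encodings such as UTF-8 use a variable number of bytes per character.)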

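How strings are stored in memory:

In the ASCII era, a single-byte character string uses one byte per character (SBCS). For example, "Bob123" is stored in memory as:

42  6F  62  31  32  33  00
B   o   b   1   2   3   \0

In the era when ANSI encodings were used to support multiple languages, each character is represented by one or more bytes (MBCS), so characters stored this way are also called multi-byte characters. For example, "中文123" occupies 7 bytes in memory on Chinese Windows 95 (not counting the terminating \0): 2 bytes for each Chinese character and 1 byte for each English letter or digit.

D6 D0   CE C4   31   32   33   00
中      文      1    2    3    \0

After UNICODE was adopted, computers store a string as the sequence of each character's code value in the UNICODE character set. Computers currently generally use 2 bytes (16 bits) to store one code value (DBCS), so characters stored this way are also called wide characters. For example, on Windows 2000 the string "中文123" is actually stored in memory as 5 code values:

2D 4E   87 65   31 00   32 00   33 00   00 00
中      文      1       2       3       \0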
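The byte layouts above can be inspected from the shell. The sketch below rests on an assumption the text does not state: that the script itself is saved as UTF-8, where each of these Chinese characters occupies 3 bytes (rather than the 2 bytes of the GBK and 16-bit layouts shown above). It uses od to dump bytes and wc -c to count them.

```shell
# ASCII text: one byte per character, matching the "Bob123" table above.
ascii_hex=$(printf 'Bob123' | od -An -tx1 | tr -d ' \n')
echo "$ascii_hex"     # 426f62313233

# Assumption: UTF-8 source, so the two characters below take 3 bytes each.
cn_bytes=$(printf '复旦' | wc -c | tr -d ' ')
echo "$cn_bytes"      # 6

# Selecting exactly the 3 bytes of the first character keeps it intact.
first_char=$(printf '复旦大学\n' | cut -b1-3)
echo "$first_char"    # 复
```

This also explains the garbled output of cut -b2 on such a file: a single byte from the middle of a multi-byte character is not a valid character on its own.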

When dealing with multi-byte characters, you can use the -n option, which tells cut not to split multi-byte characters.

For example:

[skypeGNU@localhost ~]$ cut -b2 test_cn.txt

�
�
�
�
�

[skypeGNU@localhost ~]$ cut -b2 -n test_cn.txt      

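Nothing is printed here. Byte 2 falls inside the first character of each line, and with -n a character is printed only if its last byte lies within the selected range, so nothing is selected.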


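What are cut's shortcomings?

You guessed it: handling runs of multiple spaces.

If some fields in a file are separated by several spaces, cut becomes awkward to use, because it only handles well text whose fields are separated by exactly one character.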
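A minimal sketch of the problem and two common workarounds, using a made-up row whose fields are separated by runs of spaces. Squeezing the spaces with tr -s (as in the free example earlier) and switching to awk are both options; neither is presented here as the one correct fix.

```shell
# Hypothetical row: fields are separated by several spaces.
row='alice    30   NY'

# Each single space is a delimiter, so field 2 is the empty string between two spaces.
naive=$(printf '%s\n' "$row" | cut -d' ' -f2)

# Workaround 1: squeeze repeated spaces into one before cutting.
squeezed=$(printf '%s\n' "$row" | tr -s ' ' | cut -d' ' -f2)

# Workaround 2: awk splits on runs of whitespace by default.
with_awk=$(printf '%s\n' "$row" | awk '{ print $2 }')

echo "[$naive]"      # []
echo "[$squeezed]"   # [30]
echo "[$with_awk]"   # [30]
```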
