How to split and reassemble files in Linux

吾爱主题 2024-04-05 16:15:58

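You can use csplit and split to cut files into pieces, and cat to join the pieces back together. split and cat work on any file type: text, images, audio files, ISO images, and so on. csplit splits at line numbers and patterns, so it is mainly useful for text files.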

Splitting files with csplit

csplit splits a single file into several files. The examples use a small text file named 1:

[root@k8s-master-node1 test]# cat 1
1
2
3
4
5
6
[root@k8s-master-node1 test]#

The following command splits file 1 into three files. It splits just before line 2 and just before line 5:

[root@k8s-master-node1 test]# csplit 1 2 5
2
6
5
[root@k8s-master-node1 test]#

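csplit created three new files in the current directory and printed the size of each one in bytes. By default, the new files are named xxNN, meaning xx followed by a two-digit number: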

[root@k8s-master-node1 test]# ls
1 xx00 xx01 xx02
[root@k8s-master-node1 test]#

Look at the contents of each file:

[root@k8s-master-node1 test]# cat xx00
1
[root@k8s-master-node1 test]# cat xx01
2
3
4
[root@k8s-master-node1 test]# cat xx02
5
6

[root@k8s-master-node1 test]#
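To reproduce this example, you can use the minimal sketch below. It assumes GNU coreutils csplit and runs in a temporary directory. It creates a six-line file named 1 without the trailing blank line that the file above appears to have, so the last size reported here is 4 bytes rather than 5.

```shell
#!/usr/bin/env bash
# Sketch: reproduce "csplit 1 2 5" in a scratch directory (assumes GNU coreutils).
set -euo pipefail
cd "$(mktemp -d)"

# Input file named "1", as in the article: the numbers 1..6, one per line.
printf '%s\n' 1 2 3 4 5 6 > 1

# Split before line 2 and before line 5; csplit prints each piece's size in bytes.
csplit 1 2 5

# Show each piece: xx00 = line 1, xx01 = lines 2-4, xx02 = lines 5-6.
for f in xx00 xx01 xx02; do
  echo "== $f"
  cat "$f"
done
```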

What if you want to split a file into several files that each contain the same number of lines?

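Give a line number, then put a repeat count in curly braces. This example repeats the pattern 4 more times, so the split points are lines 1 to 5, and whatever is left goes into the last file. The first file is empty (size 0) because the first split point is line 1 and nothing comes before it: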

[root@k8s-master-node1 test]# csplit 1 1 {4}
0
2
2
2
2
5
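GNU csplit's -z (--elide-empty-files) option drops empty pieces, such as the 0-byte first file above. The sketch below assumes GNU coreutils and a seven-line input file. The prefixes a and z are arbitrary names that keep the two runs apart.

```shell
#!/usr/bin/env bash
# Sketch: "csplit 1 1 {4}" with and without -z (assumes GNU coreutils csplit).
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' 1 2 3 4 5 6 7 > 1     # seven-line input file named "1"

# Split points at lines 1..5: a00 is empty, a01..a04 hold one line each,
# and a05 gets the rest (lines 5-7).
csplit -f a 1 1 '{4}'

# Same split, but -z removes the empty first piece, leaving five z* files.
csplit -z -f z 1 1 '{4}'

ls
```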

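You can use an asterisk to tell csplit to repeat the split as many times as possible. Older versions of csplit do not support this. It sounds convenient, but with a line-number pattern csplit eventually runs past the end of the file and fails. Below, the 7-line file is split at every line. On repetition 7 the next split point, line 8, is out of range: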

[root@k8s-master-node1 test]# csplit 1 1 {*}
0
2
2
2
2
2
2
1
csplit: ‘1’: line number out of range on repetition 7
[root@k8s-master-node1 test]# cat 1 |wc -l
7
[root@k8s-master-node1 test]#

By default, when an error occurs, csplit deletes the output files it has already created.

The -k (--keep-files) option prevents this: csplit keeps the output files even when an error occurs.

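Another default behavior is that every run of csplit overwrites the files created by the previous run. To keep the results of different runs apart, use --prefix=PREFIX to give the files a different name prefix: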

[root@k8s-master-node1 test]# csplit -k --prefix=mine 1 1 {*}
0
2
2
2
2
2
2
1
csplit: ‘1’: line number out of range on repetition 7
[root@k8s-master-node1 test]# ls
1 mine00 mine01 mine02 mine03 mine04 mine05 mine06 mine07
[root@k8s-master-node1 test]#
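The sketch below shows both behaviors. It assumes GNU coreutils; the input name 1 and the prefix mine mirror the article. A failed run without -k leaves no xx files behind. The same run with -k keeps the pieces under the mine prefix.

```shell
#!/usr/bin/env bash
# Sketch: csplit cleanup on error vs. -k, plus --prefix (assumes GNU coreutils).
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' 1 2 3 4 5 6 7 > 1     # seven lines, like the article's file

# {*} keeps splitting at every line until the next split point (line 8) is past
# the end of the file; csplit then reports an error and removes its xx* files.
csplit 1 1 '{*}' || echo "csplit failed with exit status $?"
ls xx* 2>/dev/null || echo "no xx* files were kept"

# Same command with -k and a custom prefix: the pieces mine00..mine07 survive.
csplit -k --prefix=mine 1 1 '{*}' || echo "csplit -k failed with exit status $?"
ls mine*
```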

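-n sets the number of digits used to number the output files (the default is 2). With one-digit numbering, the files look like this: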

[root@k8s-master-node1 test]# ls
1 mine0 mine1 mine2 mine3 mine4 mine5 mine6 mine7
[root@k8s-master-node1 test]#
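The session above shows only the result of -n, not the command that produced it. One command that gives single-digit names like these is sketched below. It is an assumption, not necessarily the author's exact invocation. GNU coreutils is assumed, and -f is the short form of --prefix.

```shell
#!/usr/bin/env bash
# Sketch: one-digit file numbers with -n 1 (assumes GNU coreutils csplit).
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' 1 2 3 4 5 6 7 > 1

# Arguments: -n 1 (one digit), input file "1", line-number pattern 1, repeat {*}.
# Pieces are named mine0, mine1, ... instead of mine00, mine01, ...
# -k keeps them even though {*} ends with "line number out of range".
csplit -k -f mine -n 1 1 1 '{*}' || true
ls
```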

The "c" in csplit stands for "context". Besides line numbers, you can split a file wherever a line matches a pattern, including an elaborate regular expression.

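The following example splits the file into two parts. The first file ends with the line just before the first line containing "3", and the second file starts with that line.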

[root@k8s-master-node1 test]# csplit 1 /3/
4
9
[root@k8s-master-node1 test]# ls
1  xx00  xx01
[root@k8s-master-node1 test]# cat xx00
1
2
[root@k8s-master-node1 test]# cat xx01
3
4
5
6

[root@k8s-master-node1 test]#
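Here is a sketch of the same split, assuming GNU coreutils and a six-line file 1. It also shows an offset after the pattern: /3/+1 moves the split point one line down, so the matching line stays in the first piece. The prefixes a and b are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: splitting on a regular expression, with and without an offset (assumes GNU coreutils).
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' 1 2 3 4 5 6 > 1

# Split before the first line matching /3/: a00 = 1,2 and a01 = 3..6.
csplit -f a 1 /3/

# With an offset of +1 the split falls after the matching line:
# b00 = 1,2,3 and b01 = 4..6.
csplit -f b 1 /3/+1

head a0* b0*
```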

Split the file at every line that contains "3":

[root@k8s-master-node1 test]# cat 1
1
2
3
3
4
5
6
[root@k8s-master-node1 test]#
[root@k8s-master-node1 test]# csplit 1 /3/ {*}
4
2
9
[root@k8s-master-node1 test]# ls
1  xx00  xx01  xx02
[root@k8s-master-node1 test]# cat xx00
1
2
[root@k8s-master-node1 test]# cat xx01
3
[root@k8s-master-node1 test]# cat xx02
3
4
5
6

[root@k8s-master-node1 test]#

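The * inside {} can also be replaced with a number N. This repeats the preceding pattern N more times, and anything after the last split point goes into the final file.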
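The sketch below compares a fixed count with *. It assumes GNU coreutils; the input lines and the prefixes r and s are made up for illustration. In the file, "3" appears on three lines. With {1}, csplit splits at the first two matches and puts everything after that into the last file. With {*}, it splits at every match.

```shell
#!/usr/bin/env bash
# Sketch: fixed repeat count vs. {*} for a regex pattern (assumes GNU coreutils).
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' 1 2 3 a 3 b 3 c > 1   # "3" appears on lines 3, 5 and 7

# /3/ plus 1 repetition: r00 = 1,2   r01 = 3,a   r02 = 3,b,3,c
csplit -f r 1 /3/ '{1}'

# /3/ as often as possible: s00 = 1,2   s01 = 3,a   s02 = 3,b   s03 = 3,c
csplit -f s 1 /3/ '{*}'
```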

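To copy only from the first line containing "3" onward and discard everything before it, write the pattern between percent signs instead of slashes: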

[root@k8s-master-node1 test]# cat 1
1
2
1 3
4
5
6

[root@k8s-master-node1 test]# csplit 1 %3%
11
[root@k8s-master-node1 test]# ls
1  xx00
[root@k8s-master-node1 test]# cat xx00
1 3
4
5
6

[root@k8s-master-node1 test]#
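A skip pattern can be combined with ordinary split patterns. In the sketch below, which assumes GNU coreutils and the same input as above, %3% discards lines 1 and 2. /5/ then splits what remains just before the line containing "5".

```shell
#!/usr/bin/env bash
# Sketch: skip with %REGEXP%, then split with /REGEXP/ (assumes GNU coreutils).
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' 1 2 '1 3' 4 5 6 > 1

# %3% : drop everything before the first line containing "3" (no file is written)
# /5/ : split before the first line containing "5"
# Resulting pieces: xx00 = "1 3",4   xx01 = 5,6
csplit 1 %3% /5/
```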

Splitting files into pieces of a given size

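split is similar to csplit, but it cuts a file into pieces of a fixed size. This is useful for breaking a large file, such as a big multimedia file, into smaller pieces that are easier to store or to transfer over a network.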

By default, each piece contains 1000 lines:

# split 1.mv
# ls -hl
266K Aug 21 16:58 xaa
267K Aug 21 16:58 xab
315K Aug 21 16:58 xac
[...]
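Here is a quick way to check the 1000-line default, assuming GNU coreutils. The file name big.txt is hypothetical. seq writes 2500 numbered lines, so split should produce two full pieces and a shorter last one. The -l option sets a different line count.

```shell
#!/usr/bin/env bash
# Sketch: split's default of 1000 lines per piece (assumes GNU coreutils).
set -euo pipefail
cd "$(mktemp -d)"
seq 2500 > big.txt          # hypothetical input: 2500 lines

split big.txt               # xaa, xab: 1000 lines each; xac: 500 lines
wc -l xa?

split -l 800 big.txt part_  # explicit line count and prefix: part_aa .. part_ad
wc -l part_*
```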

The pieces are similar in size, but you can specify any size you want. This example uses 10M pieces (10 × 1024 × 1024 bytes):

# split -b 10M 1.mv

Size units can be K, M, G, T, P, E, Z, Y (powers of 1024) or KB, MB, GB, and so on (powers of 1000).
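The title also promises putting files back together. You can rejoin the pieces with cat, because the shell expands part_* in alphabetical order, which is also the order of split's suffixes. The sketch below assumes GNU coreutils. The names data.bin and joined.bin and the 2.5 MiB random test file are made up for illustration. cmp confirms that the copy is byte-for-byte identical.

```shell
#!/usr/bin/env bash
# Sketch: split a binary file by size, then rejoin it with cat (assumes GNU coreutils).
set -euo pipefail
cd "$(mktemp -d)"

# Hypothetical 2.5 MiB test file (2621440 bytes of random data).
head -c 2621440 /dev/urandom > data.bin

# 1M means 1024*1024 bytes: part_aa and part_ab are full, part_ac holds the rest.
split -b 1M data.bin part_
ls -l part_*

# Rejoin the pieces in suffix order and verify the result.
cat part_* > joined.bin
cmp data.bin joined.bin && echo "joined.bin is identical to data.bin"
```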

You can also customize the prefix and suffix of the file names:

# split -a 3 --numeric-suffixes=9 --additional-suffix=mine 1.mv SB
240K Aug 21 17:44 SB009mine
214K Aug 21 17:44 SB010mine
220K Aug 21 17:44 SB011mine

The -a option sets the length of the suffix, here three digits.

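--numeric-suffixes uses numbers instead of letters and sets the starting value (9 here). --additional-suffix appends a fixed string (mine) to every file name. The default prefix is x; to use a different one, give it after the input file name (SB in this example).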
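Here is a small, reproducible version of the command above, assuming GNU coreutils. The input file in.txt is hypothetical. It uses -l 2 on a six-line file instead of a large media file, so exactly three pieces are created. The plain -d option would give numeric suffixes starting at 0.

```shell
#!/usr/bin/env bash
# Sketch: numeric suffixes, start value and extra suffix (assumes GNU coreutils split).
set -euo pipefail
cd "$(mktemp -d)"
seq 6 > in.txt              # hypothetical six-line input

# -a 3: three-digit suffix; --numeric-suffixes=9: start counting at 9;
# --additional-suffix=mine: append "mine"; SB: prefix instead of the default x.
split -l 2 -a 3 --numeric-suffixes=9 --additional-suffix=mine in.txt SB
ls                          # should list SB009mine, SB010mine and SB011mine next to in.txt

# The pieces still sort in order, so cat can rebuild the original.
cat SB*mine > rebuilt.txt
cmp in.txt rebuilt.txt && echo "rebuilt.txt matches in.txt"
```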