Hive Partitioned Tables and Bucketed Tables: Operations in Detail

吾爱主题 2023-05-12 14:51:00

1. Creating a partitioned table

create table dept_partition(
    deptno int,
    dname  string,
    loc    int
)
-- partition column (the date)
partitioned by (dt string)
row format delimited fields terminated by '\t';

2. Insert, delete, update, and query operations

2.1 Inserting data

1) Loading local data

-- Creates a directory named dt=2022-06-14 and loads the data into it
load data local inpath '/opt/module/hive/datas/dept.txt'
into table dept_partition partition(dt='2022-06-14');

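A partitioned table works by first creating a directory and then writing the data into that directory.

In other words, a partitioned table splits one large table into several directories that are managed separately.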
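To see the directory-per-partition layout for yourself, something like the following sketch can be run from the Hive CLI or Beeline. It assumes the table lives in the default database under the default warehouse directory (`/user/hive/warehouse`); adjust the path if your installation differs.

```sql
-- List the partition directories under the table's location.
-- The path is an assumption: default database, default warehouse directory.
dfs -ls /user/hive/warehouse/dept_partition;

-- Show the metadata of one partition, including its Location on HDFS.
describe formatted dept_partition partition (dt='2022-06-14');
```

Each partition should appear as a subdirectory named `dt=<value>`, and the `Location` field reported by `describe formatted` should point at that subdirectory.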

2) Inserting data

`insert overwrite table dept_partition partition(dt='2022-06-17') select deptno, dname, loc from dept;`

`insert overwrite table dept_partition select deptno, dname, loc, '2022-06-18' from dept;`

The second form omits the partition clause and supplies the partition value as the last column of the select list.

2.2 Working with the data

1) Listing partitions

`show partitions dept_partition;`

2) Querying a specific partition

`select * from dept_partition where dt='2022-06-14';`

3) Adding/dropping partitions

`alter table dept_partition add partition(dt='2022-06-19');`

`alter table dept_partition drop partition(dt='2022-06-19');`
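Several partitions can also be added or dropped in a single statement. The sketch below uses hypothetical dates (`2022-06-20` and `2022-06-21`) that do not appear elsewhere in this article. Note that `add` separates the partition specs with spaces while `drop` separates them with commas.

```sql
-- Add two partitions at once; "if not exists" avoids an error when one already exists.
alter table dept_partition add if not exists
    partition (dt='2022-06-20')
    partition (dt='2022-06-21');

-- Drop both partitions at once; "if exists" avoids an error when one is missing.
alter table dept_partition drop if exists
    partition (dt='2022-06-20'),
    partition (dt='2022-06-21');
```

For a managed (non-external) table, dropping a partition also removes its data directory, so it is worth double-checking the partition values first.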

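PS: You can also create a partition directory by running a command directly in the Linux shell.

# Copy the partition for the 18th and name the copy as the partition for the 13th
hadoop fs -cp /user/hive/warehouse/dept_partition/dt=2022-06-18 \
    /user/hive/warehouse/dept_partition/dt=2022-06-13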

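PS: If you create the directory directly, for example in the HDFS web UI, the new partition will not show up in the terminal, because it has not been registered in the metastore. The same applies to a directory copied with `hadoop fs -cp`. The table must be repaired first:

`msck repair table dept_partition;`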
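As an alternative to repairing the whole table, a single directory that already exists on HDFS can be registered explicitly. This is a minimal sketch, assuming the `dt=2022-06-13` directory was created under the table's default location as in the copy command above.

```sql
-- Register only the copied directory as a partition.
-- With no "location" clause, Hive uses <table location>/dt=2022-06-13.
alter table dept_partition add if not exists partition (dt='2022-06-13');

-- Check that the partition is now known to the metastore and readable.
show partitions dept_partition;
select count(*) from dept_partition where dt='2022-06-13';
```

`msck repair` is the more convenient choice when many directories were created outside Hive; the explicit `add partition` keeps the change limited to one known partition.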

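3. Two-level partitioned tables

A two-level partitioned table is simply a large directory that contains smaller subdirectories.

3.1 Creating the partitioned table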

create table dept_partition2(
    deptno int,
    dname  string,
    loc    int
)
-- month is the parent directory, day is the subdirectory
partitioned by (month string, day string)
row format delimited fields terminated by '\t';

3.2 Inserting data

`load data local inpath '/opt/module/hive/datas/dept.txt' into table dept_partition2 partition(month='2022-06', day='15');`

`insert into dept_partition2 partition(month='2022-06', day='15') select deptno, dname, loc from dept;`
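Once data is loaded, the two partition levels can be inspected and filtered independently. The statements below are a sketch that reuses the partition values from the inserts above.

```sql
-- List every partition; each entry has the form month=.../day=...
show partitions dept_partition2;

-- List only the day-level partitions under one month.
show partitions dept_partition2 partition(month='2022-06');

-- Read a single day: only the month=2022-06/day=15 directory is needed.
select * from dept_partition2 where month='2022-06' and day='15';

-- Read a whole month: every day subdirectory under month=2022-06 is needed.
select * from dept_partition2 where month='2022-06';
```

Filtering on the parent column alone still limits the scan to one parent directory, which is the main benefit of ordering the partition columns from coarse to fine.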

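4. Dynamic partitioning

An ordinary (non-partitioned) table cannot be converted into a partitioned table directly. You have to create a new partitioned table first and then insert the old data into it.

1) Create the partitioned table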

create table emp_par(
    empno  int,
    ename  string,
    job    string,
    salary decimal(16,2)
)
partitioned by (deptno int)
row format delimited fields terminated by '\t';

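2) Then insert the data into the partitioned table

Method 1: insert one partition at a time

`insert into emp_par partition(deptno=10) select empno, ename, job, sal from emp where deptno=10;`

Then repeat for 11, 12, and so on.

Method 2: let dynamic partitioning do everything in one statement

`insert overwrite table emp_par select empno, ename, job, sal, deptno from emp;`

No partition needs to be specified here. Simply write deptno as the last column of the select list, and each row is routed to the partition that matches its deptno value.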
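Depending on the Hive version and configuration, a fully dynamic insert may be rejected unless dynamic partitioning is enabled in nonstrict mode. The following is a sketch under that assumption; the two `set` statements only affect the current session, and the explicit `partition(deptno)` clause is an equivalent, more self-documenting spelling of Method 2.

```sql
-- Allow dynamic partitioning in this session.
set hive.exec.dynamic.partition=true;
-- nonstrict mode allows all partition columns to be dynamic;
-- strict mode requires at least one static partition value.
set hive.exec.dynamic.partition.mode=nonstrict;

-- deptno is listed in the partition clause without a value,
-- so Hive takes it from the last column of the select list.
insert overwrite table emp_par partition(deptno)
select empno, ename, job, sal, deptno from emp;

-- One partition should now exist per distinct deptno in emp.
show partitions emp_par;
```

The dynamic partition column must come last in the select list, because Hive matches it by position rather than by name.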

5. Bucketed tables

Core clause:

`clustered by (a) sorted by (b) into 4 buckets`

This splits the data into 4 buckets according to a, and sorts the rows inside each bucket by b.

5.1 Creating a bucketed table

create table stu_buck(
    id   int,
    name string
)
-- rows are assigned to buckets by the hash of id modulo 4
clustered by (id) sorted by (id) into 4 buckets
row format delimited fields terminated by '\t';

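View the data:

`select * from stu_buck;`

You can see that the data is divided into four buckets.

PS: The point of bucketing is that when data is fetched, Hive can go straight to the bucket that holds it and scan only that bucket, which makes queries more efficient.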
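One place where this shows up directly is bucket sampling. The sketch below assumes `stu_buck` has already been populated (see the inserts in 5.2) and reads a single bucket instead of the whole table.

```sql
-- Read only the first of the 4 buckets; bucket numbers start at 1.
-- Sampling on the clustering column (id) lets Hive use the existing bucket files.
select * from stu_buck tablesample(bucket 1 out of 4 on id);

-- Compare the row count of one bucket with the row count of the full table.
select count(*) from stu_buck tablesample(bucket 1 out of 4 on id);
select count(*) from stu_buck;
```

Which ids end up in which bucket depends on the hash function of the Hive version in use, so the exact rows returned are not predictable from the ids alone.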

5.2 Inserting data

`load data inpath '/datas/student.txt' into table stu_buck;`

PS: Local mode (`load data local`) cannot be used here. The file must be loaded from an HDFS path.

`insert overwrite table stu_buck select id, name from stu_ex;`

5.3 Both partitioned and bucketed

create table stu_par_buck(
    id   int,
    name string
)
-- first the partition directory is created
partitioned by (dt string)
-- then the data inside it is split into buckets
clustered by (id) sorted by (id desc) into 4 buckets
row format delimited fields terminated by '\t';
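To see how the two mechanisms combine, data can be inserted into one partition and the resulting files listed. This is a sketch only: the partition value `2022-06-14` is chosen for illustration, `stu_ex` is the source table used in 5.2, and the path assumes the default database under the default warehouse directory.

```sql
-- Write the rows of stu_ex into a single partition;
-- inside that partition Hive distributes them across the 4 buckets by id.
insert overwrite table stu_par_buck partition(dt='2022-06-14')
select id, name from stu_ex;

-- List the files of that partition (path is an assumption, see above).
-- The partition directory is expected to hold the bucket files.
dfs -ls /user/hive/warehouse/stu_par_buck/dt=2022-06-14;

-- Partition filter and bucket sampling can be combined in one query.
select * from stu_par_buck tablesample(bucket 1 out of 4 on id)
where dt='2022-06-14';
```

Partitioning narrows the scan to one directory and bucketing narrows it further to one file within that directory.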