11. Hive DML Data Operations

Leefs 2021-12-05 PM

[TOC]

### Preface

This article covers how to import and export data using Hive commands.

### I. Data Import

#### 1.1 Loading Data into a Table (Load)

##### **Syntax**

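```sql
load data [local] inpath 'datapath' [overwrite]
into table tablename [partition (partcol1=val1,…)]
```

| Parameter  | Description                                                  |
| ---------- | ------------------------------------------------------------ |
| load data  | Loads data                                                   |
| local      | Loads from the local filesystem into the Hive table; without `local`, data is loaded from HDFS |
| inpath     | Path of the data to load                                     |
| overwrite  | Overwrites the data already in the table; if omitted, the data is appended |
| into table | Specifies which table to load into                           |
| tablename  | Name of the target table                                     |
| partition  | Loads into the specified partition                           |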
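
The `partition` clause in the syntax above can be sketched as follows. This is a minimal illustration, not part of the original walkthrough: the table `student_part`, the partition column `class`, and the value `'a'` are hypothetical names, and the file `/home/datas/student.txt` is assumed to exist locally with tab-separated fields.

```sql
-- Hypothetical partitioned table; `class` is an assumed partition column.
create table student_part(id int, name string, age int)
partitioned by (class string)
row format delimited fields terminated by '\t';

-- Load a local file into a single partition of that table.
load data local inpath '/home/datas/student.txt'
into table student_part partition (class='a');

-- Check that the partition was registered.
show partitions student_part;
```

Without `overwrite`, running the same `load data` again appends a second copy of the file to that partition.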

##### **Hands-on Example**

**Data preparation**

```text
1001    zhanghua    12
1002    wangfang    13
1003    huixiaolin    12
1004    huojiaoyu    11
1005    huixiaofeng    12
```

**(1) Create the student table**

```sql
create table student(id int,name string,age int) row format delimited fields terminated by '\t';
```

**(2) Load a local file into Hive**

```sql
hive> load data local inpath '/home/datas/student.txt' into table student;
```

**(3) Load an HDFS file into Hive**

+ Upload the file to HDFS

```sql
  hive> dfs -put /home/datas/student.txt /user/hive/warehouse;
  ```

+ Load the HDFS data

```sql
  hive> load data inpath '/user/hive/warehouse/student.txt' into table student;
  ```

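Note: the source file is not kept at its original path. The load behaves like a move (cut) operation: only the file's location recorded in the HDFS metadata changes.

+ Load data and overwrite the data already in the table (because the previous load moved the file, upload it to HDFS again first)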

```sql
  load data inpath '/user/hive/warehouse/student.txt' overwrite into table student;
  ```

#### 1.2 Inserting Data into a Table with Queries (Insert)

**(1) Create two tables**

```sql
create table student_instert01(id int,name string,age int) row format delimited fields terminated by '\t';
create table student_instert02(id int,name string,age int) row format delimited fields terminated by '\t';
```

**(2) Basic insert**

```sql
hive> insert into table student_instert01 values(001,'wangwu',20),(002,'lufei',12);
```

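`values` can be followed by multiple rows, separated by commas.

**(3) Basic mode insert (from a single-table query result)**

+ Insert the result of a query on the student table into student_instert01. `insert overwrite` replaces the rows already in the target table, whereas `insert into` appends to them.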

```sql
hive> insert overwrite table student_instert01 select id,name,age from student;
```

**(4) Multi-insert mode (one query on the source table feeding multiple target tables)**

```sql
from student
insert into table student_instert01
select id,name,age
insert into table student_instert02
select id,name,age;
```
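
Multi-insert is most useful when each target table receives a different slice of the source. The following is a sketch under that assumption: the age thresholds are made up for illustration, while the table names are the ones created above.

```sql
-- One pass over student, with a different (assumed) filter per target table.
from student
insert overwrite table student_instert01
select id, name, age where age >= 12
insert overwrite table student_instert02
select id, name, age where age < 12;
```

Because the `from` clause is written once, Hive reads `student` a single time rather than once per `insert`.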

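#### 1.3 Creating a Table and Loading Data in a Query (`As Select`)

Create a table from a query result (the result rows are written into the newly created table).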

```hive
hive> create table if not exists student4 as select id, name,age from student;
```

This approach can only create managed tables, not external tables.
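
One way to confirm the table type, and a possible workaround when an external table is needed, is sketched below. The table `student4_ext` and its location `/student4_ext` are hypothetical names introduced only for this example.

```sql
-- Inspect the CTAS result; look for the "Table Type" row in the output.
describe formatted student4;

-- Assumed workaround: create the external table first, then fill it from a query.
create external table if not exists student4_ext(id int, name string, age int)
row format delimited fields terminated by '\t'
location '/student4_ext';

insert overwrite table student4_ext
select id, name, age from student;
```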

#### 1.4 Specifying the Data Path with `Location` When Creating a Table

**(1) Upload data to `hdfs`**

```sql
-- Create the directory /student on HDFS
hive> dfs -mkdir /student;
-- Upload the local file from the server into the /student directory on HDFS
hive> dfs -put /home/datas/student.txt /student;
```

![11.Hive DML数据操作01.jpg](https://lilinchao.com/usr/uploads/2021/12/2951324653.jpg)

**(2) Create the table and specify its location on `hdfs`**

```sql
create external table if not exists student5(id int,name string,age int)
row format delimited fields terminated by '\t'
location '/student';
```

**(3) Query the data**

```sql
hive> select * from student5;
```

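#### 1.5 `Import` Data into a Specified Hive Table

Note: export the data with export first (the exported directory also contains the metadata), then load it with import.

**(1) Export the data of the student table to the /student2 directory with the export command**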

```sql
export table student to '/student2';
```

![11.Hive DML数据操作02.jpg](https://lilinchao.com/usr/uploads/2021/12/715580392.jpg)

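Note: the directory is created automatically if it does not exist, and the exported output contains an additional `_metadata` file holding the table metadata.

**(2) Create the table student6**

```sql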
create table student6(id int,name string,age int) row format delimited fields terminated by '\t';
```

**(3) Import data into the table with import**

```sql
-- Use import to load the data under the /student2 directory into the student6 table
import table student6 from '/student2';
```
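
Because the export directory carries `_metadata`, `import` can also create the target table itself, so step (2) is optional. A minimal sketch, where `student7` is a hypothetical table name that is assumed not to exist yet:

```sql
-- student7 is not created beforehand; its definition comes from /student2/_metadata.
import table student7 from '/student2';

-- Verify the imported rows.
select * from student7;
```

When importing into a table that already exists, as with student6 above, its definition needs to be compatible with the exported one and the table should be empty.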

### II. Data Export

#### 2.1 Insert Export

**(1) Export query results to the local filesystem**

```sql
hive> insert overwrite local directory '/home/hadoop/datas' select * from student;
```

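If the error below is reported, the user running Hive does not have sufficient permissions on the target directory; exporting to a directory created by that user resolves it.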

![11.Hive DML数据操作03.jpg](https://lilinchao.com/usr/uploads/2021/12/1097761338.jpg)

+ **View the exported result**

![11.Hive DML数据操作04.jpg](https://lilinchao.com/usr/uploads/2021/12/1318737529.jpg)

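The fields are separated by Hive's default delimiter (Ctrl-A, `\001`), which is clearly not what we want.

**(2) Export formatted query results to the local filesystem**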

```sql
-- Separate the exported fields with \t
hive> insert overwrite local directory '/home/hadoop/datas'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from student;
```

+ **View the exported result**

![11.Hive DML数据操作05.jpg](https://lilinchao.com/usr/uploads/2021/12/4230745641.jpg)

**(3) Export query results to HDFS (without local)**

```sql
-- Export the student table's data, separated by \t, to the /student3 directory on HDFS
hive> insert overwrite directory '/student3'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from student;
```

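*Note: although the source and the destination are both on HDFS, this is not a copy operation; the query is executed and its result is written to the target directory.*

#### 2.2 Exporting to the Local Filesystem with Hadoop Commands

```sql
-- Copy the student table's data file to the local file /home/hadoop/datas/student.txt on the server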
hive> dfs -get /user/hive/warehouse/student/student.txt /home/hadoop/datas/student.txt;
```

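#### 2.3 Exporting with Hive Shell Commands

**Basic syntax**

> `hive -e 'statement' > file` or `hive -f script > file` (`file` is an output file of your choosing)

Note: use `>>` to append to the file instead of overwriting it.

```shell
[hadoop@hadoop001 hive]$ bin/hive -e 'select * from default.student;' > /home/hadoop/datas/student4.txt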
```
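
The `-f` variant reads its statements from a file instead of the command line. A sketch of such a script follows; the script path and the output file name are hypothetical.

```sql
-- Contents of a hypothetical script file: /home/hadoop/datas/export_student.sql
-- Run it from the shell (not from the hive> prompt) with:
--   bin/hive -f /home/hadoop/datas/export_student.sql > /home/hadoop/datas/student5.txt
select id, name, age from default.student;
```

Keeping the query in a script avoids shell quoting problems once the statement grows beyond a one-liner.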

#### 2.4 Exporting to HDFS with Export

```sql
hive> export table default.student to '/student4';
```

export and import are mainly used to migrate Hive tables between two `Hadoop` clusters.
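
A cluster-to-cluster migration could be sketched roughly as below. The copy step between clusters happens outside Hive and depends on the deployment, so it appears only as a comment; `student_migrated` is a hypothetical table name.

```sql
-- On the source cluster: write the data plus _metadata to an HDFS directory.
export table default.student to '/student4';

-- Copy /student4 to the target cluster's HDFS outside of Hive, for example
-- with `hadoop distcp` (cluster addresses are deployment-specific and omitted).

-- On the target cluster: recreate the table from the copied directory.
import table student_migrated from '/student4';
```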

### III. Clearing Table Data (Truncate)

```sql
hive> truncate table student;
```

*Note: Truncate can only delete the data of managed tables; it cannot delete data in external tables.*
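
A short sketch of how this plays out with the tables from this article, assuming `student` is a managed table and `student5` is the external table created in section 1.4:

```sql
-- Managed table: the rows are removed, the table definition stays.
truncate table student;
select count(*) from student;   -- returns 0 after the truncate

-- External table: per the note above, this statement is expected to be rejected,
-- so it is left commented out.
-- truncate table student5;

-- Check which kind of table you have via the "Table Type" row.
describe formatted student5;
```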

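Tags: Hadoop, Hive

Unless otherwise stated, all articles on this blog are the author's original work.

If you repost this article, please cite the source: https://lilinchao.com/archives/1717.html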
