
A Hands-On Pandas Tutorial Even Beginners Can Follow (Part 1)

GG网络技术分享 2025-03-18 16:09 6

Source: AI派; author: AAA奔雷手

Today we will work through Pandas hands-on. Because of its length, the tutorial is split into two parts; this is Part 1.

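1 Introduction to the data structures

pandas has two especially important data structures: the Series and the DataFrame. A Series is similar to a one-dimensional NumPy array. It works with the functions and methods available for one-dimensional arrays, it also lets you retrieve data by index label, and it aligns automatically on its index. A DataFrame is similar to a two-dimensional NumPy array. It likewise works with NumPy array functions and methods, and it adds further flexibility, such as named columns that can each hold a different data type.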
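To make the comparison with NumPy arrays concrete, here is a minimal sketch. The variable names and values are made up for illustration and are not part of the original tutorial.

```python
import numpy as np
import pandas as pd

# A Series behaves like a 1-D array but also carries index labels.
s = pd.Series([1.0, 4.0, 9.0], index=["x", "y", "z"])
roots = np.sqrt(s)   # a NumPy function applied to a Series returns a Series
total = s.sum()      # array-style reduction
by_label = s["y"]    # lookup by index label

# A DataFrame is a 2-D table whose columns may have different dtypes.
df = pd.DataFrame({"n": [1, 2, 3], "name": ["a", "b", "c"]})
print(roots)
print(df.dtypes)
```

The index labels survive the NumPy call, which is what later makes automatic alignment possible.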

1.1 Three ways to create a Series

Creating a Series from a one-dimensional array

import pandas as pd
import numpy as np
arr1 = np.arange(10)
print("Array arr1:", arr1)
print("Type of arr1:", type(arr1))
s1 = pd.Series(arr1)
print("Series s1:\n", s1)
print("Type of s1:", type(s1))

Array arr1: [0 1 2 3 4 5 6 7 8 9]
Type of arr1: <class 'numpy.ndarray'>
Series s1:
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int32
Type of s1: <class 'pandas.core.series.Series'>

Creating a Series from a dictionary

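dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
print("Dictionary dict1:", dict1)
print("Type of dict1:", type(dict1))
s2 = pd.Series(dict1)  # the dictionary keys become the index labels
print("Series s2:", s2)
print("Type of s2:", type(s2))

Dictionary dict1: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
Type of dict1: <class 'dict'>
Series s2: a    1
b    2
c    3
d    4
e    5
dtype: int64
Type of s2: <class 'pandas.core.series.Series'>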

Creating a Series from an existing DataFrame

Because this relies on the DataFrame concept, it is covered at the end of section 1.2, once DataFrame has been introduced.

1.2 Three ways to create a DataFrame

Creating a DataFrame from a two-dimensional array

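print("Method 1 for creating a DataFrame")
arr2 = np.arange(12).reshape(4, 3)
print("Array 2:", arr2)
print("Type of array 2:", type(arr2))
df1 = pd.DataFrame(arr2)
print("DataFrame 1:\n", df1)
print("Type of DataFrame 1:", type(df1))

Method 1 for creating a DataFrame
Array 2: [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
Type of array 2: <class 'numpy.ndarray'>
DataFrame 1:
   0   1   2
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
Type of DataFrame 1: <class 'pandas.core.frame.DataFrame'>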

Creating a DataFrame from a dictionary of lists

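print("Method 2 for creating a DataFrame")
dict2 = {'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8], 'c': [9, 10, 11, 12], 'd': [13, 14, 15, 16]}
print("Dictionary 2 (dictionary of lists):", dict2)
print("Type of dictionary 2:", type(dict2))
df2 = pd.DataFrame(dict2)  # each key becomes a column name
print("DataFrame 2:\n", df2)
print("Type of DataFrame 2:", type(df2))

Method 2 for creating a DataFrame
Dictionary 2 (dictionary of lists): {'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8], 'c': [9, 10, 11, 12], 'd': [13, 14, 15, 16]}
Type of dictionary 2: <class 'dict'>
DataFrame 2:
   a  b   c   d
0  1  5   9  13
1  2  6  10  14
2  3  7  11  15
3  4  8  12  16
Type of DataFrame 2: <class 'pandas.core.frame.DataFrame'>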

Creating a DataFrame from a nested dictionary

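dict3 = {'one': {'a': 1, 'b': 2, 'c': 3, 'd': 4},
         'two': {'a': 5, 'b': 6, 'c': 7, 'd': 8},
         'three': {'a': 9, 'b': 10, 'c': 11, 'd': 12}}
print("Dictionary 3 (nested dictionary):", dict3)
print("Type of dictionary 3:", type(dict3))
df3 = pd.DataFrame(dict3)  # outer keys become columns, inner keys become the row index
print("DataFrame 3:\n", df3)
print("Type of DataFrame 3:", type(df3))

Dictionary 3 (nested dictionary): {'one': {'a': 1, 'b': 2, 'c': 3, 'd': 4}, 'two': {'a': 5, 'b': 6, 'c': 7, 'd': 8}, 'three': {'a': 9, 'b': 10, 'c': 11, 'd': 12}}
Type of dictionary 3: <class 'dict'>
DataFrame 3:
   one  three  two
a    1      9    5
b    2     10    6
c    3     11    7
d    4     12    8
Type of DataFrame 3: <class 'pandas.core.frame.DataFrame'>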

Now that we have a DataFrame, here is how to create a Series from one.

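s3 = df3['one']  # take the first column of DataFrame 3 directly
print("Series 3:\n", s3)
print("Type of series 3:", type(s3))
print("------------------------------------------------")
s4 = df3.iloc[0]  # take the first row of DataFrame 3 by position with iloc (same as df3.loc['a'])
print("Series 4:\n", s4)
print("Type of series 4:", type(s4))

Series 3:
a    1
b    2
c    3
d    4
Name: one, dtype: int64
Type of series 3: <class 'pandas.core.series.Series'>
------------------------------------------------
Series 4:
one      1
three    9
two      5
Name: a, dtype: int64
Type of series 4: <class 'pandas.core.series.Series'>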
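As a compact recap of pulling a Series out of a DataFrame, the sketch below uses a small made-up frame (not the tutorial's df3) to show that a column and a row both come back as a Series, and that the name of the Series records where it came from.

```python
import pandas as pd

# Hypothetical two-column frame built from a nested dictionary.
df = pd.DataFrame({"one": {"a": 1, "b": 2}, "two": {"a": 5, "b": 6}})

col = df["one"]             # a column: indexed by the row labels
row_by_pos = df.iloc[0]     # first row, selected by position
row_by_label = df.loc["a"]  # the same row, selected by label

print(col.name, list(col.index))
print(row_by_pos.name, list(row_by_pos.index))
```

A column Series is indexed by the row labels, while a row Series is indexed by the column names.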

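2 The data index

In both a DataFrame and a Series, the leftmost column of the printed output is not part of the original data. It is the index, which this section introduces. The index is used to locate the target data and then carry out operations on it.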

2.1 Getting data by index position or index label

s5 = pd.Series(np.array([1, 2, 3, 4, 5, 6]))
print(s5)  # if no index is specified, the Series gets an automatic integer index starting at 0

0    1
1    2
2    3
3    4
4    5
5    6
dtype: int32

Getting the index of a Series through the index attribute

s5.index

RangeIndex(start=0, stop=6, step=1)

Assigning a new index

s5.index = ['a', 'b', 'c', 'd', 'e', 'f']
s5

a    1
b    2
c    3
d    4
e    5
f    6
dtype: int32

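Getting data through the index

s5.iloc[3]  # by position; plain s5[3] on a label index is deprecated in recent pandas
4

s5['e']  # by label
5

s5.iloc[[1, 3, 5]]  # several positions at once
b    2
d    4
f    6
dtype: int32

s5[:4]  # positional slice, end position excluded
a    1
b    2
c    3
d    4
dtype: int32

s5['c':]  # label slice from 'c' to the end
c    3
d    4
e    5
f    6
dtype: int32

s5['b':'e']  # when slicing by label, the end label is included
b    2
c    3
d    4
e    5
dtype: int32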
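The difference between the two kinds of slice is easy to trip over, so here is a minimal sketch that spells it out with the explicit `iloc` and `loc` accessors. The Series `s` is a made-up stand-in for s5.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6], index=list("abcdef"))

by_position = s.iloc[1:4]   # positions 1, 2, 3; the end position is excluded
by_label = s.loc["b":"e"]   # labels b through e; the end label is included

print(list(by_position.index))
print(list(by_label.index))
```

Using `iloc` for positions and `loc` for labels keeps the intent unambiguous, especially when the index itself contains integers.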

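2.2 Automatic alignment

When arithmetic is performed on two Series, pandas first aligns them by index label.

s6 = pd.Series(np.array([10, 15, 20, 30, 55, 80]), index=['a', 'b', 'c', 'd', 'e', 'f'])
print("Series 6:", s6)
s7 = pd.Series(np.array([12, 11, 13, 15, 14, 16]), index=['a', 'c', 'g', 'b', 'd', 'f'])
print("Series 7:", s7)
print(s6 + s7)  # s6 has no label g and s7 has no label e, so the result contains two missing values (NaN)
# Note that the arithmetic aligned the two Series automatically.
# For DataFrames, alignment applies to the row index and the column index alike;
# a DataFrame is the two-dimensional generalization.
print(s6 / s7)

Series 6: a    10
b    15
c    20
d    30
e    55
f    80
dtype: int32
Series 7: a    12
c    11
g    13
b    15
d    14
f    16
dtype: int32
a    22.0
b    30.0
c    31.0
d    44.0
e     NaN
f    96.0
g     NaN
dtype: float64
a    0.833333
b    1.000000
c    1.818182
d    2.142857
e         NaN
f    5.000000
g         NaN
dtype: float64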
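The Series example covers alignment on one axis. For DataFrames the same thing happens on rows and columns together, which the following sketch illustrates. The frames `left` and `right` and their values are invented for this example.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["a", "b"])
right = pd.DataFrame({"y": [10, 20], "z": [30, 40]}, index=["b", "c"])

# The result covers the union of row labels and the union of column labels.
# Only cell (b, y) exists in both frames, so every other cell is NaN.
total = left + right

# add() with fill_value treats a value missing from one side as 0;
# cells missing from both sides remain NaN.
filled = left.add(right, fill_value=0)

print(total)
print(filled)
```

If NaN from non-overlapping labels is not what you want, `add`, `sub`, `mul` and `div` with `fill_value` are one way to supply a default.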

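3 Querying data with pandas

Boolean indexing lets you select a targeted subset of the original data: specific rows, specific columns, and so on.

test_data = pd.read_csv('test_set.csv')
test_data.drop(['ID'], inplace=True, axis=1)  # drop the ID column in place
test_data.head()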

Converting non-numeric features to numbers

# pd.factorize returns integer codes starting at 0; adding 1 makes them start at 1
test_data['job'], jnum = pd.factorize(test_data['job'])
test_data['job'] = test_data['job'] + 1
test_data['marital'], jnum = pd.factorize(test_data['marital'])
test_data['marital'] = test_data['marital'] + 1
test_data['education'], jnum = pd.factorize(test_data['education'])
test_data['education'] = test_data['education'] + 1
test_data['default'], jnum = pd.factorize(test_data['default'])
test_data['default'] = test_data['default'] + 1
test_data['housing'], jnum = pd.factorize(test_data['housing'])
test_data['housing'] = test_data['housing'] + 1
test_data['loan'], jnum = pd.factorize(test_data['loan'])
test_data['loan'] = test_data['loan'] + 1
test_data['contact'], jnum = pd.factorize(test_data['contact'])
test_data['contact'] = test_data['contact'] + 1
test_data['month'], jnum = pd.factorize(test_data['month'])
test_data['month'] = test_data['month'] + 1
test_data['poutcome'], jnum = pd.factorize(test_data['poutcome'])
test_data['poutcome'] = test_data['poutcome'] + 1
test_data.head()
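The repeated factorize calls can also be written as a loop. The sketch below is one possible way to do it, run on a tiny made-up frame (`toy`) because test_set.csv is not available here; the column values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for test_data with two text columns.
toy = pd.DataFrame({
    "job": ["admin", "technician", "admin", "services"],
    "marital": ["married", "single", "married", "married"],
    "age": [30, 41, 51, 51],
})

categories = {}  # remember which text value each code stands for
for col in ["job", "marital"]:
    codes, uniques = pd.factorize(toy[col])
    toy[col] = codes + 1       # shift codes so they start at 1
    categories[col] = uniques  # position i corresponds to code i + 1

print(toy)
print(categories["job"])
```

Keeping the `uniques` returned by `pd.factorize` makes it possible to map the codes back to the original text later, which the single reused `jnum` variable does not allow.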

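Viewing the first 5 rows

test_data.head()

Viewing the last 5 rows

test_data.tail()

Selecting specific rows

test_data.iloc[[0, 2, 4, 5, 7]]

Selecting specific columns

test_data[['age', 'job', 'marital']].head()

Selecting specific rows and columns

test_data.loc[[0, 2, 4, 5, 7], ['age', 'job', 'marital']]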
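On the default 0-based index, `iloc` and `loc` happen to pick the same rows, which can hide the difference between them. The following sketch uses an invented frame with a non-default index to show that `iloc` counts positions while `loc` matches labels.

```python
import pandas as pd

# Hypothetical data; the index labels 10, 20, 30 are deliberately not 0, 1, 2.
df = pd.DataFrame(
    {"age": [30, 41, 51], "job": [1, 2, 3]},
    index=[10, 20, 30],
)

by_position = df.iloc[[0, 2]]         # rows at positions 0 and 2
by_label = df.loc[[10, 30], ["age"]]  # rows labelled 10 and 30, column "age" only

print(by_position)
print(by_label)
```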

Querying the records where age is 51

# Boolean indexing selects a subset of the data
test_data[test_data['age'] == 51].head()

Querying the records where job is 5 or above and age is 51

test_data[(test_data['age'] == 51) & (test_data['job'] >= 5)]
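When several conditions are combined, each comparison needs its own parentheses and the conditions are joined with `&` (and) or `|` (or) rather than Python's `and` / `or`. Here is a minimal sketch on made-up data; the frame `df` and its values are assumptions, not rows from test_set.csv.

```python
import pandas as pd

df = pd.DataFrame({"age": [51, 51, 30, 51], "job": [2, 5, 7, 9]})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (df["age"] == 51) & (df["job"] >= 5)
subset = df[mask]

# DataFrame.query expresses the same filter as a string.
same = df.query("age == 51 and job >= 5")

print(subset)
```

Naming the mask first makes it easy to inspect or reuse before filtering.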
