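Python Basics (11) | A Very Detailed 30,000-Character Summary of the Pandas Library (Part 1)
⭐ This column explains Python's basic syntax in detail, condensing the key points and working through the difficult ones. It is aimed at complete beginners and early learners. Working through the column should make you proficient in Python programming and give you a solid coding foundation for later work in data analysis, machine learning, and deep learning.
🔥 This article is part of the Python Basics series column (Python Basics Tutorial Series). Subscriptions are welcome; the series is updated continuously.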
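Introduction
NumPy excels at vectorized numerical computation.
But for more flexible and complex data tasks,
such as labeling data, handling missing values, grouping, and building pivot tables,
NumPy falls short.
Pandas, which is built on top of NumPy, provides high-level data structures and tools that make data analysis faster and simpler.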
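11.1 Creating Objects
11.1.1 The Pandas Series Object
Series is a one-dimensional array of labeled data
Creating a Series
General form: pd.Series(data, index=index, dtype=dtype)
data: the data; can be a list, a dictionary, or a NumPy array
index: the index labels; optional
dtype: the data type; optional
1. From a list
- When index is omitted, it defaults to an integer sequence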
import pandas as pd
data = pd.Series([1.5, 3, 4.5, 6])
data
0 1.5
1 3.0
2 4.5
3 6.0
dtype: float64
- Adding an index
data = pd.Series([1.5, 3, 4.5, 6], index=["a", "b", "c", "d"])
data
a 1.5
b 3.0
c 4.5
d 6.0
dtype: float64
- Specifying the data type
If dtype is omitted, it is inferred from the data passed in
data = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
data
a 1
b 2
c 3
d 4
dtype: int64
data = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"], dtype="float")
data
a 1.0
b 2.0
c 3.0
d 4.0
dtype: float64
Note: the data may mix several types
- Once types are mixed, the dtype becomes object
data = pd.Series([1, 2, "3", 4], index=["a", "b", "c", "d"])
data
a 1
b 2
c 3
d 4
dtype: object
data["a"]
1
data["c"]
'3'
The data type can be forced with dtype
data = pd.Series([1, 2, "3", 4], index=["a", "b", "c", "d"], dtype=float)
data
a 1.0
b 2.0
c 3.0
d 4.0
dtype: float64
data["c"]
3.0
If a value cannot be converted to float, an error is raised
data = pd.Series([1, 2, "a", 4], index=["a", "b", "c", "d"], dtype=float)
data
Note: the traceback recorded in the original run was "NameError: name 'pd' is not defined", which only means pandas had not been imported in that session. With pandas imported, this call raises a ValueError, because the string "a" cannot be converted to float:
ValueError: could not convert string to float: 'a'
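The failing conversion can be reproduced in isolation. The following is a minimal sketch, not part of the original notebook: it catches the ValueError and then shows one possible alternative, `pd.to_numeric` with `errors="coerce"`, which turns unparseable entries into NaN instead of failing. The variable names are hypothetical.

```python
import pandas as pd

raw = [1, 2, "a", 4]
labels = ["a", "b", "c", "d"]

error_message = None
try:
    pd.Series(raw, index=labels, dtype=float)
except ValueError as exc:
    # "a" has no float representation, so construction fails
    error_message = str(exc)

# Alternative (an assumption, not the author's method):
# build an object Series first, then coerce bad entries to NaN
cleaned = pd.to_numeric(pd.Series(raw, index=labels), errors="coerce")
print(error_message)
print(cleaned)
```

Whether to fail loudly or coerce to NaN depends on the task: coercion keeps the pipeline running but hides bad input unless the NaN values are checked afterwards.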
2. From a one-dimensional NumPy array
import numpy as np
x = np.arange(5)
pd.Series(x)
0 0
1 1
2 2
3 3
4 4
dtype: int32
3. From a dictionary
- By default the keys become the index and the values become the data
population_dict = {"BeiJing": 2154,
"ShangHai": 2424,
"ShenZhen": 1303,
"HangZhou": 981 }
population = pd.Series(population_dict)
population
BeiJing 2154
ShangHai 2424
ShenZhen 1303
HangZhou 981
dtype: int64
- When an index is specified, its labels are looked up among the dictionary keys; labels that are not found get the value NaN
population = pd.Series(population_dict, index=["BeiJing", "HangZhou", "c", "d"])
population
BeiJing 2154.0
HangZhou 981.0
c NaN
d NaN
dtype: float64
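Note that the dtype in the output above changed from int64 to float64. A small sketch of why, and of how the missing entries could be detected and dropped (the names `selected`, `missing`, and `present` are hypothetical):

```python
import pandas as pd

population_dict = {"BeiJing": 2154, "ShangHai": 2424,
                   "ShenZhen": 1303, "HangZhou": 981}
selected = pd.Series(population_dict, index=["BeiJing", "HangZhou", "c", "d"])

# NaN is a floating-point value, so the integer data is upcast to float64
print(selected.dtype)

# Boolean mask marking the labels that were not found in the dictionary
missing = selected.isna()

# Drop the missing entries and restore the integer dtype
present = selected.dropna().astype("int64")
print(present)
```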
4. When data is a scalar
pd.Series(5, index=[100, 200, 300])
100 5
200 5
300 5
dtype: int64
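11.1.2 The Pandas DataFrame Object
DataFrame is a two-dimensional array of labeled data
Creating a DataFrame
General form: pd.DataFrame(data, index=index, columns=columns)
data: the data; can be a list, a dictionary, or a NumPy array
index: the row labels; optional
columns: the column labels; optional
1. From a Series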
population_dict = {"BeiJing": 2154,
"ShangHai": 2424,
"ShenZhen": 1303,
"HangZhou": 981 }
population = pd.Series(population_dict)
pd.DataFrame(population)
| | 0 |
---|---|
BeiJing | 2154 |
ShangHai | 2424 |
ShenZhen | 1303 |
HangZhou | 981 |
pd.DataFrame(population, columns=["population"])
| | population |
---|---|
BeiJing | 2154 |
ShangHai | 2424 |
ShenZhen | 1303 |
HangZhou | 981 |
2. From a dictionary of Series
GDP_dict = {"BeiJing": 30320,
"ShangHai": 32680,
"ShenZhen": 24222,
"HangZhou": 13468 }
GDP = pd.Series(GDP_dict)
GDP
BeiJing 30320
ShangHai 32680
ShenZhen 24222
HangZhou 13468
dtype: int64
pd.DataFrame({"population": population,
"GDP": GDP})
| | population | GDP |
---|---|---|
BeiJing | 2154 | 30320 |
ShangHai | 2424 | 32680 |
ShenZhen | 1303 | 24222 |
HangZhou | 981 | 13468 |
Note: a scalar value is broadcast to fill every row
pd.DataFrame({"population": population,
"GDP": GDP,
"country": "China"})
| | population | GDP | country |
---|---|---|---|
BeiJing | 2154 | 30320 | China |
ShangHai | 2424 | 32680 | China |
ShenZhen | 1303 | 24222 | China |
HangZhou | 981 | 13468 | China |
3. From a list of dictionaries
- The position of each dictionary in the list becomes the index, and the dictionary keys become the columns
import numpy as np
import pandas as pd
data = [{"a": i, "b": 2*i} for i in range(3)]
data
[{'a': 0, 'b': 0}, {'a': 1, 'b': 2}, {'a': 2, 'b': 4}]
data = pd.DataFrame(data)
data
| | a | b |
---|---|---|
0 | 0 | 0 |
1 | 1 | 2 |
2 | 2 | 4 |
No row labels were given, so the rows are numbered from 0; the column labels come from the dictionary keys.
- Extracting one column
data1 = data["a"].copy()
data1
0 0
1 1
2 2
Name: a, dtype: int64
data1[0] = 10
data1
0 10
1 1
2 2
Name: a, dtype: int64
data
| | a | b |
---|---|---|
0 | 0 | 0 |
1 | 1 | 2 |
2 | 2 | 4 |
- Keys missing from a dictionary are filled with NaN
data = [{"a": 1, "b":1},{"b": 3, "c":4}]
data
[{'a': 1, 'b': 1}, {'b': 3, 'c': 4}]
pd.DataFrame(data)
| | a | b | c |
---|---|---|---|
0 | 1.0 | 1 | NaN |
1 | NaN | 3 | 4.0 |
4. From a two-dimensional NumPy array
data = np.random.randint(10, size=(3, 2))
data
array([[1, 6],
[2, 9],
[4, 0]])
pd.DataFrame(data, columns=["foo", "bar"], index=["a", "b", "c"])
| | foo | bar |
---|---|---|
a | 1 | 6 |
b | 2 | 9 |
c | 4 | 0 |
11.2 DataFrame Properties
1. Attributes
data = pd.DataFrame({"pop": population, "GDP": GDP})
data
| | pop | GDP |
---|---|---|
BeiJing | 2154 | 30320 |
ShangHai | 2424 | 32680 |
ShenZhen | 1303 | 24222 |
HangZhou | 981 | 13468 |
(1) df.values returns the data as a NumPy array
data.values
array([[ 2154, 30320],
[ 2424, 32680],
[ 1303, 24222],
[ 981, 13468]], dtype=int64)
(2) df.index returns the row index
data.index
Index(['BeiJing', 'ShangHai', 'ShenZhen', 'HangZhou'], dtype='object')
(3) df.columns returns the column index
data.columns
Index(['pop', 'GDP'], dtype='object')
(4) df.shape returns the shape as (rows, columns)
data.shape
(4, 2)
(5) df.size returns the total number of elements
data.size
8
(6) df.dtypes returns the data type of each column
data.dtypes
pop int64
GDP int64
dtype: object
2. Indexing
data
| | pop | GDP |
---|---|---|
BeiJing | 2154 | 30320 |
ShangHai | 2424 | 32680 |
ShenZhen | 1303 | 24222 |
HangZhou | 981 | 13468 |
(1) Selecting columns
- Dictionary style
data["pop"]
BeiJing 2154
ShangHai 2424
ShenZhen 1303
HangZhou 981
Name: pop, dtype: int64
data[["GDP", "pop"]]
| | GDP | pop |
---|---|---|
BeiJing | 30320 | 2154 |
ShangHai | 32680 | 2424 |
ShenZhen | 24222 | 1303 |
HangZhou | 13468 | 981 |
- Attribute style
data.GDP
BeiJing 30320
ShangHai 32680
ShenZhen 24222
HangZhou 13468
Name: GDP, dtype: int64
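Attribute-style access has a pitfall that this very DataFrame can demonstrate: the column name "pop" collides with the existing method `DataFrame.pop`, so `data.pop` returns the method rather than the column. A short sketch using a two-row version of the data above:

```python
import pandas as pd

data = pd.DataFrame({"pop": [2154, 2424], "GDP": [30320, 32680]},
                    index=["BeiJing", "ShangHai"])

gdp_attr = data.GDP      # no name clash: returns the GDP column as a Series
pop_attr = data.pop      # name clash: returns the bound method DataFrame.pop
pop_col = data["pop"]    # dictionary style always returns the column

print(type(gdp_attr).__name__)
print(callable(pop_attr))
print(pop_col)
```

For this reason dictionary-style access is the safer default; attribute style is best kept for quick interactive use with column names that are valid identifiers and do not shadow DataFrame attributes.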
(2) Selecting rows
- Label-based indexing: df.loc
data.loc["BeiJing"]
pop 2154
GDP 30320
Name: BeiJing, dtype: int64
data.loc[["BeiJing", "HangZhou"]]
| | pop | GDP |
---|---|---|
BeiJing | 2154 | 30320 |
HangZhou | 981 | 13468 |
- Position-based indexing: df.iloc
data
| | pop | GDP |
---|---|---|
BeiJing | 2154 | 30320 |
ShangHai | 2424 | 32680 |
ShenZhen | 1303 | 24222 |
HangZhou | 981 | 13468 |
data.iloc[0]
pop 2154
GDP 30320
Name: BeiJing, dtype: int64
data.iloc[[1, 3]]
| | pop | GDP |
---|---|---|
ShangHai | 2424 | 32680 |
HangZhou | 981 | 13468 |
(3) Selecting a scalar
data
| | pop | GDP |
---|---|---|
BeiJing | 2154 | 30320 |
ShangHai | 2424 | 32680 |
ShenZhen | 1303 | 24222 |
HangZhou | 981 | 13468 |
data.loc["BeiJing", "GDP"]
30320
data.iloc[0, 1]
30320
data.values[0][1]
30320
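Besides the three forms above, pandas also provides the accessors `at` (label-based) and `iat` (position-based), which are intended for reading or writing a single value. They are not used in the original notebook; the sketch below only shows that they return the same scalar.

```python
import pandas as pd

data = pd.DataFrame(
    {"pop": [2154, 2424, 1303, 981], "GDP": [30320, 32680, 24222, 13468]},
    index=["BeiJing", "ShangHai", "ShenZhen", "HangZhou"],
)

by_label = data.at["BeiJing", "GDP"]   # row label, column label
by_position = data.iat[0, 1]           # row position, column position

print(by_label, by_position)
```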
(4) Indexing a Series
type(data.GDP)
pandas.core.series.Series
GDP
BeiJing 30320
ShangHai 32680
ShenZhen 24222
HangZhou 13468
dtype: int64
GDP["BeiJing"]
30320
3. Slicing
dates = pd.date_range(start='2019-01-01', periods=6)
dates
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
'2019-01-05', '2019-01-06'],
dtype='datetime64[ns]', freq='D')
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=["A", "B", "C", "D"])
df
| | A | B | C | D |
---|---|---|---|---|
2019-01-01 | -0.935378 | -0.190742 | 0.925984 | -0.818969 |
2019-01-02 | -0.234414 | -1.194674 | 1.080779 | -2.294395 |
2019-01-03 | -0.141572 | 0.058118 | 1.102248 | 1.207726 |
2019-01-04 | 0.305088 | 0.535920 | -0.978434 | 0.177251 |
2019-01-05 | 0.313383 | 0.234041 | 0.163155 | -0.296649 |
2019-01-06 | 0.250613 | -0.904400 | -0.858240 | -1.573342 |
(1) Row slicing
df["2019-01-01": "2019-01-03"]
| | A | B | C | D |
---|---|---|---|---|
2019-01-01 | -0.935378 | -0.190742 | 0.925984 | -0.818969 |
2019-01-02 | -0.234414 | -1.194674 | 1.080779 | -2.294395 |
2019-01-03 | -0.141572 | 0.058118 | 1.102248 | 1.207726 |
df.loc["2019-01-01": "2019-01-03"]
| | A | B | C | D |
---|---|---|---|---|
2019-01-01 | -0.935378 | -0.190742 | 0.925984 | -0.818969 |
2019-01-02 | -0.234414 | -1.194674 | 1.080779 | -2.294395 |
2019-01-03 | -0.141572 | 0.058118 | 1.102248 | 1.207726 |
df.iloc[0: 3]
| | A | B | C | D |
---|---|---|---|---|
2019-01-01 | -0.935378 | -0.190742 | 0.925984 | -0.818969 |
2019-01-02 | -0.234414 | -1.194674 | 1.080779 | -2.294395 |
2019-01-03 | -0.141572 | 0.058118 | 1.102248 | 1.207726 |
Note: with iloc the stop position 3 is excluded, whereas the label-based slices above include the end label.
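The difference between the two slicing conventions can be checked directly. This is a minimal sketch that replaces the random data with `np.arange` values so the result is deterministic; the names `by_label` and `by_position` are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2019-01-01", periods=6)
# Deterministic stand-in for the np.random.randn data used in the text
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates,
                  columns=["A", "B", "C", "D"])

# Label-based slice: both endpoints are included
by_label = df.loc["2019-01-01":"2019-01-03"]

# Position-based slice: the stop position is excluded, as in plain Python
by_position = df.iloc[0:3]

print(by_label.index[-1])
print(by_position.index[-1])
```

Both slices end at 2019-01-03: the label slice because its end label is included, the position slice because position 3 (2019-01-04) is left out.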
(2) Column slicing
df
| | A | B | C | D |
---|---|---|---|---|
2019-01-01 | -0.935378 | -0.190742 | 0.925984 | -0.818969 |
2019-01-02 | -0.234414 | -1.194674 | 1.080779 | -2.294395 |
2019-01-03 | -0.141572 | 0.058118 | 1.102248 | 1.207726 |
2019-01-04 | 0.305088 | 0.535920 | -0.978434 | 0.177251 |
2019-01-05 | 0.313383 | 0.234041 | 0.163155 | -0.296649 |
2019-01-06 | 0.250613 | -0.904400 | -0.858240 | -1.573342 |
df.loc[:, "A": "C"]
| | A | B | C |
---|---|---|---|
2019-01-01 | -0.935378 | -0.190742 | 0.925984 |
2019-01-02 | -0.234414 | -1.194674 | 1.080779 |
2019-01-03 | -0.141572 | 0.058118 | 1.102248 |
2019-01-04 | 0.305088 | 0.535920 | -0.978434 |
2019-01-05 | 0.313383 | 0.234041 | 0.163155 |
2019-01-06 | 0.250613 | -0.904400 | -0.858240 |
df.iloc[:, 0: 3]
| | A | B | C |
---|---|---|---|
2019-01-01 | -0.935378 | -0.190742 | 0.925984 |
2019-01-02 | -0.234414 | -1.194674 | 1.080779 |
2019-01-03 | -0.141572 | 0.058118 | 1.102248 |
2019-01-04 | 0.305088 | 0.535920 | -0.978434 |
2019-01-05 | 0.313383 | 0.234041 | 0.163155 |
2019-01-06 | 0.250613 | -0.904400 | -0.858240 |
(3) Combining selection styles
df
| | A | B | C | D |
---|---|---|---|---|
2019-01-01 | -0.935378 | -0.190742 | 0.925984 | -0.818969 |
2019-01-02 | -0.234414 | -1.194674 | 1.080779 | -2.294395 |
2019-01-03 | -0.141572 | 0.058118 | 1.102248 | 1.207726 |
2019-01-04 | 0.305088 | 0.535920 | -0.978434 | 0.177251 |
2019-01-05 | 0.313383 | 0.234041 | 0.163155 | -0.296649 |
2019-01-06 | 0.250613 | -0.904400 | -0.858240 | -1.573342 |
- Slicing rows and columns at the same time
df.loc["2019-01-02": "2019-01-03", "C":"D"]
| | C | D |
---|---|---|
2019-01-02 | 1.080779 | -2.294395 |
2019-01-03 | 1.102248 | 1.207726 |
df.iloc[1: 3, 2:]
| | C | D |
---|---|---|
2019-01-02 | 1.080779 | -2.294395 |
2019-01-03 | 1.102248 | 1.207726 |
- Slicing rows, picking scattered columns
df.loc["2019-01-04": "2019-01-06", ["A", "C"]]
| | A | C |
---|---|---|
2019-01-04 | 0.305088 | -0.978434 |
2019-01-05 | 0.313383 | 0.163155 |
2019-01-06 | 0.250613 | -0.858240 |
df.iloc[3:, [0, 2]]
| | A | C |
---|---|---|
2019-01-04 | 0.305088 | -0.978434 |
2019-01-05 | 0.313383 | 0.163155 |
2019-01-06 | 0.250613 | -0.858240 |
- Picking scattered rows, slicing columns
df.loc[["2019-01-02", "2019-01-06"], "C": "D"]
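The form above did not work in the author's environment: the row labels in the list are plain strings, not the Timestamp values stored in the DatetimeIndex. Other pandas versions may behave differently. The position-based form below avoids the problem.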
df.iloc[[1, 5], 0: 3]
| | A | B | C |
---|---|---|---|
2019-01-02 | -0.234414 | -1.194674 | 1.080779 |
2019-01-06 | 0.250613 | -0.904400 | -0.858240 |
- Picking scattered rows and scattered columns
df.loc[["2019-01-04", "2019-01-06"], ["A", "D"]]
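Likewise, the form above did not work in the author's environment, for the same reason: the row labels are given as a list of strings.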
df.iloc[[1, 5], [0, 3]]
| | A | D |
---|---|---|
2019-01-02 | -0.234414 | -2.294395 |
2019-01-06 | 0.250613 | -1.573342 |
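For the two label-based selections that take a list of date strings as row labels, one possible workaround is to convert the strings to Timestamps first with `pd.to_datetime`. This is a sketch under that assumption, not the author's method, and it uses deterministic `np.arange` data in place of the random values; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2019-01-01", periods=6)
# Deterministic stand-in for the np.random.randn data used in the text
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates,
                  columns=["A", "B", "C", "D"])

# Convert the string labels so they match the DatetimeIndex entries
rows = pd.to_datetime(["2019-01-02", "2019-01-06"])

rows_list_cols_slice = df.loc[rows, "C":"D"]       # scattered rows, column slice
rows_list_cols_list = df.loc[rows, ["A", "D"]]     # scattered rows and columns

print(rows_list_cols_slice)
print(rows_list_cols_list)
```

This keeps the selection label-based, which is easier to read than positions when the dates themselves carry the meaning.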
4. Boolean indexing
This is equivalent to a mask operation in NumPy.
df
| | A | B | C | D |
---|---|---|---|---|
2019-01-01 | -0.935378 | -0.190742 | 0.925984 | -0.818969 |
2019-01-02 | -0.234414 | -1.194674 | 1.080779 | -2.294395 |
2019-01-03 | -0.141572 | 0.058118 | 1.102248 | 1.207726 |
2019-01-04 | 0.305088 | 0.535920 | -0.978434 | 0.177251 |
2019-01-05 | 0.313383 | 0.234041 | 0.163155 | -0.296649 |
2019-01-06 | 0.250613 | -0.904400 | -0.858240 | -1.573342 |
df > 0
| | A | B | C | D |
---|---|---|---|---|
2019-01-01 | False | False | True | False |
2019-01-02 | False | False | True | False |
2019-01-03 | False | True | True | True |
2019-01-04 | True | True | False | True |
2019-01-05 | True | True | True | False |
2019-01-06 | True | False | False | False |
df[df > 0]
| | A | B | C | D |
---|---|---|---|---|
2019-01-01 | NaN | NaN | 0.925984 | NaN |
2019-01-02 | NaN | NaN | 1.080779 | NaN |
2019-01-03 | NaN | 0.058118 | 1.102248 | 1.207726 |
2019-01-04 | 0.305088 | 0.535920 | NaN | 0.177251 |
2019-01-05 | 0.313383 | 0.234041 | 0.163155 | NaN |
2019-01-06 | 0.250613 | NaN | NaN | NaN |
As shown, values where the mask is True are kept, while positions where it is False become NaN.
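Indexing a DataFrame with a boolean DataFrame gives the same result as `DataFrame.where`, which additionally accepts a replacement value. A small sketch on a hypothetical 2x2 frame (`where` is not used in the original text):

```python
import pandas as pd

df = pd.DataFrame({"A": [-1.0, 2.0], "B": [3.0, -4.0]})

masked = df[df > 0]             # False positions become NaN
same = df.where(df > 0)         # equivalent spelling
filled = df.where(df > 0, 0.0)  # False positions become 0.0 instead of NaN

print(masked)
print(filled)
```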
df.A > 0
2019-01-01 False
2019-01-02 False
2019-01-03 False
2019-01-04 True
2019-01-05 True
2019-01-06 True
Freq: D, Name: A, dtype: bool
df[df.A > 0]
| | A | B | C | D |
---|---|---|---|---|
2019-01-04 | 0.305088 | 0.535920 | -0.978434 | 0.177251 |
2019-01-05 | 0.313383 | 0.234041 | 0.163155 | -0.296649 |
2019-01-06 | 0.250613 | -0.904400 | -0.858240 | -1.573342 |
- The isin() method
df2 = df.copy()
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']
df2
| | A | B | C | D | E |
---|---|---|---|---|---|
2019-01-01 | -0.935378 | -0.190742 | 0.925984 | -0.818969 | one |
2019-01-02 | -0.234414 | -1.194674 | 1.080779 | -2.294395 | one |
2019-01-03 | -0.141572 | 0.058118 | 1.102248 | 1.207726 | two |
2019-01-04 | 0.305088 | 0.535920 | -0.978434 | 0.177251 | three |
2019-01-05 | 0.313383 | 0.234041 | 0.163155 | -0.296649 | four |
2019-01-06 | 0.250613 | -0.904400 | -0.858240 | -1.573342 | three |
ind = df2["E"].isin(["two", "four"])
ind
2019-01-01 False
2019-01-02 False
2019-01-03 True
2019-01-04 False
2019-01-05 True
2019-01-06 False
Freq: D, Name: E, dtype: bool
df2[ind]
| | A | B | C | D | E |
---|---|---|---|---|---|
2019-01-03 | -0.141572 | 0.058118 | 1.102248 | 1.207726 | two |
2019-01-05 | 0.313383 | 0.234041 | 0.163155 | -0.296649 | four |
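Boolean masks such as the one returned by isin() can be combined with `&` (and), `|` (or), and `~` (not). The sketch below uses a reduced, hypothetical version of df2 with rounded values; each comparison needs its own parentheses because these operators bind more tightly than `>`.

```python
import pandas as pd

df2 = pd.DataFrame({
    "A": [-0.9, -0.2, -0.1, 0.3, 0.3, 0.25],
    "E": ["one", "one", "two", "three", "four", "three"],
})

in_set = df2["E"].isin(["two", "four"])

# Rows where A is positive AND E is in the set
positive_and_in_set = df2[(df2["A"] > 0) & in_set]

# Rows where E is NOT in the set
not_in_set = df2[~in_set]

print(positive_and_in_set)
print(not_in_set)
```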
5. Assignment
df
- Adding a new column to a DataFrame
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20190101', periods=6))
s1
2019-01-01 1
2019-01-02 2
2019-01-03 3
2019-01-04 4
2019-01-05 5
2019-01-06 6
Freq: D, dtype: int64
df["E"] = s1
df
| | A | B | C | D | E |
---|---|---|---|---|---|
2019-01-01 | -0.935378 | -0.190742 | 0.925984 | -0.818969 | 1 |
2019-01-02 | -0.234414 | -1.194674 | 1.080779 | -2.294395 | 2 |
2019-01-03 | -0.141572 | 0.058118 | 1.102248 | 1.207726 | 3 |
2019-01-04 | 0.305088 | 0.535920 | -0.978434 | 0.177251 | 4 |
2019-01-05 | 0.313383 | 0.234041 | 0.163155 | -0.296649 | 5 |
2019-01-06 | 0.250613 | -0.904400 | -0.858240 | -1.573342 | 6 |
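When the new column is a Series, pandas matches its values to rows by index label rather than by position. In the example above the two indexes are identical, so the effect is invisible. The following sketch uses small hypothetical data where they differ:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30]}, index=["x", "y", "z"])

# The Series is in a different order, lacks "x", and has an extra label "w"
s = pd.Series([1, 2, 3], index=["z", "y", "w"])

df["E"] = s  # aligned on labels: "x" gets NaN, "w" is ignored
print(df)
```

If positional assignment is what you want, assigning a plain list or NumPy array of matching length (for example `s.values`) skips the alignment step.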
- Modifying existing values
df.loc["2019-01-01", "A"] = 0
df
| | A | B | C | D | E |
---|---|---|---|---|---|
2019-01-01 | 0.000000 | -0.190742 | 0.925984 | -0.818969 | 1 |
2019-01-02 | -0.234414 | -1.194674 | 1.080779 | -2.294395 | 2 |
2019-01-03 | -0.141572 | 0.058118 | 1.102248 | 1.207726 | 3 |
2019-01-04 | 0.305088 | 0.535920 | -0.978434 | 0.177251 | 4 |
2019-01-05 | 0.313383 | 0.234041 | 0.163155 | -0.296649 | 5 |
2019-01-06 | 0.250613 | -0.904400 | -0.858240 | -1.573342 | 6 |
df.iloc[0, 1] = 0
df
| | A | B | C | D | E |
---|---|---|---|---|---|
2019-01-01 | 0.000000 | 0.000000 | 0.925984 | -0.818969 | 1 |
2019-01-02 | -0.234414 | -1.194674 | 1.080779 | -2.294395 | 2 |
2019-01-03 | -0.141572 | 0.058118 | 1.102248 | 1.207726 | 3 |
2019-01-04 | 0.305088 | 0.535920 | -0.978434 | 0.177251 | 4 |
2019-01-05 | 0.313383 | 0.234041 | 0.163155 | -0.296649 | 5 |
2019-01-06 | 0.250613 | -0.904400 | -0.858240 | -1.573342 | 6 |
df["D"] = np.array([5]*len(df)) # can be simplified to df["D"] = 5
df
| | A | B | C | D | E |
---|---|---|---|---|---|
2019-01-01 | 0.000000 | 0.000000 | 0.925984 | 5 | 1 |
2019-01-02 | -0.234414 | -1.194674 | 1.080779 | 5 | 2 |
2019-01-03 | -0.141572 | 0.058118 | 1.102248 | 5 | 3 |
2019-01-04 | 0.305088 | 0.535920 | -0.978434 | 5 | 4 |
2019-01-05 | 0.313383 | 0.234041 | 0.163155 | 5 | 5 |
2019-01-06 | 0.250613 | -0.904400 | -0.858240 | 5 | 6 |
- Modifying index and columns
df.index = [i for i in range(len(df))]
df
| | A | B | C | D | E |
---|---|---|---|---|---|
0 | 0.000000 | 0.000000 | 0.925984 | 5 | 1 |
1 | -0.234414 | -1.194674 | 1.080779 | 5 | 2 |
2 | -0.141572 | 0.058118 | 1.102248 | 5 | 3 |
3 | 0.305088 | 0.535920 | -0.978434 | 5 | 4 |
4 | 0.313383 | 0.234041 | 0.163155 | 5 | 5 |
5 | 0.250613 | -0.904400 | -0.858240 | 5 | 6 |
df.columns = [i for i in range(df.shape[1])]
df
| | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | 0.000000 | 0.000000 | 0.925984 | 5 | 1 |
1 | -0.234414 | -1.194674 | 1.080779 | 5 | 2 |
2 | -0.141572 | 0.058118 | 1.102248 | 5 | 3 |
3 | 0.305088 | 0.535920 | -0.978434 | 5 | 4 |
4 | 0.313383 | 0.234041 | 0.163155 | 5 | 5 |
5 | 0.250613 | -0.904400 | -0.858240 | 5 | 6 |