简介
- Pandas是Python的一个数据分析包,该工具为解决数据分析任务而创建。
- Pandas纳入大量库和标准数据模型,提供高效的操作数据集所需的工具。
- Pandas提供大量能使我们快速便捷地处理数据的函数和方法。
- Pandas是字典形式,基于NumPy创建,让NumPy为中心的应用变得更加简单。
- 对于金融行业的用户,pandas提供了大量适用于金融数据的高性能时间序列功能和工具。
- Excel 2007及其以后的版本的最大行数是1048576,最大列数是16384,超过这个规模的数据可使用Pandas
1 | import pandas as pd |
Series
1 | s = pd.Series([1,3,5,np.nan,6,8]) |
0 1.0
1 3.0
2 5.0
3 NaN
4 6.0
5 8.0
dtype: float64
DataFrame
1 | dates = pd.date_range('20170101', periods=6) |
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
'2017-01-05', '2017-01-06'],
dtype='datetime64[ns]', freq='D')
1 | df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD')) |
A | B | C | D | |
---|---|---|---|---|
2017-01-01 | -0.119313 | -0.165029 | -1.065599 | 0.224712 |
2017-01-02 | -0.423993 | 0.705954 | -0.349143 | -0.208349 |
2017-01-03 | 0.259465 | 0.896735 | 0.033814 | -1.135184 |
2017-01-04 | -1.568006 | 0.685092 | -0.931491 | 0.323591 |
2017-01-05 | 0.050029 | 1.938641 | -0.193642 | 0.199692 |
2017-01-06 | -0.041061 | -0.845943 | 1.542987 | 0.354031 |
1 | df.head(3) |
A | B | C | D | |
---|---|---|---|---|
2017-01-01 | 0.197225 | -0.666883 | 0.855806 | 0.253851 |
2017-01-02 | -0.291572 | 0.011976 | -1.502192 | 0.051144 |
2017-01-03 | -0.670941 | 1.815718 | 0.503071 | 0.361448 |
1 | df.tail(3) |
A | B | C | D | |
---|---|---|---|---|
2017-01-04 | 0.465035 | 0.465077 | -0.018313 | 1.338603 |
2017-01-05 | -0.646332 | 1.261231 | 0.058086 | -1.643476 |
2017-01-06 | 0.375105 | 0.247912 | 0.998861 | -0.114123 |
1 | df2 = pd.DataFrame({ 'A' : 1., |
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
0 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
2 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
1 | df2.dtypes |
A float64
B datetime64[ns]
C float32
D int32
E category
F object
dtype: object
1 | # 列的序号名字 |
Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')
1 | # 行的序号 |
Int64Index([0, 1, 2, 3], dtype='int64')
1 | # 把每个值进行打印出来 |
array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],
dtype=object)
1 | # 数字总结 |
A | C | D | |
---|---|---|---|
count | 4.0 | 4.0 | 4.0 |
mean | 1.0 | 1.0 | 3.0 |
std | 0.0 | 0.0 | 0.0 |
min | 1.0 | 1.0 | 3.0 |
25% | 1.0 | 1.0 | 3.0 |
50% | 1.0 | 1.0 | 3.0 |
75% | 1.0 | 1.0 | 3.0 |
max | 1.0 | 1.0 | 3.0 |
1 | # 翻转数据 |
0 | 1 | 2 | 3 | |
---|---|---|---|---|
A | 1 | 1 | 1 | 1 |
B | 2013-01-02 00:00:00 | 2013-01-02 00:00:00 | 2013-01-02 00:00:00 | 2013-01-02 00:00:00 |
C | 1 | 1 | 1 | 1 |
D | 3 | 3 | 3 | 3 |
E | test | train | test | train |
F | foo | foo | foo | foo |
1 | # axis等于1按列进行排序 如ABCDEFG 然后ascending倒叙进行显示 |
F | E | D | C | B | A | |
---|---|---|---|---|---|---|
0 | foo | test | 3 | 1.0 | 2013-01-02 | 1.0 |
1 | foo | train | 3 | 1.0 | 2013-01-02 | 1.0 |
2 | foo | test | 3 | 1.0 | 2013-01-02 | 1.0 |
3 | foo | train | 3 | 1.0 | 2013-01-02 | 1.0 |
1 | #axis等于0按行进行排序 如0123 然后ascending倒叙进行显示 |
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
2 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
0 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
1 | #按值进行排序 |
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
0 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
2 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
选择
1 | df['A'] |
2017-01-01 0.549772
2017-01-02 0.969333
2017-01-03 0.106553
2017-01-04 0.465035
2017-01-05 -0.646332
2017-01-06 0.375105
Freq: D, Name: A, dtype: float64
1 | #切片选择 |
A | B | C | D | |
---|---|---|---|---|
2017-01-01 | 0.549772 | -0.677954 | 0.616294 | 1.032977 |
2017-01-02 | 0.969333 | -0.067681 | -0.784801 | 0.800007 |
2017-01-03 | 0.106553 | -1.707697 | 0.668063 | 0.345128 |
1 | df['20170102':'20170104'] |
A | B | C | D | |
---|---|---|---|---|
2017-01-02 | 0.969333 | -0.067681 | -0.784801 | 0.800007 |
2017-01-03 | 0.106553 | -1.707697 | 0.668063 | 0.345128 |
2017-01-04 | 0.465035 | 0.465077 | -0.018313 | 1.338603 |
1 | # loc方法为根据行标签label选择 |
A 0.549772
B -0.677954
C 0.616294
D 1.032977
Name: 2017-01-01 00:00:00, dtype: float64
1 | df.loc[dates[0],'A'] |
0.5497720064165944
1 | df.loc[:,['A','B']] |
A | B | |
---|---|---|
2017-01-01 | 0.549772 | -0.677954 |
2017-01-02 | 0.969333 | -0.067681 |
2017-01-03 | 0.106553 | -1.707697 |
2017-01-04 | 0.465035 | 0.465077 |
2017-01-05 | -0.646332 | 1.261231 |
2017-01-06 | 0.375105 | 0.247912 |
1 | df.loc['20170102':'20170104',['A','B']] |
A | B | |
---|---|---|
2017-01-02 | 0.969333 | -0.067681 |
2017-01-03 | 0.106553 | -1.707697 |
2017-01-04 | 0.465035 | 0.465077 |
1 | # iloc方法为根据position选择 |
A 0.465035
B 0.465077
C -0.018313
D 1.338603
Name: 2017-01-04 00:00:00, dtype: float64
1 | df.iloc[3:5,0:2] |
A | B | |
---|---|---|
2017-01-04 | 0.465035 | 0.465077 |
2017-01-05 | -0.646332 | 1.261231 |
1 | df.iloc[[1,2,4],[0,2]] |
A | C | |
---|---|---|
2017-01-02 | 0.969333 | -0.784801 |
2017-01-03 | 0.106553 | 0.668063 |
2017-01-05 | -0.646332 | 0.058086 |
1 | df.iloc[1,1] |
-0.06768075680943383
1 | df.iat[1,1] |
-0.06768075680943383
1 | # 根据布尔表达式筛选 |
A | B | C | D | |
---|---|---|---|---|
2017-01-01 | 0.549772 | NaN | 0.616294 | 1.032977 |
2017-01-02 | 0.969333 | NaN | NaN | 0.800007 |
2017-01-03 | 0.106553 | NaN | 0.668063 | 0.345128 |
2017-01-04 | 0.465035 | 0.465077 | NaN | 1.338603 |
2017-01-05 | NaN | 1.261231 | 0.058086 | NaN |
2017-01-06 | 0.375105 | 0.247912 | 0.998861 | NaN |
1 | df[df.A > 0] |
A | B | C | D | |
---|---|---|---|---|
2017-01-01 | 0.549772 | -0.677954 | 0.616294 | 1.032977 |
2017-01-02 | 0.969333 | -0.067681 | -0.784801 | 0.800007 |
2017-01-03 | 0.106553 | -1.707697 | 0.668063 | 0.345128 |
2017-01-04 | 0.465035 | 0.465077 | -0.018313 | 1.338603 |
2017-01-06 | 0.375105 | 0.247912 | 0.998861 | -0.114123 |
1 | # 使用isin |
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
设置数据
1 | s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20170102', periods=6)) |
2017-01-02 1
2017-01-03 2
2017-01-04 3
2017-01-05 4
2017-01-06 5
2017-01-07 6
Freq: D, dtype: int64
1 | df['F'] = s1 |
A | B | C | D | F | |
---|---|---|---|---|---|
2017-01-01 | 0.549772 | -0.677954 | 0.616294 | 1.032977 | NaN |
2017-01-02 | 0.969333 | -0.067681 | -0.784801 | 0.800007 | 1.0 |
2017-01-03 | 0.106553 | -1.707697 | 0.668063 | 0.345128 | 2.0 |
2017-01-04 | 0.465035 | 0.465077 | -0.018313 | 1.338603 | 3.0 |
2017-01-05 | -0.646332 | 1.261231 | 0.058086 | -1.643476 | 4.0 |
2017-01-06 | 0.375105 | 0.247912 | 0.998861 | -0.114123 | 5.0 |
1 | df.at[dates[0],'A'] = 0 |
A | B | C | D | F | |
---|---|---|---|---|---|
2017-01-01 | 0.000000 | -0.677954 | 0.616294 | 1.032977 | NaN |
2017-01-02 | 0.969333 | -0.067681 | -0.784801 | 0.800007 | 1.0 |
2017-01-03 | 0.106553 | -1.707697 | 0.668063 | 0.345128 | 2.0 |
2017-01-04 | 0.465035 | 0.465077 | -0.018313 | 1.338603 | 3.0 |
2017-01-05 | -0.646332 | 1.261231 | 0.058086 | -1.643476 | 4.0 |
2017-01-06 | 0.375105 | 0.247912 | 0.998861 | -0.114123 | 5.0 |
1 | df.iat[0,1] = 0 |
A | B | C | D | F | |
---|---|---|---|---|---|
2017-01-01 | 0.000000 | 0.000000 | 0.616294 | 1.032977 | NaN |
2017-01-02 | 0.969333 | -0.067681 | -0.784801 | 0.800007 | 1.0 |
2017-01-03 | 0.106553 | -1.707697 | 0.668063 | 0.345128 | 2.0 |
2017-01-04 | 0.465035 | 0.465077 | -0.018313 | 1.338603 | 3.0 |
2017-01-05 | -0.646332 | 1.261231 | 0.058086 | -1.643476 | 4.0 |
2017-01-06 | 0.375105 | 0.247912 | 0.998861 | -0.114123 | 5.0 |
1 | # 使用numpy数组 |
A | B | C | D | F | |
---|---|---|---|---|---|
2017-01-01 | 0.000000 | 0.000000 | 0.616294 | 5 | NaN |
2017-01-02 | 0.969333 | -0.067681 | -0.784801 | 5 | 1.0 |
2017-01-03 | 0.106553 | -1.707697 | 0.668063 | 5 | 2.0 |
2017-01-04 | 0.465035 | 0.465077 | -0.018313 | 5 | 3.0 |
2017-01-05 | -0.646332 | 1.261231 | 0.058086 | 5 | 4.0 |
2017-01-06 | 0.375105 | 0.247912 | 0.998861 | 5 | 5.0 |
1 | # where语句 |
A | B | C | D | F | |
---|---|---|---|---|---|
2017-01-01 | 0.000000 | 0.000000 | -0.616294 | -5 | NaN |
2017-01-02 | -0.969333 | -0.067681 | -0.784801 | -5 | -1.0 |
2017-01-03 | -0.106553 | -1.707697 | -0.668063 | -5 | -2.0 |
2017-01-04 | -0.465035 | -0.465077 | -0.018313 | -5 | -3.0 |
2017-01-05 | -0.646332 | -1.261231 | -0.058086 | -5 | -4.0 |
2017-01-06 | -0.375105 | -0.247912 | -0.998861 | -5 | -5.0 |
处理丢失数据
1 | df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E']) |
A | B | C | D | F | E | |
---|---|---|---|---|---|---|
2017-01-01 | 0.000000 | 0.000000 | -0.616294 | -5 | NaN | 1.0 |
2017-01-02 | -0.969333 | -0.067681 | -0.784801 | -5 | -1.0 | 1.0 |
2017-01-03 | -0.106553 | -1.707697 | -0.668063 | -5 | -2.0 | NaN |
2017-01-04 | -0.465035 | -0.465077 | -0.018313 | -5 | -3.0 | NaN |
1 | df1.dropna(how='any') |
A | B | C | D | F | E | |
---|---|---|---|---|---|---|
2017-01-02 | -0.969333 | -0.067681 | -0.784801 | -5 | -1.0 | 1.0 |
1 | df1.fillna(value=5) |
A | B | C | D | F | E | |
---|---|---|---|---|---|---|
2017-01-01 | 0.000000 | 0.000000 | -0.616294 | -5 | 5.0 | 1.0 |
2017-01-02 | -0.969333 | -0.067681 | -0.784801 | -5 | -1.0 | 1.0 |
2017-01-03 | -0.106553 | -1.707697 | -0.668063 | -5 | -2.0 | 5.0 |
2017-01-04 | -0.465035 | -0.465077 | -0.018313 | -5 | -3.0 | 5.0 |
1 | pd.isna(df1) |
A | B | C | D | F | E | |
---|---|---|---|---|---|---|
2017-01-01 | False | False | False | False | True | False |
2017-01-02 | False | False | False | False | False | False |
2017-01-03 | False | False | False | False | False | True |
2017-01-04 | False | False | False | False | False | True |
运算
1 | # 统计 |
A | B | C | D | |
---|---|---|---|---|
2017-01-01 | 0 | 2 | 2 | 6 |
2017-01-02 | 5 | 4 | 9 | 5 |
2017-01-03 | 2 | 4 | 4 | 5 |
2017-01-04 | 8 | 4 | 0 | 2 |
2017-01-05 | 5 | 1 | 5 | 9 |
2017-01-06 | 8 | 5 | 6 | 6 |
1 | df.mean() |
A 4.666667
B 3.333333
C 4.333333
D 5.500000
dtype: float64
1 | df.mean(1) |
2017-01-01 2.50
2017-01-02 5.75
2017-01-03 3.75
2017-01-04 3.50
2017-01-05 5.00
2017-01-06 6.25
Freq: D, dtype: float64
1 | s = pd.Series([1,3,5,np.nan,6,8], index=dates) |
2017-01-01 1.0
2017-01-02 3.0
2017-01-03 5.0
2017-01-04 NaN
2017-01-05 6.0
2017-01-06 8.0
Freq: D, dtype: float64
1 | s=s.shift(2) |
2017-01-01 NaN
2017-01-02 NaN
2017-01-03 1.0
2017-01-04 3.0
2017-01-05 5.0
2017-01-06 NaN
Freq: D, dtype: float64
1 | df.sub(s, axis='index') |
A | B | C | D | |
---|---|---|---|---|
2017-01-01 | NaN | NaN | NaN | NaN |
2017-01-02 | NaN | NaN | NaN | NaN |
2017-01-03 | 1.0 | 3.0 | 3.0 | 4.0 |
2017-01-04 | 5.0 | 1.0 | -3.0 | -1.0 |
2017-01-05 | 0.0 | -4.0 | 0.0 | 4.0 |
2017-01-06 | NaN | NaN | NaN | NaN |
1 | # apply函数 |
A | B | C | D | |
---|---|---|---|---|
2017-01-01 | 0 | 2 | 2 | 6 |
2017-01-02 | 5 | 6 | 11 | 11 |
2017-01-03 | 7 | 10 | 15 | 16 |
2017-01-04 | 15 | 14 | 15 | 18 |
2017-01-05 | 20 | 15 | 20 | 27 |
2017-01-06 | 28 | 20 | 26 | 33 |
1 | df.apply(lambda x: x.max() - x.min()) |
A 8
B 4
C 9
D 7
dtype: int64
1 | #计数 |
0 3
1 0
2 3
3 0
4 0
5 0
6 2
7 6
8 2
9 6
dtype: int64
1 | s.value_counts() |
0 4
6 2
3 2
2 2
dtype: int64
1 | #字符串方法 |
0 A
1 B
2 C
3 Aaba
4 Baca
5 NaN
6 CABA
7 dog
8 cat
dtype: object
1 | s.str.lower() |
0 a
1 b
2 c
3 aaba
4 baca
5 NaN
6 caba
7 dog
8 cat
dtype: object
合并
分为concat拼接、merge连接、append附加
1 | #concat |
0 | 1 | 2 | 3 | |
---|---|---|---|---|
0 | -0.085320 | 1.206620 | 1.618956 | -0.020682 |
1 | -0.026634 | -1.078752 | -0.996088 | -0.252160 |
2 | 0.245989 | 0.846779 | 1.552243 | -1.210550 |
3 | 0.043176 | 1.536280 | -0.664456 | 1.274851 |
4 | -0.248457 | 0.061227 | 0.055422 | -0.191773 |
5 | -1.047112 | 0.389239 | 1.542080 | -0.545886 |
6 | -0.669885 | -0.376372 | 1.258315 | 0.035060 |
7 | 0.529041 | 1.427947 | 0.151411 | -1.840725 |
8 | 0.435908 | -0.023168 | 0.172505 | 1.062092 |
9 | -0.171858 | -0.599918 | 1.004990 | 0.233463 |
1 | pieces = [df[:3], df[3:7], df[7:]] |
1 | pieces[0] |
0 | 1 | 2 | 3 | |
---|---|---|---|---|
0 | -0.085320 | 1.206620 | 1.618956 | -0.020682 |
1 | -0.026634 | -1.078752 | -0.996088 | -0.252160 |
2 | 0.245989 | 0.846779 | 1.552243 | -1.210550 |
1 | pieces[1] |
0 | 1 | 2 | 3 | |
---|---|---|---|---|
3 | 0.043176 | 1.536280 | -0.664456 | 1.274851 |
4 | -0.248457 | 0.061227 | 0.055422 | -0.191773 |
5 | -1.047112 | 0.389239 | 1.542080 | -0.545886 |
6 | -0.669885 | -0.376372 | 1.258315 | 0.035060 |
1 | pieces[2] |
0 | 1 | 2 | 3 | |
---|---|---|---|---|
7 | 0.529041 | 1.427947 | 0.151411 | -1.840725 |
8 | 0.435908 | -0.023168 | 0.172505 | 1.062092 |
9 | -0.171858 | -0.599918 | 1.004990 | 0.233463 |
1 | pd.concat(pieces) |
0 | 1 | 2 | 3 | |
---|---|---|---|---|
0 | -0.085320 | 1.206620 | 1.618956 | -0.020682 |
1 | -0.026634 | -1.078752 | -0.996088 | -0.252160 |
2 | 0.245989 | 0.846779 | 1.552243 | -1.210550 |
3 | 0.043176 | 1.536280 | -0.664456 | 1.274851 |
4 | -0.248457 | 0.061227 | 0.055422 | -0.191773 |
5 | -1.047112 | 0.389239 | 1.542080 | -0.545886 |
6 | -0.669885 | -0.376372 | 1.258315 | 0.035060 |
7 | 0.529041 | 1.427947 | 0.151411 | -1.840725 |
8 | 0.435908 | -0.023168 | 0.172505 | 1.062092 |
9 | -0.171858 | -0.599918 | 1.004990 | 0.233463 |
1 | #merge |
1 | left |
key | lval | |
---|---|---|
0 | foo | 1 |
1 | foo | 2 |
1 | right |
key | rval | |
---|---|---|
0 | foo | 4 |
1 | foo | 5 |
1 | pd.merge(left, right, on='key') |
key | lval | rval | |
---|---|---|---|
0 | foo | 1 | 4 |
1 | foo | 1 | 5 |
2 | foo | 2 | 4 |
3 | foo | 2 | 5 |
1 | #append |
A | B | C | D | |
---|---|---|---|---|
0 | -0.279258 | -0.352347 | -1.866664 | -0.756606 |
1 | 0.007247 | -1.684264 | -0.671554 | 1.103426 |
2 | 0.716867 | -1.119185 | -0.062760 | -0.424570 |
3 | 0.346070 | -0.160277 | 0.480741 | -3.739629 |
4 | -1.947606 | -0.931807 | 1.361200 | -0.944480 |
5 | 1.063839 | 0.042823 | -0.485761 | 0.014667 |
6 | -0.171306 | -1.320569 | 0.861081 | -0.821154 |
7 | -0.008423 | -1.027952 | 0.233170 | 0.620665 |
1 | s = df.iloc[3] |
A | B | C | D | |
---|---|---|---|---|
0 | -0.279258 | -0.352347 | -1.866664 | -0.756606 |
1 | 0.007247 | -1.684264 | -0.671554 | 1.103426 |
2 | 0.716867 | -1.119185 | -0.062760 | -0.424570 |
3 | 0.346070 | -0.160277 | 0.480741 | -3.739629 |
4 | -1.947606 | -0.931807 | 1.361200 | -0.944480 |
5 | 1.063839 | 0.042823 | -0.485761 | 0.014667 |
6 | -0.171306 | -1.320569 | 0.861081 | -0.821154 |
7 | -0.008423 | -1.027952 | 0.233170 | 0.620665 |
8 | 0.346070 | -0.160277 | 0.480741 | -3.739629 |
分组
1 | df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'], |
A | B | C | D | |
---|---|---|---|---|
0 | foo | one | -0.375015 | -0.371934 |
1 | bar | one | -0.837523 | 0.733000 |
2 | foo | two | -0.248089 | -1.269897 |
3 | bar | three | 0.320737 | 0.305018 |
4 | foo | two | -2.005969 | -1.319570 |
5 | bar | two | -1.011349 | -0.159672 |
6 | foo | one | 2.018282 | 1.061732 |
7 | foo | three | -0.642915 | -0.626796 |
1 | df.groupby('A').sum() |
C | D | |
---|---|---|
A | ||
bar | -1.528135 | 0.878347 |
foo | -1.253705 | -2.526466 |
1 | df.groupby(['A','B']).sum() |
C | D | ||
---|---|---|---|
A | B | ||
bar | one | -0.837523 | 0.733000 |
three | 0.320737 | 0.305018 | |
two | -1.011349 | -0.159672 | |
foo | one | 1.643267 | 0.689798 |
three | -0.642915 | -0.626796 | |
two | -2.254058 | -2.589467 |
pivot table
1 | df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 3, |
A | B | C | D | E | |
---|---|---|---|---|---|
0 | one | A | foo | 0.390424 | -0.611496 |
1 | one | B | foo | 0.261406 | -1.746241 |
2 | two | C | foo | 1.013178 | -1.047726 |
3 | three | A | bar | 0.874502 | -0.251756 |
4 | one | B | bar | 0.990922 | 0.956082 |
5 | one | C | bar | 0.041987 | 1.418088 |
6 | two | A | foo | 0.487429 | 0.315077 |
7 | three | B | foo | 0.072805 | -0.097474 |
8 | one | C | foo | -0.666952 | 1.655099 |
9 | one | A | bar | 1.647375 | -1.670791 |
10 | two | B | bar | -0.907300 | -1.481746 |
11 | three | C | bar | 0.214953 | -0.420732 |
1 | pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C']) |
C | bar | foo | |
---|---|---|---|
A | B | ||
one | A | 1.647375 | 0.390424 |
B | 0.990922 | 0.261406 | |
C | 0.041987 | -0.666952 | |
three | A | 0.874502 | NaN |
B | NaN | 0.072805 | |
C | 0.214953 | NaN | |
two | A | NaN | 0.487429 |
B | -0.907300 | NaN | |
C | NaN | 1.013178 |
plotting
1 | import numpy as np |
导出和导入,支持csv、hdf5和excel
1 | df.to_csv('file/foo.csv') |
Unnamed: 0 | A | B | C | D | E | |
---|---|---|---|---|---|---|
0 | 0 | one | A | foo | 0.390424 | -0.611496 |
1 | 1 | one | B | foo | 0.261406 | -1.746241 |
2 | 2 | two | C | foo | 1.013178 | -1.047726 |
3 | 3 | three | A | bar | 0.874502 | -0.251756 |
4 | 4 | one | B | bar | 0.990922 | 0.956082 |
5 | 5 | one | C | bar | 0.041987 | 1.418088 |
6 | 6 | two | A | foo | 0.487429 | 0.315077 |
7 | 7 | three | B | foo | 0.072805 | -0.097474 |
8 | 8 | one | C | foo | -0.666952 | 1.655099 |
9 | 9 | one | A | bar | 1.647375 | -1.670791 |
10 | 10 | two | B | bar | -0.907300 | -1.481746 |
11 | 11 | three | C | bar | 0.214953 | -0.420732 |
1 | #excel导入导出,导出需要安装openpyxl库,导入需要安装xlrd库 |
读取MySQL
1 | import pandas as pd |