简介
- Pandas是Python的一个数据分析包,该工具为解决数据分析任务而创建。
 - Pandas纳入大量库和标准数据模型,提供高效的操作数据集所需的工具。
 - Pandas提供大量能使我们快速便捷地处理数据的函数和方法。
 - Pandas是字典形式,基于NumPy创建,让NumPy为中心的应用变得更加简单。
 - 对于金融行业的用户,pandas提供了大量适用于金融数据的高性能时间序列功能和工具。
 - Excel 2007及其以后的版本的最大行数是1048576,最大列数是16384,超过这个规模的数据可使用Pandas
 
1  | import pandas as pd  | 
Series
1  | s = pd.Series([1,3,5,np.nan,6,8])  | 
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
DataFrame
1  | dates = pd.date_range('20170101', periods=6)  | 
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
               '2017-01-05', '2017-01-06'],
              dtype='datetime64[ns]', freq='D')
1  | df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))  | 
| A | B | C | D | |
|---|---|---|---|---|
| 2017-01-01 | -0.119313 | -0.165029 | -1.065599 | 0.224712 | 
| 2017-01-02 | -0.423993 | 0.705954 | -0.349143 | -0.208349 | 
| 2017-01-03 | 0.259465 | 0.896735 | 0.033814 | -1.135184 | 
| 2017-01-04 | -1.568006 | 0.685092 | -0.931491 | 0.323591 | 
| 2017-01-05 | 0.050029 | 1.938641 | -0.193642 | 0.199692 | 
| 2017-01-06 | -0.041061 | -0.845943 | 1.542987 | 0.354031 | 
1  | df.head(3)  | 
| A | B | C | D | |
|---|---|---|---|---|
| 2017-01-01 | 0.197225 | -0.666883 | 0.855806 | 0.253851 | 
| 2017-01-02 | -0.291572 | 0.011976 | -1.502192 | 0.051144 | 
| 2017-01-03 | -0.670941 | 1.815718 | 0.503071 | 0.361448 | 
1  | df.tail(3)  | 
| A | B | C | D | |
|---|---|---|---|---|
| 2017-01-04 | 0.465035 | 0.465077 | -0.018313 | 1.338603 | 
| 2017-01-05 | -0.646332 | 1.261231 | 0.058086 | -1.643476 | 
| 2017-01-06 | 0.375105 | 0.247912 | 0.998861 | -0.114123 | 
1  | df2 = pd.DataFrame({ 'A' : 1.,  | 
| A | B | C | D | E | F | |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo | 
| 1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo | 
| 2 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo | 
| 3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo | 
1  | df2.dtypes  | 
A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
1  | # 列的序号名字  | 
Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')
1  | # 行的序号  | 
Int64Index([0, 1, 2, 3], dtype='int64')
1  | # 把每个值进行打印出来  | 
array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],
      dtype=object)
1  | # 数字总结  | 
| A | C | D | |
|---|---|---|---|
| count | 4.0 | 4.0 | 4.0 | 
| mean | 1.0 | 1.0 | 3.0 | 
| std | 0.0 | 0.0 | 0.0 | 
| min | 1.0 | 1.0 | 3.0 | 
| 25% | 1.0 | 1.0 | 3.0 | 
| 50% | 1.0 | 1.0 | 3.0 | 
| 75% | 1.0 | 1.0 | 3.0 | 
| max | 1.0 | 1.0 | 3.0 | 
1  | # 翻转数据  | 
| 0 | 1 | 2 | 3 | |
|---|---|---|---|---|
| A | 1 | 1 | 1 | 1 | 
| B | 2013-01-02 00:00:00 | 2013-01-02 00:00:00 | 2013-01-02 00:00:00 | 2013-01-02 00:00:00 | 
| C | 1 | 1 | 1 | 1 | 
| D | 3 | 3 | 3 | 3 | 
| E | test | train | test | train | 
| F | foo | foo | foo | foo | 
1  | # axis等于1按列进行排序 如ABCDEFG 然后ascending倒叙进行显示  | 
| F | E | D | C | B | A | |
|---|---|---|---|---|---|---|
| 0 | foo | test | 3 | 1.0 | 2013-01-02 | 1.0 | 
| 1 | foo | train | 3 | 1.0 | 2013-01-02 | 1.0 | 
| 2 | foo | test | 3 | 1.0 | 2013-01-02 | 1.0 | 
| 3 | foo | train | 3 | 1.0 | 2013-01-02 | 1.0 | 
1  | #axis等于0按行进行排序 如0123 然后ascending倒叙进行显示  | 
| A | B | C | D | E | F | |
|---|---|---|---|---|---|---|
| 3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo | 
| 2 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo | 
| 1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo | 
| 0 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo | 
1  | #按值进行排序  | 
| A | B | C | D | E | F | |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo | 
| 2 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo | 
| 1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo | 
| 3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo | 
选择
1  | df['A']  | 
2017-01-01    0.549772
2017-01-02    0.969333
2017-01-03    0.106553
2017-01-04    0.465035
2017-01-05   -0.646332
2017-01-06    0.375105
Freq: D, Name: A, dtype: float64
1  | #切片选择  | 
| A | B | C | D | |
|---|---|---|---|---|
| 2017-01-01 | 0.549772 | -0.677954 | 0.616294 | 1.032977 | 
| 2017-01-02 | 0.969333 | -0.067681 | -0.784801 | 0.800007 | 
| 2017-01-03 | 0.106553 | -1.707697 | 0.668063 | 0.345128 | 
1  | df['20170102':'20170104']  | 
| A | B | C | D | |
|---|---|---|---|---|
| 2017-01-02 | 0.969333 | -0.067681 | -0.784801 | 0.800007 | 
| 2017-01-03 | 0.106553 | -1.707697 | 0.668063 | 0.345128 | 
| 2017-01-04 | 0.465035 | 0.465077 | -0.018313 | 1.338603 | 
1  | # loc方法为根据行标签label选择  | 
A    0.549772
B   -0.677954
C    0.616294
D    1.032977
Name: 2017-01-01 00:00:00, dtype: float64
1  | df.loc[dates[0],'A']  | 
0.5497720064165944
1  | df.loc[:,['A','B']]  | 
| A | B | |
|---|---|---|
| 2017-01-01 | 0.549772 | -0.677954 | 
| 2017-01-02 | 0.969333 | -0.067681 | 
| 2017-01-03 | 0.106553 | -1.707697 | 
| 2017-01-04 | 0.465035 | 0.465077 | 
| 2017-01-05 | -0.646332 | 1.261231 | 
| 2017-01-06 | 0.375105 | 0.247912 | 
1  | df.loc['20170102':'20170104',['A','B']]  | 
| A | B | |
|---|---|---|
| 2017-01-02 | 0.969333 | -0.067681 | 
| 2017-01-03 | 0.106553 | -1.707697 | 
| 2017-01-04 | 0.465035 | 0.465077 | 
1  | # iloc方法为根据position选择  | 
A    0.465035
B    0.465077
C   -0.018313
D    1.338603
Name: 2017-01-04 00:00:00, dtype: float64
1  | df.iloc[3:5,0:2]  | 
| A | B | |
|---|---|---|
| 2017-01-04 | 0.465035 | 0.465077 | 
| 2017-01-05 | -0.646332 | 1.261231 | 
1  | df.iloc[[1,2,4],[0,2]]  | 
| A | C | |
|---|---|---|
| 2017-01-02 | 0.969333 | -0.784801 | 
| 2017-01-03 | 0.106553 | 0.668063 | 
| 2017-01-05 | -0.646332 | 0.058086 | 
1  | df.iloc[1,1]  | 
-0.06768075680943383
1  | df.iat[1,1]  | 
-0.06768075680943383
1  | # 根据布尔表达式筛选  | 
| A | B | C | D | |
|---|---|---|---|---|
| 2017-01-01 | 0.549772 | NaN | 0.616294 | 1.032977 | 
| 2017-01-02 | 0.969333 | NaN | NaN | 0.800007 | 
| 2017-01-03 | 0.106553 | NaN | 0.668063 | 0.345128 | 
| 2017-01-04 | 0.465035 | 0.465077 | NaN | 1.338603 | 
| 2017-01-05 | NaN | 1.261231 | 0.058086 | NaN | 
| 2017-01-06 | 0.375105 | 0.247912 | 0.998861 | NaN | 
1  | df[df.A > 0]  | 
| A | B | C | D | |
|---|---|---|---|---|
| 2017-01-01 | 0.549772 | -0.677954 | 0.616294 | 1.032977 | 
| 2017-01-02 | 0.969333 | -0.067681 | -0.784801 | 0.800007 | 
| 2017-01-03 | 0.106553 | -1.707697 | 0.668063 | 0.345128 | 
| 2017-01-04 | 0.465035 | 0.465077 | -0.018313 | 1.338603 | 
| 2017-01-06 | 0.375105 | 0.247912 | 0.998861 | -0.114123 | 
1  | # 使用isin  | 
| A | B | C | D | E | F | |
|---|---|---|---|---|---|---|
| 1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo | 
| 3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo | 
设置数据
1  | s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20170102', periods=6))  | 
2017-01-02    1
2017-01-03    2
2017-01-04    3
2017-01-05    4
2017-01-06    5
2017-01-07    6
Freq: D, dtype: int64
1  | df['F'] = s1  | 
| A | B | C | D | F | |
|---|---|---|---|---|---|
| 2017-01-01 | 0.549772 | -0.677954 | 0.616294 | 1.032977 | NaN | 
| 2017-01-02 | 0.969333 | -0.067681 | -0.784801 | 0.800007 | 1.0 | 
| 2017-01-03 | 0.106553 | -1.707697 | 0.668063 | 0.345128 | 2.0 | 
| 2017-01-04 | 0.465035 | 0.465077 | -0.018313 | 1.338603 | 3.0 | 
| 2017-01-05 | -0.646332 | 1.261231 | 0.058086 | -1.643476 | 4.0 | 
| 2017-01-06 | 0.375105 | 0.247912 | 0.998861 | -0.114123 | 5.0 | 
1  | df.at[dates[0],'A'] = 0  | 
| A | B | C | D | F | |
|---|---|---|---|---|---|
| 2017-01-01 | 0.000000 | -0.677954 | 0.616294 | 1.032977 | NaN | 
| 2017-01-02 | 0.969333 | -0.067681 | -0.784801 | 0.800007 | 1.0 | 
| 2017-01-03 | 0.106553 | -1.707697 | 0.668063 | 0.345128 | 2.0 | 
| 2017-01-04 | 0.465035 | 0.465077 | -0.018313 | 1.338603 | 3.0 | 
| 2017-01-05 | -0.646332 | 1.261231 | 0.058086 | -1.643476 | 4.0 | 
| 2017-01-06 | 0.375105 | 0.247912 | 0.998861 | -0.114123 | 5.0 | 
1  | df.iat[0,1] = 0  | 
| A | B | C | D | F | |
|---|---|---|---|---|---|
| 2017-01-01 | 0.000000 | 0.000000 | 0.616294 | 1.032977 | NaN | 
| 2017-01-02 | 0.969333 | -0.067681 | -0.784801 | 0.800007 | 1.0 | 
| 2017-01-03 | 0.106553 | -1.707697 | 0.668063 | 0.345128 | 2.0 | 
| 2017-01-04 | 0.465035 | 0.465077 | -0.018313 | 1.338603 | 3.0 | 
| 2017-01-05 | -0.646332 | 1.261231 | 0.058086 | -1.643476 | 4.0 | 
| 2017-01-06 | 0.375105 | 0.247912 | 0.998861 | -0.114123 | 5.0 | 
1  | # 使用numpy数组  | 
| A | B | C | D | F | |
|---|---|---|---|---|---|
| 2017-01-01 | 0.000000 | 0.000000 | 0.616294 | 5 | NaN | 
| 2017-01-02 | 0.969333 | -0.067681 | -0.784801 | 5 | 1.0 | 
| 2017-01-03 | 0.106553 | -1.707697 | 0.668063 | 5 | 2.0 | 
| 2017-01-04 | 0.465035 | 0.465077 | -0.018313 | 5 | 3.0 | 
| 2017-01-05 | -0.646332 | 1.261231 | 0.058086 | 5 | 4.0 | 
| 2017-01-06 | 0.375105 | 0.247912 | 0.998861 | 5 | 5.0 | 
1  | # where语句  | 
| A | B | C | D | F | |
|---|---|---|---|---|---|
| 2017-01-01 | 0.000000 | 0.000000 | -0.616294 | -5 | NaN | 
| 2017-01-02 | -0.969333 | -0.067681 | -0.784801 | -5 | -1.0 | 
| 2017-01-03 | -0.106553 | -1.707697 | -0.668063 | -5 | -2.0 | 
| 2017-01-04 | -0.465035 | -0.465077 | -0.018313 | -5 | -3.0 | 
| 2017-01-05 | -0.646332 | -1.261231 | -0.058086 | -5 | -4.0 | 
| 2017-01-06 | -0.375105 | -0.247912 | -0.998861 | -5 | -5.0 | 
处理丢失数据
1  | df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])  | 
| A | B | C | D | F | E | |
|---|---|---|---|---|---|---|
| 2017-01-01 | 0.000000 | 0.000000 | -0.616294 | -5 | NaN | 1.0 | 
| 2017-01-02 | -0.969333 | -0.067681 | -0.784801 | -5 | -1.0 | 1.0 | 
| 2017-01-03 | -0.106553 | -1.707697 | -0.668063 | -5 | -2.0 | NaN | 
| 2017-01-04 | -0.465035 | -0.465077 | -0.018313 | -5 | -3.0 | NaN | 
1  | df1.dropna(how='any')  | 
| A | B | C | D | F | E | |
|---|---|---|---|---|---|---|
| 2017-01-02 | -0.969333 | -0.067681 | -0.784801 | -5 | -1.0 | 1.0 | 
1  | df1.fillna(value=5)  | 
| A | B | C | D | F | E | |
|---|---|---|---|---|---|---|
| 2017-01-01 | 0.000000 | 0.000000 | -0.616294 | -5 | 5.0 | 1.0 | 
| 2017-01-02 | -0.969333 | -0.067681 | -0.784801 | -5 | -1.0 | 1.0 | 
| 2017-01-03 | -0.106553 | -1.707697 | -0.668063 | -5 | -2.0 | 5.0 | 
| 2017-01-04 | -0.465035 | -0.465077 | -0.018313 | -5 | -3.0 | 5.0 | 
1  | pd.isna(df1)  | 
| A | B | C | D | F | E | |
|---|---|---|---|---|---|---|
| 2017-01-01 | False | False | False | False | True | False | 
| 2017-01-02 | False | False | False | False | False | False | 
| 2017-01-03 | False | False | False | False | False | True | 
| 2017-01-04 | False | False | False | False | False | True | 
运算
1  | # 统计  | 
| A | B | C | D | |
|---|---|---|---|---|
| 2017-01-01 | 0 | 2 | 2 | 6 | 
| 2017-01-02 | 5 | 4 | 9 | 5 | 
| 2017-01-03 | 2 | 4 | 4 | 5 | 
| 2017-01-04 | 8 | 4 | 0 | 2 | 
| 2017-01-05 | 5 | 1 | 5 | 9 | 
| 2017-01-06 | 8 | 5 | 6 | 6 | 
1  | df.mean()  | 
A    4.666667
B    3.333333
C    4.333333
D    5.500000
dtype: float64
1  | df.mean(1)  | 
2017-01-01    2.50
2017-01-02    5.75
2017-01-03    3.75
2017-01-04    3.50
2017-01-05    5.00
2017-01-06    6.25
Freq: D, dtype: float64
1  | s = pd.Series([1,3,5,np.nan,6,8], index=dates)  | 
2017-01-01    1.0
2017-01-02    3.0
2017-01-03    5.0
2017-01-04    NaN
2017-01-05    6.0
2017-01-06    8.0
Freq: D, dtype: float64
1  | s=s.shift(2)  | 
2017-01-01    NaN
2017-01-02    NaN
2017-01-03    1.0
2017-01-04    3.0
2017-01-05    5.0
2017-01-06    NaN
Freq: D, dtype: float64
1  | df.sub(s, axis='index')  | 
| A | B | C | D | |
|---|---|---|---|---|
| 2017-01-01 | NaN | NaN | NaN | NaN | 
| 2017-01-02 | NaN | NaN | NaN | NaN | 
| 2017-01-03 | 1.0 | 3.0 | 3.0 | 4.0 | 
| 2017-01-04 | 5.0 | 1.0 | -3.0 | -1.0 | 
| 2017-01-05 | 0.0 | -4.0 | 0.0 | 4.0 | 
| 2017-01-06 | NaN | NaN | NaN | NaN | 
1  | # apply函数  | 
| A | B | C | D | |
|---|---|---|---|---|
| 2017-01-01 | 0 | 2 | 2 | 6 | 
| 2017-01-02 | 5 | 6 | 11 | 11 | 
| 2017-01-03 | 7 | 10 | 15 | 16 | 
| 2017-01-04 | 15 | 14 | 15 | 18 | 
| 2017-01-05 | 20 | 15 | 20 | 27 | 
| 2017-01-06 | 28 | 20 | 26 | 33 | 
1  | df.apply(lambda x: x.max() - x.min())  | 
A    8
B    4
C    9
D    7
dtype: int64
1  | #计数  | 
0    3
1    0
2    3
3    0
4    0
5    0
6    2
7    6
8    2
9    6
dtype: int64
1  | s.value_counts()  | 
0    4
6    2
3    2
2    2
dtype: int64
1  | #字符串方法  | 
0       A
1       B
2       C
3    Aaba
4    Baca
5     NaN
6    CABA
7     dog
8     cat
dtype: object
1  | s.str.lower()  | 
0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object
合并
分为concat拼接、merge连接、append附加
1  | #concat  | 
| 0 | 1 | 2 | 3 | |
|---|---|---|---|---|
| 0 | -0.085320 | 1.206620 | 1.618956 | -0.020682 | 
| 1 | -0.026634 | -1.078752 | -0.996088 | -0.252160 | 
| 2 | 0.245989 | 0.846779 | 1.552243 | -1.210550 | 
| 3 | 0.043176 | 1.536280 | -0.664456 | 1.274851 | 
| 4 | -0.248457 | 0.061227 | 0.055422 | -0.191773 | 
| 5 | -1.047112 | 0.389239 | 1.542080 | -0.545886 | 
| 6 | -0.669885 | -0.376372 | 1.258315 | 0.035060 | 
| 7 | 0.529041 | 1.427947 | 0.151411 | -1.840725 | 
| 8 | 0.435908 | -0.023168 | 0.172505 | 1.062092 | 
| 9 | -0.171858 | -0.599918 | 1.004990 | 0.233463 | 
1  | pieces = [df[:3], df[3:7], df[7:]]  | 
1  | pieces[0]  | 
| 0 | 1 | 2 | 3 | |
|---|---|---|---|---|
| 0 | -0.085320 | 1.206620 | 1.618956 | -0.020682 | 
| 1 | -0.026634 | -1.078752 | -0.996088 | -0.252160 | 
| 2 | 0.245989 | 0.846779 | 1.552243 | -1.210550 | 
1  | pieces[1]  | 
| 0 | 1 | 2 | 3 | |
|---|---|---|---|---|
| 3 | 0.043176 | 1.536280 | -0.664456 | 1.274851 | 
| 4 | -0.248457 | 0.061227 | 0.055422 | -0.191773 | 
| 5 | -1.047112 | 0.389239 | 1.542080 | -0.545886 | 
| 6 | -0.669885 | -0.376372 | 1.258315 | 0.035060 | 
1  | pieces[2]  | 
| 0 | 1 | 2 | 3 | |
|---|---|---|---|---|
| 7 | 0.529041 | 1.427947 | 0.151411 | -1.840725 | 
| 8 | 0.435908 | -0.023168 | 0.172505 | 1.062092 | 
| 9 | -0.171858 | -0.599918 | 1.004990 | 0.233463 | 
1  | pd.concat(pieces)  | 
| 0 | 1 | 2 | 3 | |
|---|---|---|---|---|
| 0 | -0.085320 | 1.206620 | 1.618956 | -0.020682 | 
| 1 | -0.026634 | -1.078752 | -0.996088 | -0.252160 | 
| 2 | 0.245989 | 0.846779 | 1.552243 | -1.210550 | 
| 3 | 0.043176 | 1.536280 | -0.664456 | 1.274851 | 
| 4 | -0.248457 | 0.061227 | 0.055422 | -0.191773 | 
| 5 | -1.047112 | 0.389239 | 1.542080 | -0.545886 | 
| 6 | -0.669885 | -0.376372 | 1.258315 | 0.035060 | 
| 7 | 0.529041 | 1.427947 | 0.151411 | -1.840725 | 
| 8 | 0.435908 | -0.023168 | 0.172505 | 1.062092 | 
| 9 | -0.171858 | -0.599918 | 1.004990 | 0.233463 | 
1  | #merge  | 
1  | left  | 
| key | lval | |
|---|---|---|
| 0 | foo | 1 | 
| 1 | foo | 2 | 
1  | right  | 
| key | rval | |
|---|---|---|
| 0 | foo | 4 | 
| 1 | foo | 5 | 
1  | pd.merge(left, right, on='key')  | 
| key | lval | rval | |
|---|---|---|---|
| 0 | foo | 1 | 4 | 
| 1 | foo | 1 | 5 | 
| 2 | foo | 2 | 4 | 
| 3 | foo | 2 | 5 | 
1  | #append  | 
| A | B | C | D | |
|---|---|---|---|---|
| 0 | -0.279258 | -0.352347 | -1.866664 | -0.756606 | 
| 1 | 0.007247 | -1.684264 | -0.671554 | 1.103426 | 
| 2 | 0.716867 | -1.119185 | -0.062760 | -0.424570 | 
| 3 | 0.346070 | -0.160277 | 0.480741 | -3.739629 | 
| 4 | -1.947606 | -0.931807 | 1.361200 | -0.944480 | 
| 5 | 1.063839 | 0.042823 | -0.485761 | 0.014667 | 
| 6 | -0.171306 | -1.320569 | 0.861081 | -0.821154 | 
| 7 | -0.008423 | -1.027952 | 0.233170 | 0.620665 | 
1  | s = df.iloc[3]  | 
| A | B | C | D | |
|---|---|---|---|---|
| 0 | -0.279258 | -0.352347 | -1.866664 | -0.756606 | 
| 1 | 0.007247 | -1.684264 | -0.671554 | 1.103426 | 
| 2 | 0.716867 | -1.119185 | -0.062760 | -0.424570 | 
| 3 | 0.346070 | -0.160277 | 0.480741 | -3.739629 | 
| 4 | -1.947606 | -0.931807 | 1.361200 | -0.944480 | 
| 5 | 1.063839 | 0.042823 | -0.485761 | 0.014667 | 
| 6 | -0.171306 | -1.320569 | 0.861081 | -0.821154 | 
| 7 | -0.008423 | -1.027952 | 0.233170 | 0.620665 | 
| 8 | 0.346070 | -0.160277 | 0.480741 | -3.739629 | 
分组
1  | df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'],  | 
| A | B | C | D | |
|---|---|---|---|---|
| 0 | foo | one | -0.375015 | -0.371934 | 
| 1 | bar | one | -0.837523 | 0.733000 | 
| 2 | foo | two | -0.248089 | -1.269897 | 
| 3 | bar | three | 0.320737 | 0.305018 | 
| 4 | foo | two | -2.005969 | -1.319570 | 
| 5 | bar | two | -1.011349 | -0.159672 | 
| 6 | foo | one | 2.018282 | 1.061732 | 
| 7 | foo | three | -0.642915 | -0.626796 | 
1  | df.groupby('A').sum()  | 
| C | D | |
|---|---|---|
| A | ||
| bar | -1.528135 | 0.878347 | 
| foo | -1.253705 | -2.526466 | 
1  | df.groupby(['A','B']).sum()  | 
| C | D | ||
|---|---|---|---|
| A | B | ||
| bar | one | -0.837523 | 0.733000 | 
| three | 0.320737 | 0.305018 | |
| two | -1.011349 | -0.159672 | |
| foo | one | 1.643267 | 0.689798 | 
| three | -0.642915 | -0.626796 | |
| two | -2.254058 | -2.589467 | 
pivot table
1  | df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 3,  | 
| A | B | C | D | E | |
|---|---|---|---|---|---|
| 0 | one | A | foo | 0.390424 | -0.611496 | 
| 1 | one | B | foo | 0.261406 | -1.746241 | 
| 2 | two | C | foo | 1.013178 | -1.047726 | 
| 3 | three | A | bar | 0.874502 | -0.251756 | 
| 4 | one | B | bar | 0.990922 | 0.956082 | 
| 5 | one | C | bar | 0.041987 | 1.418088 | 
| 6 | two | A | foo | 0.487429 | 0.315077 | 
| 7 | three | B | foo | 0.072805 | -0.097474 | 
| 8 | one | C | foo | -0.666952 | 1.655099 | 
| 9 | one | A | bar | 1.647375 | -1.670791 | 
| 10 | two | B | bar | -0.907300 | -1.481746 | 
| 11 | three | C | bar | 0.214953 | -0.420732 | 
1  | pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])  | 
| C | bar | foo | |
|---|---|---|---|
| A | B | ||
| one | A | 1.647375 | 0.390424 | 
| B | 0.990922 | 0.261406 | |
| C | 0.041987 | -0.666952 | |
| three | A | 0.874502 | NaN | 
| B | NaN | 0.072805 | |
| C | 0.214953 | NaN | |
| two | A | NaN | 0.487429 | 
| B | -0.907300 | NaN | |
| C | NaN | 1.013178 | 
plotting
1  | import numpy as np  | 

导出和导入,支持csv、hdf5和excel
1  | df.to_csv('file/foo.csv')  | 
| Unnamed: 0 | A | B | C | D | E | |
|---|---|---|---|---|---|---|
| 0 | 0 | one | A | foo | 0.390424 | -0.611496 | 
| 1 | 1 | one | B | foo | 0.261406 | -1.746241 | 
| 2 | 2 | two | C | foo | 1.013178 | -1.047726 | 
| 3 | 3 | three | A | bar | 0.874502 | -0.251756 | 
| 4 | 4 | one | B | bar | 0.990922 | 0.956082 | 
| 5 | 5 | one | C | bar | 0.041987 | 1.418088 | 
| 6 | 6 | two | A | foo | 0.487429 | 0.315077 | 
| 7 | 7 | three | B | foo | 0.072805 | -0.097474 | 
| 8 | 8 | one | C | foo | -0.666952 | 1.655099 | 
| 9 | 9 | one | A | bar | 1.647375 | -1.670791 | 
| 10 | 10 | two | B | bar | -0.907300 | -1.481746 | 
| 11 | 11 | three | C | bar | 0.214953 | -0.420732 | 
1  | #excel导入导出,导出需要安装openpyxl库,导入需要安装xlrd库  | 
读取MySQL
1  | import pandas as pd  |