Skip to content
ChannelCMT edited this page Jun 3, 2019 · 1 revision

Panel

pandas的panel是由items, major_xs, minor_xs构成的三维数据,方便管理三维数据。 multiIndex:一种多索引的数据格式,可将三维数据直观展示出来

import warnings
warnings.filterwarnings('ignore')

1、 字典转Panel

from datetime import datetime
import pandas as pd
symbol=['600036.XSHG','600050.XSHG','601318.XSHG']
data_dict = {}
for s in symbol:
    data =  pd.read_excel('sz50.xlsx',sheetname=s, index_col='datetime')
    data_dict[s] = data.loc['2017-03-21':'2017-05-10']
PN = pd.Panel(data_dict)
print(PN)
<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 34 (major_axis) x 5 (minor_axis)
Items axis: 600036.XSHG to 601318.XSHG
Major_axis axis: 2017-03-21 15:00:00 to 2017-05-10 15:00:00
Minor_axis axis: close to volume

2、 显示数据 to_frame()

MI = PN.to_frame()
print(MI.head())
print(round(MI.head(),1))
                            600036.XSHG   600050.XSHG  601318.XSHG
datetime            minor                                         
2017-03-21 15:00:00 close         72.36  9.270000e+00        81.28
                    high          72.67  9.360000e+00        81.28
                    low           71.97  9.210000e+00        80.34
                    open          72.36  9.320000e+00        80.70
                    volume  40184305.00  1.473474e+08  96990107.00
                            600036.XSHG  600050.XSHG  601318.XSHG
datetime            minor                                        
2017-03-21 15:00:00 close          72.4          9.3         81.3
                    high           72.7          9.4         81.3
                    low            72.0          9.2         80.3
                    open           72.4          9.3         80.7
                    volume   40184305.0  147347370.0   96990107.0

3、 修改顺序 transpose()

MI_tp = PN.transpose(2,1,0)
print(MI_tp)
MI_tp = PN.transpose(2,1,0).to_frame()
print(MI_tp.head())
<class 'pandas.core.panel.Panel'>
Dimensions: 5 (items) x 34 (major_axis) x 3 (minor_axis)
Items axis: close to volume
Major_axis axis: 2017-03-21 15:00:00 to 2017-05-10 15:00:00
Minor_axis axis: 600036.XSHG to 601318.XSHG
                                 close   high    low   open       volume
datetime            minor                                               
2017-03-21 15:00:00 600036.XSHG  72.36  72.67  71.97  72.36   40184305.0
                    600050.XSHG   9.27   9.36   9.21   9.32  147347370.0
                    601318.XSHG  81.28  81.28  80.34  80.70   96990107.0
2017-03-22 15:00:00 600036.XSHG  71.51  72.36  71.31  72.32   46855076.0
                    600050.XSHG   9.26   9.36   8.97   9.21  231158696.0

修改名称

PN_rename = PN.rename(items={'600036.XSHG':'ZSYH','600050.XSHG':'ZGLT','601318.XSHG':'ZGPA'})
print(PN_rename)
<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 34 (major_axis) x 5 (minor_axis)
Items axis: ZSYH to ZGPA
Major_axis axis: 2017-03-21 15:00:00 to 2017-05-10 15:00:00
Minor_axis axis: close to volume

resample抽样取周线

PN_RE = PN.transpose(2,1,0).resample('W-MON',axis=1).last()
print(PN_RE)
<class 'pandas.core.panel.Panel'>
Dimensions: 5 (items) x 8 (major_axis) x 3 (minor_axis)
Items axis: close to volume
Major_axis axis: 2017-03-27 00:00:00 to 2017-05-15 00:00:00
Minor_axis axis: 600036.XSHG to 601318.XSHG
print(PN_RE.to_frame().head())
                        close   high    low   open       volume
datetime   minor                                               
2017-03-27 600036.XSHG  73.67  74.56  72.51  72.51   80019192.0
           600050.XSHG   9.11   9.32   9.08   9.25  175576529.0
           601318.XSHG  82.31  83.50  81.75  81.77  122855895.0
2017-04-03 600036.XSHG  74.09  74.29  73.40  73.40   33962146.0
           600050.XSHG   9.03   9.10   8.85   8.85  125608109.0

4、 访问数据

访问items的数据

print(PN_RE['close'].head())
            600036.XSHG  600050.XSHG  601318.XSHG
datetime                                         
2017-03-27        73.67         9.11        82.31
2017-04-03        74.09         9.03        82.85
2017-04-10        73.01          NaN        81.61
2017-04-17        72.90          NaN        80.67
2017-04-24        73.05          NaN        81.01

访问major的数据

print(PN_RE.major_xs('2017-04-10'))
             close   high    low   open      volume
600036.XSHG  73.01  73.21  72.16  73.05  32313532.0
600050.XSHG    NaN    NaN    NaN    NaN         NaN
601318.XSHG  81.61  82.22  81.48  81.88  48440496.0

访问minor的数据

print(PN_RE.minor_xs('600036.XSHG').head())
            close   high    low   open      volume
datetime                                          
2017-03-27  73.67  74.56  72.51  72.51  80019192.0
2017-04-03  74.09  74.29  73.40  73.40  33962146.0
2017-04-10  73.01  73.21  72.16  73.05  32313532.0
2017-04-17  72.90  72.94  71.78  72.43  40747304.0
2017-04-24  73.05  73.36  72.55  72.90  37143301.0

访问loc使用名称索引

print(PN_RE.loc[:,'2017-03-21':'2017-04-10',:].to_frame())
                        close   high    low   open       volume
datetime   minor                                               
2017-03-27 600036.XSHG  73.67  74.56  72.51  72.51   80019192.0
           600050.XSHG   9.11   9.32   9.08   9.25  175576529.0
           601318.XSHG  82.31  83.50  81.75  81.77  122855895.0
2017-04-03 600036.XSHG  74.09  74.29  73.40  73.40   33962146.0
           600050.XSHG   9.03   9.10   8.85   8.85  125608109.0
           601318.XSHG  82.85  83.18  81.93  82.15   63797819.0
2017-04-10 600036.XSHG  73.01  73.21  72.16  73.05   32313532.0
           601318.XSHG  81.61  82.22  81.48  81.88   48440496.0

访问iloc使用位置索引

print(PN_RE.iloc[2:,2:,2:])
<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 6 (major_axis) x 1 (minor_axis)
Items axis: low to volume
Major_axis axis: 2017-04-10 00:00:00 to 2017-05-15 00:00:00
Minor_axis axis: 601318.XSHG to 601318.XSHG
print(PN_RE.ix[0:3,-1,'601318.XSHG'])
close    89.36
high     89.63
low      85.18
Name: 601318.XSHG, dtype: float64

5、 处理缺失值

print(PN.isnull().values.any())
True
if PN.isnull().values.any():
    PN.fillna(method='ffill',inplace=True)
print(PN.isnull().values.any())
False

6、 多维数据计算与合并

计算每只股票的5日均线,并且合并成DataFrame

import talib.abstract as ta

df_ma = pd.DataFrame({name: ta.MA(value, 5) for name, value in PN.iteritems()})
print(df_ma.tail())
                     600036.XSHG  600050.XSHG  601318.XSHG
datetime                                                  
2017-05-04 15:00:00       73.158         9.03       84.886
2017-05-05 15:00:00       73.012         9.03       84.626
2017-05-08 15:00:00       72.758         9.03       84.544
2017-05-09 15:00:00       72.580         9.03       84.580
2017-05-10 15:00:00       72.650         9.03       85.430

计算每只股票的macd, 然后合并成MultiIndex

pn_macd = pd.Panel({name: ta.MACD(value) for name, value in PN.iteritems()})
df_macd = pn_macd.transpose(2,1,0).to_frame().head()
print(df_macd)
                                     macd  macdsignal  macdhist
datetime            minor                                      
2017-05-10 15:00:00 600036.XSHG -0.067689   -0.105095  0.037406
                    600050.XSHG -0.019533   -0.027102  0.007569
                    601318.XSHG  1.090898    0.361969  0.728928

用stack()将DataFrame转换multiIndex,再将两个multiIndex合并

df_macd['ma'] = df_ma.stack()
print(df_macd)
                                     macd  macdsignal  macdhist     ma
datetime            minor                                             
2017-05-10 15:00:00 600036.XSHG -0.067689   -0.105095  0.037406  72.65
                    600050.XSHG -0.019533   -0.027102  0.007569   9.03
                    601318.XSHG  1.090898    0.361969  0.728928  85.43
Clone this wiki locally