ChannelCMT edited this page Jun 21, 2019 · 2 revisions

Series

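Series is one of pandas' core data structures: a one-dimensional array with labels (an index) that can hold any data type. Here we mainly use it to work with time-series data.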

1. Creating a Series: Series([ ])

Create a Series, then get its name and its index

import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
s = pd.Series([1, 2, np.nan, 4, 5])
print(s)
print(s.name)
print(s.index)
0    1.0
1    2.0
2    NaN
3    4.0
4    5.0
dtype: float64
None
RangeIndex(start=0, stop=5, step=1)
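The name and index can also be supplied directly to the constructor instead of being set afterwards. A minimal sketch; the letter labels are illustrative, not from the example above. It also shows why the output above has dtype float64 even though the inputs are integers: NaN is a float, so the whole Series is stored as float64.

```python
import numpy as np
import pandas as pd

# Values, index labels and name passed in one call (labels are illustrative)
s2 = pd.Series([1, 2, np.nan, 4, 5],
               index=["a", "b", "c", "d", "e"],
               name="Price Series")
print(s2.name)          # Price Series
print(list(s2.index))   # ['a', 'b', 'c', 'd', 'e']
print(s2.dtype)         # float64, because NaN forces a float dtype
```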

2. Adding a name and changing the index

s.name = "Price Series"
print("series name:",s.name)
new_index = pd.date_range("20160101",periods=len(s), freq="D")
s.index = new_index
print("new index:",s.index)
print (s)
series name: Price Series
new index: DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05'],
              dtype='datetime64[ns]', freq='D')
2016-01-01    1.0
2016-01-02    2.0
2016-01-03    NaN
2016-01-04    4.0
2016-01-05    5.0
Freq: D, Name: Price Series, dtype: float64
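Two details worth knowing when changing the name or index, sketched below: a replacement index must have exactly as many labels as the Series has elements (otherwise pandas raises ValueError and keeps the old index), and rename() returns a renamed copy instead of modifying the original. The name "Close Series" is only an illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, 5], name="Price Series")

# A new index needs exactly len(s) labels; 3 labels for 5 values is rejected
try:
    s.index = pd.date_range("20160101", periods=3, freq="D")
except ValueError as err:
    print("rejected:", err)

# rename() with a scalar returns a copy with a new name; s keeps its old name
renamed = s.rename("Close Series")
print(s.name, "|", renamed.name)  # Price Series | Close Series
```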

3. Accessing Series elements

Series elements are usually accessed with iloc[ ] and loc[ ]. Use iloc[] to access elements by integer position, and loc[] to access them by index label.

iloc

Access a single element by integer position

print (s)
print("First element of the series: ", s.iloc[0])
print("Last element of the series: ", s.iloc[-1])
2016-01-01    1.0
2016-01-02    2.0
2016-01-03    NaN
2016-01-04    4.0
2016-01-05    5.0
Freq: D, Name: Price Series, dtype: float64
First element of the series:  1.0
Last element of the series:  5.0
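iloc accepts the same position forms as Python lists, plus a list of positions. A short sketch rebuilding the same Series: a negative position counts from the end, and a list of positions returns a Series rather than a scalar.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, 5],
              index=pd.date_range("20160101", periods=5, freq="D"),
              name="Price Series")

last = s.iloc[-1]          # same element as s.iloc[len(s) - 1]
picked = s.iloc[[0, 3]]    # a list of positions gives back a Series
print(last)                # 5.0
print(picked)
```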

iloc

Access a range of integer positions: from 0 up to (but not including) 5, with a step of 2

print (s)
print(s.iloc[0:5:2])
2016-01-01    1.0
2016-01-02    2.0
2016-01-03    NaN
2016-01-04    4.0
2016-01-05    5.0
Freq: D, Name: Price Series, dtype: float64
2016-01-01    1.0
2016-01-03    NaN
2016-01-05    5.0
Freq: 2D, Name: Price Series, dtype: float64
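Position slices with iloc follow Python's start:stop:step rules, so the stop position is excluded and omitted bounds default to the ends. A quick sketch on the same Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, 5],
              index=pd.date_range("20160101", periods=5, freq="D"),
              name="Price Series")

every_other = s.iloc[::2]   # positions 0, 2, 4: same as s.iloc[0:5:2]
middle = s.iloc[1:3]        # positions 1 and 2 only; 3 is excluded
print(every_other)
print(middle)
```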

loc

Access a single label and a range of labels. Unlike iloc, a label slice with loc includes both endpoints.

print (s)
print(s.loc['20160101'])
print(s.loc['20160102':'20160104'])
2016-01-01    1.0
2016-01-02    2.0
2016-01-03    NaN
2016-01-04    4.0
2016-01-05    5.0
Freq: D, Name: Price Series, dtype: float64
1.0
2016-01-02    2.0
2016-01-03    NaN
2016-01-04    4.0
Freq: D, Name: Price Series, dtype: float64
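The difference in endpoints can be checked directly: the label slice 2016-01-02 to 2016-01-04 and the position slice 1:4 select the same three rows. The sketch also shows looking up a single Timestamp and selecting a whole month with a partial date string on a DatetimeIndex.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, 5],
              index=pd.date_range("20160101", periods=5, freq="D"),
              name="Price Series")

by_label = s.loc["2016-01-02":"2016-01-04"]   # both endpoints included
by_position = s.iloc[1:4]                     # stop position excluded
print(by_label.equals(by_position))           # True
print(s.loc[pd.Timestamp("2016-01-01")])      # 1.0
print(len(s.loc["2016-01"]))                  # 5: every row in January 2016
```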

4. Boolean indexing

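Besides the access methods above, you can filter a Series with a boolean array. Comparing a Series against a condition returns another Series with the same index, filled with True where the condition holds and False elsewhere (NaN compares as False). Passing that boolean Series to loc[] keeps only the True rows; to combine conditions, use & (and) and wrap each condition in parentheses.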

print (s)
print(s < 3)
print(s.loc[s < 3])
print(s.loc[(s < 3) & (s > 1)])
2016-01-01    1.0
2016-01-02    2.0
2016-01-03    NaN
2016-01-04    4.0
2016-01-05    5.0
Freq: D, Name: Price Series, dtype: float64
2016-01-01     True
2016-01-02     True
2016-01-03    False
2016-01-04    False
2016-01-05    False
Freq: D, Name: Price Series, dtype: bool
2016-01-01    1.0
2016-01-02    2.0
Freq: D, Name: Price Series, dtype: float64
2016-01-02    2.0
Freq: D, Name: Price Series, dtype: float64
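A few more boolean tools, sketched on the same Series: | means "or", ~ negates a mask, and isna() selects the missing rows explicitly. The parentheses matter because & and | bind more tightly than the comparison operators. Note that negating s < 3 also keeps the NaN row, since NaN < 3 is False.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, 5],
              index=pd.date_range("20160101", periods=5, freq="D"),
              name="Price Series")

print((s > 0).sum())              # 4: the NaN row compares as False
missing = s[s.isna()]             # only the NaN row
outside = s[(s < 2) | (s > 4)]    # 1.0 and 5.0
not_small = s[~(s < 3)]           # NaN, 4.0 and 5.0
print(missing)
print(outside)
print(not_small)
```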


5. Working with time series in a Series

sz50.xlsx data download: https://pan.baidu.com/s/1LO26_BDnFUFtVXB3lRZZPA

Extraction code: t2zu

Read the Excel data, then resample it with resample()

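import pandas as pd

# sheet_name=0 reads the first worksheet (the older sheetname spelling is no longer accepted); .xlsx files need openpyxl
data = pd.read_excel('sz50.xlsx', sheet_name=0, index_col='datetime')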
print(data)
                      close    high     low    open    volume
datetime                                                     
2017-01-03 15:00:00  115.99  117.06  115.14  115.43  16232125
2017-01-04 15:00:00  116.28  116.42  115.21  115.99  29656234
2017-01-05 15:00:00  116.07  116.64  115.64  116.07  26436646
2017-01-06 15:00:00  115.21  116.07  114.86  116.07  17195598
2017-01-09 15:00:00  115.35  115.99  114.86  115.64  14908745
2017-01-10 15:00:00  115.28  115.64  114.93  115.21   7996636
2017-01-11 15:00:00  115.07  115.64  115.00  115.64   9166532
2017-01-12 15:00:00  114.78  115.35  114.71  115.21   8295650
2017-01-13 15:00:00  115.85  115.99  114.64  114.64  19024943
2017-01-16 15:00:00  117.92  118.20  114.64  115.57  53249124
2017-01-17 15:00:00  116.85  117.77  116.56  117.21  12555292
2017-01-18 15:00:00  117.42  117.85  116.49  116.92  11478663
2017-01-19 15:00:00  117.77  118.49  116.99  116.99  12180687
2017-01-20 15:00:00  118.06  118.63  117.49  118.06  14285968
2017-01-23 15:00:00  117.99  118.84  117.56  118.63  14615740
2017-01-24 15:00:00  118.91  118.91  118.06  118.06  14985241
2017-01-25 15:00:00  118.91  119.20  118.27  118.84  11284869
2017-01-26 15:00:00  119.41  119.91  118.27  118.84   8602907
2017-02-03 15:00:00  118.42  119.98  118.34  119.77   8171489
2017-02-06 15:00:00  118.63  119.48  118.63  119.27  13455250
2017-02-07 15:00:00  118.77  119.20  118.42  118.56  14757284
2017-02-08 15:00:00  118.63  118.84  117.77  118.42  11238767
2017-02-09 15:00:00  119.06  119.41  118.13  118.77  11393034
2017-02-10 15:00:00  119.48  119.91  118.91  119.34  13983062
2017-02-13 15:00:00  119.98  120.34  119.48  120.20  19992372
2017-02-14 15:00:00  119.34  120.20  119.20  120.12  12987135
2017-02-15 15:00:00  119.98  120.55  119.27  119.77  25687112
2017-02-16 15:00:00  119.48  120.41  119.34  120.20  16325732
2017-02-17 15:00:00  118.56  119.77  118.13  119.48  13863642
2017-02-20 15:00:00  120.55  120.91  118.34  118.34  29915560
...                     ...     ...     ...     ...       ...
2017-10-10 15:00:00  122.81  122.81  121.78  122.44  13475400
2017-10-11 15:00:00  122.44  122.91  122.16  122.34   9654900
2017-10-12 15:00:00  122.34  122.72  121.59  122.34   8363600
2017-10-13 15:00:00  121.31  122.62  121.22  122.16  11271700
2017-10-16 15:00:00  122.25  122.44  121.31  121.59  11832600
2017-10-17 15:00:00  121.78  122.44  121.41  122.16   7934100
2017-10-18 15:00:00  122.53  122.72  121.22  121.87  22599700
2017-10-19 15:00:00  123.09  123.37  121.69  122.25  28931900
2017-10-20 15:00:00  121.97  122.81  121.97  122.53   8716900
2017-10-23 15:00:00  120.37  122.16  120.28  122.06  15590300
2017-10-24 15:00:00  120.56  121.41  120.19  120.37  12571800
2017-10-25 15:00:00  120.94  121.31  120.19  120.56  10200400
2017-10-26 15:00:00  120.19  120.75  119.81  120.75  12938000
2017-10-27 15:00:00  120.47  121.31  120.19  120.37  15482700
2017-10-30 15:00:00  119.06  120.19  118.03  120.19  37086800
2017-10-31 15:00:00  118.22  118.69  117.94  118.22   9330200
2017-11-01 15:00:00  117.56  119.25  117.47  118.12  16948000
2017-11-02 15:00:00  117.47  117.75  116.53  117.37  23219200
2017-11-03 15:00:00  117.94  118.12  116.53  117.47  15786000
2017-11-06 15:00:00  116.91  117.56  116.72  117.56   9785200
2017-11-07 15:00:00  117.56  118.12  116.34  116.91  19003800
2017-11-08 15:00:00  117.94  118.87  117.19  117.47  18500100
2017-11-09 15:00:00  117.66  118.41  117.47  117.84   8739900
2017-11-10 15:00:00  118.41  118.41  116.81  117.56  24748600
2017-11-13 15:00:00  120.00  120.47  118.41  118.59  41250100
2017-11-14 15:00:00  118.12  119.72  117.94  119.62  17172100
2017-11-15 15:00:00  118.12  118.41  117.66  117.84  14029600
2017-11-16 15:00:00  116.16  117.75  116.06  117.75  18042800
2017-11-17 15:00:00  119.81  120.00  116.25  116.25  53475100
2017-11-20 15:00:00  120.47  120.56  118.22  118.97  29413900

[215 rows x 5 columns]
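resample() only works when the index holds datetimes. Since sz50.xlsx is not bundled here, this sketch uses an in-memory CSV holding the first three rows shown above as a stand-in. With CSV the datetime column arrives as text, so parse_dates=True is needed to turn the index into a DatetimeIndex.

```python
import io

import pandas as pd

# First three rows of the sz50 output above, as CSV text (stand-in for sz50.xlsx)
csv_text = """datetime,close,high,low,open,volume
2017-01-03 15:00:00,115.99,117.06,115.14,115.43,16232125
2017-01-04 15:00:00,116.28,116.42,115.21,115.99,29656234
2017-01-05 15:00:00,116.07,116.64,115.64,116.07,26436646
"""

# index_col makes 'datetime' the row index; parse_dates=True converts it to datetimes
data = pd.read_csv(io.StringIO(csv_text), index_col="datetime", parse_dates=True)
print(type(data.index).__name__)   # DatetimeIndex
print(data.shape)                  # (3, 5)
```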

Keep only the close column of data as a Series, and look at its first 5 values and its dtype:

Series = data.close
Series.head()
datetime
2017-01-03 15:00:00    115.99
2017-01-04 15:00:00    116.28
2017-01-05 15:00:00    116.07
2017-01-06 15:00:00    115.21
2017-01-09 15:00:00    115.35
Name: close, dtype: float64
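Attribute access such as data.close is a convenience that only works when the column name does not collide with an existing DataFrame attribute; data['close'] always works. A small sketch with a hypothetical frame whose second column is named count, which clashes with the DataFrame.count method:

```python
import pandas as pd

# Hypothetical frame: the 'count' column name clashes with DataFrame.count
df = pd.DataFrame({"close": [115.99, 116.28], "count": [3, 4]})

print(df.close.equals(df["close"]))   # True: no attribute is called 'close'
print(callable(df.count))             # True: df.count is the method, not the column
print(df["count"].tolist())           # [3, 4]
```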

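Use resample() to sample by month: .last() keeps each month's last closing price, and .median() gives each month's median closing price.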

monthly_prices = Series.resample('M').last()  # 'M' = month end; pandas 2.2+ prefers the alias 'ME'
print(monthly_prices.head(5))
datetime
2017-01-31    119.41
2017-02-28    118.06
2017-03-31    114.00
2017-04-30    108.30
2017-05-31    120.37
Freq: M, Name: close, dtype: float64
monthly_prices_med = Series.resample('M').median()
monthly_prices_med.head(5)
datetime
2017-01-31    116.565
2017-02-28    118.985
2017-03-31    115.500
2017-04-30    109.800
2017-05-31    107.700
Freq: M, Name: close, dtype: float64
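For price data it is often handy to get a month's open, high, low and close in one step with ohlc(). A sketch on five closing prices copied from the sz50 output above; it uses the 'MS' alias, which labels each monthly bin by its first day instead of its last.

```python
import pandas as pd

# Five closes copied from the sz50 output above, spanning two months
idx = pd.to_datetime(["2017-01-03 15:00", "2017-01-04 15:00", "2017-01-05 15:00",
                      "2017-02-03 15:00", "2017-02-06 15:00"])
close = pd.Series([115.99, 116.28, 116.07, 118.42, 118.63], index=idx, name="close")

# 'MS' bins by calendar month and labels each bin with the month's first day
monthly = close.resample("MS").ohlc()   # open/high/low/close within each month
print(monthly)
```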

6. Handling missing data

Real-world data very often contains missing values. pandas offers two main methods for handling them: fillna(), which fills them in, and dropna(), which removes them.

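from datetime import datetime
# datetime(2017, 1, 10) means midnight, so the 2017-01-10 15:00:00 row falls outside the slice
data_s = Series.loc[datetime(2017, 1, 1):datetime(2017, 1, 10)]
data_r = data_s.resample('D').mean()  # one row per calendar day; days without data become NaN
print(data_r.head(10))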
datetime
2017-01-03    115.99
2017-01-04    116.28
2017-01-05    116.07
2017-01-06    115.21
2017-01-07       NaN
2017-01-08       NaN
2017-01-09    115.35
Freq: D, Name: close, dtype: float64
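Before choosing between dropna() and fillna(), it helps to count and locate the gaps. A sketch rebuilding the same five closes from the output above; resample('D') labels each day at midnight, and the only NaN rows are the weekend of 7 and 8 January.

```python
import pandas as pd

# The five closes between 2017-01-03 and 2017-01-09 from the sz50 output above
idx = pd.to_datetime(["2017-01-03 15:00", "2017-01-04 15:00", "2017-01-05 15:00",
                      "2017-01-06 15:00", "2017-01-09 15:00"])
close = pd.Series([115.99, 116.28, 116.07, 115.21, 115.35], index=idx, name="close")

daily = close.resample("D").mean()       # one row per calendar day, labelled at midnight
print(daily.isna().sum())                # 2
print(daily[daily.isna()].index.strftime("%Y-%m-%d").tolist())
```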

Use dropna() to remove missing values

print(data_r.head(10).dropna())  # drop the missing values
datetime
2017-01-03    115.99
2017-01-04    116.28
2017-01-05    116.07
2017-01-06    115.21
2017-01-09    115.35
Name: close, dtype: float64
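dropna() does not modify the Series it is called on; it returns a new one. A minimal sketch on a four-day slice of the data above:

```python
import numpy as np
import pandas as pd

daily = pd.Series([115.21, np.nan, np.nan, 115.35],
                  index=pd.date_range("2017-01-06", periods=4, freq="D"), name="close")

cleaned = daily.dropna()          # a new Series without the NaN rows
print(len(daily), len(cleaned))   # 4 2: daily still holds its NaNs
```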

Fill in missing data: forward-fill each gap with the previous value

print(data_r.head(10).ffill())  # fill each missing day with the previous day's price (fillna(method='ffill') is deprecated)
datetime
2017-01-03    115.99
2017-01-04    116.28
2017-01-05    116.07
2017-01-06    115.21
2017-01-07    115.21
2017-01-08    115.21
2017-01-09    115.35
Freq: D, Name: close, dtype: float64
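Forward fill is not the only option. The sketch below compares, on the same four-day slice, a backward fill (next known price), a constant fill, and linear interpolation between the surrounding prices. Which one is appropriate depends on the analysis; filling prices with 0 is usually only useful for counts or volumes.

```python
import numpy as np
import pandas as pd

daily = pd.Series([115.21, np.nan, np.nan, 115.35],
                  index=pd.date_range("2017-01-06", periods=4, freq="D"), name="close")

print(daily.bfill().tolist())                  # [115.21, 115.35, 115.35, 115.35]
print(daily.fillna(0).tolist())                # [115.21, 0.0, 0.0, 115.35]
print(daily.interpolate().round(4).tolist())   # [115.21, 115.2567, 115.3033, 115.35]
```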