Series
ChannelCMT edited this page Jun 21, 2019
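A Series is one of pandas' storage structures: a one-dimensional labeled array that can hold values of any data type. We mainly use it to work with time series data.
Create a Series and get its name and index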
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
s = pd.Series([1, 2, np.nan, 4, 5])
print(s)
print(s.name)
print(s.index)
0 1.0
1 2.0
2 NaN
3 4.0
4 5.0
dtype: float64
None
RangeIndex(start=0, stop=5, step=1)
s.name = "Price Series"
print("series name:",s.name)
new_index = pd.date_range("20160101",periods=len(s), freq="D")
s.index = new_index
print("new index:",s.index)
print (s)
series name: Price Series
new index: DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
'2016-01-05'],
dtype='datetime64[ns]', freq='D')
2016-01-01 1.0
2016-01-02 2.0
2016-01-03 NaN
2016-01-04 4.0
2016-01-05 5.0
Freq: D, Name: Price Series, dtype: float64
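The name and index assigned step by step above can also be passed straight to the constructor. The following is a minimal sketch under that assumption; the variable names `idx` and `prices` are illustrative and do not come from the original page.

```python
import numpy as np
import pandas as pd

# Illustrative values; the dates mirror the ones used above.
idx = pd.date_range("2016-01-01", periods=5, freq="D")
prices = pd.Series([1, 2, np.nan, 4, 5], index=idx, name="Price Series")

print(prices.name)   # the name given in the constructor
print(prices.index)  # a DatetimeIndex with daily frequency
print(prices.dtype)  # float64, because NaN forces a float dtype
```

Setting both in one call avoids a Series that briefly carries the default RangeIndex.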
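Elements of a Series are usually accessed with iloc[] and loc[]. We use iloc[] to access elements by integer position, and loc[] to access them by index label.
Access a single integer position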
print (s)
print("First element of the series: ", s.iloc[0])
print("Last element of the series: ", s.iloc[len(s)-1])
2016-01-01 1.0
2016-01-02 2.0
2016-01-03 NaN
2016-01-04 4.0
2016-01-05 5.0
Freq: D, Name: Price Series, dtype: float64
First element of the series: 1.0
Last element of the series: 5.0
Access a range of integer positions, from 0 up to (but not including) 5, with a step of 2
print (s)
print(s.iloc[0:5:2])
2016-01-01 1.0
2016-01-02 2.0
2016-01-03 NaN
2016-01-04 4.0
2016-01-05 5.0
Freq: D, Name: Price Series, dtype: float64
2016-01-01 1.0
2016-01-03 NaN
2016-01-05 5.0
Freq: 2D, Name: Price Series, dtype: float64
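iloc[] slices follow the usual Python start:stop:step rules, so negative positions and negative steps work as well. A small sketch on a made-up Series `v` (not part of the original data):

```python
import pandas as pd

# Hypothetical values chosen so positions are easy to follow.
v = pd.Series([10, 20, 30, 40, 50])

every_other = v.iloc[0:5:2]  # positions 0, 2, 4
last_two = v.iloc[-2:]       # the final two elements
reversed_v = v.iloc[::-1]    # all elements in reverse order

print(every_other.tolist())
print(last_two.tolist())
print(reversed_v.tolist())
```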
Access a single label and a range of labels (unlike iloc[], a loc[] slice includes its end label)
print (s)
print(s.loc['20160101'])
print(s.loc['20160102':'20160104'])
2016-01-01 1.0
2016-01-02 2.0
2016-01-03 NaN
2016-01-04 4.0
2016-01-05 5.0
Freq: D, Name: Price Series, dtype: float64
1.0
2016-01-02 2.0
2016-01-03 NaN
2016-01-04 4.0
Freq: D, Name: Price Series, dtype: float64
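The difference between the two accessors is easiest to see when the labels are themselves integers. The sketch below uses a hypothetical Series `t` whose labels start at 1, so a label and a position with the same number point at different elements.

```python
import pandas as pd

# Hypothetical Series: labels 1..4, positions 0..3.
t = pd.Series([10.0, 20.0, 30.0, 40.0], index=[1, 2, 3, 4])

by_position = t.iloc[1]     # second element, by position
by_label = t.loc[1]         # element whose label is 1

pos_slice = t.iloc[1:3]     # positions 1 and 2; the end is excluded
label_slice = t.loc[1:3]    # labels 1, 2 and 3; the end is included

print(by_position, by_label)
print(pos_slice.tolist())
print(label_slice.tolist())
```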
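Besides the access methods above, you can filter a Series with a boolean array. Comparing a Series against a condition returns another Series of the same shape filled with boolean values, and passing that result to loc[] keeps only the rows where the condition is True.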
print (s)
print(s < 3)
print(s.loc[s < 3])
print(s.loc[(s < 3) & (s > 1)])
2016-01-01 1.0
2016-01-02 2.0
2016-01-03 NaN
2016-01-04 4.0
2016-01-05 5.0
Freq: D, Name: Price Series, dtype: float64
2016-01-01 True
2016-01-02 True
2016-01-03 False
2016-01-04 False
2016-01-05 False
Freq: D, Name: Price Series, dtype: bool
2016-01-01 1.0
2016-01-02 2.0
Freq: D, Name: Price Series, dtype: float64
2016-01-02 2.0
Freq: D, Name: Price Series, dtype: float64
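Two details of boolean filtering are worth a sketch: comparisons against NaN evaluate to False, and conditions are combined with `&`, `|` and `~`, each comparison wrapped in parentheses. The Series `u` below is a hypothetical copy of the values used above.

```python
import numpy as np
import pandas as pd

u = pd.Series([1, 2, np.nan, 4, 5])

low = u < 3                              # the NaN row compares as False
not_low = ~low                           # so negation marks the NaN row as True
extremes = u.loc[(u < 2) | (u > 4)]      # "or" of two conditions
valid_not_low = u.loc[u.notna() & ~low]  # exclude NaN explicitly

print(low.tolist())
print(not_low.tolist())
print(extremes.tolist())
print(valid_not_low.tolist())
```

Because `~(u < 3)` is not the same as `u >= 3` when NaN is present, combining the mask with `notna()` is a simple way to keep missing values out of the result.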
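Missing data
When working with real data, there is a very real chance of running into missing values. pandas provides two main tools for handling them: fillna and dropna.
sz50.xlsx data link: https://pan.baidu.com/s/1LO26_BDnFUFtVXB3lRZZPA
Extraction code: t2zu
Read the Excel data and resample it with resample()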
import pandas as pd
data = pd.read_excel('sz50.xlsx', sheet_name=0, index_col='datetime')  # sheet_name replaces the removed sheetname argument
print(data)
close high low open volume
datetime
2017-01-03 15:00:00 115.99 117.06 115.14 115.43 16232125
2017-01-04 15:00:00 116.28 116.42 115.21 115.99 29656234
2017-01-05 15:00:00 116.07 116.64 115.64 116.07 26436646
2017-01-06 15:00:00 115.21 116.07 114.86 116.07 17195598
2017-01-09 15:00:00 115.35 115.99 114.86 115.64 14908745
2017-01-10 15:00:00 115.28 115.64 114.93 115.21 7996636
2017-01-11 15:00:00 115.07 115.64 115.00 115.64 9166532
2017-01-12 15:00:00 114.78 115.35 114.71 115.21 8295650
2017-01-13 15:00:00 115.85 115.99 114.64 114.64 19024943
2017-01-16 15:00:00 117.92 118.20 114.64 115.57 53249124
2017-01-17 15:00:00 116.85 117.77 116.56 117.21 12555292
2017-01-18 15:00:00 117.42 117.85 116.49 116.92 11478663
2017-01-19 15:00:00 117.77 118.49 116.99 116.99 12180687
2017-01-20 15:00:00 118.06 118.63 117.49 118.06 14285968
2017-01-23 15:00:00 117.99 118.84 117.56 118.63 14615740
2017-01-24 15:00:00 118.91 118.91 118.06 118.06 14985241
2017-01-25 15:00:00 118.91 119.20 118.27 118.84 11284869
2017-01-26 15:00:00 119.41 119.91 118.27 118.84 8602907
2017-02-03 15:00:00 118.42 119.98 118.34 119.77 8171489
2017-02-06 15:00:00 118.63 119.48 118.63 119.27 13455250
2017-02-07 15:00:00 118.77 119.20 118.42 118.56 14757284
2017-02-08 15:00:00 118.63 118.84 117.77 118.42 11238767
2017-02-09 15:00:00 119.06 119.41 118.13 118.77 11393034
2017-02-10 15:00:00 119.48 119.91 118.91 119.34 13983062
2017-02-13 15:00:00 119.98 120.34 119.48 120.20 19992372
2017-02-14 15:00:00 119.34 120.20 119.20 120.12 12987135
2017-02-15 15:00:00 119.98 120.55 119.27 119.77 25687112
2017-02-16 15:00:00 119.48 120.41 119.34 120.20 16325732
2017-02-17 15:00:00 118.56 119.77 118.13 119.48 13863642
2017-02-20 15:00:00 120.55 120.91 118.34 118.34 29915560
... ... ... ... ... ...
2017-10-10 15:00:00 122.81 122.81 121.78 122.44 13475400
2017-10-11 15:00:00 122.44 122.91 122.16 122.34 9654900
2017-10-12 15:00:00 122.34 122.72 121.59 122.34 8363600
2017-10-13 15:00:00 121.31 122.62 121.22 122.16 11271700
2017-10-16 15:00:00 122.25 122.44 121.31 121.59 11832600
2017-10-17 15:00:00 121.78 122.44 121.41 122.16 7934100
2017-10-18 15:00:00 122.53 122.72 121.22 121.87 22599700
2017-10-19 15:00:00 123.09 123.37 121.69 122.25 28931900
2017-10-20 15:00:00 121.97 122.81 121.97 122.53 8716900
2017-10-23 15:00:00 120.37 122.16 120.28 122.06 15590300
2017-10-24 15:00:00 120.56 121.41 120.19 120.37 12571800
2017-10-25 15:00:00 120.94 121.31 120.19 120.56 10200400
2017-10-26 15:00:00 120.19 120.75 119.81 120.75 12938000
2017-10-27 15:00:00 120.47 121.31 120.19 120.37 15482700
2017-10-30 15:00:00 119.06 120.19 118.03 120.19 37086800
2017-10-31 15:00:00 118.22 118.69 117.94 118.22 9330200
2017-11-01 15:00:00 117.56 119.25 117.47 118.12 16948000
2017-11-02 15:00:00 117.47 117.75 116.53 117.37 23219200
2017-11-03 15:00:00 117.94 118.12 116.53 117.47 15786000
2017-11-06 15:00:00 116.91 117.56 116.72 117.56 9785200
2017-11-07 15:00:00 117.56 118.12 116.34 116.91 19003800
2017-11-08 15:00:00 117.94 118.87 117.19 117.47 18500100
2017-11-09 15:00:00 117.66 118.41 117.47 117.84 8739900
2017-11-10 15:00:00 118.41 118.41 116.81 117.56 24748600
2017-11-13 15:00:00 120.00 120.47 118.41 118.59 41250100
2017-11-14 15:00:00 118.12 119.72 117.94 119.62 17172100
2017-11-15 15:00:00 118.12 118.41 117.66 117.84 14029600
2017-11-16 15:00:00 116.16 117.75 116.06 117.75 18042800
2017-11-17 15:00:00 119.81 120.00 116.25 116.25 53475100
2017-11-20 15:00:00 120.47 120.56 118.22 118.97 29413900
[215 rows x 5 columns]
Keep only the close column of data, then look at its dtype and first five values:
Series = data.close
Series.head()
datetime
2017-01-03 15:00:00 115.99
2017-01-04 15:00:00 116.28
2017-01-05 15:00:00 116.07
2017-01-06 15:00:00 115.21
2017-01-09 15:00:00 115.35
Name: close, dtype: float64
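Use resample to take the last observation of each month. (pandas 2.2 and later prefer the alias 'ME' for month-end; 'M' is kept here to match the output shown.)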
monthly_prices = Series.resample('M').last()
print(monthly_prices.head(5))
datetime
2017-01-31 119.41
2017-02-28 118.06
2017-03-31 114.00
2017-04-30 108.30
2017-05-31 120.37
Freq: M, Name: close, dtype: float64
monthly_prices_med = Series.resample('M').median()
monthly_prices_med.head(5)
datetime
2017-01-31 116.565
2017-02-28 118.985
2017-03-31 115.500
2017-04-30 109.800
2017-05-31 107.700
Freq: M, Name: close, dtype: float64
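The monthly resampling above groups observations into calendar bins and then reduces each bin with an aggregation such as last() or median(). The sketch below shows the same idea on a small synthetic daily series, resampled to weekly bins so the arithmetic can be checked by hand; the series `w` and its values are made up for illustration and are not taken from sz50.xlsx.

```python
import pandas as pd

# Synthetic data: 14 consecutive days starting on Monday 2017-01-02, values 1..14.
days = pd.date_range("2017-01-02", periods=14, freq="D")
w = pd.Series([float(i) for i in range(1, 15)], index=days)

# 'W' bins end on Sunday, so there are two bins: Jan 2-8 and Jan 9-15.
weekly_last = w.resample("W").last()
weekly_mean = w.resample("W").mean()

print(weekly_last)
print(weekly_mean)
```

Each bin is labeled with its end date, which is why the monthly results above are indexed by month-end dates even when the last trading day falls earlier.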
Returning to missing data: resampling the closing prices to daily frequency inserts NaN for the days without trading data, which gives us gaps to handle with dropna and fillna.
from datetime import datetime
data_s = Series.loc[datetime(2017, 1, 1):datetime(2017, 1, 10)]
data_r = data_s.resample('D').mean()  # upsample to daily frequency; days without data become NaN
print(data_r.head(10))
datetime
2017-01-03 115.99
2017-01-04 116.28
2017-01-05 116.07
2017-01-06 115.21
2017-01-07 NaN
2017-01-08 NaN
2017-01-09 115.35
Freq: D, Name: close, dtype: float64
Use dropna() to remove missing values
print(data_r.head(10).dropna())  # drop the missing values
datetime
2017-01-03 115.99
2017-01-04 116.28
2017-01-05 116.07
2017-01-06 115.21
2017-01-09 115.35
Name: close, dtype: float64
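Fill in missing data with fillna() (or its forward-fill shorthand ffill())
print(data_r.head(10).ffill())  # fill each missing day with the previous day's price; fillna(method='ffill') is the older, deprecated spelling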
datetime
2017-01-03 115.99
2017-01-04 116.28
2017-01-05 116.07
2017-01-06 115.21
2017-01-07 115.21
2017-01-08 115.21
2017-01-09 115.35
Freq: D, Name: close, dtype: float64
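Forward filling is only one of several ways to fill gaps. The sketch below compares a few common options on a short series rebuilt by hand from the closing prices shown above (2017-01-05 through 2017-01-09); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2017-01-05", periods=5, freq="D")
p = pd.Series([116.07, 115.21, np.nan, np.nan, 115.35], index=idx, name="close")

n_missing = p.isna().sum()      # count the gaps before choosing a strategy
constant = p.fillna(0.0)        # replace gaps with a fixed value
forward = p.ffill()             # carry the previous price forward
backward = p.bfill()            # pull the next price backward
forward_one = p.ffill(limit=1)  # fill at most one consecutive missing day

print(n_missing)
print(forward)
print(backward)
print(forward_one)
```

For prices, forward filling is usually the safer choice of these, since backward filling uses information from a later date.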