燃果—zuolinwei—it-support-engineer #798

Open
jm528 opened this issue Feb 25, 2022 · 0 comments


jm528 commented Feb 25, 2022

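#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# create date: 2022.02.24
# author: zuolinwei

import re
import json

import requests  # needed for the POST request at the end of the script

loginfo_list = []    # cleaned log lines, each prefixed with its hour window
logformat_list = []  # "window|device|pid|name|description" keys used for counting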

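# Normalize the source log file: rejoin log entries that were split across
# several lines, and prepend each line with the hour window derived from its
# timestamp (plain text processing). The result is held in memory.

mark = ""  # most recent complete log line, used to rejoin continuation lines
with open("interview_data_set", "r") as f:
    for line in f.readlines():
        if "last message repeated 1 time" in line:
            # The previous message occurred once more, so record it again.
            loginfo_list.append(loginfo_list[-1])
            continue
        elif line.startswith('\t'):
            # Continuation line: merge it into the previous entry.
            if loginfo_list:
                loginfo_list.pop()
            mark = mark + line
            mark = mark.replace("\n", " ")
            mark = mark.replace("\t", "")
            loginfo_list.append(mark)
            continue
        else:
            # Build the hour window, e.g. "00:01:58" -> "0000-0100".
            ttime = line.split(" ")[2].strip().split(":")[0].strip()
            stime = str(ttime) + "00"
            etime = str("%02d" % (int(ttime) + 1)) + "00"
            rtime = "%s-%s" % (stime, etime)
            line = rtime + " " + line
            loginfo_list.append(line)
            mark = line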

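# Loop over the in-memory log lines and extract five key fields (their
# positions are not strictly defined, so the rules below are judgment calls),
# then join them into one string that is counted afterwards.
#
# Time window: derived from the timestamp at the start of the original line,
#   ignoring the date. The start is the hour with minutes fixed at 00; the end
#   is one hour later, also with minutes fixed at 00. (Converting to a time
#   type and rebuilding the string would also work.)
# Device name: the 5th column of the reformatted log line.
# Process ID: split the 6th and 7th columns of the reformatted line on square
#   brackets, parentheses, and dots, test each piece for being a number, and
#   take the last number as the process ID (this rule is uncertain).
# Process name: the 6th column of the reformatted line, using the text before
#   the bracket (this rule is uncertain).
# Error description: split the reformatted line on "):" or "]:" and take the
#   last piece (this rule is uncertain).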

for line in loginfo_list:
    line_list = line.split(" ")
    timeWindow = line_list[0].strip()
    deviceName = line_list[4].strip()
    # Brackets must be escaped inside the character class.
    numlist = re.split(r'[()\[\].]', line_list[5] + line_list[6])
    processId = None
    for i in numlist:
        try:
            int(i)
        except ValueError:
            pass
        else:
            processId = i  # keep the last numeric piece
    processName = line_list[5].split("[")[0]
    # ")" and "]" are escaped so they are matched literally.
    description = re.split(r'\):|\]:', line)[-1]
    logstr = "%s|%s|%s|%s|%s" % (timeWindow, deviceName, processId, processName, description)
    logformat_list.append(logstr)

count_dict = {}
for i in logformat_list:
    if i in count_dict:
        count_dict[i] += 1
    else:
        count_dict[i] = 1

result_list = []
for i in count_dict:
    # Split at most 4 times so a "|" inside the description is preserved.
    fields = i.split("|", 4)
    d = {"timeWindow": fields[0].strip(), "deviceName": fields[1].strip(),
         "processId": fields[2].strip(), "processName": fields[3].strip(),
         "description": fields[4].strip(), "numberOfOccurrence": count_dict[i]}
    result_list.append(d)
result_dict = {"Data": result_list}
result_json = json.dumps(result_dict)
print(result_json)
# Send the full JSON document; `d` only holds the last record.
r = requests.post('https://foo.com/bar', data=result_json,
                  headers={"Content-Type": "application/json"})
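
For comparison, the field extraction and counting steps above could also be sketched with one regular expression and `collections.Counter`. This is only a sketch under stated assumptions, not the author's method: the names `LINE_RE`, `parse_line`, and `count_entries` are hypothetical, the sample lines in the usage note are made up, and each record is assumed to look like `Mon DD HH:MM:SS device process[pid] (optional child): description`. Unlike the script, it takes the PID from the first bracket pair and the description after the first `]:` or `):`.

```python
import re
from collections import Counter

# Hypothetical pattern for one syslog-style record (an assumption, not taken
# from the original script).
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+"  # date and time, keep the hour
    r"(?P<device>\S+)\s+"                              # device name
    r"(?P<name>[^\[\s:]+)"                             # process name, up to "[" or ":"
    r"(?:\[(?P<pid>\d+)\])?"                           # optional "[pid]"
    r"(?:\s+\([^)]*\))?"                               # optional "(child[pid])", ignored
    r":\s*(?P<desc>.*)$"                               # description text
)


def parse_line(line):
    """Return (window, device, pid, name, description), or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    hour = int(m.group("hour"))
    # Same window format as the script: start hour to the following hour.
    window = "%02d00-%02d00" % (hour, hour + 1)
    return (window, m.group("device"), m.group("pid"),
            m.group("name"), m.group("desc"))


def count_entries(lines):
    """Count identical records and return them as a list of dicts."""
    counts = Counter()
    for line in lines:
        key = parse_line(line)
        if key is not None:
            counts[key] += 1
    return [
        {"timeWindow": w, "deviceName": dev, "processId": pid,
         "processName": name, "description": desc, "numberOfOccurrence": n}
        for (w, dev, pid, name, desc), n in counts.items()
    ]
```

Calling `count_entries` on a list of already rejoined log lines, for example `["May 13 00:01:58 host-a syslogd[113]: ASL Sender Statistics"]`, yields one dict per distinct record; wrapping the list in `{"Data": ...}` and passing it to `json.dumps` gives the same output shape as the script.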

