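Hello, I ran into some article parsing problems while using this:
1. On some older sites, the author and publish_date sit in a single element, so the parsed author and the parsed publish date both come out as 【发布日期:2021-06-11 作者:招生办 来源: 继续教育学院 点击:451】 (that is, "Publish date: 2021-06-11 Author: Admissions Office Source: School of Continuing Education Views: 451").
case: https://jxjy.gdou.edu.cn/info/1176/2828.htm
2. The content is not extracted correctly:
case: http://sce.stu.edu.cn/show/article/1060.html
3. If the article page contains attachment links (doc, pdf), could they be returned in an additional field?
case: https://jxjy.scau.edu.cn/2024/0514/c4910a374510/page.htm
For example, that page has 2 attachments at the bottom, which could be returned in a structure like this:
{ 'files': [ { 'title': '附件1', 'link': ... }, { 'title': '附件2', 'link': ... } ] }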
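Until something like this is supported, the attachment links can be collected in a separate pass over the page HTML. The following is only a minimal sketch using the Python standard library, not this project's API: `extract_attachments`, `_AttachmentCollector`, and the extension list are hypothetical names and assumptions, and it treats any `<a>` whose URL path ends in a document extension as an attachment.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: these extensions are what counts as an "attachment".
ATTACHMENT_EXTENSIONS = (".doc", ".docx", ".pdf", ".xls", ".xlsx")


class _AttachmentCollector(HTMLParser):
    """Collects <a> tags whose href path ends in a document extension."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.files = []
        self._href = None   # absolute URL of the attachment link being read
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Check only the path, so query strings do not hide the extension.
        if href and urlparse(href).path.lower().endswith(ATTACHMENT_EXTENSIONS):
            self._href = urljoin(self.base_url, href)
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.files.append({"title": title, "link": self._href})
            self._href = None


def extract_attachments(html, base_url):
    """Return {'files': [{'title': ..., 'link': ...}, ...]} for one page."""
    collector = _AttachmentCollector(base_url)
    collector.feed(html)
    collector.close()
    return {"files": collector.files}
```

Relative links are resolved against the page URL passed as `base_url`, so the returned `link` values can be downloaded directly.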
I think it would be better if articles could support custom fields.
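As an illustration of what custom fields could carry, the combined metadata string from item 1 can be split into separate values in a post-processing step. This is a rough sketch, assuming the labels seen in that one example (发布日期, 作者, 来源, 点击) and assuming either an ASCII or a full-width colon after each label; `split_meta_line`, `LABELS`, and the output key names are hypothetical and not part of this project.

```python
import re

# Assumption: label set taken from the single example string in this issue.
LABELS = {
    "发布日期": "publish_date",
    "作者": "author",
    "来源": "source",
    "点击": "views",
}

_LABEL_ALT = "|".join(map(re.escape, LABELS))

# A label, a colon (ASCII or full-width), then the shortest value that is
# followed by the next label or by the end of the string.
_FIELD_RE = re.compile(
    rf"({_LABEL_ALT})\s*[::]\s*(.*?)\s*(?=(?:{_LABEL_ALT})\s*[::]|$)",
    re.S,
)


def split_meta_line(text):
    """Split a combined 'label: value label: value ...' string into a dict."""
    text = text.strip("【】 \t\r\n")  # drop surrounding brackets and whitespace
    fields = {}
    for label, value in _FIELD_RE.findall(text):
        fields[LABELS[label]] = value
    return fields


print(split_meta_line("发布日期:2021-06-11 作者:招生办 来源: 继续教育学院 点击:451"))
```

Values are returned as strings; converting `publish_date` to a date or `views` to an integer is left to the caller, since other sites may format them differently.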