Official write-up for 海底捞针 ("needle in a haystack"): round vs. fit_intercept #31

Open
0rzx opened this issue Oct 23, 2019 · 2 comments
Labels
bug Something isn't working

Comments

@0rzx

0rzx commented Oct 23, 2019

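  1. The averaged image is generated with np.ndarray.astype(np.uint8), which truncates the fractional part (i.e. rounds down, since the means are non-negative). The write-up, however, explains it in terms of round (round to nearest).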

    averaged = np.mean(targets, axis=0).astype(np.uint8)

    https://github.com/ustclug/hackergame2019-writeups/blame/6fbe69e0666bdb9f16a375b6cfb5395aafcefd1c/official/2077_%E6%B5%B7%E5%BA%95%E6%8D%9E%E9%92%88/README.md#L20-L25

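  2. The model in the challenge has the form y = k0*x0 + k1*x1 + ... + k49999*x49999, with no bias term b, so a regressor without an intercept, like the one below, should be more principled: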

from sklearn import linear_model
reg = linear_model.Lasso(alpha=1, positive=True, fit_intercept=False)

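However, the result actually got much worse and no longer passes the official test. Checking a weaker condition (whether each correct image gets a positive coefficient), I found that four of the correct images have a coefficient of 0: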

[x for x in choices if x not in np.argwhere(reg.coef_>0).reshape(-1)] # [4303, 24496, 36462, 39326]
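The check on the line above can be wrapped in a small helper. This is a sketch, not the poster's code: `missed_choices` is a hypothetical name, and it assumes `coef` is a 1-D array like `reg.coef_` and `choices` holds the indices of the correct images. It builds the set of positive-coefficient indices once, so each membership test is constant time instead of a comparison against the whole `np.argwhere` result.

```python
import numpy as np


def missed_choices(coef, choices):
    """Return the entries of `choices` whose fitted coefficient is not strictly positive.

    Hypothetical helper: `coef` is assumed to be a 1-D coefficient array
    (e.g. `reg.coef_`), `choices` an iterable of integer indices.
    """
    # Indices with a strictly positive coefficient, computed once.
    positive = set(np.flatnonzero(np.asarray(coef) > 0).tolist())
    # Keep the original order of `choices`; convert to plain ints for printing.
    return [int(i) for i in choices if int(i) not in positive]


# Toy usage with made-up coefficients (not the real fit).
coef = np.array([0.0, 0.02, 0.0, 0.03, 0.0])
print(missed_choices(coef, [1, 2, 3]))  # [2]
```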

This challenge really is black magic 😂

@suquark
Contributor

suquark commented Oct 23, 2019

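Right, it is indeed floor, not round... Yet for some reason, when I wrote the write-up I found that it behaved the same as round... That gave me quite a scare. Now I can change it back with confidence.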
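A minimal sketch of the difference, using made-up values rather than the challenge data: `astype(np.uint8)` drops the fractional part (equivalent to floor for non-negative inputs), while `np.round` rounds to the nearest integer, with exact ties going to the even integer. When the fractional parts are roughly uniform, truncation shifts values down by about 0.5 on average, which is in line with the -0.4777 figure reported later in the thread.

```python
import numpy as np

# Made-up per-pixel means (hypothetical values, not from the challenge).
means = np.array([0.4, 0.6, 1.5, 2.99, 254.7])

# astype(np.uint8) truncates toward zero, i.e. floor for non-negative input.
truncated = means.astype(np.uint8)  # values 0, 0, 1, 2, 254
# np.round rounds to the nearest integer; exact ties go to the even neighbour.
rounded = np.round(means)           # values 0, 1, 2, 3, 255

# With roughly uniform fractional parts, truncation is biased by about -0.5.
rng = np.random.default_rng(0)
x = rng.uniform(0, 255, size=100_000)
trunc_bias = (x.astype(np.uint8) - x).mean()
print(truncated, rounded, trunc_bias)
```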

@suquark
Contributor

suquark commented Oct 23, 2019

@0rzx Here is a more principled explanation:

Run the following inside the challenge's generation source code:

print((np.mean(targets, axis=0).astype(np.uint8) - np.mean(targets, axis=0)).mean())
print(images.mean() - images[choices].mean())

and you get:

-0.47768702651515166
2.2311801805161053
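The two diagnostics can be packaged as a self-contained function. This is a sketch under an assumption the thread does not confirm: that `targets` in the generation script is `images[choices]`. `bias_diagnostics` is a hypothetical name, and the tiny `uint8` array in the usage lines is invented so the numbers can be checked by hand.

```python
import numpy as np


def bias_diagnostics(images, choices):
    """Return (truncation_bias, selection_bias), mirroring the two prints above.

    Assumption (not confirmed by the thread): targets == images[choices].
    """
    images = np.asarray(images)
    targets = images[choices]
    mean_target = np.mean(targets, axis=0)
    # Average error introduced by truncating the mean image to uint8.
    truncation_bias = (mean_target.astype(np.uint8) - mean_target).mean()
    # How much darker (positive value) the chosen images are than the full set.
    selection_bias = images.mean() - images[choices].mean()
    return float(truncation_bias), float(selection_bias)


# Invented 4-image dataset of 1x2 pixels; images 0 and 1 are the "chosen" ones.
images = np.array([[[10, 20]], [[11, 21]], [[200, 100]], [[50, 60]]], dtype=np.uint8)
print(bias_diagnostics(images, [0, 1]))  # (-0.5, 43.5)
```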

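Truncation does introduce a negative bias of about 0.5, but because of how the samples were selected, the chosen images' average pixel value is 2.2311801805161053 lower than that of the full set, and the latter is a huge bias for 50000 samples. If the intercept is fixed at 0, Lasso has no choice but to spread this bias onto the weights of other images, so it cannot recover the solution and may even assign the correct answers severely underestimated weights.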
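A toy illustration of this argument, using ordinary least squares via `np.linalg.lstsq` instead of Lasso (a deliberate simplification) and random data instead of the challenge images: the target is an exact combination of two columns plus a constant offset of -2.23. With a column of ones acting as the intercept, the fit recovers the true weights and the offset; without it, the offset has to be absorbed by the image weights, which move away from their true values and still leave a nonzero residual.

```python
import numpy as np

# Hypothetical toy data: 200 "pixels", 5 candidate "images".
rng = np.random.default_rng(0)
n_pixels, n_images = 200, 5
X = rng.uniform(0, 255, size=(n_pixels, n_images))
k_true = np.array([0.5, 0.5, 0.0, 0.0, 0.0])
offset = -2.23  # constant bias in the target, standing in for the selection bias
y = X @ k_true + offset

# Fit with an intercept: append a column of ones to X.
X1 = np.column_stack([X, np.ones(n_pixels)])
coef1, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Fit without an intercept: the offset must be absorbed by the image weights.
coef0, *_ = np.linalg.lstsq(X, y, rcond=None)

res1 = np.linalg.norm(X1 @ coef1 - y)
res0 = np.linalg.norm(X @ coef0 - y)
print("with intercept:   ", np.round(coef1, 4), "residual", res1)
print("without intercept:", np.round(coef0, 4), "residual", res0)
```

Lasso's L1 penalty and `positive=True` constraint change the details, but the basic mechanism (a constant offset the model cannot represent leaking into the other weights) is the same one the comment describes.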

@volltin volltin added the bug Something isn't working label Oct 23, 2019

3 participants