remove 1.1MB (85%) of binary size by not including iostream #41
Ok. I actually think we should be able to do away with iostream entirely... this is a good idea.
Don’t get so worked up! :-)
Although I mean everything I wrote above, I sort of wrote it in jest -- trying to be expressive while concise. Maybe the balance tipped a bit too far to the expressive side :-) Also, to be clear, in no way was I directing this at any of fast_float's code. It is entirely about how something like iostream has such a prominent place in C++.
@biojppm I also dislike streams in C++.
Merging. I will issue a release.
Now that this is merged, I must say that I don't really understand your analysis. I agree that it is best not to include iostream, but I don't think that including iostream should ever result in megabytes of binary. This does not make sense to me. No need to explain, though, because I am not arguing in the least against the PR.
I'm happy to explain -- I think it's important to understand. The gist of the analysis is this: compare different alternatives for reading a float/double with regard to compilation time and (more significantly) binary size. This is done by compiling, for each alternative, a single main of this form:
int main()
{
char buf[BUFSIZE];
while(fgets(buf, BUFSIZE, stdin))
{
fputs(buf, stdout);
READ_FROM_BUF(buf);
}
}
For example, the fast_float read is compiled with:
#include <c4/ext/fast_float.hpp>
#include <cstring>
double doit(const char *s)
{
double result;
fast_float::from_chars(s, s+strlen(s), result);
return result;
}
#define READ_FROM_BUF(s) (void) doit(s)
whereas the stringstream read is compiled with:
#include <sstream>
float doit(const char *s)
{
std::stringstream ss;
ss << s;
float val;
ss >> val;
return val;
}
#define READ_FROM_BUF(s) (void) doit(s)
And of course the baseline executable, with no float conversion, is compiled with this:
#define READ_FROM_BUF(s)
You can see the entire file in here. It is compiled with the appropriate preprocessor definitions: see the relevant cmake for that. Then I did this for all combinations of (g++, clang++) x (Release, Debug) x (x86, x86_64). It is a lot of work if done manually, but I have a tool that cleanly automates it. The results above are a cross-panel of each function across the different builds. Finally, to dig deeper into the symbols present in each executable, I'm using Bloaty McBloatface. Let me know if something in particular is unclear.
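For anyone wanting to play with the harness described above, here is a minimal sketch of the same fgets/fputs loop written as an ordinary function, so it can be exercised without building one executable per variant. It is not the actual c4core benchmark file: the names echo_and_parse, parse_nothing and parse_with_stringstream are hypothetical, and a function pointer stands in for the READ_FROM_BUF macro.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>

// Hypothetical stand-in for the READ_FROM_BUF macro: each variant is a
// function taking one NUL-terminated line and returning the parsed value.
using parse_fn = float (*)(const char *);

// Baseline variant: no float conversion at all.
float parse_nothing(const char *) { return 0.0f; }

// Stream-based variant, equivalent to the stringstream doit() above.
float parse_with_stringstream(const char *s)
{
    std::stringstream ss;
    ss << s;
    float val = 0.0f;
    ss >> val;
    return val;
}

// Echo every line of `in` to `out` (like the loop in main above) and run
// `parse` on it. Parsed values are summed into *sum so that the calls have
// an observable effect. Returns the number of lines read.
int echo_and_parse(std::FILE *in, std::FILE *out, parse_fn parse, float *sum)
{
    assert(in && out && parse && sum);
    constexpr int kBufSize = 128;  // same BUFSIZE as the tests/bloat_*.cpp files
    char buf[kBufSize];
    int lines = 0;
    while (std::fgets(buf, kBufSize, in))
    {
        std::fputs(buf, out);
        *sum += parse(buf);
        ++lines;
    }
    return lines;
}
```

The function pointer keeps the sketch short and testable, but it is not how the measurements in this thread were made: there, each variant is a separate executable selected at compile time through the macro, which keeps the other variants' code out of each binary and is what makes per-variant size comparisons meaningful.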
I do not understand, sorry. I understand how including the header might increase the compilation time, so let us leave that aside. What I do not understand is what you mean by binary bloat. We have binary executables as part of the project: the test files. They are nowhere near 1 MB each. They use the library, evidently. They also use iostream. Look...
So I don't understand what you are measuring.
Which compiler+version is that?
Can you just do this for me...
(Adjust accordingly if you are under Visual Studio.)
Indeed, those are the sizes I'm seeing as well.
[24/11/20 22:08:32]--(jobs:1)--(~/proj/fast_float) (fast_float/(HEAD detached at caade69))
[jpmag@mozart] 3032$ ll build/linux-x86_64-gxx*/tests/
build/linux-x86_64-gxx10.2-Debug/tests/:
total 3.9M
2.0M -rwxr-xr-x 1 jpmag jpmag 2.0M Nov 24 22:02 basictest*
4.0K drwxr-xr-x 15 jpmag jpmag 4.0K Nov 24 22:07 CMakeFiles/
4.0K -rw-r--r-- 1 jpmag jpmag 1.5K Nov 24 21:58 cmake_install.cmake
8.0K -rw-r--r-- 1 jpmag jpmag 4.2K Nov 24 21:58 CTestTestfile.cmake
128K -rwxr-xr-x 1 jpmag jpmag 126K Nov 24 22:02 example_test*
100K -rwxr-xr-x 1 jpmag jpmag 99K Nov 24 22:02 exhaustive32*
140K -rwxr-xr-x 1 jpmag jpmag 137K Nov 24 22:02 exhaustive32_64*
108K -rwxr-xr-x 1 jpmag jpmag 106K Nov 24 22:02 exhaustive32_midpoint*
100K -rwxr-xr-x 1 jpmag jpmag 99K Nov 24 22:02 long_exhaustive32*
100K -rwxr-xr-x 1 jpmag jpmag 99K Nov 24 22:02 long_exhaustive32_64*
104K -rwxr-xr-x 1 jpmag jpmag 104K Nov 24 22:02 long_random64*
32K -rw-r--r-- 1 jpmag jpmag 31K Nov 24 21:58 Makefile
168K -rwxr-xr-x 1 jpmag jpmag 165K Nov 24 22:02 powersoffive_hardround*
104K -rwxr-xr-x 1 jpmag jpmag 104K Nov 24 22:02 random64*
148K -rwxr-xr-x 1 jpmag jpmag 145K Nov 24 22:02 random_string*
148K -rwxr-xr-x 1 jpmag jpmag 145K Nov 24 22:02 short_random_string*
460K -rwxr-xr-x 1 jpmag jpmag 457K Nov 24 22:02 string_test*
build/linux-x86_64-gxx10.2-Release/tests/:
total 1.2M
584K -rwxr-xr-x 1 jpmag jpmag 582K Nov 24 22:02 basictest*
4.0K drwxr-xr-x 15 jpmag jpmag 4.0K Nov 24 22:07 CMakeFiles/
4.0K -rw-r--r-- 1 jpmag jpmag 1.5K Nov 24 21:58 cmake_install.cmake
8.0K -rw-r--r-- 1 jpmag jpmag 4.2K Nov 24 21:58 CTestTestfile.cmake
36K -rwxr-xr-x 1 jpmag jpmag 35K Nov 24 22:02 example_test*
36K -rwxr-xr-x 1 jpmag jpmag 35K Nov 24 22:02 exhaustive32*
40K -rwxr-xr-x 1 jpmag jpmag 40K Nov 24 22:02 exhaustive32_64*
40K -rwxr-xr-x 1 jpmag jpmag 40K Nov 24 22:02 exhaustive32_midpoint*
40K -rwxr-xr-x 1 jpmag jpmag 39K Nov 24 22:02 long_exhaustive32*
40K -rwxr-xr-x 1 jpmag jpmag 39K Nov 24 22:02 long_exhaustive32_64*
40K -rwxr-xr-x 1 jpmag jpmag 39K Nov 24 22:02 long_random64*
32K -rw-r--r-- 1 jpmag jpmag 31K Nov 24 21:58 Makefile
48K -rwxr-xr-x 1 jpmag jpmag 46K Nov 24 22:02 powersoffive_hardround*
40K -rwxr-xr-x 1 jpmag jpmag 39K Nov 24 22:02 random64*
44K -rwxr-xr-x 1 jpmag jpmag 44K Nov 24 22:02 random_string*
44K -rwxr-xr-x 1 jpmag jpmag 44K Nov 24 22:02 short_random_string*
140K -rwxr-xr-x 1 jpmag jpmag 138K Nov 24 22:02 string_test*
And similarly for clang.
Right. So of course the debug builds are fat, but that's fine. Now, you might think "35KB is a lot", but fast_float itself is not responsible for all of those 35KB; maybe only half. (Let us be clear: it is still a good idea to remove unneeded headers.)
Yes, the sizes are reasonable. I even compiled the equivalent of my test inside this repository, and the results are still small:
Head: caade69 Merge pull request #28 from lemire/dlemire/aqrit_magic
Tags: v0.2.0 (44), v0.3.0 (3)
Staged changes (4)
modified tests/CMakeLists.txt
@@ -40,3 +40,7 @@ fast_float_add_cpp_test(long_random64)
fast_float_add_cpp_test(random64)
fast_float_add_cpp_test(basictest)
fast_float_add_cpp_test(example_test)
+
+fast_float_add_cpp_test(bloat_baseline)
+fast_float_add_cpp_test(bloat_iostream)
+fast_float_add_cpp_test(bloat_fastfloat)
new file tests/bloat_baseline.cpp
@@ -0,0 +1,12 @@
+#include <cstdio>
+
+int main()
+{
+ #define BUFSIZE 128
+ char buf[BUFSIZE];
+ while(fgets(buf, BUFSIZE, stdin))
+ {
+ fputs(buf, stdout);
+ (void) 0;
+ }
+}
new file tests/bloat_fastfloat.cpp
@@ -0,0 +1,21 @@
+#include <cstdio>
+#include <cstring>
+#include <fast_float/fast_float.h>
+
+float doit(const char *s)
+{
+ float result;
+ fast_float::from_chars(s, s+strlen(s), result);
+ return result;
+}
+
+int main()
+{
+ #define BUFSIZE 128
+ char buf[BUFSIZE];
+ while(fgets(buf, BUFSIZE, stdin))
+ {
+ fputs(buf, stdout);
+ (void) doit(buf);
+ }
+}
new file tests/bloat_iostream.cpp
@@ -0,0 +1,23 @@
+#include <cstdio>
+#include <sstream>
+
+
+float doit(const char *s)
+{
+ std::stringstream ss;
+ ss << s;
+ float val;
+ ss >> val;
+ return val;
+}
+
+int main()
+{
+ #define BUFSIZE 128
+ char buf[BUFSIZE];
+ while(fgets(buf, BUFSIZE, stdin))
+ {
+ fputs(buf, stdout);
+ (void) doit(buf);
+ }
+}
resulting in this:
[24/11/20 22:19:26]--(jobs:2)--(~/proj/fast_float) (fast_float/(HEAD detached at caade69))
[jpmag@mozart] 3037$ ll build/linux-x86_64-*/tests/bloat*
20K -rwxr-xr-x 1 jpmag jpmag 20K Nov 24 22:19 build/linux-x86_64-clangxx11.0-Debug/tests/bloat_baseline*
92K -rwxr-xr-x 1 jpmag jpmag 90K Nov 24 22:19 build/linux-x86_64-clangxx11.0-Debug/tests/bloat_fastfloat*
32K -rwxr-xr-x 1 jpmag jpmag 30K Nov 24 22:19 build/linux-x86_64-clangxx11.0-Debug/tests/bloat_iostream*
20K -rwxr-xr-x 1 jpmag jpmag 17K Nov 24 22:19 build/linux-x86_64-clangxx11.0-Release/tests/bloat_baseline*
36K -rwxr-xr-x 1 jpmag jpmag 35K Nov 24 22:19 build/linux-x86_64-clangxx11.0-Release/tests/bloat_fastfloat*
20K -rwxr-xr-x 1 jpmag jpmag 18K Nov 24 22:19 build/linux-x86_64-clangxx11.0-Release/tests/bloat_iostream*
24K -rwxr-xr-x 1 jpmag jpmag 21K Nov 24 22:19 build/linux-x86_64-gxx10.2-Debug/tests/bloat_baseline*
96K -rwxr-xr-x 1 jpmag jpmag 94K Nov 24 22:19 build/linux-x86_64-gxx10.2-Debug/tests/bloat_fastfloat*
36K -rwxr-xr-x 1 jpmag jpmag 35K Nov 24 22:19 build/linux-x86_64-gxx10.2-Debug/tests/bloat_iostream*
20K -rwxr-xr-x 1 jpmag jpmag 17K Nov 24 22:19 build/linux-x86_64-gxx10.2-Release/tests/bloat_baseline*
36K -rwxr-xr-x 1 jpmag jpmag 34K Nov 24 22:19 build/linux-x86_64-gxx10.2-Release/tests/bloat_fastfloat*
20K -rwxr-xr-x 1 jpmag jpmag 19K Nov 24 22:19 build/linux-x86_64-gxx10.2-Release/tests/bloat_iostream*
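Reading the Release rows of this listing as overhead over the baseline makes the puzzle explicit. The sketch below just does that subtraction; the sizes are copied from the listing and the names are illustrative only. fast_float adds roughly 17 to 18KB, about half of the ~35KB executable, while the stringstream variant adds only 1 to 2KB: in these dynamically linked builds the stream machinery stays in the shared libstdc++, so it never shows up in the executable's own size.

```cpp
#include <cassert>
#include <cstdio>

// Apparent sizes in KB of the Release, dynamically linked bloat_* test
// executables, copied from the listing above.
struct ReleaseSizes
{
    const char *build;
    int baseline;   // fgets/fputs loop only
    int fastfloat;  // + fast_float::from_chars
    int stream;     // + std::stringstream
};

const ReleaseSizes kRelease[] = {
    {"clangxx11.0-Release", 17, 35, 18},
    {"gxx10.2-Release",     17, 34, 19},
};

// Extra KB a variant adds on top of the baseline executable.
int overhead_kb(int variant, int baseline)
{
    assert(variant >= baseline);
    return variant - baseline;
}

void print_overheads()
{
    for (const ReleaseSizes &r : kRelease)
        std::printf("%-20s fast_float: +%dK  stringstream: +%dK\n", r.build,
                    overhead_kb(r.fastfloat, r.baseline),
                    overhead_kb(r.stream, r.baseline));
}
```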
Ok. In any case, your PR was good no matter what.
I double-checked: the bloated sizes I reported previously are definitely correct, and they absolutely go away when I remove the include of iostream. How do we reconcile this? Since the input code is the same but the sizes differ, it must come down to how the files are compiled. Indeed, the compilation lines differ between the projects, and the main difference is dynamic vs static linking. But I am getting ahead of myself; let's watch the movie without spoilers. First, here is the complete line from inside fast_float:
# output stripped for clarity
[24/11/20 22:35:37]--(jobs:2)--(~/proj/fast_float) (fast_float/(HEAD detached at caade69))
[jpmag@mozart] 3041$ ( cd build/linux-x86_64-gxx10.2-Debug/tests ; make VERBOSE=1 -B bloat_fastfloat && echo && ll CMakeFiles/bloat_fastfloat.dir/*.o bloat_fastfloat )
[ 50%] Building CXX object tests/CMakeFiles/bloat_baseline.dir/bloat_baseline.cpp.o
/usr/bin/g++ -I/opt/jpmag/proj/fast_float/include -I/opt/jpmag/proj/fast_float/build/linux-x86_64-gxx10.2-Debug/_deps/doctest-src -m64 -g -Werror -Wall -Wextra -Weffc++ -Wsign-compare -Wshadow -Wwrite-strings -Wpointer-arith -Winit-self -Wconversion -Wsign-conversion -std=gnu++11 -o CMakeFiles/bloat_baseline.dir/bloat_baseline.cpp.o -c /opt/jpmag/proj/fast_float/tests/bloat_baseline.cpp
[100%] Linking CXX executable bloat_baseline
/usr/bin/g++ -m64 -g CMakeFiles/bloat_baseline.dir/bloat_baseline.cpp.o -o bloat_baseline
96K -rwxr-xr-x 1 jpmag jpmag 94K Nov 24 22:58 bloat_fastfloat*
100K -rw-r--r-- 1 jpmag jpmag 97K Nov 24 22:58 CMakeFiles/bloat_fastfloat.dir/bloat_fastfloat.cpp.o
Now from my project (using the same fast_float commit as above, pre-merge):
# output stripped for clarity
[24/11/20 23:01:35]--(jobs:0)--(/opt/jpmag/proj/c4core) (c4core.git/master)
[jpmag@mozart] 3018$ ( cd build/linux-x86_64-gxx10.2-Debug/bm/float ; make VERBOSE=1 -B c4core-bm-readfloat-fast_float_f && echo && ll CMakeFiles/*fast_float_f.dir/*o c4core-bm-readfloat-fast_float_f )
/usr/bin/g++ -DC4FLOAT_FASTFLOAT_F=1 -I/opt/jpmag/proj/c4core/src -m64 -g -Werror -pedantic-errors -fstrict-aliasing -Wall -Wextra -pedantic -Wshadow -Wnon-virtual-dtor -Wcast-align -Wunused -Woverloaded-virtual -Wpedantic -Wconversion -Wsign-conversion -Wdouble-promotion -Wfloat-equal -Wformat=2 -Wlogical-op -Wuseless-cast -std=c++11 -o CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o -c /opt/jpmag/proj/c4core/bm/float/read.cpp
[100%] Linking CXX executable c4core-bm-readfloat-fast_float_f
/usr/bin/g++ -m64 -g CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o -o c4core-bm-readfloat-fast_float_f -static-libgcc -static-libstdc++
1.4M -rwxr-xr-x 1 jpmag jpmag 1.4M Nov 24 23:01 c4core-bm-readfloat-fast_float_f*
100K -rw-r--r-- 1 jpmag jpmag 97K Nov 24 23:01 CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o
Let me paste both again, but stripping out the flags for warnings and includes:
In the compile line, the only relevant difference is that your project uses compiler extensions. However, the final result is dramatically different, and I am convinced the difference comes down to the link step. If you notice, I am requesting a static link of the standard library through the flags at the end of my link line. Here is the same link without them:
[24/11/20 23:34:44]--(jobs:0)--(/opt/jpmag/proj/c4core/build/linux-x86_64-gxx10.2-Debug/bm/float)
[jpmag@mozart] 3034$ (set -x ; exe=c4core-bm-readfloat-fast_float_f ; /usr/bin/g++ -m64 -g CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o -o $exe ; ll $exe CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o ; ldd $exe ; readelf -s $exe )
+ exe=c4core-bm-readfloat-fast_float_f
+ /usr/bin/g++ -m64 -g CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o -o c4core-bm-readfloat-fast_float_f
+ ls --color=auto --color=auto -lFhs c4core-bm-readfloat-fast_float_f CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o
96K -rwxr-xr-x 1 jpmag jpmag 94K Nov 24 23:35 c4core-bm-readfloat-fast_float_f*
100K -rw-r--r-- 1 jpmag jpmag 97K Nov 24 23:30 CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o
+ ldd c4core-bm-readfloat-fast_float_f
linux-vdso.so.1 (0x00007ffe738e5000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fbbc39e3000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007fbbc389d000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fbbc3883000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fbbc36ba000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fbbc3c2a000)
+ readelf -s c4core-bm-readfloat-fast_float_f
Symbol table '.dynsym' contains 17 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC WEAK DEFAULT UND [...]@GLIBC_2.2.5 (2)
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __[...]@GLIBC_2.4 (3)
6: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fputs@GLIBC_2.2.5 (2)
7: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBCXX_3.4 (4)
8: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fgets@GLIBC_2.2.5 (2)
9: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _[...]@CXXABI_1.3 (5)
10: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterT[...]
11: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
12: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
13: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMC[...]
14: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBCXX_3.4 (4)
15: 000000000000a070 8 OBJECT GLOBAL DEFAULT 25 [...]@GLIBC_2.2.5 (2)
16: 000000000000a080 8 OBJECT GLOBAL DEFAULT 25 stdin@GLIBC_2.2.5 (2)
Notice that the executable size is now exactly the same as yours! When we link dynamically, we ignore all the standard library code that our executable pulls in. That's why I started my analysis with static flags. Now let's statically link the GCC runtime and the C++ standard library again:
[24/11/20 23:35:03]--(jobs:0)--(/opt/jpmag/proj/c4core/build/linux-x86_64-gxx10.2-Debug/bm/float)
[jpmag@mozart] 3035$ (set -x ; exe=c4core-bm-readfloat-fast_float_f ; /usr/bin/g++ -m64 -g CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o -o $exe -static-libgcc -static-libstdc++ ; ll $exe CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o ; ldd $exe ; readelf -s $exe )
+ exe=c4core-bm-readfloat-fast_float_f
+ /usr/bin/g++ -m64 -g CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o -o c4core-bm-readfloat-fast_float_f -static-libgcc -static-libstdc++
+ ls --color=auto --color=auto -lFhs c4core-bm-readfloat-fast_float_f CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o
1.4M -rwxr-xr-x 1 jpmag jpmag 1.4M Nov 24 23:36 c4core-bm-readfloat-fast_float_f*
100K -rw-r--r-- 1 jpmag jpmag 97K Nov 24 23:30 CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o
+ ldd c4core-bm-readfloat-fast_float_f
linux-vdso.so.1 (0x00007ffcee192000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007efd06888000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007efd066bf000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007efd06b2b000)
+ readelf -s c4core-bm-readfloat-fast_float_f
Symbol table '.dynsym' contains 111 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_addUserComm[...]
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_memcpyRtWn
4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
....
12: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
13: 0000000000000000 0 FUNC GLOBAL DEFAULT UND close@GLIBC_2.2.5 (2)
14: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
15: 0000000000000000 0 FUNC GLOBAL DEFAULT UND ioctl@GLIBC_2.2.5 (2)
16: 0000000000000000 0 FUNC GLOBAL DEFAULT UND abort@GLIBC_2.2.5 (2)
17: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
18: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
....
27: 0000000000000000 0 FUNC GLOBAL DEFAULT UND read@GLIBC_2.2.5 (2)
28: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
...
34: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fgets@GLIBC_2.2.5 (2)
...
37: 0000000000000000 0 NOTYPE WEAK DEFAULT UND pthread_once
38: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
39: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
40: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterT[...]
41: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ZGTtdlPv
42: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
43: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
44: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fputc@GLIBC_2.2.5 (2)
45: 0000000000000000 0 FUNC GLOBAL DEFAULT UND free@GLIBC_2.2.5 (2)
46: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMC[...]
47: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
48: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
49: 0000000000000000 0 FUNC WEAK DEFAULT UND [...]@GLIBC_2.2.5 (2)
50: 0000000000000000 0 FUNC GLOBAL DEFAULT UND wctob@GLIBC_2.2.5 (2)
51: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __[...]@GLIBC_2.3 (3)
...
57: 0000000000000000 0 FUNC GLOBAL DEFAULT UND iconv@GLIBC_2.2.5 (2)
58: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_RU8
59: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
60: 0000000000000000 0 FUNC GLOBAL DEFAULT UND poll@GLIBC_2.2.5 (2)
...
64: 0000000000000000 0 FUNC GLOBAL DEFAULT UND putwc@GLIBC_2.2.5 (2)
65: 0000000000000000 0 FUNC GLOBAL DEFAULT UND putc@GLIBC_2.2.5 (2)
66: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
67: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
68: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fread@GLIBC_2.2.5 (2)
69: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
70: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_memcpyRnWt
71: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
...
77: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __pthread_key_create
78: 0000000000000000 0 FUNC GLOBAL DEFAULT UND getwc@GLIBC_2.2.5 (2)
...
106: 0000000000000000 0 FUNC GLOBAL DEFAULT UND fputs@GLIBC_2.2.5 (2)
107: 00000000000fa1c8 8 OBJECT GLOBAL DEFAULT 29 [...]@GLIBC_2.2.5 (2)
108: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (2)
109: 00000000000fa1c0 8 OBJECT GLOBAL DEFAULT 29 stdin@GLIBC_2.2.5 (2)
110: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __[...]@GLIBC_2.4 (5)
So indeed the executable is 1.4MB. But it still depends on dynamic symbols, because libc and libm are still linked dynamically. Let's request a fully static link as well:
[24/11/20 23:36:05]--(jobs:0)--(/opt/jpmag/proj/c4core/build/linux-x86_64-gxx10.2-Debug/bm/float)
[jpmag@mozart] 3036$ (set -x ; exe=c4core-bm-readfloat-fast_float_f ; /usr/bin/g++ -m64 -g CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o -o $exe -static -static-libgcc -static-libstdc++ ; ll $exe CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o ; ldd $exe ; readelf -s $exe )
+ exe=c4core-bm-readfloat-fast_float_f
+ /usr/bin/g++ -m64 -g CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o -o c4core-bm-readfloat-fast_float_f -static -static-libgcc -static-libstdc++
+ ls --color=auto --color=auto -lFhs c4core-bm-readfloat-fast_float_f CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o
2.3M -rwxr-xr-x 1 jpmag jpmag 2.3M Nov 24 23:37 c4core-bm-readfloat-fast_float_f*
100K -rw-r--r-- 1 jpmag jpmag 97K Nov 24 23:30 CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o
+ ldd c4core-bm-readfloat-fast_float_f
not a dynamic executable
+ readelf -s c4core-bm-readfloat-fast_float_f
# .dynsym is empty
Notice that this now uses no dynamic symbols at all. So the executable size, with all the required code in it, is actually 2.3MB, even worse than the 1.4MB I was initially reporting. The static vs dynamic question is the origin of this confusion: dynamic sizes can be misleading.
One final point: the commit under analysis in this post still has the include. Let's now try the tip of this MR. First, dynamic (no static flags):
[24/11/20 23:57:33]--(jobs:0)--(/opt/jpmag/proj/c4core/build/linux-x86_64-gxx10.2-Debug/bm/float)
[jpmag@mozart] 3041$ (set -x ; exe=c4core-bm-readfloat-fast_float_f ; /usr/bin/g++ -m64 -g CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o -o $exe ; ll $exe CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o )
+ exe=c4core-bm-readfloat-fast_float_f
+ /usr/bin/g++ -m64 -g CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o -o c4core-bm-readfloat-fast_float_f
+ ls --color=auto --color=auto -lFhs c4core-bm-readfloat-fast_float_f CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o
96K -rwxr-xr-x 1 jpmag jpmag 93K Nov 24 23:57 c4core-bm-readfloat-fast_float_f*
96K -rw-r--r-- 1 jpmag jpmag 95K Nov 24 23:56 CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o
So the dynamic executable without the include is essentially the same size as with the include: 93KB versus 94KB. Removing the header has practically no effect here. Now let's statically link the GCC runtime and the C++ standard library:
[24/11/20 23:57:30]--(jobs:0)--(/opt/jpmag/proj/c4core/build/linux-x86_64-gxx10.2-Debug/bm/float)
[jpmag@mozart] 3040$ (set -x ; exe=c4core-bm-readfloat-fast_float_f ; /usr/bin/g++ -m64 -g CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o -o $exe -static-libgcc -static-libstdc++ ; ll $exe CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o )
+ exe=c4core-bm-readfloat-fast_float_f
+ /usr/bin/g++ -m64 -g CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o -o c4core-bm-readfloat-fast_float_f -static-libgcc -static-libstdc++
+ ls --color=auto --color=auto -lFhs c4core-bm-readfloat-fast_float_f CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o
212K -rwxr-xr-x 1 jpmag jpmag 209K Nov 24 23:57 c4core-bm-readfloat-fast_float_f*
96K -rw-r--r-- 1 jpmag jpmag 95K Nov 24 23:56 CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o
Now the executable went from 1.4MB with the include to 209KB without it, as reported at the beginning of this MR. Finally, a fully static link:
[24/11/20 23:57:08]--(jobs:0)--(/opt/jpmag/proj/c4core/build/linux-x86_64-gxx10.2-Debug/bm/float)
[jpmag@mozart] 3039$ (set -x ; exe=c4core-bm-readfloat-fast_float_f ; /usr/bin/g++ -m64 -g CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o -o $exe -static -static-libgcc -static-libstdc++ ; ll $exe CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o )
+ exe=c4core-bm-readfloat-fast_float_f
+ /usr/bin/g++ -m64 -g CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o -o c4core-bm-readfloat-fast_float_f -static -static-libgcc -static-libstdc++
+ ls --color=auto --color=auto -lFhs c4core-bm-readfloat-fast_float_f CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o
936K -rwxr-xr-x 1 jpmag jpmag 934K Nov 24 23:57 c4core-bm-readfloat-fast_float_f*
96K -rw-r--r-- 1 jpmag jpmag 95K Nov 24 23:56 CMakeFiles/c4core-bm-readfloat-fast_float_f.dir/read.cpp.o
So the final, fully static executable went from 2.3MB with the include to 934KB without it, a difference of roughly 1.4MB. That is in line with the ~1.2MB saved with the previous set of flags, and together the two sets of flags reliably prove that the include of iostream was costing well over 1MB of binary size. For a moment there I was also confused, but I think this explains the differences we were observing. I definitely learned something while trying to figure this out.
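The three link configurations above can be summarized with a little arithmetic. This is only an illustrative sketch: the struct and function names are hypothetical, the sizes are copied from the ll output above (KiB, g++ 10.2, Debug, x86_64), and the 1.4M and 2.3M entries are rounded ls -h values, so the derived figures are approximate.

```cpp
#include <cassert>
#include <cstdio>

// Sizes in KiB of c4core-bm-readfloat-fast_float_f (g++ 10.2, Debug, x86_64),
// copied from the ll output above. The 1.4M and 2.3M entries are rounded
// ls -h values, so everything derived from them is approximate.
struct LinkSizes
{
    const char *flags;
    double with_include;     // fast_float commit caade69 (includes <iostream>)
    double without_include;  // tip of this PR (no <iostream>)
};

const LinkSizes kLinks[] = {
    {"(dynamic, no static flags)",               94.0,         93.0},
    {"-static-libgcc -static-libstdc++",         1.4 * 1024.0, 209.0},
    {"-static -static-libgcc -static-libstdc++", 2.3 * 1024.0, 934.0},
};

// KiB saved by dropping the include, for a given set of link flags.
double saved_kib(const LinkSizes &l) { return l.with_include - l.without_include; }

// The same saving, as a percentage of the size with the include.
double saved_percent(const LinkSizes &l)
{
    assert(l.with_include > 0.0);
    return 100.0 * saved_kib(l) / l.with_include;
}

void print_savings()
{
    for (const LinkSizes &l : kLinks)
        std::printf("%-42s saved %7.1f KiB (%4.1f%%)\n",
                    l.flags, saved_kib(l), saved_percent(l));
}
```

Worked through, this gives a saving of about 1 KiB for the dynamic link, about 1225 KiB (roughly 85%, the percentage in the PR title) with -static-libgcc -static-libstdc++, and about 1421 KiB (roughly 60%) for the fully static link.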
I knew it was. But for a moment I was worried I would not be able to convincingly prove it. Bold statements like the one in the title require solid proof.
@lemire let me know if anything is still unclear. I'm curious to investigate bloaty's analysis of the several programs above, and how tweaks to the code (eg
Picking up on the discussion started in #23 about large binaries: the bloat is definitely real. This is what I'm seeing on the size harness that I detailed in that discussion:
Notice that this happens with both g++ and clang++, for x86 and x86_64, and for both Debug and Release. Notice also that the baseline executable, consisting only of the while(fgets()) { fputs() } loop, is rarely above 20KB.
When you point out that the fast_float code is small, you are right. But there is an #include <iostream>, and that is usually reason enough to cause bloated binaries. It brings in a mountain of code (30K lines and 713K characters), together with exceptions, new()s, delete()s, etc.
Let's look at the sizes for iostream:
Don't these sizes look suspiciously similar to fast_float above? Let's check:
A lot of entries suspiciously related to stream/string. So let's see what happens if we remove these:
... and as I expected the result is now this:
So the size went down from 1.4MB to 0.2MB. The new clang size of 200KB is still high, but we can take a look at that on a later occasion. Let's take a look at the new binary:
So that was it: streams were our culprit.
This is actually not a surprise; I've seen it before. But unfortunately, for most people this will likely come as a surprise, even if they have a faint idea of the cost of streams. Streams should have no place in code that is intended to be lean and fast. They are the exact opposite of that and should be, to paraphrase the famous goto essay, "streams considered evil". The headers are heavy, the binaries are heavy, and the code is slow. They certainly do not follow C++'s mantra of not paying for what you don't use. Streams stand to C++ as slavery once did to society: they are widely used and may seem an integral part of daily life, but they are evil, and with many people you run the risk of being taken for a lunatic if you point out how evil streams are. As with slavery, the status quo is very strong.
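To make the "lean and fast" point concrete, here is a sketch of the doit() conversion without streams. It is only an illustration: parse_float is a hypothetical name, and it uses the C library's std::strtof (a swapped-in choice, not fast_float's from_chars) so that the example stays self-contained. Unlike the stringstream version, it constructs no stream objects and reports failure explicitly.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical stream-free replacement for the stringstream-based doit().
// Returns true and stores the value in *out only if a number was parsed and
// strtof did not report a range error (ERANGE covers overflow and may also
// be set on underflow).
bool parse_float(const char *s, float *out)
{
    assert(s != nullptr && out != nullptr);
    char *end = nullptr;
    errno = 0;
    const float v = std::strtof(s, &end);
    if (end == s || errno == ERANGE)
        return false;  // no digits consumed, or value out of range
    *out = v;
    return true;
}
```

One design caveat: strtof honours the current C locale's decimal separator, so the result for an input such as "2.25" can depend on setlocale() calls elsewhere in the program.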
I will now stop the rant, collect myself and press the submit button :-)