Introduction
============
Credits: lecture notes judiciously adapted from MIT 6.858 by Nickolai Zeldovich & James Mickens
Administrivia
Lectures will be Thu 8:30-10:30, in SUD 03.
There are some exceptions, like today: Mon 14:00-16:00 in SUD 19. Check the schedule.
For most lectures there are assigned readings. Make sure you prepare the
reading before coming to class.
Interrupt, ask questions, point out mistakes.
In-class exercises will regularly take place on Monday.
Exercises will be given and a sketch of their solution will be presented
during the session.
We will supply a solution sheet afterwards, but not for all exercises.
Attendance is strongly encouraged.
There will also be practical lab sessions. These require that you come with a laptop.
You can work in groups.
Before Lab 1 -> Go to www.root-me.org -> Create an account
Final exam counts for 65% of the grade. It will be a written, closed-book exam of 3 hours.
Assignments: 3 homeworks + 2 programming projects
Homeworks are lightweight exercises, similar to exam questions.
Projects involve some programming effort.
Projects will look like real-world systems, in some respects:
Many interacting parts written in different languages.
Will look at/write x86 asm, C, Python, Javascript, ..
1st project out on 11 Feb: buffer overflows. Start early.
During 1st lab session, tutorials on how to get started with project 1.
You have a total of 72 late hours to use throughout the semester.
You are encouraged to discuss the assignments with your classmates.
However, any solution and code you submit must be your own work.
You may not share code with others or copy code from outside sources,
except where the assignment specifically allows it.
You may use books or online resources to help solve assignments,
but you must always credit all such sources in your writeup.
Cheating, plagiarism, and any form of dishonesty will be handled with
maximum severity.
If you are ever in doubt about whether an action on your part may
constitute unacceptable collaboration, please ask the course staff beforehand.
What's required?
Background in computer networks and basic knowledge of programming (C,
Python, JavaScript) and computer systems (UNIX systems, basics of computer
architecture) are expected.
Course website has references to textbooks in case you need to fill some gaps.
Students who are not sure of their background should get in touch.
What's not required?
There are no textbooks. Course website gives a few references to recommended
textbooks.
One TA: Xiao Chen.
Sign up for Piazza (https://piazza.com/uclouvain.be/spring2016/ingi2347/home).
Use it to ask questions about assignments, see what others are stuck on, etc.
We will post any important announcements there.
Certain lectures will be based on MIT 6.858, which was videotaped and is
available if you need to catch up. http://css.csail.mit.edu/6.858/2014/schedule.html
Questions before we proceed?
Warning about security work/research at UCL (and in general).
Know the rules: http://www.uclouvain.be/11212.html.
You will learn how to attack systems, so that you know how to defend them.
None of this is in any way an invitation to undertake these activities in any
fashion other than with the informed consent of all involved parties.
Just because something is technically possible, doesn't mean it's legal.
Ask course staff for advice if in doubt.
What is security?
Achieving some goal in the presence of an adversary.
Many systems are connected to the internet, which has adversaries.
Thus, design of many systems might need to address security.
i.e., will the system work when there's an adversary?
High-level plan for thinking about security:
Policy: the goal you want to achieve.
e.g. only Alice should read file F.
Common goals: confidentiality - no unauthorized disclosure of information
integrity - no unauthorized information modification
availability - information or system remains available despite attacks
Threat model: assumptions about what the attacker could do.
e.g. can guess passwords, cannot physically grab file server.
Better to err on the side of assuming attacker can do something.
Mechanism: knobs that your system provides to help uphold policy.
e.g. user accounts, passwords, file permissions, encryption.
Resulting goal: no way for adversary within threat model to violate policy.
Note that goal has nothing to say about mechanism.
Why is security hard? Negative goal.
Need to guarantee policy, assuming the threat model.
Difficult to think of all possible ways that attacker might break in.
Realistic threat models are open-ended (almost negative models).
Contrast: easy to check whether a positive goal is upheld,
e.g., Alice can actually read file F.
Weakest link matters.
Iterative process: design, update threat model as necessary, etc.
What's the point if we can't achieve perfect security?
In this class, we'll push the boundary of each system to see when it breaks.
Each system will likely have some breaking point leading to compromise.
Doesn't necessarily mean the system is not useful: depends on context.
Important to understand what a system can do, and what a system cannot.
In reality, must manage security risk vs benefit.
More secure systems means less risk (or consequence) of some compromises.
Insecure system may require manual auditing to check for attacks, etc.
Higher cost of attack means more adversaries will be deterred.
Better security often makes new functionality practical and safe.
Suppose you want to run some application on your system.
Large companies sometimes prohibit users from installing software that
hasn't been approved on their desktops, partly due to security.
Javascript in the browser is isolated, making it ok (for the most part)
to run new code/applications without manual inspection/approval.
(or virtual machines, or Native Client, or better OS isolation mechanisms)
Similarly, VPNs make it practical to mitigate risk of allowing employees
to connect to a corporate network from anywhere on the Internet.
What goes wrong #1: problems with the policy.
Example: Sarah Palin's email account.
[ Ref: http://en.wikipedia.org/wiki/Sarah_Palin_email_hack ]
Yahoo email accounts have a username, password, and security questions.
User can log in by supplying username and password.
If user forgets password, can reset by answering security Qs.
Security questions can sometimes be easier to guess than password.
Some adversary guessed Sarah Palin's high school, birthday, etc.
Policy amounts to: can log in with either password or security Qs.
(no way to enforce "Only if user forgets password, then ...")
Example: Mat Honan's accounts at Amazon, Apple, Google, etc.
[ Ref: http://www.wired.com/gadgetlab/2012/08/apple-amazon-mat-honan-hacking/all/ ]
Gmail password reset: send a verification link to a backup email address.
Google helpfully prints part of the backup email address.
Mat Honan's backup address was his Apple @me.com account.
Apple password reset: need billing address, last 4 digits of credit card.
Address can be easy, but how to get 4 digits of user's credit card number?
Amazon: can add a credit card to an account, no password required.
Amazon password reset: provide any one of user's credit card numbers.
Amazon: will not print credit card numbers.. But will print last 4 digits!
Example: Twitter's @N account hijacking.
[ Ref: https://medium.com/p/24eb09e026dd ]
Can be hard for legitimate user to prove they own an account!
How to solve?
Think hard about implications of policy statements.
Some policy checking tools can help, but need a way to specify what's bad.
Difficult in distributed systems: don't know what everyone is doing.
What goes wrong #2: problems with threat model / assumptions.
Example: human factors not accounted for.
Phishing attacks.
User gets email asking to renew email account, transfer money, or ...
Tech support gets call from convincing-sounding user to reset password.
"Rubberhose cryptanalysis".
Example: computational assumptions change over time.
MIT's Kerberos system used 56-bit DES keys, since mid-1980's.
At the time, seemed fine to assume adversary can't check all 2^56 keys.
No longer reasonable: now costs about $100.
[ Ref: https://www.cloudcracker.com/dictionaries.html ]
Several years ago, a student project at MIT showed one can get any key in a day.
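A back-of-envelope sketch of why 2^56 stopped being a safe margin; the
key-search rates below are illustrative assumptions, not measured numbers:

/* des_estimate.c -- rough cost of exhausting a 56-bit keyspace.
   The search rates are assumptions for illustration only. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double keys = pow(2.0, 56);        /* about 7.2e16 candidate keys */
    double rate_1980s = 1.0e5;         /* assumed keys/second on 1980s hardware */
    double rate_today = 1.0e12;        /* assumed keys/second on an FPGA/cloud cluster */
    printf("1980s hardware: %.0f years\n", keys / rate_1980s / (3600.0 * 24 * 365));
    printf("modern cluster: %.1f days\n", keys / rate_today / (3600.0 * 24));
    return 0;
}

(Compile with -lm; the point is the ratio between the two results, not the exact rates.)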
Example: assuming your hardware is trustworthy.
If NSA is your adversary, turns out to not be a good assumption.
[ Ref: https://www.schneier.com/blog/archives/2013/12/more_about_the.html ]
Example: all SSL certificate CAs are fully trusted.
To connect to an SSL-enabled web site, web browser verifies certificate.
Certificate is a combination of server's host name and cryptographic key,
signed by some trusted certificate authority (CA).
Long list (hundreds) of certificate authorities trusted by most browsers.
If any CA is compromised, adversary can intercept SSL connections with
a "fake" certificate for any server host name.
In 2011, two CAs were compromised, issued fake certs for many domains
(google, yahoo, tor, ...), apparently used in Iran (?).
[ Ref: http://en.wikipedia.org/wiki/DigiNotar ]
[ Ref: http://en.wikipedia.org/wiki/Comodo_Group ]
In 2012, a CA inadvertently issued a root certificate valid for any domain.
[ Ref: http://www.h-online.com/security/news/item/Trustwave-issued-a-man-in-the-middle-certificate-1429982.html ]
Example: machines disconnected from the Internet are secure?
Stuxnet worm spread via specially-constructed files on USB drives.
Show video: Stuxnet: Anatomy of a Computer Virus (watch at https://vimeo.com/25118844)
Example: assuming good randomness for cryptography.
Need high-quality randomness to generate the keys that can't be guessed.
Problem: embedded devices, virtual machines may not have much randomness.
As a result, many keys are similar or susceptible to guessing attacks.
[ https://factorable.net/weakkeys12.extended.pdf ]
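A minimal sketch of the failure mode, assuming a hypothetical device that seeds
key generation from its boot time; the code is generic C, not any particular
product's key generator:

/* weak_keygen.c -- low-entropy seeding makes "random" keys guessable.
   Anyone who can guess the seed (e.g. approximate boot time) regenerates the key. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static void generate_key(unsigned char key[16], unsigned int seed) {
    srand(seed);                       /* rand() is not cryptographically secure */
    for (int i = 0; i < 16; i++)
        key[i] = (unsigned char)(rand() & 0xff);
}

int main(void) {
    unsigned char key[16];
    generate_key(key, (unsigned int)time(NULL));   /* guessable seed */
    for (int i = 0; i < 16; i++)
        printf("%02x", key[i]);
    printf("\n");
    /* On Linux, getrandom() or /dev/urandom avoids this, provided the kernel's
       entropy pool has been initialized (the hard part on embedded devices/VMs). */
    return 0;
}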
Example: subverting military OS security.
In the 80's, military encouraged research into secure OS'es.
One unexpected way in which OS'es were compromised:
adversary gained access to development systems, modified OS code.
Example: subverting firewalls.
Adversaries can connect to an unsecured wireless network behind the firewall.
Adversaries can trick a user behind the firewall into disabling it.
Might suffice just to click on link http://firewall/?action=disable
Or maybe buy an ad on CNN.com pointing to that URL (effectively)?
How to solve?
More explicit threat models, to understand possible weaknesses.
Simpler, more general threat models.
Better designs may eliminate / lessen reliance on certain assumptions.
E.g., alternative trust models that don't have fully-trusted CAs.
E.g., authentication mechanisms that aren't susceptible to phishing.
What goes wrong #3: problems with the mechanism -- bugs.
Bugs in the security mechanism (e.g., OS kernel) lead to vulnerabilities.
If application is enforcing security, app-level bugs lead to vulnerabilities.
Example: Apple's iCloud password-guessing rate limits.
[ Ref: https://github.com/hackappcom/ibrute ]
People often pick weak passwords; can often guess w/ few attempts (1K-1M).
Most services, including Apple's iCloud, rate-limit login attempts.
Apple's iCloud service has many APIs.
One API (the "Find my iPhone" service) forgot to implement rate-limiting.
Adversary could make many attempts at guessing passwords.
Probably as fast as they can send packets: >>M/day.
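A minimal sketch of the per-account rate limiting the forgotten API needed;
the names, limits, and in-memory table are hypothetical (a real service would
keep the counters in shared storage):

/* Allow at most MAX_ATTEMPTS login attempts per account per WINDOW_SECS. */
#include <stdbool.h>
#include <string.h>
#include <time.h>

#define MAX_ATTEMPTS 10
#define WINDOW_SECS  3600
#define TABLE_SIZE   1024

struct attempt_record { char user[64]; int count; time_t window_start; };
static struct attempt_record records[TABLE_SIZE];   /* toy fixed-size table */

bool allow_login_attempt(const char *user) {
    time_t now = time(NULL);
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (records[i].user[0] == '\0') {            /* free slot: first attempt */
            strncpy(records[i].user, user, sizeof records[i].user - 1);
            records[i].count = 1;
            records[i].window_start = now;
            return true;
        }
        if (strcmp(records[i].user, user) == 0) {
            if (now - records[i].window_start > WINDOW_SECS) {   /* new window */
                records[i].count = 0;
                records[i].window_start = now;
            }
            return ++records[i].count <= MAX_ATTEMPTS;
        }
    }
    return false;                                    /* table full: fail closed */
}

Every login endpoint must call such a check; the bug was that one endpoint did not.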
Example: Missing access control checks in Citigroup's credit card web site.
[ Ref: http://www.nytimes.com/2011/06/14/technology/14security.html ]
Citigroup allowed credit card users to access their accounts online.
Login page asks for username and password.
If username and password OK, redirected to account info page.
The URL of the account info page included some numbers.
Turns out the numbers were (related to) the user's account number.
Worse yet, server didn't check that you were logged into that account.
Adversary tried different numbers, got different people's account info.
Possibly a wrong threat model: doesn't match the real world?
System is secure if adversary browses the web site through browser.
System not secure if adversary synthesizes new URLs on their own.
Hard to say if developers had wrong threat model, or buggy mechanism.
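A minimal sketch of the missing check, with hypothetical names: being logged in
is not enough, the server must also verify that the account number taken from
the URL belongs to the authenticated user.

/* Sketch of an ownership check on a URL-supplied account number. */
#include <stdbool.h>

struct session { long user_id; bool authenticated; };

/* Stand-in for the real account-database lookup (assumption for the sketch). */
static long lookup_account_owner(long account_number) {
    (void)account_number;
    return 42;                         /* pretend every account belongs to user 42 */
}

bool may_view_account(const struct session *s, long account_number_from_url) {
    if (!s->authenticated)
        return false;                  /* must be logged in ...                   */
    return lookup_account_owner(account_number_from_url) == s->user_id;
    /* ... and must own that account; the vulnerable site skipped this check. */
}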
Example: Android's Java SecureRandom weakness leads to Bitcoin theft.
[ Ref: https://bitcoin.org/en/alert/2013-08-11-android ]
Bitcoins can be spent by anyone that knows the owner's private key.
Many Bitcoin wallet apps on Android used Java's SecureRandom API.
Turns out the system was sometimes forgetting to seed the PRNG!
As a result, some Bitcoin keys turned out to be easy to guess.
Adversaries searched for guessable keys, spent any corresponding bitcoins.
Example: bugs in sandbox (NaCl, Javascript, Java runtime).
Allows adversary to escape isolation, do operations they weren't supposed to.
Example: Moxie Marlinspike's SSL certificate name checking bug
[ Ref: http://hackaday.com/2009/07/29/black-hat-2009-breaking-ssl-with-null-characters/ ]
SSL certificate: string size, followed by the domain name
Null byte vs. length-encoding.
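A minimal sketch of the bug, with made-up host names: the certificate stores
the name with an explicit length (so it may contain an embedded NUL byte), but
a checker that treats it as a C string stops comparing at the NUL.

/* null_byte_cert.c -- "www.bank.com\0.attacker.com" looks like "www.bank.com"
   to strcmp, even though the CA signed the full 26-byte name. */
#include <stdio.h>
#include <string.h>

int main(void) {
    const char cert_name[] = "www.bank.com\0.attacker.com";  /* as signed by the CA */
    size_t cert_name_len = sizeof cert_name - 1;   /* 26 bytes, including the NUL */
    const char *requested_host = "www.bank.com";

    if (strcmp(cert_name, requested_host) == 0)               /* buggy check */
        printf("buggy check:  certificate accepted for %s\n", requested_host);

    if (cert_name_len == strlen(requested_host) &&            /* length-aware check */
        memcmp(cert_name, requested_host, cert_name_len) == 0)
        printf("length check: accepted\n");
    else
        printf("length check: rejected (embedded NUL detected)\n");
    return 0;
}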
Example: buffer overflows (see below).
Case study: buffer overflows.
Consider a web server.
Often times, the web server's code is responsible for security.
E.g., checking which URLs can be accessed, checking SSL client certs, ..
Thus, bugs in the server's code can lead to security compromises.
What's the threat model, policy?
Assume that adversary can connect to web server, supply any inputs.
Policy is a bit fuzzy: only perform operations intended by programmer?
E.g., don't want adversary to steal data, bypass checks, install backdoors.
What the code does is overly specific: if there is a bug, the system does whatever the buggy code does.
Threat model: adversary doesn't have access to machine but can send any request it wants.
Web server software is the mechanism: it enforces the policy.
Consider the following simplified example code from, say, a web server:
/* simplified request-reading code used in the demo below */
#include <stdio.h>
#include <stdlib.h>

int read_req(void) {
    char buf[128];
    int i;
    gets(buf);        /* no bound on the input: reads an arbitrarily long line into buf */
    i = atoi(buf);
    return i;
}
Demo to go along with the discussion below:
% make
% ./readreq
1234
% ./readreq
12341234123412341234
% ./readreq
AAAAAAAAAAAA....AAAA
% gdb ./readreq
b read_req
r
info reg
disas $eip
print &buf[0]
print &i
x $ebp
x $ebp+4
disas 0x080484b8
next
AAAAAAA...AAA
print &buf[0]
x $ebp
x $ebp+4
next
print &buf[0] ## why just 128 bytes now?
x $ebp
x $ebp+4
disas $eip
nexti
nexti
disas $eip
info reg
x $esp
stepi
info reg
stepi
Repeat demo
b read_req
r
next
AAA..AA
next
nexti
nexti
disas
disas main
Jump to the movl before call to printf
set {int}$esp = 0x080484c4 ##from main
c
The code jumped to where we wanted and called printf, but then crashed. Why?
main executes a ret but does not have a valid return address on the stack:
we did not carefully set up the return address.
You will do that in project 1.
What does the compiler generate in terms of memory layout?
x86 stack looks like this:
Stack grows down.
%esp points to the last (bottom-most) valid thing on the stack.
%ebp points to the base of the current frame, where the caller's %ebp is saved.
+------------------+ higher mem addresses
entry %ebp ----> | .. prev frame .. |
| | |
| | | stack grows down
+------------------+ |
entry %esp ----> | return address | v
+------------------+
new %ebp ------> | saved %ebp |
+------------------+
| i |
+------------------+
| buf[127] |
| ... |
| buf[0] |
+------------------+
new %esp ------> | ... |
+------------------+
Caller's code (say, main):
call read_req
read_req's code:
push %ebp
mov %esp -> %ebp
sub $168, %esp # stack vars, etc
...
mov %ebp -> %esp
pop %ebp
ret
How does the adversary take advantage of this code?
Supply long input, overwrite data on stack past buffer.
Interesting bit of data: return address, gets used by 'ret'.
Can set return address to the buffer itself, include some code in there.
How does the adversary know the address of the buffer?
What if one machine has twice as much memory?
Luckily for adversary, virtual memory makes things more deterministic.
For a given OS and program, addresses will often be the same.
What happens if stack grows up, instead of down?
Look at the stack frame for gets.
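A minimal sketch of what the attacker's input looks like: padding up to the
saved return address, then a chosen address in little-endian byte order. The
136-byte offset and the address below are placeholders; the real values depend
on the compiler's layout and are found with gdb, as in the demo above (and in
project 1).

/* make_payload.c -- print an overflow input (placeholder offset/address). */
#include <stdio.h>

int main(void) {
    /* placeholder: 128 bytes of buf + 4 bytes of i + 4 bytes of saved %ebp;
       the real distance to the return address must be checked in gdb. */
    for (int j = 0; j < 136; j++)
        putchar('A');

    unsigned int ret_addr = 0xbffff000u;            /* placeholder: address of buf */
    fwrite(&ret_addr, sizeof ret_addr, 1, stdout);  /* little-endian on x86 */
    putchar('\n');
    return 0;
}

Fed to the vulnerable program (e.g. ./make_payload | ./readreq), this overwrites
the saved return address with the placeholder value.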
Can we do something interesting beyond overflowing the buffer?
An adversary can place code in the buffer and then change the ret address to
point inside the injected code.
This injected executable code is called shell code.
What can the adversary do once they are executing code?
Use any privileges of the process.
Often leverage overflow to gain easier access into system.
Originally on Unix, run a shell /bin/sh (thus, "shell code").
A variant that reuses existing code instead of injecting it, e.g. calling
system() in libc, is called a return-to-libc attack.
If the process is running as root or Administrator, can do anything.
Even if not, can still send spam, read files (web server, database), ..
Can attack other machines behind a firewall.
Why would programmers write such code?
Legacy code that wasn't originally exposed to the internet.
Programmers were not thinking about security.
Many standard functions used to be unsafe (strcpy, gets, sprintf).
Even safe versions have gotchas (strncpy does not null-terminate); see the sketch below.
-Systems software is often written in C (operating systems, file systems, databases, compilers, network servers, command shells and console utilities)
-C is essentially high-level assembly, so...
 *Exposes raw pointers to memory
 *Does not perform bounds-checking on arrays (b/c the hardware doesn't do this, and C wants to get you as close to the hardware as possible)
-Attack also leveraged architectural knowledge about how x86 code works:
 *The direction that the stack grows
 *Layout of stack variables (esp. arrays and return addresses for functions)
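A small sketch of the strncpy gotcha mentioned above: if the source fills the
destination, strncpy copies no terminating NUL, and later string operations
read past the end of the buffer.

/* strncpy_gotcha.c -- strncpy does not null-terminate when src fills dst. */
#include <stdio.h>
#include <string.h>

int main(void) {
    char dst[8];
    strncpy(dst, "AAAAAAAAAAAA", sizeof dst);    /* copies 8 bytes, no NUL added */
    /* printf("%s", dst) here would read past the end of dst.
       Common fix: terminate explicitly. */
    dst[sizeof dst - 1] = '\0';
    printf("%s\n", dst);                         /* now prints at most 7 chars */
    return 0;
}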
More generally, any memory errors can translate into a vulnerability.
Using memory after it has been deallocated (use-after-free); see the sketch after this list.
If writing, overwrite new data structure, e.g. function ptr.
If reading, might call a corrupted function pointer.
Freeing the same memory twice (double-free).
Might cause malloc to later return the same memory twice.
Decrementing the stack ptr past the end of stack, into some other memory.
[ http://www.invisiblethingslab.com/resources/misc-2010/xorg-large-memory-attacks.pdf ]
A one-byte stray write can lead to compromise.
[ http://www.openwall.com/lists/oss-security/2014/08/26/2 ]
Might not even need to overwrite a return address or function pointer.
Can suffice to read sensitive data like an encryption key.
Can suffice to change some bits (e.g. int isLoggedIn, int isRoot).
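A minimal sketch of the use-after-free case above: a dangling pointer to a
struct holding a function pointer, where a later allocation of the same size
may reuse the freed chunk and overlay attacker-chosen bytes.

/* use_after_free.c -- dangling function pointer (illustration only; whether
   the chunk is actually reused depends on the allocator). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct handler {
    void (*callback)(void);
    char name[24];
};

static void greet(void) { printf("legitimate handler\n"); }

int main(void) {
    struct handler *h = malloc(sizeof *h);
    if (h == NULL) return 1;
    h->callback = greet;
    strcpy(h->name, "greeter");

    free(h);                                   /* h is now dangling */

    /* An attacker-influenced allocation of the same size may get the freed
       chunk, so these bytes can overlay h->callback. */
    char *attacker = malloc(sizeof(struct handler));
    if (attacker != NULL)
        memset(attacker, 0x41, sizeof(struct handler));  /* 0x41414141... */

    h->callback();                             /* use after free: may jump to 0x41414141 */
    return 0;
}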
How to avoid mechanism problems?
In general have fewer mechanisms.
Reduce the amount of security-critical code.
Don't rely on the entire application to enforce security.
Avoid bugs in security-critical code.
E.g., don't use gets(); use fgets(), which limits the read to the buffer length (see the sketch at the end of these notes).
Use common, well-tested security mechanisms ("Economy of mechanism").
Audit these common security mechanisms (lots of incentive to do so).
Avoid developing new, one-off mechanisms that may have bugs.
Good mechanism supports many uses, policies (more incentive to audit).
Examples of common mechanisms:
- OS-level access control (but, could often be better)
- network firewalls (but, could often be better)
- cryptography, cryptographic protocols.
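For reference, a minimal sketch of the read_req function from the case study
rewritten with fgets, which bounds the read to the buffer size:

#include <stdio.h>
#include <stdlib.h>

int read_req(void) {
    char buf[128];
    int i;
    /* fgets reads at most sizeof(buf)-1 bytes and always null-terminates,
       so a long line cannot overwrite the stack past buf. */
    if (fgets(buf, sizeof buf, stdin) == NULL)
        return -1;                 /* EOF or read error */
    i = atoi(buf);
    return i;
}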