Secure Web Applications via Automatic Partitioning ================================================== What is the problem this paper is trying to address? Fast web applications require pushing application code into the browser Must ensure this doesn't inappropriately disclose data E.g., game that pushes secret info to client allows cheating Must ensure this doesn't inappropriately change data on server E.g., shopping site must not let browser chose price of items purchased Let's look at Jif labels What does "int {alice->bob,alice; bob->bob,alice,charlie} x;" mean? Declares x to be an integer with the label between braces {...} Labels have two clauses representing information flow restrictions Consider "alice->bob,alice". Let's break it in two: "->bob,alice" - says only bob or alice should be able to see value of x mnemonic: x's value should only flow to (->) alice,bob "alice" - represents "owner" of the restriction i.e., person who cares that this restriction is enforced Similarly, bob only wants alice, bob, or charlie to see the data What does "int {alice->bob,alice; bob<-bob,charlie} y;" mean? First clause same as above Second clause, "bob<-bob,charlie": "<-bob,charlie" - says only bob or charlie should influence value of y mnemonic: value should only flow from (<-) bob,charlie "bob" - owner of this restriction (who cares it's enforced) When should you be able to assign B = A? B's label (L_B) should be at least as restrictive as A's label (L_A) We will write this as L_A [= L_B ("L_A can flow to L_B") How to define L_A [= L_B with Jif labels? Consider each secrecy (->) clause in L_A must have one at least as restrictive in L_B Example: {alice->alice,bob} [= {alice->alice} If only alice can read B, more restrictive than alice OR bob {alice->alice,bob} [= {alice->bob; bob->charlie} Bob adds restriction to B, but alice's restriction preserved {alice->alice,bob} NOT[= {bob->alice,bob} Consider each integrity (<-) clause in L_B there must be one at least as restrictive in L_A Example: {bob<-bob} [= {bob<-bob,charlie} only bob can write B, while A writable by bob or charlie {bob<-bob,charlie} NOT[= {bob<-bob} can't assign A to B if A writable by charlie but not B When should you allow a compound statement like C = A + B? Information is flowing from A to C and from B to C So want L_A [= L_C and L_B [= L_C Can also view label of "A + B" as least upper bound of L_A and L_B So requirement becomes (L_A LUB L_B) [= L_C Note one principle can *act for* another If alice acts for bob, means bob completely trusts alice So {bob->bob,alice} [= {alice->bob,alice} What is an implicit flow? See if statement in sec 3.1 on p. 3: "if (x == 0) z = y;" Note even if L_y [= L_z, this might leak information about x To fix this, must view the program counter PC as having a label Initially, L_{PC} is empty. In if statement, is set to L_x. When compiling "z = y", must check (L_y LUB L_{PC}) [= L_z How to compute L_C = (L_A LUB L_B) for Jif labels? Every secrecy clause in L_A or L_B must be reflected in L_C E.g., {a->t,u} LUB {b->u,w} = {a->t,u; b->u,w} If same owner, can combine clauses: {a->t,u,v} LUB {a->u,v,w} = {a->u,v} Note intersection of readers: Can only read C if could read A and B Every integrity clause in L_C must be reflected in L_A AND L_B E.g., {} LUB {a<-v,w} = {} AND {a<-t,u} LUB {b<-u,w} = {} Because restrictions appearing in only on label are dropped: Can combine clauses w. same owner: {a<-u,v} LUB {a<-v,w} = {a<-u,v,w} Note union of writers: Anyone who could write A or B can affect C Here any of u, v, or w could influence value of var labeled w. LUB How can you downgrade data with Jif? There is also a notion of what privileges you are running with E.g., might be running on behalf of alice; alice might also act for bob Add two language constructs: declassify and endorse declassify allows you to remove secrecy clauses from variables endorse allows you to add integrity clauses When can you add and remove these clauses in violation of [=? When principal you are running on behalf of owns the restriction So bob can remove {bob->anything} and add {bob<-anything} But alice could only do that if she acts for bob How do we map these restrictions onto web applications? Have two built in principals * - is the server; implicitly trusted by the client client - is one particular client note not a persistent identity like bob might have many clients connecting to same server No static data can have "client" in its label Code running on for different browsers can share data though static vars But "client" means different things in these different contexts On the other hand, can let client *speak for* bob So this lets us capture a notion of authentication E.g., "int {*<-bob} x;" - x only writable by clients authenticated as bob Let's look at figure 2 Game where user can try to guess a number "tries" times makeGuess - method called with "num" the value of each guess * Line 15: void makeGuess{*->client}(Integer{*->client} num) What is {*->client} after method name? It's begin label, L_{Begin} Must have L_{PC} [= L_{Begin} to invoke method w. begin label L_{Begin} Why? Because method might exercise privileges to downgrade data What if we didn't have it? Imagine you had code like: if (secret <= 4) { makeGuess (0); } if (secret <= 7) { makeGuess (1); } Client could learn range of secret by initial value of tries. * Line 16: where authority(*), endorse({*<-*}) What does "where authority(*)" mean? Specifies code's privileges (can declassify & endorse clauses owned by *) What does "endorse({*<-*})"? "auto-endorse" adds this clause to the PC's label What breaks otherwise? Lines 25 & 30, modifying tries tries has *<-* in label, so must have L_{PC} [= {*<-*} to modify it * Line 21: endorse (i, {*<-client} to {*<-*}) First, what is label of i and how does compiler know this? i is assigned on line 20 (i = num.intValue ()) So compiler can compute L_i = (L_{num} LUB L_{PC}) L_{num} = {*->client}, L_{PC} = {*<-*} L_i = L_{num} LUB L_{PC} = {*->client} Note could have explicitly specified i's label But easier to have compiler infer it as LUB of all assignments and PCs What would happen if we dropped line 21? Line 22 "if (i...)" means in if bode L_{new_PC} = L_i LUB L_{old_PC} So would undo the effects of endorse on line 16 And assigning to tries would be illegal, so program wouldn't compile * Lines 24 and 30: declassify ({*->*} to {*->client}) { What do these do and why? Invokes previously declared authority (*) to allow relabeling of data What data is being relabeled? The PC Because of line 23 "if (... i == secret)", have L_{secret} [= L_{new_PC} Meaning L_{PC} contains clause *->* from secret What if we didn't remove this label from PC? Lines 25 & 30 would fail when writing tries as L_{PC} NOT[= L_{tries} L_{tries} = {*->client; *<-*} while L_{PC} = {*->*; *<-*} no clause in L_{tries} preserves the secrecy clause *->* in L_{PC} What is WebIL? Simpler output language w/o labels Just says for each variable and statement where it can be placed Possibilities for placement are: client, server, both Things placed on the server may also have high integrity And annotations give freedom using ? which means optional (see Table 1) E.g., C - must be on client, S - must be on server, CS - must be on both C?S - must be on server, can be on client C?S? - can be on client server, or both (must be on at least one) Adding h means high integrity (Sh, CSh, C?Sh) E.g., used for any line of code that might affect variable labeled {*<-*} Note low-integrity onClick causes high-integrity code in GuessANumber to run That's why we needed "endorse({*<-*})" on line 16 of Figure 2 What happens to WebIL? (See Figure 1) Split into two separate programs Try to minimize the number of messages we have to send and wait for Must send message when must run execution block on machine previous execution block didn't run on Turns out some fancy math gives us optimal solution in O(N^3) time Don't worry how math works, important point is result: Respects WebIL restrictions, minimizes messages What should we be asking for evaluation? - How easy to use? - Any performance costs? Performance? No graphs or performance numbers, just round trips - why? Should mostly be same cost as java code, but might be nice to see Unusual aspect is network round trips Would clearly be bad if Swift required gratuitous round trips Ease of use? Building client-server code is not super easy Anecdotal evidence Swift makes certain things easier E.g., change guessing game so secret is no longer a secret One line of code to change in swift vs. completely re-factoring application On the other hand, paper somewhat hard to read How many people here have written Ajax applications? How many people read paper and wanted to download & use swift?