Flow Sensitive Information Flow Analysis for C Programs

Jun Furuse¹, Dzung Dinh-Khac², and Viet Ha Nguyen²

¹ Graduate School of Information Science and Technology, the University of Tokyo
² A.N.Lab Joint Stock Company, Vietnam

Abstract. The VITC compiler aims to provide information security to legacy C applications using type-based information flow analysis. We have recently made its typing discipline flow sensitive, whereas the other realistic information-secure compiler implementations, for Java [5] and ML [8], are flow insensitive. The change was necessary because local states in C are too frequently stored in global variables such as errno.

1 Introduction

Language-based information flow analysis verifies the non-interference property of programs: roughly speaking, a non-interferent program cannot unintentionally leak its secret information to attackers. It is a very strong measure against security holes in programs. However, information flow analysis has not been well applied to the C language, which is one of the most popular targets for attackers who try to steal secrets.

The goal of our VITC compiler project is to secure existing C applications by providing this information-flow-based security to the C language. Our static and dynamic type systems track the information flow of a C program annotated with secrecy specifications and verify its non-interference property according to them.

In this short paper, after a brief discussion of VITC's key features, we explain our recent achievement: VITC's flow sensitive information flow typing system, which is mandatory for analyzing the information flow of realistic C applications. Such imperatively written programs often use global variables such as errno to store program states, and these cannot be typed well flow-insensitively.

2 VITC type system overview

VITC programs are annotated with security specifications using the lattice model of security levels [2], where the security level constants ℓ, called labels, form a lattice (ℒ, ⊑):

    xRULE(L < H)              // specification of the security lattice
    int xSEC(L) l = 0;
    int xSEC(H) h = 42;
    l = h + 1;

The declaration xRULE(L < H) specifies the finite lattice ℒ = {L, H} where L ⊑ H, with L for the lower secrecy and H for the higher. Using the C type attribute syntax, the macro xSEC(ℓ) specifies that the variables l and h store information of lower (L) and higher (H) secrecy respectively. Using these specifications, our static type system detects that the assignment l = h + 1 illegally leaks the higher-secrecy information derived from h to l, which is of lower secrecy. So-called implicit flows are also tracked, so that insecure code like if(h){ l = 1; } is properly rejected.

For this type system to track the information flow correctly, programs must be compiled by a memory-safe C compilation scheme [6, 7]. Once memory safety is ensured, C becomes a very imperative functional language. Thus, our static type system is partially based on one for ML [8]: to handle the functional aspects of C, C functions may have polymorphic security types, for example.

Even with the assumption of memory safety, static analyses of C programs are very hard because of type casts. This is also true for information flow analysis, and some flows around type casts must be checked dynamically at run time. For example, when an expression e of type int is cast to a pointer type as in (int *)e, we may not be able to determine statically how secret its content is. Our compiler requires programmers to write an explicit annotation here, such as (int xSEC(L)*)e, and embeds code which dynamically checks whether the result of e is a pointer to lower-security information or not.

Even when these checks for memory safety and dynamic typing fail, a VITC program must continue its execution in a failure-oblivious manner [3], rather than simply aborting in a fail-safe way. This is because careless program termination may leak secrets to anyone who observes the termination: for example, if the execution of if(h){ e; } terminates at e, this gives a clue that h had a non-zero value.

3 Flow sensitive analysis

3.1 A motivating example: errno

Until recently the VITC type system was flow insensitive, as are [5, 8]; that is, a variable has a fixed secrecy in all contexts. This was acceptable as long as we typechecked only very simple examples. It broke once we tried to compile more realistic applications with global variables like errno. To demonstrate the problem, let us consider the following program:

    int errno;                  /* Global variable */
    int main() {
      int xSEC(H) h;            /* variable of higher secrecy */
      int xSEC(L) l;            /* variable of lower secrecy */
      if(h) { errno = 1; }
      errno = 0;
      l = errno;
    }

Let errno be of secrecy ℓ. It is easily seen that after the if-statement, ℓ = H (or higher than H) because of the indirect information flow from h to errno. Typing of a variable with an xSEC specification is done flow-insensitively. In flow insensitive information flow analysis, the secrecy of errno is fixed throughout the program. Therefore, the analysis reports an error for the assignment l = errno, since a flow from H (higher) to L (lower) is forbidden. In practice, however, the example should be accepted, since after the assignment errno = 0 the variable carries no information of higher secrecy.

C lacks modern language features such as exceptions, so a global variable like errno is often used to store states which are only locally meaningful. This use of global variables for temporary states prevents secure programs from being typed with flow insensitive information flow analysis.

Flow sensitive information flow analysis solves this problem, since variables can have different secrecy after assignments. After the assignment errno = 0, the secrecy of errno can be lowered to L. If the code between errno = 0 and l = errno does not raise the secrecy of errno, the last assignment does not raise any error, since it only induces a flow from L to L.

In the literature, there have been a number of approaches to flow sensitive information flow analysis, e.g. [1, 4]. Although they give nice theoretical results, they do not consider sub-functions, which are very common in C programs, and this makes them less practical. We argue that our system, presented below, is more useful because it allows for functions.

3.2 The language

Syntax. For our formalization, we first define a small C-like language which supports global variables, conditionals, function declarations and function calls:

    e ::=                               Expressions
        | n                             Constants
        | x | f                         Variables
        | f(e)                          Function calls
    s ::=                               Statements
        | skip                          Skip
        | x := e                        Assignments
        | s; s                          Sequences
        | if e then s else s            Conditionals
    t ::= int | char | …
    d ::=
        | t x = n;                      Variable decls
        | t^ℓ x = n;                    Variables with specs
        | t f(t x) { s; return e; }     Function decls
    p ::= d … d                         Programs

The language can have security level constants ℓ only at variable declarations t^ℓ x = n; (e.g. int xSEC(L) x = 0;). Such variables with levels give the security specification of a program, and their typing is flow insensitive, while the other variables are typed flow sensitively.

Types, constraints, conditions, and subtyping. Types τ in our type system are fully annotated with flow types λ, each of which is either a level constant ℓ or a type variable α for polymorphism. A functional type τ →^π τ is annotated with its effect π, the security lower bound of the side effects inside the function:

    λ, π ::=                            Flow types
        | ℓ                             Level constants
        | α                             Type variables
    τ ::=                               Mono-types
        | t^λ                           t = int, char, …
        | τ →^π τ                       Functional types

As we allow variables in types, we need a formal way to express the ordering relation between type variables and flow types. In our system, it is represented by type constraints (or constraints for short) of the form k ::= λ1 ⊑ λ2. A set of constraints K forms a trivial constraint system, and we write K ⊢ λ1 ⊑ λ2 when λ1 ⊑ λ2 is inferable from K.

Similarly to [4], for types to be flow sensitive, the type of a variable must be able to "vary", i.e. a variable may have different types before and after a statement (especially an assignment) is executed. Therefore, to keep track of the types of variables during program execution, we must have conditions C, which are partial maps from variables x to mono-types τ. The typing of a statement is annotated with a pre-condition and a post-condition, which represent the types of variables before and after the statement is executed. Moreover, unlike in [4], we allow function calls in expressions, and functions may have different pre- and post-conditions, so each expression is also annotated with a pre- and a post-condition with the same meaning.

The partial order between security levels ℓ is naturally extended to the following subtyping relationships between types and between conditions. The subtyping of function parameters is contravariant:

\[
\frac{K \vdash \lambda \sqsubseteq \lambda'}
     {K \vdash t^{\lambda} \sqsubseteq t^{\lambda'}}
\qquad
\frac{K \vdash \lambda_1 \sqsubseteq \lambda'_1 \quad K \vdash \lambda'_2 \sqsubseteq \lambda_2 \quad K \vdash \pi' \sqsubseteq \pi}
     {K \vdash t^{\lambda_2} \xrightarrow{\pi} t^{\lambda_1} \sqsubseteq t^{\lambda'_2} \xrightarrow{\pi'} t^{\lambda'_1}}
\]

\[
\frac{\mathrm{Dom}(C_1) = \mathrm{Dom}(C_2) \quad \forall x \in \mathrm{Dom}(C_1).\ K \vdash C_1(x) \sqsubseteq C_2(x)}
     {K \vdash C_1 \sqsubseteq C_2}
\]

It now suffices to define polymorphic functional types, which have the form

\[
\forall \alpha_1 \ldots \alpha_n [K].\ \{C_{pre}\}\ \tau_1 \xrightarrow{\pi} \tau_2\ \{C_{post}\}
\]

Intuitively, this states that for all type variables αi which satisfy the constraint set K, if the pre-condition is Cpre, then the corresponding function works as typed τ1 →^π τ2 and modifies the condition to Cpost.

3.3 Typing rules

Our flow sensitive typing system, depicted in Appendix A, is a mixture and an extension of a flow sensitive type system for while programs [4] and a flow insensitive polymorphic constraint typing for ML [8]. Type judgments for expressions and statements take the form K, π, Γ ⊢ {C1} · {C2}, with two kinds of typing environments:

- C is a condition, a flow sensitive environment which records the types of flow sensitive variables.
- Γ is the flow insensitive counterpart, a partial mapping from variables to polymorphic types or mono-types. It is used for functions and for flow insensitive variables annotated with security specifications.
- K is a set of constraints, the requirements between type variables and level constants that the judgment depends on.
- π is the so-called "program counter", which denotes the secrecy of the program execution flow.

Unlike in the type systems on which ours is based, conditions and a program counter also appear in the judgment for expressions, K, π, Γ ⊢ {C1} e : τ {C2}, since we assume that function calls with side effects may occur inside the expression e.

The core of flow sensitivity is the rule t-Asgn: the types t^λ′ of flow sensitive variables are changed after assignments to the types t^λ of the assigned values. Thus errno can have different security levels at different program points. Apart from this rule, pre- and post-conditions must join correctly at each computation step.

On the other hand, assignments to flow insensitive variables with security specifications are typed not by t-Asgn but by t-AsgnInsens. This rule is very similar to classical flow insensitive typing: the type of the variable at an assignment must be equal to that of the assigned value and is never modified.

The type of a function is also flow insensitive; therefore it is also bound in Γ, with a polymorphic type. Its polymorphic type is instantiated (t-Instantiate) and then applied (t-FunCall) for each application independently, in order to achieve polymorphism. The type instantiation S must be meaningful: S(K) must be satisfiable (⊨ S(K)) and must not contain contradictory constraints like H ⊑ L.

The judgment for declarations has the form K0, Γ, C0 ⊢ d. Since all definitions are declared at the top level once and for all, there is no notion of a program counter or of pre- and post-conditions, only the initial condition C0. The global constraints K0 are the constraints which must hold throughout the program and must be satisfiable.

The rule t-FunDecl types a function declaration. The function body s; return e; must be typed under the constraint set K0 + Kα and the pre-condition extended with the function argument x. The free variables α introduced in the typing of the body are the targets of type generalization, and the extended part Kα of the constraint set must relate to the generalized variables α. In the polymorphic type of the function, these generalized variables quantify the mono-type, the pre- and post-conditions, and the extended part Kα of the constraints.

4 Type inference and implementation issues

The type inference algorithm is obtained almost automatically, by using the typing rules bottom-up and then checking the satisfiability of the constraints. A problem arises at function declarations, since t-FunDecl uses polymorphic recursion, which requires complex inference. Currently our algorithm does not support polymorphic recursion: a recursive function is typed monomorphically inside its body.

Our implementation, which is based on this formalization, successfully types various errno examples like the one in Section 3.1. Sometimes programmers are forced to lower the secrecy level of errno by inserting reset assignments like errno = 0, but this is easy, and the cost is small compared with the information security obtained.

5 Conclusion and future works

Flow sensitive typing is required for information flow analysis of C programs, since they often use global variables such as errno to store states that are only locally meaningful. We have formalized and implemented such a flow sensitive polymorphic information flow typing system for C.

We leave a formal proof of the soundness of the system as future work. We believe that it is not very difficult to show, since the proof in [4] can be adapted to our system.

Currently flow sensitivity is only permitted for pure integers, and pointers are typed flow insensitively. Flow sensitivity for pointers is left as future work; it will require a detailed pointer analysis, as pointed out in [1].

References

[1] David Clark, Chris Hankin, and Sebastian Hunt. Information flow for Algol-like languages. Computer Languages, 28(1), 2002.
[2] Dorothy E. Denning. A lattice model of secure information flow. Commun. ACM, 19(5):236–243, 1976.
[3] Martin Rinard et al. Enhancing server availability and security through failure-oblivious computing, December 2004.
[4] Sebastian Hunt and David Sands. On flow-sensitive security types. In Proc. Principles of Programming Languages, 33rd Annual ACM SIGPLAN-SIGACT Symposium (POPL'06), pages 79–90, Charleston, South Carolina, USA, January 2006. ACM Press.
[5] Andrew C. Myers. JFlow: Practical mostly-static information flow control. In Symposium on Principles of Programming Languages, pages 228–241, 1999.
[6] George C. Necula, Scott McPeak, and Westley Weimer. CCured: type-safe retrofitting of legacy code. In Symposium on Principles of Programming Languages, pages 128–139, 2002.
[7] Yutaka Oiwa, Tatsurou Sekiguchi, Eijiro Sumii, and Akinori Yonezawa. Fail-safe ANSI-C compiler: An approach to making C programs secure (progress report), February 2003.
[8] François Pottier and Vincent Simonet. Information flow inference for ML. In Symposium on Principles of Programming Languages, pages 319–330, 2002.

A Flow sensitive typing rules

\[
K, \pi, \Gamma \vdash \{C\}\ n : t_n^{\lambda}\ \{C\}
\quad \text{(t-Const)}
\]

\[
\frac{C(x) = \tau \ \text{ or } \ \Gamma(x) = \tau}
     {K, \pi, \Gamma \vdash \{C\}\ x : \tau\ \{C\}}
\quad \text{(t-Var)}
\]

\[
\frac{\Gamma(f) = \forall \alpha [K].\ \{C_1\}\ \tau\ \{C_2\} \quad \models S(K)}
     {S(K), \pi, \Gamma \vdash \{S(C_1)\}\ f : S(\tau)\ \{S(C_2)\}}
\quad \text{(t-Instantiate)}
\]

\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : \tau\ \{C_2\} \quad K, \pi, \Gamma \vdash \{C_2\}\ f : \tau \xrightarrow{\pi} \tau'\ \{C_3\}}
     {K, \pi, \Gamma \vdash \{C_1\}\ f(e) : \tau'\ \{C_3\}}
\quad \text{(t-FunCall)}
\]

\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : \tau\ \{C_2\} \quad K \vdash \tau \sqsubseteq \tau' \quad K \vdash C_1 \sqsubseteq C'_1 \quad K \vdash C'_2 \sqsubseteq C_2 \quad K \vdash \pi' \sqsubseteq \pi}
     {K, \pi', \Gamma \vdash \{C'_1\}\ e : \tau'\ \{C'_2\}}
\quad \text{(t-SubExp)}
\]

\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ s\ \{C_2\} \quad K \vdash C_1 \sqsubseteq C'_1 \quad K \vdash C'_2 \sqsubseteq C_2 \quad K \vdash \pi' \sqsubseteq \pi}
     {K, \pi', \Gamma \vdash \{C'_1\}\ s\ \{C'_2\}}
\quad \text{(t-SubStmt)}
\]

\[
K, \pi, \Gamma \vdash \{C\}\ \mathtt{skip}\ \{C\}
\quad \text{(t-Skip)}
\]

\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : t^{\lambda}\ \{C_2\} \quad K \vdash \pi \sqsubseteq \lambda \quad C_2(x) = t^{\lambda'} \quad C_3 = C_2[x : t^{\lambda}]}
     {K, \pi, \Gamma \vdash \{C_1\}\ x := e\ \{C_3\}}
\quad \text{(t-Asgn)}
\]

\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : t^{\lambda}\ \{C_2\} \quad K \vdash \pi \sqsubseteq \lambda \quad \Gamma(x) = t^{\lambda}}
     {K, \pi, \Gamma \vdash \{C_1\}\ x := e\ \{C_2\}}
\quad \text{(t-AsgnInsens)}
\]

\[
\frac{K, \pi, \Gamma \vdash \{C_i\}\ s_i\ \{C_{i+1}\} \quad i = 1, 2}
     {K, \pi, \Gamma \vdash \{C_1\}\ s_1; s_2\ \{C_3\}}
\quad \text{(t-Seq)}
\]

\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : t^{\lambda}\ \{C_2\} \quad K \vdash \pi' \sqsupseteq \pi \sqcup \lambda \quad K, \pi', \Gamma \vdash \{C_2\}\ s_i\ \{C_3\} \quad i = 1, 2}
     {K, \pi, \Gamma \vdash \{C_1\}\ \mathtt{if}\ e\ \mathtt{then}\ s_1\ \mathtt{else}\ s_2\ \{C_3\}}
\quad \text{(t-Cond)}
\]

\[
\frac{C(x) = t_n^{\lambda} \quad \models K_0}
     {K_0, \Gamma, C \vdash t_n\ x = n;}
\quad \text{(t-VarDecl)}
\]

\[
\frac{\Gamma(x) = t_n^{\ell} \quad \models K_0}
     {K_0, \Gamma, C \vdash t_n^{\ell}\ x = n;}
\quad \text{(t-VarDeclInsens)}
\]

\[
\frac{\begin{array}{c}
\Gamma(f) = \forall \alpha [K_\alpha].\ \{C_1\}\ t^{\lambda} \xrightarrow{\pi} t'^{\lambda'}\ \{C_3\} \\
\alpha = FV(\{C_1\}\ t^{\lambda} \xrightarrow{\pi} t'^{\lambda'}\ \{C_3\}) \setminus (FV(\Gamma) \cup FV(C)) \\
K_0 + K_\alpha, \pi, \Gamma \vdash \{C_1[x : t^{\lambda}]\}\ s\ \{C_2\} \qquad
K_0 + K_\alpha, \pi, \Gamma \vdash \{C_2\}\ e : t'^{\lambda'}\ \{C_3\} \\
\models K_0 + K_\alpha \qquad \mathrm{Dom}(C_1) \subseteq \mathrm{Dom}(C) \\
\forall k \in K_\alpha.\ FV(k) \cap \alpha \neq \emptyset \qquad
\forall k \in K_0.\ FV(k) \cap \alpha = \emptyset
\end{array}}
{K_0, \Gamma, C \vdash t'\ f(t\ x)\ \{\ s;\ \mathtt{return}\ e;\ \}}
\quad \text{(t-FunDecl)}
\]
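The contrast drawn in Section 3.1 between the two analyses can be replayed on the errno program with a small label tracker. The following is only a minimal sketch, not VITC's implementation: the names Stmt, in_branch, flow_sensitive_ok and flow_insensitive_ok are hypothetical, the program is flattened by hand into three assignments with their program counter levels, the lattice is the two-point one (L below H), and errno is assumed to start at the lower level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Two-point security lattice of the paper: LOW is below HIGH. */
typedef enum { LOW = 0, HIGH = 1 } Level;

/* Variables of the Section 3.1 example. */
typedef enum { V_ERRNO, V_L, V_H, V_COUNT } Var;

/* Hypothetical flattened statement: `target = rhs` (or a constant), executed
   under program counter level `pc`. `in_branch` marks an assignment in one arm
   of a conditional whose other arm is skip. */
typedef struct {
    Var target;
    bool rhs_is_const;
    Var rhs;          /* ignored when rhs_is_const is true */
    Level pc;
    bool in_branch;
} Stmt;

/* xSEC specifications: l is fixed at LOW, h at HIGH; errno has none. */
static const bool HAS_SPEC[V_COUNT] = { [V_ERRNO] = false, [V_L] = true, [V_H] = true };
static const Level SPEC[V_COUNT]    = { [V_ERRNO] = LOW,   [V_L] = LOW,  [V_H] = HIGH };

/* if (h) { errno = 1; }  errno = 0;  l = errno; */
const Stmt ERRNO_PROGRAM[] = {
    { V_ERRNO, true,  V_ERRNO, HIGH, true  },
    { V_ERRNO, true,  V_ERRNO, LOW,  false },
    { V_L,     false, V_ERRNO, LOW,  false },
};
const size_t ERRNO_PROGRAM_LEN = sizeof ERRNO_PROGRAM / sizeof ERRNO_PROGRAM[0];

Level join(Level a, Level b) { return a > b ? a : b; }
bool flows_to(Level a, Level b) { return a <= b; }

/* Level of the value written by s: its right-hand side joined with the pc. */
static Level written_level(const Stmt *s, const Level env[V_COUNT]) {
    Level rhs = s->rhs_is_const ? LOW : env[s->rhs];
    return join(rhs, s->pc);
}

/* Flow sensitive: an unannotated variable takes the level of the last value
   assigned to it; annotated variables are checked against their fixed level. */
bool flow_sensitive_ok(const Stmt *prog, size_t n) {
    assert(prog != NULL || n == 0);
    Level env[V_COUNT];
    for (int v = 0; v < V_COUNT; v++) env[v] = SPEC[v];
    for (size_t i = 0; i < n; i++) {
        const Stmt *s = &prog[i];
        Level val = written_level(s, env);
        if (HAS_SPEC[s->target]) {
            if (!flows_to(val, SPEC[s->target])) return false; /* illegal flow */
        } else if (s->in_branch) {
            env[s->target] = join(env[s->target], val); /* merge with the skip arm */
        } else {
            env[s->target] = val;                       /* level replaced */
        }
    }
    return true;
}

/* Flow insensitive: an unannotated variable gets one level for the whole
   program, the join of everything ever written to it. */
bool flow_insensitive_ok(const Stmt *prog, size_t n) {
    assert(prog != NULL || n == 0);
    Level env[V_COUNT];
    for (int v = 0; v < V_COUNT; v++) env[v] = SPEC[v];
    bool changed = true;
    while (changed) {                 /* levels only grow, so this terminates */
        changed = false;
        for (size_t i = 0; i < n; i++) {
            const Stmt *s = &prog[i];
            Level val = written_level(s, env);
            if (!HAS_SPEC[s->target] && !flows_to(val, env[s->target])) {
                env[s->target] = join(env[s->target], val);
                changed = true;
            }
        }
    }
    for (size_t i = 0; i < n; i++) {
        const Stmt *s = &prog[i];
        if (HAS_SPEC[s->target] && !flows_to(written_level(s, env), SPEC[s->target]))
            return false;
    }
    return true;
}
```

Calling flow_sensitive_ok(ERRNO_PROGRAM, ERRNO_PROGRAM_LEN) accepts the program, because errno is back at the lower level when l = errno is reached, while flow_insensitive_ok on the same input rejects it, because the single level of errno has been raised to H by the guarded assignment. The sketch has no functions, polymorphism or constraint variables, so it illustrates only the role of the assignment rules, not the full system.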

Date posted: 18/12/2017, 05:23
