Thor
====

What is a traditional database?
  Relational data model:
    Store data in tuples (relations)
    Index tuples by primary key
    Secondary indexes index by other elements of the tuple
  Can do several operations on sets of tuples:
    Product: Concatenate every pair of tuples
      Take two sets, A = {a1, a2}, B = {b1, b2}
      Product is {a1b1, a1b2, a2b1, a2b2}
    Selection (restriction): Selects a subset (e.g., employees with salary < $50,000)
    Projection: Picks a subset of the fields of tuples, eliminating duplicates
    Join: Combination of product, selection, and projection
      E.g.: Produce a list of pairs
  Example implementation: Use B-trees

What is the object-oriented data model?
  Much like object-oriented programming:
    Objects have properties (fields), like data structures
    Objects have methods through which to access the fields
  Why is this very different from the relational model?
    Couldn't you just build "methods" into a relational database client?
    Inheritance
      Define multiple types of employee (manager, etc.)
      Can use an object without knowing its exact type, or even having
        code for its methods at compile time
    Can enforce invariants better with methods in the database
      E.g., inverse relations (supported by many OODBs):
        Employee has a "manager" property
        Manager has a "manages" set
        DB maintains them in sync automatically

So can you just implement persistent objects in C++?  What is C++/OS?
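Stepping back to the relational operations above: join really is derivable from product, selection, and projection. A minimal Python sketch with invented employee/department tuples (the relations, field positions, and function names are illustrative, not from the paper):

```python
# Hypothetical relations: (name, dept_id) and (dept_id, dept_name).
employees = [("alice", 1), ("bob", 2)]
departments = [(1, "sales"), (2, "eng")]

def product(A, B):
    """Concatenate every pair of tuples from A and B."""
    return [a + b for a in A for b in B]

def selection(tuples, pred):
    """Keep only the tuples satisfying the predicate."""
    return [t for t in tuples if pred(t)]

def projection(tuples, fields):
    """Keep only the given field positions, eliminating duplicates."""
    seen = []
    for t in tuples:
        p = tuple(t[i] for i in fields)
        if p not in seen:
            seen.append(p)
    return seen

def join(A, B, key_a, key_b, fields):
    """Join = product, then select matching keys, then project."""
    matched = selection(product(A, B), lambda t: t[key_a] == t[key_b])
    return projection(matched, fields)

print(join(employees, departments, 1, 2, (0, 3)))
# -> [('alice', 'sales'), ('bob', 'eng')]
```

A real engine would never materialize the full product, of course; B-tree indexes let it find matching tuples directly.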
  Mmap a big file, and use a different malloc for persistent objects
  Problems with this approach:
    Database might need to be accessed from different applications/platforms
      Structures might be laid out differently on different architectures
        (endianness differs, sizeof(long) differs)
    Can't guarantee that virtual addresses (and thus pointers) will be the same
      Suppose two applications create two DBs at the same virtual address;
        then a third application must access both databases
    What about inheritance--code for virtual functions?
      Should be able to introduce new types accessible by old applications
    A single database might be larger than your entire virtual address space
      Serious problem on 32-bit machines
    Need some form of garbage collection...
      Must avoid dangling pointers when freeing objects
    Need transactions
    How do you know the type of a raw file?
      (Could be the application's responsibility, but big potential for mistakes)
    Plus performance (as we will see in the Thor paper)

What is Thor?
  Persistent object system
    Provides a universe of persistent objects
    Objects implemented in Theta--a type-safe language
    Universe has a "root object", from which all other objects are accessible
    "Veneers" allow non-Theta code to invoke methods on Thor objects
  Client-server architecture
    Objects reside on servers
      Harp-style replication ensures availability of servers
    Applications run on clients--provides scalability
      Clients cache objects, so applications access cached copies
    It is possible to ship Theta code to servers for complex queries

What is the format of pointers to persistent objects?
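The pointer format described next can be sketched as simple bit packing. This assumes the split given below--a 22-bit page ID plus a 9-bit object ID in 31 bits, with the 32nd bit left over for swizzling state; the function names are invented:

```python
PID_BITS, OID_BITS = 22, 9   # 22-bit page ID + 9-bit object ID = 31 bits

def pack_oref(pid, oid):
    """Pack a page ID and an in-page object ID into a 31-bit oref."""
    assert 0 <= pid < (1 << PID_BITS) and 0 <= oid < (1 << OID_BITS)
    return (pid << OID_BITS) | oid

def unpack_oref(oref):
    """Recover (pid, oid) from an oref."""
    return oref >> OID_BITS, oref & ((1 << OID_BITS) - 1)

oref = pack_oref(pid=123456, oid=42)
assert unpack_oref(oref) == (123456, 42)
assert oref < (1 << 31)   # fits in 31 bits; the spare bit is free for swizzling
```

Note that 2^22 pages of 8 KB each caps a single server at 32 GB, which is why the notes ask below whether a 22-bit pid is enough.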
  Two cases: cross-server and intra-server
  Intra-server "orefs" are 32- (actually 31-)bit quantities
    22-bit Page ID (pid)--identifies the 8K page on which the object resides
    9-bit Object ID (oid)--which object in the page is being identified
  Cross-server references use surrogates
    Small local object holding a server ID and an oref within the remote server
  Page format
    Offset table at the start of the page maps oid to location in page
      Allows compaction of pages to avoid fragmentation
    Each object starts with an oref to its "class object."  What is this?
      Contains type information about the object--what its fields are, etc.
      Used, among other times, when constructing an object
  Is a 22-bit pid enough?
    Can always have one physical machine implement multiple virtual servers
  Why are orefs only 31 bits?  What is the other bit used for?
    Pointer swizzling... what is this?
    Indirection table... why?

How is concurrency control implemented?
  Implemented at object granularity (page granularity would give false sharing)
  Basic idea is to use an optimistic scheme:
    Client does whatever it wants using cached copies of objects
    Server rejects a transaction if it can't be serialized when committed
  Server keeps for each client an "invalid set" of invalidated objects
    Piggybacks invalidations on other messages
    May cause client to abort transactions on its own
  Server keeps a validation queue (VQ) of the ROS and MOS of each
    committing/committed transaction
    ROS--read object set
    MOS--modified object set
  CLOCC--clock-based lazy optimistic concurrency control
    Assumes servers have roughly synchronized clocks
    Server receives commit request
    Assigns the transaction a timestamp
    Server becomes coordinator for two-phase commit
  How do individual servers decide whether to validate a transaction? (sec 4.2)
    Say a server is being asked to commit transaction T
    0. Check against the client's invalid set
    1. For any earlier transaction S:
       If S's MOS & T's ROS != empty, abort if S is uncommitted
       Why?
       If S committed, the invalid set ensures T read the latest version
       But if S is uncommitted, T may have read an older version of the
         object, then S commits later, and we are in trouble
    2. For each later transaction S:
       a. If T's MOS intersects S's ROS, abort
       b. If T's ROS intersects S's MOS, abort if S is committed
          Why?  T has read S's write (by the invalid-set check), but is earlier
  Good news: In case 2, can retry the commit with a later timestamp
    Might just work, with no need to propagate the abort back to the application
  Use loosely synchronized timestamps to truncate the VQ
  At the client, keep an undo log, so old versions are cached when a
    transaction retries

What are the advantages and disadvantages of page vs. object caching?
  Page caching is good when there is locality
    But wasteful of space, since >50% of a page might be unused objects
  What is the solution?  Hybrid adaptive caching (HAC)--see fig. 3
  High-level idea:
    Cache both pages and objects
    For each page, track how many objects are live, and how live they are
    Compact pages with a low percentage of hot objects
  Tracking liveness of objects
    Keep 4 bits per object
    Set the high bit when the object is referenced
    Periodically add 1 and decay (shift right)
  Tracking utility of pages
    Frame usage value is the pair (T, H):
      T: liveness threshold above which objects will be retained (when cleaned)
      H: fraction of objects that are hot at threshold T
    Require H < R, for "retention factor" R (e.g., 2/3)
    Generally choose T as the minimum for which H < R
  How do you select a frame to clean?
    F is less valuable than G if:
      F.T < G.T, or (F.T == G.T && F.H < G.H)
    Note the optimization to avoid recalculating these all the time:
      Keep a list of candidate frames for cleaning; remove a frame if it
      survives 20 fetches

What must the server do to write back an object?
  Read the page from disk, change the object, write the page back to disk
  We know random I/O is slow...
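As an aside, the CLOCC validation rules above (invalid-set check, then earlier/later transactions in the VQ) fit in a few lines. A sketch only--the transaction record and all names are invented, and it omits locking, VQ truncation, and retrying with a later timestamp:

```python
from dataclasses import dataclass, field

@dataclass
class Xaction:
    ts: int                      # commit timestamp assigned by the server
    ros: set = field(default_factory=set)   # read object set
    mos: set = field(default_factory=set)   # modified object set
    committed: bool = False

def validate(T, vq, invalid_set):
    """Return True if transaction T may commit, False to abort."""
    # Rule 0: T must not have read anything already invalidated at the client.
    if T.ros & invalid_set:
        return False
    for S in vq:
        if S.ts < T.ts:
            # Rule 1: an earlier *uncommitted* writer overlaps T's reads, so
            # T may have read a stale version.  (If S had committed, the
            # invalid set would already have caught T.)
            if (S.mos & T.ros) and not S.committed:
                return False
        else:
            # Rule 2a: T wrote something a later transaction read.
            if T.mos & S.ros:
                return False
            # Rule 2b: T read something a later *committed* transaction wrote.
            if (T.ros & S.mos) and S.committed:
                return False
    return True
```

In the rule-2 cases a real server would retry validation with a later timestamp before aborting, as the notes say above.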
Solution: MOB--modified object buffer
  Keep a log of just the modified objects (to make commits permanent)
  Keep modified objects in memory (to patch fetched pages)
  Write objects back in the background
  Write absorption--often multiple objects are modified in the same page
    Can write them all at once
    In fact, use larger segments to do this at larger-than-page granularity

Performance
  Fig. 7--what is going on here?
    Thor requires less than half the memory of C++/OS
    Smaller objects use less cache
    HAC caches small objects, not 8K pages with < 50% useful objects on them
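Finally, the write-absorption idea behind the MOB can be sketched: pending modified objects are indexed by page, so one background read-modify-write installs all of a page's pending updates. Every name here is invented for illustration, and pages are modeled as plain oid->value dicts:

```python
from collections import defaultdict

class MOB:
    """Toy modified-object buffer: pending writes indexed by page."""

    def __init__(self):
        self.pending = defaultdict(dict)   # pid -> {oid: new value}

    def record(self, pid, oid, value):
        """Called at commit time, after the modified object is logged."""
        self.pending[pid][oid] = value

    def patch(self, pid, page):
        """Apply pending updates to a freshly fetched copy of a page."""
        page.update(self.pending.get(pid, {}))
        return page

    def flush_page(self, pid, read_page, write_page):
        """Background write-back: a single read-modify-write installs
        every pending update for the page (write absorption)."""
        page = read_page(pid)
        page.update(self.pending.pop(pid, {}))
        write_page(pid, page)

# Two commits touch page 7; one flush absorbs both writes.
disk = {7: {0: "old", 1: "old"}}
mob = MOB()
mob.record(7, 0, "new0")
mob.record(7, 1, "new1")
mob.flush_page(7, lambda pid: dict(disk[pid]),
               lambda pid, p: disk.__setitem__(pid, p))
```

The real MOB flushes at segment granularity (larger than a page), but the absorption effect--many commits, one disk write--is the same.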