PP

Parallel Architecture & Programing VIIII

15618 PP9 Transactional Memory

Posted by freeCookie🍪 on February 1, 2019

Parallel Architecture & Programing VIIII

嗯。。。马上就复习完了。感觉再次看这个课件就基本上都能看懂了,当时学的时候是有很多地方很难理解来着。可能这就是书读百遍其义自现的道理吧。当然也可能是我一紧张脑子都不转了,却还要坚持学习,好气哦:-) 看课件的时候也会想起当时,恨不得找个缝把愚蠢的自己给塞进去orz,苍天鸭。马上要过年啦,巧了么这不是,去年过年的时候我在写assign2呀,今年过年居然还是PP。(小老弟你怎么回事啊

Transactional Memory

Declarative vs. imperative abstractions

  • Declarative: programmer defines what should be done
    • Execute all these independent 1000 tasks
    • Perform this set of operations atomically
  • Imperative: programmer states how it should be done
    • Spawn N worker threads. Assign work to threads by removing work from a shared task queue
    • Acquire a lock, perform operations, release the lock

Terminology

  • Memory transaction 事务内存
    • An atomic and isolated sequence of memory accesses
    • Inspired by database transactions
  • Atomicity (all or nothing) 原子性
    • Upon transaction commit, all memory writes in transaction take effect at once
    • On transaction abort, none of the writes appear to take effect
    • Atomic { } ≠ lock() + unlock()
      • Atomic: high-level declaration of atomicity, does not specify implementation
      • Lock: low-level blocking primitive, does not provide atomicity or isolation on its own
  • Isolation 隔离
    • No other processor can observe writes before transaction commits
  • Serializability 可串行化
    • Transactions appear to commit in a single serial order
    • But the exact order of commits is not guaranteed by semantics of transaction

Load-linked, store conditional (LL/SC)

  • LL/SC is a light version of transactional memory
  • Pair of corresponding instructions
    • load_linked(x): load value from address
    • store_conditional(x, value): store value to x, if x hasn’t been written to since corresponding LL
  • Corresponding ARM instructions: LDREX and STREX

Advantages (promise) of transactional memory

  • Easy to use synchronization construct
    • Claim: transactions are as easy to use as coarse-grain locks
  • Often performs as well as fine-grained locks
    • Provides automatic read-read concurrency and fine-grained concurrency
    • Performance portability: locking scheme for four CPUs may not be the best scheme for 64 CPUs
  • Failure atomicity and recovery
    • No lost locks when a thread fails
    • Failure recovery = transaction abort + restart
  • Composability
    • Safe and scalable composition of software modules

Implementing transactional memory

TM implementation basics

  • TM systems must provide atomicity and isolation
    • Without sacrificing concurrency
  • Basic implementation requirements
    • Data versioning (ALLOWS transaction to abort)
    • Conflict detection and resolution (WHEN to abort)
  • Implementation options
    • Hardware transactional memory
    • Software transactional memory
    • Hybrid transactional memory
Data versioning
  • Manage uncommitted (new) and previously committed (old) versions of data for concurrent transactions.

  • Eager versioning (undo-log based)

    • Update memory location directly on write
    • Maintain undo information in a log
    • Good: faster commit
    • Bad: slower aborts, fault tolerance issues
  • Lazy versioning (write-buffer based)

    • Buffer data in a write buffer until commit
    • Update actual memory location on commit
    • Good: faster abort (just clear log), no fault tolerance issues
    • Bad: slower commits
Conflict detection
  • Must detect and handle conflicts between transactions
    • Read-write conflict: transaction A reads address X, which was written to by pending transaction B
    • Write-write conflict: transactions A and B are both pending, and both write to address X
  • System must track a transaction’s read set and write set
    • Read-set: addresses read within the transaction
    • Write-set: addresses written within the transaction
  • Pessimistic detection

    • Check for conflicts during loads or stores
    • “Contention manager” decides to stall or abort transaction when a conflict is detected
  • Optimistic detection

    • Detect conflicts when a transaction attempts to commit
    • On a conflict, give priority to committing transaction
    • Note: can use optimistic and pessimistic schemes together
  • Trade-offs

    • Pessimistic conflict detection (a.k.a. “eager”)
      • Good: Detect conflicts early (undo less work, turn some aborts to stalls)
      • Bad: no forward progress guarantees, more aborts in some cases
      • Bad: fine-grained communication (check on each load/store)
      • Bad: detection on critical path
    • Optimistic conflict detection (a.k.a.“lazy” or “commit”)
      • Good: forward progress guarantees
      • Good: bulk communication and conflict detection
      • Bad: detects conflicts late, can still have fairness problems
  • Granularity

    • Object granularity (SW-based techniques)
      • Good: reduced overhead (time/space)
      • Good: close to programmer’s reasoning
      • Bad: false sharing on large objects
    • Machine word granularity
      • Good: minimize false sharing
      • Bad: increased overhead
    • Cache-line granularity
      • Good: compromise between object and word
Hardware transactional memory (HTM)
  • Data versioning is implemented in caches
    • Cache the write buffer or the undo log
    • Add new cache line metadata to track transaction read set and write set
  • Conflict detection through cache coherence protocol
    • Coherence lookups detect conflicts between transactions
    • Works with snooping and directory coherence
  • Note: Register checkpoint must also be taken at transaction begin

基本上是概念上的东西。。。我还不如刷题去