Parallel Architecture & Programing VIIII
嗯。。。马上就复习完了。感觉再次看这个课件就基本上都能看懂了,当时学的时候是有很多地方很难理解来着。可能这就是书读百遍其义自现的道理吧。当然也可能是我一紧张脑子都不转了,却还要坚持学习,好气哦:-) 看课件的时候也会想起当时,恨不得找个缝把愚蠢的自己给塞进去orz,苍天鸭。马上要过年啦,巧了么这不是,去年过年的时候我在写assign2呀,今年过年居然还是PP。(小老弟你怎么回事啊
Transactional Memory
Declarative vs. imperative abstractions
- Declarative: programmer defines what should be done
- Execute all these independent 1000 tasks
- Perform this set of operations atomically
- Imperative: programmer states how it should be done
- Spawn N worker threads. Assign work to threads by removing work from a shared task queue
- Acquire a lock, perform operations, release the lock
Terminology
- Memory transaction 事务内存
- An atomic and isolated sequence of memory accesses
- Inspired by database transactions
- Atomicity (all or nothing) 原子性
- Upon transaction commit, all memory writes in transaction take effect at once
- On transaction abort, none of the writes appear to take effect
- Atomic { } ≠ lock() + unlock()
- Atomic: high-level declaration of atomicity, does not specify implementation
- Lock: low-level blocking primitive, does not provide atomicity or isolation on its own
- Isolation 隔离
- No other processor can observe writes before transaction commits
- Serializability 可串行化
- Transactions appear to commit in a single serial order
- But the exact order of commits is not guaranteed by semantics of transaction
Load-linked, store conditional (LL/SC)
- LL/SC is a light version of transactional memory
- Pair of corresponding instructions
- load_linked(x): load value from address
- store_conditional(x, value): store value to x, if x hasn’t been written to since corresponding LL
- Corresponding ARM instructions: LDREX and STREX
Advantages (promise) of transactional memory
- Easy to use synchronization construct
- Claim: transactions are as easy to use as coarse-grain locks
- Often performs as well as fine-grained locks
- Provides automatic read-read concurrency and fine-grained concurrency
- Performance portability: locking scheme for four CPUs may not be the best scheme for 64 CPUs
- Failure atomicity and recovery
- No lost locks when a thread fails
- Failure recovery = transaction abort + restart
- Composability
- Safe and scalable composition of software modules
Implementing transactional memory
TM implementation basics
- TM systems must provide atomicity and isolation
- Without sacrificing concurrency
- Basic implementation requirements
- Data versioning (ALLOWS transaction to abort)
- Conflict detection and resolution (WHEN to abort)
- Implementation options
- Hardware transactional memory
- Software transactional memory
- Hybrid transactional memory
Data versioning
-
Manage uncommitted (new) and previously committed (old) versions of data for concurrent transactions.
-
Eager versioning (undo-log based)
- Update memory location directly on write
- Maintain undo information in a log
- Good: faster commit
- Bad: slower aborts, fault tolerance issues
-
Lazy versioning (write-buffer based)
- Buffer data in a write buffer until commit
- Update actual memory location on commit
- Good: faster abort (just clear log), no fault tolerance issues
- Bad: slower commits
Conflict detection
- Must detect and handle conflicts between transactions
- Read-write conflict: transaction A reads address X, which was written to by pending transaction B
- Write-write conflict: transactions A and B are both pending, and both write to address X
- System must track a transaction’s read set and write set
- Read-set: addresses read within the transaction
- Write-set: addresses written within the transaction
-
Pessimistic detection
- Check for conflicts during loads or stores
- “Contention manager” decides to stall or abort transaction when a conflict is detected
-
Optimistic detection
- Detect conflicts when a transaction attempts to commit
- On a conflict, give priority to committing transaction
- Note: can use optimistic and pessimistic schemes together
-
Trade-offs
- Pessimistic conflict detection (a.k.a. “eager”)
- Good: Detect conflicts early (undo less work, turn some aborts to stalls)
- Bad: no forward progress guarantees, more aborts in some cases
- Bad: fine-grained communication (check on each load/store)
- Bad: detection on critical path
- Optimistic conflict detection (a.k.a.“lazy” or “commit”)
- Good: forward progress guarantees
- Good: bulk communication and conflict detection
- Bad: detects conflicts late, can still have fairness problems
- Pessimistic conflict detection (a.k.a. “eager”)
-
Granularity
- Object granularity (SW-based techniques)
- Good: reduced overhead (time/space)
- Good: close to programmer’s reasoning
- Bad: false sharing on large objects
- Machine word granularity
- Good: minimize false sharing
- Bad: increased overhead
- Cache-line granularity
- Good: compromise between object and word
- Object granularity (SW-based techniques)
Hardware transactional memory (HTM)
- Data versioning is implemented in caches
- Cache the write buffer or the undo log
- Add new cache line metadata to track transaction read set and write set
- Conflict detection through cache coherence protocol
- Coherence lookups detect conflicts between transactions
- Works with snooping and directory coherence
- Note: Register checkpoint must also be taken at transaction begin
基本上是概念上的东西。。。我还不如刷题去