Quota design documentation (2.6.36)
Intro
The current Linux quota implementation was written about 10 years ago. It satisfies only basic SMP correctness semantics (deadlock avoidance). The code relies on several global locks to protect its data, and as a result its SMP scalability is awful: using quota on systems with more than 8 CPUs is almost pointless. This project aims to rewrite the quota code to make it more scalable, and this paper is a guide to accomplishing that task. The paper is divided into three parts:
- Current locks and locking rules
- It is almost impossible to understand the real locking rules from the code comments, so I've collected an updated list of locks and a table of the locks taken by each quota function.
- Definition of most contended locks
- Proposed locking changes
- This part was moved to the patch series page.
Current locks and locking rules (2.6.36)
Locks
- dq_list_lock
- protects all lists with quotas and quota formats, plus the dqstats structure containing statistics about the lists (madness)
- dq_data_lock
- protects data in dq_dqb and in the mem_dqinfo structures, and also guards consistency of dquot->dq_dqb with inode->i_blocks, i_bytes.
- dq_state_lock
- protects modifications of quota state (on quotaon and quotaoff)
- dqonoff_sem
- protects against quotaon/quotaoff races
- dqptr_sem (per sb rw sem)
- Any operation working on dquots via inode pointers must hold it.
- dq_lock (per dquot mutex)
- a dquot is locked only while it is being read into memory
- dqio_mutex (per sb mutex)
- per sb io mutex.
- i_mutex (per quota file)
- Every io operation requires this mutex; with journalled quota, all operations result in a quota write.
Functions and lock table
*table legend*
- dat :: dq_data_lock
- drt :: mark_quota_dirty() is called from this function.
- number :: lock order
- g :: guarded by this lock, caller must hold it.
- INLK :: inode_lock
- in dqptr :: WR = down_write, rd = down_read
- List manipulation in lst column
First char == an action:
- a :: add to a list
- r :: remove from a list
- t :: traverse a list
Second char == a list:
- i :: inuse_list
- d :: dirty_list
- f :: free_list
- F :: tofree_head (dq_free) in remove_inode_dquot_ref()
|---------------------------+-----+----+-----+----+----+----+---+---|
|                           |     |    |     |    |    | On | d | d |
|                           |     |    |     | dq | dq | Of | a | r |
| func name                 | lst | st | ptr | lk | io |    | t | t |
|---------------------------+-----+----+-----+----+----+----+---+---|
| wait_on_dquot             |     |    |     | 1  |    |    |   |   |
| dquot_mark_dquot_dirty    | ad  |    |     |    |    |    |   |   |
| clear_dquot_dirty         | grd |    |     |    |    |    |   |   |
| dquot_acquire             |     |    |     | 1  | 2  |    |   |   |
| dquot_commit cl drt       | 2rd |    |     |    | 1  |    |   |   |
| dquot_release             |     |    |     | 1  | 2  |    |   |   |
| invalidate_dquots INLK    |     |    |     |    |    |    |   |   |
| dquot_scan_active (ocfs2) | 2ti |    |     |    |    | 1  |   |   |
| vfs_quota_sync ->wr_dq    | 2td |    |     |    |    | 1  |   |   |
| shrink_dqcache_memory     | dif |    |     |    |    |    |   |   |
| dqput lst_l: dq_count     | 1   |    |     |    |    |    |   |   |
| dqget lst_l: dqhash       | 1   | 2  |     |    |    |    |   |   |
| dquot_initialize ->dqget  |     |    | ?WR |    |    |    |   |   |
| add_dquot_ref / INLK      |     |    |     |    |    | g  |   |   |
| remove_inode_dquot_ref    | 1aF |    | gWR |    |    |    |   |   |
| remove_dquot_ref / INLK   |     |    |     |    |    |    |   |   |
| drop_dquot_ref /          |     |    | WR  |    |    |    |   |   |
| dquot_drop ->dqput        |     |    | WR  |    |    |    |   |   |
| dquot_transfer ->dqget    |     |    | WR2 |    |    |    | 1 | 1 |
| dquot_commit_info         |     |    |     |    | 1  |    |   |   |
| vfs_quota_disable         |     | 2  |     |    |    | 1  |   |   |
| vfs_load_quota_inode      |     | 3  | WR2 |    | 3  | 1  |   |   |
|                           |     |    |     |    |    |    |   |   |
| __dquot_alloc_space [as]  |     |    |     |    |    |    | 1 |   |
| dquot_alloc_space ->as    |     |    | rd  |    |    |    |   | 1 |
| dquot_reserve_space ->as  |     |    | rd  |    |    |    |   |   |
| dquot_alloc_inode         |     |    | rd  |    |    |    | 1 | 1 |
| dquot_claim_space         |     |    | rd  |    |    |    | 1 | 1 |
| dquot_release_rsrv_spc    |     |    | rd  |    |    |    | 1 |   |
| dquot_free_space          |     |    | rd  |    |    |    | 1 | 1 |
| dquot_free_inode          |     |    | rd  |    |    |    | 1 | 1 |
|---------------------------+-----+----+-----+----+----+----+---+---|
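As an aid to reading the numbers in the table, the following userspace sketch in C replays three rows: dquot_acquire (dq_lock first, dqio_mutex second), dquot_mark_dquot_dirty (add to dirty_list) and dquot_commit (dqio_mutex first, then dq_list_lock to remove from dirty_list). It is an illustration under stated assumptions, not the kernel implementation: pthread mutexes stand in for the kernel locks, a `dirty` flag stands in for dirty_list membership, `io_ops` simulates quota file access, and all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Global list lock, the "lst" column. */
static pthread_mutex_t dq_list_lock = PTHREAD_MUTEX_INITIALIZER;

struct sketch_sb {
    pthread_mutex_t dqio_mutex; /* per sb, the "dq io" column */
    int io_ops;                 /* counts simulated quota file accesses */
};

struct sketch_dquot {
    pthread_mutex_t dq_lock;    /* per dquot, the "dq lk" column */
    struct sketch_sb *sb;
    int loaded;                 /* set once the dquot has been "read" */
    int dirty;                  /* stands in for membership in dirty_list */
};

void sketch_sb_init(struct sketch_sb *sb)
{
    pthread_mutex_init(&sb->dqio_mutex, NULL);
    sb->io_ops = 0;
}

void sketch_dquot_init(struct sketch_dquot *dq, struct sketch_sb *sb)
{
    pthread_mutex_init(&dq->dq_lock, NULL);
    dq->sb = sb;
    dq->loaded = 0;
    dq->dirty = 0;
}

/* Row "dquot_acquire": dq lk = 1, dq io = 2. */
void sketch_dquot_acquire(struct sketch_dquot *dq)
{
    assert(dq->sb != NULL);
    pthread_mutex_lock(&dq->dq_lock);         /* order 1 */
    pthread_mutex_lock(&dq->sb->dqio_mutex);  /* order 2 */
    if (!dq->loaded) {
        dq->sb->io_ops++;                     /* simulated read from quota file */
        dq->loaded = 1;
    }
    pthread_mutex_unlock(&dq->sb->dqio_mutex);
    pthread_mutex_unlock(&dq->dq_lock);
}

/* Row "dquot_mark_dquot_dirty": lst = ad (add to dirty_list). */
void sketch_mark_dirty(struct sketch_dquot *dq)
{
    pthread_mutex_lock(&dq_list_lock);
    dq->dirty = 1;
    pthread_mutex_unlock(&dq_list_lock);
}

/* Row "dquot_commit": dq io = 1, lst = 2rd. Returns 1 if a write happened. */
int sketch_dquot_commit(struct sketch_dquot *dq)
{
    int was_dirty;

    pthread_mutex_lock(&dq->sb->dqio_mutex);  /* order 1 */
    pthread_mutex_lock(&dq_list_lock);        /* order 2 */
    was_dirty = dq->dirty;
    dq->dirty = 0;                            /* "remove from dirty_list" */
    pthread_mutex_unlock(&dq_list_lock);
    if (was_dirty)
        dq->sb->io_ops++;                     /* simulated write to quota file */
    pthread_mutex_unlock(&dq->sb->dqio_mutex);
    return was_dirty;
}
```

Both functions that touch dqio_mutex take it before dq_list_lock and after dq_lock, so the two rows are consistent with a single lock order.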
Most contended locks chart
- i_mutex
- This is the most annoying lock in the case of journalled quota. Just think about it: when several concurrent users perform writes, they all have to go to sleep on i_mutex via vfs_dq_alloc_space()->mark_quota_dirty()->write_quot()
- dqio_mutex
- Same as the previous one; this is the per sb io mutex.
- dqptr_sem
- Currently this is acquired for write during vfs_dq_init(). Many vfs functions call this callback, for example open (for write), truncate, unlink, link, etc.
- dq_data_lock
- The problem with this lock is that it is global. It only needs to protect the given dquot and the inode's bytes struct.
- dq_list_lock
- Again, the lock is global; proper code reorganization will help.
- dq_state_lock
- Why do we need it? We already have dqonoff_sem; this looks like overkill.
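Returning to the dq_data_lock item above, the cost of a global lock versus a lock per dquot can be sketched in userspace C. This is only one possible shape of "protect just the given dquot" and is not taken from the patch series: the per dquot `lock` field, `SKETCH_ROUNDS` and all `sketch_` names are hypothetical, and the inode side is left out to keep the example short. Threads charging different dquots never touch the same mutex here, whereas with a single global lock all four threads would serialize.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_ROUNDS 100000

struct sketch_dquot {
    pthread_mutex_t lock;   /* hypothetical per dquot data lock */
    long long curspace;     /* usage protected by that lock */
};

void sketch_dquot_init(struct sketch_dquot *dq)
{
    pthread_mutex_init(&dq->lock, NULL);
    dq->curspace = 0;
}

/* Thread body: charge one byte at a time to a single dquot. */
static void *sketch_charge_loop(void *arg)
{
    struct sketch_dquot *dq = arg;

    assert(dq != NULL);
    for (int i = 0; i < SKETCH_ROUNDS; i++) {
        pthread_mutex_lock(&dq->lock);
        dq->curspace += 1;
        pthread_mutex_unlock(&dq->lock);
    }
    return NULL;
}

/* Run two charging threads per dquot. Returns 0 on success, -1 on error. */
int sketch_run(struct sketch_dquot *a, struct sketch_dquot *b)
{
    struct sketch_dquot *target[4] = { a, a, b, b };
    pthread_t tid[4];
    int started = 0;
    int rc = 0;

    for (; started < 4; started++) {
        if (pthread_create(&tid[started], NULL,
                           sketch_charge_loop, target[started]) != 0) {
            rc = -1;
            break;
        }
    }
    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return rc;
}
```

The two threads sharing a dquot still contend with each other, which is unavoidable; the point is that unrelated dquots no longer do.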