PostgreSQL Source Code Walkthrough (143) - Buffer Manager #8 (the BufTableHashCode function)

This section briefly covers BufTableHashCode, a function in PostgreSQL's Buffer Manager reached through the call chain ReadBuffer_common->BufferAlloc->BufTableHashCode. It computes a hash code from a BufferTag.


I. Data Structures

BufferDesc
Shared descriptor (state) data for a single shared buffer.


/*
 * Flags for buffer descriptors
 *
 * Note: TAG_VALID essentially means that there is a buffer hashtable
 * entry associated with the buffer's tag.
 */
#define BM_LOCKED               (1U << 22)  /* buffer header is locked */
#define BM_DIRTY                (1U << 23)  /* data needs writing */
#define BM_VALID                (1U << 24)  /* data is valid */
#define BM_TAG_VALID            (1U << 25)  /* tag is assigned */
#define BM_IO_IN_PROGRESS       (1U << 26)  /* read or write in progress */
#define BM_IO_ERROR             (1U << 27)  /* previous I/O failed */
#define BM_JUST_DIRTIED         (1U << 28)  /* dirtied since write started */
#define BM_PIN_COUNT_WAITER     (1U << 29)  /* have waiter for sole pin */
#define BM_CHECKPOINT_NEEDED    (1U << 30)  /* must write for checkpoint */
#define BM_PERMANENT            (1U << 31)  /* permanent buffer (not unlogged,
                                             * or init fork) */
/*
 *  BufferDesc -- shared descriptor/state data for a single shared buffer.
 *
 * Note: Buffer header lock (BM_LOCKED flag) must be held to examine or change
 * the tag, state or wait_backend_pid fields.  In general, buffer header lock
 * is a spinlock which is combined with flags, refcount and usagecount into
 * single atomic variable.  This layout allow us to do some operations in a
 * single atomic operation, without actually acquiring and releasing spinlock;
 * for instance, increase or decrease refcount.  buf_id field never changes
 * after initialization, so does not need locking.  freeNext is protected by
 * the buffer_strategy_lock not buffer header lock.  The LWLock can take care
 * of itself.  The buffer header lock is *not* used to control access to the
 * data in the buffer!
 *
 * It's assumed that nobody changes the state field while buffer header lock
 * is held.  Thus buffer header lock holder can do complex updates of the
 * state variable in single write, simultaneously with lock release (cleaning
 * BM_LOCKED flag).  On the other hand, updating of state without holding
 * buffer header lock is restricted to CAS, which insure that BM_LOCKED flag
 * is not set.  Atomic increment/decrement, OR/AND etc. are not allowed.
 *
 * An exception is that if we have the buffer pinned, its tag can't change
 * underneath us, so we can examine the tag without locking the buffer header.
 * Also, in places we do one-time reads of the flags without bothering to
 * lock the buffer header; this is generally for situations where we don't
 * expect the flag bit being tested to be changing.
 *
 * We can't physically remove items from a disk page if another backend has
 * the buffer pinned.  Hence, a backend may need to wait for all other pins
 * to go away.  This is signaled by storing its own PID into
 * wait_backend_pid and setting flag bit BM_PIN_COUNT_WAITER.  At present,
 * there can be only one such waiter per buffer.
 *
 * We use this same struct for local buffer headers, but the locks are not
 * used and not all of the flag bits are useful either. To avoid unnecessary
 * overhead, manipulations of the state field should be done without actual
 * atomic operations (i.e. only pg_atomic_read_u32() and
 * pg_atomic_unlocked_write_u32()).
 *
 * Be careful to avoid increasing the size of the struct when adding or
 * reordering members.  Keeping it below 64 bytes (the most common CPU
 * cache line size) is fairly important for performance.
 */
typedef struct BufferDesc
{
    BufferTag   tag;            /* ID of page contained in buffer */
    //buf_id also identifies the corresponding buffer pool slot
    int         buf_id;         /* buffer's index number (from 0) */
    /* state of the tag, containing flags, refcount and usagecount */
    pg_atomic_uint32 state;
    int         wait_backend_pid;   /* backend PID of pin-count waiter */
    int         freeNext;       /* link in freelist chain */
    LWLock      content_lock;   /* to lock access to buffer contents */
} BufferDesc;
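
The comment above says that flags, refcount and usagecount share one atomic word, and that updates made without the header lock must go through CAS so they never succeed while BM_LOCKED is set. A minimal sketch of that idea follows. It uses C11 atomics rather than PostgreSQL's pg_atomic API. The BUF_* masks are assumptions recalled from buf_internals.h of the same era (18 bits of refcount, 4 bits of usage count, 10 bits of flags) and should be checked against the source. try_pin_once is a hypothetical helper, not a PostgreSQL function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bits 0-17 refcount, bits 18-21 usage count, bits 22-31 flags */
#define BUF_REFCOUNT_ONE     1U
#define BUF_REFCOUNT_MASK    ((1U << 18) - 1)
#define BUF_USAGECOUNT_MASK  0x003C0000U
#define BUF_USAGECOUNT_SHIFT 18
#define BUF_FLAG_MASK        0xFFC00000U
#define BM_LOCKED            (1U << 22)
#define BM_DIRTY             (1U << 23)

/* Extract the reference count from a state word */
static inline uint32_t
state_refcount(uint32_t state)
{
    return state & BUF_REFCOUNT_MASK;
}

/* Extract the usage count from a state word */
static inline uint32_t
state_usagecount(uint32_t state)
{
    return (state & BUF_USAGECOUNT_MASK) >> BUF_USAGECOUNT_SHIFT;
}

/*
 * Hypothetical helper: make one attempt to increment the refcount with CAS.
 * It refuses while the header-lock bit is set; real code would wait for the
 * lock to be released and retry in a loop.
 */
static bool
try_pin_once(_Atomic uint32_t *state)
{
    uint32_t old = atomic_load(state);

    if (old & BM_LOCKED)
        return false;
    assert(state_refcount(old) < BUF_REFCOUNT_MASK);
    return atomic_compare_exchange_strong(state, &old,
                                          old + BUF_REFCOUNT_ONE);
}
```

Because the count and the lock bit live in the same word, the CAS fails if another process takes the header lock between the load and the exchange, which is why a plain atomic increment would not be safe here.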

BufferTag
A buffer tag identifies which disk block the buffer contains.


/*
 * Buffer tag identifies which disk block the buffer contains.
 *
 * Note: the BufferTag data must be sufficient to determine where to write the
 * block, without reference to pg_class or pg_tablespace entries.  It's
 * possible that the backend flushing the buffer doesn't even believe the
 * relation is visible yet (its xact may have started before the xact that
 * created the rel).  The storage manager must be able to cope anyway.
 *
 * Note: if there's any pad bytes in the struct, INIT_BUFFERTAG will have
 * to be fixed to zero them, since this struct is used as a hash key.
 */
typedef struct buftag
{
    RelFileNode rnode;          /* physical relation identifier */
    ForkNumber  forkNum;
    BlockNumber blockNum;       /* blknum relative to begin of reln */
} BufferTag;
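
The note about pad bytes matters because the tag is hashed and compared as raw bytes, so two logically equal tags must also be byte-for-byte equal. The sketch below illustrates this with stand-in types. DemoRelFileNode, DemoBufferTag, demo_init_tag and demo_tags_equal are hypothetical names, and zeroing the whole struct with memset is just one way to make padding deterministic; it is not claimed to be how INIT_BUFFERTAG is written.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for RelFileNode and BufferTag */
typedef struct
{
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
} DemoRelFileNode;

typedef struct
{
    DemoRelFileNode rnode;
    int32_t         forkNum;
    uint32_t        blockNum;
} DemoBufferTag;

/* Fill in a tag, zeroing every byte first so any padding is deterministic */
static void
demo_init_tag(DemoBufferTag *tag, DemoRelFileNode rnode,
              int32_t forkNum, uint32_t blockNum)
{
    assert(tag != NULL);
    memset(tag, 0, sizeof(*tag));
    tag->rnode = rnode;
    tag->forkNum = forkNum;
    tag->blockNum = blockNum;
}

/* Byte-wise comparison, the way a memcmp-based hash table would compare keys */
static int
demo_tags_equal(const DemoBufferTag *a, const DemoBufferTag *b)
{
    return memcmp(a, b, sizeof(DemoBufferTag)) == 0;
}
```

With this particular field layout there is most likely no padding at all, so the memset is defensive; it becomes essential only if a field is added that leaves a gap.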

HTAB
Top-level control structure for a hash table.


/*
 * Top control structure for a hashtable --- in a shared table, each backend
 * has its own copy (OK since no fields change at runtime)
 */
struct HTAB
{
    HASHHDR    *hctl;           /* => shared control information */
    HASHSEGMENT *dir;           /* directory of segment starts */
    HashValueFunc hash;         /* hash function */
    HashCompareFunc match;      /* key comparison function */
    HashCopyFunc keycopy;       /* key copying function */
    HashAllocFunc alloc;        /* memory allocator */
    MemoryContext hcxt;         /* memory context if default allocator used */
    char       *tabname;        /* table name (for error messages) */
    bool        isshared;       /* true if table is in shared memory */
    bool        isfixed;        /* if true, don't enlarge */
    /* freezing a shared table isn't allowed, so we can keep state here */
    bool        frozen;         /* true = no more inserts allowed */
    /* We keep local copies of these fixed values to reduce contention */
    Size        keysize;        /* hash key length in bytes */
    long        ssize;          /* segment size --- must be power of 2 */
    int         sshift;         /* segment shift = log2(ssize) */
};
/*
 * Header structure for a hash table --- contains all changeable info
 *
 * In a shared-memory hash table, the HASHHDR is in shared memory, while
 * each backend has a local HTAB struct.  For a non-shared table, there isn't
 * any functional difference between HASHHDR and HTAB, but we separate them
 * anyway to share code between shared and non-shared tables.
 */
struct HASHHDR
{
    /*
     * The freelist can become a point of contention in high-concurrency hash
     * tables, so we use an array of freelists, each with its own mutex and
     * nentries count, instead of just a single one.  Although the freelists
     * normally operate independently, we will scavenge entries from freelists
     * other than a hashcode's default freelist when necessary.
     *
     * If the hash table is not partitioned, only freeList[0] is used and its
     * spinlock is not used at all; callers' locking is assumed sufficient.
     */
    FreeListData freeList[NUM_FREELISTS];
    /* These fields can change, but not in a partitioned table */
    /* Also, dsize can't change in a shared table, even if unpartitioned */
    long        dsize;          /* directory size */
    long        nsegs;          /* number of allocated segments (<= dsize) */
    uint32      max_bucket;     /* ID of maximum bucket in use */
    uint32      high_mask;      /* mask to modulo into entire table */
    uint32      low_mask;       /* mask to modulo into lower half of table */
    /* These fields are fixed at hashtable creation */
    Size        keysize;        /* hash key length in bytes */
    Size        entrysize;      /* total user element size in bytes */
    long        num_partitions; /* # partitions (must be power of 2), or 0 */
    long        ffactor;        /* target fill factor */
    long        max_dsize;      /* 'dsize' limit if directory is fixed size */
    long        ssize;          /* segment size --- must be power of 2 */
    int         sshift;         /* segment shift = log2(ssize) */
    int         nelem_alloc;    /* number of entries to allocate at once */
#ifdef HASH_STATISTICS
    /*
     * Count statistics here.  NB: stats code doesn't bother with mutex, so
     * counts could be corrupted a bit in a partitioned table.
     */
    long        accesses;
    long        collisions;
#endif
};
/*
 * HASHELEMENT is the private part of a hashtable entry.  The caller's data
 * follows the HASHELEMENT structure (on a MAXALIGN'd boundary).  The hash key
 * is expected to be at the start of the caller's hash entry data structure.
 */
typedef struct HASHELEMENT
{
    struct HASHELEMENT *link;   /* link to next entry in same bucket */
    uint32      hashvalue;      /* hash function result for this entry */
} HASHELEMENT;
/* Hash table header struct is an opaque type known only within dynahash.c */
typedef struct HASHHDR HASHHDR;
/* Hash table control struct is an opaque type known only within dynahash.c */
typedef struct HTAB HTAB;
/* Parameter data structure for hash_create */
/* Only those fields indicated by hash_flags need be set */
typedef struct HASHCTL
{
    long        num_partitions; /* # partitions (must be power of 2) */
    long        ssize;          /* segment size */
    long        dsize;          /* (initial) directory size */
    long        max_dsize;      /* limit to dsize if dir size is limited */
    long        ffactor;        /* fill factor */
    Size        keysize;        /* hash key length in bytes */
    //see the field comments on the structures above
    Size        entrysize;      /* total user element size in bytes */
    HashValueFunc hash;         /* hash function */
    HashCompareFunc match;      /* key comparison function */
    HashCopyFunc keycopy;       /* key copying function */
    HashAllocFunc alloc;        /* memory allocator */
    MemoryContext hcxt;         /* memory context to use for allocations */
    HASHHDR    *hctl;           /* location of header in shared mem */
} HASHCTL;
/* A hash bucket is a linked list of HASHELEMENTs */
typedef HASHELEMENT *HASHBUCKET;
/* A hash segment is an array of bucket headers */
typedef HASHBUCKET *HASHSEGMENT;
/*
 * Hash functions must have this signature.
 */
typedef uint32 (*HashValueFunc) (const void *key, Size keysize);
/*
 * Key comparison functions must have this signature.  Comparison functions
 * return zero for match, nonzero for no match.  (The comparison function
 * definition is designed to allow memcmp() and strncmp() to be used directly
 * as key comparison functions.)
 */
typedef int (*HashCompareFunc) (const void *key1, const void *key2,
                                Size keysize);
/*
 * Key copying functions must have this signature.  The return value is not
 * used.  (The definition is set up to allow memcpy() and strlcpy() to be
 * used directly.)
 */
typedef void *(*HashCopyFunc) (void *dest, const void *src, Size keysize);
/*
 * Space allocation function for a hashtable --- designed to match malloc().
 * Note: there is no free function API; can't destroy a hashtable unless you
 * use the default allocator.
 */
typedef void *(*HashAllocFunc) (Size request);
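
The HASHHDR fields max_bucket, high_mask and low_mask, together with ssize and sshift, determine where a hash value lands in the table. The sketch below shows one way this mapping can work, modeled on what dynahash.c is understood to do in its bucket calculation. DemoHashHdr, demo_calc_bucket and demo_locate are hypothetical names, and the logic is an assumption to be verified against the source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of HASHHDR holding only the fields used here */
typedef struct
{
    uint32_t max_bucket;    /* ID of maximum bucket in use */
    uint32_t high_mask;     /* mask to modulo into entire table */
    uint32_t low_mask;      /* mask to modulo into lower half of table */
    long     ssize;         /* segment size, a power of 2 */
    int      sshift;        /* log2(ssize) */
} DemoHashHdr;

/*
 * Map a hash value to a bucket number.  The high mask is tried first; if
 * that bucket does not exist yet, fall back to the lower half of the table.
 */
static uint32_t
demo_calc_bucket(const DemoHashHdr *h, uint32_t hash_val)
{
    uint32_t bucket = hash_val & h->high_mask;

    if (bucket > h->max_bucket)
        bucket &= h->low_mask;
    return bucket;
}

/* Split a bucket number into a directory slot and an index in that segment */
static void
demo_locate(const DemoHashHdr *h, uint32_t bucket,
            long *segment_num, long *segment_ndx)
{
    assert((h->ssize & (h->ssize - 1)) == 0);   /* ssize must be a power of 2 */
    *segment_num = (long) (bucket >> h->sshift);
    *segment_ndx = (long) (bucket & (uint32_t) (h->ssize - 1));
}
```

Because every size involved is a power of two, the mapping needs only masks and shifts, which matches the advice in the hash_any comment to use a bitmask instead of a modulo by a prime.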

FreeListData
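In a partitioned hash table, each freelist is associated with a specific set of hashcodes, as determined by the FREELIST_IDX() macro.
nentries tracks the number of live hashtable entries having those hashcodes.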


/*
 * Per-freelist data.
 *
 * In a partitioned hash table, each freelist is associated with a specific
 * set of hashcodes, as determined by the FREELIST_IDX() macro below.
 * nentries tracks the number of live hashtable entries having those hashcodes
 * (NOT the number of entries in the freelist, as you might expect).
 *
 * The coverage of a freelist might be more or less than one partition, so it
 * needs its own lock rather than relying on caller locking.  Relying on that
 * wouldn't work even if the coverage was the same, because of the occasional
 * need to "borrow" entries from another freelist; see get_hash_entry().
 *
 * Using an array of FreeListData instead of separate arrays of mutexes,
 * nentries and freeLists helps to reduce sharing of cache lines between
 * different mutexes.
 */
typedef struct
{
    slock_t     mutex;          /* spinlock for this freelist */
    long        nentries;       /* number of entries in associated buckets */
    HASHELEMENT *freeList;      /* chain of free elements */
} FreeListData;
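
The FREELIST_IDX() macro mentioned above is not reproduced in this article. As a rough sketch, and assuming dynahash.c uses 32 freelists and picks one by taking the hash code modulo that count when the table is partitioned, the selection can be written as follows. DEMO_NUM_FREELISTS and demo_freelist_idx are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of NUM_FREELISTS in dynahash.c */
#define DEMO_NUM_FREELISTS 32

/*
 * Choose the freelist for a hash code.  An unpartitioned table
 * (num_partitions == 0) always uses freelist 0; a partitioned table
 * spreads hash codes across all freelists.
 */
static int
demo_freelist_idx(long num_partitions, uint32_t hashcode)
{
    assert(num_partitions >= 0);
    if (num_partitions == 0)
        return 0;
    return (int) (hashcode % DEMO_NUM_FREELISTS);
}
```

Since the freelist count need not equal the partition count, one freelist can cover more or less than one partition, which is the reason the comment gives for each freelist having its own spinlock.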

BufferLookupEnt


/* entry for buffer lookup hashtable */
typedef struct
{
    BufferTag   key;            /* Tag of a disk page */
    int         id;             /* Associated buffer ID */
} BufferLookupEnt;

II. Source Code Walkthrough

BufTableHashCode
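BufTableHashCode computes a hash code from a BufferTag. It delegates to the hash function registered for the table (tag_hash here), and the actual work is done in hash_any.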


/*
 * BufTableHashCode
 *      Compute the hash code associated with a BufferTag
 *
 * This must be passed to the lookup/insert/delete routines along with the
 * tag.  We do it like this because the callers need to know the hash code
 * in order to determine which buffer partition to lock, and we don't want
 * to do the hash computation twice (hash_any is a bit slow).
 */
uint32
BufTableHashCode(BufferTag *tagPtr)
{
    return get_hash_value(SharedBufHash, (void *) tagPtr);
}
/*
 * get_hash_value -- exported routine to calculate a key's hash value
 *
 * We export this because for partitioned tables, callers need to compute
 * the partition number (from the low-order bits of the hash value) before
 * searching.
 */
uint32
get_hash_value(HTAB *hashp, const void *keyPtr)
{
    return hashp->hash(keyPtr, hashp->keysize);
}
/*
 * tag_hash: hash function for fixed-size tag values
 */
uint32
tag_hash(const void *key, Size keysize)
{
    return DatumGetUInt32(hash_any((const unsigned char *) key,
                                   (int) keysize));
}
/*
 * DatumGetUInt32
 *      Returns 32-bit unsigned integer value of a datum.
 */
#define DatumGetUInt32(X) ((uint32) (X))
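
The comments above explain that the hash code is computed once so the caller can first pick the buffer partition to lock and then pass the same code to the lookup, insert and delete routines. The sketch below shows that flow with stand-in types. It assumes the partition is the hash code modulo 128 partitions, which is the understood default and should be verified. demo_fnv1a is a simple FNV-1a hash used only as a placeholder for hash_any, and all Demo names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed default number of buffer mapping partitions */
#define DEMO_NUM_BUFFER_PARTITIONS 128

typedef uint32_t (*DemoHashValueFunc) (const void *key, size_t keysize);

/* Hypothetical subset of HTAB: just the hash function and key size */
typedef struct
{
    DemoHashValueFunc hash;
    size_t            keysize;
} DemoHTAB;

/* Placeholder hash (FNV-1a); this is NOT hash_any */
static uint32_t
demo_fnv1a(const void *key, size_t keysize)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < keysize; i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Same dispatch shape as get_hash_value: call the table's own hash function */
static uint32_t
demo_get_hash_value(const DemoHTAB *hashp, const void *key)
{
    assert(hashp != NULL && hashp->hash != NULL);
    return hashp->hash(key, hashp->keysize);
}

/* Map a hash code to the partition whose lock must be taken */
static uint32_t
demo_hash_partition(uint32_t hashcode)
{
    return hashcode % DEMO_NUM_BUFFER_PARTITIONS;
}
```

A caller would compute the code with demo_get_hash_value, lock the partition returned by demo_hash_partition, and then reuse the same code for the table operation instead of hashing the tag a second time.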

hash_any
hash_any hashes a variable-length key into a 32-bit value.


/*
 * hash_any() -- hash a variable-length key into a 32-bit value
 *      k       : the key (the unaligned variable-length array of bytes)
 *      len     : the length of the key, counting by bytes
 *
 * Returns a uint32 value.  Every bit of the key affects every bit of
 * the return value.  Every 1-bit and 2-bit delta achieves avalanche.
 * About 6*len+35 instructions. The best hash table sizes are powers
 * of 2.  There is no need to do mod a prime (mod is sooo slow!).
 * If you need less than 32 bits, use a bitmask.
 *
 * This procedure must never throw elog(ERROR); the ResourceOwner code
 * relies on this not to fail.
 *
 * Note: we could easily change this function to return a 64-bit hash value
 * by using the final values of both b and c.  b is perhaps a little less
 * well mixed than c, however.
 */
Datum
hash_any(register const unsigned char *k, register int keylen)
{
    register uint32 a,
                b,
                c,
                len;
    /* Set up the internal state */
    //a, b and c all start from the same seed, which depends on the key length
    len = keylen;
    a = b = c = 0x9e3779b9 + len + 3923095;
    /* If the source pointer is word-aligned, we use word-wide fetches */
    if (((uintptr_t) k & UINT32_ALIGN_MASK) == 0)
    {
        /* Code path for aligned source data */
        register const uint32 *ka = (const uint32 *) k;
        /* handle most of the key */
        while (len >= 12)
        {
            a += ka[0];
            b += ka[1];
            c += ka[2];
            mix(a, b, c);
            ka += 3;
            len -= 12;
        }
        /* handle the last 11 bytes */
        k = (const unsigned char *) ka;
#ifdef WORDS_BIGENDIAN          /* big-endian */
        switch (len)
        {
            case 11:
                c += ((uint32) k[10] << 8);
                /* fall through */
            case 10:
                c += ((uint32) k[9] << 16);
                /* fall through */
            case 9:
                c += ((uint32) k[8] << 24);
                /* fall through */
            case 8:
                /* the lowest byte of c is reserved for the length */
                b += ka[1];
                a += ka[0];
                break;
            case 7:
                b += ((uint32) k[6] << 8);
                /* fall through */
            case 6:
                b += ((uint32) k[5] << 16);
                /* fall through */
            case 5:
                b += ((uint32) k[4] << 24);
                /* fall through */
            case 4:
                a += ka[0];
                break;
            case 3:
                a += ((uint32) k[2] << 8);
                /* fall through */
            case 2:
                a += ((uint32) k[1] << 16);
                /* fall through */
            case 1:
                a += ((uint32) k[0] << 24);
                /* case 0: nothing left to add */
        }
#else                           /* little-endian; !WORDS_BIGENDIAN */
        switch (len)
        {
            case 11:
                c += ((uint32) k[10] << 24);
                /* fall through */
            case 10:
                c += ((uint32) k[9] << 16);
                /* fall through */
            case 9:
                c += ((uint32) k[8] << 8);
                /* fall through */
            case 8:
                /* the lowest byte of c is reserved for the length */
                b += ka[1];
                a += ka[0];
                break;
            case 7:
                b += ((uint32) k[6] << 16);
                /* fall through */
            case 6:
                b += ((uint32) k[5] << 8);
                /* fall through */
            case 5:
                b += k[4];
                /* fall through */
            case 4:
                a += ka[0];
                break;
            case 3:
                a += ((uint32) k[2] << 16);
                /* fall through */
            case 2:
                a += ((uint32) k[1] << 8);
                /* fall through */
            case 1:
                a += k[0];
                /* case 0: nothing left to add */
        }
#endif                          /* WORDS_BIGENDIAN */
    }
    else                        /* source pointer is not word-aligned */
    {
        /* Code path for non-aligned source data */
        /* handle most of the key */
        while (len >= 12)
        {
#ifdef WORDS_BIGENDIAN
            a += (k[3] + ((uint32) k[2] << 8) + ((uint32) k[1] << 16) + ((uint32) k[0] << 24));
            b += (k[7] + ((uint32) k[6] << 8) + ((uint32) k[5] << 16) + ((uint32) k[4] << 24));
            c += (k[11] + ((uint32) k[10] << 8) + ((uint32) k[9] << 16) + ((uint32) k[8] << 24));
#else                           /* !WORDS_BIGENDIAN */
            a += (k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) + ((uint32) k[3] << 24));
            b += (k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) + ((uint32) k[7] << 24));
            c += (k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) + ((uint32) k[11] << 24));
#endif                          /* WORDS_BIGENDIAN */
            mix(a, b, c);
            k += 12;
            len -= 12;
        }
        /* handle the last 11 bytes */
#ifdef WORDS_BIGENDIAN
        switch (len)
        {
            case 11:
                c += ((uint32) k[10] << 8);
                /* fall through */
            case 10:
                c += ((uint32) k[9] << 16);
                /* fall through */
            case 9:
                c += ((uint32) k[8] << 24);
                /* fall through */
            case 8:
                /* the lowest byte of c is reserved for the length */
                b += k[7];
                /* fall through */
            case 7:
                b += ((uint32) k[6] << 8);
                /* fall through */
            case 6:
                b += ((uint32) k[5] << 16);
                /* fall through */
            case 5:
                b += ((uint32) k[4] << 24);
                /* fall through */
            case 4:
                a += k[3];
                /* fall through */
            case 3:
                a += ((uint32) k[2] << 8);
                /* fall through */
            case 2:
                a += ((uint32) k[1] << 16);
                /* fall through */
            case 1:
                a += ((uint32) k[0] << 24);
                /* case 0: nothing left to add */
        }
#else                           /* !WORDS_BIGENDIAN */
        switch (len)
        {
            case 11:
                c += ((uint32) k[10] << 24);
                /* fall through */
            case 10:
                c += ((uint32) k[9] << 16);
                /* fall through */
            case 9:
                c += ((uint32) k[8] << 8);
                /* fall through */
            case 8:
                /* the lowest byte of c is reserved for the length */
                b += ((uint32) k[7] << 24);
                /* fall through */
            case 7:
                b += ((uint32) k[6] << 16);
                /* fall through */
            case 6:
                b += ((uint32) k[5] << 8);
                /* fall through */
            case 5:
                b += k[4];
                /* fall through */
            case 4:
                a += ((uint32) k[3] << 24);
                /* fall through */
            case 3:
                a += ((uint32) k[2] << 16);
                /* fall through */
            case 2:
                a += ((uint32) k[1] << 8);
                /* fall through */
            case 1:
                a += k[0];
                /* case 0: nothing left to add */
        }
#endif                          /* WORDS_BIGENDIAN */
    }
    final(a, b, c);
    /* report the result */
    return UInt32GetDatum(c);
}
/*----------
 * mix -- mix 3 32-bit values reversibly.
 *
 * This is reversible, so any information in (a,b,c) before mix() is
 * still in (a,b,c) after mix().
 *
 * If four pairs of (a,b,c) inputs are run through mix(), or through
 * mix() in reverse, there are at least 32 bits of the output that
 * are sometimes the same for one pair and different for another pair.
 * This was tested for:
 * * pairs that differed by one bit, by two bits, in any combination
 *   of top bits of (a,b,c), or in any combination of bottom bits of
 *   (a,b,c).
 * * "differ" is defined as +, -, ^, or ~^.  For + and -, I transformed
 *   the output delta to a Gray code (a^(a>>1)) so a string of 1's (as
 *   is commonly produced by subtraction) look like a single 1-bit
 *   difference.
 * * the base values were pseudorandom, all zero but one bit set, or
 *   all zero plus a counter that starts at zero.
 *
 * This does not achieve avalanche.  There are input bits of (a,b,c)
 * that fail to affect some output bits of (a,b,c), especially of a.  The
 * most thoroughly mixed value is c, but it doesn't really even achieve
 * avalanche in c.
 *
 * This allows some parallelism.  Read-after-writes are good at doubling
 * the number of bits affected, so the goal of mixing pulls in the opposite
 * direction from the goal of parallelism.  I did what I could.  Rotates
 * seem to cost as much as shifts on every machine I could lay my hands on,
 * and rotates are much kinder to the top and bottom bits, so I used rotates.
 *----------
 */
#define mix(a,b,c) \
{ \
  a -= c;  a ^= rot(c, 4);  c += b; \
  b -= a;  b ^= rot(a, 6);  a += c; \
  c -= b;  c ^= rot(b, 8);  b += a; \
  a -= c;  a ^= rot(c,16);  c += b; \
  b -= a;  b ^= rot(a,19);  a += c; \
  c -= b;  c ^= rot(b, 4);  b += a; \
}
/*
 * UInt32GetDatum
 *      Returns datum representation for a 32-bit unsigned integer.
 */
#define UInt32GetDatum(X) ((Datum) (X))
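
hash_any relies on two macros that are not reproduced in this article, rot and final. The sketch below fills them in and condenses the non-aligned, little-endian code path into a small standalone function, so the overall shape (seed, 12-byte rounds with mix, tail bytes, final) can be seen in one place. The rot and final definitions are assumptions recalled from Bob Jenkins's lookup3, on which hash_any is based, and should be checked against the PostgreSQL source. demo_hash_bytes is a hypothetical name, and the tail loop is a compact rewrite of the fall-through switch, not the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by k bits (assumed definition of rot) */
#define rot(x,k) (((x) << (k)) | ((x) >> (32 - (k))))

/* mix as shown in the listing above */
#define mix(a,b,c) \
{ \
  a -= c;  a ^= rot(c, 4);  c += b; \
  b -= a;  b ^= rot(a, 6);  a += c; \
  c -= b;  c ^= rot(b, 8);  b += a; \
  a -= c;  a ^= rot(c,16);  c += b; \
  b -= a;  b ^= rot(a,19);  a += c; \
  c -= b;  c ^= rot(b, 4);  b += a; \
}

/* Final mixing step (assumed definition, recalled from lookup3) */
#define final(a,b,c) \
{ \
  c ^= b; c -= rot(b,14); \
  a ^= c; a -= rot(c,11); \
  b ^= a; b -= rot(a,25); \
  c ^= b; c -= rot(b,16); \
  a ^= c; a -= rot(c, 4); \
  b ^= a; b -= rot(a,14); \
  c ^= b; c -= rot(b,24); \
}

/* Byte-wise, little-endian sketch of the non-aligned path of hash_any */
static uint32_t
demo_hash_bytes(const unsigned char *k, int keylen)
{
    uint32_t a, b, c, len;
    uint32_t i;

    assert(keylen >= 0);
    len = (uint32_t) keylen;
    a = b = c = 0x9e3779b9 + len + 3923095;

    /* handle most of the key, 12 bytes per round */
    while (len >= 12)
    {
        a += (k[0] + ((uint32_t) k[1] << 8) + ((uint32_t) k[2] << 16) + ((uint32_t) k[3] << 24));
        b += (k[4] + ((uint32_t) k[5] << 8) + ((uint32_t) k[6] << 16) + ((uint32_t) k[7] << 24));
        c += (k[8] + ((uint32_t) k[9] << 8) + ((uint32_t) k[10] << 16) + ((uint32_t) k[11] << 24));
        mix(a, b, c);
        k += 12;
        len -= 12;
    }

    /* handle the last 11 bytes; the lowest byte of c is reserved for the length */
    for (i = 0; i < len; i++)
    {
        if (i < 4)
            a += (uint32_t) k[i] << (8 * i);
        else if (i < 8)
            b += (uint32_t) k[i] << (8 * (i - 4));
        else
            c += (uint32_t) k[i] << (8 * (i - 7));
    }

    final(a, b, c);
    return c;
}
```

Since this version reads the key one byte at a time, its result does not depend on the alignment of the input pointer; the aligned path in hash_any exists only to fetch whole words faster.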

III. Trace Analysis

N/A

IV. References

PG Source Code
Buffer Manager
