This paper presents the design of a 10 Gb/s low power wire-line receiver in the 65 nm CMOS process with 1 V supply voltage. The receiver occupies 300×500 μm2. With the novel half rate period calibration clock data recovery (CDR) circuit, the receiver consumes 52 mW power. The receiver can compensate a wide range of channel loss by combining the low power wideband programmable continuous time linear equalizer (CTLE) and decision feedback equalizer (DFE).
Multithreaded technique is the developing trend of high performance processor. Memory consistency model is essential to the correctness, performance and complexity of multithreaded processor. The chip multithreaded consistency model adapting to multithreaded processor is proposed in this paper. The restriction imposed on memory event ordering by chip multithreaded consistency is presented and formalized. With the idea of critical cycle built by Wei-Wu Hu, we prove that the proposed chip multithreaded consistency model satisfies the criterion of correct execution of sequential consistency model. Chip multithreaded consistency model provides a way of achieving high performance compared with sequential consistency model and easures the compatibility of software that the execution result in multithreaded processor is the same as the execution result in uniprocessor. The implementation strategy of chip multithreaded consistency model in Godson-2 SMT processor is also proposed. Godson-2 SMT processor supports chip multithreaded consistency model correctly by exception scheme based on the sequential memory access queue of each thread.
Content addressable memory (CAM) is widely used and its tests mostly use functional fault models. However, functional fault models cannot describe some physical faults exactly. This paper introduces physical fault models for write-only CAM. Two test algorithms which can cover 100% targeted physical faults are also proposed. The algorithm for a CAM module with N-bit match output signal needs only 2N+2L+4 comparison operations and 5N writing operations, where N is the number of words and L is the word length. The algorithm for a HIT-signal-only CAM module uses 2N+2L+5 comparison operations and 8N writing operations. Compared to previous work, the proposed algorithms can test more physical faults with a few more operations. An experiment on a test chip shows the effectiveness and efficiency of the proposed physical fault models and algorithms.
Although the design of many kinds of microprocessors has been under developing for several decades, the computer architecture R&D community lacks well documented lessons and experiences about design decisions in the research literature. In this paper, we systematically present the design decisions we made during the designing and prototyping of Godson-2 series processors. The 250MHz Godson-2B, 450MHz Godson-2C, and 1GHz Godson-2E processors that implement 64-bit, four-issue, out-of-order architecture were taped out in 2003, 2004, and 2005, respectively. Each processor triples its predecessor in the SPEC CPU2000 rates. Our first-hand experiences and lessons gained from these designs would provide unique perspectives and insights that are not available in any existing text books and/or published papers. We summarize 10 critical lessons and experiences based on hundreds of our attempts at architectural and design optimizations for performance improvement of Godson-2 series processors. The issues include silicon-simulation correlation, design balancing, performance optimizing, and pico-architecture tuning. We conclude that persistent improvement, attitude towards work-on-silicon design, and insightful understanding of software and fabrication process are the three most important factors for designing a high performance processor with low energy consumption.