PARSIVAL: Parallel High-Throughput Simulations for Efficient Nanoelectronic Design and Test Validation
since 10.2014, DFG-Project: WU 245/16-1
Project Description
In this project, novel methods for versatile simulation-based VLSI design and test validation on high-throughput data-parallel architectures will be developed, enabling a wide range of important state-of-the-art validation tasks for large circuits. Because of the nature of design validation processes and the massive amount of data involved, parallelism and throughput optimization are key to making design validation feasible for future industrial-sized designs. The main focus lies on the structure of the simulation model, the abstraction level and the algorithms used, as well as their parallelization on data-parallel architectures. The simulation algorithms should be kept simple enough to run fast, yet accurate enough to produce acceptable and valuable data for cross-layer validation of complex digital systems.
Design and test validation is one of the most important and complex tasks within modern semiconductor product development cycles. The design validation processes analyze and assess a developed design with respect to certain validation targets to ensure compliance with given specifications and customer requirements. With thorough validation, test strategies can be assessed to increase test quality and the quality of the products delivered. The specifications and requirements can range from the abstract high-level functional behavior of the circuit to constraints on parameters at lower levels, such as peak power consumption or transistor stress. With process scaling, more complex defect mechanisms arise and more and more low-level parameters have to be considered, which is why validation at lower levels has become an essential part of current manufacturing processes. Yet state-of-the-art algorithms rely on compute-intensive simulations that are unable to scale on traditional computing architectures because of the increasing complexity of the designs and the required model accuracy. Over the past years, data-parallel architectures such as Graphics Processing Units (GPUs) have evolved and introduced the many-core paradigm, which is now well established in general-purpose computing. Current architectures provide thousands of processors on a single chip and are capable of achieving a massive computational throughput of several teraflops. By exploiting highly parallel hardware acceleration and careful abstraction, we strive for maximum throughput in order to make a wide range of complex electronic design automation (EDA) applications applicable to industrial-sized designs.
Development of Simulation Models for Accurate Validation
The abstraction levels to be considered during the first project phase cover descriptions from gate level to electrical level and incorporate all the information required for an accurate evaluation of the circuit behavior and its functional properties. The evaluation models must be sufficiently comprehensive to describe all significant electrical parameters that have an impact on the logic behavior and timing, but must still support an extremely efficient parallel algorithm environment. Instead of continuously solving differential equations, as is common in lower-level simulation tools such as SPICE, the algorithm uses piecewise approximations of the electrical behavior through linearization in order to model functional properties and to compute the signal values. This offers an attractive tradeoff between achievable precision and computational effort.
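The precision/effort tradeoff of piecewise linearization can be illustrated with a minimal Python sketch. It is not the project's simulation model: the first-order RC step response, the equally spaced breakpoints and all function names (`rc_step_response`, `build_pwl`, `eval_pwl`, `max_error`) are assumptions chosen purely for illustration.

```python
import math
from bisect import bisect_right

def rc_step_response(t, vdd=1.0, tau=1.0):
    """Exact step response of a first-order RC stage (illustrative reference curve)."""
    return vdd * (1.0 - math.exp(-t / tau))

def build_pwl(f, t_end, segments):
    """Sample f at segments+1 equally spaced breakpoints on [0, t_end]."""
    ts = [t_end * i / segments for i in range(segments + 1)]
    return ts, [f(t) for t in ts]

def eval_pwl(ts, vs, t):
    """Evaluate the piecewise-linear approximation by linear interpolation."""
    if t <= ts[0]:
        return vs[0]
    if t >= ts[-1]:
        return vs[-1]
    i = bisect_right(ts, t) - 1          # index of the segment containing t
    frac = (t - ts[i]) / (ts[i + 1] - ts[i])
    return vs[i] + frac * (vs[i + 1] - vs[i])

def max_error(f, ts, vs, samples=1000):
    """Largest absolute deviation from f on a dense sample grid."""
    t_end = ts[-1]
    grid = (t_end * k / samples for k in range(samples + 1))
    return max(abs(f(t) - eval_pwl(ts, vs, t)) for t in grid)

if __name__ == "__main__":
    for segments in (4, 16, 64):
        ts, vs = build_pwl(rc_step_response, 5.0, segments)
        print(segments, "segments, max error:", max_error(rc_step_response, ts, vs))
```

More segments reduce the approximation error at the cost of more breakpoints to store and evaluate, which is the kind of tradeoff between precision and computational effort described above.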
In addition to the ideal logic and timing behavior, the functional model has to be extended to consider the impact of parasitic and external parameters, including the power grid, cross-talk, temperature and environmental influences.
Non-functional properties (NFPs) have to be evaluated over a very wide range of time scales. Computing the vulnerability of circuit structures to soft errors concerns single events in the range of picoseconds. Circuit robustness is related to noise, inductance or signal integrity effects whose duration is on the order of nanoseconds. Current and power consumption have to be evaluated at this scale as well, while temperature may be evaluated over the range of milliseconds because of the heat capacity and heat transfer functions of the device under evaluation. Reliability with respect to wear-out mechanisms and aging, however, has to be analyzed at the scale of months or years and requires a completely different approach. All these NFPs in turn have a direct impact on the functional properties and have to be evaluated in an integrated way.
Massively Parallel Validation Algorithms
The developed simulation models will be evaluated by optimized algorithms for massively parallel compute structures such as GPUs. On such architectures, high throughput is achieved by parallelizing computations and maximizing the number of occupied computational resources at runtime. Exploiting parallelism in various dimensions requires that the simulation algorithms be tuned for the targeted data-parallel architectures. An optimal result requires not only a thorough understanding of the underlying architecture and instruction sets, but also a general and flexible algorithm design.
Parallelism will be exploited in many ways during evaluation:
- Structural parallelism, induced by mutually independent nodes in a circuit, allows the concurrent evaluation of the independent nodes.
- Fault parallelism allows multiple faults to be simulated in parallel, e.g. when different fault propagation cones are involved such that interactions between the faults are prevented.
- Pattern parallelism, a type of data-parallelism, allows the evaluation of a circuit for different patterns at once.
- Instance parallelism takes advantage of circuit instances with different parameters, such as varying node delays, which can be evaluated at the same time for a given pattern or fault.
- Task parallelism allows different subtasks for an instance to be executed concurrently on the device in order to evaluate multiple parameters at the same time.
The validation tasks can be significantly accelerated by optimal combination and exploitation of these different dimensions of parallelism, yielding a maximized throughput. However, this requires a thorough scheduling of the computational tasks and an elaborate utilization of the computational resources of the underlying many-core GPU architecture.
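Two of these dimensions, structural and pattern parallelism, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's GPU implementation: the toy netlist `NETLIST`, the helper names `levelize` and `simulate`, and the use of Python integers as bit vectors (one bit per pattern) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy netlist: gate output name -> (operation, input signal names).
NETLIST = {
    "n1": ("AND", ("a", "b")),
    "n2": ("OR", ("b", "c")),
    "n3": ("XOR", ("n1", "n2")),
}

OPS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}

def levelize(netlist, primary_inputs):
    """Group gates by topological level; gates within one level are mutually
    independent (structural parallelism)."""
    level = {pi: 0 for pi in primary_inputs}

    def depth(signal):
        if signal not in level:
            _, inputs = netlist[signal]
            level[signal] = 1 + max(depth(i) for i in inputs)
        return level[signal]

    groups = defaultdict(list)
    for gate in netlist:
        groups[depth(gate)].append(gate)
    return [groups[lvl] for lvl in sorted(groups)]

def simulate(netlist, inputs, num_patterns):
    """Evaluate all patterns at once: bit k of every value belongs to pattern k
    (pattern parallelism)."""
    mask = (1 << num_patterns) - 1
    values = dict(inputs)
    for group in levelize(netlist, inputs):
        # On a data-parallel device, each gate of a level could be assigned
        # to its own thread; here the loop is sequential.
        for gate in group:
            op, (x, y) = netlist[gate]
            values[gate] = OPS[op](values[x], values[y]) & mask
    return values

if __name__ == "__main__":
    # All eight input combinations of (a, b, c) packed into one word each.
    stimuli = {"a": 0b11110000, "b": 0b11001100, "c": 0b10101010}
    print(levelize(NETLIST, stimuli))
    print(format(simulate(NETLIST, stimuli, 8)["n3"], "08b"))
```

Fault and instance parallelism could be layered on top in the same spirit, for example by replicating the value arrays per fault or per circuit instance, but that is left out of this sketch.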
Additional Information
This work is supported by the German Research Foundation (DFG) under grant WU 245/16-1.
Publications
Journals and Conference Proceedings
13. SWIFT: Switch Level Fault Simulation on GPUs
Schneider, Eric; Wunderlich, Hans-Joachim
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), Vol. 38(1), January 2019, pp. 122-135
Keywords: parallel simulation, fault simulation, switch level, parametric faults, complex gates, variation analysis, GPU
Abstract: Current nanometer CMOS circuits show an increasing sensitivity to deviations in first-order parameters and suffer from process variations during manufacturing. To properly assess and support test validation of digital designs, low-level fault simulation approaches are utilized to accurately capture the behavior of CMOS cells under parametric faults and process variations as early as possible throughout the design phase. However, low-level simulation approaches exhibit a high computational complexity, especially when variation has to be taken into account. In this work a high-throughput parallel fault simulation at switch level is presented. First-order electrical parameters are utilized to capture CMOS-specific functional and timing behavior of complex cells allowing to model faults with transistor granularity and without the need of logic abstraction. Furthermore, variation modeling in cells and transistor devices enables broad and efficient variation analyses of faults over many circuit instances for the first time. The simulation approach utilizes massive parallelization on Graphics Processing Units (GPUs) by exploiting parallelism from cells, stimuli, faults and circuit instances. Despite the lower abstraction levels of the approach, it processes designs with millions of gates and outperforms conventional fault simulation at logic level in terms of speed and accuracy.
BibTeX:
@article{SchneW2018,
  author = {Schneider, Eric and Wunderlich, Hans-Joachim},
  title = {{SWIFT: Switch Level Fault Simulation on GPUs}},
  journal = {IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)},
  year = {2019},
  volume = {38},
  number = {1},
  pages = {122--135},
  keywords = {parallel simulation, fault simulation, switch level, parametric faults, complex gates, variation analysis, GPU},
  doi = {http://dx.doi.org/10.1109/TCAD.2018.2802871},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2018/TCAD_SchneW2018.pdf}
}
12. Multi-Level Timing and Fault Simulation on GPUs
Schneider, Eric; Wunderlich, Hans-Joachim
INTEGRATION, the VLSI Journal -- Special Issue of ASP-DAC 2018, Vol. 64, January 2019, pp. 78-91
Keywords: Parallel fault simulation, Multi-level, Transistor faults, Waveform accurate, GPUs
Abstract: In CMOS technology first-order parametric faults during manufacturing can exhibit severe changes in the timing as well as in the functional behavior of cells. Since these faults are hard to detect by conventional tests, the accurate simulation of these low-level faults plays an important role for test validation. However, pure low-level fault simulation approaches impose a high computational complexity that can quickly become inapplicable to larger simulation problems due to limitations in scalability. In this paper, the first parallel multi-level fault simulation approach on graphics processing units (GPUs) is presented. The approach utilizes both logic level and switch level descriptions concurrently in a mixed-abstraction timing simulation. The abstraction is lowered in user-defined so-called regions of interest that locally increase the modeling accuracy enabling low-level first-order parametric fault injection. Resulting signal waveforms are transformed between the different abstractions transparently. This way a fast, versatile and efficient multi-level fault simulation approach on GPUs is created that scales for designs with millions of cells while achieving high simulation throughput with runtime savings of up to 84% compared to full switch level simulations.
BibTeX:
@article{SchneW2018a,
  author = {Schneider, Eric and Wunderlich, Hans-Joachim},
  title = {{Multi-Level Timing and Fault Simulation on GPUs}},
  journal = {INTEGRATION, the VLSI Journal -- Special Issue of ASP-DAC 2018},
  year = {2019},
  volume = {64},
  pages = {78--91},
  keywords = {Parallel fault simulation, Multi-level, Transistor faults, Waveform accurate, GPUs},
  url = {https://authors.elsevier.com/a/1Y2vpcBfIgs7p},
  doi = {http://dx.doi.org/10.1016/j.vlsi.2018.08.005},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2018/VLSI_SchneW2018.pdf}
}
11. Multi-Level Timing Simulation on GPUs
Schneider, Eric; Kochte, Michael A.; Wunderlich, Hans-Joachim
Proceedings of the 23rd Asia and South Pacific Design Automation Conference (ASP-DAC'18), Jeju Island, Korea, 22-25 January 2018, pp. 470-475
Keywords: timing simulation, switch level, multi-level, parallel simulation, GPUs
Abstract: Timing-accurate simulation of circuits is an important task in design validation of modern nano-scale CMOS circuits. With shrinking technology nodes, detailed simulation models down to transistor level have to be considered. While conventional simulation at logic level lacks the ability to accurately model timing behavior for complex cells, more accurate simulation at lower abstraction levels becomes computationally expensive for larger designs. This work presents the first parallel multi-level waveform-accurate timing simulation approach on graphics processing units (GPUs). The simulation uses logic and switch level abstraction concurrently, thus allowing to combine their advantages by trading off speed and accuracy. The abstraction can be lowered in arbitrary regions of interest to locally increase the accuracy. Waveform transformations allow for transparent switching between the abstraction levels. With the utilization of GPUs and thoughtful unification of algorithms and data structures, a fast and versatile high-throughput multi-level simulation is obtained that is scalable for millions of cells while achieving runtime savings of up to 89% compared to full simulation at switch level.
BibTeX:
@inproceedings{SchneKW2018,
  author = {Schneider, Eric and Kochte, Michael A. and Wunderlich, Hans-Joachim},
  title = {{Multi-Level Timing Simulation on GPUs}},
  booktitle = {Proceedings of the 23rd Asia and South Pacific Design Automation Conference (ASP-DAC'18)},
  year = {2018},
  pages = {470--475},
  keywords = {timing simulation, switch level, multi-level, parallel simulation, GPUs},
  doi = {http://dx.doi.org/10.1109/ASPDAC.2018.8297368},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2018/ASPDAC_SchneKW2018.pdf}
}
10. Analysis and Mitigation of IR-Drop Induced Scan Shift-Errors
Holst, Stefan; Schneider, Eric; Kawagoe, Koshi; Kochte, Michael A.; Miyase, Kohei; Wunderlich, Hans-Joachim; Kajihara, Seiji; Wen, Xiaoqing
Proceedings of the IEEE International Test Conference (ITC'17), Fort Worth, Texas, USA, 31 October-2 November 2017, pp. 1-8
Distinguished Paper
Abstract: Excessive IR-drop during scan shift can cause localized IR-drop around clock buffers and introduce dynamic clock skew. Excessive clock skew at neighboring scan flip-flops results in hold or setup timing violations corrupting test stimuli or test responses during shifting. We introduce a new method to assess the risk of such test data corruption at each scan cycle and flip-flop. The most likely cases of test data corruption are mitigated in a non-intrusive way by selective test data manipulation and masking of affected responses. Evaluation results show the computational feasibility of our method for large benchmark circuits, and demonstrate that a few targeted pattern changes provide large potential gains in shift safety and test time with negligible cost in fault coverage.
BibTeX:
@inproceedings{HolstSKKMWKW2017,
  author = {Holst, Stefan and Schneider, Eric and Kawagoe, Koshi and Kochte, Michael A. and Miyase, Kohei and Wunderlich, Hans-Joachim and Kajihara, Seiji and Wen, Xiaoqing},
  title = {{Analysis and Mitigation of IR-Drop Induced Scan Shift-Errors}},
  booktitle = {Proceedings of the IEEE International Test Conference (ITC'17)},
  year = {2017},
  pages = {1--8},
  doi = {http://dx.doi.org/10.1109/TEST.2017.8242055},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2017/ITC_HolstSKKMWKW2017.pdf}
}
9. Probabilistic Sensitization Analysis for Variation-Aware Path Delay Fault Test Evaluation
Wagner, Marcus; Wunderlich, Hans-Joachim
Proceedings of the 22nd IEEE European Test Symposium (ETS'17), Limassol, Cyprus, 22-26 May 2017, pp. 1-6
Keywords: delay test, process variations, delay test quality
Abstract: With the ever increasing process variability in recent technology nodes, path delay fault testing of digital integrated circuits has become a major challenge. A randomly chosen long path often has no robust test and many of the existing non-robust tests are likely invalidated by process variations. To generate path delay fault tests that are more tolerant towards process variations, the delay test generation must evaluate different non-robust tests and only those tests that sensitize the target path with a sufficiently high probability in presence of process variations must be selected. This requires a huge number of probability computations for a large number of target paths and makes the development of very efficient approximation algorithms mandatory for any practical application. In this paper, a novel and efficient probabilistic sensitization analysis is presented which is used to extract a small subcircuit for a given test vector-pair. The probability that a target path is sensitized by the vector-pair is computed efficiently and without significant error by a Monte-Carlo simulation of the subcircuit.
BibTeX:
@inproceedings{WagneW2017,
  author = {Wagner, Marcus and Wunderlich, Hans-Joachim},
  title = {{Probabilistic Sensitization Analysis for Variation-Aware Path Delay Fault Test Evaluation}},
  booktitle = {Proceedings of the 22nd IEEE European Test Symposium (ETS'17)},
  year = {2017},
  pages = {1--6},
  keywords = {delay test, process variations, delay test quality},
  doi = {http://dx.doi.org/10.1109/ETS.2017.7968226},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2017/ETS_WagneW2017.pdf}
}
8. GPU-Accelerated Simulation of Small Delay Faults
Schneider, Eric; Kochte, Michael A.; Holst, Stefan; Wen, Xiaoqing; Wunderlich, Hans-Joachim
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), Vol. 36(5), May 2017, pp. 829-841
Keywords: Circuit faults, Computational modeling, Delays, Instruction sets, Integrated circuit modeling, Logic gates, Fault simulation, graphics processing unit (GPU), parallel, process variation, small gate delay faults, timing-accurate, waveform
Abstract: Delay fault simulation is an essential task during test pattern generation and reliability assessment of electronic circuits. With the high sensitivity of current nano-scale designs towards even smallest delay deviations, the simulation of small gate delay faults has become extremely important. Since these faults have a subtle impact on the timing behavior, traditional fault simulation approaches based on abstract timing models are not sufficient. Furthermore, the detection of these faults is compromised by the ubiquitous variations in the manufacturing processes, which causes the actual fault coverage to vary from circuit instance to circuit instance, and makes the use of timing accurate methods mandatory. However, the application of timing accurate techniques quickly becomes infeasible for larger designs due to excessive computational requirements. In this work, we present a method for fast and waveform-accurate simulation of small delay faults on graphics processing units with exceptional computational performance. By exploiting multiple dimensions of parallelism from gates, faults, waveforms and circuit instances, the proposed approach allows for timing-accurate and exhaustive small delay fault simulation under process variation for designs with millions of gates.
BibTeX:
@article{SchneKHWW2016,
  author = {Schneider, Eric and Kochte, Michael A. and Holst, Stefan and Wen, Xiaoqing and Wunderlich, Hans-Joachim},
  title = {{GPU-Accelerated Simulation of Small Delay Faults}},
  journal = {IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)},
  year = {2017},
  volume = {36},
  number = {5},
  pages = {829--841},
  keywords = {Circuit faults, Computational modeling, Delays, Instruction sets, Integrated circuit modeling, Logic gates, Fault simulation, graphics processing unit (GPU), parallel, process variation, small gate delay faults, timing-accurate, waveform},
  doi = {http://dx.doi.org/10.1109/TCAD.2016.2598560},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/TCAD_SchneKHWW2016.pdf}
}
7. Aging Monitor Reuse for Small Delay Fault Testing
Liu, Chang; Kochte, Michael A.; Wunderlich, Hans-Joachim
Proceedings of the 35th VLSI Test Symposium (VTS'17), Caesars Palace, Las Vegas, Nevada, USA, 9-12 April 2017, pp. 1-6
Keywords: Delay monitoring, delay test, faster-than-at-speed test, stability checker, small delay fault, ATPG
Abstract: Small delay faults receive more and more attention, since they may indicate a circuit reliability marginality even if they do not violate the timing at the time of production. At-speed test and faster-than-at-speed test (FAST) are rather expensive tasks to test for such faults. The paper at hand avoids complex on-chip structures or expensive high-speed ATE for test response evaluation, if aging monitors which are integrated into the device under test anyway are reused. The main challenge in reusing aging monitors for FAST consists in possible false alerts at higher frequencies. While a certain test vector pair makes a delay fault observable at one monitor, it may also exceed the time slack in the fault free case at a different monitor which has to be masked. Therefore, a multidimensional optimizing problem has to be solved for minimizing the masking overhead and the number of test vectors while maximizing delay fault coverage.
BibTeX:
@inproceedings{LiuKW2017,
  author = {Liu, Chang and Kochte, Michael A. and Wunderlich, Hans-Joachim},
  title = {{Aging Monitor Reuse for Small Delay Fault Testing}},
  booktitle = {Proceedings of the 35th VLSI Test Symposium (VTS'17)},
  year = {2017},
  pages = {1--6},
  keywords = {Delay monitoring, delay test, faster-than-at-speed test, stability checker, small delay fault, ATPG},
  doi = {http://dx.doi.org/10.1109/VTS.2017.7928921},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2017/VTS_LiuKW2017.pdf}
}
6. Timing-Accurate Estimation of IR-Drop Impact on Logic- and Clock-Paths During At-Speed Scan Test
Holst, Stefan; Schneider, Eric; Wen, Xiaoqing; Kajihara, Seiji; Yamato, Yuta; Wunderlich, Hans-Joachim; Kochte, Michael A.
Proceedings of the 25th IEEE Asian Test Symposium (ATS'16), Hiroshima, Japan, 21-24 November 2016, pp. 19-24
Abstract: IR-drop induced false capture failures and test clock stretch are severe problems in at-speed scan testing. We propose a new method to efficiently and accurately identify these problems. For the first time, our approach considers the additional dynamic power caused by glitches, the spatial and temporal distribution of all toggles, and their impact on both logic paths and the clock tree without time-consuming electrical simulations.
BibTeX:
@inproceedings{HolstSWKYWK2016,
  author = {Holst, Stefan and Schneider, Eric and Wen, Xiaoqing and Kajihara, Seiji and Yamato, Yuta and Wunderlich, Hans-Joachim and Kochte, Michael A.},
  title = {{Timing-Accurate Estimation of IR-Drop Impact on Logic- and Clock-Paths During At-Speed Scan Test}},
  booktitle = {Proceedings of the 25th IEEE Asian Test Symposium (ATS'16)},
  year = {2016},
  pages = {19--24},
  doi = {http://dx.doi.org/10.1109/ATS.2016.49},
  file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/ATS_HolstSWKYWK2016.pdf}
}
5. High-Throughput Transistor-Level Fault Simulation on GPUs
Schneider, Eric; Wunderlich, Hans-Joachim
Proceedings of the 25th IEEE Asian Test Symposium (ATS'16), Hiroshima, Japan, 21-24 November 2016, pp. 150-155
Keywords: fault simulation; transistor level; switch level; GPUs
Abstract: Deviations in the first-order parameters of CMOS cells can lead to severe errors in the functional and time domain. With increasing sensitivity of these parameters to manufacturing defects and variation, parametric and parasitic-aware fault simulation is becoming crucial in order to support test pattern generation. Traditional approaches based on gate-level models are not sufficient to represent and capture the impact of deviations in these parameters in either an efficient or accurate manner. Evaluation at electrical level, on the other hand, severely lacks execution speed and quickly becomes inapplicable to larger designs due to high computational demands. This work presents a novel fault simulation approach considering first-order parameters in CMOS circuits to explicitly capture CMOS-specific behavior in the functional and time domain with transistor granularity. The approach utilizes massive parallelization in order to achieve high-throughput acceleration on Graphics Processing Units (GPUs) by exploiting parallelism of cells, stimuli and faults. Despite the more precise level of abstraction, the simulator is able to process designs with millions of gates and even outperforms conventional simulation at logic level in terms of modeling accuracy and simulation speed.
BibTeX:
@inproceedings{SchneW2016, author = {Schneider, Eric and Wunderlich, Hans-Joachim}, title = {{High-Throughput Transistor-Level Fault Simulation on GPUs}}, booktitle = {Proceedings of the 25th IEEE Asian Test Symposium (ATS'16)}, year = {2016}, pages = {150--155}, keywords = {fault simulation; transistor level; switch level; GPUs}, abstract = {Deviations in the first-order parameters of CMOS cells can lead to severe errors in the functional and time domain. With increasing sensitivity of these parameters to manufacturing defects and variation, parametric and parasitic-aware fault simulation is becoming crucial in order to support test pattern generation. Traditional approaches based on gate-level models are not sufficient to represent and capture the impact of deviations in these parameters in either an efficient or accurate manner. Evaluation at electrical level, on the other hand, severely lacks execution speed and quickly becomes inapplicable to larger designs due to high computational demands. This work presents a novel fault simulation approach considering first-order parameters in CMOS circuits to explicitly capture CMOS-specific behavior in the functional and time domain with transistor granularity. The approach utilizes massive parallelization in order to achieve high-throughput acceleration on Graphics Processing Units (GPUs) by exploiting parallelism of cells, stimuli and faults. Despite the more precise level of abstraction, the simulator is able to process designs with millions of gates and even outperforms conventional simulation at logic level in terms of modeling accuracy and simulation speed.}, doi = {http://dx.doi.org/10.1109/ATS.2016.9}, file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2016/ATS_SchneW2016.pdf} } |
||
4. | Logic/Clock-Path-Aware At-Speed Scan Test Generation for Avoiding False Capture Failures and Reducing Clock Stretch Asada, Koji; Wen, Xiaoqing; Holst, Stefan; Miyase, Kohei; Kajihara, Seiji; Kochte, Michael A.; Schneider, Eric; Wunderlich, Hans-Joachim; Qian, Jun Proceedings of the 24th IEEE Asian Test Symposium (ATS'15), Mumbai, India, 22-25 November 2015, pp. 103-108 ATS 2015 Best Paper Award |
2015 DOI PDF |
Keywords: launch switching activity, IR-drop, logic path, clock path, false capture failure, test clock stretch, X-filling | ||
Abstract: IR-drop induced by launch switching activity (LSA) in capture mode during at-speed scan testing increases delay along not only logic paths (LPs) but also clock paths (CPs). Excessive extra delay along LPs compromises test yields due to false capture failures, while excessive extra delay along CPs compromises test quality due to test clock stretch. This paper is the first to mitigate the impact of LSA on both LPs and CPs with a novel LCPA (Logic/Clock-Path-Aware) at-speed scan test generation scheme, featuring (1) a new metric for assessing the risk of false capture failures based on the amount of LSA around both LPs and CPs, (2) a procedure for avoiding false capture failures by reducing LSA around LPs or masking uncertain test responses, and (3) a procedure for reducing test clock stretch by reducing LSA around CPs. Experimental results demonstrate the effectiveness of the LCPA scheme in improving test yields and test quality. | ||
BibTeX:
@inproceedings{AsadaWHMKKSWQ2015, author = {Asada, Koji and Wen, Xiaoqing and Holst, Stefan and Miyase, Kohei and Kajihara, Seiji and Kochte, Michael A. and Schneider, Eric and Wunderlich, Hans-Joachim and Qian, Jun}, title = {{Logic/Clock-Path-Aware At-Speed Scan Test Generation for Avoiding False Capture Failures and Reducing Clock Stretch}}, booktitle = {Proceedings of the 24th IEEE Asian Test Symposium (ATS'15)}, year = {2015}, pages = {103--108}, keywords = {launch switching activity, IR-drop, logic path, clock path, false capture failure, test clock stretch, X-filling}, abstract = {IR-drop induced by launch switching activity (LSA) in capture mode during at-speed scan testing increases delay along not only logic paths (LPs) but also clock paths (CPs). Excessive extra delay along LPs compromises test yields due to false capture failures, while excessive extra delay along CPs compromises test quality due to test clock stretch. This paper is the first to mitigate the impact of LSA on both LPs and CPs with a novel LCPA (Logic/Clock-Path-Aware) at-speed scan test generation scheme, featuring (1) a new metric for assessing the risk of false capture failures based on the amount of LSA around both LPs and CPs, (2) a procedure for avoiding false capture failures by reducing LSA around LPs or masking uncertain test responses, and (3) a procedure for reducing test clock stretch by reducing LSA around CPs. Experimental results demonstrate the effectiveness of the LCPA scheme in improving test yields and test quality.}, doi = {http://dx.doi.org/10.1109/ATS.2015.25}, file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2015/ATS_AsadaWHMKKSWQ2015.pdf} } |
||
3. | High-Throughput Logic Timing Simulation on GPGPUs Holst, Stefan; Imhof, Michael E.; Wunderlich, Hans-Joachim ACM Transactions on Design Automation of Electronic Systems (TODAES) Vol. 20(3), June 2015, pp. 37:1-37:21 |
2015 DOI URL PDF |
Keywords: Verification, Performance, Gate-Level Simulation, General Purpose computing on Graphics Processing Unit (GP-GPU), Hazards, Parallel CAD, Pin-to-Pin Delay, Pulse-Filtering, Timing Simulation | ||
Abstract: Many EDA tasks like test set characterization or the precise estimation of power consumption, power droop and temperature development, require a very large number of time-aware gate-level logic simulations. Until now, such characterizations have been feasible only for rather small designs or with reduced precision due to the high computational demands. The new simulation system presented here is able to accelerate such tasks by more than two orders of magnitude and provides for the first time fast and comprehensive timing simulations for industrial-sized designs. Hazards, pulse-filtering, and pin-to-pin delay are supported for the first time in a GPGPU accelerated simulator, and the system can easily be extended to even more realistic delay models and further applications. A sophisticated mapping with efficient memory utilization and access patterns as well as minimal synchronizations and control flow divergence is able to use the full potential of GPGPU architectures. To provide such a mapping, we combine for the first time the versatility of event-based timing simulation and multidimensional parallelism used in GPU-based gate-level simulators. The result is a throughput-optimized timing simulation algorithm, which runs many simulation instances in parallel and at the same time fully exploits gate-parallelism within the circuit. | ||
BibTeX:
@article{HolstIW2015, author = {Holst, Stefan and Imhof, Michael E. and Wunderlich, Hans-Joachim}, title = {{High-Throughput Logic Timing Simulation on GPGPUs}}, journal = {ACM Transactions on Design Automation of Electronic Systems (TODAES)}, year = {2015}, volume = {20}, number = {3}, pages = {37:1--37:21}, keywords = {Verification, Performance, Gate-Level Simulation, General Purpose computing on Graphics Processing Unit (GP-GPU), Hazards, Parallel CAD, Pin-to-Pin Delay, Pulse-Filtering, Timing Simulation}, abstract = {Many EDA tasks like test set characterization or the precise estimation of power consumption, power droop and temperature development, require a very large number of time-aware gate-level logic simulations. Until now, such characterizations have been feasible only for rather small designs or with reduced precision due to the high computational demands. The new simulation system presented here is able to accelerate such tasks by more than two orders of magnitude and provides for the first time fast and comprehensive timing simulations for industrial-sized designs. Hazards, pulse-filtering, and pin-to-pin delay are supported for the first time in a GPGPU accelerated simulator, and the system can easily be extended to even more realistic delay models and further applications. A sophisticated mapping with efficient memory utilization and access patterns as well as minimal synchronizations and control flow divergence is able to use the full potential of GPGPU architectures. To provide such a mapping, we combine for the first time the versatility of event-based timing simulation and multidimensional parallelism used in GPU-based gate-level simulators. The result is a throughput-optimized timing simulation algorithm, which runs many simulation instances in parallel and at the same time fully exploits gate-parallelism within the circuit.}, url = {http://dl.acm.org/citation.cfm?id=2714564}, doi = {http://dx.doi.org/10.1145/2714564}, file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2015/TODAES_HolstIW2015.pdf} } |
||
2. | GPU-Accelerated Small Delay Fault Simulation Schneider, Eric; Holst, Stefan; Kochte, Michael A.; Wen, Xiaoqing; Wunderlich, Hans-Joachim Proceedings of the ACM/IEEE Conference on Design, Automation and Test in Europe (DATE'15), Grenoble, France, 9-13 March 2015, pp. 1174-1179 Best Paper Candidate |
2015 DOI URL PDF |
Abstract: The simulation of delay faults is an essential task in design validation and reliability assessment of circuits. Due to the high sensitivity of current nano-scale designs against smallest delay deviations, small delay faults recently became the focus of test research. Because of the subtle delay impact, traditional fault simulation approaches based on abstract timing models are not sufficient for representing small delay faults. Hence, timing accurate simulation approaches have to be utilized, which quickly become inapplicable for larger designs due to high computational requirements. In this work we present a waveform-accurate approach for fast high-throughput small delay fault simulation on Graphics Processing Units (GPUs). By exploiting parallelism from gates, faults and patterns, the proposed approach enables accurate exhaustive small delay fault simulation even for multi-million gate designs without fault dropping for the first time. | ||
BibTeX:
@inproceedings{SchneHKWW2015, author = {Schneider, Eric and Holst, Stefan and Kochte, Michael A. and Wen, Xiaoqing and Wunderlich, Hans-Joachim}, title = {{GPU-Accelerated Small Delay Fault Simulation}}, booktitle = {Proceedings of the ACM/IEEE Conference on Design, Automation and Test in Europe (DATE'15)}, year = {2015}, pages = {1174--1179}, abstract = {The simulation of delay faults is an essential task in design validation and reliability assessment of circuits. Due to the high sensitivity of current nano-scale designs against smallest delay deviations, small delay faults recently became the focus of test research. Because of the subtle delay impact, traditional fault simulation approaches based on abstract timing models are not sufficient for representing small delay faults. Hence, timing accurate simulation approaches have to be utilized, which quickly become inapplicable for larger designs due to high computational requirements. In this work we present a waveform-accurate approach for fast high-throughput small delay fault simulation on Graphics Processing Units (GPUs). By exploiting parallelism from gates, faults and patterns, the proposed approach enables accurate exhaustive small delay fault simulation even for multi-million gate designs without fault dropping for the first time.}, url = {http://dl.acm.org/citation.cfm?id=2757084}, doi = {http://dx.doi.org/10.7873/DATE.2015.0077}, file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2015/DATE_SchneHKWW2015.pdf} } |
||
1. | Data-Parallel Simulation for Fast and Accurate Timing Validation of CMOS Circuits Schneider, Eric; Holst, Stefan; Wen, Xiaoqing; Wunderlich, Hans-Joachim Proceedings of the 33rd IEEE/ACM International Conference on Computer-Aided Design (ICCAD'14), San Jose, California, USA, 3-6 November 2014, pp. 17-23 |
2014 URL PDF |
Abstract: Gate-level timing simulation of combinational CMOS circuits is the foundation of a whole array of important EDA tools such as timing analysis and power-estimation, but the demand for higher simulation accuracy drastically increases the runtime complexity of the algorithms. Data-parallel accelerators such as Graphics Processing Units (GPUs) provide vast amounts of computing performance to tackle this problem, but require careful attention to control-flow and memory access patterns. This paper proposes the novel High-Throughput Oriented Parallel Switch-level Simulator (HiTOPS), which is especially designed to take full advantage of GPUs and provides accurate time-simulation for multi-million gate designs at an unprecedented throughput. HiTOPS models timing at transistor granularity and supports all major timing-related effects found in CMOS including pattern-dependent delay, glitch filtering and transition ramps, while achieving speedups of up to two orders of magnitude compared to traditional gate-level simulators. | ||
BibTeX:
@inproceedings{SchneHWW2014, author = {Schneider, Eric and Holst, Stefan and Wen, Xiaoqing and Wunderlich, Hans-Joachim}, title = {{Data-Parallel Simulation for Fast and Accurate Timing Validation of CMOS Circuits}}, booktitle = {Proceedings of the 33rd IEEE/ACM International Conference on Computer-Aided Design (ICCAD'14)}, year = {2014}, pages = {17--23}, abstract = {Gate-level timing simulation of combinational CMOS circuits is the foundation of a whole array of important EDA tools such as timing analysis and power-estimation, but the demand for higher simulation accuracy drastically increases the runtime complexity of the algorithms. Data-parallel accelerators such as Graphics Processing Units (GPUs) provide vast amounts of computing performance to tackle this problem, but require careful attention to control-flow and memory access patterns. This paper proposes the novel High-Throughput Oriented Parallel Switch-level Simulator (HiTOPS), which is especially designed to take full advantage of GPUs and provides accurate time-simulation for multi-million gate designs at an unprecedented throughput. HiTOPS models timing at transistor granularity and supports all major timing-related effects found in CMOS including pattern-dependent delay, glitch filtering and transition ramps, while achieving speedups of up to two orders of magnitude compared to traditional gate-level simulators.}, url = {http://dl.acm.org/citation.cfm?id=2691369}, file = {http://www.iti.uni-stuttgart.de/fileadmin/rami/files/publications/2014/ICCAD_SchneHWW2014.pdf} } |
Workshop Contributions
1. | Hochbeschleunigte Simulation von Verzögerungsfehlern unter Prozessvariationen [Highly Accelerated Simulation of Delay Faults under Process Variations] Schneider, Eric; Kochte, Michael A.; Wunderlich, Hans-Joachim 27th GI/GMM/ITG Workshop "Testmethoden und Zuverlässigkeit von Schaltungen und Systemen" (TuZ'15), Bad Urach, Germany, 1-3 March 2015 |
2015 |
Abstract: The simulation of small delay faults is an important part of the validation of nanoelectronic circuits. Process variations during manufacturing have a large influence on the detection of these faults and must be taken into account during simulation. Compared with traditional logic simulation or static timing analysis, timing-accurate simulation of delay faults is very expensive, and the computational complexity increases further when variations are considered. This work presents a highly parallel method that uses graphics processors for the accelerated parallel simulation of small delay faults under variation. The method computes accurate signal waveforms in the circuit and enables the determination of a Monte Carlo-based statistical fault coverage for industrial circuits under both random and systematic variation. | ||
BibTeX:
@inproceedings{SchneKW2015, author = {Schneider, Eric and Kochte, Michael A. and Wunderlich, Hans-Joachim}, title = {{Hochbeschleunigte Simulation von Verzögerungsfehlern unter Prozessvariationen}}, booktitle = {27th GI/GMM/ITG Workshop "Testmethoden und Zuverlässigkeit von Schaltungen und Systemen" (TuZ'15)}, year = {2015}, abstract = {Die Simulation kleiner Verzögerungsfehler ist ein wichtiger Bestandteil der Validierung nano-elektronischer Schaltungen. Prozessvariationen während der Herstellung haben großen Einfluss auf die Erkennung dieser Fehler und müssen bei der Simulation berücksichtigt werden. Die zeitgenaue Simulation von Verzögerungsfehlern ist verglichen mit traditioneller Logiksimulation oder statischer Zeitanalyse sehr aufwändig und die Rechenkomplexität steigt durch die Berücksichtigung von Variationen zusätzlich an. In dieser Arbeit wird ein hochparalleles Verfahren vorgestellt, welches Grafikprozessoren zur beschleunigten parallelen Simulation kleiner Verzögerungsfehler unter Variation anwendet. Das Verfahren berechnet akkurate Signalverläufe in der Schaltung und ermöglicht die Bestimmung einer Monte-Carlo-basierten statistischen Fehlererfassung für industrielle Schaltkreise unter zufälliger sowie systematischer Variation.} } |
Contact
- Prof. Dr. rer. nat. habil. Hans-Joachim Wunderlich
Tel.: +49-711-685-88-391
wu@informatik.uni-stuttgart.de
- Dipl.-Inf. Eric Schneider
Tel.: +49-711-685-88-370
schneiec at iti dot uni-stuttgart dot de
- Dr. rer. nat. Claus Braun
Tel.: +49-711-685-88-407
claus.braun@informatik.uni-stuttgart.de