



# A Self-Supervised, Pre-Trained, and Cross-Stage-Aligned Circuit Encoder Provides a Foundation for Various Design Tasks

**Wenji Fang<sup>1</sup>**, Shang Liu<sup>1</sup>, Hongce Zhang<sup>1,2</sup>, Zhiyao Xie<sup>1</sup>

wfang838@connect.ust.hk

<sup>1</sup>Hong Kong University of Science and Technology <sup>2</sup>Hong Kong University of Science and Technology (Guangzhou)

#### **Outline**

- Background
- CircuitEncoder Framework

- Experimental Results
- Conclusion & Future Work



# Background

### **Background: AI for EDA**

#### Remarkable achievements

- Design quality evaluation
  - Power, timing, area, routability, etc.
- Functional reasoning
  - Arithmetic word-level abstraction, SAT, etc.
- Optimization
  - Design space exploration, etc.
- Generation
  - RTL code, verification, etc.



### **Background: AI for EDA**

- Most existing predictive solutions are task-specific
  - Supervised learning
  - Tedious and time-consuming
  - Hard to generalize to other tasks





#### **Background: Foundation Models**

- AI foundation models
  - Pretrain-finetune paradigm
    - Pre-training on large amounts of unlabeled data (self-supervised)
    - Fine-tuning based on task-specific labels (supervised)
  - Applications
    - Natural language processing: GPT, BERT, Llama, etc.
    - Computer vision: DALL·E, Stable Diffusion, etc.











# CircuitEncoder Framework

#### **Motivation: Towards Circuit Foundation Models**

#### Large circuit model

- Position paper: "Large circuit models: opportunities and challenges"<sup>†</sup>, *Science China Information Sciences*, October 2024, Vol. 67, Iss. 10, 200402:1–200402:42, https://doi.org/10.1007/s11432-024-4155-7
  - Special Topic: AI Chips and Systems for Large Language Models

Lei CHEN<sup>5</sup>, Yiqi CHEN<sup>7</sup>, Zhufei CHU<sup>6</sup>, Wenji FANG<sup>3</sup>, Tsung-Yi HO<sup>1</sup>, Ru HUANG<sup>7,11</sup>, Yu HUANG<sup>4</sup>, Sadaf KHAN<sup>1</sup>, Min LI<sup>5</sup>, Xingquan LI<sup>9</sup>, Yu LI<sup>1</sup>, Yun LIANG<sup>7</sup>, Jinwei LIU<sup>1</sup>, Yi LIU<sup>1</sup>, Yibo LIN<sup>7</sup>, Guojie LUO<sup>8\*</sup>, Hongyang PAN<sup>2</sup>, Zhengyuan SHI<sup>1</sup>, Guangyu SUN<sup>7</sup>, Dimitrios TSARAS<sup>5</sup>, Runsheng WANG<sup>7</sup>, Ziyi WANG<sup>1</sup>, Xinming WEI<sup>8</sup>, Zhiyao XIE<sup>3</sup>, Qiang XU<sup>1\*</sup>, Chenhao XUE<sup>7</sup>, Junchi YAN<sup>10</sup>, Jun YANG<sup>11</sup>, Bei YU<sup>1</sup>, Mingxuan YUAN<sup>5\*</sup>, Evangeline F.Y. YOUNG<sup>1</sup>, Xuan ZENG<sup>2</sup>, Haoyi ZHANG<sup>7</sup>, Zuodong ZHANG<sup>7</sup>, Yuxiang ZHAO<sup>7</sup>, Hui-Ling ZHEN<sup>5</sup>, Ziyang ZHENG<sup>1</sup>, Binwu ZHU<sup>1</sup>, Keren ZHU<sup>1</sup> & Sunan ZOU<sup>8</sup>



#### **Motivation: Towards Circuit Foundation Model**

- Our targeted circuit foundation model
  - Capture intrinsic properties unique to circuits
    - **Cross-stage:** RTL (functional) → netlist (physical)
    - **Equivalent transformation:** semantic & structural
    - ...
  - Support various types of tasks
    - Functionality: reasoning, verification, etc.
    - Design quality: performance, power, area, etc.
    - ...



# Key Idea: First RTL-Netlist Cross-Stage Alignment

- General circuit foundation model solution
  - Two-phase paradigm
    - Self-supervised pre-training
    - Supervised fine-tuning





### **Comparison with Existing Solution**

#### Circuit representation learning

Goal: to learn a general circuit embedding for various tasks

#### Explorations

- Supervised: HOGA, Gamora, etc.
- Pre-trained: DeepGate Family, FGNN, SNS v2, etc.

Table 1: Existing two-phase circuit representation learning techniques for ASIC design.

| Method | Downstream Tasks: Multi-Type | Downstream Tasks: Design Quality | Downstream Tasks: Function | Pre-Train: Self-Supervised | Pre-Train: Train Task | Design Stage: Cross-Stage | Design Stage: Target Stage | Support Seq. Circuit | Open-Source Model |
|---|---|---|---|---|---|---|---|---|---|
| Design2Vec [25] | | | ✓ | | Cover Point | | RTL | ✓ | |
| SNS v2 [36] | | ✓ | | ✓ | Contrastive | | RTL | ✓ | |
| FGNN [28] | | | ✓ | ✓ | Contrastive | | Netlist | | |
| DeepGate [17] | | | ✓ | | Probability | | Netlist | | |
| DeepGate2 [23] | | | ✓ | | Truth Table | | Netlist | | ✓ |
| DeepSeq [16] | | ✓\* | | | Probability | | Netlist | ✓ | |
| CircuitEncoder (Ours) | ✓ | ✓ | ✓ | ✓ | Multi-Stage Contrastive | ✓ | RTL, Netlist | ✓ | ✓ |

<sup>\*</sup> DeepSeq predicts netlist gate toggle rate at the node level to estimate power consumption, rather than directly modeling power.

# **Comparison with Existing Solution**

#### Circuit representation learning

- Limitations: existing methods still do not provide a fully general circuit representation
  - Mainly support one type of task (physical PPA or functional)
  - Target only a single stage (RTL or netlist)


# **Comparison with Existing Solution**

#### Our CircuitEncoder

- Self-supervised pre-training: contrastive learning on circuit graph function
- Cross-stage aligned: RTL (functional) to netlist (physical) alignment
- Supports various design tasks: PPA + functionality




# **Preprocessing: Circuit Design Stages**

- RTL
  - Earlier design stage
  - Higher abstraction level
  - More semantic content
  - Task
    - Predicting later netlist PPA



- Netlist
  - Later design stage
  - Lower abstraction level
  - More implementation details
  - Task
    - Reasoning about the earlier RTL function





### **Preprocessing: Circuit Data Alignment**

Circuit-to-graph transformation



- RTL-netlist data alignment via backtraced register cones
  - Advantages
    - RTL-netlist cones are strictly aligned & functionally equivalent
    - Capture the entire state transition of each register
    - Intermediate granularity → better scalability
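The register-cone backtrace can be illustrated with a minimal sketch. It assumes the circuit is available as a fan-in map (each node maps to the nodes driving it) and that the backtrace stops at registers and primary inputs; the function name, the data layout, and the toy circuit are hypothetical and not taken from the CircuitEncoder code.

```python
from collections import deque


def backtrace_register_cone(fanin, target_reg, stop_nodes):
    """Collect the fan-in cone that drives `target_reg`.

    fanin: dict mapping a node to the list of nodes that drive it.
    stop_nodes: registers and primary inputs, where the backtrace stops
        (an assumption about how cone boundaries are defined).
    Returns (cone_nodes, cone_edges), with edges as (driver, sink) pairs.
    """
    cone_nodes = {target_reg}
    cone_edges = set()
    expanded = set()
    queue = deque([target_reg])
    while queue:
        node = queue.popleft()
        if node in expanded:
            continue
        expanded.add(node)
        for driver in fanin.get(node, []):
            cone_nodes.add(driver)
            cone_edges.add((driver, node))
            # Boundary nodes are kept as cone leaves but not expanded further.
            if driver not in stop_nodes and driver not in expanded:
                queue.append(driver)
    return cone_nodes, cone_edges


# Toy circuit: r1 is driven by g2; g2 by g1 and r2; g1 by in_a and r1 (feedback).
fanin = {
    "r1": ["g2"],
    "g2": ["g1", "r2"],
    "g1": ["in_a", "r1"],
    "r2": ["g3"],
    "g3": ["in_b"],
}
stops = {"r1", "r2", "in_a", "in_b"}
nodes, edges = backtrace_register_cone(fanin, "r1", stops)
print(sorted(nodes))
```

Running the same backtrace from the same register on the RTL graph and on the netlist graph would give one cone per stage for that register, which is the pairing the alignment relies on.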



# **Encoding: Graph Learning Model for Circuits**

- RTL graph
  - Graph transformer
  - Global positional encoding
- Netlist graph
  - Graph neural network
  - Neighbor aggregation
- Node-level embeddings → cone graph-level embeddings

Figure: Phase 1: Self-Supervised Pre-Training of CircuitEncoder. Design cone graphs (with Aug<sup>+</sup> and Aug<sup>−</sup> variants) pass through RTL-stage encoding (graph transformer) and netlist-stage encoding (GNN); node features become node embedding vectors and then cone embedding vectors. Pre-training uses multi-stage contrastive learning with cross-stage alignment, $L_{CL} = L_{rr} + L_{nn} + L_{rn}$.
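To make the data flow from node features to a cone-level embedding concrete, here is a minimal sketch of one round of mean neighbor aggregation followed by a mean-pooling readout. Both operators are assumptions chosen for illustration: an actual GNN layer adds learned weights and nonlinearities, and the readout used by CircuitEncoder is not specified on this slide.

```python
def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def aggregate_neighbors(features, neighbors):
    """One aggregation round: average each node's vector with its neighbors'."""
    return {
        node: mean_vec([features[node]] + [features[nb] for nb in neighbors.get(node, [])])
        for node in features
    }


def cone_embedding(node_embeddings):
    """Readout: pool all node embeddings of a cone into one vector (mean pooling assumed)."""
    return mean_vec(list(node_embeddings.values()))


# Toy cone with three nodes; node "c" has fan-in neighbors "a" and "b".
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"c": ["a", "b"]}

updated = aggregate_neighbors(features, neighbors)
cone = cone_embedding(updated)
print(updated["c"], cone)
```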



#### **CircuitEncoder Phase 1: Pre-Training**

- Self-supervised pre-training: intrinsic circuit property
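  - (1) Intra-stage contrastive learning within each stage
    - Minimize the embedding distance between positive pairs (equivalent transformations)
    - Maximize the embedding distance between negative pairs (functionally different circuits)

$$L_{rr} = \max(\lVert E_{R} - E_{R^{+}}\rVert_{2} - \lVert E_{R} - E_{R^{-}}\rVert_{2} + m_{rr},\ 0),$$

$$L_{nn} = \max(\lVert E_{N} - E_{N^{+}}\rVert_{2} - \lVert E_{N} - E_{N^{-}}\rVert_{2} + m_{nn},\ 0).$$

Here $E_{R}$ and $E_{N}$ are the RTL and netlist cone embeddings, the superscripts $+$ and $-$ mark positive and negative samples, and $m_{rr}$, $m_{nn}$ are margins.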
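The intra-stage losses $L_{rr}$ and $L_{nn}$ above share one form: a hinge on the gap between the anchor-positive and anchor-negative L2 distances. A minimal sketch in plain Python follows; the embeddings and the margin value are made up for illustration.

```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin):
    """max(||anchor - positive||_2 - ||anchor - negative||_2 + margin, 0)."""
    gap = l2_distance(anchor, positive) - l2_distance(anchor, negative)
    return max(gap + margin, 0.0)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]       # distance 1 from the anchor
far_negative = [3.0, 4.0]   # distance 5 from the anchor
near_negative = [1.0, 0.0]  # distance 1 from the anchor

loss_far = triplet_loss(anchor, positive, far_negative, margin=1.0)
loss_near = triplet_loss(anchor, positive, near_negative, margin=1.0)
print(loss_far, loss_near)
```

The loss is zero once the negative sits at least one margin farther from the anchor than the positive, so only triplets that violate the margin contribute gradient.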






#### **CircuitEncoder Phase 1: Pre-Training**

- Self-supervised pre-training: intrinsic circuit property
  - (2) Inter-stage contrastive learning across stages
    - Cross-stage alignment between RTL and netlist embeddings

$$L_{rn} = \max(\lVert E_{R} - E_{N^{+}}\rVert_{2} - \lVert E_{R} - E_{N^{-}}\rVert_{2} + m_{rn},\ 0) + \max(\lVert E_{N} - E_{R^{+}}\rVert_{2} - \lVert E_{N} - E_{R^{-}}\rVert_{2} + m_{rn},\ 0),$$

$$L_{CL} = \alpha_{rr}L_{rr} + \alpha_{nn}L_{nn} + \alpha_{rn}L_{rn}.$$
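The inter-stage loss $L_{rn}$ is symmetric: one hinge term anchors on the RTL embedding and contrasts netlist samples, the other anchors on the netlist embedding and contrasts RTL samples. The sketch below evaluates both terms and the weighted total $L_{CL}$. The default weights of 1 and all numeric values are assumptions for illustration.

```python
import math


def l2_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hinge_triplet(anchor, positive, negative, margin):
    return max(l2_distance(anchor, positive) - l2_distance(anchor, negative) + margin, 0.0)


def inter_stage_loss(e_r, e_n_pos, e_n_neg, e_n, e_r_pos, e_r_neg, m_rn):
    """L_rn: RTL-anchored term plus netlist-anchored term, sharing one margin."""
    rtl_anchored = hinge_triplet(e_r, e_n_pos, e_n_neg, m_rn)
    netlist_anchored = hinge_triplet(e_n, e_r_pos, e_r_neg, m_rn)
    return rtl_anchored + netlist_anchored


def total_loss(l_rr, l_nn, l_rn, a_rr=1.0, a_nn=1.0, a_rn=1.0):
    """L_CL as a weighted sum; weights default to 1 (an assumption)."""
    return a_rr * l_rr + a_nn * l_nn + a_rn * l_rn


l_rn = inter_stage_loss(
    e_r=[0.0, 0.0], e_n_pos=[0.0, 0.0], e_n_neg=[0.0, 2.0],  # first term: 0 - 2 + 1 < 0, clipped to 0
    e_n=[1.0, 0.0], e_r_pos=[1.0, 1.0], e_r_neg=[1.0, 1.5],  # second term: 1 - 1.5 + 1 = 0.5
    m_rn=1.0,
)
l_cl = total_loss(l_rr=0.25, l_nn=0.25, l_rn=l_rn, a_rn=2.0)
print(l_rn, l_cl)
```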






# **CircuitEncoder Phase 2: Fine-Tuning for Tasks**

- Supervised fine-tuning
  - Lightweight task models: MLP, tree-based, etc.

Phase 2: Fine-Tune for Applications





### **CircuitEncoder Phase 2: Fine-Tuning for Tasks**

- Downstream tasks
  - Register cone-level:
    - [PPA] Timing slack prediction (RTL stage)
    - [Func] Register function (control/data) identification (netlist stage)
  - Design-level:
    - [PPA] Overall PPA prediction (RTL stage)
      - WNS
      - TNS
      - Area
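For readers less familiar with the timing metrics, the sketch below computes WNS and TNS from a list of per-register slacks using the usual static-timing conventions: WNS is the worst (minimum) slack and TNS is the sum of all negative slacks. Reporting conventions differ between tools, and the slide does not say whether CircuitEncoder predicts these values directly or aggregates cone-level slacks, so treat this only as a definition aid with made-up numbers.

```python
def worst_negative_slack(slacks):
    """WNS: the smallest slack over all endpoints (negative means a violation)."""
    return min(slacks)


def total_negative_slack(slacks):
    """TNS: sum of the slacks of all violating endpoints (those below zero)."""
    return sum(s for s in slacks if s < 0)


# Hypothetical per-register slacks in nanoseconds.
register_slacks = [0.12, -0.05, -0.20, 0.30]

wns = worst_negative_slack(register_slacks)
tns = total_negative_slack(register_slacks)
print(wns, tns)
```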






# Experimental Results

# **Circuit Design Statistics**

- 41 open-source designs
- 7,166 RTL and netlist cone pairs
- Data augmentation → 42,996 graphs in total
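A quick consistency check on these counts. Each cone pair contributes one RTL graph and one netlist graph, and the total after augmentation is exactly three times that number. Reading the factor of three as the original graph plus two augmented variants is an inference from the totals, not something stated on the slide.

$$7{,}166 \times 2 = 14{,}332, \qquad \frac{42{,}996}{14{,}332} = 3, \qquad 7 + 5 + 17 + 12 = 41 \text{ designs (Table 2)}.$$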

Table 2: Benchmark design information.

| Source Benchmarks | # Designs | Design Size: #K Gates | Design Size: # Cones | Original HDL Type |
|---|---|---|---|---|
| ITC [6] | 7 | {7, 15, 22} | {12, 21, 31} | VHDL |
| OpenCores [1] | 5 | {2, 40, 59} | {12, 96, 173} | Verilog |
| Vex [26] | 17 | {8, 208, 591} | {39, 168, 694} | SpinalHDL |
| Chipyard [2] | 12 | {11, 49, 194} | {28, 461, 2730} | Chisel |



#### **Experimental Setup**

- Industry-standard VLSI design flow
  - RTL designs are synthesized with Design Compiler (DC) using the NanGate 45nm library
  - Design PPA metrics are obtained from PrimeTime (PT)
- Circuit augmentation
  - Yosys / ABC for functionally equivalent transformations
- Graph model
  - RTL: Graphormer (graph transformer)
  - Netlist: GraphSAGE (GNN)



# **Experimental Setup**

#### Model training

Table 4: ML model and training hyperparameters.

| Per-model setting | Graphormer (RTL, pre-training) | GraphSAGE (netlist, pre-training) | MLP (fine-tuning) | GCN (fine-tuning) | XGBoost (fine-tuning) |
|---|---|---|---|---|---|
| # Layers | 7 | 3 | 2 | 2 | |
| Hidden Dim | 256 | 256 | 128 | 128 | |
| Activation | GELU | ReLU | ReLU | ReLU | |
| Tree config | | | | | 100 estimators, max depth 20 |

| Per-phase setting | Pre-Training | Fine-Tuning |
|---|---|---|
| Batch Size | 128 | 32 |
| Optimizer | AdamW | Adam |
| LR | 0.0001 | 0.0001 |
| Dataset Size | 33132 | 78 |
| # Epochs | 755 | 1000 |
| Training Time | 20h | 0.05h |



# **Task Evaluation and Supervised Baseline Methods**

- Design quality evaluation (regression metrics)
  - Register slack prediction at cone level
    - RTL-Timer [DAC'24]
  - RTL-stage overall quality evaluation at circuit level
    - MasterRTL [ICCAD'23]
    - SNS v2 [MICRO'23]
- Functional reasoning (classification metrics)
  - Netlist-stage state register classification at cone level
    - ReIGNN [ICCAD'21]
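The metrics reported in Table 3 (R and MAPE for regression, sensitivity and accuracy for classification) can be sketched as below. Reading R as the Pearson correlation coefficient is an assumption, and the toy inputs are made up.

```python
import math


def pearson_r(pred, true):
    """Pearson correlation between predictions and ground truth."""
    n = len(pred)
    mean_p, mean_t = sum(pred) / n, sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in true))
    return cov / (std_p * std_t)


def mape(pred, true):
    """Mean absolute percentage error, as a fraction (0.1 means 10%)."""
    return sum(abs((p - t) / t) for p, t in zip(pred, true)) / len(true)


def sensitivity(pred_labels, true_labels):
    """True-positive rate: TP / (TP + FN), with 1 as the positive class."""
    tp = sum(1 for p, t in zip(pred_labels, true_labels) if p == 1 and t == 1)
    fn = sum(1 for p, t in zip(pred_labels, true_labels) if p == 0 and t == 1)
    return tp / (tp + fn)


def accuracy(pred_labels, true_labels):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for p, t in zip(pred_labels, true_labels) if p == t)
    return correct / len(true_labels)


r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
err = mape([90.0, 110.0], [100.0, 100.0])
sens = sensitivity([1, 0, 1, 0], [1, 1, 0, 0])
acc = accuracy([1, 0, 1, 0], [1, 1, 0, 0])
print(r, err, sens, acc)
```

Sensitivity and accuracy are worth reading together: a classifier that never predicts the positive class scores 0% sensitivity while its accuracy still equals the share of negative samples.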



# **Results: Comparison with SOTA Solutions**

- Outperforming each task-specific SOTA solution
  - Cone-level tasks
  - Few-shot learning during fine-tuning
    - 50% data for CircuitEncoder > 100% data for supervised baselines

Table 3: Accuracy comparison for the cone-level tasks for RTL and netlist designs. RTL stage (register slack prediction): RTL-Timer (supervised learning) vs. CircuitEncoder (pre-train + few-shot), reported as R and MAPE. Netlist stage (state register identification): ReIGNN\* (supervised learning) vs. CircuitEncoder (pre-train + few-shot), reported as sensitivity (Sens.) and accuracy (Acc.). Percentages in the column names give the share of training data used.

| Test Design | RTL-Timer 13% R | RTL-Timer 13% MAPE | RTL-Timer 50% R | RTL-Timer 50% MAPE | RTL-Timer 100% R | RTL-Timer 100% MAPE | CircuitEncoder 13% R | CircuitEncoder 13% MAPE | CircuitEncoder 50% R | CircuitEncoder 50% MAPE | ReIGNN\* 13% Sens. | ReIGNN\* 13% Acc. | ReIGNN\* 50% Sens. | ReIGNN\* 50% Acc. | ReIGNN\* 100% Sens. | ReIGNN\* 100% Acc. | CircuitEncoder 13% Sens. | CircuitEncoder 13% Acc. | CircuitEncoder 50% Sens. | CircuitEncoder 50% Acc. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ITC1       | 0.48                               | 22%                                   | 0.77 | 20%  | 0.82 | 18%                                      | 0.91    | 21%  | 0.96                          | 9%   | 0%     | 72%                                           | 50%    | 72%                                      | 50%    | 72%  | 100%   | 98%  | 100%  | 98%  |
| ITC2       | 0.43                               | 26%                                   | 0.83 | 12%  | 0.88 | 10%                                      | 0.92    | 19%  | 0.96                          | 9%   | 0%     | 92%                                           | 100%   | 92%                                      | 100%   | 92%  | 100%   | 100% | 100%  | 100% |
| Chipyard1  | 0.57                               | 30%                                   | 0.89 | 12%  | 0.92 | 18%                                      | 0.81    | 15%  | 0.83                          | 18%  | 0%     | 50%                                           | 0%     | 50%                                      | 30%    | 65%  | 78%    | 77%  | 79%   | 79%  |
| Chipyard2  | 0.56                               | 31%                                   | 0.85 | 19%  | 0.88 | 12%                                      | 0.84    | 12%  | 0.85                          | 13%  | 0%     | 50%                                           | 0%     | 50%                                      | 30%    | 65%  | 84%    | 78%  | 89%   | 85%  |
| Vex1       | 0.28                               | 27%                                   | 0.65 | 15%  | 0.87 | 24%                                      | 0.69    | 25%  | 0.88                          | 26%  | 0%     | 50%                                           | 0%     | 50%                                      | 50%    | 74%  | 76%    | 79%  | 82%   | 72%  |
| Vex2       | 0.73                               | 29%                                   | 0.93 | 17%  | 0.86 | 16%                                      | 0.85    | 13%  | 0.87                          | 13%  | 15%    | 57%                                           | 21%    | 57%                                      | 32%    | 60%  | 73%    | 76%  | 79%   | 78%  |
| Vex3       | 0.27                               | 36%                                   | 0.56 | 40%  | 0.84 | 16%                                      | 0.81    | 14%  | 0.89                          | 12%  | 16%    | 48%                                           | 0%     | 48%                                      | 50%    | 72%  | 81%    | 82%  | 85%   | 84%  |
| Vex4       | 0.12                               | 40%                                   | 0.76 | 18%  | 0.87 | 12%                                      | 0.83    | 16%  | 0.86                          | 14%  | 30%    | 63%                                           | 33%    | 63%                                      | 33%    | 63%  | 88%    | 79%  | 90%   | 81%  |
| Avg.       | 0.43                               | 30%                                   | 0.78 | 19%  | 0.87 | 16%                                      | 0.83    | 17%  | 0.89                          | 14%  | 8%     | 60%                                           | 26%    | 60%                                      | 47%    | 70%  | 85%    | 84%  | 88%   | 85%  |



### **Results: Comparison with SOTA Solutions**

- Outperforming each task-specific SOTA solution
  - Circuit-level tasks





# **Results: Comparison with SOTA Solutions**

- Fine-tuning data size scaling
  - $100\% \rightarrow 50\% \rightarrow 25\% \rightarrow 12\%$
  - Pre-trained CircuitEncoder remains stable





Figure: fine-tuning data size scaling; panel (b): Netlist-Stage Functional Identification.



# **Ablation Study**

- Impact of cross-stage alignment
- Impact of graph transformer





# Conclusion & Future Work

#### Conclusion

- CircuitEncoder
  - Self-supervised & pre-trained
    - Graph function contrastive learning
  - Cross-stage alignment
    - Aligning RTL function with netlist physics
  - Support various tasks
    - Design quality: slack, WNS, TNS, area prediction
    - Functional reasoning: state register identification



#### **Future Work**

- Advancing circuit foundation model
  - Contrastive learning → Circuit-specific self-supervised learning
  - Graph modality → Multimodality for each design stage
    - RTL: AST/control-data flow graph, Verilog code, specification text
    - Netlist: connectivity graph, annotated node text
    - Layout: image, netlist graph
  - Existing encoders and decoders work separately
    - Encoder (graph): predictive tasks
    - Decoder (text, LLM): generative tasks
    - Unified encoder-decoder?





# Thank You! Questions?