Motion reconstruction
I think this is closely related.
Pipeline
Use off-the-shelf per-frame regressors and/or per-frame optimization to obtain initial SMPL-X estimates for each frame. The visibility masks are obtained by randomly masking joints at training time and by computing joint visibility at test time.
$$
\begin{aligned}
&\text{Get motion sequence: } \tilde{X} \in \mathbb{R}^{N \times d}, \quad X = (\mathbf{R}, \mathbf{P}) \\
&\text{Root trajectory: } \mathbf{R} \in \mathbb{R}^{N \times d_R} \\
&\text{Local body features: } \mathbf{P} \in \mathbb{R}^{N \times d_P} \\
&\text{Root joint visibility masks: } M_R \in \{0,1\}^{N \times d_R} \\
&\text{Local joint visibility masks: } M_P \in \{0,1\}^{N \times d_P}
\end{aligned}
$$
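To make the shapes concrete, here is a rough NumPy sketch of training-time random masking. It is my own simplification, not the paper's code: the function names, `drop_prob`, the sizes, and the convention that 1 means visible are all assumptions, and it masks each feature entry independently rather than whole joints.

```python
import numpy as np

def random_visibility_masks(n_frames, d_root, d_pose, drop_prob=0.3, seed=0):
    """Sample binary masks M_R and M_P (assumed convention: 1 = visible, 0 = masked)."""
    rng = np.random.default_rng(seed)
    m_root = (rng.random((n_frames, d_root)) >= drop_prob).astype(np.float32)
    m_pose = (rng.random((n_frames, d_pose)) >= drop_prob).astype(np.float32)
    return m_root, m_pose

def apply_masks(root, pose, m_root, m_pose):
    """Zero out the masked entries of the root trajectory R and body features P."""
    return root * m_root, pose * m_pose

# Hypothetical sizes, chosen only for illustration.
N, d_R, d_P = 8, 4, 6
R = np.ones((N, d_R), dtype=np.float32)
P = np.ones((N, d_P), dtype=np.float32)

M_R, M_P = random_visibility_masks(N, d_R, d_P)
R_masked, P_masked = apply_masks(R, P, M_R, M_P)
```

At test time the same mask shapes would instead be filled from computed joint visibility rather than sampled.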
TrajNet
PoseNet
TrajControl, an auxiliary module for fine-tuning TrajNet with an additional control signal from the local body pose
I haven't looked at this thoroughly, but I think it is just a fine-tuning technique.
Long Sequences Generation
Two-Person Generation
Fine-Tuned Motion Control
b. The main improvement seems to be that noise is added to only part of the data.
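A rough sketch of what "noising only part" could look like, assuming a standard DDPM-style forward step $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ applied only where a mask is 1. The function name, the mask layout, and the value of `alpha_bar_t` are hypothetical; I have not checked this against the paper's actual formulation.

```python
import numpy as np

def partial_forward_noise(x0, noise_mask, alpha_bar_t, rng):
    """Apply a DDPM-style forward step only where noise_mask == 1; keep x0 elsewhere."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return np.where(noise_mask.astype(bool), x_t, x0)

rng = np.random.default_rng(0)
x0 = np.arange(12, dtype=np.float64).reshape(3, 4)  # toy "clean" features
noise_mask = np.zeros_like(x0)
noise_mask[:, :2] = 1                               # noise only the first two columns
x_t = partial_forward_noise(x0, noise_mask, alpha_bar_t=0.5, rng=rng)
```

The unmasked columns pass through unchanged, so the denoiser can condition on them as clean context.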
Pipeline
2D pose is often estimated from the image with an off-the-shelf 2D pose detector
Initializing the indeterminate 3D pose distribution $H_k$ based on extracted heatmaps, which capture the underlying uncertainty of the input 2D pose in 3D space
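What I think differs from other work: it uses the 2D pose (and similar cues) to directly obtain what amounts to an already "noised" motion and treats that as step $k$, whereas other methods add Gaussian noise to an initial estimate and then denoise.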
Performing the reverse diffusion process, where we use a diffusion model $g$ to progressively denoise the initial distribution $H_k$ into a desired high-quality determinate distribution $H_0$; we can then sample $h_0 \in \mathbb{R}^{3 \times J}$ from the pose distribution $H_0$ to synthesize the final 3D pose $h_s$
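The control flow of the steps above can be sketched roughly as follows. This is only a loop skeleton under loud assumptions: `toy_denoiser` is a stand-in that already knows the target pose (a real $g$ is a learned network), the heatmap-based initialization is emulated by adding noise to that target, and averaging the samples to get $h_s$ is my guess at one way to combine them.

```python
import numpy as np

def toy_denoiser(h, k, target):
    """Stand-in for the diffusion model g: closes 1/k of the remaining gap to target."""
    return h + (target - h) / k

def reverse_diffusion(h_init, num_steps, denoise_fn):
    """Run the reverse process from step num_steps down to step 1."""
    h = h_init.copy()
    for k in range(num_steps, 0, -1):
        h = denoise_fn(h, k)
    return h

J, K, num_samples = 17, 10, 5        # hypothetical joint count, steps, sample count
rng = np.random.default_rng(0)
target_pose = rng.standard_normal((3, J))  # hypothetical "clean" 3D pose

# Emulated samples from the initial distribution H_k. In the method this comes
# from 2D-pose heatmaps, not from pure Gaussian noise.
h_K = target_pose + 0.5 * rng.standard_normal((num_samples, 3, J))

h_0 = reverse_diffusion(h_K, K, lambda h, k: toy_denoiser(h, k, target_pose))
h_s = h_0.mean(axis=0)               # assumed way of combining samples into h_s
```

The point of the sketch is the starting state: the loop begins from a data-derived `h_K` instead of a standard Gaussian sample, which is the difference from other work noted above.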