Cryo-EM Flexible Fitting¶

Flexible fitting refines a single atomic model against a consensus cryo-EM density map using steered MD. Unlike ensemble optimization, there are no particle images or ensemble weights — the loss is a real-space cross-correlation between the projected model density and the reference volume.

You can run flexible fitting through the command line:

run_flexible_fitting --config config_flexible_fitting.yaml

Input Format¶

Flexible fitting uses a custom YAML input format. An example config file for running on a GPU is given below:

# Parameters marked with (*) are optional

path_to_atomic_model: /path/to/atomic_model.pdb         # initial structure (.pdb or .cif)
path_to_prealigned_atomic_model: /path/to/prealigned_atomic_model.pdb  # reference structure aligned to the map
path_to_output: ./flexible_fitting_output/

atom_selection: "not element H"  # (*) Default: "all". mdtraj selection string, or a .txt/.npy file of atom indices
loads_b_factors: false           # (*) Default: false. Load Debye-Waller b-factors from the PDB

reference_volume_params:
  path_to_reference_volume: /path/to/reference_volume.mrc  # consensus map (.mrc)
  flexible_fitting_box_size: 128  # (*) Default: 128. Box size (in pixels) used for the fitting loss
  rigid_alignment_box_size: 32    # (*) Default: 32.  Box size (in pixels) used for rigid-body alignment
  reference_volume_voxel_size:    # (*) Default: null. Overrides the voxel size in the MRC header

projector_params:
  n_steps: 1000                         # number of MD steps per iteration
  bias_constant_in_kjpermol: 3.0e5      # biasing force constant (kJ/mol); or [start, end] for a linear schedule
  platform: CUDA                        # CPU, CUDA, or OpenCL
  platform_properties:                  # (*) platform-specific OpenMM settings
    DeviceIndex: '0'
# platform: CPU                         # CPU version
# platform_properties:
#   Threads: '32'                       # number of CPU threads (OpenMM does not always respect this)
  path_to_initial_state:                # (*) Default: null. Path to an OpenMM state XML for warm-starting
  md_params:                            # (*) Override default OpenMM simulation parameters. All fields are optional.
    forcefield: amber14-all.xml           # (*) Default: amber14-all.xml
    water_model: amber14/tip3p.xml        # (*) Default: amber14/tip3p.xml
    nonbonded_method: CutoffNonPeriodic   # (*) Options: PME, CutoffNonPeriodic, NoCutoff, CutoffPeriodic, Ewald, LJPME
    nonbonded_cutoff_nm: 1.0              # (*) Default: 1.0 nm
    constraints: HBonds                   # (*) Options: HBonds, AllBonds, HAngles, null (no constraints)
    temperature_K: 300.0                  # (*) Default: 300 K
    friction_per_ps: 1.0                  # (*) Default: 1.0 ps^-1
    timestep_ps: 0.002                    # (*) Default: 0.002 ps (2 fs)

walker_optimizer_params:
  n_steps: 10           # (*) Default: 1. Number of gradient steps per iteration
  step_size: 2.0        # step size in Angstroms
  batch_size_for_z_planes: 1  # (*) Default: 1. Z-planes evaluated in parallel with jax.vmap
  n_batches_of_atoms: 1       # (*) Default: 1. Atom batches for memory-friendly volume evaluation

n_steps: 100  # total number of flexible fitting iterations

Comments on the input parameters¶

Alignment. The path_to_prealigned_atomic_model must be aligned to the frame of reference of the consensus map. This reference structure is used only for rigid-body re-alignment at each iteration; the structure being optimized is path_to_atomic_model. Providing a path_to_reference_volume enables this alignment and is strongly recommended. The rigid_alignment_box_size controls the resolution of the alignment step — a value of 32 typically adds about 1 second per iteration.

Flexible fitting box size. The flexible_fitting_box_size sets the resolution at which the cross-correlation loss is computed. Larger boxes capture more detail but require more GPU memory. For memory-constrained settings, increase n_batches_of_atoms to spread atom contributions across sequential evaluations, and increase batch_size_for_z_planes to parallelize over z-planes with jax.vmap.

Bias constant. The bias_constant_in_kjpermol controls how strongly the MD simulation is steered toward the gradient direction. A single float applies a constant force; a two-element list [start, end] applies a linear schedule from start to end over all n_steps iterations. This parameter typically requires tuning for each system. Values that are too large cause the simulation to explode; values that are too small produce no structural change.

Warm-starting. The path_to_initial_state accepts an OpenMM state XML file (e.g., saved from a previous run or a pre-equilibrated simulation) and can improve convergence by starting from a well-equilibrated configuration.

MD simulation parameters. The optional md_params block lets you override any of the underlying OpenMM simulation parameters directly from the YAML config. All fields are optional and fall back to built-in defaults when omitted. The nonbonded_method field accepts string aliases: CutoffNonPeriodic (default, for implicit/vacuum simulations), PME or Ewald (explicit solvent with a periodic box), CutoffPeriodic (periodic box with a simple cutoff), LJPME (PME for both electrostatics and Lennard-Jones), and NoCutoff (no cutoff; very small systems only). The constraints field accepts HBonds (default, compatible with a 2 fs timestep), AllBonds, HAngles (allows ~4 fs timesteps), or null to disable constraints. Numeric fields use explicit unit suffixes: _nm for nanometers, _K for Kelvin, _per_ps for ps⁻¹, and _ps for picoseconds. The platform and platform_properties fields are set at the top level of projector_params and do not belong inside md_params.

Outputs¶

final_walker.npy: final refined atomic positions as a NumPy array.
states_proj/state_*: OpenMM state files saved at each iteration. Useful for restarting or inspecting the MD trajectory.
A log file named by the date the run was started.
A copy of the input config YAML written to the output directory.