PIMMS is still in development, but in general the master branch on this repository can be considered (mostly) stable.
As of version 0.1.37 (March 2024), we consider this to be the 'gold' release with respect to features, and further updates before we increment to 0.2.0 will include only bug fixes, additional tests, and performance improvements.
The current version (0.1.4)
As you use this version of PIMMS, please report any/all issues where things:
- Are wrong/don't make sense
- Don't behave in a way you expect
- Raise errors/exceptions when you wouldn't expect them
Please log any issues in the issue tracker.
There shouldn't be any bugs in this version... but there could be. So we're working on it! This includes developing and deploying a large unit test suite, which takes time, but we're working hard!
PIMMS is a lattice-based simulation engine that allows both 2D and 3D simulations. Useful features include:
- Easy to use! Upon installation, a command-line executable (`PIMMS`) is available, should be in your `$PATH` variable, and can be used to run simulations. No messing around; it (should) just work!
- Easy to define interaction parameters through a simple parameter file (example included in `/demo_keyfiles/demo_1/params.prm`)
- Easily run fast 2D or 3D lattice-based simulations
- Run simulations with many distinct components
- Run simulations of a single homo or heteropolymer
- Run simulations of many copies of polymers to explore phase behavior
- Drive interactions over three distinct length scales
- Various other things
PIMMS is written almost fully in Python (>=3.7, 3.12 recommended), with the most computationally intensive parts written in fully optimized Cython that compiles down to native C.
With most of the complex behavior implemented in Python, maintenance and development are fast and efficient. However, certain functionality (e.g., analysis of large systems) is, as a result, disproportionately expensive vs. the actual simulation, so you may wish to alter the frequency at which certain analysis routines are performed based on your interests.
PIMMS is a relatively large codebase of ~20K lines of (mostly) Python code. As mentioned, it is under active development, including streamlining and optimization. Several features currently built into PIMMS are not documented here, either because they are not quite ready or are still in development. Again, we're working on finalizing all of this.
Alex Holehouse developed an initial version of PIMMS during his time in the Pappu lab. Since starting his own lab, much of PIMMS has been rewritten, and Dr. Ryan Emenecker has joined as a core developer. PIMMS is maintained exclusively by the Holehouse lab.
Why yes, it has, thank you for asking! Please check out:
Alston, J. J. & Soranno, A. Condensation goes viral: a polymer physics perspective. J. Mol. Biol. 167988 (2023).
Soranno, A., Incicco, J. J., De Bona, P., Tomko, E. J., Galburt, E. A., Holehouse, A. S. & Galletto, R. Shelterin Components Modulate Nucleic Acids Condensation and Phase Separation in the Context of Telomeric DNA. J. Mol. Biol. 434, 167685 (2022).
Sankaranarayanan, M., Emenecker, R. J., Wilby, E. L., Jahnel, M., Trussina, I. R. E. A., Wayland, M., Alberti, S., Holehouse, A. S. & Weil, T. T. Adaptable P body physical states differentially regulate bicoid mRNA storage during early Drosophila development. Dev. Cell 56, 2886–2901.e6 (2021).
Moses, D., Yu, F., Ginell, G. M., Shamoon, N. M., Koenig, P. S., Holehouse, A. S. & Sukenik, S. Revealing the Hidden Sensitivity of Intrinsically Disordered Proteins to their Chemical Environment. J. Phys. Chem. Lett. 11, 10131–10136 (2020).
Holehouse, A. S., Ginell, G. M., Griffith, D. & Böke, E. Clustering of Aromatic Residues in Prion-like Domains Can Tune the Formation, State, and Organization of Biomolecular Condensates. Biochemistry 60, 3566–3581 (2021).
Martin, E. W.*, Holehouse, A. S.*, Peran, I.*, Farag, M., Incicco, J. J., Bremer, A., Grace, C. R., Soranno, A., Pappu, R. V. & Mittag, T. Valence and patterning of aromatic residues determine the phase behavior of prion-like domains. Science 367, 694–699 (2020).
Boeynaems, S., Holehouse, A. S., Weinhardt, V., Kovacs, D., Van Lindt, J., Larabell, C., Van Den Bosch, L., Das, R., Tompa, P. S., Pappu, R. V. & Gitler, A. D. Spontaneous driving forces give rise to protein-RNA condensates with coexisting phases and complex material properties. Proc. Natl. Acad. Sci. U. S. A. 116, 7889–7898 (2019).
Martin, E. W.*, Holehouse, A. S.*, Grace, C. R., Hughes, A., Pappu, R. V. & Mittag, T. Sequence Determinants of the Conformational Properties of an Intrinsically Disordered Protein Prior to and upon Multisite Phosphorylation. J. Am. Chem. Soc. 138, 15323–15335 (2016).
No. We are actively working on a full Sphinx-based readthedocs documentation suite for PIMMS, but for now, this readme file serves as the core PIMMS documentation.
NB: Installation assumes you have set up a working conda or uv environment with Python 3.7 or higher (any 3.7 release is fine). Using 3.7 or higher is important because PIMMS relies on language features that were not available in earlier versions. We recommend using Python 3.10+ (our tests and standard development environment use 3.12).
If conda and pip are new to you, there is a lot of documentation on this online, and I'd suggest taking a look at this page here as a first step. Assuming conda is set up, installation should be easy!
Assuming conda is installed and you're in the relevant environment, the first thing to do is ensure the conda-forge channel is available.
Specifically, run:
conda config --add channels conda-forge
We then recommend creating a clean environment with Python 3.12
conda create -n pimms python=3.12 -y
conda activate pimms
Note you're welcome to install PIMMS into an existing environment if you want, but we strongly recommend doing this FIRST to ensure things actually install correctly into a vanilla and empty environment. Having a dedicated environment also avoids any dependency clashes and is generally a safer bet.
Assuming this works correctly, next install some standard packages:
pip install numpy scipy cython pandas versioningit
And assuming these work, install mdtraj:
# Then install mdtraj, which provides the xtc library backend - this is
# what lets us write VMD-compatible trajectories
pip install mdtraj
This should all work out of the box without issue. At this stage, if anything goes wrong it's outside of my hands (although I'm happy to offer advice).
Assuming the packages above are installed correctly, the next step is to actually install PIMMS.
One way this can be done is by installing directly from GitHub:
pip install --no-build-isolation git+https://github.com/holehouse-lab/PIMMS.git
NB: you need the --no-build-isolation flag, or pip will try to build the wheel in an isolated environment that lacks Cython.
Alternatively, you can download the source and install directly from source:
git clone https://github.com/holehouse-lab/PIMMS.git
Then navigate to the directory containing the pyproject.toml file and run:
pip install -e . --upgrade --force-reinstall
If all seems to have gone off without a hitch, open a new terminal, start up the conda environment you just installed PIMMS in, and run (from any directory) the command:
PIMMS --version
If it worked, you should see:
version <current version number>
NB: the VERY first time you do this may take 5-10 seconds while the internal Python environment initializes, but after that it should be basically instantaneous.
If you have any issues during installation, please raise an issue here and we'll try to fix it!
This installation has been tested and works on both Linux and macOS. If someone has a Windows machine and wants to test this out they are more than welcome, but PIMMS has (AFAIK) never been run on Windows, so I would anticipate things not working well out of the box...
PIMMS uses versioningit for version resolution (instead of versioneer).
- Build/version configuration lives in `pyproject.toml`:
  - build backend: `setuptools.build_meta`
  - version provider: `versioningit`
- Runtime version reporting (`pimms.__version__`) comes from installed package metadata via `importlib.metadata`.
- If version metadata cannot be resolved (for example, source-tree import before install), PIMMS falls back to `0+unknown`.
Release/version workflow:
- Create an annotated git tag for the release (for example `0.1.39`).
- Build/install from that tagged commit.
- Verify reported version:
  - `PIMMS --version`
  - or `python -c "import pimms; print(pimms.__version__)"`
Development builds from non-tagged commits will include local version metadata (for example commit distance/hash) as generated by versioningit.
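The fallback behaviour described above can be pictured with a short sketch. This is not the code PIMMS itself uses: `resolve_version` is a hypothetical helper, the distribution name `pimms` is an assumption, and `importlib.metadata` needs Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str = "pimms") -> str:
    """Return the installed version of dist_name, or '0+unknown' if absent.

    Hypothetical helper; the distribution name 'pimms' is an assumption.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # No installed package metadata (e.g. importing from a source tree)
        return "0+unknown"


print(resolve_version())
```

If the printed value is `0+unknown`, the package metadata could not be found in the active environment, which usually means the package has not been installed there.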
PIMMS simulations require two files
- A keyfile, which defines the components of the simulation and all aspects of that simulation.
- A parameter file which defines the interactions between distinct components. The keyfile also defines the location of the parameter file.
Simulations are run as follows:
PIMMS -k <keyfile.kf>
For convenience some correctly formatted and annotated keyfiles and parameter files are available in the /demo directory. We recommend that you use these as a starting point for your own. These files have been annotated, but for completeness a more expansive description of the keywords is provided below.
Keyfiles define everything about the system and simulation, including
- What polymers are present, their sequence, and how many of them are there
- How long the simulation should run for and how big the simulation box is
- The frequency of different analysis and output
- The move-set
Below we outline the keywords you may wish to change. Note that there are additional keywords that control some advanced functionality, but we're still finalizing that behaviour.
| Keyword | Format (type) | Description |
|---|---|---|
| DIMENSIONS | INT (2 or 3), A x B or A x B x C | Size of the simulation box (in lattice units). 2D or 3D (defines if the simulation is a 2D or 3D simulation) |
| RESIZED_EQUILIBRATION [OPTIONAL] | INT (2 or 3), A x B or A x B x C | Defines alternative simulation dimensions to be used during equilibration; e.g., if you wish to equilibrate chains at a higher concentration to facilitate single condensate forming. MUST be smaller than the dimensions defined by the DIMENSIONS keyword. |
| CHAIN | See description | One of the few multi-component keywords in PIMMS and the only keyword that can appear multiple times, the CHAIN keyword defines a specific polymer chain and the number of copies of that chain that will exist in the simulation. The format should be CHAIN : N {CHAIN IDENTITY}, where N defines the number of copies of the chain and {CHAIN IDENTITY} gives the polymer sequence in one-letter alphabet code. As an example, CHAIN : 20 QQQQQQQQQQ would give 20 poly-glutamine polymers. In later versions of PIMMS we will be updating this to allow the reading of keyfiles that use three-letter codes. |
| TEMPERATURE | FLOAT | Simulation temperature to be used (units are arbitrary and depend on the units of the parameter file) |
| N_STEPS | INT | Number of main chain Monte Carlo steps |
| PARAMETER_FILE | STRING | Relative or absolute path of the parameter file |
| EQUILIBRATION | INT | Number of steps to be run as equilibration (i.e. before any analysis or trajectory output is generated) |
| HARDWALL [OPTIONAL] | BOOL | Boolean flag set to True or False that defines if a hardwall boundary is used or not. By default periodic boundary conditions (PBC) are used, but if hardwall is set to true the edges of the simulation box are reflective with an infinitely repulsive potential. Default = False |
| PRINT_FREQ | INT | Step frequency at which the system status is printed to STDOUT |
| XTC_FREQ | INT | Step frequency at which the system configuration is written to XTC file |
| EN_FREQ | INT | Step frequency at which the system energy is written to ENERGY.dat |
| SEED [OPTIONAL] | INT | Random seed to allow reproducible runs |
| ENERGY_CHECK | INT | Step frequency at which global energy is recalculated and compared to the current energy as determined locally on each step. All energy calculations are exact, so this is primarily for sanity checking. For large systems this can be an expensive operation. |
| NON_INTERACTING [OPTIONAL] | BOOL | Boolean that defines if the Hamiltonian is non-interacting or not. This is a convenient way to generate "EV" ensembles for the same system configuration. Default = False |
| ANGLES_OFF | BOOL | Boolean flag set to True or False that defines if angle potentials are to be used or not. If set to False (or not set), angles from the parameter file will be used. If set to True, angles are ignored and parameter files do not need to define angles. Default = False |
| EXPERIMENTAL_FEATURES [OPTIONAL] | BOOL | Boolean flag set to True or False that defines if experimental features are allowed. Strongly recommend if your name is not Ryan or Alex to keep this at False, and even then, check if your last name is Holehouse or Emenecker because IF NOT you should still keep it False, probably. Default = False |
| CASE_INSENSITIVE_CHAINS [OPTIONAL] | BOOL | Boolean flag which, if set to False, means that chain sequence is case sensitive. By default, this is True, which means upon reading a keyfile chains are converted to upper case. However, sometimes you may wish for more unique beads, in which case a lower-case chain can be useful. Default = True |
| AUTOCENTER [OPTIONAL] | BOOL | Only relevant for single-chain simulations, but this flag, if set to True, ensures every frame of the resulting trajectory is centered in the middle of the box. This is especially useful if you want to avoid the need to align a trajectory after for analysis or visualization. Default = False |
| LATTICE_TO_ANGSTROMS [OPTIONAL] | FLOAT | Defines the conversion for lattice units to Angstroms for the output trajectory file that's generated. NB This ONLY influences the XTC trajectory, not any of the internal analysis which is always returned in lattice units. Default = 3.65 |
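
To make the keyword layout concrete, here is a small sketch that reads `KEYWORD : VALUE` lines into a dictionary and collects repeated CHAIN entries. It is not the PIMMS keyfile parser: `parse_keyfile_text` is a hypothetical helper, and it assumes every keyword follows the same `KEYWORD : VALUE` pattern shown for CHAIN and that `#` starts a comment. The annotated demo keyfiles remain the authoritative reference for the format.

```python
def parse_keyfile_text(text):
    """Parse 'KEYWORD : VALUE' lines into a dict; CHAIN may appear many times.

    Hypothetical helper. Assumes '#' starts a comment and that all keywords
    use the same 'KEYWORD : VALUE' layout as the CHAIN keyword.
    """
    settings = {}
    chains = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        keyword, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"Expected 'KEYWORD : VALUE', got: {raw_line!r}")
        keyword, value = keyword.strip().upper(), value.strip()
        if keyword == "CHAIN":
            # CHAIN : N SEQUENCE -> (number of copies, one-letter sequence)
            count, sequence = value.split()
            chains.append((int(count), sequence))
        else:
            settings[keyword] = value
    settings["CHAIN"] = chains
    return settings


example = """
TEMPERATURE : 50
N_STEPS : 10000
CHAIN : 20 QQQQQQQQQQ
CHAIN : 5 AAAAABBBBB
"""
parsed = parse_keyfile_text(example)
print(parsed["CHAIN"])
```

A helper like this can be handy for sanity-checking a keyfile (for example, counting the total number of chains) before launching a long run.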
MOVE_ keywords define the frequency with which different moves are performed during the simulation. The values associated with these keywords must sum up to 1.0, and all must be defined.
| Keyword | Format (type) | Description |
|---|---|---|
| MOVE_CRANKSHAFT | FLOAT | Crankshaft moves drive local chain perturbations and are coded in optimized C (and so very fast). In general, a large fraction of your simulation moveset should be these moves. |
| CRANKSHAFT_SUBSTEPS | INT | Defines a multiplier for the number of substeps performed. So each time a crankshaft move is selected, the underlying code performs CRANKSHAFT_SUBSTEPS multiplied by some scaling factors (defined by CRANKSHAFT_MODE) worth of moves for each bead in the system. In this way a single crankshaft move can actually encompass millions of individual MC moves! |
| CRANKSHAFT_MODE | KEYWORD | [PROPORTIONAL] defines how chain length influences the multiplier for the crankshaft moves. Use PROPORTIONAL. |
| MOVE_CHAIN_TRANSLATE | FLOAT | Single chain rigid body translation |
| MOVE_CHAIN_ROTATE | FLOAT | Single chain rigid body rotation |
| MOVE_CHAIN_PIVOT | FLOAT | Chain pivot at a random position |
| MOVE_CLUSTER_TRANSLATE | FLOAT | Translate a randomly selected contiguous cluster of chains |
| MOVE_CLUSTER_ROTATE | FLOAT | Rotate a randomly selected contiguous cluster of chains |
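
Since the MOVE_ values must sum to 1.0, it can be worth checking a moveset before starting a run. The sketch below is one way to do that; `check_moveset` is a hypothetical helper, the tolerance is an assumption, and the example frequencies are purely illustrative rather than recommended values.

```python
import math


def check_moveset(keyfile_values, tolerance=1e-9):
    """Return the MOVE_* frequencies, raising if they do not sum to 1.0.

    Hypothetical helper; the tolerance is an assumption, not a PIMMS setting.
    """
    moves = {
        keyword: float(value)
        for keyword, value in keyfile_values.items()
        if keyword.startswith("MOVE_")
    }
    total = sum(moves.values())
    if not math.isclose(total, 1.0, abs_tol=tolerance):
        raise ValueError(f"MOVE_ frequencies sum to {total}, expected 1.0")
    return moves


# Illustrative values only
example_moveset = {
    "MOVE_CRANKSHAFT": 0.7,
    "MOVE_CHAIN_TRANSLATE": 0.1,
    "MOVE_CHAIN_ROTATE": 0.05,
    "MOVE_CHAIN_PIVOT": 0.05,
    "MOVE_CLUSTER_TRANSLATE": 0.05,
    "MOVE_CLUSTER_ROTATE": 0.05,
    "CRANKSHAFT_SUBSTEPS": 10,  # not a move frequency, so it is ignored
}
checked = check_moveset(example_moveset)
print(len(checked), "move types checked")
```

Comparing with a tolerance rather than `==` avoids false failures from floating-point rounding when several small fractions are added together.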
The following block of keywords defines various options that control quench-based simulations. In quench simulations, the simulation starts at a temperature defined by QUENCH_START and progressively decreases (or increases) to QUENCH_END. This is particularly useful to achieve convergence of complex systems, and is simply an annealing simulation.
| Keyword | Format (type) | Description |
|---|---|---|
| QUENCH_RUN | BOOL | Boolean (true or false) that defines if a quench run will be used. The 'quench' part of a quench run always happens first in the simulation, although quenches can be low to high or high to low. Default = False |
| QUENCH_FREQ | INT | Frequency at which the temperature is updated. |
| QUENCH_STEPSIZE | FLOAT | Step (in temperature) that is taken each time the temperature is updated |
| QUENCH_START | FLOAT | Starting temperature |
| QUENCH_END | FLOAT | Ending temperature |
| QUENCH_AS_EQUILIBRATION | BOOL | Boolean (true or false) that defines if the quench is treated as an equilibration period. |
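
One way to picture how these keywords combine is the sketch below, which computes the temperature at a given step. It is an illustration under assumptions, not the PIMMS implementation: `quench_temperature` is a hypothetical function, and it assumes the temperature changes by QUENCH_STEPSIZE once every QUENCH_FREQ steps and is held at QUENCH_END once reached.

```python
def quench_temperature(step, start, end, stepsize, freq):
    """Temperature at a given step of an assumed linear quench schedule.

    Hypothetical sketch: assumes one update of size `stepsize` every `freq`
    steps, moving from `start` towards `end` and never overshooting `end`.
    """
    n_updates = step // freq
    change = n_updates * abs(stepsize)
    if end >= start:
        # Low-to-high quench
        return min(start + change, end)
    # High-to-low quench
    return max(start - change, end)


# Illustrative values: QUENCH_START=100, QUENCH_END=50,
# QUENCH_STEPSIZE=10, QUENCH_FREQ=10
for step in (0, 25, 1000):
    print(step, quench_temperature(step, 100, 50, 10, 10))
```

Under these assumptions, the quench portion lasts roughly `abs(QUENCH_END - QUENCH_START) / QUENCH_STEPSIZE * QUENCH_FREQ` steps, which is useful when deciding how long the rest of the simulation should be.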
The following block of keywords defines various options that control on-the-fly analysis done in PIMMS:
| Keyword | Format (type) | Description |
|---|---|---|
| ANALYSIS_FREQ | INT | Step frequency at which all default analysis is performed. This sets the default for all other types of analysis unless explicitly defined. |
| ANA_POL | INT | Step frequency at which polymeric analysis is performed. |
| ANA_INTSCAL | INT | Step frequency at which internal scaling analysis is performed. |
| ANA_DISTMAP | INT | Step frequency at which distance-map analysis is performed. |
| ANA_ACCEPTANCE | INT | Step frequency at which acceptance information is printed out |
| ANA_INTER_RESIDUE | INT | Step frequency at which inter-residue interaction analysis is performed |
| ANA_CLUSTER | INT | Step frequency at which cluster-based analysis is performed |
| ANA_RESIDUE_PAIRS | INT INT | Defines pairs of residues (i.e. "1 5") which are analyzed for inter-residue distance. |
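
The defaulting rule for analysis frequencies (ANALYSIS_FREQ applies unless a specific ANA_ keyword is set) can be sketched as follows. `resolve_analysis_freqs` is a hypothetical helper written for illustration and is not part of PIMMS.

```python
# Frequency-type analysis keywords from the table above
ANALYSIS_KEYWORDS = (
    "ANA_POL",
    "ANA_INTSCAL",
    "ANA_DISTMAP",
    "ANA_ACCEPTANCE",
    "ANA_INTER_RESIDUE",
    "ANA_CLUSTER",
)


def resolve_analysis_freqs(keyfile_values):
    """Fill in each ANA_* frequency, falling back to ANALYSIS_FREQ.

    Hypothetical helper illustrating the documented defaulting rule.
    """
    default = int(keyfile_values["ANALYSIS_FREQ"])
    return {
        keyword: int(keyfile_values.get(keyword, default))
        for keyword in ANALYSIS_KEYWORDS
    }


freqs = resolve_analysis_freqs({"ANALYSIS_FREQ": 1000, "ANA_CLUSTER": 50000})
print(freqs["ANA_POL"], freqs["ANA_CLUSTER"])
```

In this example, only the cluster analysis runs at its own (less frequent) interval, while everything else inherits the default of 1000 steps.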
Keywords for changing how PIMMS saves your trajectory file.
| Keyword | Format (type) | Description |
|---|---|---|
| SAVE_AT_END | BOOL | Boolean (true or false) that determines whether PIMMS saves your .xtc file at the end of the simulation or saves at each 'save step'. The default is False. If set to true, this will mean the simulation has more sustained RAM usage but may increase simulation performance substantially depending on hardware configuration and setup. It's worth saying the RAM footprint here is not huge, but the bigger issue is if your job crashes/fails, you would lose all trajectory information. |
| SAVE_EQ | BOOL | Boolean (true or false) that determines whether PIMMS saves trajectory frames for the equilibration steps of a simulation. The default is True. If set to False, PIMMS begins to save your trajectory frames after the equilibration steps have completed. |
PIMMS supports running simulations from restart files. This can be useful for a few reasons:
-
Your simulation crashed and you want to pick up where you left off...
-
You ran a simulation but want to collect more data for a specific trajectory.
-
You want to see how changing simulation parameters alters a conclusion given a specific starting state (e.g. temperature, moveset, actual forcefield parameters etc. etc.).
Additionally, PIMMS provides the EXTRA_CHAIN keyword, which lets you take a restart file and then start a new simulation using the chains defined in the restart file AND place new additional chains in the simulation box.
There are a few key restart keywords to be aware of
| Keyword | Format (type) | Description |
|---|---|---|
| RESTART_FREQ | INT | When running a simulation, this defines the frequency with which PIMMS writes a restart file (called restart.pimms) out to disk. Note that regardless of what this value is, a final restart file is generated at the end of every simulation, just in case... |
| RESTART_FILE | STRING | Defines an absolute or relative path for a PIMMS-generated restart file. If this is provided, the simulation will override the provided DIMENSIONS and HARDWALL keywords and read these from the restart file. HOWEVER, these can be over-ridden using RESTART_OVERRIDE_DIMENSIONS and RESTART_OVERRIDE_HARDWALL |
| RESTART_OVERRIDE_HARDWALL | BOOL | Flag which if set (to either TRUE or FALSE) will over-ride the HARDWALL mode defined in the restart file. NB: A simulation can have been run initially with HARDWALL:TRUE and then restarted as HARDWALL:FALSE but the converse is not possible (see FAQ). |
| RESTART_OVERRIDE_DIMENSIONS | INT (2 or 3), A x B or A x B x C | Enables you to change the dimensions of the box on restart. NB: To change box dimensions the original simulation must have been run with HARDWALL:TRUE and the new box dimensions can only be bigger, not smaller (see FAQ). |
| EXTRA_CHAINS | see description | The EXTRA_CHAIN keyword follows the same syntax as the CHAIN keyword and lets you add additional chains into a restart file that did not exist before. This is particularly useful if you want to ask how a system at equilibrium changes upon the addition of some chains. From a chain ID perspective (relevant if you want to freeze chains), extra chains are added after the restart file chains. NOTE that if no RESTART\_FILE is provided, providing an EXTRA_CHAIN keyword will trigger an error to avoid a situation where the keyfile parsing silently ignores EXTRA\_CHAINS in the keyfile. Note also that EXTRA_CHAIN requires that EXPERIMENTAL_FEATURES be set to True. |
Because restarting simulation is actually pretty powerful, we provide some specific hints/suggestions as to things you can/cannot do...
My original simulation was in a 30x30x30 box - can I restart in a 50x50x50 box?
Yes, assuming the initial simulation had HARDWALL:TRUE set. This is because to re-start using a different box size we want to avoid needing to re-position any chains, and if HARDWALL is False this will mean we'd likely have chains going through the periodic boundaries. Note that to do this you must provide the RESTART_OVERRIDE_DIMENSIONS keyword, or the keyfile will simply default to the restart file's dimensions.
My original simulation was in a 30x30x30 box - can I restart in a 20x20x20 box?
No - there is currently no way to restart in a smaller box, for similar reasons to those described above.
My original simulation had HARDWALL:TRUE set, can I restart with HARDWALL:FALSE?
Yes! If your original simulation had HARDWALL set to True, it can be restarted with HARDWALL set to False (the converse is not possible). Note that to do this you must provide the RESTART_OVERRIDE_HARDWALL keyword, or the keyfile will simply default to the restart file's hardwall flag.
I really need to restart in a smaller box, or after equilibrating with PBC conditions
Do you? Do you really? If you REALLY do, we can implement these features by re-equilibrating chains that clash with the boundaries while holding all others frozen. However, this would require additional code to be written, so if you REALLY need this and have a compelling use case, let me know. However, I'd strongly recommend trying to cast your question in a way that does not require that.
What is a restart file anyway?
Good question! A restart file is ACTUALLY just a Python dict saved as a pickle object. It only contains information on the lattice dimensions, prior energy, and chain identity and positions. This means in principle (and in practice) you can actually build your own "restart" file to configure a starting structure for your simulation... This is quite powerful, because you can build structures, surfaces, basically whatever you can imagine as long as you can define that starting structure in terms of one or more chains with one or more beads.
To get technical, the underlying dictionary has four top-level key-value pairs:
- `DIMENSIONS`: `[x,y,z]` dimensions of the box
- `ENERGY`: `float` instantaneous potential energy of the prior configuration. This is not currently used, so you can set this to any value without concern.
- `HARDWALL`: `bool` flag defining if the simulation was using HARDWALL rules or not
- `CHAINS`: `dict` dictionary that maps chain ID to sequence and position of each chain.
The CHAINS dictionary has key-value pairs where each key is a chain ID (must start at one and increment monotonically and without gaps) while each value is a three-element list where
- [0] = position for each bead (list of lists, where each sublist has 2 or 3 elements to define [x,y] or [x,y,z] coordinates)
- [1] = sequence for each bead
- [2] = chain type - a unique ID for each chain type. PIMMS doesn't enforce that two chains with the same sequence share a chain type, although we expect that to be the case most of the time. We do allow two chains identical in terms of sequence to have different chain types to facilitate specific behavior for a subset of chains of some sequence; however, this is not currently implemented anywhere. As such, in general, it's a good idea to give each unique chain (as defined by sequence) its own chain type, with chains of the same sequence sharing that chain type.
Based on this, it should be clear one could, in principle, create a restart file from scratch... Also, combined with a freeze file, one can build complex structures that are frozen (and therefore do not add to the computational cost of the simulation), enabling simulations of polymers under all manner of structural constraints to be entirely feasible.
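As a rough illustration of the dictionary layout described above, the sketch below builds a restart-style dictionary for one straight five-bead chain and round-trips it through pickle. `build_restart_dict` is a hypothetical helper, not part of PIMMS. It assumes chain IDs and chain types are integers and that the sequence is stored as a string; these details are not confirmed here, so compare against a restart file written by PIMMS before relying on it.

```python
import os
import pickle
import tempfile


def build_restart_dict(dimensions, chains, hardwall=True, energy=0.0):
    """Assemble a restart-style dict from (positions, sequence, chain_type) tuples.

    Hypothetical helper. Assumes integer chain IDs/types and string sequences.
    """
    chain_dict = {}
    # Chain IDs start at 1 and increase without gaps
    for chain_id, (positions, sequence, chain_type) in enumerate(chains, start=1):
        if len(positions) != len(sequence):
            raise ValueError(f"Chain {chain_id}: one position is needed per bead")
        for position in positions:
            if len(position) != len(dimensions):
                raise ValueError(f"Chain {chain_id}: wrong number of coordinates")
        chain_dict[chain_id] = [[list(p) for p in positions], sequence, chain_type]
    return {
        "DIMENSIONS": list(dimensions),
        "ENERGY": energy,  # described above as currently unused
        "HARDWALL": hardwall,
        "CHAINS": chain_dict,
    }


# One straight five-bead chain along z in a 30x30x30 box
straight_chain = ([[5, 5, z] for z in range(5, 10)], "AAAAA", 1)
restart = build_restart_dict([30, 30, 30], [straight_chain])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "custom_restart.pimms")  # illustrative filename
    with open(path, "wb") as handle:
        pickle.dump(restart, handle)
    with open(path, "rb") as handle:
        loaded = pickle.load(handle)

print(loaded["CHAINS"][1][1])
```

In real use you would write the file somewhere persistent and point the RESTART_FILE keyword at it; the temporary directory here just keeps the example self-contained.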
As of version 0.1.37, PIMMS supports chain freezing. Briefly, this allows you to pass a freeze file (described below) where you can specify specific chains you want to freeze. Freezing means the chains do not move during the simulation and are not sampled, although they still interact as they would if they could move.
There are two keywords relevant to freeze files: WRITE_CHAIN_TO_CHAINID and FREEZE_FILE.
| Keyword | Format (type) | Description |
|---|---|---|
| WRITE_CHAIN_TO_CHAINID | BOOL | Flag which, if set to true, means we generate a file called chain_to_chainid.txt which maps the chain ID (a unique index starting at 1) to the sequence as PIMMS represents the chain internally. This is useful in that it makes it straightforward to determine which chain(s) one might want to freeze. The chain_to_chainid.txt file has three columns: (1) chain ID, (2) chain length, (3) chain sequence. |
| FREEZE_FILE | STRING | Absolute or relative path to a freeze file. |
The freeze file format is very simple - it's a two-column file where the first column defines the freeze mode being used (right now, the only available mode is C for chain), and the second column is the chain ID of the chain to freeze. Each chain must be specified on a single line. A # symbol should precede any comments, and comments can be written in-line (i.e., after a specific chain) or on their own line.
Note that PIMMS will report on the freeze file status during setup. Finally, freeze files work both for de novo simulations and for simulations run from restart files with or without EXTRA_CHAIN. This becomes especially useful in that it enables you to design a specific starting configuration in an arbitrary restart file, and then freeze that configuration.
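For illustration, the two-column layout described above could be generated with a few lines of Python. `make_freeze_file_text` is a hypothetical helper, and the sketch assumes the two columns are separated by whitespace, which is not stated explicitly here.

```python
def make_freeze_file_text(chain_ids, comment=None):
    """Return freeze-file text that freezes the given chain IDs (mode C).

    Hypothetical helper; assumes whitespace-separated columns.
    """
    lines = []
    if comment:
        lines.append(f"# {comment}")
    for chain_id in chain_ids:
        if chain_id < 1:
            raise ValueError("Chain IDs start at 1")
        # Column 1: freeze mode (C = chain), column 2: chain ID
        lines.append(f"C {chain_id}")
    return "\n".join(lines) + "\n"


freeze_text = make_freeze_file_text([1, 2, 5], comment="freeze the scaffold chains")
print(freeze_text, end="")
```

The chain IDs to use can be read from chain_to_chainid.txt (written when WRITE_CHAIN_TO_CHAINID is enabled), and the resulting text saved to whatever path FREEZE_FILE points at.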
The following block of keywords defines various options that control temperature-sweep Metropolis Monte Carlo moves. This functionality is not fully ready, so I do not recommend using it for now until we confirm some key things! To keep things secret, we haven't even included a description of the keywords!
| Keyword | Format (type) | Description |
|---|---|---|
| TSMMC_JUMP_TEMP | FLOAT | |
| TSMMC_STEP_MULTIPLIER | INT | |
| TSMMC_INTERPOLATION_MODE | STRING | |
| TSMMC_NUMBER_OF_POINTS | INT | |
| TSMMC_FIXED_OFFSET | INT | |
Similarly, there are some legacy moves that should also not be altered but must be included. Some of these may be removed for the final release or updated, depending on our ongoing tests.
| Keyword | Format (type) | Description |
|---|---|---|
| MOVE_SLITHER | FLOAT | Slither the chain through the system (BROKEN, do not use). Must be set to 0.0 |
| MOVE_HEAD_PIVOT | FLOAT | Pivot the head residue of the chain (this is a somewhat redundant move). Must be set to 0.0 |
| MOVE_CTSMMC | FLOAT | Single chain TSMMC. Must be set to 0.0 |
| MOVE_MULTICHAIN_TSMMC | FLOAT | Multiple randomly selected chains undergo TSMMC. Must be set to 0.0 |
| MOVE_SYSTEM_TSMMC | FLOAT | Entire system undergoes TSMMC. Must be set to 0.0 |
| MOVE_RATCHET_PIVOT | FLOAT | A single chain undergoes a directed pivot move. Must be set to 0.0 |
The parameter file defines the interactions experienced by the system. Note - EVERY bead defined on a CHAIN must be included in the parameter file and fully defined, with no exceptions.
A parameter file has three sections:
PIMMS has a rudimentary 'backbone' angle term. The start of the parameter file includes a section where those angle strengths are defined. The format is
ANGLE_PENALTY <residue name> X X X
For your purposes, these Xs should be 0 - i.e., there is no angle restraint applied. We're still optimizing the implementation here, so I wouldn't use this (yet)
Next, for EVERY bead, one must define the bead-bead interaction. PIMMS allows three different distance ranges for interactions (short range, long range, and super long range). These are defined by three distinct values. In this way, if you wanted to define the A-B interaction at short, medium, and long range, one might write
A B -30 -10 -5
This would mean that when beads A and B are directly adjacent to one another, an interaction strength of -30 is realized; when they are one site apart, -10; and when they are two sites apart, -5. These pairwise interactions must be defined for every bead in the system.
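The three-value scheme can be pictured as a simple lookup keyed on the bead pair and the separation. The sketch below is a conceptual illustration only, not the PIMMS energy function: `pair_energy` and `PAIR_PARAMS` are hypothetical names, and it assumes interactions are symmetric (A-B equals B-A) and contribute nothing beyond the third range.

```python
# Hypothetical in-memory form of the parameter line "A B -30 -10 -5"
PAIR_PARAMS = {("A", "B"): (-30.0, -10.0, -5.0)}


def pair_energy(params, bead_1, bead_2, sites_apart):
    """Look up the interaction energy for a pair of beads.

    sites_apart: 0 = directly adjacent, 1 = one site apart, 2 = two sites apart.
    Assumes symmetric interactions and zero energy at larger separations.
    """
    key = tuple(sorted((bead_1, bead_2)))
    energies = params[key]
    if 0 <= sites_apart < len(energies):
        return energies[sites_apart]
    return 0.0


for separation in range(4):
    print(separation, pair_energy(PAIR_PARAMS, "B", "A", separation))
```

How separation is measured on the lattice (for example, whether diagonal neighbours count as adjacent) is not covered by this sketch.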
Finally, we must ALSO define bead-solvent interactions explicitly. The solvent reserves the bead type 0, such that bead-solvent interactions are
A 0 -5
This would say every solvent-exposed face of bead A provides -5 energy.
With this, a simple example of a parameter file might be
# angle section
ANGLE_PENALTY <residue name> X X X
ANGLE_PENALTY <residue name> X X X
# bead-bead interactions
A A -5
A B -10
B B 0
# bead-solvent interaction
A 0 0
B 0 0
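
Because every bead that appears in a CHAIN must be fully defined, it can help to list the entries a parameter file needs before writing it. The sketch below does this for a set of one-letter sequences; `required_entries` is a hypothetical helper and not part of PIMMS.

```python
from itertools import combinations_with_replacement


def required_entries(chain_sequences):
    """List the bead-bead pairs and bead-solvent entries a parameter file needs.

    Hypothetical helper; assumes one character per bead (one-letter codes).
    """
    beads = sorted(set("".join(chain_sequences)))
    # Every unordered pair of bead types, including self-pairs such as A-A
    pairs = list(combinations_with_replacement(beads, 2))
    # Solvent uses the reserved bead type 0
    solvent = [(bead, "0") for bead in beads]
    return pairs, solvent


pairs, solvent = required_entries(["AAB", "BBA"])
print(pairs)
print(solvent)
```

For N distinct bead types this gives N(N+1)/2 pairwise entries plus N solvent entries, so the file grows quickly as bead types are added.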
PIMMS generates a ton of output files. Below is a brief overview of those files. All files are overwritten when the simulation starts, so simulations can be re-run in the same directory. Output files are subdivided below into distinct types.
The following files provide a description of different aspects of the system and simulation.
| Filename | Explanation |
|---|---|
| log.txt | Contains information on system setup. The current code underutilizes this, and we are expanding the information written here. |
| parameters_used.prm | We've discovered it's useful to explicitly save which parameters were used WITH a simulation for cross-referencing in the future. This means one can 100% reproduce a simulation from the output files. |
| absolute_energies_of_angles.txt | There are two modes that angle energies can be defined, one of which scales the energies by T, in which case the absolute energies depend on the simulation temperature. This file reports those absolute energies and is honestly best used for debugging stuff |
The following files report on the 2D or 3D orientation of the simulated system and the frequency at which they are written out is determined by XTC_FREQ.
| Filename | Explanation |
|---|---|
| START.pdb | This file defines the topology of the simulation system and can be viewed in all good molecular viewers (e.g., VMD: vmd START.pdb) |
| traj.xtc | This trajectory file defines the molecular evolution of the system and can be viewed in most good molecular viewers in conjunction with the topology file [START.pdb] (e.g., for VMD: vmd START.pdb traj.xtc) |
The following files report the step-dependent value of different aspects of the simulation. For all the following files, the first column reports on the relative step number (where each of the move-types is treated as a single step, i.e. ignoring sub-steps).
| Filename | Explanation |
|---|---|
| ENERGY.dat | Reports on the instantaneous potential energy of the system. |
| MOVE_FREQS.dat | Reports the frequency with which each move type is proposed. Note that for crankshaft, the TOTAL number of moves is reported (i.e., including subset MC steps) such that these values can be interpreted as the absolute number of accept/reject moves proposed. |
| ACCEPTANCE.dat | Shows the same information as in MOVE_FREQS.dat but with accepted moves, allowing the user to back-calculate the acceptance ratio on any move type. |
| TOTAL_MOVES.dat | Tracks the total number of moves (again, moves here include all sub-moves in crankshaft steps, so this counts the number of accept/reject operations performed). |
| PERFORMANCE.dat | For evaluating performance, this file writes out the time per step (using overall steps) along with the elapsed and expected remaining time. This file is updated every 5% of the simulation (or every 1 step, whichever is larger). The estimated remaining time is also written to the log file at the same time. To keep this consistent across runs, there is no keyword to control this frequency. |
These files describe the instantaneous analysis that is most relevant for thinking about a single chain. Output from this analysis is written at a regular interval defined by ANALYSIS_FREQ. If simulations with many chains are run, the associated analysis is performed for every chain, which - if you don't care about it - can be computationally expensive.
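
As a minimal sketch of the back-calculation mentioned for ACCEPTANCE.dat, the snippet below divides accepted counts by proposed counts for each move type. It assumes (this is not confirmed above) that both files are whitespace-delimited, carry the step number in the first column, and list one column per move type in the same order. The helper name `acceptance_ratios` is hypothetical, and synthetic arrays stand in for the real files.

```python
import numpy as np

def acceptance_ratios(proposed, accepted):
    """Return per-move-type acceptance ratios for each reported step.

    Both inputs are 2D tables whose first column is the step number and
    whose remaining columns are counts per move type (assumed layout).
    """
    proposed = np.atleast_2d(np.asarray(proposed, dtype=float))
    accepted = np.atleast_2d(np.asarray(accepted, dtype=float))
    if proposed.shape != accepted.shape:
        raise ValueError("MOVE_FREQS and ACCEPTANCE tables differ in shape")
    n_prop = proposed[:, 1:]
    n_acc = accepted[:, 1:]
    # Move types that were never proposed get NaN instead of a division error.
    ratios = np.full(n_prop.shape, np.nan)
    np.divide(n_acc, n_prop, out=ratios, where=n_prop > 0)
    return ratios

# Synthetic stand-ins for np.loadtxt("MOVE_FREQS.dat") and np.loadtxt("ACCEPTANCE.dat")
proposed = np.array([[100, 50, 200, 0], [200, 100, 400, 0]])
accepted = np.array([[100, 25, 50, 0], [200, 40, 100, 0]])
ratios = acceptance_ratios(proposed, accepted)
print(ratios)
```

If the real files contain a header line or a different column order, the loading step would need adjusting; the division itself is unchanged.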
| Filename | Explanation |
|---|---|
| RG.dat | Reports on the instantaneous radius of gyration of the chain(s). |
| ASPH.dat | Reports on the instantaneous asphericity of the chain(s). |
| END_TO_END_DIST.dat | Reports on the instantaneous end-to-end distance of the chain(s). |
| RES_TO_RES_DIST.dat | Reports on the instantaneous residue-to-residue distance, where residue pairs are defined by the keyword ANA_RESIDUE_PAIRS. |
These files also describe analysis that is most relevant for thinking about a single chain, but here the output is written ONLY at the end of the simulation and represents an ensemble average.
If simulations with many chains are run, the associated analysis is performed for every chain, which - if you don't care about it - can be computationally expensive. The ANA_* keywords control the frequency of analysis moves - in particular, ANA_POL scales all polymeric analyses, which can be useful when running simulations in which the behavior of individual chains is of no interest.
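
For orientation, the quantities in RG.dat and END_TO_END_DIST.dat can be illustrated with the standard textbook definitions. This is a sketch under assumptions, not the PIMMS implementation: all beads are weighted equally, positions are in lattice units, and the chain is assumed not to be split across a periodic boundary. The function names are hypothetical.

```python
import numpy as np

def radius_of_gyration(positions):
    """Root-mean-square distance of the beads from their centroid."""
    positions = np.asarray(positions, dtype=float)
    centered = positions - positions.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))

def end_to_end_distance(positions):
    """Euclidean distance between the first and last bead."""
    positions = np.asarray(positions, dtype=float)
    return float(np.linalg.norm(positions[-1] - positions[0]))

# A straight 5-bead chain along x, in lattice units.
chain = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)]
rg = radius_of_gyration(chain)
ree = end_to_end_distance(chain)
print(rg, ree)
```

For this straight chain the end-to-end distance is 4 lattice units and the radius of gyration is the square root of 2.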
| Filename | Explanation |
|---|---|
| INTSCAL.dat | Reports the ensemble-average instantaneous internal scaling profile. |
| INTSCAL_SQUARED.dat | Reports the ensemble-average instantaneous root-mean squared (RMS) internal scaling profile. |
| SCALING_INFORMATION.dat | Result of analytical fits to the root-mean-square scaling profile to extract the apparent scaling exponent and pre-factor that best describe the 1D scaling information. For details on this, see the associated discussion in Peran, Holehouse et al. PNAS 2019. |
| DISTANCE_MAP.dat | Reports the ensemble-average inter-residue distance map. A NON_INTERACTING simulation can be run to generate the excluded-volume equivalent, allowing for a scaling map to be easily generated. |
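
To make the idea behind SCALING_INFORMATION.dat concrete, here is a minimal sketch of extracting an apparent scaling exponent and pre-factor from an RMS internal scaling profile. It assumes a power law of the form R(|i-j|) = A |i-j|^nu and fits it by ordinary least squares in log-log space. The fitting procedure PIMMS actually uses (for example, which range of separations is included) may differ, and `fit_scaling` is a hypothetical name.

```python
import numpy as np

def fit_scaling(separations, rms_distances):
    """Fit R(|i-j|) = A * |i-j|**nu by linear regression in log-log space.

    Returns (nu, A): the apparent scaling exponent and the pre-factor.
    """
    log_sep = np.log(np.asarray(separations, dtype=float))
    log_r = np.log(np.asarray(rms_distances, dtype=float))
    # polyfit returns the slope first, then the intercept, for degree 1.
    nu, log_a = np.polyfit(log_sep, log_r, 1)
    return float(nu), float(np.exp(log_a))

# Synthetic profile for an ideal chain: nu = 0.5, A = 5.0.
seps = np.arange(1, 51)
rms = 5.0 * seps ** 0.5
nu, prefactor = fit_scaling(seps, rms)
print(nu, prefactor)
```

On real data, short separations are often dominated by local chain stiffness, so restricting the fit range is a common choice; that choice is left to the reader here.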
These files describe the analysis that will be most relevant for thinking about multiple chains interacting together. Two types of multi-chain assemblies are analyzed - clusters and long-range (LR) clusters.
| Filename | Explanation |
|---|---|
| CLUSTERS.dat | Lists the number of chains in each possible cluster. Chains not in a cluster are counted as clusters of "1" chain. Clusters are defined as chains in a continuous connected network that are 1 or 2 lattice sites away from one another. |
| LR_CLUSTERS.dat | Lists the number of chains in each long-range cluster. Chains not in a long-range cluster are counted as clusters of "1" chain. Long-range clusters are defined as chains in a continuous connected network that are 1 or 2 lattice sites away from one another. |
| NUM_[LR]_CLUSTERS.dat | Total number of (long-range) clusters at a given moment. |
| [LR]_CLUSTER_RG.dat | Radius of gyration of each (long range) cluster. The clusters defined in CLUSTERS.dat (or LR_CLUSTERS.dat) map to those analyzed here. |
| [LR]_CLUSTER_ASPH.dat | Asphericity of each (long-range) cluster. The clusters defined in CLUSTERS.dat (or LR_CLUSTERS.dat) map to those analyzed here. |
| [LR]_CLUSTER_VOL.dat | Volume of each (long-range) cluster. The clusters defined in CLUSTERS.dat (or LR_CLUSTERS.dat) map to those analyzed here. |
| [LR]_CLUSTER_AREA.dat | Area of each (long-range) cluster. The clusters defined in CLUSTERS.dat (or LR_CLUSTERS.dat) map to those analyzed here. |
| [LR]_CLUSTER_DEN.dat | Density of each (long-range) cluster. The clusters defined in CLUSTERS.dat (or LR_CLUSTERS.dat) map to those analyzed here. |
| [LR]_CLUSTER_RADIAL_DENSITY_PROFILE.dat | Radial density profile of each (long-range) cluster. The clusters defined in CLUSTERS.dat (or LR_CLUSTERS.dat) map to those analyzed here. |
| CHAIN_<n>_CLUSTERS.dat | Fraction of each cluster defined in CLUSTERS.dat that consists of CHAIN <n>. |
| CHAIN_<n>_LR_CLUSTERS.dat | Fraction of each long-range cluster defined in LR_CLUSTERS.dat that consists of CHAIN <n>. |
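
The cluster definition above (chains in a continuous connected network) amounts to finding connected components in a chain-contact graph. The sketch below illustrates that idea with a small union-find. It is not the PIMMS implementation: it assumes two chains are in contact if any pair of their beads is within `cutoff` lattice sites measured as a Chebyshev distance, it ignores periodic boundaries, and it uses a brute-force pairwise comparison. The name `find_clusters` is hypothetical.

```python
import numpy as np

def find_clusters(chains, cutoff=2):
    """Group chains into clusters of directly or indirectly contacting chains.

    chains : list of (n_beads, dim) integer lattice positions, one per chain.
    Returns a list of clusters, each a list of chain indices.
    """
    chains = [np.asarray(c) for c in chains]
    parent = list(range(len(chains)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(len(chains)):
        for b in range(a + 1, len(chains)):
            # Per-axis offsets between every bead of chain a and chain b.
            delta = np.abs(chains[a][:, None, :] - chains[b][None, :, :])
            if (delta.max(axis=2) <= cutoff).any():
                parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for i in range(len(chains)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=lambda members: members[0])

chains = [
    [(0, 0, 0), (1, 0, 0)],        # chain 0
    [(3, 0, 0), (4, 0, 0)],        # chain 1: two sites from chain 0
    [(20, 20, 20), (21, 20, 20)],  # chain 2: isolated
]
clusters = find_clusters(chains)
sizes = [len(c) for c in clusters]
print(clusters, sizes)
```

Here chains 0 and 1 form one cluster and the isolated chain 2 is counted as a cluster of one chain, matching the convention described for CLUSTERS.dat.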
Output files that (for now) are not useful/useable
| Filename | Explanation |
|---|---|
| restart.pimms | PIMMS allows simulations to be restarted, although this functionality is not yet ready. However, the restart.pimms is the only file required for restart. |
The pimms.tar.gz tarball comes with three examples under:
demo_keyfiles/demo_1 # multi-chain simulation
demo_keyfiles/demo_2 # single-chain simulation
demo_keyfiles/demo_3 # single-chain simulation
The key files here (KEYFILE.kf) are heavily annotated, and a separate readme.md is provided.
- See `changelog.md` for an extensive description of the March Modernization
- Improved `PIMMS -i` keyword descriptions to capture all valid keywords.
- Added checks when reading in XTC/PDB files for writing output to provide useful warnings if files are missing.
- Fixed smol bug where chainType ID was not being correctly incremented when multiple different EXTRA_CHAINS were being added.
- Added files into `demos/surface` showing an example of running a simulation on a surface.
- Added support for non-cubic / square boxes (EXPERIMENTAL_FEATURE) as well as the new keyword `EQUILIBRATION_OFFSET`
- Added check to escape infinite loop in case broken chains have somehow been loaded...
- Fixed bug where `autocenter` was not ignored if multiple chains are provided. NOTE there still seems to be a weird bug where autocenter chains saved using `SAVE_AT_END` occasionally 'jump' in absolute position, but the chain never straddles PBC so this should not throw any issues for chain-centric frame of reference analysis.
- Fixed bug where if `RESIZED_EQUILIBRATION` was set to True we tried to write to START.pdb even though eq_START.pdb was created. Also fixed so unused eq_START.pdb and eq_traj.xtc are not created.
- Fixed bug in `write_positions_to_file()` where spacing was not correctly passed to `initialize_pdb_file()`.
- Added ability to pass a sequence to `write_positions_to_file()`, although this currently only works/makes sense if you have a SINGLE chain. This could be updated in the future.
- Actually documented restart functionality and the `EXTRA CHAINS` keyword, which enables restarted simulations to start with additional chains.
- Added sanity check to ensure `EXTRA_CHAIN` is only valid if a restart file is provided
- Added sanity check to restart file to ensure chain sequence string and chain position list are the same length.
- Added freeze chain functionality:
  - Freeze chain functionality enables one or more chains in the simulation to be frozen in place. Chains are identified by their chain ID. In PIMMS, each chain is deterministically assigned a unique ID based on the input order (i.e. number of chains, order chains are defined using `CHAIN` keyword, use of restart file, and `EXTRA_CHAIN` keyword).
  - To define chains to be frozen, the `FREEZE_FILE` keyword lets you define the path to a 2-column file that describes what should be frozen, referencing by this chain ID (see `FREEZE_FILE` keyword).
  - To make it easy to figure out this mapping, we also included a `WRITE_CHAIN_TO_CHAINID` keyword which if set to True means PIMMS writes a file mapping the chain ID to each chain so you can manually work out which chainID you want to use.
  - Added documentation (both here and internally) for the FREEZE file keywords.
- KEYWORDS ADDED: `FREEZE_FILE`, `WRITE_CHAIN_TO_CHAINID` (discussed above).
- Updated docs
- Updated demo keyfiles
- Fixed mismatch in integer type breaking PIMMS entirely...
- Changed performance output, removing the `STEPS_PER_SECOND.dat` file, which was always questionably useful, and replacing it with a `PERFORMANCE.dat`, which includes steps per second information, time elapsed, and anticipated time remaining, all nicely formatted in a header-containing output file. This file is always written and is written every 5th percentile through the simulation AND after 20 steps just so if you're running a REALLY long simulation, you can get a ballpark estimate of how bad this is gonna be relatively quickly.
- Improved many docstrings
- Ensured `SAVE_AT_END` and `SAVE_EQ` status is now written out during initialization
- Updated log info to get estimated time remaining updates
- Added safety check to ensure Cython and Numpy intsizes are matched, rather than just having PIMMS crash with an obscure error!
- Added safety check to ensure PIMMS can accommodate the number of beads on the lattice of the given integer type
- Major update to Cython backend to facilitate better control over memory usage. In previous versions, PIMMS defined all back-end grids (chain grids and type grids) as [n x n x n] matrices, where each element was a 64-bit number. Because of the cubic term here, as grids become larger the memory footprint associated with PIMMS becomes very large; for a [200 x 200 x 200] grid, the memory footprint can reach 100s of MBs. In version 0.1.35, the backend memory management has been made dynamic; that is, we can compile PIMMS versions that specify the number of bits associated with the elements in the grids. Right now, the default version compiles with 64-bit integers still, but to recompile with a smaller memory footprint you can change just two flags in `CONFIGS.py` and the newly created `cython_config.pxd`; specifically:

  ```
  ctypedef cnp.int64_t NUMPY_INT_TYPE
  # can be changed to
  ctypedef cnp.int32_t NUMPY_INT_TYPE
  # or
  ctypedef cnp.int16_t NUMPY_INT_TYPE
  ```

  While in `CONFIGS.py`

  ```
  NP_INT_TYPE = np.int64
  # can be changed to
  NP_INT_TYPE = np.int32
  # or
  NP_INT_TYPE = np.int16
  ```

  Right now, we continue to default to 64-bit numbers as this is test driven, but ultimately the plan is to transition to a 16-bit backend which basically reduces the memory footprint down to 25% of what it would have been with a 64-bit backend. To what extent this improves performance (steps/second) is unclear, but it HUGELY helps running many parallel jobs on many CPU systems where we actually become memory limited!
- In addition to the memory re-write, we re-wrote `delete_pbc_pairs()` in `inner_loops_hardwall.pyx` to substantially improve performance by fully typing the function - for non 64-bit numbers this adds a ~8x improvement in performance, and maybe 1-2x for native (64-bit) memory implementations.
- Despite the big improvement in memory utilization, arguably the most important update in version 0.1.35 is a re-write of how trajectory saving is done. In particular, we previously had a punishingly inefficient approach for writing new trajectory frames that was so stupid it almost makes you wonder if it was a deliberate act of sabotage by someone. In any case, we (Ryan) have re-written this code to [firstly] ensure trajectory writing is done in a single output operation of XTC-only data (instead of the 3x I/O operations we had previously, [don't ask...]).
- Beyond this update, we (Ryan) also added the `SAVE_AT_END` keyword (default = False). If set to True, the simulation only writes the entire XTC at the end of the simulation. If you are worried about simulations crashing this is not ideal. However, where this is not a major concern, avoiding many I/O operations offers big gains, especially for larger systems.
- Added `SAVE_EQ` keyword. Default True. If set to False, equilibration steps are not saved. This works for both `RESIZED_EQUILIBRATION` experiments (an eq.traj file is still made but it only contains a single frame) and when `RESIZED_EQUILIBRATION` is not used (standard sims). When `RESIZED_EQUILIBRATION` is not used, PIMMS will begin saving (or updating the trajobj if `SAVE_AT_END` is set to True) after the equilibration step but does not save before.
- KEYWORDS ADDED: `SAVE_AT_END`, `SAVE_EQ` (discussed above).
- Version 0.1.35 is the final architectural change prior to the bump to 0.2.0, which will be the first live PIMMS release. Get psyched. Small updates (e.g. 0.1.36) will come after but these will be minor patches.
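
The memory argument in the 0.1.35 notes can be checked with a little arithmetic. The sketch below computes the size of one dense cubic grid for each integer width. It only assumes a dense [n x n x n] array per grid, as described above; how many such grids PIMMS holds at once is not stated here, and `grid_megabytes` is a hypothetical helper.

```python
import numpy as np

def grid_megabytes(box_side, dtype, n_dims=3):
    """Memory (in MB, with 1 MB = 10**6 bytes) of one dense lattice grid."""
    n_elements = box_side ** n_dims
    return n_elements * np.dtype(dtype).itemsize / 1e6

for dtype in (np.int64, np.int32, np.int16):
    size = grid_megabytes(200, dtype)
    print(np.dtype(dtype).name, size, "MB per grid")
```

For a 200-site box this gives 64 MB per grid with 64-bit integers, 32 MB with 32-bit, and 16 MB with 16-bit, which is the 25% figure quoted above; several grids together account for the "100s of MBs" total.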
- Major update to Cython backend to improve performance. All numpy arrays are now passed as memory views instead of as new arrays, which reduces the overhead on large arrays substantially
- This big re-write has been tested extensively without any issues identified
- The default lattice-to-realspace value (`LATTICE_TO_ANGSTROMS`) has been updated from 4.0 Å to 3.65 Å
- KEYWORDS ADDED: `CASE_INSENSITIVE_CHAINS`, `AUTOCENTER`
- Update so logfile is always a new file instead of appended to (pimmslogger.py)
- Restructured to define the DEFAULTS dictionary which sets and explains default parameters for keyfiles. This means default options are encoded directly in
- Updated internal documentation
- Added reduced_printing mode
- KEYWORDS ADDED: `REDUCED_PRINTING`
- Fixed bug which could lead to an error when a non-essential keyword was unset
- If residues are unknown to the single-letter-to-three-letter conversion, ensure that the first character in the unknown residue type is not a number, because this causes PDB readers to fail.
- Added `EXTRA_CHAIN` keyword
- Fixed bug in how pdb chain ID was being written for `TER` lines (always using chain A)
- Added and improved internal `RestartObject` code and functionality (including improved parsing)
- Improved information printed when a RESTART file is used to make it easier to see what is going on.
- Added CONECT records to output PDB files, so bonds between chains are easily visualized
- Added `LATTICE_TO_ANGSTROMS` keyword such that PDB file dimensions are controllable. Default=4 (same as before) so this will not change anything compared to prior simulations.
- Improved code documentation and removed `xtc_utils` due to redundancy.
- Changed so PDB chains are defined by the internal chainType - that is, all chains of the same type have the same PDB chain ID, which is convenient for visualization
Copyright (c) 2015-2024, Alex Holehouse & Ryan Emenecker
