Tuesday, December 8, 2015

Constructing Systems of Molecules for Molecular Dynamics Simulations Using Self Assembly

It's December 2015 and I have spent many an hour reading about how others build systems of molecules for molecular dynamics simulations. There are many ways to Rome and all have obstacles and suffer from problems and seem not to provide good, computationally quick and cheap solutions. My biggest problem since last October has been funding. I solved this problem in February of 2015 by going back to work full-time. Sadly, working full-time leaves much less time to devote to my research. This is why I have spent time reading papers on building systems. I have seen 12 step programs and and 21 step programs to build a system. The definition of a system is one or more molecules in equilibrium prior to running molecular dynamics (MD) to determine some physical property or perhaps to collect data for diffusion of small molecules in a polymer material. Digging deeper the monomers must be connected to form the polymer. Polymer chain lengths must be decided and constructed and the bonds between polymers must be constructed. The bonds between polymer chains are cross-links. This is where the fun begins. The number of and types of steps seem predicated on achieving some desired outcome for the initial simulation system. Outside of the theoretical (e.g. first principles) work of Flory (Rotational Isomeric State Theory), Theodorou and Suter ("Detailed Molecular Structure of a Vinyl Polymer Glass) the phenomenological concepts continue to rule system construction and therefore "free volume" concepts are invoked.

Constructing a system is usually defined as the automation of the polymerization process and/or constructing bonds between polymers, where molecules are attached to one another. In the case of an epoxy system, DGEBA/IPD, one IPD molecule can cross-link with up to DGEBA molecules. This can result in a 3D networked system of monomers. Many of the reviewed approaches literally beat the system to death in terms of cycling temperature and pressure of the system based on chemistry of the system until it exhibits experimental properties of the macroscale system in simulation. The usual suspects compare the built system to experimental data via radial distribution functions and the like and declare success.  How can one compare a simulated system with an experimental system? I suggest comparing elastic, shear, bulk moduli and density values to their experimental counterparts at specific temperature ranges. Using molecular dynamics (MD)  one must average over some significant time duration to give an average (time independent)  value.  One can do something similar with Monte Carlo methods (MCM), but it does not contain the kinetic energy (KE). Based on many sources across many disciplines, one relies on the lowest system energy representing the most probable configuration, where configuration is geometry.

Obviously, geometry and potential energy (PE) are related via force fields. MD subjects the simulated system to both potential and kinetic energy (KE) since MD provides velocity. Experimental physical properties such as moduli, density and glass transition temperature are are based on PE and KE and therefore dynamic. The interesting point is PE is usually negative and KE is positive. Looking at the equations for the CFF force fields like COMPASS one sees that a single molecule can have a zero PE value if every bond length, bond angle, out-of-plan and dihedral angles are all at their equilibrium values.Once energy is stored by distorting bond lengths and/or angles the system energy is decreased. The movement of atoms provide the kinetic energy, which increases the system energy. Of course multiple molecules create a system that must account for Van der Waals and Coulombic forces as COMPASS does. Forces between atoms are time varying and atoms are attracted and repelled based on proximity. If one were to take a snapshot in time of a system of molecules one would see atoms in all molecules at positions where bond lengths and angles away from the zero PE positions. Assuming that none of the molecules have a net delta velocity the system can be considered to be in equilibrium. Considering a mass of epoxy in an ocean of nitrogen molecules exerting 1 bar of pressure the epoxy would consist of bond lengths and angles dictated by the external forces. However, atoms in a material at some temperature must move as per normal modes of vibration with stretching, bending, rocking, and wagging motions. The result is no net translation or rotation. For molecules in close proximity their Van der Waals and Coulombic forces create a viscous-like environment to dampen movement.

Obviously, in the real-world cross-links form in a nonequilibrium environment with mass transfer and most likely temperature gradients. Chemical reaction can be endo and/or exothermic. Molecular dynamics doesn't account for electronic configurations.  There are some reactive force fields, but they are specific to certain types of chemical reactions. Barring advances in this area the question remains how does one get molecules to "assemble" themselves into the lowest energy conformations? So far, the literature cites using a proximity strategy. Once two molecules come close enough (e.g. 6-10 Angstroms) at their reactive sites a bond is created between the two molecules. This is the cross-link bond. The two molecules are now one molecule and the PE of the one molecule depends on the rotation of the smaller molecule around the newly formed bond and the length of the newly formed bond assuming the two original molecules don't collide. A human can connect the two molecules and with a little playing "eyeball" where the two molecules should be oriented to minimize the PE. Running a MMEM software will attempt to minimize the energy of the entire system. If the two molecules are bonded and the new bond forms deviates from the zero PE bond length or if the angles it forms deviate from the zero PE angles then the new molecule increases in  PE. Time must be spent relaxing the system to a lower energy and there is no guarantee of moving the system to the best conformation. In summary, molecules need to move relative to one another, when they are in proximity they must be bonded together forming the cross-link and then the system PE must be adjusted. Proximity is a bad strategy for cross-linking.

How do molecules move? The molecules or the atoms composing molecules can move by dynamic, static and energy methods. If one adds proximity cross-linking to these molecule movement methods one defines possible system construction methods.  The current construction methods as follows:

1.  Dynamic Methods - Molecular Dynamics (MD)  is obviously a time series of atom and molecule
     movements in a confined space. Molecules go whizzing around in space. force fields (think
     springs between atoms)  are used to insure the molecules do actually translate and rotate according      to Newton's laws. Proximity is used to cross-link molecules at reaction sites to form cross-links.

2. Static Methods - Self-avoiding walks (SAW) and Packmol optimization methods fall into
     this category. Self-avoiding walks place atoms and test for collisions and backtrack when
     necessary and then try another direction to place atoms. Packmol calculates a packing
     optimization solution based on a penalty function for atoms with respect to one another and to
     geometric boundaries (e.g. walls ). Molecules are moved via a heuristics algorithm. Proximity is
     used to cross-link molecules at reaction sites to form cross-links.

3. Energy Methods - Metropolis Monte Carlo Method (MMCM) and Simulated Annealing (SA)  fall
    into this category.  SA relies on reducing the system temperature of the system over the course of       the computation. The probability of the small spatial move of a molecule is calculated accounting
   for the change in system energy and compared to a random number. Metropolis Monte Carlo
   Method is done with a single system temperature. The methods are not time-based and in many  
   case higher energy moves are accepted that allow the system to bypass high energy barriers and  
   hopefully find a lower system energy. Proximity is used to cross-link molecules at reaction sites
   to form cross-links.

4. Other Method(s) - Genetic Algorithms, reactive force fields, etc...

Packmol uses a penalty function to determine if the packing of molecules is successful. Packmol randomly distributes molecules in a given volume and then employs a linked-cell approach to determine if every atom of a molecule is at least 2 Angstroms from the atoms of all other molecules. The linked-cell approach linearizes this problem since only atoms in the linked-cell need to be compared drastically reducing the number of comparisons.  Packmol uses heuristics to move molecules with large penalty functions to locations where molecules with low penalty functions are located. The linked-cell and proximity is just part of the penalty function calculation. Every atom has a list of restrictions. A simple system contains a single box restriction where all atoms must be inside the box. Unless the atom is very near the perimeter its restriction contribution is zero. An atom alone in a linked-cell has the same penalty function value. Simple restrictions such as the box boundary is read in from a configuration file. Every atom has a component contribution of the penalty function. Molecules with the highest penalty function are too close to other molecules and are translated and rotated close to molecules with the lowest penalty function values.

GENCAN is used to minimize the penalty function described above. This is done by calculating the values based on atom proximity and restrictions. The penalty function is a function of the translation and rotation of the molecules in the system. A penalty function of zero guarantees no spearing, or bad dynamics for the system of molecules.

When the atom proximity is being calculated atom types can be used to match the correct pair of atoms across two molecules. In the case of the DGEBA/IPD system the carbon and nitrogen atoms need to be some distance apart to support cross-linking (6-10 Angstroms). If this is the only criteria then bad geometry occurs due to the two molecules involved are not constrained enough to force GENCAN to consider only the dihedral angle between the two molecules. Also, the currently implemented heuristics often forces the two molecules in question apart making the cross-link much greater in length forcing the cross-link to be abandoned. Ideally, two molecules (e.g. IPD and DGEBA) are in proximity and a new restriction could be created that gets calculated in the restrictions section of the algorithm. This on-the-fly restriction (or dynamic constraint) now states that two atoms in different molecules must be some distance apart. As molecules move now the molecule that is to be cross-linked to it must move as well. The solution to this problem comes from the geometry of the sp3 nitrogen and the sp3 carbon involved in the cross-link site between the two molecules. The nitrogen has two hydrogens which come off during the reaction with the epoxide ring which provides the sp3 carbon. The key is if the carbon were to be located where one of the hydrogens are and if the nitrogen were to be located where one of the hydrogens is on the sp3 carbon then the nitrogen and carbon form an axis and the two molecules have the correct attitude relative to one another with the exception of their dihedral angle. By not fixing the dihedral angle, the molecules are free to rotate about their common axis using the GENCAN algorithm. This new approach developed is to specify the hydrogens that are not needed after cross-linking as having an atom type of zero. This is jokingly referred to as "zeronium." The "zeronium atoms are later removed when the molecular new system's machine part file is written. While they exist in the packing process they provide a coordinate to locate the other molecule's Carbon or Nitrogen atoms. The goal being  to achieve the correct geometry in terms of bond lengths and bond angles initially and letting GENCAN solve the dihedral angle. Linking together molecules in this manner solves the problem of  creating new molecules and allowing the cross-linked molecules to achieve new conformations over the course of the packing optimization. It solves the bad geometry problem by taking the ideal geometry as a first estimate.

The heuristics algorithm become very different. The new algorithm becomes a process of "self assembly." If each molecule were a vehicle capable of  navigating in 3D they could broadcast a need for other molecules to attach themselves to them. The attachment site coordinates are the data needed for an prospective molecule to change it's attitude in space to line up for attachment. This approach leads to a self assembly heuristics algorithm. Since two points in space are used to cross-link two molecules, the GENCAN algorithm is free to solve the dihedral angle. With the dynamic constraints moving a single molecule would require the cascading movement of multiple molecules. This is why moving free monomer molecules takes priority. Proximity is tested for during the GENCAN algorithm and dynamic constraints are created on the fly as well.

The current Packmol software has no concept of "holes" or "voids" inside the box. Also, one successfully runs the software on successively smaller and smaller boxes until the algorithm fails to pack the molecules. Making the box bigger gets one back to a successful packing. This usually provides the smallest possible box with the least amount of "holes" in it.  The new heuristics will require moving molecules to a new location in the box with the purpose of creating cross-links. The cross-link sites on each molecule are known and moving the IPD into proximity of the DGEBA or the otherway around will in fact create the permanent linkage between the molecules while the dynamic constraint that follows provides the correct geometry.

The modifications to be implemented use the proximity penalty function algorithm to identify cross-link pairs. The cross-links are maintained in a new dynamic data structure. At the same time the new dynamic constraint is created for the carbon atom in the DGEBA molecule and the zeronium atom in the IPD molecule. The zeronium atom in the DGEBA is paired up with the nitrogen in the IPD molecule. The number of restrictions per atom is currently set at 10 and can be increased to handle these dynamic constraints. Unlike MD software,  Packmol doesn't care when two atoms occupy the same space in this case it helps define the necessary geometry and to avoid overlaps and collisions. The coding of the Packmol enhancement is in progress and shows promise in self assembling the monomers to polymers and in creating cross-links.