# Frequently asked questions about ChromX


Below, we answer typical questions about working with ChromX. The collection covers general questions concerning technical requirements and conditions, as well as practical questions regarding model applications and workflows using ChromX. Please contact us directly if you have any further questions.

### Do I need to be an expert at thermodynamics or maths?

Chromatography modeling with ChromX is very easy in comparison to e.g. modeling with plain equation solvers written in C++ or in MATLAB®. To build models with ChromX, it is not necessary to be proficient in thermodynamics or maths. All equations are built in and the graphical user interface simplifies the modeling workflow as much as possible. Only the equations’ parameters are visible and changeable. For the simpler models, it is enough to understand e.g. what the protein charge and the binding equilibrium constant are in order to build a model. With ChromX, it is technically possible to build a chromatography model without ever seeing a single equation.

### Do I need a computing cluster to run ChromX?

No, you don’t need a computing cluster. ChromX is designed to run on your desktop PC or workstation. It mainly needs CPU performance and makes use of the available cores. An Intel Xeon is not needed; an Intel Core i7/i9 or AMD equivalent achieves a higher performance per dollar. Memory is not as important: ChromX usually needs 4 GB or less for its own data storage, so the system RAM should be at least 8 GB. ChromX also requires a graphics card and drivers that support OpenGL, and a Windows operating system: Windows 10 (recommended), 7, 8 or 8.1. An internet connection is recommended for convenient access to the latest updates. There are also options to transfer those updates to a PC disconnected from the internet. You are welcome to contact us for an individual evaluation of your IT infrastructure.

### How long does a simulation take?

How long a simulation takes depends on various factors. The number of components and the number of model parameters are crucial. The number of parameters is determined by the selected fluid dynamic and thermodynamic models. Furthermore, the resolution of the spatial and temporal discretization affects the simulation speed: the more axial/radial cells and the smaller the time steps, the more points are used for the concentration calculation and the slower the simulation. Individual simulations, however, should not take longer than a few seconds up to one minute.

### How do I get the ChromX updates?

The software updates are included in your license. Whenever a new software update is available, you will be notified upon starting ChromX. There is also a short description about the scope of the update and the possibility to install the update immediately or later.

### Who do I ask if I get stuck and don’t know how to continue with ChromX?

If you have any problems, do not hesitate to contact us via support@gosilico.com or http://helpdesk.gosilico.com. We will do our best to help you. If you need technical support, this is free of charge as it is already included in your license purchase.

### Can I start ChromX from the command line?

ChromX can also be executed from the command line with the command:

`chromx.exe –exec <your-cmx-parameter-file>`

ChromX will directly save the results in VTK or MAT format, so you will not see any graphical output during the runtime of the program. A result table in Excel® format is generated if operations like estimation and sampling are started from the command line.

### Which data formats can be imported into ChromX?

The Import Wizard that assists data import in ChromX can use Excel® files or the native UNICORN® output files .res or .zip. Plain chromatograms can furthermore be imported using .asc and .csv files.

### Can I export results from ChromX and use those in another software solution?

Yes. The “Concentration files” can be exported and used to investigate the concentration profiles inside the column with MATLAB® or a visualization program such as ParaView®; see the “How can I investigate the concentration profiles inside the column?” section. Moreover, all chromatograms can be exported as images or vector graphics, and the data spreadsheet can be exported as an Excel® file.

### Can I develop my own mathematical models and use those in ChromX?

You can also use your own adsorption models in ChromX with the help of a plug-in interface for the adsorption isotherm. Examples are provided in the ChromX user manual. Further plug-in interfaces exist for objective functions and external optimizers.

### Which experimental scale should I use?

Chromatography modeling using ChromX can be done with every experimental scale from RoboColumns to manufacturing scale. It might not be ideal to rely solely on chromatography runs on a pilot or even manufacturing scale. On these scales, some uncertainty is introduced because the column characterization is very tedious or even impossible. This might cause lot-to-lot variability to go undetected. In addition, larger setups also have larger dead volumes which might not be easy or possible to determine. Similar to large-scale experiments, the column characterization for RoboColumns is very inaccurate and some parameters, e.g. packing quality, cannot be determined. Also, intermittent flow is hard to determine and might not be detected. Nevertheless, RoboColumns are helpful for testing a huge variety of different conditions in a very short time and generate lots of data for model calibration. In contrast, experiments on bench top scales have the lowest uncertainty as the column characterization, determination of system dead volumes and control of the flow are easier and more accurate. Thus, the data quality of bench top experiments is superior to the other scales. As mechanistic modeling (in contrast to statistical modeling) only requires few experiments with high data quality, the bench top scale is most valuable for chromatography modeling in many cases.

### Which experiments should I perform for model calibration?

It is desirable to determine as many parameters as possible ahead of the model calibration to reduce the number of unknown parameters that need to be derived from the recorded chromatograms.

First, the dead volume from the injection point to the UV and conductivity sensors should be determined. To do this, a tracer pulse (e.g. NaCl, acetone) is recorded without the column attached to the system. The system dead volumes can be derived from the resulting peaks.

To determine the porosities of the column, the tracer runs are repeated with the column attached. To determine the total porosity, a small tracer molecule (e.g. NaCl, acetone) can be used. For the determination of the interstitial porosity, a larger tracer molecule, which is unable to diffuse into the pores, is required (e.g. dextran). The larger tracer experiment can also be used to calculate the axial dispersion coefficient. If a CEX step is to be modeled, the ionic capacity can be determined with an acid-base titration experiment.
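
The porosity calculation from the tracer runs can be sketched as follows. This is a minimal illustration, not ChromX functionality: the function name and all numeric values are hypothetical, and it assumes that each porosity is the tracer retention volume (corrected for the system dead volume) divided by the column volume.

```python
def porosity(tracer_volume_ml, dead_volume_ml, column_volume_ml):
    """Fraction of the column volume that is accessible to the tracer."""
    if column_volume_ml <= 0:
        raise ValueError("column volume must be positive")
    return (tracer_volume_ml - dead_volume_ml) / column_volume_ml


# Hypothetical example values in ml, not measured data.
column_volume = 1.00
dead_volume = 0.10      # tracer peak position without the column attached
v_small_tracer = 0.95   # small tracer (e.g. NaCl, acetone) with column
v_large_tracer = 0.48   # large tracer (e.g. dextran) with column

total_porosity = porosity(v_small_tracer, dead_volume, column_volume)
interstitial_porosity = porosity(v_large_tracer, dead_volume, column_volume)

# The bead (particle) porosity follows from the two values above:
# total = interstitial + (1 - interstitial) * particle
particle_porosity = (total_porosity - interstitial_porosity) / (
    1.0 - interstitial_porosity
)

print(f"total porosity:        {total_porosity:.3f}")
print(f"interstitial porosity: {interstitial_porosity:.3f}")
print(f"particle porosity:     {particle_porosity:.3f}")
```

The same dead-volume correction applies to both tracer runs, which is why the tracer pulse without the column is recorded first.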

Concerning the protein-adsorber interaction, it is impossible to provide a precise guide on which experiments to conduct, but some kinds of experiments are useful in many cases. For IEC, three experiments using linear gradient elutions with different gradient slopes can provide information about the protein charge and the equilibrium constant of the isotherm. These gradient elutions must be performed with low column loading in the linear region of the isotherm. In addition, at least one experiment featuring a step elution (be careful to perform a pump wash prior to the step) should be performed to provide information about mass transfer and kinetic parameters. If the modeling project focuses on step elutions, one or two additional step elutions at different salt concentrations should be added to the calibration space. To gain information about protein interactions and ligand shielding, a protein run at high loading (with some breakthrough) is beneficial.

### How many fractions should I take?

To obtain an accurate model, the resolution of the different peaks needs to be sufficient for model calibration. As the resolution depends on the fraction size and the number of analyzed fractions, we recommend a fraction size of 0.5 CV if possible. It is strongly dependent on the individual process where best to take fractions and how many fractions to take for analysis. Because of the trade-off between resolution and effort, a rule of thumb for bind & elute chromatography would be to analyze about 3 fractions distributed through your potentially occurring load breakthrough and about 3 fractions of your strip. For the elution, about 10 fractions chosen at the positions where the impurities of your process are expected to elute are sufficient for most cases. When performing flow-through chromatography you should analyze about 10 fractions of the flow-through and about 3 fractions of the elution peak to get information about potential product loss and to better understand the binding impurities.

In order to be able to calculate the mass balances of the proteins, we highly recommend an analysis of the feed material loaded onto the column.

### Which analytical methods should I use?

Concerning fraction analysis, a variety of different analytical methods can be performed and implemented in ChromX. The most frequently used analytical methods are SEC and, if critical charge variants of the product are suspected, IEX. Sometimes it is also possible to explain non-symmetric peak shapes by considering charge variants. It is preferable to conduct the offline SEC and IEX analytics at the same wavelength as the recording of the chromatogram. Apart from the offline analytics to characterize the mAb variants, additional components, such as HCP or DNA, can be analyzed and included in the calibration space. Please contact us directly for further information.

### How do I combine different offline analytical methods (e.g. SEC and CEX)?

The most common case where the combination of different offline analysis methods is necessary is the investigation of mAb species with SEC and CEX. Without a two-dimensional analysis, the exact amount of e.g. acidic fragments, basic monomers or neutral aggregates is unknown. Regularly, no two-dimensional analysis is conducted due to the high analytical effort.

Therefore, assumptions must be made. For processes with a high initial purity of the monomer species, one could e.g. assume that the charge variants are distributed identically across all size variants. In most cases, splitting the fragments and aggregates into different charge variants is not necessary, so only the monomer needs to be divided into different charge variants. To do this, the amount of monomer determined with SEC needs to be combined with the respective amounts of charge variants evaluated with CEX. It must be kept in mind that this workflow may not be suitable for some processes. Please contact us directly for further information.
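
The combination described above can be sketched as a small calculation. This is an illustration only: the function name, the species names and the fractions are hypothetical, and it assumes that the CEX charge variant distribution of the whole sample also applies to the monomer.

```python
def split_monomer_by_charge(sec_fractions, cex_fractions):
    """Split the SEC monomer fraction into charge variants using CEX shares.

    Aggregates and fragments are kept as single species.
    """
    monomer = sec_fractions["monomer"]
    total_cex = sum(cex_fractions.values())
    combined = {
        name: value for name, value in sec_fractions.items() if name != "monomer"
    }
    for variant, share in cex_fractions.items():
        combined[f"monomer_{variant}"] = monomer * share / total_cex
    return combined


# Hypothetical fractions for one sample (each analysis sums to 1).
sec = {"aggregate": 0.03, "monomer": 0.95, "fragment": 0.02}
cex = {"acidic": 0.20, "main": 0.65, "basic": 0.15}

species = split_monomer_by_charge(sec, cex)
for name, fraction in species.items():
    print(f"{name}: {fraction:.4f}")
```

The resulting five species (aggregate, fragment and three monomer charge variants) still sum to the total amount, which keeps the mass balance intact.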

### Which model should I use?

The model selection in ChromX is divided into three parts. First, a column model and a pore model need to be determined. There are basically three different options that are frequently used. The least complex is the Equilibrium Dispersive (ED) model. This model bundles the axial dispersion and all mass transfer parameters into one parameter. The ED model might be applied especially for newer resins, where the pores are easily accessible. In contrast, older resins might need more complexity. The next option is the Transport Dispersive model with the Lumped Rate pore model. Here, the axial dispersion coefficient is an individual parameter while all mass transfer parameters are lumped into a second parameter. The most complex model is Transport Dispersive with the General Rate pore model. Here, the mass transfer parameter is divided into film transfer and pore diffusion. Thus, another dimension in the radial direction of the beads is introduced, which increases the computational time significantly. Most of the time, the Lumped Rate model is sufficient, but for some cases, where e.g. extensive shielding occurs, the General Rate model might be needed.

The second part of the model selection is the choice of the isotherm. The choice is limited due to the type of chromatography (IEX, HIC, affinity, …). In each category there are different isotherm equations with varying complexity. With some experience, it is rather easy to select a suitable model for your project.

ChromX also offers the Model Selection Wizard, which guides you through the model selection.

### Where do I look for the model equations?

The underlying equations are not shown in ChromX in order to simplify the application of the software. If you are, nevertheless, interested in the fluid dynamic and thermodynamic equations, they can be found in the User’s Guide and in the Model Selection Wizard.

### How do I reference my work using ChromX in papers and publications?

As a licensed user of ChromX software, you are free to publish your own modeling work in articles, papers, books, images or conference proceedings. GoSilico appreciates your referencing of our software and our company name in your publications.

If your publisher does not have any specific guidelines, we recommend the following format:

“… we used the ChromX software (GoSilico GmbH) to perform the modeling [Ref. 1] …”

[1] T. Hahn, T. Huuk, V. Heuveline, J. Hubbuch, Simulating and Optimizing Preparative Protein Chromatography with ChromX, J. Chem. Educ. 92 (9) (2015) 1497–1502, http://dx.doi.org/10.1021/ed500854a.

If available, we recommend citing one of our latest scientific publications, closest to your application. In general, you can always cite the most fundamental ChromX publication on the “Preparative Protein Chromatography with ChromX”, as referenced above.

The following format is suggested if you would like to reference a specific section of the ChromX documentation.

“… This is explained in detail in the ChromX User’s Guide [Ref. 2].”

[2] ChromX User’s Guide v1.3.12.1., pp. XX-XY. GoSilico GmbH v1.0, Karlsruhe, Germany. 2020.

If you wish to re-publish any information created by GoSilico, including but not limited to images, models and documentation, please contact us directly.

### Can I still use my data if I decide to stop using ChromX?

In the unlikely event that you decide to stop using ChromX at some point in the future, this is not a risk to your modeling results. ChromX data are stored in an open and documented format. All parameters used for simulating a chromatographic experiment in ChromX are stored in an XML-based text format. These .cmx files can be read and edited with any text editor. The modeling results generated with ChromX are commonly exported in other widely used data formats such as Excel® or as graphics in .jpg, .png, or .svg format.

### How do I use the Peak Finder?

The ChromX Peak Finder is based on Yamamoto’s method for deriving the charge and equilibrium parameters from the gradient slope and the salt concentration at retention time. The peak finder can be used together with the SMA isotherm.

To use the peak finder, at least two linear gradients with different slopes at constant pH and low column load are needed. If the column load is outside the linear isotherm range, shielding effects will hinder the correct determination of the charge and equilibrium parameters. After import to ChromX, the peak finder allows the definition of the elution order of the components and assigns them to the individual peaks of the measured chromatograms and the imported offline fraction analysis. Then, the charge and equilibrium parameters are derived from the chromatogram.
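
To illustrate the idea behind Yamamoto’s method, the sketch below fits the charge and a lumped equilibrium term from gradient data. It is not the ChromX implementation, which is not documented here. It assumes the form of the relation commonly cited in the literature, GH = I_R^(ν+1) / (K_eq · Λ^ν · (ν+1)), where GH is the normalized gradient slope, I_R the salt concentration at retention, ν the charge, K_eq the equilibrium constant and Λ the ionic capacity. The function name and the synthetic data are hypothetical.

```python
import math


def fit_yamamoto(gh_values, salt_at_retention):
    """Fit log10(GH) = (charge + 1) * log10(I_R) - log10(lumped * (charge + 1)).

    Returns (charge, lumped), where lumped = K_eq * Lambda**charge.
    Uses ordinary least squares on the log-log data.
    """
    if len(gh_values) != len(salt_at_retention) or len(gh_values) < 2:
        raise ValueError("at least two gradients with matching data are needed")
    x = [math.log10(i) for i in salt_at_retention]
    y = [math.log10(g) for g in gh_values]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    charge = slope - 1.0
    lumped = 10.0 ** (-intercept) / (charge + 1.0)
    return charge, lumped


# Synthetic data generated from known (hypothetical) parameters.
true_charge = 5.0
true_lumped = 1.0e-2
salts = [0.20, 0.25, 0.30]  # salt concentration at retention, in M
gh = [s ** (true_charge + 1.0) / (true_lumped * (true_charge + 1.0)) for s in salts]

charge, lumped = fit_yamamoto(gh, salts)
print(f"charge: {charge:.3f}, K_eq * Lambda^charge: {lumped:.3e}")
```

Two gradients define the straight line exactly; additional gradients with different slopes make the fit less sensitive to errors in a single run.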

### How do I choose the number of axial/radial cells and the time stepping?

The number of axial cells determines the number of points along the column at which the concentration of the components is calculated. A larger number of axial cells yields higher precision, but it also increases the computational time. The pre-set number of axial cells is 31 but sometimes about half this number (15) is sufficient. If the general rate model is used, an additional dimension in the radial direction of the beads is added. The number of radial cells can be much smaller than the number of axial cells, say 7-15.

The step size of the time stepping determines the precision of the temporal discretization of the partial differential equations. As a starting point, a step size between 1 s and 10 s can be assumed.

When numerical oscillations occur because of fast concentration changes in the column, e.g. at step elutions, the number of cells needs to be increased and/or the time stepping step size needs to be decreased.

For more information, please consult the ChromX Online Help, which guides you further through the correct choice of time steps and axial/radial cells.

### Optimization: What is the difference between Objective Beginning/End and Min./Max.?

The parameters Objective Beginning and Objective End determine when the pooling is started and stopped or, mathematically speaking, in which interval the objective function is evaluated. Most users use UV-based pooling and choose an mAU value for the pooling beginning and end. Reasonable boundaries for optimization depend on the individual separation problem. An example would be 0-300 mAU to trigger pooling and to stop it when the signal falls below this threshold. These are rather wide ranges and leave a lot to the freedom of the optimizer.

The other two input boxes, Minimum and Maximum, are only necessary under special circumstances. For example, when there is a wash step with strong impurity elution leading to another peak of at least 300 mAU. In this case, the Objective Beginning trigger would be executed during the wash, which is not desired. To circumvent this, one could set Minimum and Maximum values to restrict the Beginning and End triggers so they are only executed within an admissible time (volume) frame, e.g. after the wash is done. Here, we typically use volume values instead of UV.

In summary, Minimum/Maximum define the window in which the evaluation of the objective can be triggered. The parameters Beginning and End define the span in which the objectives are actually evaluated.

As an example, consider a chromatogram with a wash peak followed by a second peak that contains your target protein. With an Objective Beginning between 0 and 300 mAU, the pooling would be triggered for the wash peak. To prevent the pooling of the wash peak, a Minimum of ~4 ml needs to be added. In this way, the pooling will not start before the 4 ml mark.
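
The effect of the Minimum value can be sketched with a simplified trigger. This is only an illustration of the idea, not the ChromX logic: the function name, the fixed threshold and the chromatogram values are hypothetical.

```python
def find_pool_start(volumes_ml, signal_mau, begin_threshold_mau, minimum_ml=0.0):
    """Return the first volume at which pooling would start, or None.

    Pooling starts at the first point where the UV signal reaches the
    threshold, but never before the admissible window opens at minimum_ml.
    """
    for volume, signal in zip(volumes_ml, signal_mau):
        if volume >= minimum_ml and signal >= begin_threshold_mau:
            return volume
    return None


# Hypothetical chromatogram: wash peak around 2-3 ml, product peak around 7 ml.
volumes = [0, 1, 2, 3, 4, 5, 6, 7, 8]
signal = [0, 50, 400, 350, 20, 10, 200, 500, 100]

start_without_minimum = find_pool_start(volumes, signal, 300.0)
start_with_minimum = find_pool_start(volumes, signal, 300.0, minimum_ml=4.0)

print(start_without_minimum)  # pooling is triggered by the wash peak
print(start_with_minimum)     # pooling is triggered by the product peak
```

Without the Minimum, the trigger fires on the wash peak at 2 ml; with a Minimum of 4 ml, it fires on the second peak at 7 ml.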

### How do I set up a meaningful objective function?

ChromX offers a huge variety of objectives that can be optimized such as purity, yield, pool volume and recovery. To perform an optimization with individual goals, the respective objectives need to be incorporated into an objective function. It is important to remember that optimization means the minimization of the objective function. This means that objectives which need to be maximized, such as purity or yield, must be included as differences (e.g. 1-purity) or inversely (e.g. 1/purity).

If all objectives are equally important, it is reasonable to weight each term of the objective function equally. An example would be the inclusion of the pool volume in the objective function. The assumed pool volume is e.g. in the range of 15 to 20 ml. If a factor of approximately 0.05 were not included for this objective, the pool volume would be weighted more heavily than e.g. the purity in the term 1-purity. Of course, it is also possible to weight the individual objectives differently, depending on their importance for the optimization goal.
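
The weighting can be sketched as follows. The function is a hypothetical illustration of how such terms combine, not the syntax used in ChromX; purity and yield are assumed to be fractions between 0 and 1.

```python
def objective(purity, product_yield, pool_volume_ml, volume_weight=0.05):
    """Objective to be minimized.

    Purity and yield are to be maximized, so they enter as differences
    (1 - value). The pool volume is scaled so that a volume of 15 to 20 ml
    contributes on a similar order of magnitude as the other terms.
    """
    return (1.0 - purity) + (1.0 - product_yield) + volume_weight * pool_volume_ml


# Hypothetical pooling result.
weighted = objective(purity=0.98, product_yield=0.90, pool_volume_ml=16.0)
unweighted = objective(
    purity=0.98, product_yield=0.90, pool_volume_ml=16.0, volume_weight=1.0
)

print(f"weighted objective:   {weighted:.2f}")
print(f"unweighted objective: {unweighted:.2f}")
```

In the unweighted case the pool volume term (16) dwarfs the purity and yield terms (0.02 and 0.10), so the optimizer would effectively only minimize the volume.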

Meaningful examples are presented in the ChromX tutorials.

### Which optimization algorithm should I use?

ChromX offers a variety of different optimization algorithms. The algorithms most frequently used are the heuristic ASA (Adaptive Simulated Annealing) algorithm and GALib, which is a genetic algorithm, as well as the deterministic Levenberg-Marquardt CERES and CMinPack algorithms.

ASA is a probabilistic method for approximating a global optimum of a goal function. To do this, the algorithm performs random jumps in order to explore the search space. If the new parameters improve the objective’s value, the jump width is reduced. ASA quickly scans a large search space.

Genetic Algorithms such as GALib are a heuristic approach to optimization motivated by nature itself. The algorithm starts with several parameter sets at the same time, a generation of parameter sets, so to say. Similarly to the evolution of genes, the “fittest” parameter sets are kept, mixed among each other or slightly mutated with the expectation that the next generation will be even fitter. The algorithm explores a smaller part of the search space than simulated annealing but is able to leave local minima via random mutations, in contrast to Levenberg-Marquardt.

The Levenberg-Marquardt algorithms are optimization algorithms that can be interpreted as a combination of the method of steepest descent and Newton’s method. Through an additional constraint, one forces the solver to decrease/increase the goal-functional value in every iteration, so that the objective is successively better fulfilled in every experiment. The method is fast when starting sufficiently near to an optimum, but otherwise slow and may be unable to leave a local minimum.

Therefore, starting the parameter estimation with a heuristic algorithm (ASA or GALib) is recommended until a good starting point for the deterministic algorithm is found.

### Which error norm should I use?

The optimal error norm depends on the separation problem at hand. The most frequently used error norms are L2Error (least square error), NRMSE (normalized root mean square error), L2ErrorIntegral (integrated least square error) and DTW (dynamic time warping).

The L2Error is a commonly used error norm. If multiple chromatograms and/or components with very different peak areas are used for model building, the NRMSE is better suited. Here, the individual residuals of the different components and processes are normalized prior to the calculation of the overall residual. As a result, components with considerably smaller peak areas are not neglected during model building.
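
The difference between the two norms can be illustrated with a small sketch. The functions are hypothetical and not the ChromX implementations. In particular, the NRMSE is normalized here by the range of the measured signal, which is one common convention and an assumption on our part.

```python
import math


def l2_error(measured, simulated):
    """Sum of squared residuals."""
    return sum((m - s) ** 2 for m, s in zip(measured, simulated))


def nrmse(measured, simulated):
    """Root mean square error divided by the range of the measured signal."""
    span = max(measured) - min(measured)
    if span == 0:
        raise ValueError("measured signal has zero range")
    rmse = math.sqrt(l2_error(measured, simulated) / len(measured))
    return rmse / span


# Hypothetical signals: a large and a small peak, both simulated 10 % too low.
large_measured = [0.0, 100.0, 200.0, 100.0, 0.0]
large_simulated = [0.0, 90.0, 180.0, 90.0, 0.0]
small_measured = [0.0, 1.0, 2.0, 1.0, 0.0]
small_simulated = [0.0, 0.9, 1.8, 0.9, 0.0]

l2_large = l2_error(large_measured, large_simulated)
l2_small = l2_error(small_measured, small_simulated)
nrmse_large = nrmse(large_measured, large_simulated)
nrmse_small = nrmse(small_measured, small_simulated)

print(f"L2Error: large {l2_large:.2f}, small {l2_small:.2f}")
print(f"NRMSE:   large {nrmse_large:.4f}, small {nrmse_small:.4f}")
```

Both peaks have the same relative deviation, but the L2Error of the large peak is four orders of magnitude higher, so the small component would hardly influence the fit. With the normalized error, both contribute equally.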

The L2ErrorIntegral, as an integrated form of the least square error, is helpful for finding the correct peak position as the integration of the residual provides a large error if there is no overlap between the measured and modeled chromatograms.

The DTW is an error algorithm that compresses and stretches the signal along the time axis to minimize the error between the measured and modeled chromatograms. Therefore, a shift of the peak position is less important for the resulting residual. DTW is helpful if the correct peak shape should be preferred over the correct peak position.
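
The following sketch shows the textbook dynamic programming form of DTW. It is meant to illustrate why a shifted peak is penalized less, and it is not the ChromX implementation; the function name and the signals are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute differences as cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical signals: identical peak shape, shifted by one sample.
measured = [0.0, 1.0, 0.0, 0.0]
simulated = [0.0, 0.0, 1.0, 0.0]

pointwise = sum(abs(m - s) for m, s in zip(measured, simulated))
warped = dtw_distance(measured, simulated)

print(f"pointwise error: {pointwise}")
print(f"DTW distance:    {warped}")
```

The pointwise comparison reports an error for the shifted peak, while DTW aligns the two peaks and reports no error because the shapes are identical.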

### Which ranges should I use for parameter estimation?

The boundaries for parameter estimation are strongly dependent on your individual process. Therefore, it is not possible to predict the exact ranges for each parameter. Nevertheless, rules of thumb may be applicable for some processes to start the parameter estimation.

The film diffusion coefficient is often in a range between 0.0001 and 1 mm/s. If the general rate model is used, a separate pore diffusion coefficient is introduced. As the diffusion inside the pore is the limiting step, the pore diffusion coefficient is smaller and may be between 10e-8 and 10e-5 mm²/s. For CEX, the kinetic parameter is typically <<1, while for HIC, the kinetic parameter is more likely between 1 and 1000.

The equilibrium coefficient is very hard to predict. Thus, it makes sense to start with a broad range, e.g. 10e-6 to 10e6, and use a logarithmic scale for estimation. The charge parameter indicates the number of charged patches on the protein’s surface and is therefore always >0. Typically, a range between 1 and 15 can be used for a standard mAb process. The shielding value describes the number of binding sites unavailable due to steric shielding and repulsion by the molecules already bound. This value is often between 10 and 200.

If the Mollerup extension of the SMA isotherm is used, there are the additional kp and ks parameters that are part of an activity coefficient approximation. The kp parameter describes the influence of the protein concentration on the equilibrium parameter. A negative kp means that the binding of the protein is weakened with increasing protein concentration (a positive kp indicates the opposite). The parameter typically has large values in the range of -50000 to 50000 M⁻¹. The ks parameter describes the salt influence on the equilibrium parameter. Again, a negative ks means a weakening of the protein binding with increasing salt concentration (a positive ks indicates the opposite). This parameter is often in the range of -30 to 30 M⁻¹.

It is important to keep in mind that the proposed ranges are not applicable to all processes and are just a rough guideline.

Please contact us directly for a specific consideration of your modeling application.

### How can I investigate the concentration profiles inside the column?

Sometimes it is helpful to look at the concentration profiles inside the column to investigate the behavior of the different components or to see whether some components are not eluting completely. ChromX offers the possibility of exporting files showing the concentration profiles along the column for the whole process. The “Concentration file output” options can be found in the “Display/Output” tab. It is possible to export MAT or VTK files to work with in MATLAB® or a visualization program such as ParaView®. It is also possible to select the intracolumn rate unit (e.g. s, ml, …) and the output rate to determine how many concentration profiles should be exported during the whole process. After all is set, a simulation must be performed to finalize the export of the concentration files.

### How do I include components with concentrations in different units (such as HCP, DNA…)?

The most convenient way to include components that have concentrations in a different unit such as HCP and DNA is to add these components via a separate sensor. Either the information from the offline fraction analysis can be included as an artificial measurement signal or the offline analysis can be included as fraction data in absolute values. For the second option, it is also important to add a mock data column. This could also be a zero line. The corresponding artificial volume (time) column will be used as a grid to display the fraction information. Therefore, the volume steps need to be narrow enough to present a fine enough grid to depict the fraction information correctly. Moreover, it is important to weigh only the fraction information for parameter estimation. Otherwise, the mock measurement line will interfere with the model building.

### What are the kp and ks parameters?

The kp and ks parameters are used in the so-called generalized ion exchange isotherm, which is an extension of the widely used SMA isotherm. These additional parameters were introduced by Mollerup in 2008 and are used for the approximation of the activity coefficient. They describe the dependency of the equilibrium constant on the protein (kp) and salt (ks) concentration. A negative kp/ks means that keq decreases with an increasing protein/salt concentration, resulting in weaker binding. A positive kp/ks indicates the opposite.

This extension of the SMA isotherm can be helpful for describing complex peak shapes such as trapezoidal peaks. Also, it may be useful for accurately describing both step and gradient elutions when SMA fails to do so.

### What is the difference between the measurement factor and the concentration factor?

The measurement factor is derived from the Lambert-Beer law and is used to scale the injection concentration unit to the measurement unit. In most cases, this means scaling M to mAU. This is necessary to enable ChromX to determine the fluid dynamic and thermodynamic model parameters with inverse fitting using the chromatograms from the lab experiments. The measurement factor can be calculated using the extinction coefficient of the protein and the path length of the UV flow cell. If those are unknown, the measurement factor can also be adapted manually to match the model output area with the peak area of the experimental chromatogram. It is also possible to include the measurement factor in the parameter estimation.
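
The calculation from the Lambert-Beer law (A = ε · c · l) can be sketched as follows. The function name and the example values are hypothetical. The sketch assumes a molar extinction coefficient in M⁻¹ cm⁻¹, a path length in cm and a signal in mAU, so a factor of 1000 converts AU to mAU.

```python
def measurement_factor(extinction_coeff_per_molar_cm, path_length_cm):
    """Return the scaling from molar concentration (M) to UV signal (mAU).

    Based on the Lambert-Beer law A = epsilon * c * l, with A in AU.
    """
    mau_per_au = 1000.0
    return extinction_coeff_per_molar_cm * path_length_cm * mau_per_au


# Hypothetical values: an antibody-like extinction coefficient and a 2 mm cell.
factor = measurement_factor(210000.0, 0.2)
signal_mau = factor * 1.0e-5  # signal for a 10 µM protein solution

print(f"measurement factor: {factor:.3e} mAU/M")
print(f"signal at 10 µM:    {signal_mau:.1f} mAU")
```

If the extinction coefficient is given in other units, e.g. per g/l, it has to be converted first or the injection concentration unit adapted accordingly.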

The concentration factor is used for the SMA isotherm and its extensions (e.g. Mollerup 2008). The SMA isotherm requires a molar protein concentration for the calculation. The concentration factor is used to scale the injection concentration unit to the molar concentration. In many cases, the injection concentration is already in M, in which case the concentration factor is 1. If the injection concentration is in g/l, the concentration factor is the molar mass.

### How do I import the buffer ion concentrations?

When performing the buffer import to ChromX, it is important to keep in mind that not only the salt concentration in the buffer is of interest but also the concentration of dissociated buffer ions. The currently used isotherms do not distinguish between the various kinds of ions in the buffer but only consider the number of ions. Depending on the pH of the buffer solution, the buffering substance can be partially or even fully dissociated and therefore in an ionic state. The pH-dependent concentration of dissociated ions can be calculated using the Henderson-Hasselbalch equation. This calculation is also available via the Erlenmeyer flask icon in the Import Wizard. The dissociated buffer molecules also contribute to the conductivity of the buffer solutions and need to be included in the calculations. As there is no differentiation between the different kinds of ions in the isotherm, the concentration of dissociated buffer ions needs to be added to the salt concentration and entered in the “salt” component in ChromX.
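
The Henderson-Hasselbalch calculation can be sketched as follows. This is a simplified illustration, not the calculation performed by the Import Wizard: the function names and values are hypothetical, and the sketch assumes a monoprotic buffer and counts only the dissociated buffer species.

```python
def dissociated_fraction(ph, pka):
    """Fraction of a monoprotic buffer that is dissociated at the given pH.

    Follows from pH = pKa + log10([A-]/[HA]).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def effective_salt_concentration(salt_molar, buffer_molar, ph, pka):
    """Salt concentration plus the concentration of dissociated buffer ions."""
    return salt_molar + buffer_molar * dissociated_fraction(ph, pka)


# Hypothetical buffer: 20 mM buffer substance with pKa 4.76 and 50 mM salt.
fraction_at_pka = dissociated_fraction(ph=4.76, pka=4.76)
salt_component = effective_salt_concentration(
    salt_molar=0.050, buffer_molar=0.020, ph=4.76, pka=4.76
)

print(f"dissociated fraction at pH = pKa: {fraction_at_pka:.2f}")
print(f"value for the salt component:     {salt_component:.3f} M")
```

At pH = pKa, half of the buffer substance is dissociated, so 10 mM of buffer ions are added to the 50 mM salt. Buffers with several dissociation steps need one such term per step.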