Buffer subtraction
Subtraction of the scattering contribution from the solvent allows to obtain the scattering data
corresponding to scattering from biomacromolecules of interest. Previously averaged data of the
buffer is subtracted from averaged sample data and subtracted curve is used for data merging further
analysis.
At P12 the buffer subtraction is performed automatically by the data processing pipeline. Manually
this could be performed using program DATOP as a stay alone version or as is implemented in the
primusqt interface in ATSAS package.
In this example averaged buffer SAXS curve will be subtracted from averaged sample curve and
concentration-normalized using primusqt interface:
Figure 1. Buffer subtraction and concentration normalization. Averaged scattering curves of the
sample and buffer plotted in primusqt interface. Subtraction is performed by clicking the "Subtract"
button on the "Operation" tab. Resulting subtracted data could be concentration-normalized, by
adding the concentration value [mg/ml] in "Conc." column of the "Document" window.
Figure 2. Buffer subtraction. Subtracted dataset is created and saved with automatic file name
prefix "sub". Subtracted concentration-normalized data correspond to scattering from
biomacromolecules or complex of interest and could be used for data merging of further analysis.
Data merging
Scattering data obtained from samples of different concentration, exposure times or angular range
could be merged to obtain dataset with optimal signal:noise ratio.
Generally, scattering data measured from highly concentrated samples are less noisy in higher
angles, but scattering at low angles could be affected by concentration effects as repulsion or
attraction. On the other hand data from low concentration samples contains higher noise level at
higher angular range, but not affected by concentration effects at low angles, see Fig. 1. Usually,
optimal angular range is selected for each concentration and merged subsequently. By merging the
buffer-subtracted, concentration-normalized scattering data one can obtain optimal data set for
further analysis.
Scattering data from concentration series could be extrapolated to zero concentration to obtain
scattering data corresponding to infinite dilution. Data extrapolated to zero concentration are free of
any scattering contribution due to interparticular interaction. Extrapolation to zero concentration
make sense only when no concentration dependent changes in oligomeric state (i.e. Rg) are
observed.
The selection of the optimal angular range could be performed “manually”, based on subjective
criteria of noisiness of individual user or by using available programs (e.g. ALMERGE or SAXS
Merge) for automated merging based on statistical methods.
In the first example (Fig. 1-3) two datasets from samples at different concentration will be manually
merged, in the second example (Fig. 4-5) datasets from concentration series will be extrapolated to
zero concentration, both using the primusqt interface:
Figure 1. Concentration effect and subjective evaluation of the noise level in the scattering data.
Buffer-subtracted concentration-normalized scattering curves of protein sample measured at two
different concentration (3.75 and 1.25 mg/ml) plotted in primusqt interface. The higher
concentration curve is shifted upwards (intensity is multiplied by scaling factor 13.0) to clearly
illustrate the noise level. The scattering curve of high concentration sample is affected by
interparticle repulsion effect exhibited as characteristic decay of intensity at low angles. The
scattering curve of the low concentration sample (magenta) is not affected by interparticular
repulsion, but is noisier (less smooth and with higher error), particularly in the higher angular range.
Figure 2. Selection of optimal angular range of scattering data prior data merging. By changing
the data-point number in the "From" and "To" column in the "Document" window, the optimal
angular range for each dataset is selected. In this case approx. min - 0.15Å-1
is selected from the low
concentration dataset and 0.1 - 0.43Å-1
from the high concentration dataset. The intersection part
(approx. 0.1-0.15Å-1
) will be averaged in the merged dataset. Merging is performed by clicking the
"Merge" button in the "Operation" tab.
Figure 3. Merged scattering dataset. Merged dataset is created and saved with automatic file name
prefix "merge".
Figure 4. Extrapolation to infinite dilution. Two buffer-subtracted, concentration-normalized
datasets opened in primusqt interface. The extrapolation to zero concentration is performed by
clickong the "Extrapolate" button in the "Operation" tab.
Figure 5. Extrapolation to infinite dilution. The dataset extrapolated to zero concentration is created
and saved with default file name prefix "zc". Scattering data extrapolated to infinite dilution are
suitable for further analysis.
Guinier analysis
Guinier analysis is one of the first steps in SAXS data evaluation, following initial data processing
steps as averaging, buffer-subtraction and concentration-normalization, etc.. Guinier analysis
provide information as radius of gyration of the particle, sample condition (monodispersity,
aggregation, repulsion) and forward scattering intensity, which is proportional to molecular weight
of the biomacromolecule.
André Guinier showed that in very low angles the intensity decay is proportional to radius of
gyration regardless the particle shape. For monodisperse globular particles, the Guinier
approximation is given by I(q) = exp( − Rg2
s2
/ 3). Radius of gyration (Rg) is mechanic size
parameter describing the distribution of mass of the particle. Rg could be defined as root mean
square distances of the excess electron density to the center of gravity of the particle.
Guinier analysis is performed in Guinier plots, where the scattered intensity on natural logarithmic
scale is plotted as a function of scattering vector square (Fig.1). In Guinier region (limited to
maximal scattering vector s<1.3/Rg) the scattering intensity could be fitted by straight line (Fig.1).
The slope of this line is proportional to particle Rg (see the Guinier approximation eq.) and by
extrapolation to zero angle the forward scattering intensity is obtained (I(0)), see molecular weight
estimation. If the Guinier plot in the Guinier zone is not linear, sample is considered to be
aggregated or interacting by intramolecular repulsion (Fig.2). Scattering data from aggregated
samples should not be further analyzed and attention should be focused on sample preparation.
Note, linear Guinier zone is not a proof of monodispersity of the sample: oligomeric mixtures or
samples of complexes containing free subunits exhibit linear Guinier behavior and medium values
of Rg and I(0), see polydisperse systems.
In this example the Guinier regions will be inspected using primusqt interface:
Figure 1. Determination of Rg and I(0) by Guinier analysis. The automated Rg determination
procedure is performGuinier analysis is one of the first steps in SAXS data evaluation, following
initial data processing steps as averaging, buffer-subtraction and concentration-normalization, etc..
Guinier analysis provide information as radius of gyration of the particle, sample condition
(monodispersity, aggregation, repulsion) and forward scattering intensity, which is proportional to
molecular weight of the biomacromolecule.
André Guinier showed that in very low angles the intensity decay is proportional to radius of
gyration regardless the particle shape. For monodisperse globular particles, the Guinier
approximation is given by I(q) = exp( − Rg2
s2
/ 3). Radius of gyration (Rg) is mechanic size
parameter describing the distribution of mass of the particle. Rg could be defined as root mean
square distances of the excess electron density to the center of gravity of the particle.
Guinier analysis is performed in Guinier plots, where the scattered intensity on natural logarithmic
scale is plotted as a function of scattering vector square (Fig.1). In Guinier region (limited to
maximal scattering vector s<1.3/Rg) the scattering intensity could be fitted by straight line (Fig.1).
The slope of this line is proportional to particle Rg (see the Guinier approximation eq.) and by
extrapolation to zero angle the forward scattering intensity is obtained (I(0)), see molecular weight
estimation. If the Guinier plot in the Guinier zone is not linear, sample is considered to be
aggregated or interacting by intramolecular repulsion (Fig.2). Scattering data from aggregated
samples should not be further analyzed and attention should be focused on sample preparation.
Note, linear Guinier zone is not a proof of monodispersity of the sample: oligomeric mixtures or
samples of complexes containing free subunits exhibit linear Guinier behavior and medium values
of Rg and I(0), see polydisperse systems.
In this example the Guinier regions will be inspected using primusqt interface:
Figure 1. Determination of Rg and I(0) by Guinier analysis. The automated Rg determination
procedure is performed by clicking the "Radius of Gyration" button in the "Analysis" tab. New
window "Primus Guinier Wizard" appears, where the Rg and I(0) is estimated from the Guinier plot
in valid scattering vector range (sRg<1.3)
Figure 2. Detection of aggregation and repulsion by Guinier zone inspection. Two different datasets
are plotted in "Primus Guinier Wizard". Left: if large aggregates are present in the samples, the
typical increase of the scattering intensity is observed in low angles. This could be detected as nonlinear
Guinier zone, illustrated as of upswing fit residuals (green). SAXS data from samples
containing aggregates are not suitable for further analysis. Attention should be driven to improving
the sample quality, as dilution series, centrifugation or altering the buffer and purification
conditions. Right: interparticle repulsive interaction is observed as typical decrease of scattering
intensity at low angles. This could be detected as downswing of fit residuals in Guinier zone. The
repulsion is usually concentration-dependent at could be avoided by dilution.
ed by clicking the "Radius of Gyration" button in the "Analysis" tab. New window "Primus Guinier
Wizard" appears, where the Rg and I(0) is estimated from the Guinier plot in valid scattering vector
range (sRg<1.3)
Figure 2. Detection of aggregation and repulsion by Guinier zone inspection. Two different datasets
are plotted in "Primus Guinier Wizard". Left: if large aggregates are present in the samples, the
typical increase of the scattering intensity is observed in low angles. This could be detected as nonlinear
Guinier zone, illustrated as of upswing fit residuals (green). SAXS data from samples
containing aggregates are not suitable for further analysis. Attention should be driven to improving
the sample quality, as dilution series, centrifugation or altering the buffer and purification
conditions. Right: interparticle repulsive interaction is observed as typical decrease of scattering
intensity at low angles. This could be detected as downswing of fit residuals in Guinier zone. The
repulsion is usually concentration-dependent at could be avoided by dilution.
Molecular weight wstimation
Most straightforward way to estimate the molecular weight is to use the relation between molecular
weight (MW) and Porod volume given by: MW[kDa]≈ Vp[nm3
]*0.625. The factor 0.625 called
“magic number” is known from experimental praxis. This approach used for scattering data from
well folded monodisperse protein solutions results in MW estimation with error less than 20%. Such
a precision is sufficient for rapid estimation of the oligomeric state or to distinguish the complex
formation from mixture of its subunits. Similar approximation for nucleic acids is given by
MW[kDa]≈ Vp[nm3
].
Another way to estimate the MW of protein of interest is to use the Guinier extrapolation of forward
scattering intensity I(0) of protein standard, as BSA (bovine serum albumin) or lysozyme. The MW
estimation of the protein is given by I(0)protein/I(0)standard≈MWprotein/MWstandard. This approach requires
two SAXS measurements and precise concentration determination of the protein and standard
solution.
Another tool for rapid MW estimation is SAXS MoW, available as a web service. SAXS Mow
algorithm uses Porod volume to estimate MW. As a input serves the scattering data on a relative
scale in form of P(r) function file (*.out) obtained from program GNOM, see pair-distance
distribution function. Usually, this approach results in MW estimation with error less than 10%.
In this example, the MW of the protein of interest (expected MW=12 kDa) will be estimated using
the forward scattering I(0) of the protein standard (Fig. 1-3) and compared with the MW obtained
from Porod volume using the MW≈Vp*0.625 approximation (Fig. 4):
Figure 1. The Guinier extrapolation of forward scattering of protein standard. The forward
scattering of the buffer-subtracted, concentration-normalized scattering data from BSA (cBSA=25.64
mg/ml; in in 50 mM HEPES, KCl 50 mM) was extrapolated using Guinier analysis. The
extrapolated value I(0)=59.33 is highlighted in red.
Figure 2. The Guinier extrapolation of forward scattering of protein of interest. The forward
scattering of the buffer-subtracted, concentration-normalized scattering data from protein of interest
(cPROTEIN=mg/ml) was extrapolated using Guinier extrapolation. The extraplated value I(0)=14.78 is
highlighted in red.
Figure 3. The MW estimation using the forward scattering I(0) of protein standard. MW of the
protein standard BSA is 69kDa. By solving the approximation
I(0)protein/I(0)standard=MWprotein/MWstandard, the MW of the protein of interest is estimated: MW≈17.2
kDa. Expected MW of this protein is 12 kDa.
Figure 4. The MW estimation using the Porod volume. The Porod volume Vp=23.77 Å3
of the
protein of interest was determined using primusqt interface. By solving the MW≈Vp*0.625
approximation, the MW=14.9 mg/ml of the protein of interest is estimated.
Kratky analysis
Flexibility and compactness of the biomacromolecule could be qualitatively evaluated by inspection
of Kratky plot, where s2
I(q) is plotted as a function of s (Fig. 2.). The scattering intensity of a
compact, globular particles decay proportionalyy to s-4
, what could be observed as a bell-shaped
curve in the Kratky plot. Scattering intensity of unfolded macromolecules as intrinsically disordered
proteins (IDP) decays slower, e.g. random chain proportionally to s-2
, what could be observed in
Kratky plot as plateau followed by monotonic increase. Scattering intensity of partially unfolded
macromolecules as multi-domain proteins with flexible linkers exhibits intermediate behavior in
Kratky plot (Fig. 2).
Estimation of the folding state by inspection of Kratky plot is routine step of SAXS data. Kratky
plot analysis is used for detection of flexibility, in folding/unfolding experiments, etc. Note,
scattering intensity of rigid but elongated particles decay slower (proportionally up to s-1
), thus
“flexible-like” shape of scattering data in Kratky plot should be considered as indication, rather than
the proof of flexibility.
In this example three typical behavior of scattering data in Kratky plot will be illustrated using
primusqt interface:
Figure 1. SAXS datasets of rigid, flexible and multi-domain protein with flexible linkers. Re-plotting
scattering data into Kratky plot is performed by selecting the “I*s^2 vs. s (Kratky plot)” from the
“Plot” option in the top menu.
Figure 2. Detection of protein flexibility by inspection of Kratky plot. The Kratky plot of the well
folded, compact protein (red) exhibits a clearly defined maxima in the bell-shaped curve. Flexible
polypeptide chain of IDP (green) doesn’t exhibit this clear maxima, rather the plateau following by
increase with higher s values. The Kratky plot of the multi-domain protein complex with flexible
linkers (blue) exhibits intermediate shape.
Porod volume
Volume of the studied particles could be determined from the scattering data. Günther Porod shows
the asymptotic decay of the scattering intensity at high s range. The integral of Q = ∫s2
[I(s) - K]ds is
called Porod invariant (Q), where K is a constant determined to ensure the asymptotical intensity
decay proportional to s-4
at higher s range. The Porod invariant Q it is related to the volume (Vp) of
the particle by Vp=2π2
I(0)/Q., where I(0) is the forward scattering intensity, see Guinier analysis.
The Porod volume is informative for well folded macromolecules, while Porod volume of flexible
macromolecules will appear higher than the real volume. Determined Porod volume of well folded
protein macromolecules is proportional to molecular weight by MM ≈Vp*0.625.
The Porod value could be determined by program DATPOROD stay alone or as is implemented in
primusqt interface.
In this example the Porod volume will be determined using primusqt interface:
Figure 1. Porod volume determination. The Porod volume value is determined by clicking the
“Distance Distribution” button in the “Analysis” tab.
Figure 2. Porod volume determination. New window “Primus Distance Distribution Wizard”
appears. The units of the determined Porod Volume value are the same as used in the scattering
data. In this case cubic Ångströms.
Pair distance distribution function
Indirect Fourier transform of the scattering data results to the pair-distance distribution function of
the single macromolecule. The pair-distribution function P(r) describes the distribution of distances
between pairs of points (electrons) within the macromolecule. By defining correct P(r) function the
maximal chord length of the particle (Dmax) is obtained. The P(r) is used for shape restoration
experiments using the ab initio modeling programs.
In ideal case of monodisperse solution of not interacting homogenous particles, the pair-distance
distribution function is related to scattering intensity by P(r)=r/2π2
∫[sI(s) sin (sr)]ds, where I is
scattering intensity, s is scattering vector and r is distance in real space. To solve this equation the
precise scattering intensity measurement in angular range from zero to infinity is needed. In
practice, the scattering intensity is measured in limited angular range and containing inherent
statistical and systematic errors. The Indirect Fourier methods were developed to overcome this
problems using regularized scattering data and iterative parameterization. By definition the P(r)
function starts smoothly from zero at p(0) and should terminate smoothly to zero at r=Dmax.
Deviation from zero value at p(0) could be caused by incorrect background-subtraction. Not smooth
ends of P(r) function and/or multiple peaks and minima could be sign of incorrectly estimated
Dmax (Fig. 3).
At P12 is the P(r) function determined automatically by the data processing pipeline. P(r) could be
determined “manually” using program GNOM or DATGNOM stand alone or as it is implemented in
the primusqt interface.
In first example the P(r) function is determined (Fig. 1-2), in the second example the incorrectly
determined P(r) function is shown (Fig. 3). Both using the primusqt interface:
Figure 1. Pair-distance distribution function. Buffer-subtracted, concentration-normalized
scattering data opened in the primusqt interface. The automatic P(r) function estimation is
performed clicking the “Distance Distribution” button in “Analysis” tab.
Figure 2. Pair-distance distribution function. New window “Primus Distance Distribution Wizard”
appears, where the P(r) function is plotted on the right side. In the plot on the left the fit of the
Fourier transform of the determined P(r) to experimental data is shown. Determined Dmax of the
particle is highlighted. Estimation of the Dmax value could be interactively changed and the P(r)
and back-fit to the experimental data is updated in real-time. The P(r) function file is saved with
automatic file extension “.out”. This output file is subsequently used for the shape restoration
experiments by ab initio methods. The units of the Dmax value are the same as used in the
scattering data. In this case Ångströms.
Figure 3. Incorrectly determined P(r) function. Analysis of scattering data of poor quality or by
manual under/over estimation of Dmax could result in incorrect not smooth P(r) function with
multiple peaks and minima. Such a P(r) function should be discarded and not used in further
analysis.