Background: The increasing relevance of individual bile acids quantification in biological samples requires analytical standardization to guarantee robustness and reliability of laboratory results. We have organized the first international ring trial, carried out in 12 laboratories, to evaluate the newly developed LC-MS/MS–based test kit for bile acid analysis.
Methods: Each laboratory received a Biocrates® Bile Acids Kit including system suitability test (SST) protocol. The kit is designed to analyze 16 individual human and 19 mouse bile acids. A set of 9 human and mouse plasma samples was measured in replicates. Laboratories were first required to pass the acceptance criteria for the SST. Within the subset of laboratories passing SST criteria, we evaluated how many laboratories met the target criteria of 80% of reported values with a relative accuracy within the 70%–130% range and analytical precisions (%CV) below 30%.
Results: A total of 12 of 16 participating laboratories passed the SST as the prerequisite to enter the ring trial. All 12 laboratories were then able to successfully run the kit and ring trial samples. Of the overall reported values, 94% were within 70%–130% relative accuracy range. Mean precision was 8.3% CV. The condition of CV <30% was fulfilled by 99% of the reported values.
Conclusions: The first publicly available interlaboratory ring trial for standardized bile acids quantification in human and mouse plasma samples showed very good analytical performance, within acceptance criteria typically applied in the preclinical environment. The kit is therefore suitable for standardized quantitative bile acid analysis and the establishment of reference values.
This article presents an effort toward standardization and harmonization in the analysis of individual bile acids, which is of utmost importance not only in metabolomics studies across different laboratories, but also in clinical diagnostics. For this purpose, the newly developed ultra-high pressure liquid chromatography (UHPLC) MS/MS–based kit was evaluated in terms of accuracy and precision in a ring trial. Sixteen sites, including university, hospital, government, and industry laboratories, were invited to participate. Twelve laboratories met the system suitability test criteria and carried out the measurements. Accuracy, precision, and other quality assurance parameters, which are essential for standardization, were all within expected ranges.
Bile acids are products of enzymatic oxidation of cholesterol (1) in the liver. They are stored in the bile duct (2), secreted into the duodenum, and can be found in the peripheral blood circulation. When any process related to the production and metabolism of bile acids is disrupted, the concentrations of bile acids in the peripheral bloodstream are altered. Bile acid analysis in blood, as total bile acids testing, is therefore an established clinical parameter used for several disorders, for example as a prognostic test for hepatitis C virus or for cholestasis testing during pregnancy. The total bile acid content is typically measured using immunoassay/enzymatic techniques. However, these immunoassay-based tests are not able to distinguish the individual bile acids in biological samples (3, 4).
Besides the long-established role in dissolving lipids and fat-soluble vitamins, bile acids play a relevant role in many processes and diseases. Their levels are affected by antibiotics, gut microbiota, metabolic syndrome, type 2 diabetes, Alzheimer disease, sepsis, colon cancer, nonalcoholic fatty liver disease, or other liver-related disorders (5–11). Bile acids, especially the secondary ones formed by the gut microbiota, can express the regulatory effects via nuclear farnesoid-X receptor (FXR) and the G protein–coupled receptor (TGR5) (12). They are also signaling molecules with diverse endocrine/paracrine functions and are known to regulate lipid and glucose metabolism, modulate energy homeostasis, promote cell proliferation and liver regeneration, and even induce programmed cell death (13–18).
The accurate quantitative measurement of individual bile acids and their conjugates is, therefore, essential in diagnostics and toxicology. It is of utmost importance in the development of metabolic signatures in preclinical pharmaceutical research for drug development (19). The quantitative analysis of individual bile acids in turn requires high accuracy and precision of laboratory results, regardless of location, instruments, and staff. The assay robustness and the interlaboratory comparability can be significantly improved by standardizing the entire process. This standardization should be realized as early as possible in metabolomics studies, which can help simplify the later transfer into clinical routine. Harmonization also helps to establish reference values, which are invaluable for diagnostic purposes (20–23). Because of the large chemical diversity, the wide concentration range, and the complexity of the biofluid matrices, LC-MS/MS is the method of choice for individual bile acid analysis (24–30).
We present here an effort toward standardization of individual bile acids analysis based on the development of the first widely available LC-MS/MS–based bile acids kit. The kit is able to measure simultaneously 16 human and 19 mouse bile acids using just 10 μL plasma. The validation of the kit for human and mouse plasma provides an important reference for drug development and translational medicine, where the need to transfer study protocols and experimental designs between species (from mouse to humans) is of paramount importance. The very low sample volume needed (10 μL) additionally fits well into experimental designs where the availability of sample volume is limited, e.g., newborns or mouse models.
To guarantee the quality of the kit, ring trials are mandatory to demonstrate its performance under real-life conditions (31). Here, we present the first international ring trial results based on 12 individual testing laboratories in North America and in Europe, which used the Biocrates Bile Acids Kit to analyze a set of human and mouse plasma samples. Using the kit, we also tentatively determined the normal concentration ranges for a number of bile acids in healthy individuals with a limited number of samples.
Materials and Methods
Standardized method for individual bile acid analysis
Analysis of the ring trial samples using the Bile Acids Kit (Biocrates Life Sciences) was performed as described in the manufacturer's instructions (32). In short, 10 μL of internal standards mixture was pipetted onto the filter spots suspended in the wells of the 96-well filter plate. This filter plate was fixed on top of a deep-well plate serving as a receiving plate for the extract (a combi-plate structure). Subsequently, 10-μL samples were pipetted onto the spots, followed by nitrogen drying. Then 100 μL methanol was added to the wells, and the combi-plate was shaken for 20 min. The combi-plate was centrifuged to elute the methanol extract into the lower receiving deep-well plate, which was then detached from the upper filter plate. After adding 60 μL Milli-Q® water to the extracts and shaking briefly, the plate was ready for LC-MS/MS analysis. All target isobaric bile acids can be baseline separated under either high-performance liquid chromatography (HPLC) or ultra-high pressure liquid chromatography (UHPLC) conditions, whichever is available in the participating laboratory. UHPLC systems were run at a higher flow rate of 0.5 mL/min, enabling a shorter runtime of 5 min. Conventional HPLC systems were run at a reduced flow rate of 0.4 mL/min, resulting in a longer runtime of 11 min. A proprietary reversed-phase UHPLC column (Biocrates Life Sciences) was used. Chromatographic conditions (e.g., mobile phase compositions, gradients, column temperature) were described in detail in the provided user manual. Mass spectrometric detection was accomplished with electrospray ionization in negative ion mode. Because most of the bile acids, especially the unconjugated ones, do not fragment well in collision-induced dissociation, the most intense signals were obtained by scanning the parent ions. These signals were used as quantifiers in multiple reaction monitoring on the triple quadrupole mass spectrometer (MS).
The selected scan mode differs from pure selected ion monitoring in that collision energy is applied in the collision cell to induce the fragmentation of isobaric interferences. Weaker signals arising from fragments were used as the qualifiers. For the quantification, a calibration set with 7 concentration levels and a mixture of 10 internal standards was used. The compound panel, corresponding internal standards, validity in human and mouse samples, and calibration ranges are given in Table 1. Single bile acid standards were purchased from Sigma-Aldrich Handels and Steraloids with purities higher than 99%. The calibrators were prepared gravimetrically, followed by sequential dilution. The concentrations of target bile acids in the calibrators, together with their uncertainties, are shown in Supplemental Table 1 in the Data Supplement that accompanies the online version of this article at http://www.jalm.org/content/vol1/issue2.
Material and test samples for the ring trial
A total of 16 laboratories (Supplemental Table 2 in the online Data Supplement) were invited to this interlaboratory testing trial. Each of these laboratories received a Bile Acids Kit together with an analytical column to set up the assay in its own facility. A detailed instruction manual, together with the data acquisition method and the quantification method, was also provided. Furthermore, to ensure the necessary instrument performance before running the kit, each laboratory received a detailed system suitability test (SST) protocol. A typical test mix chromatogram is shown in Fig. 1A. The main criteria for passing the SST are symmetrical peak shape, baseline separation of isobaric compounds, and a signal-to-noise ratio of at least 10 for the target compounds.
A set of 9 samples was sent out on dry ice for interlaboratory comparison purposes. These were pooled human serum, human EDTA plasma, and mouse EDTA plasma samples, each with 3 samples at endogenous, spiked low, and spiked high concentration levels (see Supplemental Table 3 in the online Data Supplement). These concentration levels were designed to cover the entire range of normal and abnormal concentrations found in real human and mouse samples. The pooled materials, both human and mouse, were purchased from SeraLab. The samples, as well as the QC samples being part of the kit, were measured in 4 replicates to determine the within-run precision of measurements. QC samples were produced from pooled human plasma, charcoal stripped to eliminate endogenous bile acids, spiked to desired concentration levels, and lyophilized.
Data reporting and statistical processing
Each laboratory used the provided quantification method to process the chromatographic data. The concentration values were reported as Excel worksheets and returned to Biocrates. Further data processing and statistical evaluation were carried out on the aggregated data from all laboratories. The performance of each laboratory was assessed based on the relative accuracy and the reproducibility of its measurements. Here we define the relative accuracy as the fractional difference (in percentage) of concentration found by each laboratory compared to the target values, determined beforehand at Biocrates as mean concentrations from 10 measurements on 4 different platforms. The target values were not disclosed to the participants during the trial. Precision was based on the %CV of the 4 replicate measurements. Statistical evaluation was carried out with Microsoft Excel 2010 software (Microsoft Corporation).
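The two performance metrics can be illustrated with a minimal Python sketch. It assumes that relative accuracy is expressed as the replicate mean in percent of the target value (which is how the 70%–130% range reads) and that %CV uses the sample standard deviation; the function names and the concentration values are hypothetical and are not taken from the trial data.

```python
from statistics import mean, stdev


def relative_accuracy(replicates, target):
    """Replicate mean expressed as a percentage of the target value (assumed definition)."""
    return 100.0 * mean(replicates) / target


def percent_cv(replicates):
    """Within-run precision: sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(replicates) / mean(replicates)


# Hypothetical 4 replicate concentrations (umol/L) and a hypothetical target value.
replicates = [0.95, 1.05, 1.00, 1.00]
target = 1.25

accuracy = relative_accuracy(replicates, target)
cv = percent_cv(replicates)
print(f"relative accuracy = {accuracy:.1f}%, CV = {cv:.2f}%")
```

With these illustrative numbers the replicate mean is 1.00 μmol/L, so the relative accuracy is 80% of the target and the CV is about 4%.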
The averages of replicate measurements were used for relative accuracy evaluation. Only values between the lower limit of quantification (LLOQ) and the upper limit of quantification (ULOQ) were included in the statistical evaluation. The following parameters were calculated from the reported data set: the 1st and 3rd quartile (Q1 and Q3) and the interquartile range (IQR = Q3 − Q1). Any value outside the range from Q1 − 1.5 × IQR to Q3 + 1.5 × IQR was declared an outlier.
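The outlier rule above can be sketched in a few lines of Python. The quartile convention is an assumption: the sketch uses the "inclusive" interpolation of `statistics.quantiles`, which mirrors Excel's QUARTILE.INC, but the article does not state which convention was applied. The data values are hypothetical.

```python
from statistics import quantiles


def iqr_fences(values):
    """Return (lower, upper) fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    # Assumption: inclusive quartile interpolation (comparable to Excel QUARTILE.INC).
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def iqr_outliers(values):
    """Values lying outside the fences are declared outliers."""
    lower, upper = iqr_fences(values)
    return [v for v in values if v < lower or v > upper]


# Hypothetical relative accuracy values (%) reported for one sample.
data = [90, 95, 100, 105, 110, 180]
print(iqr_fences(data))
print(iqr_outliers(data))
```

For this hypothetical set, Q1 = 96.25 and Q3 = 108.75, so the fences are 77.5 and 127.5 and only the value 180 is flagged.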
Individual plasma samples for the determination of bile acids profiles
For the determination of representative bile acids profiles, EDTA plasma samples of individuals, 10 humans and 10 mice, were purchased from in.vent (in.vent DIAGNOSTICA) and SeraLab, respectively. Each set of samples consisted of 5 female and 5 male individuals. Human samples were taken from fasting healthy adults between 28 and 56 years of age, all of whom provided consent.
Results and Discussion
System suitability test and instrument performance
An essential element of the Bile Acids Kit is the SST, in which the measurement of the included test mix is used to assess the performance of the instrument before the actual sample preparation. The test mix chromatogram, together with the pressure profile of the (U)HPLC pump, has proven to be a powerful system diagnostic tool. The SST data from one laboratory showed unacceptable peak tailing in the test mix chromatogram (Fig. 1B), which was later established as being caused by dead volume in one of the column selector valves. Two other laboratories experienced problems with their mass spectrometers when operating in negative mode. The instability of signals was found to be related to electronic problems, which could not be solved in time for data collection for the ring trial. A further, more extreme case of nonconformity with the test protocol was a laboratory that attempted to use a high-resolution mass spectrometer, on which the Bile Acids Kit was not yet validated. For the reasons outlined above, 4 of the initial 16 laboratories that had expressed their interest in participating in this study were excluded from the final statistical evaluation of the measurement values.
Comparison between laboratories results
The relative accuracy against target values and the within-run precision (%CV) of replicate measurements were used to evaluate the performance of each laboratory in this ring trial. The acceptance criteria for a laboratory to pass the ring trial were set as follows: at least 80% of all reported values show relative accuracy (evaluated on the mean of 4 replicates) within the 70%–130% range and a corresponding CV below 30%. These criteria were based on the values normally used in ring trials for the analysis of endogenous compounds, such as the proficiency tests organized by the Referenzinstitut für Bioanalytik (Bonn, Germany). Depending on the compounds and their concentrations, these proficiency tests allow maximum deviations typically between ±20% and ±60% (33).
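As a minimal sketch of how the pass/fail decision could be applied to one laboratory's results, the Python below counts the reported values that satisfy both conditions and compares the fraction against the 80% threshold. Treating the accuracy bounds as inclusive, and requiring both conditions on the same value, are assumptions; the function names and the result pairs are hypothetical.

```python
def fraction_acceptable(results, acc_range=(70.0, 130.0), cv_limit=30.0):
    """Fraction of (relative accuracy %, CV %) pairs meeting both criteria.

    Assumption: the accuracy bounds are inclusive and the CV limit is strict.
    """
    low, high = acc_range
    ok = sum(1 for acc, cv in results if low <= acc <= high and cv < cv_limit)
    return ok / len(results)


def lab_passes(results, min_fraction=0.80):
    """A laboratory passes if at least 80% of its reported values are acceptable."""
    return fraction_acceptable(results) >= min_fraction


# Hypothetical (relative accuracy %, CV %) pairs for 10 reported values.
results = [
    (98, 5), (104, 7), (91, 4), (112, 9), (87, 6),
    (101, 3), (95, 8), (109, 10),
    (135, 6),   # accuracy outside 70-130%
    (99, 34),   # CV above 30%
]
print(fraction_acceptable(results), lab_passes(results))
```

In this illustrative case 8 of 10 values are acceptable, so the laboratory just meets the 80% requirement.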
According to the above-mentioned criteria, all 12 laboratories passed the ring trial. Only values measured above the LLOQ of the instrument used were considered in the calculation. In the majority of cases, calibrator level 1 was applied as the LLOQ. One laboratory, however, used a less sensitive mass spectrometer (3200 QTRAP®). In this case, calibrator levels 2 or even 3 were used as LLOQ.
Fig. 2 shows the box plot of relative accuracy and precision of all reported values between different laboratories. Relative accuracy values from all laboratories can be found in Supplemental Table 4 in the online Data Supplement. All 12 laboratories fulfilled the acceptance criteria for passing the ring trial. The mean and/or median relative accuracy are well inside the 80%–120% range, narrower than the required 70%–130% range, and the corresponding CVs are <15%.
Fig. 2 also shows that for a ring trial not involving CLIA/clinical laboratories, the interlaboratory variability is relatively low. The remaining sources of variability may include laboratory-provided reagents (noise/background from solvents, mobile phase modifiers), differences between equipment, and/or pipette calibration within laboratories. Because the participating laboratories are not good laboratory practices (GLP) certified, many sources of laboratory-based bias potentially exist.
Two of the participating laboratories showed the opposite trend in their measurements. Laboratory 1 showed the lowest relative accuracy of all. We speculate this could be because the prepared plate had to be stored in the refrigerator overnight before analysis due to instrument unavailability. The storage at a lower temperature (4 °C) might cause partial insolubility of target compounds in the extracts. Moreover, the plate was not shaken properly before the rerun of the sequence later. No detailed instruction on the shaking of the stored plate was given at the time of the ring trial because the original protocol did not anticipate this eventuality. The instruction was added to the kit user manual later. This step might lead to a different behavior of calibrators, which were measured at the beginning of the sequence and the real samples, which were measured later, when the temperature was stabilized. On the other side of the spectrum, laboratory 9 showed relative accuracy at the higher end of the 70%–130% range compared to all other participants. Evidence suggests this was most likely due to a small problem with the autosampler, causing instability of the injection volume. The CV of replicate measurements, however, was not affected due to the use of internal standards.
To compare the results reported by each participating laboratory with target values predetermined in the kit manufacturer laboratory, the weighted Deming regression analysis was carried out. The regression parameters, together with calculated biases, are given in Supplemental Table 5 in the online Data Supplement. While no systematic error in measurements can be detected (negligible intercepts of regression equations in almost all cases), proportional bias is observed in some laboratories (the slopes of regression significantly deviated from 1). However, the median bias is under 30% limit in all laboratories. This confirms the observation shown in Fig. 2.
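To make the slope and intercept interpretation concrete, here is a simplified Python sketch of Deming regression. Note that it is the plain (unweighted) closed-form variant with a fixed error-variance ratio `delta`, used here as a simpler stand-in; the article applied weighted Deming regression, whose weighting scheme is not described in the text. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def deming(x, y, delta=1.0):
    """Unweighted Deming regression of y on x.

    delta is the assumed ratio of the error variance in y to that in x
    (delta = 1 corresponds to orthogonal regression).
    Returns (slope, intercept). Requires a non-zero covariance.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    diff = syy - delta * sxx
    slope = (diff + sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical target values (x) and laboratory means (y) with a 10% proportional bias.
targets = [1.0, 2.0, 3.0, 4.0]
lab_means = [1.1 * t for t in targets]
slope, intercept = deming(targets, lab_means)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

In this noise-free illustration the slope is 1.1 and the intercept is essentially 0, which corresponds to a purely proportional bias with no constant offset.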
Irreproducible addition of the internal standard during sample preparation can affect the precision of the measurements more seriously, as in the case of laboratory 12, where the overall CV of replicates was slightly increased compared to other laboratories. Here the internal standard was pipetted onto the plate using single pipetting, not with a multistep/repeater pipette as recommended in the manufacturer's manual. It is generally appreciated that the transfer of 10 μL methanol-based internal standard solution by standard pipette is less precise than using a multistep/repeater pipette. The final %CV of the analysis was still within the acceptance criteria. The results summary of individual laboratories is given in Supplemental Table 6 in the online Data Supplement.
Comparison between samples
A similar statistical evaluation was carried out for individual samples. The relative accuracy against the predetermined target values is broadly equivalent for all test samples (see Supplemental Fig. 1 in the online Data Supplement). The least accurate sample was the pooled mouse sample at endogenous concentration level, with 84% of reported values within the 70%–130% relative accuracy range. The other samples at endogenous concentration levels were human plasma and human serum samples, with 88% and 92% of reported values within the 70%–130% relative accuracy range. All other samples showed better accuracy, with more than 92% of reported values inside the above-mentioned range. This is expected because samples at endogenous concentration levels contain several compounds that are present at concentrations near the LLOQ, such as cholic acid, taurocholic acid, and lithocholic acid (LCA) in human samples, and chenodeoxycholic acid, glycocholic acid, and LCA in mouse samples. Therefore, the %CVs in replicate measurements of these samples are also higher than those in spiked samples (Supplemental Fig. 1 in the online Data Supplement).
Comparison between compounds
To compare the measurement performance between compounds in the bile acids panel, the reported values of measurements are grouped by individual bile acids (Fig. 3). Again, the relative accuracy seems to be equivalent for all individual bile acids. The %CV of replicate measurements is, however, significantly higher for LCA, although it remains at an acceptable level, with 93% of reported values under the 30% limit. This is because LCA is the most hydrophobic bile acid and coelutes with many background contaminants, mainly phospholipids, left over after sample preparation. These contaminants strongly influence the signal quality of LCA and lead to its higher %CV in replicate measurements.
Comparison of the bile acids profiles of humans and mice
Through the interlaboratory ring trial described above, the Bile Acids Kit has been shown to provide accurate and precise measurements in human and mouse samples. A limited number of each type of sample (n = 10) was measured to establish the differences in their phenotypes. The profile of individual bile acids in healthy human adults is given as box plots in Fig. 4A. For comparison, the bile acid profile of mice is depicted in the lower panel of the same figure (Fig. 4B).
The total bile acid content, calculated as the sum concentration of all measured individual bile acids, is given in Table 2. Total bile acid content is currently the most widely accepted parameter in clinical tests based on immunoassay. Table 2 shows the statistical evaluation, where the total bile acid content and the ratio of glycine and taurine conjugates (G/T) as well as primary and secondary (P/S) bile acids are calculated. Due to the discrete nature of the measured values from the limited number of individual samples (n = 10 each) involved, the range can be defined by the whiskers calculated for the box plot construction. In this case, the upper whisker (upper range limit) is the maximum value measured or the upward extension by 1.5 × IQR from the 3rd quartile Q3, whichever is lower. The lower whisker (lower range limit) is the minimum value measured or the downward extension by 1.5 × IQR from the 1st quartile Q1, whichever is higher. The concentration ranges of individual bile acids (Supplemental Table 7 in the online Data Supplement) determined in this way are rather rudimentary. Still, these concentration ranges serve as preliminary values because the approach does not rely on the assumption of a normal distribution of the sample set. A more accurate determination of reference concentration ranges will need a larger number of samples with stricter control of donor characteristics, such as health status, age, diet, and time of sampling. These parameters were not considered in this small set of samples and as such result in rather large interindividual variations in bile acid concentrations. This fact has been observed before (25). The measured total bile acids concentration in humans, however, varied from 0.9 to 8.1 μmol/L, which agrees well with the commonly accepted range for healthy human adults of 2–10 μmol/L (34). The average G/T ratio is also in agreement with the recent findings (35) that the G/T ratio increases with age. 
In the aforementioned publication, the authors state that the G/T ratio starts around 1 in newborns, progresses to 3 at 1–5 years of age, and stabilizes at around 7 in adolescence (11–19 years old). Here we found a G/T ratio of 9.2 on average for adults.
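The whisker-based range definition described above can be sketched as follows in Python. As with any quartile-based rule, the result depends on the quartile convention; the inclusive interpolation used here (comparable to Excel's QUARTILE.INC) is an assumption, and the concentration values are hypothetical rather than taken from Table 2 or the Data Supplement.

```python
from statistics import quantiles


def whisker_range(values):
    """Preliminary concentration range from box-plot whiskers.

    Lower limit: the minimum value or Q1 - 1.5*IQR, whichever is higher.
    Upper limit: the maximum value or Q3 + 1.5*IQR, whichever is lower.
    """
    # Assumption: inclusive quartile interpolation.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = max(min(values), q1 - 1.5 * iqr)
    upper = min(max(values), q3 + 1.5 * iqr)
    return lower, upper


# Hypothetical total bile acid concentrations (umol/L) for n = 10 individuals.
totals = [0.9, 1.2, 1.5, 2.0, 2.4, 2.8, 3.1, 3.6, 4.0, 12.0]
low, high = whisker_range(totals)
print(f"range: {low:.2f} to {high:.2f} umol/L")
```

Here the lower limit is the measured minimum (0.9), while the upper limit is capped at Q3 + 1.5 × IQR = 6.25 because the highest value (12.0) lies beyond that extension.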
Table 2 shows that the total bile acid content in mouse plasma (after excluding one outlier) ranged from 2.9 to 21.7 μmol/L, which is significantly higher than that of human plasma. It is interesting to compare the G/T ratio between human (mean = 9.2) and mouse (mean = 0.2). While this parameter has a value far above 1 in human plasma, which means human blood contains much more glycine-conjugated bile acids than taurine-conjugated ones, the situation appears to be essentially inverted in mice. The P/S ratios (ratio of total nonconjugated primary and secondary bile acids) are, however, similar for the two species, i.e., 1.6 on average for human and 1.8 on average for mice. The inverted G/T ratio represents another difference between the mouse and human metabolomic phenotypes that should be considered when using mouse models in translational medicine.
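The summary parameters discussed above (total bile acid content, G/T ratio, and P/S ratio) can be illustrated with a short Python sketch. The compound groupings below cover only a small, illustrative subset of human bile acids and are an assumption, not the kit's full panel from Table 1; the concentrations are hypothetical.

```python
# Illustrative subset of a human panel (assumed grouping, not the kit's full Table 1).
GLYCINE_CONJUGATES = ("GCA", "GCDCA", "GDCA")
TAURINE_CONJUGATES = ("TCA", "TCDCA", "TDCA")
PRIMARY_UNCONJUGATED = ("CA", "CDCA")
SECONDARY_UNCONJUGATED = ("DCA", "LCA", "UDCA")


def group_sum(profile, names):
    """Sum the concentrations of the named bile acids; missing ones count as 0."""
    return sum(profile.get(name, 0.0) for name in names)


def summarize(profile):
    """Return total content, G/T ratio, and P/S ratio for one sample."""
    total = sum(profile.values())
    g_over_t = group_sum(profile, GLYCINE_CONJUGATES) / group_sum(profile, TAURINE_CONJUGATES)
    p_over_s = group_sum(profile, PRIMARY_UNCONJUGATED) / group_sum(profile, SECONDARY_UNCONJUGATED)
    return total, g_over_t, p_over_s


# Hypothetical concentrations (umol/L) for one human plasma sample.
sample = {
    "GCA": 0.4, "GCDCA": 1.0, "GDCA": 0.4,
    "TCA": 0.05, "TCDCA": 0.1, "TDCA": 0.05,
    "CA": 0.2, "CDCA": 0.3,
    "DCA": 0.3, "LCA": 0.02, "UDCA": 0.08,
}
total, g_over_t, p_over_s = summarize(sample)
print(f"total = {total:.2f} umol/L, G/T = {g_over_t:.1f}, P/S = {p_over_s:.2f}")
```

A G/T value well above 1, as in this made-up human profile, indicates a predominance of glycine conjugates; a mouse-like profile would give a value well below 1.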
This newly developed, simple, and robust Bile Acids Kit proved to be a reproducible method to measure bile acids between laboratories within the context of a ring trial. Very good performance in terms of bias against the predetermined target values and precision of replicate measurements in this ring trial has shown that the kit is well suited for standardization and harmonization in individual bile acids measurements. The kit appears to be a suitable method for helping to establish reference bile acid concentration ranges in human and mouse biofluids, though many more samples need to be analyzed to establish these ranges. Additionally, we observe that the total content of bile acids in mouse plasma is significantly higher than that in human plasma. Taurine conjugates are prevalent in mouse samples, while glycine conjugates are present at much higher concentrations in human samples. The balance between primary and secondary bile acids does not appear to differ between mouse and human plasma.
The following additional individuals also contributed equally to the execution of the ring trial as other coauthors: Jerzy Adamski,11 Donatella Caruso,9 Michael Daxböck,1 Annette Garbe,7 Doreen Kirchberg,1 Fiona Liddicoat (Waters Wilmslow, UK), M. Arthur Moseley,6 Herbert Oberacher,2 Martin Post,8 Florence I. Raynaud,3 Denis Reynaud,8 Simon Schafferer,1 and Ines Zitturi.1 The superscripted numbers denote their affiliations (see title page).
Nonstandard abbreviations:
- HPLC: high-performance liquid chromatography
- UHPLC: ultra-high pressure liquid chromatography
- SST: system suitability test
- LLOQ: lower limit of quantification
- IQR: interquartile range
- LCA: lithocholic acid
- G/T: ratio of glycine and taurine conjugates.
Authors' Disclosures or Potential Conflicts of Interest: Upon manuscript submission, all authors completed the author disclosure form.
Employment or Leadership: H. T. Pham, Biocrates Life Sciences; T. Koal, Biocrates Life Sciences.
Consultant or Advisory Role: M. Rauh, Biocrates.
Stock Ownership: None declared.
Honoraria: None declared.
Research Funding: None declared.
Expert Testimony: None declared.
Patents: None declared.
Role of Sponsor: No sponsor was declared.
- Received May 9, 2016.
- Accepted July 1, 2016.
- © 2016 American Association for Clinical Chemistry