Abstract
The biological effect of wear of articulating surfaces is a continued concern with large joint replacements and, likewise, of interest for total disc replacements. There are a number of important biotribological testing parameters that can greatly affect the outcome of a wear study in addition to the implant design and material selection. The current ASTM and ISO wear testing standards/guides for spine arthroplasty leave many testing parameters open to the user's choice. These factors include but are not limited to the sequence of kinematics and load, phasing, type of lubricant, and specimen preparation (sterilization and artificial aging). The spinal community should critically assess wear studies and be cognizant of the influence of the selected parameters on the test results.
The bone and joint sequelae associated with wear of articulating surfaces are continuing concerns of total joint replacement, and are similarly highlighted for total disc replacements (TDR). For TDRs, this focus is primarily based on expectations of even longer implantation lifetimes, because TDRs are implanted in younger patients than total hip and knee replacements, on difficulties with anterior revision surgery, and on the presence of periprosthetic neural elements. Although the literature helps to avoid past total joint errors and accelerate designs, research is still needed to adapt this technology to spinal motion preservation. Because there are gaps in our understanding of the biomechanics and wear behavior of TDRs, variations in test methods have resulted. Standardized methods are being adopted, but the variety in spinal motion devices has prompted manufacturers to customize wear testing techniques. This can lead to concerns about the relevance of some test methods. A thorough discussion of total disc wear testing methodology may help compare wear test methods and, therefore, interpret results in terms of test validity. Although not within the scope of this paper, the biocompatibility of wear debris, which depends on particle size, shape, and composition, is a critical factor in interpreting wear test results. Wear rates of devices with different materials cannot be directly compared because of differences in biological response, particle transport, and biological byproducts.
The biotribological performance of devices depends on many factors, including implant inputs, such as bearing materials (metals, ceramics, polymers, and elastomers) and bearing design, and test conditions, such as applied motions, loads, and fluid environment. In general, soft bearing wear, such as with polyethylene and other polymers (eg, poly[aryl-ether-ether-ketone] [PEEK]), tends to be dominated by adhesive wear, while metal-on-metal (MOM) bearing wear, with cobalt alloy or stainless steel, is dominated by abrasive and surface fatigue wear. Wear mechanisms determine the type of damage, volume of wear, wear rate, particle size, and overall trends such as run-in and steady-state behavior, and can be sensitive to test conditions. For example, cyclic, unidirectional (reciprocating curvilinear) motions represent the greatest challenge for MOM bearings, because of roughening due to abrasive wear,1, 2 but the least challenge for polyethylene bearings due to polymer chain alignment.3, 4
Wear test methods must also consider bearing kinematics. Ideally, devices are cycled so that they accurately replicate implanted motion patterns, which requires careful fixturing and test frame design. For conforming, fixed center of rotation (COR) devices such as the ProDisc-L Total Disc Replacement (Synthes, West Chester, PA) and Maverick Artificial Disc (Medtronic, Memphis, TN), the device's COR is usually aligned with the simulator's COR. Variable COR devices, like the Charite Artificial Disc (DePuy Spine, Raynham, MA) or the Prestige Cervical Disc (Medtronic, Memphis, TN), may be set up in multiple ways, which may produce significantly different motion patterns. For example, a Charite disc may be placed so that one or both of the core's bearing surfaces slide, and a Prestige disc may be fixtured to either slide or roll in its trough. Which setup is chosen should depend on the device's demonstrated in vivo behavior.
Although the fluid environment has been shown to affect total joint wear,5–8 little is known about the fluid volume, content, or turnover in the disc space, post-discectomy. The fluid volume could vary in vivo with the formation of a pseudocapsule, as the device would be immersed; otherwise, the disc space would be merely wet. There is some evidence from disc retrievals that suggests the fluid environment contains protein content,9 but the exact content is unknown. Additionally, little is known with regards to the fluid turnover. If the fluid is replenished, the particles may be removed from the bearing region to local tissues; or if the fluid is static, the particles could be recaptured by the bearing and accelerate wear as third bodies. As research into answering these questions continues, the currently accepted practice for wear testing is to immerse the implants in a bovine serum solution with a protein concentration up to 30 g/L.10 The test fluid is replaced every 500,000 to 1,000,000 cycles to minimize the effect of serum degradation on wear and inspect the test specimens.
There is currently 1 wear test standard and 1 wear test standard guide for TDRs. Although both reflect a consensus of the organizations' members from industry, academia, medicine, and regulatory agencies, these standards developed independently and arrived at different procedures using different philosophies. While generally quite alike, they differ in their scope and kinematics. The American Society for Testing and Materials, International (ASTM) document, ASTM F2423-05 Standard Guide for Functional, Kinematic, and Wear Assessment of Total Disc Prostheses,11 is a guide and, therefore, less specific in its protocol than a Standard Test Method. It encompasses both articulating bearings and elastomeric devices and allows for motions to be applied in either unidirectional or multidirectional (coupled) paths. (It should be noted that the ASTM Standard Guide, like other standards, continues to evolve.) It prescribes larger ranges of motion, close to the maximum ranges of healthy individuals. In contrast, the International Organization for Standardization (ISO) test method, ISO 18192-1:2008,10 is specific to sliding bearings and prescribes multidirectional, relatively lower ranges of motion, reflecting the ISO committee's expectation of actual in vivo usage. It should be noted that although attempts have been made to make the test methods relevant to physiologic conditions, neither document is a performance standard, and both caution the user that clinical performance may differ from the test results. The documents advise considering other testing methods to assess other potential failure mechanisms and even different wear conditions. Both standards assess just 1 of the 4 modes of wear defined by McKellop.12 Wear can be produced by different surface interactions, and both standards investigate only the intended wear mode, as opposed to third-body wear, impingement wear, or extraneous wear due to micromotion against the vertebral endplate.
Influential wear parameters
There are a number of important biotribological testing parameters, such as load and kinematics, test fluid media, and specimen preparation, that can greatly affect the outcome of a wear study. For in vitro biotribological evaluations of TDR to be clinically relevant, these testing parameters must be carefully selected.
Load and kinematics
Load and kinematics can influence the wear, wear rate, and type of wear mechanism generated in a wear test. The load and motion profiles, which essentially describe the direction and extent to which one component slides over the other under a prescribed compressive force, are typically controlled by the user's selection of the amplitude, waveform (typically a constant or cyclical load), phasing (ie, timing of the motion in one direction against that in another), and specimen orientation. Specimen orientation can introduce shear loads between the articulations, as recommended by the ISO standard10; but the effect may depend on the implant design. The test frequency combined with the total device range of motion determines the sliding speed and distance, which can, in turn, affect surface temperatures, lubrication, and wear (volume and mechanisms). Proper selection of these parameters will allow the implant to be evaluated in a realistic, in-service state.
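As a rough illustration of how frequency and range of motion set the sliding speed and distance, the sketch below treats a single sinusoidal rotation over a ball-in-socket bearing. The radius, amplitude, and frequency are assumed values for illustration, not taken from either standard, and real simulators couple several axes at once:

```python
import math

def sliding_kinematics(radius_mm, amplitude_deg, freq_hz, cycles=10_000_000):
    """Sliding distance and speed for one sinusoidal rotation
    theta(t) = A*sin(2*pi*f*t) over a ball of the given radius.
    Small single-axis sketch; not a simulator model."""
    A = math.radians(amplitude_deg)                      # amplitude, rad
    dist_per_cycle = 4 * A * radius_mm                   # arc swept per cycle, mm
    mean_speed = dist_per_cycle * freq_hz                # mm/s
    peak_speed = 2 * math.pi * freq_hz * A * radius_mm   # mm/s
    total_km = dist_per_cycle * cycles / 1e6             # total sliding distance, km
    return dist_per_cycle, mean_speed, peak_speed, total_km

# assumed example: 10-mm ball radius, +/-7.5 deg amplitude, 1 Hz, 10 MC
d, v_mean, v_peak, total = sliding_kinematics(10.0, 7.5, 1.0)
```

With these assumed inputs, a 10 million cycle test accumulates on the order of tens of kilometers of sliding, which is why modest changes in amplitude or radius translate into large differences in total sliding distance.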
The bearing biomaterial and the type and magnitude of motion between the articular components are of great importance with respect to implant wear. For example, crossing-path motion, which occurs when a specific location on the implant is subjected to motion in different directions during a wear cycle, can influence wear. The analysis of wear tracks on explanted ball-in-socket lumbar TDRs suggests crossing-path motion.9, 13 This result is not surprising, based on the published literature characterizing lumbar spinal motions for various activities of daily living.14–16 The analysis of wear tracks on cervical TDRs suggests curvilinear motion for a ball-in-trough design17 and asymmetrical motion patterns for a ball-in-socket design.18 It is important to consider the biomaterial, bearing design, and spine location when selecting the type of motion to apply in a wear test.
The proposed motions in the ISO 18192-1 wear standard10 lead to crossing-path motion for both the lumbar and cervical test conditions. However, because the phasing between the 3 degrees of freedom (flexion-extension, lateral bending, and axial rotation) is not defined in the ASTM F2423-05 guidance document,11 it is possible to obtain either curvilinear motion or crossing-path motion depending on the input profile selected by the user. Additionally, the specified range of motion differs between the two standards, which will affect the total sliding distance and wear (Table 1).
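The effect of phasing can be made concrete with a small-angle sketch: treating two sinusoidal rotations (eg, FE and LB) as the x and y excursions of a contact point, the area enclosed by the resulting track serves as a simple crossing-path index that depends only on the phase offset. This is a hypothetical illustration, with equal, normalized amplitudes assumed; it is not an analysis prescribed by either document:

```python
import math

def crossing_index(phase_deg, n=360):
    """Normalized area enclosed by the contact track of two equal-amplitude
    sinusoidal rotations separated by a phase offset.
    0 -> straight (curvilinear) track; 1 -> circular (fully crossing) track."""
    phi = math.radians(phase_deg)
    pts = [(math.sin(2 * math.pi * i / n), math.sin(2 * math.pi * i / n + phi))
           for i in range(n)]
    # shoelace area of the closed track, normalized by the unit circle's area
    area = 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1])))
    return area / math.pi
```

In-phase inputs (0-degree offset) reproduce a straight, curvilinear track, a 90-degree offset yields a near-circular crossing path, and intermediate phasings fall in between. This is one reason two laboratories following the same guidance document can obtain very different wear results from nominally identical motion amplitudes.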
It has been shown that MOM articulations made from cobalt alloys demonstrate a propensity for self-polishing under crossing-path motion, usually resulting in less wear than with curvilinear motion.1, 2 In contrast, wear of ultrahigh-molecular-weight polyethylene (UHMWPE) is substantially lower when tested under curvilinear compared to crossing-path motion conditions.3 This finding is typical for noncrosslinked polymeric material, because of the preferential alignment of the polymer chains with the direction of motion.19–21 In simulator and retrieval studies of hip arthroplasty, it has been shown that curvilinear motion can underestimate the wear rate of UHMWPE in vivo and, possibly, overestimate MOM wear rates.22
While flexion-extension (FE), lateral bending (LB), and axial rotation (AR) are applied simultaneously per the ISO 18192-1 wear standard10 (Figs. 1 and 2), various test conditions are allowed in the ASTM F2423-05 guidance document.11 Per the latter, the user can impose each degree of freedom (FE, LB, and AR) sequentially (on the same device), concurrently, or as a combination of both, as long as each motion is applied for 10 million cycles (MC). This results in a testing duration of 30 MC in the first case, 10 MC in the second, and 20 MC in the third. Sequential testing is not only expected to produce significantly different results from a simultaneous multidirectional wear test, but, as described above, the difference will also vary with the materials. The type and magnitude of wear will be affected by the testing sequence, as the topography of the articular surfaces will change after every motion, resulting in surface damage prior to the next test sequence.2
In addition to kinematics, the loading conditions play an important role in the wear behavior of TDRs. The waveform and magnitude of load affect the type of lubrication and contact stresses. An increase in load or test frequency can raise the flash temperature, which, in turn, can alter the lubricant properties and influence the wear mechanism. As temperature increases, the viscosity changes and the serum loses its lubricating properties5; and when the temperature exceeds a certain threshold, the proteins can decompose, especially under shear.23 For polymers such as polyethylene, as the flash temperature approaches the glass transition temperature, a sharp decrease in the hardness and Young's modulus can occur, resulting in greater wear.24 For metallic components, the increased contact pressure associated with a higher load can increase adhesive and abrasive wear between the surfaces through subsurface fatigue.25
The nature of the applied load, ie, whether its magnitude is constant or varying, can have a negative or positive influence on wear. A cyclic load can lead to fatigue and fracture of the asperity tips in the contact area between the articular surfaces. Unlike a constant load, however, a cyclic load may reduce wear by generating squeeze-film lubrication when biological fluid is present. An increase in the frequency of the motion profiles can increase the potential for lubrication: as the relative speed between the mating components increases, so does the entrainment velocity. It should be noted that most TDR wear tests are conducted at frequencies of 1 Hz or higher, mostly to expedite a 10 million cycle wear test. This contrasts with the expected in vivo cycle frequency, which may be less than 1 Hz. The greatest effect sliding speed can have on wear is to shift the dominant lubrication regime, eg, from boundary lubrication to fluid film lubrication. However, fluid film lubrication may not be possible in most articulating TDRs because of the implant geometry and short sliding distance, suggesting that frequency is less important. There is, however, another set of competing factors: while the surface sliding speed of TDRs, even at 2 Hz, is lower than that of hips (due to the respective radii), nearly the entire surface of a TDR is worn throughout the cycle (assuming very low diametric clearance in hard-on-hard bearings), in contrast to hips, in which some bearing regions experience no stress at some point in the wear cycle.
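The point about TDR versus hip sliding speeds can be checked with back-of-the-envelope numbers. The radii and amplitudes below are assumed, illustrative values (a cervical TDR ball of roughly 6 mm and a hip head of roughly 16 mm), not data from any particular device:

```python
import math

def peak_speed_mm_s(radius_mm, amp_deg, freq_hz):
    # peak sliding speed of theta(t) = A*sin(2*pi*f*t): v_peak = 2*pi*f*A*r
    return 2 * math.pi * freq_hz * math.radians(amp_deg) * radius_mm

# assumed geometry and motion, for illustration only
tdr_speed = peak_speed_mm_s(6.0, 7.5, 2.0)    # small-radius TDR run at 2 Hz
hip_speed = peak_speed_mm_s(16.0, 25.0, 1.0)  # large-radius hip run at 1 Hz
# even at twice the frequency, the TDR's peak sliding speed
# remains well below the hip's
```

Under these assumptions the TDR's peak sliding speed is several times lower than the hip's, consistent with the argument that a 2 Hz TDR test does not reach hip-like entrainment velocities.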
Lastly, there have been very few reports describing the activities of daily living for lumbar or cervical spines. Although there is some understanding of the load magnitudes26, 28 and the extent of motion,29–34 the motion combinations and their daily frequencies are not well reported35–37 or understood. Without a good understanding of this daily activity, and without long-term well-placed clinical retrievals, the predictive value of wear test methods may be limited.
So far, the influence of dynamics on wear has been discussed independent of the test equipment. In reality, different simulator designs have distinct Euler angle sequencing between the motions; ie, because a simulator mechanically applies one motion on top of another, a specific final orientation results that differs from that of a simulator using a different mechanical linkage,38 although the differences may be minor.39 As with all comparisons, one should exercise caution when comparing biotribological results from different simulators.
Test fluid media
Review of the in vitro wear simulation literature for hip and knee testing has clearly shown that the type of lubricant used has a significant effect on both the magnitude of wear and the morphology of the wear particles.40–42 The large joint literature indicates that use of nonphysiological lubricants, such as deionized water and saline, can lead to wear that is highly unrepresentative of in vivo results.40, 43–45 Physiologically based lubricants with protein concentrations similar to those found in human synovial fluid, coupled with appropriate load and motion inputs, have been shown to allow close predictions of in vivo wear performance for total hip and knee replacements.
Today, within the hip and knee community, it is generally accepted that bovine and/or calf serum is an acceptable lubricant for in vitro wear testing,42, 46 where the simulator fluid ideally mimics the synovial joint proteins in type and concentration. That being said, there is still considerable discussion on the specifics of the fluid. Wang et al.40 demonstrated the influence of protein concentration on the wear rate of UHMWPE, with no measurable wear when no proteins were present and clinically relevant wear rates at protein concentrations from 5 to 25 g/L for a synovial hip joint. Studies have shown that wear rates can differ by up to 15% within this range. Schwenke et al.,41 in a recent study, showed a 50% difference in wear rates of total knee components, using the same bovine serum but with different protein concentrations and additives. They speculated that additives can affect the coefficient of friction at the articular surfaces and may even alter material properties of the implant directly, thereby directly affecting wear of the components. For metal-on-metal bearings relying on boundary lubrication, a similar sensitivity to protein concentration is expected because of the importance of fluid film thickness,47 which depends on fluid viscosity. In general, the lubricant plays a critical role in determining the accuracy and validity of simulator testing.
In addition to protein concentration, additives are commonly employed to treat the serum for specific issues related to the benchtop test environment. Antimicrobial additives stabilize the serum by minimizing bacterial and fungal contamination, thus slowing protein degradation. Ethylene-diaminetetraacetic acid (EDTA) can be added to bind the calcium in the bovine serum, thus minimizing precipitation of calcium phosphate onto the bearing surfaces. While these additives are more standardized and less controversial than the serum concentration, failure to include them may cause surface changes affecting friction and wear properties.5
Review of ASTM and ISO standards for wear testing of total hip, total knee, and total disc components shows significant differences in their prescribed lubricants (see Table 2). While bovine/calf serum has become the standard, protein concentration and the type and amount of additives have not been fully defined. Regarding TDRs, little is known about the fluid volume or content, and the standards have been based on knowledge gained from the large joint industry. This may not fully translate to the intervertebral disc space after implantation.
Specimen selection and preparation
Most devices come in various sizes in order to accommodate patient anatomy. With some designs, the bearing size or shape may also change. Hip wear studies have shown that changing the bearing size (ie, bearing radius) significantly affects wear results.52 It is important to consider whether the worst-case device is being tested, and whether all potential failure modes are being addressed. For example, wear testing of the Discover Artificial Cervical Disc (DePuy Spine, Raynham, MA),53 which features 2 different sizes of metal-on-polyethylene bearings, showed that while the smaller diameter bearing had a lower wear volume, the larger diameter bearing had a smaller wear rate through its thickness (akin to hip liner penetration). For metal-on-metal devices, hip studies have shown the importance of manufacturing tolerances.47 In particular, the diametric clearance has been shown to affect steady-state wear rate.54, 55 In addition, studies have shown decreasing wear rates for increasing bearing radii47, 55–58 and decreasing surface roughness.59 For these devices, wear testing should probably be performed on the smallest sizes.
The preparation of specimens for a wear test will depend on the materials used in the TDR. For all components, regardless of materials, pretest characterization can include, but is not limited to, surface profilometry, dimensional analyses, material characterization, and photodocumentation. Additional consideration must be given to polymeric specimens, as several factors, such as sterilization method, oxidation potential, and presoaking, can influence wear test results.
There are several different sterilization methods, such as ethylene oxide, gas plasma, and gamma radiation at various doses, that can influence the wear of polymers. Modern sterilization and packaging techniques minimize the effects of shelf aging,60, 61 but implantation may cause some level of oxidative change of the exposed surfaces similar to that seen in retrieved hip liners. As with hips and knees, these changes may cause some decrease in mechanical properties and potentially an increase in wear rate. Artificial aging procedures can be used to precondition test specimens, accelerating the oxidative degradation that would replicate shelf aging or in vivo oxidation during a wear test. The type of polymer,62, 64 the fabrication method, and the polymer's susceptibility to degradation must all be considered when preparing polymeric components.
Wear assessment
The standardized method for assessing wear is gravimetric weight loss. Volumetric wear is subsequently determined by dividing the mass loss by the density of the specimen material. Although this is usually straightforward, in some cases it can become complicated, such as when determining volumetric loss in devices with a coating of a different material: because the coating may have a different density than the substrate, wear assessment becomes difficult after a delamination event or wear-through of the coating. For all materials, a high-precision scale with accuracy and repeatability better than ±10 µg is employed to ensure detection of very small losses in weight. Even so, variations due to specimen cleaning, environmental conditions, and operator technique may cause apparent "negative wear" in extremely low-wearing devices. To help ensure accuracy and repeatability of the scale, standard calibrated masses are weighed before and after the wear test specimens.
Fluid uptake by polymeric specimens can mask wear. This error, due to fluid sorption, can be reduced by presoaking the polymeric TDR specimens until they reach saturation, which may require several weeks. Additionally, load soak control specimens are used to compensate for the further increase in fluid sorption that occurs under cyclic loading. These control specimens are cleaned and weighed along with the test specimens to determine the remaining fluid sorption. Vacuum drying prior to weighing is also typically used to minimize this potential error source.
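The gravimetric bookkeeping described above, correcting the apparent mass change by the soak-control gain and converting to volume, can be sketched as follows. The masses, soak gain, and UHMWPE density (~0.93 mg/mm³) are assumed values for illustration, and laboratories differ in their exact sign conventions and soak corrections:

```python
def wear_assessment(pre_mg, post_mg, soak_gain_mg, density_mg_mm3):
    """Gravimetric wear corrected for fluid sorption via a load-soak
    control, then converted to volume. Schematic bookkeeping only."""
    apparent_loss = pre_mg - post_mg            # may be negative ("negative wear")
    wear_mass = apparent_loss + soak_gain_mg    # add back mass gained by sorption
    wear_volume = wear_mass / density_mg_mm3    # mm^3
    return wear_mass, wear_volume

# illustrative numbers only
m, v = wear_assessment(pre_mg=15000.0, post_mg=14990.0,
                       soak_gain_mg=2.5, density_mg_mm3=0.93)
# m = 12.5 mg of wear; v ~ 13.4 mm^3
```

The example shows why soak controls matter: an uncorrected apparent loss of 10 mg would understate the true wear mass by the 2.5 mg of fluid the specimen absorbed during the test.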
Several other methods can be used to further characterize wear and help identify wear mechanisms through the observed damage modes. Surface characterization can be performed both macroscopically and microscopically to look for the presence of damage modes such as burnishing, abrasion, scratching, pitting, plastic deformation, fracture, fatigue damage, and embedded debris. The amount of surface damage may not correlate with the amount of wear. For example, a polyethylene component may exhibit considerable surface damage, even though very little wear has occurred.65 Conversely, if the polyethylene is undergoing rapid wear, any scratches or pits that happen to form in the contact zone may soon be polished out, leaving a smooth surface with little damage. Confusion may also occur between the appearance of metal bearings and actual wear, as large areas of abrasion may not result in significant mass loss. Surface characterization typically includes profilometry and photodocumentation, which are used to monitor the size, shape, and roughness of the wear scar.9
Additionally, serum samples obtained during a wear test can be analyzed by inductively coupled plasma mass spectrometry (ICP-MS) to determine metal ion concentration.56, 66 This method has been successfully used with very low wearing metal-on-metal hips to determine wear, because it removes any variability from cleaning and handling of the specimens. Chemical tracers67 may also be used to determine wear from the fluid bath for polymer articulations. Tracers are added to polymer powder before manufacturing, and as the polymer specimen wears the tracer is released into the test serum. Both metal ion and chemical tracer levels correlate with wear and are indicative of the wear curve phase (eg, run-in or steady-state). Several 3-D mapping techniques (coordinate measuring machine, microCT, radiostereometric analysis, etc.) have been developed to determine wear or penetration rate with varying degrees of accuracy, depending on the method and equipment, and have also been used in retrieval analysis to validate in vitro wear tests. For polymers, the dimensional measurements will include changes due to creep and wear, which can be separated through analysis of load soak controls.
Lessons learned from the large joint industry can be applied to TDR, but additional research in the areas of biomechanics and wear behavior of TDRs is still ongoing in order to fully adapt this technology to spinal motion preservation. Currently, wear tests are best used to compare designs, which is the explicit purpose of the current test standards; but, ideally, tests are developed to simulate implantations, which reflect the totality of in vivo usage. Simulations will be validated by correlating the preclinical in vitro wear test results with explant analyses, including surface characterization, wear penetration rates, and wear particle size and shape. Ideally, tests replicate long-term and successful implantations, as well as predict early failure modes. Without long-term explant results with various designs, one cannot necessarily declare that one device has superior wear resistance over another. Better in vitro performance does not necessarily mean better in vivo performance; both may be acceptable.
Despite the limitations of the current knowledge base on TDR wear testing, and the variety of parameters that may be employed, the rates and case reports of osteolysis are low,68, 69 and retrievals linked to implant wear, either from implant damage leading to instability or height loss or from inflammation from debris, are also few. Long-term clinical results from various designs and materials are needed. Nevertheless, wear testing is a critical gating item for preclinical evaluation. Research will improve the predictive power of test results and thus discriminate between dangerous, acceptable, and over-engineered designs.
Conclusions
The biotribological performance of TDRs depends on many factors, including bearing design and material selection. Additionally, there are a number of important biotribological testing parameters (eg, kinematics and load, phasing, test fluid medium, etc.) that can greatly affect the outcome of a particular wear study. The spine wear testing standards/guides leave many open-ended choices that will influence results (eg, test sequence). It is essential to compare the test method to what is expected in vivo. Does the test method reflect a genuinely possible worst-case condition for that material and design? The spinal community needs to be critical in its assessment of wear test results so that clinicians can better judge implant designs. Retrievals need to be thoroughly studied for all designs. Furthermore, the interpretation and comparison of wear results across device designs, laboratories, and standards need to be made with an understanding that some tests better reflect in vivo usage than others.
- © 2009 SAS - The International Society for the Advancement of Spine Surgery. Published by Elsevier Inc. All rights reserved.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-Noncommercial 3.0 Unported License, permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.