Protocol and overview
The AMH2 study [Anti-Müllerian Hormone – At My Home (AMH2)] was a head-to-head-to-head within-person crossover trial conducted in Boston, Massachusetts. This study design was selected because each subject serves as their own control, effectively reducing inter-subject variation and improving statistical power. The different serum tests were completed sequentially during the same session, ensuring the samples were directly comparable with no further need for randomization. Institutional research ethics board (IRB) approval was obtained from Ethical & Independent Review Services IRB, a third-party IRB accredited by the Association for the Accreditation of Human Research Protection Programs. The device examined in this trial was the TAP II by YourBio Health (formerly Seventh Sense Biosystems). The trial was registered on ClinicalTrials.gov as NCT04784325. No changes were made to the methods after commencement of the trial.
Figure 1 outlines the differences between the TAP device and its comparator, the ADx 100 card (ADx). The TAP device is a small unit that attaches via suction to the back of the patient’s arm and uses a microneedle array to pierce capillaries close to the surface of the skin, collecting the specimen in a vial that then detaches from the device. In comparison, the ADx card collects a few drops of blood that a patient obtains from a lanced fingertip; the blood is then diluted during lab processing for analysis.
The most notable clinical differences between the devices are the collection of whole blood (TAP) versus dried blood (ADx) and the use of microneedles (TAP) versus a finger-prick (ADx).
Recruitment and study procedure
Recruitment used social media platforms, email recruitment messages, announcements within women’s professional networks, referrals from physicians on Turtle Health’s Medical Advisory Board, and outreach to subjects in Turtle Health’s SELF-HELP (Sonograms Enable Looking Forward – Home Examinations Led by Providers) ultrasound validation study. Subjects were all women between the ages of 20 and 39 (inclusive) who were able to freely give consent electronically, spoke native or fluent English, had a high school degree or equivalent, and lived within driving distance of Boston. Sponsor employees, women who had recently given birth or had fewer than 3 postpartum menstrual cycles, women who were currently or possibly pregnant, and women with known bleeding disorders or coagulopathies were excluded.
In total, 69 women were screened, and 41 participated. Of the 28 women who were screened but not enrolled, 7 were unable to enroll due to inadequate space in their age cohort, 8 were in the screening process when the study completed, 7 never responded after receiving a consent form electronically, and 1 was not eligible because she did not live nearby. An additional 8 women were found not to meet study inclusion criteria. Five women withdrew after signing the consent form but prior to the start of the trial, with reasons including contracting COVID-19, scheduling difficulties, and failure to respond to follow-up. An additional 7 women who expressed interest in the study never completed the screening questionnaire provided.
This number of participating subjects (N = 41) was more than twice the sample size required under the Clinical Laboratory Improvement Amendments (CLIA) lab requirements and provided over 80% power to detect a one-half standard deviation difference in device performance. Inclusion criteria were healthy women aged 20–39 who were able to consent electronically (consent was electronic because of COVID-19), which for the purposes of this study was defined as those who spoke native or fluent English, had a high school degree or equivalent, and were within driving distance of Boston. Exclusion criteria included Turtle Health employees, women who did not speak English natively or fluently, and postpartum women who had fewer than 3 postpartum menstrual cycles.
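The power statement above can be checked approximately. The sketch below assumes a two-sided paired test at α = 0.05, a normal approximation, and an effect size expressed in units of the standard deviation of within-person differences; none of these choices is stated in the study, and the function names are illustrative only.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def paired_power(n: int, effect_size: float, z_crit: float = 1.959964) -> float:
    """Approximate power of a two-sided paired test (normal approximation).

    effect_size is the mean within-person difference divided by the
    standard deviation of those differences (an assumption of this sketch).
    z_crit defaults to the two-sided 5% critical value.
    """
    ncp = effect_size * math.sqrt(n)  # noncentrality under the alternative
    return normal_cdf(ncp - z_crit) + normal_cdf(-ncp - z_crit)


power = paired_power(n=41, effect_size=0.5)
print(f"approximate power: {power:.2f}")
```

Under these assumptions the approximation comes out at roughly 0.89, which is consistent with the "over 80%" figure; an exact t-based calculation would be slightly lower.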
For each subject, the study consisted of a single visit to a medical office where blood was drawn using three modalities sequentially in the same ~30-minute session: two self-administered TAPs, one self-administered ADx card, and one venipuncture vial drawn by a professional phlebotomist. For both TAP and ADx card samples, participants were instructed to follow the written patient labeling and received no additional verbal direction from the trial coordinator outside of trial-specific instructions. The draw order was consistent for each participant. Samples were de-identified of patient data and labeled with assigned identification numbers. One TAP for each woman was 2-day shipped to the processing lab via the United Parcel Service (UPS®) in Turtle Health’s commercial packaging to simulate the shipping process that would be required of at-home consumers should this product reach the market. All other samples (remaining TAP, ADx card, and venipuncture vial) were hand-delivered to the lab within 6 hours of the blood draw. The venipuncture sample served as the reference standard for AMH for each subject and was processed by the lab upon receipt. Both TAP samples and the ADx sample were processed by the lab at t = 72 hours.
Samples were processed at BioAgilytix, an independent, Boston-based laboratory, using the Roche Elecsys® AMH assay. One shipped TAP sample, two non-shipped TAP samples, and one ADx card were designated “quantity not sufficient” (QNS) for processing by the lab. These 4 QNS results out of 164 assays were not included in the following analyses. Non-QNS data from subjects with a QNS result remain in the analyses, except in pairwise comparisons, which include only pairs where both results were obtained.
Favorable treatment for the ADx comparator device
The ADx 100 card was chosen as a comparator to the TAP device given its widespread use in home AMH analysis kits. As the current standard of care, the ADx 100 card was given the most favorable treatment possible throughout study setup, testing, and lab processing to enhance its clinical relevance.
ADx cards were stored and processed in accordance with manufacturer instructions. Part of those instructions includes a lab-dependent correction factor on test results, which typically read much lower than whole-blood samples. To ensure most favorable treatment, the most generous assumptions plausible under manufacturer guidance were applied: first by precisely normalizing ADx results to venipuncture to remove any directional bias (a ~16.2x dilution factor, calculated with the trial data itself), and then by simulating an ideal total protein correction. Typically, the final result of an ADx AMH blood test is adjusted by a factor based on total protein. Manufacturer guidance suggested that this factor was no higher than a 15% adjustment for 99.9% of samples, but can vary by lab. To obtain the best theoretically possible result, every normalized ADx sample was adjusted to be 15% closer to the precise venipuncture result (or adjusted to be equal to the venipuncture result, if the difference between the two was < 15% of the normalized ADx result). For example, if venipuncture obtained a 1.20 ng/mL reading and the ADx card obtained a 1.00 ng/mL reading after dilution adjustment, the ADx result was adjusted to 1.15 ng/mL for the purpose of the analysis.
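The favorable total protein adjustment described above can be sketched as follows. This is a minimal illustration, not the study's Excel implementation: the function name and parameter names are hypothetical, and the input is assumed to be an ADx result that has already been normalized by the dilution factor.

```python
import math


def favorable_adx(adx_normalized: float, venipuncture: float,
                  max_fraction: float = 0.15) -> float:
    """Move a normalized ADx result toward the venipuncture result.

    The result is shifted by at most max_fraction (15%) of the normalized
    ADx value; if venipuncture is within that distance, the adjusted
    value is set equal to the venipuncture result.
    """
    max_shift = max_fraction * adx_normalized
    diff = venipuncture - adx_normalized
    if abs(diff) <= max_shift:
        return venipuncture
    # Shift in the direction of venipuncture, capped at max_shift.
    return adx_normalized + math.copysign(max_shift, diff)


# Worked example from the text: 1.00 ng/mL ADx vs 1.20 ng/mL venipuncture.
adjusted = favorable_adx(1.00, 1.20)
print(f"{adjusted:.2f} ng/mL")  # 1.15 ng/mL
```

The same rule applies when the ADx result reads high: a normalized value of 1.50 ng/mL against a 1.20 ng/mL venipuncture result would be lowered by 15% of 1.50, to 1.275 ng/mL.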
Analysis was performed using the 2019 version of Microsoft Excel. Results from the ADx card were adjusted to ensure most favorable treatment under manufacturer guidance as explained above. Results obtained from the shipped TAP device were all reduced by a constant ~5.6%, as AMH increases slightly in stored blood samples over time and requires an experimental correction to remove bias from the assay. Given that minor variations of AMH levels due to shipping times have little impact on clinical categorization and as the TAP devices were consistently shipped via a 2-day guaranteed courier service, a constant correction factor was deemed sufficient for the purposes of this comparison study. Similar to the ADx results, this factor was also calculated in-sample based on the observed average difference between venipuncture and the shipped TAP device. This estimate was consistent across samples, with a standard deviation of only 0.035.
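One way a constant in-sample correction of this kind could be derived is sketched below. The exact formula used in the study is not given, so the definition here (mean per-sample excess of the shipped result over venipuncture, as a fraction of the shipped result) is an assumption, and the data values are made up for illustration.

```python
from statistics import mean, stdev

# Hypothetical paired results in ng/mL (NOT study data).
venipuncture = [1.00, 2.00, 4.00]
shipped_tap = [1.05, 2.12, 4.28]

# Per-sample fraction by which the shipped result exceeds the reference.
fractions = [(s - v) / s for s, v in zip(shipped_tap, venipuncture)]

reduction = mean(fractions)   # constant correction applied to every sample
spread = stdev(fractions)     # consistency of the estimate across samples

# Apply the same constant reduction to all shipped results.
corrected = [s * (1.0 - reduction) for s in shipped_tap]

print(f"reduction: {reduction:.1%}, spread: {spread:.3f}")
```

A single constant leaves per-sample residuals in place; it only removes the average directional bias, which matches the stated purpose of the correction.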
The primary endpoint of the study was the correlation of AMH concentration obtained via the gold standard of venipuncture with that obtained via the TAP device and the ADx card, respectively, demonstrating each device’s ability to replicate per-person findings that would have been obtained in-clinic. Prior to the main study, precision validation showed that both intra- and inter-assay precision were within 1.69% across 65 replicates. Given this consistency, venipuncture on this assay was treated as the reference standard in the remainder of the trial. No changes were made to the trial outcomes after the trial commenced.
To assess the likely patient impact of the use of each device, categorical agreement of results was examined. Using previous literature and published nomograms, practicing physicians pre-defined thresholds for results, adjusted for age and hormonal birth control use, so that each AMH result was categorized as either “Very low” (<5th percentile), “Low” (5th–10th percentile), “Normal” (25th–75th percentile), or “High” (>75th percentile) [17,18,19]. The rate of agreement between categories across collection methods was then calculated. Furthermore, results below the 10th percentile for the subject’s age and birth control status were deemed to be clinically significant, as they would typically lead to a referral to a reproductive endocrinologist (REI).
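The agreement rate described above can be computed as in the following sketch. The category labels are taken from the text, but the function name, the use of `None` to mark a QNS result, and the example labels are assumptions for illustration only.

```python
from typing import Optional, Sequence


def agreement_rate(reference: Sequence[Optional[str]],
                   device: Sequence[Optional[str]]) -> float:
    """Fraction of subjects whose device category matches the reference.

    Pairs in which either result is missing (None, e.g. a QNS sample)
    are dropped, so only complete pairs are compared.
    """
    pairs = [(r, d) for r, d in zip(reference, device)
             if r is not None and d is not None]
    if not pairs:
        raise ValueError("no complete pairs to compare")
    return sum(r == d for r, d in pairs) / len(pairs)


# Hypothetical labels (NOT study data); the last device result is QNS.
venipuncture_cat = ["Normal", "Low", "High", "Very low", "Normal"]
device_cat = ["Normal", "Normal", "High", "Very low", None]

rate = agreement_rate(venipuncture_cat, device_cat)
print(f"agreement: {rate:.0%}")  # agreement: 75%
```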
Age-based percentiles were determined based on previous studies [17, 18]. However, these large-scale studies of population AMH levels used different assays than the one used in this trial. As such, they may not reflect the true percentile of a subject, but do provide an external source for results that are potentially clinically relevant in a broad population. Additionally, as the populations represented in both the Shebl et al. (2011) and Tehrani et al. (2014) studies were not on hormonal birth control, and because hormonal birth control is known to reduce AMH levels in the body, percentiles for patients on hormonal birth control were adjusted (lowered) using a factor derived from the Birch Peterson et al. (2015) study. Positive predictive value and negative predictive value for each device (given this threshold) were calculated, as well as the false positive and false negative rates. False positive results occur when a patient is wrongly identified as having an AMH result below the tenth percentile, triggering a referral to an REI when not otherwise necessary. False negative results occur when a patient is wrongly identified as having an AMH result above the tenth percentile, in which case they may miss out on a timely assessment by an REI.
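The predictive values and error rates described above follow from a 2x2 table in which "positive" means a result below the tenth percentile. The sketch below uses the conventional denominators (false positive rate = FP / (FP + TN), false negative rate = FN / (FN + TP)); the study does not state its denominators, so this is an assumption, and the function name and example data are hypothetical.

```python
from typing import Dict, Sequence


def diagnostic_rates(reference_low: Sequence[bool],
                     device_low: Sequence[bool]) -> Dict[str, float]:
    """PPV, NPV, FPR and FNR for a device against the reference.

    Each entry is True when that method places the subject below the
    tenth percentile. Undefined ratios are returned as NaN.
    """
    tp = fp = tn = fn = 0
    for ref, dev in zip(reference_low, device_low):
        if dev and ref:
            tp += 1
        elif dev and not ref:
            fp += 1
        elif not dev and ref:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "fpr": ratio(fp, fp + tn),
        "fnr": ratio(fn, fn + tp),
    }


# Hypothetical flags (NOT study data).
rates = diagnostic_rates(
    reference_low=[True, True, False, False, False],
    device_low=[True, False, True, False, False],
)
print(rates)
```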
Patient preference was also examined. The direct discomfort from the blood draw was assessed using the NRS-11 Pain Scale, a validated, single dimension, 0–10 scale, with clarifying benchmarks to help respondents identify their relative levels of pain [20, 21]. For example, a 0 indicates “No pain. Feeling perfectly normal.” and a 3 indicates “Very noticeable pain, like an accidental cut, a blow to the nose causing a bloody nose, or a doctor giving you an injection.” Scores were aggregated and averaged across each modality.
Overall patient experience was assessed using Net Promoter Score (NPS), an established and validated measure of preference in a variety of domains, both commercial and clinical. Net Promoter Score measures customer satisfaction with a product or experience. When completing the measure, respondents are asked: “How likely are you to recommend this product to a friend or colleague?” Scores of 9–10 are promoters, 6 and below are detractors, and 7–8 are neutral. Percent detractors are then subtracted from percent promoters; the score range is from −100 (all detractors) to +100 (all promoters). The Net Promoter Score indicates a patient or consumer’s satisfaction with an experience compared to plausible alternatives.
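The NPS arithmetic described above can be written out as a short sketch. The cut-offs come from the text; the function name and the example responses are illustrative and are not study data.

```python
from typing import Sequence


def net_promoter_score(scores: Sequence[int]) -> float:
    """Percent promoters (9-10) minus percent detractors (0-6)."""
    if not scores:
        raise ValueError("at least one response is required")
    if any(s < 0 or s > 10 for s in scores):
        raise ValueError("responses must be on the 0-10 scale")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Neutral responses (7-8) count only in the denominator.
    return 100.0 * (promoters - detractors) / len(scores)


# Hypothetical responses: 3 promoters, 1 detractor, 0 neutral.
nps = net_promoter_score([10, 10, 9, 2])
print(nps)  # 50.0
```

Because neutral responses still count toward the total, adding 7s and 8s pulls the score toward zero without changing its sign.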