| Title: | Hierarchical Adaptive 'RT-QuIC' Classification for Complex Matrices |
|---|---|
| Description: | Extends 'RT-QuIC' (Real-Time Quaking-Induced Conversion) statistical analysis to complex environmental matrices through hierarchical adaptive classification. 'KWELA' is named after a deity of the Fore people of Papua New Guinea, among whom Kuru, a notable human prion disease, was identified. Implements a 6-layer architecture: hard gate biological constraints, per-well adaptive scoring, separation-aware combination, Youden-optimized cutoffs, replicate consensus, and matrix instability detection. Features dual-mode operation (diagnostic/research), auto-profile selection (Standard/Sensitive/Matrix-Robust), RAF integration for artifact detection, matrix-aware baseline correction, and multiple consensus rules. Methods include energy distance (Szekely and Rizzo (2013) <doi:10.1016/j.jspi.2013.03.018>), CRPS (Gneiting and Raftery (2007) <doi:10.1198/016214506000001437>), SSMD (Zhang (2007) <doi:10.1016/j.ygeno.2007.01.005>), and Jensen-Shannon divergence (Lin (1991) <doi:10.1109/18.61115>). This package implements methodology currently under peer review; please contact the author before publication using this approach. Development followed an iterative human-machine collaboration where all algorithmic design, statistical methodologies, and biological validation logic were conceptualized, tested, and iteratively refined by Richard A. Feiss through repeated cycles of running experimental data, evaluating analytical outputs, and selecting among candidate algorithms and approaches. AI systems ('Anthropic Claude' and 'OpenAI GPT') served as coding assistants and analytical sounding boards under continuous human direction. The selection of statistical methods, evaluation of biological plausibility, and all final methodology decisions were made by the human author. AI systems did not independently originate algorithms, statistical approaches, or scientific methodologies. |
| Authors: | Richard A. Feiss IV [aut, cre] (ORCID: <https://orcid.org/0009-0008-0409-6042>) |
| Maintainer: | Richard A. Feiss IV <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-05-15 10:10:25 UTC |
| Source: | https://github.com/cran/KWELA |
Extends 'RT-QuIC' (Real-Time Quaking-Induced Conversion) statistical analysis to complex environmental matrices through hierarchical adaptive classification. 'KWELA' is named after a deity of the Fore people of Papua New Guinea, among whom Kuru, a notable human prion disease, was identified.
No return value. This is a package documentation page.
Biological constraint filter
Profile-dependent adaptive transforms
Separation-aware score combiner
Youden-optimized threshold
Treatment-level classification
Matrix interference override
Default mode. Deterministic classification from TTT/MP/RAF evidence only. No stochastic rescue or score blending.
Full adaptive architecture with stochastic rescue and distributional scoring.
kwela_analyzeMain analysis function
kwela_summarizeTreatment-level summary
kwela_diagnosticsDiagnostic information
compute_instability_flagsMatrix instability detection
kwela_plate_normalizeMulti-plate normalization
kwela_bootstrap_summaryBootstrap confidence intervals
cohens_d computes Cohen's d effect size between two distributions.
assess_separation evaluates PC/NC separation quality and recommends
a classification profile. Cohen's d thresholds are heuristic, calibrated
against three RT-QuIC datasets.
cohens_d(x, y) assess_separation(pc_mp, nc_mp, pc_ttt, nc_ttt)cohens_d(x, y) assess_separation(pc_mp, nc_mp, pc_ttt, nc_ttt)
x |
First distribution (numeric vector) |
y |
Second distribution (numeric vector) |
pc_mp |
Positive control MP values |
nc_mp |
Negative control MP values |
pc_ttt |
Positive control TTT values |
nc_ttt |
Negative control TTT values |
cohens_d returns a single numeric value (positive = x > y),
or NA_real_ if insufficient data.
assess_separation returns a list with:
Cohen's d for MP
Cohen's d for TTT
Median of absolute d values
Separation regime: "poor_separation", "moderate_separation", or "strong_separation"
Recommended profile: "matrix_robust", "sensitive", or "standard"
set.seed(42) pc_mp <- rnorm(8, 100, 10) nc_mp <- rnorm(8, 20, 5) pc_ttt <- rnorm(8, 8, 1) nc_ttt <- rnorm(8, 72, 5) cohens_d(pc_mp, nc_mp) assess_separation(pc_mp, nc_mp, pc_ttt, nc_ttt)set.seed(42) pc_mp <- rnorm(8, 100, 10) nc_mp <- rnorm(8, 20, 5) pc_ttt <- rnorm(8, 8, 1) nc_ttt <- rnorm(8, 72, 5) cohens_d(pc_mp, nc_mp) assess_separation(pc_mp, nc_mp, pc_ttt, nc_ttt)
Evaluates 6 deterministic metrics to detect matrix interference that compromises classification reliability. A treatment is flagged as unstable when it behaves unlike both positive and negative controls.
compute_instability_flags( trt_mp, trt_ttt, pc_mp, nc_mp, pc_ttt, nc_ttt, trt_raf = NULL, trt_mp_raf = NULL, pc_raf_mp_ratios = NULL, crossing_threshold = NA_real_, strictness = "moderate", has_raf = FALSE )compute_instability_flags( trt_mp, trt_ttt, pc_mp, nc_mp, pc_ttt, nc_ttt, trt_raf = NULL, trt_mp_raf = NULL, pc_raf_mp_ratios = NULL, crossing_threshold = NA_real_, strictness = "moderate", has_raf = FALSE )
trt_mp |
Finite MP values for treatment wells |
trt_ttt |
TTT values for treatment wells (may contain NA/Inf) |
pc_mp, nc_mp
|
Positive/negative control MP values (finite) |
pc_ttt, nc_ttt
|
Positive/negative control TTT values |
trt_raf |
RAF values for treatment wells (NULL if unavailable) |
trt_mp_raf |
MP values corresponding to RAF wells (NULL if unavailable) |
pc_raf_mp_ratios |
RAF/MP ratios from positive controls (NULL if unavailable) |
crossing_threshold |
TTT crossing threshold for this treatment's group |
strictness |
"moderate" (2+ flags), "strict" (1+), or "lenient" (3+) |
has_raf |
Whether RAF data is available |
The six metrics evaluated are:
Fano factor deviation from positive controls
Crossing variability (ambiguous threshold crossing rate)
Wasserstein distance from BOTH controls (normalized by NC spread)
Energy distance from BOTH controls (normalized by NC spread)
TTT dispersion ratio vs positive controls
RAF-MP ratio inconsistency vs positive controls
All metrics are deterministic (no random number generation).
List with components: unstable (logical), reasons
(character vector), n_flags (integer), min_flags_required
(integer), metrics (named list).
set.seed(42) flags <- compute_instability_flags( trt_mp = rnorm(8, 50, 20), trt_ttt = rnorm(8, 40, 15), pc_mp = rnorm(8, 100, 10), nc_mp = rnorm(8, 20, 5), pc_ttt = rnorm(8, 8, 1), nc_ttt = rnorm(8, 72, 5), crossing_threshold = 40, strictness = "moderate" )set.seed(42) flags <- compute_instability_flags( trt_mp = rnorm(8, 50, 20), trt_ttt = rnorm(8, 40, 15), pc_mp = rnorm(8, 100, 10), nc_mp = rnorm(8, 20, 5), pc_ttt = rnorm(8, 8, 1), nc_ttt = rnorm(8, 72, 5), crossing_threshold = 40, strictness = "moderate" )
Computes comprehensive stochastic metrics for a sample compared to controls.
compute_stochastic_profile(sample_values, pc_values, nc_values)compute_stochastic_profile(sample_values, pc_values, nc_values)
sample_values |
Numeric vector of sample measurements |
pc_values |
Numeric vector of positive control measurements |
nc_values |
Numeric vector of negative control measurements |
List containing comprehensive stochastic metrics:
Basic sample statistics
CV ratios vs controls
Fano factor ratio vs PC
Normalized position between NC and PC variance
SSMD vs controls
Signal-to-noise ratio vs NC
Jensen-Shannon metrics
Energy distance metrics
Wasserstein metrics
Log-space Euclidean distances
set.seed(42) pc <- rnorm(20, 100, 10) nc <- rnorm(20, 20, 5) sample <- rnorm(20, 80, 15) profile <- compute_stochastic_profile(sample, pc, nc)set.seed(42) pc <- rnorm(20, 100, 10) nc <- rnorm(20, 20, 5) sample <- rnorm(20, 80, 15) profile <- compute_stochastic_profile(sample, pc, nc)
Proper scoring rules for evaluating probabilistic forecasts.
crps_empirical(forecast_samples, observation) dawid_sebastiani(observation, predicted_mean, predicted_var) log_predictive_score(observation, predicted_mean, predicted_sd) interval_score(observation, lower, upper, alpha = 0.1)crps_empirical(forecast_samples, observation) dawid_sebastiani(observation, predicted_mean, predicted_var) log_predictive_score(observation, predicted_mean, predicted_sd) interval_score(observation, lower, upper, alpha = 0.1)
forecast_samples |
Numeric vector of samples from forecast distribution |
observation |
Single observed value |
predicted_mean |
Predicted mean |
predicted_var |
Predicted variance |
predicted_sd |
Predicted standard deviation |
lower |
Lower bound of prediction interval |
upper |
Upper bound of prediction interval |
alpha |
Nominal miscoverage rate (default 0.1 for 90% interval) |
All functions return a single numeric value or NA_real_ if inputs
are invalid.
crps_empirical returns a non-negative score (lower is better).
dawid_sebastiani returns the DS score (calibration metric).
log_predictive_score returns negative log density.
interval_score returns width plus penalties for miscoverage.
Gneiting T, Raftery AE (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102(477):359-378.
set.seed(42) forecast <- rnorm(50, 5, 1) crps_empirical(forecast, 5.5) dawid_sebastiani(5.5, 5, 1) log_predictive_score(5.5, 5, 1) interval_score(5.5, 3, 7, alpha = 0.1)set.seed(42) forecast <- rnorm(50, 5, 1) crps_empirical(forecast, 5.5) dawid_sebastiani(5.5, 5, 1) log_predictive_score(5.5, 5, 1) interval_score(5.5, 3, 7, alpha = 0.1)
Functions for quantifying stochastic variability in RT-QuIC data.
cv(x) fano_factor(x) ssmd(treatment, control) normalized_stochastic_index(treatment_var, pc_var, nc_var) snr_vs_control(treatment, control)cv(x) fano_factor(x) ssmd(treatment, control) normalized_stochastic_index(treatment_var, pc_var, nc_var) snr_vs_control(treatment, control)
x |
Numeric vector |
treatment |
Numeric vector of treatment values |
control |
Numeric vector of control values |
treatment_var |
Treatment variance |
pc_var |
Positive control variance |
nc_var |
Negative control variance |
All functions return a single numeric value or NA_real_ if
insufficient data.
cv returns the coefficient of variation (MAD-based).
fano_factor returns the variance-to-mean ratio.
ssmd returns the strictly standardized mean difference.
normalized_stochastic_index returns a value in [0, 1].
snr_vs_control returns the signal-to-noise ratio.
Zhang XHD (2007). A pair of new statistical parameters for quality control in RNA interference high-throughput screening assays. Genomics 89(4):552-61.
set.seed(42) x <- rnorm(30, 10, 2) y <- rnorm(30, 5, 2) cv(x) fano_factor(x + 10) ssmd(x, y) snr_vs_control(x, y) normalized_stochastic_index(var(x), 4, 1)set.seed(42) x <- rnorm(30, 10, 2) y <- rnorm(30, 5, 2) cv(x) fano_factor(x + 10) ssmd(x, y) snr_vs_control(x, y) normalized_stochastic_index(var(x), 4, 1)
Geometric distance metrics for comparing distributions.
energy_distance uses deterministic quantile subsampling for
large datasets (no RNG dependency).
energy_distance(x, y, max_n = NULL) wasserstein_1d(x, y) log_euclidean_distance(x, y) mahalanobis_distance(sample_params, reference_mean, reference_cov)energy_distance(x, y, max_n = NULL) wasserstein_1d(x, y) log_euclidean_distance(x, y) mahalanobis_distance(sample_params, reference_mean, reference_cov)
x |
First distribution (numeric vector) |
y |
Second distribution (numeric vector) |
max_n |
Maximum sample size before subsampling (NULL = no limit) |
sample_params |
Named numeric vector of sample parameters |
reference_mean |
Named numeric vector of reference means |
reference_cov |
Covariance matrix of reference distribution |
All functions return a single non-negative numeric value or
NA_real_ if insufficient data or computation fails.
Szekely GJ, Rizzo ML (2013). Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference 143(8):1249-72.
set.seed(42) x <- rnorm(30, 0, 1) y <- rnorm(30, 2, 1) energy_distance(x, y) wasserstein_1d(x, y) log_euclidean_distance(exp(x), exp(y)) mahalanobis_distance(c(1, 2), c(0, 0), diag(2))set.seed(42) x <- rnorm(30, 0, 1) y <- rnorm(30, 2, 1) energy_distance(x, y) wasserstein_1d(x, y) log_euclidean_distance(exp(x), exp(y)) mahalanobis_distance(c(1, 2), c(0, 0), diag(2))
entropy computes Shannon entropy from discretized distribution.
jensen_shannon computes symmetric Jensen-Shannon divergence
with Laplace smoothing.
entropy(x, n_bins = 10) jensen_shannon(p, q, n_bins = 15)entropy(x, n_bins = 10) jensen_shannon(p, q, n_bins = 15)
x |
Numeric vector |
n_bins |
Number of histogram bins |
p |
First distribution (numeric vector) |
q |
Second distribution (numeric vector) |
entropy returns entropy in nats, or NA_real_ if fewer
than 3 values.
jensen_shannon returns JSD bounded in [0, ln(2)], or NA_real_
if either distribution has fewer than 5 values.
Lin J (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory 37(1):145-151.
entropy(rnorm(100)) set.seed(42) p <- rnorm(50, 0, 1) q <- rnorm(50, 2, 1) jensen_shannon(p, q)entropy(rnorm(100)) set.seed(42) p <- rnorm(50, 0, 1) q <- rnorm(50, 2, 1) jensen_shannon(p, q)
Implements hierarchical adaptive classification for 'RT-QuIC' data:
Biological constraint filter
Profile-dependent transforms
Separation-aware combiner
Youden-optimized threshold
Treatment-level classification
Matrix interference override
kwela_analyze( data, pc_pattern = "\\bPositive\\s*Control\\b|^POS\\b|\\bPC\\b", nc_pattern = "\\bNegative\\s*Control\\b|^NEG\\b|\\bNC\\b|\\bTDB\\b|\\bBlank\\b", spiked_pattern = "\\+\\s*Pos|Pos\\s*\\d*%|spiked|\\+\\s*CWD", profile = c("auto", "standard", "sensitive", "matrix_robust"), consensus = c("majority", "strict", "flexible", "threshold"), consensus_threshold = 0.5, matrix_groups = NULL, use_raf = TRUE, mode = c("diagnostic", "research"), instability_check = TRUE, instability_strictness = c("moderate", "strict", "lenient"), verbose = TRUE )kwela_analyze( data, pc_pattern = "\\bPositive\\s*Control\\b|^POS\\b|\\bPC\\b", nc_pattern = "\\bNegative\\s*Control\\b|^NEG\\b|\\bNC\\b|\\bTDB\\b|\\bBlank\\b", spiked_pattern = "\\+\\s*Pos|Pos\\s*\\d*%|spiked|\\+\\s*CWD", profile = c("auto", "standard", "sensitive", "matrix_robust"), consensus = c("majority", "strict", "flexible", "threshold"), consensus_threshold = 0.5, matrix_groups = NULL, use_raf = TRUE, mode = c("diagnostic", "research"), instability_check = TRUE, instability_strictness = c("moderate", "strict", "lenient"), verbose = TRUE )
data |
Data frame with Treatment, TTT, MP columns (RAF optional) |
pc_pattern |
Regex for positive controls |
nc_pattern |
Regex for negative controls |
spiked_pattern |
Regex for spiked samples |
profile |
Classification profile: "auto", "standard", "sensitive", or "matrix_robust". "auto" selects based on separation quality. |
consensus |
Replicate consensus rule: "majority", "strict", "flexible", or "threshold" |
consensus_threshold |
For "threshold" consensus: minimum mean well score to classify treatment as positive (default 0.5) |
matrix_groups |
Optional: column name for matrix grouping (enables per-group baseline correction). NULL = global baselines. |
use_raf |
Logical: include RAF in scoring (default TRUE if present) |
mode |
"diagnostic" (deterministic, no stochastic rescue) or "research" (full adaptive architecture). Default: "diagnostic". |
instability_check |
Logical: run instability detection (default TRUE) |
instability_strictness |
"moderate" (2+ flags), "strict" (1+), or "lenient" (3+). Controls override sensitivity. |
verbose |
Print progress |
Data frame with per-well results including:
Well type: "PC", "NC", or "Sample"
Logical indicating if sample is spiked
Logical for TTT threshold crossing
Logical for signal above NC threshold
Logical for Layer 1 gate passage
Logical for suspected artifacts
Logical for stochastic rescue (research mode only)
TTT component score (0-1)
Signal component score (0-1)
RAF component score (0-1, if available)
Stochastic component score (0-1)
Combined well score (0-1)
Logical for well-level positive classification
Alias for well_score
Well classification: "POSITIVE", "NEGATIVE", "INCONCLUSIVE", "ARTIFACT_SUSPECT", or "INVALID"
Confidence level: "VERY_HIGH", "HIGH", "MODERATE", "LOW", or "CONTROL"
Treatment-level consensus classification (may be "INCONCLUSIVE_MATRIX_EFFECT" if instability detected)
Treatment-level confidence
Fraction of positive wells in treatment
Mean well score for treatment
Logical for Layer 6 instability override
Plus attributes: pc_stats, nc_stats, separation, profile, consensus, optimal_cutoff, youden_j, thresholds, trt_summary, control_check, score_semantics, version, mode, instability_check, instability_strictness, instability_summary, group_controls, group_profiles, group_separation.
set.seed(42) df <- data.frame( Treatment = c(rep("Positive Control", 8), rep("Negative Control", 8), rep("Sample_A", 8)), TTT = c(rnorm(8, 8, 1), rnorm(8, 72, 5), rnorm(8, 12, 3)), MP = c(rnorm(8, 100, 10), rnorm(8, 20, 5), rnorm(8, 85, 15)) ) result <- kwela_analyze(df) summary <- kwela_summarize(result)set.seed(42) df <- data.frame( Treatment = c(rep("Positive Control", 8), rep("Negative Control", 8), rep("Sample_A", 8)), TTT = c(rnorm(8, 8, 1), rnorm(8, 72, 5), rnorm(8, 12, 3)), MP = c(rnorm(8, 100, 10), rnorm(8, 20, 5), rnorm(8, 85, 15)) ) result <- kwela_analyze(df) summary <- kwela_summarize(result)
kwela_plate_normalize normalizes TTT and MP values within each
plate using negative control baselines.
kwela_bootstrap_summary computes bootstrap confidence intervals
for treatment scores. Set seed externally for reproducibility.
kwela_plate_normalize( data, plate_col = "Plate", pc_pattern = "\\bPositive\\s*Control\\b|^POS\\b|\\bPC\\b", nc_pattern = "\\bNegative\\s*Control\\b|^NEG\\b|\\bNC\\b|\\bTDB\\b|\\bBlank\\b", method = c("zscore", "median_ratio") ) kwela_bootstrap_summary(result, B = 1000, conf = 0.95)kwela_plate_normalize( data, plate_col = "Plate", pc_pattern = "\\bPositive\\s*Control\\b|^POS\\b|\\bPC\\b", nc_pattern = "\\bNegative\\s*Control\\b|^NEG\\b|\\bNC\\b|\\bTDB\\b|\\bBlank\\b", method = c("zscore", "median_ratio") ) kwela_bootstrap_summary(result, B = 1000, conf = 0.95)
data |
Data frame with Treatment, TTT, MP columns |
plate_col |
Name of plate identifier column |
pc_pattern |
Regex for positive controls |
nc_pattern |
Regex for negative controls |
method |
Normalization method: "zscore" or "median_ratio" |
result |
Output from |
B |
Number of bootstrap replicates (default 1000) |
conf |
Confidence level (default 0.95) |
kwela_plate_normalize returns a data frame with added columns:
Plate identifier
Normalized TTT values
Normalized MP values
kwela_bootstrap_summary returns a data frame with:
Treatment name
Number of wells
Mean well score
Lower CI bound for mean score
Upper CI bound for mean score
Observed positive rate
Lower CI bound for positive rate
Upper CI bound for positive rate
set.seed(42) df <- data.frame( Treatment = c(rep("Positive Control", 4), rep("Negative Control", 4), rep("Sample_A", 4)), TTT = c(rnorm(4, 8, 1), rnorm(4, 72, 5), rnorm(4, 12, 3)), MP = c(rnorm(4, 100, 10), rnorm(4, 20, 5), rnorm(4, 85, 15)), Plate = rep("Plate1", 12) ) normalized <- kwela_plate_normalize(df) # Bootstrap example result <- kwela_analyze(df, verbose = FALSE) set.seed(123) boot <- kwela_bootstrap_summary(result, B = 500)set.seed(42) df <- data.frame( Treatment = c(rep("Positive Control", 4), rep("Negative Control", 4), rep("Sample_A", 4)), TTT = c(rnorm(4, 8, 1), rnorm(4, 72, 5), rnorm(4, 12, 3)), MP = c(rnorm(4, 100, 10), rnorm(4, 20, 5), rnorm(4, 85, 15)), Plate = rep("Plate1", 12) ) normalized <- kwela_plate_normalize(df) # Bootstrap example result <- kwela_analyze(df, verbose = FALSE) set.seed(123) boot <- kwela_bootstrap_summary(result, B = 500)
kwela_summarize extracts treatment-level consensus results.
kwela_diagnostics returns detailed diagnostic information.
kwela_summarize(result) kwela_diagnostics(result)kwela_summarize(result) kwela_diagnostics(result)
result |
Output from |
kwela_summarize returns a data frame with treatment-level results:
Treatment name
Number of wells
Number of positive wells
Fraction positive
Mean well score
Treatment classification
Confidence level
Consensus rule used
Whether treatment is spiked
kwela_diagnostics returns a list with:
KWELA version string
Analysis mode ("diagnostic" or "research")
Profile used
Consensus rule used
Youden-optimized cutoff
Youden J statistic
Separation quality metrics
Well-level statistics
Treatment-level statistics
Instability detection results: check_enabled, strictness, n_inconclusive_matrix, treatments_flagged
Number of sample wells
Number of spiked wells
Number of unspiked wells
set.seed(42) df <- data.frame( Treatment = c(rep("Positive Control", 8), rep("Negative Control", 8), rep("Sample_A", 8)), TTT = c(rnorm(8, 8, 1), rnorm(8, 72, 5), rnorm(8, 12, 3)), MP = c(rnorm(8, 100, 10), rnorm(8, 20, 5), rnorm(8, 85, 15)) ) result <- kwela_analyze(df, verbose = FALSE) summary <- kwela_summarize(result) diag <- kwela_diagnostics(result)set.seed(42) df <- data.frame( Treatment = c(rep("Positive Control", 8), rep("Negative Control", 8), rep("Sample_A", 8)), TTT = c(rnorm(8, 8, 1), rnorm(8, 72, 5), rnorm(8, 12, 3)), MP = c(rnorm(8, 100, 10), rnorm(8, 20, 5), rnorm(8, 85, 15)) ) result <- kwela_analyze(df, verbose = FALSE) summary <- kwela_summarize(result) diag <- kwela_diagnostics(result)
robust_scale provides MAD-based scale estimation.
safe_sd provides a guaranteed-positive standard deviation.
robust_scale(x) safe_sd(x, na.rm = TRUE)robust_scale(x) safe_sd(x, na.rm = TRUE)
x |
Numeric vector |
na.rm |
Logical; should NA values be removed? |
robust_scale returns a robust scale estimate using MAD with fallback
to IQR/1.349 and SD. Returns NA_real_ if fewer than 2 values.
safe_sd returns standard deviation, guaranteed to be positive
(returns machine epsilon if zero or NA).
robust_scale(rnorm(50)) robust_scale(c(1, 2, 3, 100)) safe_sd(c(5, 5, 5))robust_scale(rnorm(50)) robust_scale(c(1, 2, 3, 100)) safe_sd(c(5, 5, 5))