Research

Methodology papers.
Public on purpose.

Our research, methodology papers, and industry reports. Every method we use on a Decision Book is documented and citable.


May 2026·Research·arXiv:2605.06151

Predicting civil litigation outcomes and the evolution of case complexity and settlement dynamics.

Sandro Claudio Lera, Shahrokh Firouzi, Jonathan Habshush, Robert Mahari

Legal disputes unfold through sequences of filings in which parties update their positions and may settle at any stage. Most computational studies of legal prediction, however, focus on adjudicated outcomes and treat cases as static objects observed only at the end of litigation. Here we develop a temporally structured framework for predicting outcomes in civil litigation using 835,190 court filings between 1996 and 2022. We represent each case as a sequence of documents and model litigation as a three-outcome process: plaintiff win, plaintiff loss, or settlement. Documents are encoded using structured legal features, text embeddings, and information about judges and law firms, and a classifier estimates outcome probabilities at each stage of the case. The model achieves class-specific AUC values between 0.74 and 0.81, and reaches up to 97% accuracy for high-confidence plaintiff-win predictions. To study heterogeneity in predictability, we define case complexity as the entropy of the predicted outcome distribution. Richer factual and relational information improves prediction primarily in low-complexity cases, whereas its marginal contribution declines as complexity increases, suggesting that some disputes remain difficult not because information is missing, but because outcomes are less determinate. Consistent with this interpretation, complexity increases over the course of litigation, indicating that additional filings can amplify uncertainty rather than resolve it. Settlement rates follow an inverted U-shape with respect to complexity, peaking at intermediate levels of predictive uncertainty and declining at both low and high levels of complexity. These findings suggest that predictive uncertainty is not merely model error, but an empirical signal of legal complexity, litigation dynamics, and the conditions under which disputes are resolved through adjudication or settlement.

Read on arXiv ↗ · Download PDF →
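The abstract above defines case complexity as the entropy of the predicted outcome distribution. A minimal sketch of that quantity, assuming Shannon entropy over the three outcome probabilities (plaintiff win, plaintiff loss, settlement). The function name `case_complexity` and the optional normalization to [0, 1] are illustrative assumptions, not details taken from the paper.

```python
import math


def case_complexity(probs, normalize=True):
    """Shannon entropy of a predicted outcome distribution.

    probs: probabilities for (plaintiff win, plaintiff loss, settlement).
    normalize: if True, divide by log(len(probs)) so the result lies in [0, 1].
               This rescaling is an assumption made here for readability.
    """
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be non-negative and sum to 1")
    # Terms with p == 0 contribute nothing (the limit of p*log(p) is 0).
    entropy = sum(-p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs)) if normalize else entropy


# Hypothetical predictions for three cases, for illustration only.
confident = case_complexity([0.90, 0.05, 0.05])   # one outcome dominates
uncertain = case_complexity([0.34, 0.33, 0.33])   # outcomes nearly equiprobable
certain = case_complexity([1.0, 0.0, 0.0])        # fully determinate
```

Under this definition a case whose predicted outcomes are nearly equiprobable scores close to 1, and a case with a single dominant outcome scores close to 0.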

October 2025·Research·Nature Computational Science

Data-driven law firm rankings to reduce information asymmetry in legal disputes.

Alexandre Mojon, Robert Mahari, Sandro Claudio Lera

Selecting capable counsel can shape the outcome of litigation, yet evaluating law firm performance remains challenging. Widely used rankings prioritize prestige, size and revenue over empirical litigation outcomes, offering little practical guidance. Here, to address this gap, we build on the Bradley–Terry model and introduce a new ranking framework that treats each lawsuit as a competitive game between plaintiff and defendant law firms. Leveraging a newly constructed dataset of 60,540 US civil lawsuits involving 54,541 law firms, our findings show that existing reputation-based rankings correlate poorly with actual litigation success, while our outcome-based ranking substantially improves predictive accuracy. These findings establish a foundation for more transparent, data-driven assessments of legal performance.

Read on arXiv ↗
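The abstract above says the ranking builds on the Bradley–Terry model, treating each lawsuit as a game between two law firms. The sketch below fits the plain Bradley–Terry model with the standard minorization-maximization update; it is not the paper's extended framework. The firm names and win counts are hypothetical, and the function names are assumptions made for illustration.

```python
def fit_bradley_terry(wins, n_iter=200):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    wins: dict mapping (winner, loser) -> number of such outcomes.
    Returns a dict of strengths normalized to sum to 1.
    Assumes every firm has at least one win and one loss, so the
    maximum-likelihood estimate is finite.
    """
    firms = sorted({f for pair in wins for f in pair})
    strength = {f: 1.0 for f in firms}
    for _ in range(n_iter):
        updated = {}
        for i in firms:
            total_wins = sum(c for (winner, _), c in wins.items() if winner == i)
            denom = 0.0
            for j in firms:
                if j == i:
                    continue
                # Number of lawsuits between firms i and j, in either direction.
                games = wins.get((i, j), 0) + wins.get((j, i), 0)
                if games:
                    denom += games / (strength[i] + strength[j])
            updated[i] = total_wins / denom if denom else strength[i]
        norm = sum(updated.values())
        strength = {f: s / norm for f, s in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Model probability that firm a prevails over firm b."""
    return strength[a] / (strength[a] + strength[b])


# Hypothetical outcomes: (winner, loser) -> count.
toy_wins = {
    ("Firm A", "Firm B"): 3, ("Firm B", "Firm A"): 1,
    ("Firm B", "Firm C"): 3, ("Firm C", "Firm B"): 1,
    ("Firm A", "Firm C"): 2, ("Firm C", "Firm A"): 1,
}
strengths = fit_bradley_terry(toy_wins)
ranking = sorted(strengths, key=strengths.get, reverse=True)
```

Sorting firms by fitted strength gives an outcome-based ranking, and `win_probability` turns two strengths into a head-to-head prediction.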

October 2024·Research·arXiv:2410.00725

Early career citations capture judicial idiosyncrasies and predict judgments.

Robert Mahari, Sandro Claudio Lera

Judicial impartiality is a cornerstone of well-functioning legal systems. We assemble a dataset of 112,312 civil lawsuits in U.S. District Courts to study the effect of extraneous factors on judicial decision making. We show that cases are randomly assigned to judges and that biographical judge features are predictive of judicial decisions. We use low-dimensional representations of judges' early-career citation records as generic representations of judicial idiosyncrasies. These predict future judgments with accuracies exceeding 65% for high-confidence predictions on balanced out-of-sample test cases. For 6–8% of judges, these representations are significant predictors across all judgments. These findings indicate that a small but significant group of judges routinely relies on extraneous factors and careful vetting of judges prior to appointment may partially address this issue. Our use of low-dimensional representations of citation records may also be generalized to other jurisdictions or to study other aspects of judicial decision making.

Read on arXiv ↗ · Download PDF →
