Publication:
A Metric-Driven Framework for Evaluating Large Language Models in Software Testing: Insights from Industry Experts

dc.contributor.author: Perera, V. I. T.
dc.date.accessioned: 2026-02-06T06:45:00Z
dc.date.issued: 2025-12
dc.description.abstract: The integration of Large Language Models (LLMs) into software testing workflows has introduced new opportunities for automation, but it has also raised critical questions about the reliability, maintainability, and effectiveness of the generated test cases. This study addresses the lack of standardized evaluation practices by proposing and validating a comprehensive metric-driven framework: STEAM-LLM (Software Test Effectiveness Assessment Model for LLM-generated tests). Through a mixed-methods research design involving structured surveys and expert interviews, the framework identifies three core independent variables (Prompt Engineering Level, Human Intervention, and Input Specification Quality) and evaluates their impact on Fault-Revealing Power and Maintainability & Stability. The inclusion of Edit Effectiveness as a mediator, along with contextual moderators such as Task Complexity, Developer Experience, and LLM Class, reflects the nuanced dynamics of LLM-assisted testing. Confounding variables, including Baseline Project/Test Quality and Time Spent on Understanding the Task, were also accounted for to ensure valid assessments. The framework was empirically validated and supported by metric thresholds (e.g., ≥80% coverage, ≤10% smell density), offering practical benchmarks for industry adoption. The findings demonstrate that structured prompts, strategic human oversight, and high-quality inputs are essential for producing reliable and maintainable LLM-generated test cases. This research contributes a theoretically robust and practically applicable model for evaluating AI-assisted software testing, laying the foundation for future experimentation, tooling, and integration into continuous development environments.
dc.identifier.uri: https://rda.sliit.lk/handle/123456789/4540
dc.language.iso: en
dc.publisher: Sri Lanka Institute of Information Technology
dc.subject: Metric-Driven Framework
dc.subject: Large Language Models
dc.subject: Software Testing
dc.subject: Industry Experts
dc.title: A Metric-Driven Framework for Evaluating Large Language Models in Software Testing: Insights from Industry Experts
dc.type: Thesis
dspace.entity.type: Publication
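
The abstract anchors the framework to two concrete metric thresholds (≥80% coverage, ≤10% smell density) as practical benchmarks. A minimal sketch of how such a benchmark check might look in practice follows; only the two thresholds come from the abstract, while all function, class, and field names are hypothetical illustrations, not part of the published STEAM-LLM framework.

# Minimal sketch of the threshold check implied by the abstract's benchmarks.
# The 0.80 coverage and 0.10 smell-density values are taken from the abstract;
# everything else here is an assumed illustration.
from dataclasses import dataclass

@dataclass
class TestSuiteMetrics:
    line_coverage: float   # fraction of production lines executed, 0.0-1.0
    smell_density: float   # fraction of tests exhibiting test smells, 0.0-1.0

COVERAGE_THRESHOLD = 0.80  # ">= 80% coverage" benchmark from the abstract
SMELL_THRESHOLD = 0.10     # "<= 10% smell density" benchmark from the abstract

def meets_benchmarks(m: TestSuiteMetrics) -> bool:
    """Return True if an LLM-generated test suite clears both benchmarks."""
    return (m.line_coverage >= COVERAGE_THRESHOLD
            and m.smell_density <= SMELL_THRESHOLD)

if __name__ == "__main__":
    suite = TestSuiteMetrics(line_coverage=0.85, smell_density=0.07)
    print(meets_benchmarks(suite))  # True: 85% coverage, 7% smell density

In a continuous-integration setting, a check like this could gate the merge of LLM-generated tests, which matches the abstract's stated aim of integration into continuous development environments.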

Files

Original bundle

Name: A Metric-Driven Framework for Evaluating Large 1-11.pdf
Size: 553.63 KB
Format: Adobe Portable Document Format
Name: A Metric-Driven Framework for Evaluating Large.pdf
Size: 1.26 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed upon to submission