Publication:
A Metric-Driven Framework for Evaluating Large Language Models in Software Testing: Insights from Industry Experts

dc.contributor.author: Perera, V. I. T.
dc.date.accessioned: 2026-02-06T06:45:00Z
dc.date.issued: 2025-12
dc.description.abstract: The integration of Large Language Models (LLMs) into software testing workflows has introduced new opportunities for automation, but it has also raised critical questions about the reliability, maintainability, and effectiveness of the generated test cases. This study addresses the lack of standardized evaluation practices by proposing and validating a comprehensive metric-driven framework: STEAM-LLM (Software Test Effectiveness Assessment Model for LLM-generated tests). Through a mixed-methods research design involving structured surveys and expert interviews, the framework identifies three core independent variables (Prompt Engineering Level, Human Intervention, and Input Specification Quality) and evaluates their impact on Fault-Revealing Power and Maintainability & Stability. The inclusion of Edit Effectiveness as a mediator, along with contextual moderators such as Task Complexity, Developer Experience, and LLM Class, reflects the nuanced dynamics of LLM-assisted testing. Confounding variables, including Baseline Project/Test Quality and Time Spent on Understanding the Task, were also accounted for to ensure valid assessments. The framework was empirically validated and supported by metric thresholds (e.g., ≥80% coverage, ≤10% smell density), offering practical benchmarks for industry adoption. The findings demonstrate that structured prompts, strategic human oversight, and high-quality inputs are essential for producing reliable and maintainable LLM-generated test cases. This research contributes a theoretically robust and practically applicable model for evaluating AI-assisted software testing, laying the foundation for future experimentation, tooling, and integration into continuous development environments.
dc.identifier.uri: https://rda.sliit.lk/handle/123456789/4540
dc.language.iso: en
dc.publisher: Sri Lanka Institute of Information Technology
dc.subject: Metric-Driven Framework
dc.subject: Large Language Models
dc.subject: Software Testing
dc.subject: Industry Experts
dc.title: A Metric-Driven Framework for Evaluating Large Language Models in Software Testing: Insights from Industry Experts
dc.type: Thesis
dspace.entity.type: Publication
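
The abstract anchors the framework to two concrete metric thresholds (≥80% coverage, ≤10% smell density) as practical benchmarks. A minimal sketch of how such a benchmark check might look in practice follows; only the two thresholds come from the abstract, while all function, class, and field names are hypothetical illustrations, not part of the published STEAM-LLM framework.

# Minimal sketch of the threshold check implied by the abstract's benchmarks.
# The 0.80 coverage and 0.10 smell-density values are taken from the abstract;
# everything else here is an assumed illustration.
from dataclasses import dataclass

@dataclass
class TestSuiteMetrics:
    line_coverage: float   # fraction of production lines executed, 0.0-1.0
    smell_density: float   # fraction of tests exhibiting test smells, 0.0-1.0

COVERAGE_THRESHOLD = 0.80  # ">= 80% coverage" benchmark from the abstract
SMELL_THRESHOLD = 0.10     # "<= 10% smell density" benchmark from the abstract

def meets_benchmarks(m: TestSuiteMetrics) -> bool:
    """Return True if an LLM-generated test suite clears both benchmarks."""
    return (m.line_coverage >= COVERAGE_THRESHOLD
            and m.smell_density <= SMELL_THRESHOLD)

if __name__ == "__main__":
    suite = TestSuiteMetrics(line_coverage=0.85, smell_density=0.07)
    print(meets_benchmarks(suite))  # True: 85% coverage, 7% smell density

In a continuous-integration setting, a check like this could gate the merge of LLM-generated tests, which matches the abstract's stated aim of integration into continuous development environments.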

Files

Original bundle

Name: A Metric-Driven Framework for Evaluating Large 1-11.pdf
Size: 553.63 KB
Format: Adobe Portable Document Format
Name: A Metric-Driven Framework for Evaluating Large.pdf
Size: 1.26 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed upon to submission