Browsing by Author "Perera, V. I. T."
Publication (Open Access)
A Metric-Driven Framework for Evaluating Large Language Models in Software Testing: Insights from Industry Experts
Perera, V. I. T. (Sri Lanka Institute of Information Technology, 2025-12)

The integration of Large Language Models (LLMs) into software testing workflows has introduced new opportunities for automation, but it has also raised critical questions about the reliability, maintainability, and effectiveness of the generated test cases. This study addresses the lack of standardized evaluation practices by proposing and validating a comprehensive metric-driven framework, STEAM-LLM (Software Test Effectiveness Assessment Model for LLM-generated tests). Through a mixed-methods research design involving structured surveys and expert interviews, the framework identifies three core independent variables (Prompt Engineering Level, Human Intervention, and Input Specification Quality) and evaluates their impact on Fault-Revealing Power and Maintainability & Stability. The inclusion of Edit Effectiveness as a mediator, along with contextual moderators such as Task Complexity, Developer Experience, and LLM Class, reflects the nuanced dynamics of LLM-assisted testing. Confounding variables, including Baseline Project/Test Quality and Time Spent on Understanding the Task, were also accounted for to ensure valid assessments. The framework was empirically validated and supported by metric thresholds (e.g., ≥80% coverage, ≤10% smell density), offering practical benchmarks for industry adoption. The findings demonstrate that structured prompts, strategic human oversight, and high-quality inputs are essential for producing reliable and maintainable LLM-generated test cases. This research contributes a theoretically robust and practically applicable model for evaluating AI-assisted software testing, laying the foundation for future experimentation, tooling, and integration into continuous development environments.
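To make the framework's variable roles concrete, the following is a minimal sketch of how an evaluation record might be structured in Python. The class and field names, the enum levels, and the numeric encodings (e.g., rubric scores on a 0.0–1.0 scale) are illustrative assumptions for exposition, not definitions taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class PromptEngineeringLevel(Enum):
    # Hypothetical ordinal levels; the paper's actual scale may differ.
    AD_HOC = 1
    STRUCTURED = 2
    TEMPLATE_DRIVEN = 3


@dataclass
class EvaluationContext:
    """One observation in a STEAM-LLM-style evaluation (illustrative encoding)."""
    # Independent variables
    prompt_engineering_level: PromptEngineeringLevel
    human_intervention: bool
    input_specification_quality: float  # assumed rubric score, 0.0-1.0
    # Mediator
    edit_effectiveness: float           # assumed rubric score, 0.0-1.0
    # Moderators
    task_complexity: float              # assumed rubric score, 0.0-1.0
    developer_experience_years: float
    llm_class: str                      # e.g., model family/tier label
    # Confounders controlled for in the study
    baseline_test_quality: float        # assumed rubric score, 0.0-1.0
    time_spent_understanding_minutes: float
```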

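The reported metric thresholds translate naturally into a pass/fail quality gate for a generated test suite. Below is a minimal sketch of such a gate, assuming coverage and smell density are expressed as fractions; the `TestSuiteMetrics` and `meets_steam_llm_thresholds` names are hypothetical, not tooling described in the paper.

```python
from dataclasses import dataclass


@dataclass
class TestSuiteMetrics:
    """Measured properties of an LLM-generated test suite (illustrative)."""
    coverage: float       # code coverage as a fraction, 0.0-1.0
    smell_density: float  # fraction of tests exhibiting test smells, 0.0-1.0


def meets_steam_llm_thresholds(m: TestSuiteMetrics) -> bool:
    """Apply the benchmarks reported in the abstract:
    coverage of at least 80% and smell density of at most 10%."""
    return m.coverage >= 0.80 and m.smell_density <= 0.10


# Usage example: a suite with 85% coverage and 7% smell density passes the gate.
print(meets_steam_llm_thresholds(TestSuiteMetrics(coverage=0.85, smell_density=0.07)))
```

A gate like this could sit in a continuous integration pipeline, rejecting LLM-generated suites that fall below the benchmarks before they are merged.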