In this post I collect and summarize papers on models for evaluating LLM performance.

Summarization is (Almost) Dead

How well can large language models (LLMs) generate summaries? We develop new datasets and conduct human evaluation experiments to evaluate the zero-shot generation capability of LLMs across five distinct summarization tasks. Our findings indicate a clear preference among human evaluators for LLM-generated summaries over human-written summaries and summaries generated by fine-tuned models. Specifically, LLM-generated summaries exhibit better factual consistency and fewer instances of extrinsic hallucinations. Due to the satisfactory performance of LLMs in summarization tasks (even surpassing the benchmark of reference summaries), we believe that most conventional works in the field of text summarization are no longer necessary in the era of LLMs. However, we recognize that there are still some directions worth exploring, such as the creation of novel datasets with higher quality and more reliable evaluation methods.

https://arxiv.org/abs/2309.09558



