During the annual staff exchange program between STPI and its long-term partner, the Korea Institute of Science and Technology Information (KISTI), KISTI showcased significant advances in automated metadata generation and Large Language Model (LLM) development. These achievements mark important milestones for KISTI in the field of technological innovation.
At the exchange event, Dr. Wonju Choi presented the latest developments in KISTI's automatic metadata generation technology. The technology automatically extracts metadata from PDF files, primarily academic papers and research reports, with the aim of making information extraction more efficient and less costly. KISTI previously collected and extracted metadata manually, a method that was time-consuming and expensive. It therefore developed an automated system that uses optical character recognition (OCR) and named entity recognition (NER) models to extract key data from Korean academic papers precisely.
In his talk, Dr. KyongHa Lee explained how KISTI trains large language models tailored to the specific needs of the Korean language. The model is based on the "llama2" architecture and is designed to enhance Korean natural language processing capabilities. Its training data includes a substantial amount of Korean research and development (R&D) papers, which helps ensure the quality and correctness of the data. Because these data are legally protected, the model's commercial use is limited, but it proves to be a powerful tool in non-commercial applications. Dr. Lee further indicated that KISTI plans to conduct a second phase of model training and release a new model capable of handling large amounts of text data, providing more accurate predictions and deeper understanding. The new model is intended for use across various fields, including assisting Korean government agencies in analyzing legal documents and formulating training materials.
The annual staff exchange program between STPI and KISTI not only deepened collaboration and communication in the field of technological innovation but also showcased KISTI's advances in data processing and natural language processing technologies. The exchange is expected to bring new insights to the scientific research communities of Taiwan and Korea and to serve as an important reference for future technological development in Asia.
(Above) Dr. KyongHa Lee of KISTI (right) and Dr. JinYuan Fan of STPI (left) exchanged research experiences during the staff exchange program. One of the meetings focused on establishing a Korean language model for scientific texts, and the two experts discussed and shared their latest advances in this field.