QA4R: A QUESTION ANSWERING SYSTEM FOR R PACKAGES
Date
2022-07-26
Authors
Babu, Ganesh
Publisher
East Carolina University
Abstract
A massive amount of data from various sources is available today, and retrieving meaningful information from these datasets is valuable. Question Answering Systems (QAS) combine information retrieval (IR) and Natural Language Processing (NLP) to automatically answer questions posed in natural language. QAS fall into three types: open domain, closed domain, and restricted domain. The questions posed to them can be fact-based, definition, how, why, hypothetical, semantically constrained, or cross-lingual. R is a dynamic programming language, widely used for statistical computing, that combines functional and object-oriented programming. The R development community maintains thousands of R packages through the Comprehensive R Archive Network (CRAN). However, while websites such as rdrr.io, rseek.org, and search.r-project.org provide search results for R packages, no intelligent question-answering system is currently available for R. This study examines QAS, current developments and academic research areas in the QAS field, and existing QAS implementations. We propose a prototype question-answering system for R packages that, given a user query in natural language, returns the relevant R packages. We built a question-answering dataset for R packages (QAD4R) using web scraping and developed a question generation model. Pre-trained BERT-based language models were used to build the question-answering system for R. All code is publicly available on GitHub at https://github.com/GanB/QA4R-A-Question-AnsweringSystem-for-R-Packages.