Institutional Repository Keyword Analysis with Web Crawler

dc.creator: Fujita, Mariângela Spotti Lopes
dc.creator: Katahira, Isaque
dc.creator: Tolare, Jessica Beatriz
dc.date: 2022-12-23
dc.date.accessioned: 2023-02-20T14:34:16Z
dc.date.available: 2023-02-20T14:34:16Z
dc.description: This study investigates procedures for the semantic and linguistic extraction of keywords from the metadata of documents indexed in the Institutional Repository Unesp. For that purpose, a web crawler was developed that collected 325,181 author-assigned keywords, in all fields of knowledge, from February 28th, 2013 to November 10th, 2021. The collection, extraction and analysis environment was built in the Python programming language with three libraries: the Requests library, which allows handling the hyperlinks of webpages visited by the web crawler; the BeautifulSoup library, used to extract data from HTML by parsing webpages; and the Pandas library, which is open source (free software) and stands out for providing tools for high-performance data manipulation and analysis. The final listing consisted of 273,485 keywords, a reduction of 15.9% from the listing initially collected. Results indicated that the most recurring problem was the duplication of keywords, with 51,696 duplicated keywords, which point to inconsistencies in the search for documents. It is concluded that refining the keywords assigned by authors eliminates strings that do not represent distinct keywords: entries with the same spelling that differ only in upper/lower case or in lexical variation, yet index different documents. [en-US]
dc.format: application/pdf
dc.identifier: https://ojs.lib.unideb.hu/CEJER/article/view/11395
dc.identifier: 10.37441/cejer/2022/4/2/11395
dc.identifier.uri: https://hdl.handle.net/2437/346003
dc.language: eng
dc.publisher: Debrecen University Press (DUPress) [en-US]
dc.relation: https://ojs.lib.unideb.hu/CEJER/article/view/11395/10842
dc.rights: Copyright (c) 2022 by the authors [en-US]
dc.rights: https://creativecommons.org/licenses/by/4.0 [en-US]
dc.source: Central European Journal of Educational Research; Vol. 4 No. 2 (2022): Data and Information Science in Education; 54-59 [en-US]
dc.source: Central European Journal of Educational Research; Évf. 4 szám 2 (2022): Data and Information Science in Education; 54-59 [hu-HU]
dc.source: 2677-0326
dc.subject: institutional repositories [en-US]
dc.subject: web crawler [en-US]
dc.subject: indexing by author [en-US]
dc.title: Institutional Repository Keyword Analysis with Web Crawler [en-US]
dc.type: info:eu-repo/semantics/article
dc.type: info:eu-repo/semantics/publishedVersion
Files
Original bundle (ORIGINAL)
Name: PDF.pdf
Size: 345.75 KB
Format: Adobe Portable Document Format
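
The dc.description field above outlines a pipeline that collects author keywords from repository pages and then removes entries that differ only in letter case. The following is a minimal sketch of that idea, not the authors' implementation: it uses only the Python standard library (html.parser) in place of the Requests, BeautifulSoup and Pandas libraries named in the abstract, it parses an inline sample page instead of crawling, and it assumes that keywords appear in `<meta name="DC.subject">` tags. The names `SubjectParser`, `extract_keywords`, `normalize`, `deduplicate` and `SAMPLE_PAGE` are hypothetical and introduced only for illustration.

```python
from html.parser import HTMLParser


class SubjectParser(HTMLParser):
    """Collect the content of <meta name="DC.subject"> tags (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # HTMLParser lowercases tag and attribute names; values are kept as-is.
        if tag == "meta" and (attr.get("name") or "").lower() == "dc.subject":
            content = (attr.get("content") or "").strip()
            if content:
                self.keywords.append(content)


def extract_keywords(html):
    """Return all author keywords found in one HTML page."""
    parser = SubjectParser()
    parser.feed(html)
    return parser.keywords


def normalize(keyword):
    """Collapse runs of whitespace and fold case so variants compare equal."""
    return " ".join(keyword.split()).casefold()


def deduplicate(keywords):
    """Keep the first spelling of each keyword; also return how many were dropped."""
    first_seen = {}
    for kw in keywords:
        first_seen.setdefault(normalize(kw), kw)
    unique = list(first_seen.values())
    return unique, len(keywords) - len(unique)


# Hypothetical sample page standing in for a crawled repository record.
SAMPLE_PAGE = """
<html><head>
<meta name="DC.subject" content="Web crawler">
<meta name="DC.subject" content="web crawler">
<meta name="DC.subject" content="Institutional  repositories">
<meta name="DC.title" content="Example record">
</head></html>
"""

keywords = extract_keywords(SAMPLE_PAGE)
unique, removed = deduplicate(keywords)
print(unique, removed)
```

In this sketch the two case variants of "web crawler" count as one keyword, which mirrors the kind of duplication the abstract reports. Lexical variants such as singular and plural forms are not merged here; handling them would need an additional normalization step that the abstract does not specify.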