Posts

Showing posts from October, 2024

Classifying E-petitions using BERT

Introduction: This project trains a BERT-based model to classify Japanese e-petitions. The goal is to assign each petition to one of 6 classes predefined by the data source. This kind of classification can help organizations understand online petitions, which can be crucial for various societal causes in Japan.

Data Acquisition

The project first needed training data for the BERT model. Because no suitable Japanese dataset was available at the time, I decided to gather it myself. The data was collected by web scraping change.org/ja, the Japanese version of change.org, along with the change.org pages for Japanese regions. The Python code uses Selenium, a web automation tool, to interact with each webpage, extract the relevant data, and save it for further processing. Because Brave was the default browser on the device, the scraper was configured to have Selenium's web driver launch Brave. While scraping, the scraper runs a looped scrolling mechanism to loa...
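A minimal sketch of the Selenium side, assuming Selenium 4. Brave is Chromium-based, so one common approach is to point the Chrome options' `binary_location` at the Brave executable. `BRAVE_PATH` is a placeholder assumption (the install path differs by OS). The scrolling sentence above is cut off, so `scroll_until_stable` only shows a common pattern for pages that lazy-load content while scrolling; it is not the author's code, and both function names are hypothetical.

```python
import time

# Hypothetical placeholder: Brave's install path differs by OS (assumption).
BRAVE_PATH = "/usr/bin/brave-browser"

SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight);"
HEIGHT_JS = "return document.body.scrollHeight;"


def make_brave_driver(binary_path=BRAVE_PATH):
    """Start a Selenium Chrome session that launches Brave instead of Chrome."""
    # Imported here so the scrolling helper can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.binary_location = binary_path  # Brave is Chromium-based
    return webdriver.Chrome(options=options)


def scroll_until_stable(driver, pause=2.0, max_rounds=50, sleep=time.sleep):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script(HEIGHT_JS)
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        driver.execute_script(SCROLL_JS)
        sleep(pause)  # give lazily loaded items time to render
        new_height = driver.execute_script(HEIGHT_JS)
        if new_height == last_height:
            break  # nothing new appeared; assume the end of the list was reached
        last_height = new_height
    return rounds
```

In use, one would call `driver = make_brave_driver()`, then `driver.get(...)` on a listing page and `scroll_until_stable(driver)` before extracting elements, finishing with `driver.quit()`. Comparing page heights is a simple stopping rule. It can stop early if the site loads slowly, which is why `pause` and `max_rounds` are exposed as parameters.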