Time: Jan. 2022 - Jan. 2023
Location: Seattle, USA
Volunteered with a team of four engineers to develop a crawler that feeds hate-incident data to the backend of the Anti-Asian Hate Crime Tracker, a website designed to raise awareness of anti-Asian hate.
Built distributed web crawlers with Scrapy. Wrote Dockerfiles to run the crawlers and a proxy service as a portable multi-container application. Protected the crawler services with Nginx server authentication.
Developed integrated Scrapy item pipelines that push messages to multiple platforms, including a Slack app, Google Drive, and Algolia.
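As an illustration of the item-pipeline approach, here is a minimal sketch of a Scrapy-style pipeline that forwards each scraped item to a Slack incoming webhook. It is not the project's actual code: the class name `SlackNotifyPipeline`, the setting name `SLACK_WEBHOOK_URL`, the helper `post_json`, and the item fields `title` and `url` are all assumptions made for the example. The class only relies on the standard pipeline hooks (`from_crawler` and `process_item`), so it needs nothing beyond the Python standard library.

```python
import json
from typing import Callable
from urllib import request as urlrequest


def post_json(url: str, payload: dict) -> None:
    """POST a JSON payload (hypothetical helper, used here for a Slack webhook)."""
    data = json.dumps(payload).encode("utf-8")
    req = urlrequest.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urlrequest.urlopen(req, timeout=10) as resp:
        resp.read()


class SlackNotifyPipeline:
    """Scrapy-style item pipeline that pushes each item to Slack."""

    def __init__(self, webhook_url: str,
                 send: Callable[[str, dict], None] = post_json):
        self.webhook_url = webhook_url
        # The sender is injectable so the pipeline can be exercised offline.
        self.send = send

    @classmethod
    def from_crawler(cls, crawler):
        # SLACK_WEBHOOK_URL is an assumed setting name, not one from the project.
        return cls(crawler.settings.get("SLACK_WEBHOOK_URL"))

    def process_item(self, item, spider):
        # 'title' and 'url' are assumed item fields.
        text = f"New incident report: {item['title']} ({item['url']})"
        self.send(self.webhook_url, {"text": text})
        # Return the item so later pipelines (e.g. storage or search
        # indexing) still receive it.
        return item
```

Under these assumptions, the pipeline would be enabled through the `ITEM_PIPELINES` setting of the Scrapy project, and pipelines for other targets could follow the same shape with a different `send` function.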
Implemented distributed crawling and URL/URI deduplication, and established the data access layer, using Scrapy-Redis. This filtered out duplicate content and improved crawler efficiency by 42%.
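The idea behind fingerprint-based URL deduplication can be sketched as follows. Scrapy-Redis keeps request fingerprints in a shared Redis set so that every crawler worker consults the same record of visited requests. The sketch below swaps in an in-memory Python `set` for Redis so that it runs on its own. The names `canonicalize`, `fingerprint`, and `SeenUrls` are hypothetical, and the normalization rules are simplified assumptions rather than the library's exact fingerprinting logic.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Normalize a URL so trivially different spellings compare equal."""
    parts = urlsplit(url.strip())
    # Sort query parameters and drop the fragment (simplified rules).
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, query, "")
    )


def fingerprint(url: str) -> str:
    """Hash the canonical URL into a fixed-size fingerprint."""
    return hashlib.sha1(canonicalize(url).encode("utf-8")).hexdigest()


class SeenUrls:
    """In-memory stand-in for a shared fingerprint set (Redis in production)."""

    def __init__(self):
        self._seen = set()

    def request_seen(self, url: str) -> bool:
        """Return True if the URL was already recorded, else record it."""
        fp = fingerprint(url)
        if fp in self._seen:
            return True
        self._seen.add(fp)
        return False


seen = SeenUrls()
for candidate in [
    "https://Example.com/news?b=2&a=1#top",
    "https://example.com/news?a=1&b=2",  # duplicate after normalization
    "https://example.com/other",
]:
    if not seen.request_seen(candidate):
        print("crawl:", candidate)
```

In a distributed setup the membership check and insert would go to the shared store instead of a local set, which is what lets several workers skip pages that any one of them has already fetched.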
Article Information
- Author: Shaoshuai Xu
- Link to this article: https://Leluth.github.io/2023/01/15/Anti-Asian-Hate-Crime-Tracker/