Evaluating proposals

Web Scraping with MySQL Database for High-Volume Data Extraction

Published on December 31, 2024 in IT & Programming

About this project

Open

**Project Description:** We are seeking an experienced Python developer to create a web scraping script that extracts information from a specific website containing approximately 5 million pages. The extracted data will be stored in a MySQL database and visualized in a dashboard for reporting, filtering, and data views.

The website has implemented blocking mechanisms for direct urllib requests, so a reliable approach is needed to avoid being blocked. Strategies such as rotating user agents and introducing time delays between requests may be necessary. Please note that Selenium, Requests, and Beautiful Soup have already been tested and were unsuccessful.
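
The two strategies named above (rotating user agents and time delays) can be sketched as follows. This is a minimal illustration using only the Python standard library, not a tested solution for the target site. The `USER_AGENTS` pool, the function names, and the delay values are all assumptions, and since Requests and Selenium have already failed, these measures alone may not be enough.

```python
import random
import time
import urllib.request

# Illustrative pool only (assumption); a real run would use a larger, current list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_request(url, rng=random):
    """Build a request whose User-Agent is picked at random from the pool."""
    return urllib.request.Request(url, headers={"User-Agent": rng.choice(USER_AGENTS)})


def next_delay(base_seconds=2.0, jitter_seconds=3.0, rng=random):
    """Return a randomized pause so requests are not evenly spaced."""
    return base_seconds + rng.uniform(0.0, jitter_seconds)


def fetch(url, timeout=30):
    """Wait a randomized interval, then download the page body as bytes."""
    time.sleep(next_delay())
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Keeping `build_request` and `next_delay` separate from `fetch` makes it possible to check the rotation and timing logic without touching the network.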


Requirements:

Web Scraping Framework: Anti-Bot Avoidance Techniques: Implement measures to prevent being blocked by the website, such as randomized user agents and time delays between requests.
Data Storage: Store scraped data in a structured MySQL database. Database design should optimize for both data storage and retrieval efficiency.
Error Handling: Implement error handling to retry failed requests or skip pages after multiple attempts.
Scalability and Distributed Execution: The script should be capable of running on multiple computers simultaneously, allowing the scraping process to be distributed across different machines for faster completion.
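
The error-handling requirement (retry failed requests, skip a page after multiple attempts) might look like the sketch below. The function name `fetch_with_retries`, the default of three attempts, and the exponential backoff are assumptions chosen for illustration; the posting does not specify them.

```python
import logging
import time

logger = logging.getLogger("scraper")


def fetch_with_retries(fetch, url, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); retry on network errors, return None if all attempts fail.

    `fetch` is any callable that downloads one URL (hypothetical helper).
    OSError covers urllib's URLError/HTTPError and socket timeouts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except OSError as exc:
            if attempt == max_attempts:
                # Give up on this page so one bad URL does not stall the run.
                logger.warning("skipping %s after %d attempts: %s", url, attempt, exc)
                return None
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Returning `None` instead of raising lets the caller mark the page as failed and move on to the next URL.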

Deliverables:
Script for scraping with comments and documentation.
MySQL database structure and schema for storing the scraped data.
Instructions for running the script on multiple computers and setting up the MySQL database for concurrent data collection.
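
One possible shape for the schema and for concurrent collection from several machines is sketched below. Everything here is an assumption for illustration: the `pages` table, its columns, the `payload` JSON column standing in for the unspecified extracted fields, and the claim-by-`UPDATE` approach. The functions expect a DB-API cursor that uses `%s` placeholders, as MySQL drivers such as PyMySQL do, and the caller is assumed to commit after each claim.

```python
import hashlib

# Hypothetical work-queue table: one row per page to scrape.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS pages (
    id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    url_hash   CHAR(64) NOT NULL,
    url        TEXT NOT NULL,
    status     ENUM('pending', 'in_progress', 'done', 'failed')
               NOT NULL DEFAULT 'pending',
    worker_id  VARCHAR(64) NULL,
    attempts   TINYINT UNSIGNED NOT NULL DEFAULT 0,
    payload    JSON NULL,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
               ON UPDATE CURRENT_TIMESTAMP,
    UNIQUE KEY uq_pages_url_hash (url_hash),
    KEY idx_pages_status (status, id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
"""

# A single UPDATE marks a batch as taken, so two workers cannot claim the same row.
CLAIM_SQL = (
    "UPDATE pages SET status = 'in_progress', worker_id = %s "
    "WHERE status = 'pending' ORDER BY id LIMIT %s"
)
FETCH_CLAIMED_SQL = (
    "SELECT id, url FROM pages WHERE status = 'in_progress' AND worker_id = %s"
)


def url_key(url):
    """Fixed-length SHA-256 hex digest, used as the unique key for a URL."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def claim_batch(cursor, worker_id, batch_size=100):
    """Reserve up to batch_size pending pages for this worker and return them."""
    cursor.execute(CLAIM_SQL, (worker_id, batch_size))
    cursor.execute(FETCH_CLAIMED_SQL, (worker_id,))
    return cursor.fetchall()
```

Under these assumptions, each machine would run the same script with its own `worker_id` against the shared database, and the `(status, id)` index keeps the claim query cheap as the table grows toward millions of rows.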

Please provide:
Your relevant experience with large-scale scraping projects.
Estimated timeline and cost for the project.

Thank you!

Category: IT & Programming
Subcategory: Programming
What is the scope of the project? Medium-sized change
Is this a project or a job position? A project
I currently have: Specifications
Required availability: As needed
Roles needed: Developer

Delivery deadline: Not set

Required skills