Lesson 4: Storing Web Crawling Data

알세지 2024. 5. 29. 12:19

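import requests
from bs4 import BeautifulSoup

# Request the web page
url = "http://example.com"
response = requests.get(url, timeout=10)  # timeout so a stalled server cannot hang the script

# Check the response
if response.status_code == 200:
    # Parse the HTML
    soup = BeautifulSoup(response.text, "html.parser")
    # Extract the page title (soup.title is None when the page has no <title> tag)
    title = soup.title.string if soup.title else None
    print(f"Page title: {title}")
else:
    print(f"Failed to retrieve the page. Status code: {response.status_code}")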

3. Storing Crawled Data in PostgreSQL

This section stores the crawled data in a PostgreSQL database.

Creating the Database Table

First, create a table to hold the data.

CREATE TABLE web_data (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255),
    url VARCHAR(255),
    retrieved_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
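
If it is more convenient to create the table from the crawler script than from psql, the same DDL can be run through the database connection. The following is a minimal sketch and not part of the original lesson: `ensure_web_data_table` is a hypothetical helper name, and `IF NOT EXISTS` is added here as an assumption so that the script can be run more than once without failing.

```python
# Sketch only: ensure_web_data_table is a hypothetical helper, not from the lesson.
# conn is assumed to be an already-open psycopg2 connection.

CREATE_WEB_DATA_SQL = """
CREATE TABLE IF NOT EXISTS web_data (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255),
    url VARCHAR(255),
    retrieved_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""


def ensure_web_data_table(conn):
    """Create web_data if it does not exist yet, then commit."""
    cur = conn.cursor()
    try:
        cur.execute(CREATE_WEB_DATA_SQL)
        conn.commit()
    finally:
        # Close the cursor whether or not the statement succeeded
        cur.close()
```

It would be called once, right after the connection is opened and before any rows are inserted.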

Writing the Insert Code

Next, write the code that inserts the crawled data into the database.

import psycopg2
import requests
from bs4 import BeautifulSoup

# PostgreSQL connection settings
conn = psycopg2.connect(
    host="localhost",
    database="mydatabase",
    user="yourusername",
    password="yourpassword"
)

cur = conn.cursor()

# Insert one crawled page into web_data
def insert_web_data(title, url):
    cur.execute(
        "INSERT INTO web_data (title, url) VALUES (%s, %s)",
        (title, url)
    )
    conn.commit()

# Web crawler code
url = "http://example.com"
response = requests.get(url, timeout=10)
if response.status_code == 200:
    soup = BeautifulSoup(response.text, "html.parser")
    # soup.title is None when the page has no <title> tag
    title = soup.title.string if soup.title else None
    insert_web_data(title, url)
    print("Data inserted successfully")
else:
    print(f"Failed to retrieve the page. Status code: {response.status_code}")

# Close the connection
cur.close()
conn.close()
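
The insert function above relies on a module-level cursor and an explicit `commit()`. As an alternative, psycopg2 connections and cursors can be used as context managers. The sketch below shows that style; `save_page` is a hypothetical name, and the function assumes it is handed an open connection to a database that already contains the `web_data` table.

```python
# Sketch only: save_page is a hypothetical name. conn is assumed to be an open
# psycopg2 connection to a database that already has the web_data table.

INSERT_SQL = "INSERT INTO web_data (title, url) VALUES (%s, %s)"


def save_page(conn, title, url):
    """Insert one row; commit on success, roll back if execute() raises."""
    # In psycopg2, "with conn" wraps a transaction: it commits when the block
    # exits normally and rolls back on an exception. It does not close conn.
    with conn:
        # "with conn.cursor()" closes the cursor when the block ends.
        with conn.cursor() as cur:
            cur.execute(INSERT_SQL, (title, url))
```

Because the connection stays open after the `with` block, `conn.close()` is still needed once the crawler has finished.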

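4. Error Handling and Exceptional Cases

This section handles the errors and exceptional cases that can occur during web crawling and database work.

Handling Request Errors

Handle the errors that can occur while requesting a web page.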

try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Raises HTTPError for 4xx/5xx responses
except requests.exceptions.HTTPError as http_err:
    print(f"HTTP error occurred: {http_err}")
except requests.exceptions.RequestException as err:
    # Connection failures, timeouts, invalid URLs, and so on
    print(f"Other error occurred: {err}")
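
The handler above reports a failure and moves on. For transient problems such as a dropped connection, a crawler may prefer to try again a few times before giving up. The following is a minimal sketch of that idea and is not from the original lesson: `with_retries` and its parameters are hypothetical names, and the fixed delay between attempts is an assumption chosen for simplicity.

```python
import time


def with_retries(fetch, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Call fetch() until it succeeds or the attempts run out.

    fetch    -- zero-argument callable, e.g. lambda: requests.get(url, timeout=10)
    retry_on -- exception types treated as transient; others propagate at once
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on as err:
            if attempt == attempts:
                raise  # out of attempts: let the caller handle the error
            print(f"Attempt {attempt} failed: {err}. Retrying in {delay}s")
            time.sleep(delay)
```

With requests, `retry_on` would typically be `(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`, so that HTTP errors such as 404 are not retried.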

Handling Database Errors

Handle the errors that can occur during database operations.

try:
    # Database operation
    insert_web_data(title, url)
except psycopg2.Error as db_err:
    print(f"Database error occurred: {db_err}")
    conn.rollback()  # Clear the failed transaction so the connection can be reused

Common Errors and Solutions

Error: requests.exceptions.ConnectionError: Failed to establish a new connection
- Solution: Check that the URL is correct and that your internet connection is working.
Error: psycopg2.IntegrityError: duplicate key value violates unique constraint
- Solution: Check whether the same row already exists before inserting it.
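
For the duplicate-key case, one way to avoid a separate lookup is to let PostgreSQL skip the duplicate with `ON CONFLICT ... DO NOTHING` (available since PostgreSQL 9.5). The sketch below is an illustration under a stated assumption: it presumes a UNIQUE constraint on `web_data.url`, which the table definition in this lesson does not include. `insert_if_new` is a hypothetical name.

```python
# Sketch only. Assumes a UNIQUE constraint on web_data.url, which the table
# definition in this lesson does not include. It could be added with:
#   ALTER TABLE web_data ADD CONSTRAINT web_data_url_key UNIQUE (url);

INSERT_IF_NEW_SQL = (
    "INSERT INTO web_data (title, url) VALUES (%s, %s) "
    "ON CONFLICT (url) DO NOTHING"
)


def insert_if_new(conn, title, url):
    """Insert the row unless the url is already stored; return True if inserted."""
    cur = conn.cursor()
    try:
        cur.execute(INSERT_IF_NEW_SQL, (title, url))
        inserted = cur.rowcount == 1  # 0 affected rows means the url already existed
        conn.commit()
        return inserted
    finally:
        cur.close()
```

The return value lets the crawler log whether a page was newly stored or skipped as a duplicate.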

Wrap-up

In this lesson we collected data by web crawling and stored it in PostgreSQL. The next lesson covers publishing the collected data on the web with FastAPI.
