Python Syntax Basics 8 - Regular Expressions (Python)


코딩탕탕 2022. 10. 22. 15:18

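Reference: cafe post

Regular expressions, and using the r prefix when writing patterns

Regular expressions
https://docs.python.org/3/library/re.html?highlight=re#module-re
https://soooprmx.com/archives/7718
https://regexr.com/5mhou
- A formal language used to describe sets of strings that follow a particular rule.
- Programming Language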

cafe.daum.net

# Regular expressions: effective for picking out and processing only the data you want from a large amount of data

import re  # import the regular expression module

ss = "1234 abc가나다abc_mbcABC_123555_6한국'Python is fun.'"
print(ss)
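print(re.findall(r'123', ss))  # literal text '123'
print(re.findall(r'가나다', ss))  # literal text '가나다'
print(re.findall(r'1', ss))  # every occurrence of the character '1'
print(re.findall(r'[1-2]', ss))  # any single character in the range 1-2
print(re.findall(r'[0-9]', ss))  # any single digit, one at a time
print(re.findall(r'[0-9]+', ss))  # + means one or more: each run of consecutive digits is returned whole
print(re.findall(r'[0-9]{2}', ss))  # {2} is a repeat count: exactly 2 digits per match
print(re.findall(r'[0-9]{2,3}', ss))  # {2,3}: 2 to 3 digits per match, taking as many as possible
print(re.findall(r'[a-z]+', ss))  # runs of lowercase letters a through z
print(re.findall(r'[A-Za-z]+', ss))  # runs of uppercase or lowercase letters
print(re.findall(r'[가-힣]+', ss))  # runs of Hangul syllables
print(re.findall(r'[^가-힣]+', ss))  # ^ inside [] negates: runs of anything except Hangul
print(re.findall(r'12|34', ss))  # | means or: '12' or '34'
print(re.findall(r'.bc', ss))  # any one character followed by 'bc'
print(re.findall(r'...', ss))  # any three characters at a time
print(re.findall(r'[^1]+', ss))  # runs of characters other than '1'
print(re.findall(r'^1+', ss))  # ^ outside [] anchors to the start: a leading run of '1'
print(re.findall(r'fun.$', ss))  # 'fun' plus any one character at the very end of the string

print(re.findall(r'\d', ss))  # \d: a single digit
print(re.findall(r'\d+', ss))  # runs of digits
print(re.findall(r'\s+', ss))  # runs of whitespace
print(re.findall(r'\S', ss))  # \S: any single non-whitespace character
print(re.findall(r'\d{1,3}', ss))  # 1 to 3 consecutive digits per match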


<console>
1234 abc가나다abc_mbcABC_123555_6한국'Python is fun.'
['123', '123']
['가나다']
['1', '1']
['1', '2', '1', '2']
['1', '2', '3', '4', '1', '2', '3', '5', '5', '5', '6']
['1234', '123555', '6']
['12', '34', '12', '35', '55']
['123', '123', '555']
['abc', 'abc', 'mbc', 'ython', 'is', 'fun']
['abc', 'abc', 'mbcABC', 'Python', 'is', 'fun']
['가나다', '한국']
['1234 abc', 'abc_mbcABC_123555_6', "'Python is fun.'"]
['12', '34', '12']
['abc', 'abc', 'mbc']
['123', '4 a', 'bc가', '나다a', 'bc_', 'mbc', 'ABC', '_12', '355', '5_6', "한국'", 'Pyt', 'hon', ' is', ' fu', "n.'"]
['234 abc가나다abc_mbcABC_', "23555_6한국'Python is fun.'"]
['1']
[]
['1', '2', '3', '4', '1', '2', '3', '5', '5', '5', '6']
['1234', '123555', '6']
[' ', ' ', ' ']
['1', '2', '3', '4', 'a', 'b', 'c', '가', '나', '다', 'a', 'b', 'c', '_', 'm', 'b', 'c', 'A', 'B', 'C', '_', '1', '2', '3', '5', '5', '5', '_', '6', '한', '국', "'", 'P', 'y', 't', 'h', 'o', 'n', 'i', 's', 'f', 'u', 'n', '.', "'"]
['123', '4', '123', '555', '6']
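
One result in the output above is worth a closer look: `fun.$` returned an empty list. The sample string ends with a single quote, so there is no position where "fun", one more character, and the end of the string line up. The sketch below reuses the same `ss`; the variable names `no_match`, `literal_dot`, and `unanchored` are only for illustration.

```python
import re

# Same sample string as in the example above.
ss = "1234 abc가나다abc_mbcABC_123555_6한국'Python is fun.'"

# '.' matches any one character and '$' anchors at the end of the string.
# The string ends with a single quote, so 'fun.$' finds nothing.
no_match = re.findall(r'fun.$', ss)

# Escaping the dot matches a literal period; including the closing quote
# before '$' lets the pattern reach the real end of the string.
literal_dot = re.findall(r"fun\.'$", ss)

# Without the anchor, the escaped dot alone is enough to find 'fun.'.
unanchored = re.findall(r'fun\.', ss)

print(no_match)
print(literal_dot)
print(unanchored)
```

When a pattern with `$` unexpectedly returns nothing, checking the last character of the input is usually a good first step.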

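To use regular expressions, first import the re module. Calling re.findall returns the matches as a list. Putting r in front of the pattern makes it a raw string, so everything inside the quotes, backslashes included, is passed to re exactly as written.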
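
To make the effect of the r prefix concrete, here is a small sketch comparing a normal string with a raw string for the same pattern. The sample text and the variable names are made up for illustration.

```python
import re

# In a normal string literal, '\b' is the backspace character (one character).
# In a raw string, r'\b' is a backslash followed by 'b' (two characters),
# which the re module interprets as a word boundary.
plain = '\bis\b'
raw = r'\bis\b'

text = 'This is fun'

plain_result = re.findall(plain, text)  # looks for backspace + 'is' + backspace
raw_result = re.findall(raw, text)      # looks for the standalone word 'is'

print(len(plain), len(raw))
print(plain_result)
print(raw_result)
```

The normal string silently turns into a different pattern, which is why writing patterns as raw strings is the safer habit.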

p = re.compile('the', re.IGNORECASE)
print(p)
print(p.findall('The do the dog'))


<console>
re.compile('the', re.IGNORECASE)
['The', 'the']

With re.IGNORECASE, matching is case-insensitive, so both 'The' and 'the' are found.
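
For comparison, the following sketch runs the same pattern with and without the flag, and also with the inline `(?i)` form. The variable names are chosen only for this example.

```python
import re

text = 'The do the dog'

# Without a flag, matching is case-sensitive.
case_sensitive = re.compile(r'the').findall(text)

# re.IGNORECASE (short form: re.I) ignores letter case.
case_insensitive = re.compile(r'the', re.IGNORECASE).findall(text)

# The inline flag (?i) at the start of the pattern has the same effect.
inline_flag = re.findall(r'(?i)the', text)

print(case_sensitive)
print(case_insensitive)
print(inline_flag)
```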

ss = '''My name is tom.
I am happy'''
print(ss)
p = re.compile('^.+', re.MULTILINE)
print(p.findall(ss))


<console>
My name is tom.
I am happy
['My name is tom.', 'I am happy']

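With re.MULTILINE, ^ matches at the start of every line rather than only at the start of the whole string. Since . does not match a newline, .+ stops at the end of each line, which is why two strings are returned above.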
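
The difference is easier to see side by side. This sketch runs `^.+` on the same two-line string with no flag, with re.MULTILINE, and with re.DOTALL (a flag not used in the original example, added here only for contrast).

```python
import re

ss = '''My name is tom.
I am happy'''

# Default: '^' matches only at the very start of the string.
# '.' does not match a newline, so '.+' stops at the end of line 1.
default_result = re.findall(r'^.+', ss)

# re.MULTILINE: '^' also matches right after each newline.
multiline_result = re.findall(r'^.+', ss, re.MULTILINE)

# re.DOTALL: '.' matches newlines too, so '.+' swallows both lines at once.
dotall_result = re.findall(r'^.+', ss, re.DOTALL)

print(default_result)
print(multiline_result)
print(dotall_result)
```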

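The value assigned to ss is written with ''', which is often seen used like a comment, but it is really the syntax for a string that spans multiple lines. A bare triple-quoted string is still a statement (for example, it has to follow the indentation rules), so it can cause errors; for actual comments, using # is recommended.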
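
A short sketch of that distinction, assuming a hypothetical function `greet` that exists only for this example: the triple-quoted literal is a real str object, while a # comment leaves no trace at runtime.

```python
# A triple-quoted literal is an ordinary str; line breaks become '\n'.
ss = '''My name is tom.
I am happy'''

def greet():
    '''Return a greeting.'''
    # The string above is the first statement of the function, so Python
    # stores it as the docstring. This '#' comment is simply discarded.
    return 'hello'

print(repr(ss))
print(ss.splitlines())
print(greet.__doc__)
```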