Lee SeungHyun

code7ssage의 블로그 입니당:)

Light GBM

2024-03-04 최대 1 분 소요

목적: 사용하는 Feature와 Data를 줄여보자
Gradient based One Side Sampling (GOSS)
- Information Gain을 계산할 때 각각의 Data는 다 다른 Gradient를 갖고 있음
- 그렇다고 하면 Gradient가 큰 Data는 Keep 하고 Gradient가 낮은 Data는 Randomly Drop을 수행
- - 각 Data 마다의 Gradient를 구하고 Sorting 함
- Gradient가 높은 것은 계속 Keep 하고, Gradient가 낮은 것은 Randomly Drop을 수행
  - (1-a)/b < 1 할 때, 효과가 극대화 되게 됨: case1 a=0.1, b=0.9 vs case2 a=0.05, b=0.5 -> case2 권장
Exclusive Feature Bundling (EFB)
- 대게 0(Zero) 값을 동시에 가지는 Data는 거의 없음 (One-hot encoding)
- 따라서, 독립적인(Exclusive) Feature는 하나로 Bundling 함
- Exclusive Feature Bundling (EFB)
  - Step 1 Greedy Bundling : 어떤 Feature들을 하나로 Bundling 할 것인지 탐색함
    - Feature 간 독립적인 관계를 판단하기 위해 Graph를 구축함
    - Graph (V, E) (단, V=feature, E: total conflicts between features)
    - conflicts 강도(동시에 0이 아닌 데이터 수) example: (cut-off = 0.2) → N=1 → 10 x 2 = 2회 이상은 Edge 가 끊어지게 됨
  - Step 2 Merge Exclusive Features : 새로운 하나의 변수로 치환해 줌
    - Add offsets to the original values of the feature
실습: 11. Light GBM Code

공유하기

Twitter Facebook LinkedIn

댓글남기기

참고

코드 유사성 판단 경진 대회- private 3위

2024-04-08 8 분 소요

라이브러리 및 seed고정

25.One-class SVM Code

2024-03-27 9 분 소요

[목적] Support Vector Machine 실습 One-Class Support Vector Machine 실습 Multivariate variable (다변량)일 때 사용

24.Robust Random Cut Forest Code

2024-03-27 최대 1 분 소요

[목적] Robust Random Cut Forest Code 실습 Multivariate variable (다변량)일 때 사용 각 Data마다 Score를 계산하여 Abnormal을 산출 할 수 있음

23.Isolation Forest Code

2024-03-27 3 분 소요

[목적] Isolation Forest Code 실습 Multivariate variable (다변량)일 때 사용 각 Data마다 Score를 계산하여 Abnormal을 산출 할 수 있음