
  • [Purpose]
    1. LASSO
    • Prevent overfitting by using a Regularized Linear Model
    • Tune the hyperparameter lambda (alpha) not only with a for loop but also with cross-validation (LassoCV / GridSearchCV)
      1. Regularized Linear Models require the X's to be scaled (a sketch of the scaling step follows this list)
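The scaling step itself is not shown in this post, so here is a minimal sketch of how X_scal, Y, train_idx and valid_idx used below could be prepared. The scikit-learn diabetes dataset, StandardScaler, and the 70/30 split are assumptions, not part of the original code:

import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Assumption: the diabetes dataset is used; the original post defines X and Y earlier
data = load_diabetes()
X = pd.DataFrame(data.data, columns=data.feature_names)
Y = pd.Series(data.target, name="target")

# Regularized linear models need scaled inputs so the penalty treats every feature equally
scaler = StandardScaler()
X_scal = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

# Positional index split mirroring the train_idx / valid_idx used below
train_idx, valid_idx = train_test_split(np.arange(len(X_scal)), test_size=0.3, random_state=42)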

[LASSO Regression]

  • Hyperparameter Tuning using a for Loop
  • Hyperparameter Tuning using Cross-Validation (LassoCV)

[LASSO Regression Parameters]

  • Package : https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html
  • alpha : L1-norm penalty term
    • When alpha = 0, the model reduces to plain Linear Regression (a small illustration of how alpha controls sparsity follows this list)
  • fit_intercept : whether to fit the intercept (centering to zero)
    • If False, beta_0 is fixed at 0 (beta_0 is the constant term, so the data is assumed to be already centered)
  • max_iter : maximum number of iterations
    • Because the LASSO penalty term is an absolute value (non-differentiable at 0), an iterative optimizer is required; scikit-learn uses coordinate descent
    • Objective : ||y - Xw||^2_2 + alpha * ||w||_1, where w is the same as beta (scikit-learn additionally scales the squared-error term by 1 / (2 * n_samples))
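To make the effect of alpha concrete before the tuning code, here is a toy sketch on random data (not part of the original analysis) showing how a larger alpha pushes more coefficients to exactly zero:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X_toy = rng.randn(100, 5)
# Only the first two features actually drive the target; the rest are noise
y_toy = 3 * X_toy[:, 0] - 2 * X_toy[:, 1] + 0.1 * rng.randn(100)

for a in [0.001, 0.1, 1.0]:
    coefs = Lasso(alpha=a).fit(X_toy, y_toy).coef_
    print("alpha={0}: {1}".format(a, np.round(coefs, 3)))
# Larger alpha -> stronger L1 penalty -> more coefficients shrunk exactly to 0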
# Imports needed for the snippets below
import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.metrics import mean_squared_error

# Candidate alpha (lambda) values for the penalty term
penalty = [0.0000001, 0.0000005, 0.000001, 0.000005, 0.00001, 0.00005, 0.0001, 0.001, 0.01, 0.02, 0.03, 0.04]

# LASSO Regression
# select alpha by checking R2, MSE, RMSE on the validation split
for a in penalty:
    model = Lasso(alpha=a).fit(X_scal.iloc[train_idx], Y.iloc[train_idx])
    score = model.score(X_scal.iloc[valid_idx], Y.iloc[valid_idx])
    pred_y = model.predict(X_scal.iloc[valid_idx])
    mse = mean_squared_error(Y.iloc[valid_idx], pred_y)
    print("Alpha:{0:.7f}, R2:{1:.7f}, MSE:{2:.7f}, RMSE:{3:.7f}".format(a, score, mse, np.sqrt(mse)))

# Refit with the alpha chosen from the results above
model_best = Lasso(alpha=0.02).fit(X_scal.iloc[train_idx], Y.iloc[train_idx])
# summary() is a coefficient/metric report helper assumed to be defined earlier in the post
summary(model_best, X_scal.iloc[valid_idx], Y.iloc[valid_idx], xlabels=X.columns)

# TODO: refit BMI, BP, S5 with an unscaled Linear Regression to see the effect of a 1-unit change (sketch below)
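The TODO above is about interpretability: refitting the selected features on their original scale so each coefficient reads as the change in the target per 1-unit change in the feature. A minimal sketch, assuming the unscaled DataFrame X and column names matching the comment (both assumptions):

from sklearn.linear_model import LinearRegression

selected = ["BMI", "BP", "S5"]  # hypothetical column names taken from the comment above
lr = LinearRegression().fit(X[selected].iloc[train_idx], Y.iloc[train_idx])
for name, coef in zip(selected, lr.coef_):
    print("{0}: {1:.3f} change in target per 1-unit increase".format(name, coef))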
# Cross Validation for LASSO: LassoCV searches the alpha grid with 5-fold CV
lasso_cv = LassoCV(alphas=penalty, cv=5)
model = lasso_cv.fit(X_scal.iloc[train_idx], Y.iloc[train_idx])
print("Best Alpha : {:.7f}".format(model.alpha_))

# Refit with the CV-selected alpha and evaluate on the validation split
model_best = Lasso(alpha=model.alpha_).fit(X_scal.iloc[train_idx], Y.iloc[train_idx])
score = model_best.score(X_scal.iloc[valid_idx], Y.iloc[valid_idx])
pred_y = model_best.predict(X_scal.iloc[valid_idx])
mse = mean_squared_error(Y.iloc[valid_idx], pred_y)
print("Alpha:{0:.7f}, R2:{1:.7f}, MSE:{2:.7f}, RMSE:{3:.7f}".format(model.alpha_, score, mse, np.sqrt(mse)))
summary(model_best, X_scal.iloc[valid_idx], Y.iloc[valid_idx], xlabels=X.columns)
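The purpose section also mentions GridSearchCV; the code above uses LassoCV, but the same alpha search can be written with GridSearchCV. A sketch under that assumption (the scoring metric is my choice, not from the original):

from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(Lasso(), param_grid={"alpha": penalty}, cv=5, scoring="neg_mean_squared_error")
grid.fit(X_scal.iloc[train_idx], Y.iloc[train_idx])
print("Best Alpha : {:.7f}".format(grid.best_params_["alpha"]))

# GridSearchCV refits the best estimator on the training split by default;
# evaluate it on the validation split, mirroring the LassoCV code above
print("Validation R2 : {:.7f}".format(grid.best_estimator_.score(X_scal.iloc[valid_idx], Y.iloc[valid_idx])))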
