The KFold function in sklearn

A little sklearn every day: KFold (9.8)

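KFold splits a dataset into K folds. Its split() method returns a generator of (train_index, test_index) index arrays, which you can iterate over with a for loop.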

class sklearn.model_selection.KFold(n_splits=5, *, shuffle=False, random_state=None)

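  • n_splits: K, the number of folds the dataset is split into
  • shuffle: whether to shuffle the samples before splitting
  • random_state: the random seed. It only has an effect when shuffle=True, in which case a fixed value gives the same shuffle on every run. Recent scikit-learn versions raise a ValueError if it is set while shuffle=False.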
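As a quick check of the random_state behaviour described above, here is a minimal sketch. The toy array X and the helper fold_test_indices are made up for illustration and are not part of the original post.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data (hypothetical): 10 samples, 1 feature
X = np.arange(10).reshape(-1, 1)

def fold_test_indices(seed):
    """Return the test indices of each fold for a shuffled 5-fold split."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    return [test.tolist() for _, test in kf.split(X)]

same_seed_a = fold_test_indices(seed=42)
same_seed_b = fold_test_indices(seed=42)

# The same seed gives the same shuffled folds on every run
print(same_seed_a == same_seed_b)  # True
```

With random_state left as None, two shuffled runs will generally produce different folds.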
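from sklearn.model_selection import KFold

# xtrain is assumed to be an already-loaded feature set (array or DataFrame)
kf1 = KFold(n_splits=3, shuffle=True)   # split the dataset into 3 folds
for train_index, test_index in kf1.split(xtrain[:20]):
    # on each iteration, one fold is the validation set and the other two form the training set
    print('In KFold,test_index is:{}'.format(test_index))
    print('In KFold,train_index is:{}'.format(train_index))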
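To make it easier to see which indices land in which fold, here is a small self-contained sketch on toy data (the 6-sample array X is an assumption for illustration). It leaves shuffle at its default of False, so the folds are consecutive blocks and each sample is used as validation data exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data (hypothetical): 6 samples, 1 feature
X = np.arange(6).reshape(-1, 1)

kf = KFold(n_splits=3)  # shuffle=False by default
folds = [(train.tolist(), test.tolist()) for train, test in kf.split(X)]

for train, test in folds:
    print("train:", train, "test:", test)
# train: [2, 3, 4, 5] test: [0, 1]
# train: [0, 1, 4, 5] test: [2, 3]
# train: [0, 1, 2, 3] test: [4, 5]
```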

For stratified sampling, the labels must also be passed to split():

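from sklearn.model_selection import StratifiedKFold

# xtrain / ytrain are assumed to be already loaded; ytrain is assumed to be a pandas Series
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
for train_index, test_index in skf.split(xtrain[:20], ytrain[:20]):
    # on each iteration, one fold is the validation set and the other two form the training set
    print('In StratifiedKFold,test_index is:{}'.format(test_index))
    print(ytrain.iloc[test_index].value_counts())   # class counts mirror the overall ratio (1:1 in this data)

    print('In StratifiedKFold,train_index is:{}'.format(train_index))
    print(ytrain.iloc[train_index].value_counts())  # class counts in the training folds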
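
The effect of stratification is easier to see on imbalanced labels. The following is a minimal sketch on made-up data (X and y are assumptions for illustration, with a 2:1 class ratio): every validation fold keeps the same ratio as the full label set.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data (hypothetical): 18 samples, labels imbalanced at 12:6
X = np.zeros((18, 1))             # feature values do not affect the split
y = np.array([0] * 12 + [1] * 6)

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

# Count how many samples of each class fall into each validation fold
test_counts = [np.bincount(y[test]).tolist() for _, test in skf.split(X, y)]
print(test_counts)  # [[4, 2], [4, 2], [4, 2]]
```

A plain KFold with shuffle=True gives no such guarantee, so a small fold could end up with very few samples of the minority class.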