A Little sklearn Every Day: KFold (9.8)
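KFold splits a dataset into K folds. Its split() method returns a generator of (train_index, test_index) pairs that you iterate over with a loop. In each iteration, one fold is the test (validation) set and the remaining K-1 folds form the training set.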
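To make the "index generator" idea concrete, here is a minimal from-scratch sketch of how unshuffled K-fold indices can be produced. The helper name `kfold_indices` is hypothetical and not part of sklearn. It mirrors the behavior of `KFold(shuffle=False)`, where every fold gets `n_samples // k` samples and the first `n_samples % k` folds get one extra; the loop at the end compares it against the real class.

```python
import numpy as np
from sklearn.model_selection import KFold


def kfold_indices(n_samples, k):
    """Hypothetical helper: yield (train_index, test_index) for k contiguous folds."""
    indices = np.arange(n_samples)
    # Every fold gets n_samples // k samples; the first n_samples % k folds get one more.
    fold_sizes = np.full(k, n_samples // k)
    fold_sizes[: n_samples % k] += 1
    start = 0
    for size in fold_sizes:
        test_index = indices[start:start + size]
        # The training set is every index outside the current test fold.
        train_index = np.concatenate([indices[:start], indices[start + size:]])
        yield train_index, test_index
        start += size


ours = list(kfold_indices(10, 3))
theirs = list(KFold(n_splits=3).split(np.zeros((10, 1))))
for (tr_a, te_a), (tr_b, te_b) in zip(ours, theirs):
    # Prints each test fold and whether it matches sklearn exactly:
    # [0 1 2 3] True, then [4 5 6] True, then [7 8 9] True
    print(te_a, np.array_equal(tr_a, tr_b) and np.array_equal(te_a, te_b))
```

Because the helper is a generator, nothing is computed until the loop asks for the next fold, which is the same lazy behavior `KFold.split` has.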
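class sklearn.model_selection.KFold(n_splits='warn', shuffle=False, random_state=None)
(This is the sklearn 0.20 signature: n_splits='warn' means the default of 3 folds plus a FutureWarning. From 0.22 on, the default is n_splits=5.)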
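n_splits: K, the number of folds to split the dataset into (must be at least 2)
shuffle: whether to shuffle the samples before splitting them into folds
random_state: acts as a random seed and is normally used together with shuffle. It only has an effect when shuffle=True, in which case a fixed value makes the shuffle (and therefore the folds) identical on every run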
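A quick way to see what `random_state` does: two `KFold` objects with `shuffle=True` and the same integer seed produce identical folds, while `shuffle=False` always yields contiguous blocks in the original order. The seed value 42 and the 10-sample toy array are arbitrary choices made only for this illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(10, 1)  # toy data: 10 samples, 1 feature


def collect_test_folds(kf):
    """Collect the test indices of every fold as plain Python lists."""
    return [test.tolist() for _, test in kf.split(X)]


# Same seed -> the shuffled folds are identical every time.
seeded_a = collect_test_folds(KFold(n_splits=3, shuffle=True, random_state=42))
seeded_b = collect_test_folds(KFold(n_splits=3, shuffle=True, random_state=42))
# No shuffling -> folds are contiguous blocks of the original order.
plain = collect_test_folds(KFold(n_splits=3, shuffle=False))

print(seeded_a == seeded_b)  # True
print(plain)                 # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Leaving random_state=None with shuffle=True gives different folds on each run, which makes results harder to reproduce.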
|
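import numpy as np
from sklearn.model_selection import KFold
xtrain = np.arange(40).reshape(20, 2)  # placeholder for the post's xtrain: 20 samples, 2 features
kf1 = KFold(n_splits=3, shuffle=True)  # split the data into 3 folds
for train_index, test_index in kf1.split(xtrain[:20]):
    # in each iteration, one fold is the validation set and the other two are the training set
    print('In KFold, test_index is: {}'.format(test_index))
    print('In KFold, train_index is: {}'.format(train_index))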
|
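In practice the indices are used to slice the features and labels for each fold. The sketch below assumes a synthetic dataset from `make_classification` and a `LogisticRegression` model, neither of which appears in the original post. It computes one accuracy per fold by hand, then checks that `cross_val_score` given the same `KFold` object through `cv` returns the same numbers.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data: 60 samples, 4 features, 2 classes.
X, y = make_classification(n_samples=60, n_features=4, random_state=0)
kf = KFold(n_splits=3, shuffle=True, random_state=0)  # fixed seed -> same folds on every split() call

manual_scores = []
for train_index, test_index in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_index], y[train_index])  # train on K-1 folds
    manual_scores.append(model.score(X[test_index], y[test_index]))  # accuracy on the held-out fold

# cross_val_score runs the same train/evaluate loop when given the KFold object as cv.
auto_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)
print(manual_scores, np.mean(manual_scores))
print(np.allclose(manual_scores, auto_scores))
```

The fixed random_state matters here: without it, the two split() calls would shuffle differently and the per-fold scores would not line up.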
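Stratified sampling: StratifiedKFold keeps the class proportions of the full dataset in every fold, so split() also needs the labels: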
|
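import pandas as pd
from sklearn.model_selection import StratifiedKFold
# placeholders for the post's xtrain / ytrain: 20 samples, 10 per class
xtrain = pd.DataFrame({'f1': range(20)})
ytrain = pd.Series([0, 1] * 10)
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
for train_index, test_index in skf.split(xtrain[:20], ytrain[:20]):
    # in each iteration, one fold is the validation set and the other two are the training set
    print('In StratifiedKFold, test_index is: {}'.format(test_index))
    print(ytrain.iloc[test_index].value_counts())  # class counts follow the overall ratio (here 1:1, up to rounding)
    print('In StratifiedKFold, train_index is: {}'.format(train_index))
    print(ytrain.iloc[train_index].value_counts())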
|
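To see why stratification matters, the sketch below uses hypothetical labels sorted by class (10 zeros followed by 10 ones, an assumption made purely for illustration). Unshuffled `KFold` puts only class 0 into its first test fold, while `StratifiedKFold` keeps roughly the overall 50/50 ratio in every fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical data: labels sorted by class, 10 samples of each.
y = np.array([0] * 10 + [1] * 10)
X = np.zeros((20, 1))  # feature values do not affect how these folds are chosen


def class_counts(cv):
    """Return [count of class 0, count of class 1] for each test fold."""
    return [np.bincount(y[test], minlength=2).tolist() for _, test in cv.split(X, y)]


plain_counts = class_counts(KFold(n_splits=3))  # KFold accepts y but ignores it
strat_counts = class_counts(StratifiedKFold(n_splits=3, shuffle=True, random_state=1))
print('KFold:          ', plain_counts)  # [[7, 0], [3, 4], [0, 6]]
print('StratifiedKFold:', strat_counts)  # each fold holds 3 or 4 samples of each class
```

With 10 samples per class and 3 folds, an exact 1:1 split in every fold is impossible, so a difference of one sample between classes in some folds is expected.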