PRank Algorithm

The PRank algorithm is a modification of the perceptron classifier for ordinal regression: it classifies items into tiers, where the tiers are ordered.

import numpy as np

def prank(stats, tiers, n_tiers, iters):
    '''
    Implementation of the PRank algorithm (http://papers.nips.cc/paper/2023-pranking-with-ranking.pdf)
    stats: array of features, one row per player, one column per feature
    tiers: array of rankings, one ranking per player
    n_tiers: number of tiers, numbered 0,1,2,...,n_tiers-1
    iters: number of iterations of the training loop
    '''
    n_players = len(stats)
    n_features = len(stats[0])
    w = np.zeros([n_features])   # weights of features
    b = np.zeros([n_tiers])      # cutoffs for rankings; the last cutoff stays at infinity
    b[-1] = np.inf
    loss = 0    # total loss incurred by the algorithm
    for i in range(iters):      # update w and b given training example x
        n = i % n_players
        x = stats[n].copy()
        y = np.argmax(b > np.dot(x, w))   # rank of x as determined by the current w and b
        y_act = tiers[n]
        loss += abs(y - y_act)
        if y != y_act:      # if the predicted rank is not the actual rank, update w and b
            # cutoffs lying between the prediction and the target shift by one;
            # >= (rather than the strict > used for prediction) also moves cutoffs
            # that exactly equal the score, matching the paper's update condition
            y_n = np.argmax(b >= np.dot(x, w))
            if y_act > y_n:
                t_arr = np.repeat([0, 1, 0], [y_n, y_act - y_n, n_tiers - 1 - y_act])
            else:
                t_arr = np.repeat([0, -1, 0], [y_act, y_n - y_act, n_tiers - 1 - y_n])
            w = w + np.sum(t_arr) * x
            b[:-1] = b[:-1] - t_arr
    return w, b, 1.0 * loss / iters

def rank(x,w,b):
    '''
    Subroutine for use in PRank
    Calculates rank of example x given feature weights w and rank cutoffs b
    '''
    return np.argmax(b > np.dot(x,w))
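As a sanity check, the update rule above can be exercised on synthetic one-dimensional data; the cutoffs, sample count, and epoch count below are invented for illustration, and the compact loop mirrors the update in prank:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic ordinal data: one feature, true tier set by cutoffs at 3 and 7
n_tiers = 3
X = rng.uniform(0, 10, size=(200, 1))
tiers = np.digitize(X[:, 0], bins=[3.0, 7.0])   # tiers 0, 1, 2

# compact PRank loop: same prediction and update as above, several passes
w = np.zeros(1)
b = np.zeros(n_tiers)
b[-1] = np.inf
for epoch in range(20):
    for x, y_act in zip(X, tiers):
        y = np.argmax(b > x @ w)          # predicted tier
        if y != y_act:
            y_n = np.argmax(b >= x @ w)
            lo, hi = sorted((int(y_n), int(y_act)))
            t_arr = np.zeros(n_tiers - 1)
            t_arr[lo:hi] = 1 if y_act > y_n else -1
            w = w + t_arr.sum() * x
            b[:-1] = b[:-1] - t_arr

pred = np.array([np.argmax(b > x @ w) for x in X])
accuracy = (pred == tiers).mean()         # should beat the ~0.4 majority baseline
```

Because the data is linearly separable in the single feature, the learned weight ends up positive and the cutoffs settle near multiples of the true boundaries.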

Knapsack Algorithm

Daily fantasy lineups contain a set number of players: for example, DraftKings has 1 QB, 2 RB, 3 WR, 1 TE, 1 flex, and 1 defense. This is the perfect setup for the multiple-choice knapsack problem (see http://www.diku.dk/users/pisinger/95-1.pdf).

We will use a modified version of the dynamic-programming method for the regular knapsack problem to solve the multiple-choice knapsack problem. We deal with the flex pick by solving three restricted problems, with 3 RB, 4 WR, and 2 TE respectively (one for each position the flex slot may hold), and then picking the highest-scoring solution.
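The flex trick can be illustrated on a toy slate with a brute-force multiple-choice knapsack; all names, salaries, and point projections below are invented, and the roster is shrunk to 2 RB / 2 WR / 1 TE plus one flex to keep the search tiny:

```python
from itertools import combinations

# invented toy slate: (name, salary, projected points, position)
pool = [
    ("RB1", 40, 20, "RB"), ("RB2", 30, 14, "RB"), ("RB3", 20, 9, "RB"),
    ("WR1", 35, 18, "WR"), ("WR2", 25, 12, "WR"), ("WR3", 15, 7, "WR"),
    ("TE1", 20, 10, "TE"), ("TE2", 10, 5, "TE"),
]

def best_lineup(pool, cap, quota):
    # brute-force multiple-choice knapsack: pick exactly quota[pos] players
    # of each position with total salary <= cap, maximizing projected points
    n_pick = sum(quota.values())
    best, best_pts = None, -1
    for combo in combinations(pool, n_pick):
        counts = {}
        for _, _, _, pos in combo:
            counts[pos] = counts.get(pos, 0) + 1
        if counts != quota or sum(s for _, s, _, _ in combo) > cap:
            continue
        pts = sum(p for _, _, p, _ in combo)
        if pts > best_pts:
            best, best_pts = combo, pts
    return best, best_pts

# one restricted problem per position the flex can take; keep the best solution
configs = [{"RB": 3, "WR": 2, "TE": 1},
           {"RB": 2, "WR": 3, "TE": 1},
           {"RB": 2, "WR": 2, "TE": 2}]
lineup, pts = max((best_lineup(pool, 140, q) for q in configs),
                  key=lambda t: t[1])
```

Brute force is fine at this scale; the dynamic program below replaces it for realistic slate sizes.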

The code for the algorithm is as follows. The remove_dom function is an auxiliary that removes dominated items; it speeds up the algorithm at the cost of returning an approximation, so we do not use it.

# objects is an array of items, each item has a name, weight, and price
def remove_dom(objects):
    # sort objects in increasing order of weight
    objects = objects[objects[:, 1].argsort()]
    # after sorting, object i is dominated by an earlier (lighter or equal-weight)
    # object j whenever p_i < p_j, so compare each object against its predecessors
    new_objects = []
    for i in range(len(objects)):
        is_dom = any(objects[i][2] < objects[j][2] for j in range(i))
        if not is_dom:
            new_objects.append(objects[i])
    return np.vstack(new_objects)
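For intuition, the same filter can be done in a single pass over the weight-sorted array: an item is dominated exactly when its price falls below the running maximum price of the lighter items. The toy numbers below are invented for the example:

```python
import numpy as np

# toy items: (name, weight, price)
items = np.array([
    [0, 5, 30],
    [1, 3, 40],   # lighter and pricier than item 0, so item 0 is dominated
    [2, 8, 50],
    [3, 9, 45],   # dominated by item 2
], dtype=float)

items = items[items[:, 1].argsort()]   # sort by weight ascending
kept, best_price = [], -np.inf
for row in items:
    if row[2] >= best_price:           # keeps ties, like the strict < test above
        kept.append(row)
        best_price = row[2]
undominated = np.vstack(kept)          # only items 1 and 2 survive
```

This is O(n log n) overall versus the quadratic pairwise scan, and produces the same result because the maximum over all earlier items always equals the maximum over the kept ones.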

class Contents:
    # a partial lineup: chosen item indices, per-category counts, and total value
    def __init__(self, num_types):
        self.items = []                  # instance attributes, so no state is
        self.types = [0] * num_types     # shared between table cells
        self.value = 0
    def add(self, object_num, object_value, object_type):
        # return a new Contents instead of mutating, so cells can share safely
        new_contents = Contents(len(self.types))
        new_contents.items = self.items + [object_num]
        new_contents.types = list(self.types)
        new_contents.types[object_type] += 1
        new_contents.value = self.value + object_value
        return new_contents
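A quick illustration of why add returns a fresh object: many cells of the dynamic-programming table can reference the same Contents value, and an in-place update would corrupt all of them at once. A minimal copy of the class is repeated here so the snippet stands alone:

```python
# minimal copy of the Contents pattern, with attributes set per instance
class Contents:
    def __init__(self, num_types):
        self.items = []
        self.types = [0] * num_types
        self.value = 0

    def add(self, object_num, object_value, object_type):
        new = Contents(len(self.types))
        new.items = self.items + [object_num]
        new.types = list(self.types)
        new.types[object_type] += 1
        new.value = self.value + object_value
        return new

empty = Contents(2)
c1 = empty.add(0, 10, 0)   # take item 0 (type 0, value 10)
c2 = c1.add(3, 7, 1)       # then item 3 (type 1, value 7)
# empty and c1 are untouched; c2 accumulates both picks
```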

# objects array has form [name, weight, value, type]
# cat_indices is a list of the last indices for each category; for example, if cat_indices=[1,2],
# item 0 would be in category 0, item 1 would be in category 1, and the rest of the items would be in category 2
# cat_cap is a list of the capacity, or maximum number of items, in each category
def multi_knapsack(objects, weight_limit, cat_indices, cat_cap):
    assert len(cat_indices) == len(cat_cap)
    cat = len(cat_indices) # number of categories
    
    # we do not remove dominated objects: removing them makes it harder to exactly fill the salary cap
    '''
    split_arr = np.split(objects, cat_indices)
    for i in range(len(split_arr)-1):
        split_arr[i] = remove_dom(split_arr[i])
        after_index = len(split_arr[i])
        # update cat_indices to account for dropped items
        if i==0:
            cat_indices[i] = after_index
        else:
            cat_indices[i] = cat_indices[i-1] + after_index
    new_objects = np.vstack(split_arr[:-1])
    '''
    
    new_objects = objects
    max_array = [[[Contents(len(cat_cap)) for k in range(cat_cap[int(new_objects[j][3])]+1)] for i in range(weight_limit+1)] for j in range(len(new_objects))]
    max_array = [[[Contents(len(cat_cap))] for i in range(weight_limit+1)]] + max_array
    
    for i in range(len(new_objects)):
        weight = int(new_objects[i][1])
        value = int(new_objects[i][2])
        object_type = int(new_objects[i][3])
        last_same_type = False
        if i != 0:
            last_same_type = object_type == int(new_objects[i-1][3])
        
        for w in range(1,weight_limit+1):
            for n in range(cat_cap[object_type]+1):
                if weight <= w:
                    if n > 1:
                        if max_array[i][w-weight][n-1].value + value > max_array[i][w][n].value:
                            max_array[i+1][w][n] = max_array[i][w-weight][n-1].add(i,value,object_type)
                        else:
                            max_array[i+1][w][n] = max_array[i][w][n]
                    elif n==1:
                        max_array[i+1][w][n] = max_array[i][w-weight][-1].add(i,value,object_type)
                else:
                    if last_same_type:
                        max_array[i+1][w][n] = max_array[i][w][n]
                    else:
                        max_array[i+1][w][n] = max_array[i][w][-1]
                    
    return [new_objects[i][0] for i in max_array[len(new_objects)][weight_limit][-1].items]

Accuracy of Automatic Player

At the end of this study, we wish to be 95% confident of the interval in which the true winning percentage p lies. We can model the outcome of entering head-to-heads as a binomial distribution with parameter p (here we assume the contests are independent, which is generally true for head-to-heads). Suppose that we win w games and lose l games. Then our current estimate of p is r = w/(w+l), and we desire that the error e = |p-r| satisfy e < 0.02. The variance of r is r(1-r)/n, where n is the number of contests entered. Thus, for a z-score of z, we need z*sqrt(r(1-r)/n) < e, or n > r(1-r)z^2/e^2. Taking r ≈ p ≈ 0.6 (consistent with the 40% losing-week probability assumed below), z = 2, and e = 0.02 gives n > 2400. There are 17 weeks in the NFL season, so we would have to enter around 150 contests per week.
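The arithmetic, assuming a win rate of r ≈ 0.6 (which reproduces the n > 2400 figure):

```python
# sample-size bound n > r(1-r) z^2 / e^2 for a binomial proportion
z, e = 2.0, 0.02   # ~95% confidence, desired error bound
r = 0.6            # assumed win rate

n = r * (1 - r) * z**2 / e**2   # minimum number of contests
per_week = n / 17               # spread over a 17-week NFL season
# n = 2400 contests, about 141 per week, rounded up to ~150
```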

We wish to maintain enough bankroll to play for an entire season. Assuming that we bet x percent of our original bankroll each week, we would like less than a 1% chance of losing the entire bankroll by the end of the season. For a worst-case analysis, suppose that we lose all of our bets in losing weeks and break even in winning weeks. Then we lose the whole bankroll if we have 100/x losing weeks. If we assume the probability of a losing week is 40%, then we must have x<8 (http://www.wolframalpha.com/input/?i=sum+from+i%3D12+to+17+of+%2817+choose+i%29*0.4^i*0.6^%2817-i%29). Since this is a worst-case scenario, we may set x=10 and bet 10% of our bankroll each week. Even this is a very conservative number, so we could probably raise it to 15-20% and still be safe.
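The losing-weeks tail can be checked directly; this reproduces the Wolfram Alpha sum linked above:

```python
from math import comb

p_lose, weeks = 0.4, 17
# betting 8% per week, worst-case ruin needs 100/8 = 12.5 losing weeks;
# the linked query sums the binomial tail starting from 12
ruin_prob = sum(comb(weeks, i) * p_lose**i * (1 - p_lose)**(weeks - i)
                for i in range(12, weeks + 1))
# ruin_prob comes out to about 0.0106, i.e. roughly the 1% target
```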
We plan to play low-stakes head-to-heads ($1, $2, or $5 buy-ins), so we expect our average buy-in to be about $3. Since we play about 150 contests per week, we will bet around $500 per week, and our total bankroll will be $5000.