JCapper Message Board
KY DERBY 2018
jeff 5/5/2018 10:21:19 AM | KY DERBY 2018
With about 5.5 hrs to go before they head to the post, these are the top 7 in order according to my before the tote UPR odds line:
num  horsename            ol
---  ------------------  -----
  3  PROMISES FULFILLED   5.90
 12  ENTICED              7.63
 19  NOBLE INDY           8.31
 11  BOLT DORO           11.68
  1  FIRENZE FIRE        16.69
  7  JUSTIFY             17.67
  9  HOFBURG             18.73
Good luck to all,
-jp
.
| JimmyM 5/5/2018 12:02:05 PM | Thanks Jeff... Anything change on a sloppy track?
| jeff 5/5/2018 1:13:52 PM | Imo, today's dirt surface seems to be favoring runners with middle to outside posts that also have a touch of late.
Imo, turf course wet - getting chewed up/becoming boggy along the hedge - and favoring outside posts.
-jp
.
| NYMike 5/5/2018 3:26:20 PM | Jeff, Are your picks ranked by chance of winning or by potential value?
Mike
| jeff 5/6/2018 10:59:07 AM | Mike, to answer your specific question the ranking is based on likelihood of a win in descending order.
My post was based on output generated by a method=4/mlr GroupName.
The numeric value generated for each horse by the GroupName is a decimal value between 0 and 1 that is (usually) a pretty good prob estimate for the likelihood of a win.
The ol numbers that I posted above are based on the following formula:
ol = (1/probEstimate) - 1
The ol values that I posted represent min strike price for each horse - BEFORE the odds are known. (I'll come back to that a little later.)
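For illustration, here's a minimal R sketch of that conversion. The prob estimates below are made-up numbers, not actual GroupName output:

# made-up prob estimates for two hypothetical horses
probEstimate <- c(0.20, 0.05)

# min strike price (odds-to-1) implied by each prob estimate
ol <- (1 / probEstimate) - 1
ol
# [1]  4 19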
The GroupName (which I developed late last year ahead of Gulfstream's so called championship meet) uses roughly 15 inputs including best speed fig last 3 and several prob expressions that I'm using to score rider, trainer, railposition, fig consensus, finish position, breeding, etc. across a handful of unique categories/factor combinations that I landed on after doing some Data Window r&d.
And while the GroupName underperformed (and badly) in the Derby itself, this is how it performed on the Saturday 05-05-2018 Derby Day card:
query start: 5/6/2018 10:14:28 AM
query end: 5/6/2018 10:14:28 AM
elapsed time: 0 seconds

Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
SQL UDM Plays Report: Hide

SQL: SELECT * FROM STARTERHISTORY
     WHERE TRACK='CDX'
     AND [DATE] >= #05-05-2018#
     AND [DATE] <= #05-05-2018#
     ORDER BY [DATE], TRACK, RACE

Data Summary              Win        Place       Show
-----------------------------------------------------
Mutuel Totals          225.60       232.00     204.80
Bet                   -294.00      -294.00    -294.00
-----------------------------------------------------
P/L                    -68.40       -62.00     -89.20

Wins                       14           29         42
Plays                     147          147        147
PCT                     .0952        .1973      .2857

ROI                    0.7673       0.7891     0.6966
Avg Mut                 16.11         8.00       4.88


By: UPR Rank

Rank      P/L      Bet      Roi   Wins  Plays    Pct   Impact   AvgMut
----------------------------------------------------------------------------------
1       13.80    28.00   1.4929     5     14   .3571   3.7500     8.36
2       -9.00    28.00   0.6786     2     14   .1429   1.5000     9.50
3       61.20    28.00   3.1857     2     14   .1429   1.5000    44.60
4       12.60    28.00   1.4500     2     14   .1429   1.5000    20.30
5      -28.00    28.00   0.0000     0     14   .0000   0.0000     0.00
6      -20.20    28.00   0.2786     1     14   .0714   0.7500     7.80
7      -16.40    28.00   0.4143     1     14   .0714   0.7500    11.60
8      -26.00    26.00   0.0000     0     13   .0000   0.0000     0.00
9      -20.00    20.00   0.0000     0     10   .0000   0.0000     0.00
10     -10.00    10.00   0.0000     0      5   .0000   0.0000     0.00
11       5.60    10.00   1.5600     1      5   .2000   2.1000    15.60
12      -8.00     8.00   0.0000     0      4   .0000   0.0000     0.00
13      -6.00     6.00   0.0000     0      3   .0000   0.0000     0.00
14      -6.00     6.00   0.0000     0      3   .0000   0.0000     0.00
15      -2.00     2.00   0.0000     0      1   .0000   0.0000     0.00
16      -2.00     2.00   0.0000     0      1   .0000   0.0000     0.00
17      -2.00     2.00   0.0000     0      1   .0000   0.0000     0.00
18      -2.00     2.00   0.0000     0      1   .0000   0.0000     0.00
19+     -4.00     4.00   0.0000     0      2   .0000   0.0000     0.00
Two things I want to emphasize:
First, the GroupName itself was created using JCapper data spanning several years (2014-2015-2016-2017) of Jan 01 to Mar 31 races at Gulfstream.
I purposely did that because I was targeting the 2018 Gulfstream championship meet.
The GroupName itself was created for Gulfstream -- and the coefficients in the model have not been adjusted for Churchill in any way.
Second, I mentioned that the strike prices for each horse in my original post were based on best available info before the odds are known.
Right now, as I type this, I am still working on the UPR Tools interface. Once the interface is a little further along, the JCapper user should be able to "grab" one or more "public prob estimates" based on the odds -- and from there work each "public prob estimate" into the model just like you would with any other factor.
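For anyone wondering what a "public prob estimate" might look like, here's a rough R sketch of the general idea -- convert final odds to implied probabilities and normalize the field so it sums to 1.0. This is just an illustration using made-up odds; it is not the actual UPR Tools implementation:

# made-up final tote odds (odds-to-1) for a hypothetical 6 horse field
odds <- c(2.10, 3.50, 4.80, 9.60, 14.20, 25.75)

# raw implied probs from the odds
rawProb <- 1 / (odds + 1)

# normalize so the field sums to 1.0 (strips out takeout/breakage)
publicProb <- rawProb / sum(rawProb)

round(publicProb, 4)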
That said, even though I currently lack the ability to overlay the tote on top of my mlr groupnames:
I am working towards that -- and have to say I am greatly encouraged by the performance of the model going forward in time after it was created - and am really looking forward to being able to incorporate the tote.
-jp
.
| NYMike 5/6/2018 1:44:51 PM | Jeff, That's great. You mention incorporating the tote. In regards to that and that recent Benter interview, how heavily would you suggest weighting the tote along with other factors?
Thanks,
Mike
| jeff 5/6/2018 4:41:24 PM | If I take the factor mix of the model described above and add a tote component to it -- and run the new model through the mlogit module in r and look at the output:
And if I then sum the beta coefficients for every factor in the model (including the beta coefficient for the tote component) to arrive at a total:
The beta coefficient for the tote component works out to be approximately 15.4% of the total.
But each model is different.
The % of the total for the tote component really depends on the factor mix of the model.
Imo, the more areas of the game accounted for in the model before you add a tote component, the lower the tote component's % of the total is going to be (and vice versa).
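To make the arithmetic concrete, here's a small R sketch of that calculation. The factor names and beta values below are invented for illustration only (the 15.4% figure above came from the actual model, not from these numbers):

# made-up beta coefficients for a hypothetical factor mix plus a tote component
betas <- c(speedFig = 2.8, pacePressure = 1.1, riderProb = 0.9,
           trainerProb = 0.7, breedingProb = 0.4, toteComponent = 1.2)

# tote component's share of the summed coefficients
toteShare <- unname(betas["toteComponent"]) / sum(betas)
round(toteShare, 3)
# [1] 0.169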
-jp
.
| NYMike 5/10/2018 6:16:24 PM | Jeff, Thanks very much! This is awesome.
Mike
| NYMike 11/24/2018 8:01:22 AM | Jeff, Incorporating live odds helped Benter. Did he ever write anything on how, with looking at the live odds, he separated a viable horse going off with better value potential (good bet) vs one where the wisdom of the crowd is correctly laying off a horse (bad bet)
Thanks,
Mike
| jeff 11/24/2018 7:21:59 PM | Here's a link to a PDF published by Bill Benter in 1994 --
Computer Based Horse Race Handicapping and Wagering Systems: A Report: https://www.gwern.net/docs/statistics/decision/1994-benter.pdf
Fyi, I've probably read the PDF linked above at least 20 times over the years. (And each time I reread it I manage to pick up something I somehow missed during all of those previous readings.)
To the best of my knowledge, Benter did not provide step by step instructions showing exactly how to use handicapping factors to create a before the odds are known model.
Nor did he provide step by step instructions for merging his before the odds are known model with the tote.
However, he does describe, in general terms, his overall process.
I'm guessing that when he wrote the paper he assumed his audience (as a prerequisite) possessed a rudimentary understanding of how to create a multinomial logit model from a dataset.
On page 184, Benter wrote several paragraphs beneath a headline that read HANDICAPPING MODEL DEVELOPMENT:
"The most difficult and time-consuming step in creating a computer based betting system is the development of the fundamental handicapping model. That is, the model whose final output is an estimate of each horse's probability of winning.
The type of model used by the author is the multinomial logit model proposed by Bolton and Chapman (1986)."
From there, he goes on to talk about a few of the basic before the odds are known handicapping factors included in his model.
On page 186, Benter wrote a section beneath the following headline: CREATING UNBIASED PROBABILITY ESTIMATES.
Here, he wrote that he was able to improve his overall model by merging the before the odds are known model with the tote.
He also goes on to illustrate those improvements using several tables.
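Paraphrasing the general shape of what he describes (and hedging here, because this is my shorthand, not a formula lifted verbatim from the paper): a second logit weights the log of the fundamental prob estimate against the log of the public's prob estimate, which works out to something like the R sketch below. The alpha and beta weights and the prob values are made-up numbers:

# made-up fundamental (before the odds) probs and public (tote) probs for a 5 horse field
fundProb   <- c(0.30, 0.25, 0.20, 0.15, 0.10)
publicProb <- c(0.40, 0.20, 0.20, 0.12, 0.08)

# made-up weights; in practice these would be estimated from historical data
alpha <- 0.7
beta  <- 0.8

# combined estimate: weight the two in log space, then renormalize the field
raw      <- fundProb^alpha * publicProb^beta
combined <- raw / sum(raw)

round(combined, 4)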
Imo, the real value of having a model capable of generating unbiased probability estimates lies not in just knowing when you have a positive expectancy --
But in having the ability to accurately estimate your edge.
Because if you can do THAT:
You can use Kelly bet sizing --
Something he mentioned in the article from earlier this year that appeared in Bloomberg --
The Gambler Who Cracked the Horse-Racing Code: https://www.bloomberg.com/news/features/2018-05-03/the-gambler-who-cracked-the-horse-racing-code
"Benter was struck by the similarities between Kelly’s hypothetical tip wire and his own prediction-generating software. They amounted to the same thing: a private system of odds that was slightly more accurate than the public odds. To simplify, imagine that the gambling public can bet on a given horse at a payout of 4 to 1. Benter's model might show that the horse is more likely to win than those odds suggest—say, a chance of one in three. That means Benter can put less at risk and get the same return; a seemingly small edge can turn into a big profit. And the impact of bad luck can be diminished by betting thousands and thousands of times. Kelly's equations, applied to the scale of betting made possible by computer modeling, seemed to guarantee success.
If, that is, the model were accurate..."
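For the Kelly piece, the textbook single-bet formula is f = (b*p - q) / b, where b is the net odds, p is your prob estimate, and q = 1 - p. A quick R sketch using the 4-to-1 / one-in-three numbers from the Bloomberg example:

# Kelly fraction of bankroll for a single win bet
# b = net odds (odds-to-1), p = your prob estimate
kellyFraction <- function(b, p) {
  q <- 1 - p
  (b * p - q) / b
}

kellyFraction(b = 4, p = 1/3)
# [1] 0.1666667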
-jp
.
| jeff 11/24/2018 7:52:37 PM | Following up on that...
You wrote:
"Incorporating live odds helped Benter. Did he ever write anything on how, with looking at the live odds, he separated a viable horse going off with better value potential (good bet) vs one where the wisdom of the crowd is correctly laying off a horse (bad bet)"
Assume for the sake of argument you have a model capable of generating accurate probabilities given the odds.
The EV (or expected value) calculation is as follows:
EV = (Odds + 1) x (ProbEstimate)
Example involving two horses:
Suppose the odds for Horse A are exactly 3.20 to 1 and according to the model the prob estimate for that horse given the odds is 0.26.
Your expected roi for each $1.00 bet is 1.092 (or a gain of 9.2 cents for each $1.00 wagered) calculated as follows:
EV = (3.20 + 1) x (0.26)
or
EV = 1.092
Suppose Horse B is also going off at 3.20 to 1. But because the handicapping-factor side of your model is weaker for this horse than it was for the first horse -- the prob estimate generated by your model for Horse B given the odds is only 0.21.
Your expected roi for each $1.00 bet on Horse B is 0.882 (or a loss of 11.8 cents for each $1.00 wagered) calculated as follows:
EV = (3.20 + 1) x (0.21)
or
EV = 0.882
Assuming of course you do in fact have an accurate model.
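If it helps, the same calculation as a tiny R sketch using the two horses above:

# expected return per $1.00 win bet: EV = (odds + 1) * prob estimate
ev <- function(odds, probEstimate) (odds + 1) * probEstimate

ev(3.20, 0.26)   # Horse A: 1.092 (gain of 9.2 cents per $1.00)
ev(3.20, 0.21)   # Horse B: 0.882 (loss of 11.8 cents per $1.00)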
-jp
.
| edcondon 12/13/2018 5:01:01 PM | "If the everyday punters at Happy Valley and Sha Tin ever found out that foreign computer nerds were siphoning millions from the pools, they might stop playing entirely." My own theory why tracks do NOT want to tackle the problem of final odds not being displayed until well into the race.
| NYMike 9/3/2019 10:55:47 PM | Jeff, I'm revisiting this.
Say you have a UPR creating a UPRopinionProb that combined with the live odds makes for a very accurate probability estimate.
As they head to the gate what are the factors that are not already reflected in the probability that might make you adjust your estimate up or down a little?
Thanks,
Mike
| jeff 9/4/2019 8:23:05 PM | The factor UPROpinionProb is generated by an algorithm I wrote that takes (or attempts to take) individual parts you are using in your own UPR and create a calibrated probability estimate.
Only you know what's in your own UPR.
Based on that --
Imo, two great questions to ask might be:
1. What isn't reflected in my UPR?
2. What isn't reflected in the odds?
Suppose for the sake of example it's a day when early speed is king and (visually) closers are having trouble making up ground in the stretch.
Further suppose the field is starting to load.
Keep in mind I know how strongly early speed is being emphasized in my UPR.
If early speed is strongly reflected in my UPR I might be willing to go no deeper than UPR(1) or UPR(2).
If early speed is not well reflected in my UPR, and if the price were right, I might be willing to back a CPace(1) that's (say) UPR(6).
I don't know how strongly early speed is being emphasized in the odds - at least not until I look the field over.
But if the post time favorite is up against it bias-wise and paceIndex/pacePressure, etc. for the race in front of me isn't abnormally high:
That might get me to step up to the plate a little.
On the other hand, if the post time favorite has the fastest V1, top CPace, is right up there in terms of early speed points, etc. --
And at the same time the race in front of me is not abnormally high in terms of paceIndex/pacePressure, etc:
That would likely cause me to pass - or at the very least back off a bit.
There could be a thousand different data points driving decisions like that.
But Imo, that's the basic thought process.
More to come --
-jp
.
| NYMike 9/5/2019 6:53:15 AM | One last one. The difference between Benter's table 3 and 4 shows the live odds incorporated into the probability clearly improves the accuracy of the probability. That said, it makes it seem that a horse with a .25 probability going off at 4-1 is a red flag to him more so than if it is going off at 5-2. Based on this, where is Benter looking for the public's mistake? It seems the more the public accurately estimates the probability would work against him.
Thanks,
Mike
| jeff 9/10/2019 10:22:44 AM | I get what you are saying, Mike.
But one could also argue that Benter or others like him are acutely aware of what goes into the before the odds are known part of their model.
That's where I think they'd be looking.
It's possible a horse could be 4-1 in the odds because Benter or others like him haven't bet it down to 5-2 yet.
-jp
.
| NYMike 9/17/2019 2:00:29 PM | Jeff, With your R output for the 15.4%, does it show a standard deviation, such as between 10% and 20%?
Thanks,
Mike
| jeff 9/18/2019 11:12:05 AM | Recently, I've been working on an MLR model for UPROpinionProb.
Below is a cut and paste of the output in R for one of the test samples:
library(csvread)
library(mlogit)

y <- read.csv("c:/jcapper/exe/r_rgn-2-dell-1750-d.csv")

map.coltypes("c:/jcapper/exe/r_rgn-2-dell-1750-d.csv", header = TRUE, nrows = 100, delimiter = ",")

x <- mlogit.data(y, choice="mvp", shape="long", id.var="id", alt.var="horsename")
summary(mlogit(mvp ~ gapf07 + gapf06 + gapf31 + gapf19 + gapf22 + gapf04 -1, data = x))
nr method
2 iterations, 0h:0m:22s
g'(-H)^-1g = 2.36E+07
last step couldn't find higher value
Coefficients :
        Estimate Std. Error z-value Pr(>|z|)
gapf07    3.6885     8.6339  0.4272   0.6692
gapf06    4.4035    13.3213  0.3306   0.7410
gapf31    8.7551    32.0962  0.2728   0.7850
gapf19   17.6598    30.7165  0.5749   0.5653
gapf22   16.1274    35.8997  0.4492   0.6533
gapf04   10.5780    19.1186  0.5533   0.5801
Log-Likelihood: -1.5471
The values in the Estimate column are the beta coefficients for the factors in the model.
The output doesn't report standard deviation per se.
But it does give you other statistics about your model:
Std. Error, z-value, Pr(>|z|), and Log-Likelihood.
What are Z-Values in Logistic Regression? http://logisticregressionanalysis.com/1577-what-are-z-values-in-logistic-regression/
--quote:"Very Short Answer
The z-value is the regression coefficient divided by its standard error. It is also sometimes called the z-statistic. It is usually given in the third column of the logistic regression regression coefficient table output." --end quote
What are P-Values in Logistic Regression? Google Search Results
--quote:"A low p-value (< 0.05) indicates that you can reject the null hypothesis. In other words, a predictor that has a low p-value is likely to be a meaningful addition to your model because changes in the predictor's value are related to changes in the response variable." --end quote
What is a Log-Likelihood in Logistic Regression? Log-likelihood
--quote:" The log-likelihood is, as the term suggests, the natural logarithm of the likelihood.
In turn, given a sample and a parametric family of distributions (i.e., a set of distributions indexed by a parameter) that could have generated the sample, the likelihood is a function that associates to each parameter the probability (or probability density) of observing the given sample." --end quote
-jp
.
~Edited by: jeff on: 9/18/2019 at: 11:12:05 AM~
| Tony_N 9/24/2019 1:29:55 AM | I'm not sure races like the Derby offer statistical value. This year I had the winner by disqualification for $20, and I played him in all sorts of exotics; as a high odds horse I knew he would be in the mix. My luck or insight? I don't know. I did create a UDM for the Derby, but it turned up nobody, so I resorted to looking at the past races. I felt the winner by disqualification had run a decent race -- a kind of steady horse who keeps to the task and, given the odds, would be there. After a year of working with JCapper, I've learned that what I thought was a lock on claimers from way back is not the case (it saves me money to know this), and I discovered an interesting UDM with high hit rate, low odds horses that pay well in doubles -- win would pay 96%. But I have an old fashioned one on four year olds, good stakes winners, that JCapper supports very well: I eliminate things like ranked 10+ in this or that, just shaving the margins, and this elimination of marginals is surprisingly good. So I'm still figuring things out with this excellent tool. I've adjusted from looking at the overall percentage to eliminating these edge conditions, and in exploring this I'm really seeing something new about the game. So to conclude: the Derby and statistics, I'm not sure, but horses ranked so low here and there are definitely something to consider. At least in my experience so far.
~Edited by: Tony_N on: 9/24/2019 at: 1:29:55 AM~
| Tony_N 9/24/2019 1:32:15 AM | In other words, creating negative UDMs has become an interesting experience, but I don't have enough experience yet :) I am feeling hopeful though :) Tony
| Tony_N 9/24/2019 1:34:27 AM | The link to the article is great, thanks Jeff
| ryesteve 9/24/2019 10:26:14 AM | Negative UDMs is another one of those areas where I've seen diminishing returns over the years. For example, you'd think that your selections would be more profitable in races where the ml favorite "looks bad" (as defined by a reliable negative UDM you've developed), but the problem is that more and more, the data people who dominate the pools also know this horse is a bad ml fav, so he takes less money than he might've in the good ol' days, and your selection gets hit harder than it otherwise would've, killing a good chunk of the edge you thought you'd have.
| Tony_N 9/24/2019 2:41:41 PM | Hi Ryesteve, actually I'm not referring to negative ML favourites; I've just found that creating UDMs that trim off horses with excessive negatives is an interesting approach, and I'm exploring it. I'm not sure it will work, but it creates an interesting scenario that appears to be profitable when I see a race where only one horse does not have negatives -- and that horse is not always the favorite. I have found this to be very good with 4/5 year olds in stakes of all levels. Simple things, like requiring a rank of less than 10 for last race fig, etc., have improved what I am seeing. From this I created negative UDMs instead of positive ones and then scan the races to see who's not negative. When I retire 7 or 8 years from now I hope to have a good handle on the game and use things like this to enjoy my afternoons.
| ryesteve 9/29/2019 10:13:42 AM | Ah, I see where you're coming from. Been there, done that too, and I was never able to overcome the tradeoff of not having a positive udm horse in a race where I see negs. But I still think it'd be a valuable area of investigation, to build models that reflect the makeup of an entire race, rather than just how our factors apply to each horse individually.
| Tony_N 9/30/2019 1:59:03 AM | Yes, I'm only applying this to stakes races with fields of 9 or more. I trim certain factors at rank 10 and up based on my one year database and prior experience. I end up with a set of horses with no negatives; then I have UDMs for 4/5 year olds that are positive, and the intersection is what I'm looking at. It seems interesting, but only time (on paper :)) will tell. JCapper is a great tool with which to do this.