Historical Racing Data - JCapper.com
Right now as I type this it's Tuesday March 17, 2020 and now is probably a great time to be looking at historical racing data.
Over the weekend Aqueduct, Gulfstream, Tampa Bay Downs, Laurel, Fairgrounds, Oaklawn, Golden Gate, and Santa Anita conducted live racing without spectators - because of Coronavirus.
Yesterday, only one track in North America ran a live race card: Will Rogers Downs.
Three other tracks were scheduled to run yesterday: Parx, Mahoning Valley, and Turf Paradise. All three cancelled because of Coronavirus.
Keeneland announced they will cancel their Spring Meet because of Coronavirus.
Things in my own neighborhood are shutting down in a hurry.
Needless to say - whether or not tracks can conduct live racing over the next several weeks is very much up in the air.
All of this got me to thinking. Now is probably a great time to be looking at historical racing data.
HDW (Handicappers Data Warehouse) has historical JCapper V4 Race Files and HDW Text Chart Results Files for all North American thoroughbred races from July 1, 2010 forward through present day.
If you are looking for Historical Racing Data or if you have a data project in mind:
Call or drop me an email,
Jeff Platt
JCapper program author
858.766.0255 (cell)
jeff@jcapper.com
Let's look at some Data:
Below is a cut and paste of query results for a sql expression I created in the JCapper Data Window.
The database itself was compiled using JCapper Platinum and V4 racefiles from HDW (Handicappers Data Warehouse.)
It contains data for all thoroughbred races in North America from October 01, 2019 current through yesterday March 16, 2020.
The first table shows the query results broken out by columns for Win, Place, and Show:
query start: 3/17/2020 10:45:24 AM
query end: 3/17/2020 10:45:26 AM
elapsed time: 2 seconds
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
SQL UDM Plays Report: Hide
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF16 = 1
AND RANKF32 <= 2
AND RANKF31 <= 4
AND [DATE] >= #10-01-2019#
AND [DATE] <= #03-16-2020#
ORDER BY [DATE], TRACK, RACE
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 6682.30 6093.50 5813.50
Bet -6742.00 -6742.00 -6742.00
-----------------------------------------------------
P/L -59.70 -648.50 -928.50
Wins 791 1424 1837
Plays 3371 3371 3371
PCT .2346 .4224 .5449
ROI 0.9911 0.9038 0.8623
Avg Mut 8.45 4.28 3.16
The second table shows the the query results broken out by intSurface - one row per surface type:
By: intSurface
Value P/L Bet Roi Wins Plays Pct Impact AvgMut Descr
---------------------------------------------------------------------------------------------------------------
0.0000 0.00 0.00 0.0000 0 0 .0000 0.0000 0.00 Unknown
1.0000 21.80 5910.00 1.0037 703 2955 .2379 1.0139 8.44 Outer/Main Dirt Course
2.0000 0.00 0.00 0.0000 0 0 .0000 0.0000 0.00 Inner Dirt Course
3.0000 -24.10 182.00 0.8676 20 91 .2198 0.9366 7.90 Off the Turf
4.0000 -60.30 626.00 0.9037 64 313 .2045 0.8714 8.84 Outer/Main Turf Course
5.0000 2.90 24.00 1.1208 4 12 .3333 1.4206 6.73 Inner Turf Course
Figure 1
Interpreting the results:
Starters returned by the sql expression won about 23% of the time at close to break even roi in the win pool.
The concept I had in mind when creating the sql expression is pretty simple:
Identify early pace types likely to be advantaged by recent track bias.
I purposely kept the factor constraints to a minimum to avoid backfitting.
The sql expression is made up of four factor constraints plus a date range.
The Factor Constraints:
The sql expression contains the following factor constraints:
WHERE RANKF16 = 1
AND RANKF32 <= 2
AND RANKF31 <= 4
From a high level - What do the factor constraints actually do?
The factor constraints restrict the query results to horses that are:
- Rank 1 for an Early Pace Userfactor
- Rank 2 or better for an Early Track Bias Prob Expression
- Rank 4 or better for a Post Position Prob Expression
All other horses are excluded from the query results.
A detailed look at what the factor constraints actually do:
"RANKF16" (without the quotes) is a column name in the JCapper Starterhistory table. The RANKF16 field in my Starterhistory table contains rank for a UserFactor I created in JCapper Platinum that emphasizes early pace.
RANKF16 = 1 as it is used in the sql expression means that a horse must be rank=1 for the early pace Userfactor. The database driver excludes all non rank=1 horses from the query results.
"RANKF32" (without the quotes) is a column name in the JCapper Starterhistory table. The RANKF32 field in my Starterhistory table contains rank for a Prob Expression I created in JCapper Platinum.
This particular Prob Expression scores every horse in the race in terms of how well its early pace profile fits recent early-late bias (if any) based on metadata for the track-surf-dist of today's race.
RANKF32 <= 2 as it is used in the sql expression means that a horse must be ranked second or better for the early-late bias Prob Expression. The database driver excludes all other horses from the query results.
"RANKF31" (without the quotes) is a column name in the JCapper Starterhistory table. The RANKF31 field in my Starterhistory table contains rank for a Prob Expression I created in JCapper Platinum.
This particular Prob Expression scores every horse in the race in terms of how well today's gate draw or post position fits recent inside-outside path bias (if any) based on metadata for the track-surf-dist of today's race.
RANKF31 <= 4 as it is used in the sql expression means that a horse must be ranked fourth or better for the inside-outside path bias Prob Expression. The database driver excludes all other horses from the query results.
What is a Prob Expression?
In JCapper Platinum, a Prob Expression is a user defined sql expression executed on the fly during number crunching (Calc Races and Build Database routines.)
The idea behind Prob Expressions is that metadata (data about the data) for the factors in the program can add incremental improvement to what the player is already doing.
For example, if recent track profile has been predominantly early biased - horses that are rank 1 metadata for an early pace based Prob Expression should, in theory, outperform horses that are not rank=1 metadata.
In general, sql queries using historical racing data like the one on this web page (above) tend to bear this out.
To learn more about Prob Expressions, read the Prob Expressions Help Doc - here:
http://www.jcapper.com/helpDocs/ProbExpressions.html
The Scientific Method and Factor Constraints for incremental improvement:
Here's another look at the data break-out table (same as Figure 1 above) that shows the query results broken out by intSurface - one row per surface type:
By: intSurface
Value P/L Bet Roi Wins Plays Pct Impact AvgMut Descr
---------------------------------------------------------------------------------------------------------------
0.0000 0.00 0.00 0.0000 0 0 .0000 0.0000 0.00 Unknown
1.0000 21.80 5910.00 1.0037 703 2955 .2379 1.0139 8.44 Outer/Main Dirt Course
2.0000 0.00 0.00 0.0000 0 0 .0000 0.0000 0.00 Inner Dirt Course
3.0000 -24.10 182.00 0.8676 20 91 .2198 0.9366 7.90 Off the Turf
4.0000 -60.30 626.00 0.9037 64 313 .2045 0.8714 8.84 Outer/Main Turf Course
5.0000 2.90 24.00 1.1208 4 12 .3333 1.4206 6.73 Inner Turf Course
Figure 2
Figure 2 (above) shows stronger results for races on outer/main dirt courses than off turf, outer/main turf, or inner turf courses.
From that - it may be possible to realize an incremental improvement by adding a factor constraint for surface.
Note that I italicized the words may be possible in the above sentence.
When looking at Historical Racing Data - it's easy to make false assumptions about the data based on what you see. It's also easy to form cognitive biases about the way the data behaves based on false assumptions.
The Scientific Method can help you avoid this pitfall.
Instead of making assumptions about the data: The Scientific Method involves formulating hypotheses, deriving predictions from hypotheses as logical consequences, and then conducting experiments to test those predictions.
In layman's terms, instead of making assumptions about the data: The Scientific Method involves creating hypotheses about the data in the form of questions and testing them.
This is the thought process I'd use when applying the Scientific Method to the query results from Figure 2 above:
The concept behind the sql expression is early pace types likely advantaged by recent track bias.
The data suggests it may be possible to improve the outlook for early pace types advantaged by recent track bias by adding factor constraints for surface type.
(Note that this would be the derived prediction as a logical consequence.)
Instead of jumping to conclusions and assuming off turf and turf races should be eliminated - create hypotheses in the form of questions:
- Can rank 1 for early pace types advantaged by recent track bias be improved upon by emphasizing dirt races?
- Can rank 1 for early pace types advantaged by recent track bias be improved upon by eliminating off turf races?
- Can rank 1 for early pace types advantaged by recent track bias be improved upon by eliminating turf races?
The breakout data in Figure 2 suggests the answer may be yes.
However, we cannot possibly know the answers to these questions until after testing our hypotheses by looking at more data.
In a nutshell that is the mindset behind The Scientific Method.
Suggestion - instead of making assumptions about the data - train yourself to ask:
What's the best way to apply the Scientific Method to the Historical Racing Data in front of me?
For now - let's add the following line to the sql expression to restrict the query results to races on outer/main dirt courses only.
AND INTSURFACE = 1
Note that I italicized the words for now. Later on in this article, in the section about Forward Testing, I'll illustrate using the Scientific Method to test the hypotheses about whether or not early pace types advantaged by recent track bias can be improved upon by emphasizing or eliminating races based on surface type.
But, for now, let's add a factor constraint to the sql expression to restrict the query results to races on outer/main dirt courses only:
AND INTSURFACE = 1
After doing that, let's execute the revised sql expression - this time with the results broken out by HDW Runstyle.
This is what the top part of the query results (win-place-show) look like now:
query start: 3/17/2020 11:49:22 AM
query end: 3/17/2020 11:49:24 AM
elapsed time: 2 seconds
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
SQL UDM Plays Report: Hide
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF16 = 1
AND RANKF32 <= 2
AND RANKF31 <= 4
AND INTSURFACE = 1
AND [DATE] >= #10-01-2019#
AND [DATE] <= #03-16-2020#
ORDER BY [DATE], TRACK, RACE
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 5931.80 5322.70 5111.80
Bet -5910.00 -5910.00 -5910.00
-----------------------------------------------------
P/L 21.80 -587.30 -798.20
Wins 703 1269 1638
Plays 2955 2955 2955
PCT .2379 .4294 .5543
ROI 1.0037 0.9006 0.8649
Avg Mut 8.44 4.19 3.12
As you can see, adding a factor constraint for Intsurface produced incremental improvement vs. results from the previous sql expression.
The top (win-place-show) part of the revised query results are stronger than the query results from the previous sql expression.
Moving on, this is what the data breakout by HDW Runstyle looks like:
By: HDW RunStyle
Value P/L Bet Roi Wins Plays Pct Impact AvgMut
----------------------------------------------------------------------------------
0 150.20 1792.00 1.0838 210 896 .2344 0.9852 9.25 E
15 196.70 752.00 1.2616 97 376 .2580 1.0844 9.78 EP
30 38.30 576.00 1.0665 75 288 .2604 1.0946 8.19 P
45 29.50 1204.00 1.0245 155 602 .2575 1.0823 7.96 PS
60 -324.60 1322.00 0.7545 141 661 .2133 0.8966 7.07 S
75 -47.30 186.00 0.7457 19 93 .2043 0.8588 7.30 SS
90 -21.00 78.00 0.7308 6 39 .1538 0.6467 9.50 U
Figure 3
The above data breakout table suggests it may be possible to realize incremental improvement by adding a factor constraint for HDW Runstyle.
Note that I italicized the words may be possible in the above sentence.
Again, when looking at Historical Racing Data - it's easy to make false assumptions about the data based on what you see. It's also easy to form cognitive biases about the way the data behaves based on false assumptions.
Again, the Scientific Method can help you avoid this pitfall.
The Scientific Method involves formulating hypotheses, deriving predictions from hypotheses as logical consequences, and then conducting experiments to test those predictions.
Instead of making assumptions about the data: Use the Scientific Method. Create hypotheses in the form of questions and test them.
Here's thought process I'd use when applying the Scientific Method to the query results from Figure 3 above:
The concept behind the sql expression is early pace types advantaged by recent track bias.
The data suggests it may be possible to improve the outlook for early pace types advantaged by recent track bias by adding factor constraints for HDW Runstyle.
(Note that this would be the derived prediction as a logical consequence.)
Instead of jumping to conclusions and assuming late Runstyles need to be eliminated - create hypotheses in the form of questions:
- Can rank 1 for early pace types advantaged by recent track bias be improved upon by emphasizing early Runstyles?
- Can rank 1 for early pace types advantaged by recent track bias be improved upon by eliminating late Runstyles?
The breakout data in Figure 3 suggests the answer may be yes.
However, we cannot possibly know the answers to the questions we are asking until after testing our hypotheses by looking at more data.
That's the mindset behind The Scientific Method.
Instead of making assumptions about the data - train yourself to ask:
What's the best way to apply the Scientific Method to the Historical Racing Data in front of me?
For now - let's add the following line to the sql expression to eliminate HDW Runstyles S, SS, and U from the query results:
AND RUNSTYLE < 60
Note that I italicized the words for now. Later on in this article, in the section about Forward Testing, I'll illustrate using the Scientific Method to test the hypotheses about whether or not early pace types advantaged by recent track bias can be improved upon by adding factor constraints for HDW Runstyle.
After adding the factor constraint for HDW Runstyle, let's execute the revised sql expression - this time with the data broken out by a JCapper factor called PaceLow.
This is what the top part of the query results (win-place-show) look like now:
query start: 3/17/2020 1:34:59 PM
query end: 3/17/2020 1:35:00 PM
elapsed time: 1 seconds
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
SQL UDM Plays Report: Hide
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF16 = 1
AND RANKF32 <= 2
AND RANKF31 <= 4
AND RUNSTYLE < 60
AND INTSURFACE = 1
AND [DATE] >= #10-01-2019#
AND [DATE] <= #03-16-2020#
ORDER BY [DATE], TRACK, RACE
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 4738.70 4018.00 3753.30
Bet -4324.00 -4324.00 -4324.00
-----------------------------------------------------
P/L 414.70 -306.00 -570.70
Wins 537 935 1193
Plays 2162 2162 2162
PCT .2484 .4325 .5518
ROI 1.0959 0.9292 0.8680
Avg Mut 8.82 4.30 3.15
As you can see, adding a factor constraint for HDW Runstyle produced incremental improvement vs. results from the previous sql expression.
The top (win-place-show) part of the revised query results are stronger than the query results from the original sql expression.
Next, this is what the data breakout by PaceLow looks like:
By: PaceLow
Value P/L Bet Roi Wins Plays Pct Impact AvgMut
----------------------------------------------------------------------------------
0 404.30 3580.00 1.1129 457 1790 .2553 1.0279 8.72 PaceLow: FALSE FigLow: FALSE
1 148.40 450.00 1.3298 50 225 .2222 0.8947 11.97 PaceLow: FALSE FigLow: TRUE
2 -108.60 204.00 0.4676 22 102 .2157 0.8684 4.34 PaceLow: TRUE FigLow: FALSE
3 -29.40 90.00 0.6733 8 45 .1778 0.7157 7.58 PaceLow: TRUE FigLow: TRUE
Figure 4
The above data breakout table suggests it may be possible to realize incremental improvement by adding a factor constraint for JCapper PaceLow.
Note that I italicized the words may be possible in the above sentence.
Again, when looking at Historical Racing Data - it's easy to make false assumptions about the data based on what you see. It's also easy to form cognitive biases about the way the data behaves based on false assumptions.
Again, the Scientific Method can help you avoid this pitfall.
The Scientific Method involves formulating hypotheses, deriving predictions from hypotheses as logical consequences, and then conducting experiments to test those predictions.
Instead of making assumptions about the data: Use the Scientific Method. Create hypotheses in the form of questions and test them.
Here's thought process I'd use when applying the Scientific Method to the query results from Figure 4 above:
The concept behind the sql expression is early pace types advantaged by recent track bias.
The data suggests it may be possible to improve the outlook for early pace types advantaged by recent track bias by adding factor constraints for PaceLow.
(Note that this would be the derived prediction as a logical consequence.)
Instead of jumping to conclusions and assuming late Runstyles need to be eliminated - create hypotheses in the form of questions:
- Can rank 1 for early pace types advantaged by recent track bias be improved upon by emphasizing PaceLow = False?
- Can rank 1 for early pace types advantaged by recent track bias be improved upon by eliminating PaceLow = True?
The breakout data in Figure 3 suggests the answer may be yes.
However, we cannot possibly know the answers to the questions we are asking until after testing our hypotheses by looking at more data.
That's the mindset behind The Scientific Method.
Instead of making assumptions about the data - train yourself to ask:
What's the best way to apply the Scientific Method to the Historical Racing Data in front of me?
For now, let's add the following line to the sql expression to eliminate horses with PaceLow of 2 and higher from the query results.
AND PACELOW < 2
Note that I italicized the words for now. Later on in this article, in the section about Forward Testing, I'll illustrate using the Scientific Method to test the hypotheses about whether or not early pace types advantaged by recent track bias can be improved upon by adding factor constraints for PaceLow.
After adding a factor constraint for PaceLow, let's execute the revised sql expression - this time without breaking the data out by another factor.
This is what the top part of the query results look like now:
query start: 3/17/2020 3:28:26 PM
query end: 3/17/2020 3:28:27 PM
elapsed time: 1 seconds
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
SQL UDM Plays Report: Hide
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF16 = 1
AND RANKF32 <= 2
AND RANKF31 <= 4
AND RUNSTYLE < 60
AND PACELOW < 2
AND INTSURFACE = 1
AND [DATE] >= #10-01-2019#
AND [DATE] <= #03-16-2020#
ORDER BY [DATE], TRACK, RACE
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 4582.70 3809.00 3526.60
Bet -4030.00 -4030.00 -4030.00
-----------------------------------------------------
P/L 552.70 -221.00 -503.40
Wins 507 873 1114
Plays 2015 2015 2015
PCT .2516 .4333 .5529
ROI 1.1371 0.9452 0.8751
Avg Mut 9.04 4.36 3.17
Figure 5
As you can see, adding a factor constraint for Pacelow produced incremental improvement vs. results from the previous sql expression.
The top (win-place-show) part of the revised query results are stronger than the query results from the previous sql expression.
Backfitting and staying on top of an ever changing game:
My write-up (above) illustrates a basic process. Create a sql expression based on a concept that produces generally solid results.
From there, add factor constraints that produce incremental improvements.
Although my write-up (above) makes this look easy: I guarantee you it is not.
Have you ever seen the following phrase?
"Past Performance is no guarantee of future results."
In my opinion, truer words have never been spoken. The implications of that phrase are literally why every mutual fund you can name is required by the SEC to provide a disclaimer telling investors that a fund's past performance does not necessarily predict future results.
In my opinion, the same thing applies to horse racing data.
Keep in mind that any time you are querying horse racing data: You are looking at what happened in the past.
It's not terribly difficult to come up an under the radar concept that produced generally solid results in the past - and create a sql expression for it.
From there it's not terribly difficult to add factor constraint after factor constraint to your sql expression - until what you have created has zero chance of producing good results going forward in time - because what you ended up creating is literally more of a backfitting effort than a way to flag horses fitting the original concept you started with.
Read the above paragraph about five times - and let it sink in. Then go back and read it again.
I repeat:
"Past Performance is no guarantee of future results."
So how do you guard against backfitting? And how do you stay on top of subtle changes within the game that are always taking place?
First, you must understand the following:
- The game itself constantly evolves over time.
- Conditions that cause under the radar areas of the game to be overlooked change over time.
- Players chase roi.
This means: Every Plus EV model or method of play you create will have a shelf life of 'duration unknown' from the day you create it.
If you are using a Model - going forward in time from the date you create it - you should consider:
- Forward testing the concept the model is based on.
- The game is slowly changing over time.
- What impact (if any) do the game's subtle changes have on the concept the model is based on?
- Does a big picture look at the data suggest it's time to rethink the concept the model is based on?
- Forward testing the individual factors in the model:
- The game is slowly changing over time.
- What impact (if any) do the game's subtle changes have on the individual factors in the model?
- Does a big picture look at the data suggest it's time to adjust the coefficients for the factors in the model?
- Does a big picture look at the data suggest it's time to add new factors to the model?
- Does a big picture look at the data suggest it's time to remove factors from the model?
- Remember, the whole is equal to the sum of its parts:
- Model = (Concept the model is based on) + (the individual factors in the model)
- Stay on top of things. Evaluate the effect of the game's subtle changes on both.
- Consider stopping live play if a big picture look at the data suggests the model is at the end of its shelf life.
If you are using a SQL Expression - going forward in time from the date you create it - you should consider:
- Forward testing the concept the sql expression is based on.
- The game is slowly changing over time.
- What impact (if any) do the game's subtle changes have on the concept the sql expression is based on?
- Does a big picture look at the data suggest it's time to rethink the concept the sql expression is based on?
- Forward testing the 'incremental improvement' factor constraints subsequently added to the original concept sql expression:
- The game is slowly changing over time.
- What impact (if any) do the game's subtle changes have on each of your 'incremental improvement' factor constraints?
- Does a big picture look at the data suggest it's time to adjust numeric cutoffs for the factor constraints in the sql expression?
- Does a big picture look at changes to the game suggest it's time to add new factor constraints to the sql expression?
- Does a big picture look at changes to the game suggest it's time to remove factor constraints from the sql expression?
- Remember, the whole is equal to the sum of its parts:
- SQL Expression = (Original Concept SQL) + (incremental improvement factor constraints subsequently added)
- Stay on top of things. Evaluate the effect of the game's subtle changes on both.
- Consider suspending live play if a big picture look at the data suggests your sql expression is at the end of its shelf life.
Forward Testing the final sql expression:
Below is a direct cut and paste of query results from the JCapper Data Window.
The sql expression contains the original factor constraints for early pace Userfactor, early-late bias Prob Expression, and inside-outside path bias Prob Expression.
The sql expression also contains the incremental improvement factor constraints for intSurface, HDW Runstyle, and Pacelow.
The only difference between the sql expression below and the sql expression from Figure 5 (above) is the date range:
query start: 3/28/2020 11:46:21 AM
query end: 3/28/2020 11:46:21 AM
elapsed time: 0 seconds
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
SQL UDM Plays Report: Hide
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF16 = 1
AND RANKF32 <= 2
AND RANKF31 <= 4
AND RUNSTYLE < 60
AND PACELOW < 2
AND INTSURFACE = 1
AND [DATE] >= #03-17-2020#
AND [DATE] <= #03-27-2020#
ORDER BY [DATE], TRACK, RACE
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 184.80 135.20 131.50
Bet -148.00 -148.00 -148.00
-----------------------------------------------------
P/L 36.80 -12.80 -16.50
Wins 21 34 42
Plays 74 74 74
PCT .2838 .4595 .5676
ROI 1.2486 0.9135 0.8885
Avg Mut 8.80 3.98 3.13
Figure 6
The date range in Figure 6 (immediately above) is from March 17, 2020 to March 27, 2020 inclusive:
The first 11 days immediately following the query results from Figure 5 (farther above.)
Overall results for the first 11 days of forward testing appear to be promising.
However, 74 plays and 21 wins isn't what I was talking about when I used the words 'a big picture look at the data' (above.)
Before I'd be willing to use this sql expression as the basis for live play: I'd need to see more data.
How much more data?
For me, there isn't a single one size fits all correct answer. There isn't a magic number of plays or set number of winners that makes the decision to use a given method as the basis for live play a reliable one.
For me, it's more of a 'the whole is equal to the sum of its parts' thing.
I generally want to see enough data to verify all three of the following elements are performing well during forward testing:
- The concept the sql expression is based on.
- The 'incremental improvement' factor constraints subsequently added to the original concept sql expression.
- The resulting 'whole' or 'sum of the parts' sql expression.
Specifically, by verify all three - I mean forward test all three both separately and collectively.
Even then - there are times when conditons causing the good past results that caught my eye in the first place will have undergone subtle changes without me being aware of them.
As a result, there are times when I later discover, to the detriment of my bankroll - that what I decided to use as the basis for live play was nearing the end of its shelf life about the time I began playing it.
Forward Testing the original factor constraints and intSurface:
The sql expression contains the original factor constraints for early pace Userfactor, early-late bias Prob Expression, and inside-outside path bias Prob Expression only.
The sql expression does not contain the incremental improvement factor constraints for intSurface, HDW Runstyle, and Pacelow.
The only difference between the sql expression below and the sql expression from Figure 1 (above) is the date range:
query start: 3/28/2020 1:48:02 PM
query end: 3/28/2020 1:48:02 PM
elapsed time: 0 seconds
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
SQL UDM Plays Report: Hide
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF16 = 1
AND RANKF32 <= 2
AND RANKF31 <= 4
AND [DATE] >= #03-17-2020#
AND [DATE] <= #03-27-2020#
ORDER BY [DATE], TRACK, RACE
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 232.80 226.20 237.80
Bet -272.00 -272.00 -272.00
-----------------------------------------------------
P/L -39.20 -45.80 -34.20
Wins 31 57 75
Plays 136 136 136
PCT .2279 .4191 .5515
ROI 0.8559 0.8316 0.8743
Avg Mut 7.51 3.97 3.17
By: intSurface
Value P/L Bet Roi Wins Plays Pct Impact AvgMut Descr
---------------------------------------------------------------------------------------------------------------
0.0000 0.00 0.00 0.0000 0 0 .0000 0.0000 0.00 Unknown
1.0000 5.40 224.00 1.0241 30 112 .2679 1.1751 7.65 Outer/Main Dirt Course
2.0000 0.00 0.00 0.0000 0 0 .0000 0.0000 0.00 Inner Dirt Course
3.0000 -2.00 2.00 0.0000 0 1 .0000 0.0000 0.00 Off the Turf
4.0000 -42.60 46.00 0.0739 1 23 .0435 0.1907 3.40 Outer/Main Turf Course
5.0000 0.00 0.00 0.0000 0 0 .0000 0.0000 0.00 Inner Turf Course
Figure 7
The date range in Figure 7 (immediately above) is from March 17, 2020 to March 27, 2020 inclusive:
The first 11 days immediately following the query results from Figure 5 (farther above.)
The top part of the query results shows the data broken out beneath columns for Win, Place, and Show.
Win percent is slightly below (but very much in line with) win percent from Figure 1 above.
However, average win mutuel is about 11 percent lower than average win mutuel from Figure 1 above.
This could simply be a short term blip in the data. This could also be caused by emergence of new a new public betting trend. Players are constantly chasing roi. The concept behind the sql expression is early pace types advantaged by recent track bias.
It's entirely possible players have (collectively) started recognizing early-late biases at the limited number of tracks running live cards these past few weeks.
Many times a drop in average win mutuel will turn out to be the first early warning sign a sql expression is nearing the end of its shelf life as a result of players chasing roi.
At this point, I'm mostly ok with the original sql expression. I'd need to see more data before thinking my hypothesis about the original sql expression has failed.
A continued decline in average win mutuel would nudge me towards thinking the concept behind the original sql expression is at the end of its shelf life. On the other hand - a cluster of good paying winners would nudge me towards thinking the drop in average win mutuel was merely short term noise in the data.
Q. What about the hypotheses for intSurface?
A look at the forward testing data shown in Figure 7 above suggests to me the hypotheses about surface type is likely true. Dirt results have continued to outperform turf results.
Forward Testing hypotheses about HDW Runstyle and Pacelow:
The sql expression below contains the original factor constraints for early pace Userfactor, early-late bias Prob Expression, inside-outside path bias Prob Expression, and intSurface.
The sql expression does not contain the incremental improvement factor constraints for HDW Runstyle and Pacelow.
The date range in the sql expression is from March 17, 2020 to March 27, 2020 inclusive, and the data is broken out by both HDW Runstyle and Pacelow:
query start: 3/28/2020 2:08:36 PM
query end: 3/28/2020 2:08:36 PM
elapsed time: 0 seconds
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
SQL UDM Plays Report: Hide
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF16 = 1
AND RANKF32 <= 2
AND RANKF31 <= 4
AND INTSURFACE = 1
AND [DATE] >= #03-17-2020#
AND [DATE] <= #03-27-2020#
ORDER BY [DATE], TRACK, RACE
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 229.40 190.40 207.60
Bet -224.00 -224.00 -224.00
-----------------------------------------------------
P/L 5.40 -33.60 -16.40
Wins 30 51 68
Plays 112 112 112
PCT .2679 .4554 .6071
ROI 1.0241 0.8500 0.9268
Avg Mut 7.65 3.73 3.05
By: RunStyle
Value P/L Bet Roi Wins Plays Pct Impact AvgMut
----------------------------------------------------------------------------------
0 17.40 62.00 1.2806 13 31 .4194 1.5656 6.11 E
15 -2.40 16.00 0.8500 2 8 .2500 0.9333 6.80 EP
30 48.60 16.00 4.0375 2 8 .2500 0.9333 32.30 P
45 -30.20 60.00 0.4967 5 30 .1667 0.6222 5.96 PS
60 -28.20 62.00 0.5452 7 31 .2258 0.8430 4.83 S
75 -4.00 4.00 0.0000 0 2 .0000 0.0000 0.00 SS
90 4.20 4.00 2.0500 1 2 .5000 1.8667 8.20 U
By: PaceLow
Value P/L Bet Roi Wins Plays Pct Impact AvgMut
----------------------------------------------------------------------------------
0 17.20 192.00 1.0896 27 96 .2813 1.0500 7.75 PaceLow: FALSE FigLow: FALSE
1 -2.40 20.00 0.8800 2 10 .2000 0.7467 8.80 PaceLow: FALSE FigLow: TRUE
2 -10.00 10.00 0.0000 0 5 .0000 0.0000 0.00 PaceLow: TRUE FigLow: FALSE
3 0.60 2.00 1.3000 1 1 1.0000 3.7333 2.60 PaceLow: TRUE FigLow: TRUE
Figure 8
Q. What about the hypotheses for HDW Runstyle?
A look at the data from forward testing suggests to me the hypotheses about HDW Runstyle are mostly true.
The general trend still shows Early Runstyles continuing to outperform late Runtyles. But not in the same exact manner as the data from Figure 3 above.
The data during forward testing shows PS Runstyles underperforming vs. data for PS Runstyles during initial testing.
The number of plays is small, and this might simply be a short term blip in the data.
The root cause could also be subtle changes in the game. Much of the data in Figure 3 (farther above) comes from races run during the months of January and February when the weather was colder.
Now that it's spring the weather is warmer. It's entirely possible that the limited number of tracks running over the past few weeks are more early biased than tracks that were running several weeks ago because of changes in temperature.
I'd need to see more data before I could decide between short term noise or subtle changes to the game in the form of early-late track bias.
Q. What about the hypotheses for Pacelow?
A look at the data from forward testing data suggests to me the hypotheses about Pacelow are likely true. Horses with Pacelow = False have continued to outperform horses with Pacelow = True.
Tying up loose ends:
The above examples serve to illustrate various areas of uncertainty you will likely encounter as you apply the Scientific Method and forward test hypotheses of your own.
At some point you are going to be faced with answering guestions similar to these:
- Are forward testing results for the original sql expression 'in line' with initial results?
- Does forward testing of the original sql expression show a drop in avg win mutuel vs. initial testing?
- Are forward testing results for incremental improvement factor constraints 'in line' with initial results?
- Does forward testing of incremental improvement factor constraints suggest subtle changes are taking place?
Not only are you going to be faced with answering questions like those above, but you are also going to be faced with making decisions about how best to respond to what you think the data is telling you.
For example:
- If you think the concept behind the original sql expression has reached the end of its shelf life - consider stopping live play.
- If you think subtle changes have taken place - consider adjusting your incremental improvement factor constraints.
At the end of the day:
- Past Performance is no guarantee of future results.
- Whether or not to use something as the basis for live play involves risk - and is a decision you must make for yourself.
- Whether or not to adjust something you are using as the basis for live play based on data from forward testing is a decision you must make for yourself.
- The best you can do is make an effort to apply the Scientific Method to the best of your ability.
I hope I managed to type most of that out in a way that makes sense.
-jp
.
-- NOTES:
Field Mapping for JCapper files and database tables:
JCapper Comma Delimited Text .JCP Past Performance Data File:
http://www.jcapper.com/MessageBoard/Reports/JCPFieldMap_05042017.txt
HDW Text Chart Results File:
http://www.jcapper.com/messageboard/reports/Comma_Delimited_Charts_Spec.pdf
JCapper StarterHistory (Access) table created during JCapper SQL Mode Build Database routines:
http://www.jcapper.com/helpdocs/starterhistorytableschema.html
JCapper pl_profile.txt file created during JCapper Build Database routines:
http://www.jcapper.com/messageboard/pdfs/fieldmap_dec012011.pdf
JCapper ForBill.txt and Forjeff.txt files created during JCapper Calc Races routines:
http://www.jcapper.com/HelpDocs/ForBillDoc.htm
The SQL Expression - Explaining the Commands:
Starting with the first line - we have:
SELECT * FROM STARTERHISTORY
Breaking this down, one element at a time, first we have:
SELECT
Here we are using the select keyword -- "SELECT" (without the quotes) to tell the database driver to READ data from the database.
Next we have:
*
Here we are using the star or wildcard character -- "*" (without the quotes) to tell the database driver to include data from ALL columns in the requested table.
Next we have:
FROM STARTERHISTORY
Here we are using the from keyword -- "FROM" (without the quotes) immediately followed by the name of the table -- "STARTERHISTORY" (without the quotes.)
Put the two together -- "FROM STARTERHISTORY" (without the quotes) to tell the database driver to where it will be getting data from the database. (From the Starterhistory table.)
Here's the first line of the sql expression again:
SELECT * FROM STARTERHISTORY
Which in the sql programming language means: Read data from every column of the Starterhistory table.
Moving on to the next line of the sql expression we have:
WHERE RANKF16 = 1
Let's break that line down. First, we have:
WHERE
Here we are using the a keyword -- "WHERE" (without the quotes) to filter the query results.
Note that everything between "WHERE" (without the quotes) and "ORDER BY" (without the quotes) which can be found in the very last line of the sql expression -- contains specific instructions telling the database driver HOW to filter the query results.
Note that if we were to execute "SELECT * FROM STARTERHISTORY" (without the quotes) by itself without a Where clause:
The database driver would return query results for every record in the table.
There are times when this can actually be useful. For example, if you wanted to see what would have happened had you wagered $2.00 to win, place, and show on every horse in the table!
Getting back to:
WHERE RANKF16 = 1
Here, we are using the Where keyword -- "WHERE" (without the quotes) to filter the query results.
We immediately follow that with "RANKF16 = 1" (without the quotes.)
We are simply adding a factor constraint to the sql expression.
In this case, the constraint "RANKF16 = 1" (without the quotes) simply means the following:
If you look at the Table Schema for the Starterhistory table there is a column or data field named "RANKF16" (without the quotes.)
This is one of the user defined "F" columns in the table. Basically, the user gets to define which factor is inserted into that column by the JCapper program during number crunching.
We are instructing the database driver to evaluate the contents of the RANKF16 field and make a Yes or No decision before inlcuding a record as part of the query results.
If the data for the current record in the RANKF16 field is equal to 1: Include that record in the query results. Otherwise, skip that record and do not include it in the query results.
The result when the database driver executes the query is: "RANKF16 = 1" (without the quotes) causes the returned query results to only include horses that are rank = 1 for the factor stored in the RANKF16 field of the Starterhistory table.
Take a minute to go over that again. Let it sink in before you move on.
Next we have:
AND RANKF32 <= 2
Here, we are using the And keyword -- "AND" (without the quotes) to add another factor contraint - or further filter - to the query results.
If the data for the current record in the RANKF32 field is less than or equal to 2: Include that record in the query results. Otherwise, skip that record and do not include it in the query results.
Next we have:
AND RANKF31 <= 4
Here, we are using the And keyword -- "AND" (without the quotes) to add another factor contraint - or further filter - to the query results.
If the data for the current record in the RANKF31 field is less than or equal to 4: Include that record in the query results. Otherwise, skip that record and do not include it in the query results.
Next we have date ranges:
AND [DATE] >= #10-01-2019#
AND [DATE] <= #03-16-2020#
Here, we are using the And keyword -- "AND" (without the quotes) to add another contraint - or further filter out - the query results.
In this case, we are filtering by date range: From 10-01-2019 to 03-16-2020 inclusive.
If you look at the Table Schema for the Starterhistory table there is a column or data field named "DATE" (without the quotes.)
Note that "DATE" (without the quotes) is also a keyword in the sql programming language.
About those square brackets:
Note that I have enclosed "DATE" (without the quotes) inside of square brackets -- like this "[DATE]" (without the quotes.)
This is done to tell the database driver I am referring to the column in the table named "DATE" (without the quotes) instead of the "DATE" keyword (without the quotes) in the sql programming language.
Rule: In JCapper, always use square brackets like this "[DATE]" (without the quotes) whenever you are using date constriants in a sql expression and are using the Data Window to query the Starterhistory table.
About those pound sign characters:
Note that I have enclosed the date text inside of square bracket characters -- like this: #10-01-2019# (without the quotes) -- and like this: #03-16-2020# (again without the quotes.)
Rule: In JCapper, always enclose date text inside of pound characters. (It's part of the sql programming language.)
Finally, we close out the sql expression with instructions for sorting:
ORDER BY [DATE], TRACK, RACE
Here, we use "ORDER BY" (without the quotes) to tell the database driver the query results need to be sorted.
After that we have "[DATE], TRACK, RACE" (without the quotes) to tell the database driver HOW to sort the query results.
In this case, we are telling the database driver to sort the query results using:
- Data from the [DATE] field first.
- Data from the TRACK field next.
- And data from the RACE field last.
Recapping the Order By clause - we have:
ORDER BY [DATE], TRACK, RACE
Sort the query results first by date, then by track code, and then by race number.
Looking for Historical Racing Data?
HDW (Handicappers Data Warehouse) has historical JCapper V4 Race Files and HDW Text Chart Results Files for all North American thoroughbred races from July 1, 2010 forward through present day.
If you are looking for Historical Racing Data or if you have a data project in mind:
Call or drop me an email,
Jeff Platt
JCapper program author
858.766.0255 (cell)
jeff@jcapper.com
|