
basic strategy

master
gtheler, 5 years ago
parent
current commit
7f2b461110
7 files changed, including 261 insertions and 46 deletions
  1. +1
    -1
      players/08-mimic-the-dealer/run.sh
  2. +133
    -0
      players/20-basic-strategy/README.m4
  3. +35
    -0
      players/20-basic-strategy/html_cell.awk
  4. +2
    -1
      players/20-basic-strategy/options.conf
  5. +79
    -41
      players/20-basic-strategy/run.sh
  6. +3
    -3
      players/README.md
  7. +8
    -0
      src/conf.cpp

+ 1
- 1
players/08-mimic-the-dealer/run.sh

@@ -1,4 +1,4 @@
rm -f d2p p2d
mkfifo d2p p2d
gawk -f mimic-the-dealer.awk < d2p > p2d &
../../blackjack -n100000 > d2p < p2d
blackjack -n1e5 > d2p < p2d

+ 133
- 0
players/20-basic-strategy/README.m4

@@ -0,0 +1,133 @@
define(case_title, Derivation of the basic strategy)
---
title: case_title
...

# case_title

> Difficulty: case_difficulty/100

## Quick run

Execute the `run.sh` script. It should take a few minutes:

```terminal
$ ./run.sh
h20-2 (10 10) 8.0e+04 +63.23 (1.1) -171.17 (1.1) -85.32 (0.5) stand
h20-3 (10 10) 8.0e+04 +64.54 (1.1) -171.50 (1.1) -85.50 (0.5) stand
h20-4 (10 10) 8.0e+04 +65.55 (1.1) -170.33 (1.1) -85.50 (0.5) stand
h20-5 (10 10) 8.0e+04 +66.65 (1.1) -171.25 (1.1) -85.51 (0.5) stand
h20-6 (10 10) 8.0e+04 +67.80 (1.1) -171.07 (1.1) -85.59 (0.5) stand
h20-7 (10 10) 8.0e+04 +77.44 (1.1) -170.53 (1.1) -85.44 (0.5) stand
h20-8 (10 10) 8.0e+04 +79.11 (1.1) -170.08 (1.1) -85.02 (0.6) stand
h20-9 (10 10) 8.0e+04 +75.77 (1.1) -170.31 (1.1) -84.87 (0.6) stand
[...]
p2-6 8e+04 +24.78 (2.9) +3.07 (1.0) yes
p2-7 8e+04 +1.48 (2.0) -8.90 (1.0) yes
p2-8 8e+04 -17.57 (2.0) -16.33 (1.0) uncertain
p2-8 3e+05 -17.88 (1.0) -16.10 (0.5) no
p2-9 8e+04 -38.73 (2.0) -24.38 (1.0) no
p2-T 8e+04 -54.45 (1.8) -34.92 (0.9) no
p2-A 8e+04 -67.11 (1.5) -51.59 (0.9) no
```

A new text file called `bs.txt` with the strategy should be created from scratch:

```
include(bs.txt)dnl
```

## Full table with results

The script computes the expected value of each combination of

1. Player’s hand (hard, soft and pair)
2. Dealer upcard
3. Hit, double or stand (for hard and soft hands) and splitting or not (for pairs)

The results are given as the expected value in percentage with the uncertainty (one standard deviation) in the last significant digit.
define(table_head,
<thead>
<tr>
<th class="text-center" width="10%" colspan="2">Hand</th>
<th class="text-center" width="9%">2</th>
<th class="text-center" width="9%">3</th>
<th class="text-center" width="9%">4</th>
<th class="text-center" width="9%">5</th>
<th class="text-center" width="9%">6</th>
<th class="text-center" width="9%">7</th>
<th class="text-center" width="9%">8</th>
<th class="text-center" width="9%">9</th>
<th class="text-center" width="9%">T</th>
<th class="text-center" width="9%">A</th>
</tr>
</thead>
)
```{=html}
<table class="table table-sm table-responsive table-hover small w-100">
table_head
<tbody>
include(pair.html)
</tbody>
table_head
<tbody>
include(soft.html)
</tbody>
table_head
<tbody>
include(hard.html)
</tbody>
</table>
```

include(table.md)

## Detailed explanation

We want to derive the basic strategy from scratch, i.e. without making any assumptions. To do so, we play a large number of hands (more on what _large_ means below) with our first two cards and the dealer upcard fixed, and we stand, double or hit on that initial hand in turn. We then compare the results of the three plays and select the best one as the proper strategy.

Standing and doubling are easy plays, because after we stand or double down the dealer plays according to the rules: she hits until seventeen, possibly hitting soft seventeen. But if we hit, we might need to make another decision about whether to stand or hit again. As we do not want to assume anything, we have to play the hands in such an order that whenever we need to make another decision, we already know which one is best.
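
The dealer's fixed rule can be sketched as a small Bash function. This is only an illustration: the name `dealer_hits` and its three arguments are hypothetical and are not part of Libre Blackjack.

```bash
#!/bin/bash
# Hypothetical helper: decide whether the dealer draws another card.
#   $1 = dealer total
#   $2 = 1 if the total is soft (an ace counted as eleven), 0 otherwise
#   $3 = 1 if the house rule is "hit soft seventeen", 0 otherwise
dealer_hits() {
  local total=$1 soft=$2 h17=$3
  if [ "${total}" -lt 17 ]; then
    echo yes
  elif [ "${total}" -eq 17 ] && [ "${soft}" -eq 1 ] && [ "${h17}" -eq 1 ]; then
    echo yes
  else
    echo no
  fi
}

dealer_hits 16 0 0   # hard sixteen: the dealer draws
dealer_hits 17 1 1   # soft seventeen with the hit-soft-17 rule: the dealer draws
dealer_hits 17 1 0   # soft seventeen without that rule: the dealer stands
```

Because this rule needs no decision from the dealer, the outcome of a stand or a double depends only on the cards, which is why those two plays can be evaluated first.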

### Hard hands

We start by arranging the shoe so that the player gets a hard twenty (i.e. two ten-valued cards) and the dealer successively gets each upcard from two to ace. We play each of the ten dealer upcards three times, either

1. always standing
2. always doubling
3. always hitting

In general the first two plays are easy, because the game stops either after standing or after receiving only one card. The last one might lead to further hitting, but since we are starting with a hard twenty, that would give the player either twenty-one or a bust. In any case, the game also ends.

So we play a certain number of hands (say one thousand) with each of these three plays against each of the ten upcards and record the outcome. The correct play for hard twenty against each upcard is the one that gave the best result, which is of course standing.

Next, we do the same for a hard nineteen. In this case, the hitting play might not end after one card is drawn (i.e. we hit on nineteen and get an ace). But if that is the case, we already know the best play from the previous step, so we play accordingly and stand. Repeating this procedure down to hard four, we can build the basic strategy table for any hard total against any dealer upcard.
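
The order of the analysis can be sketched as two nested loops. The function name `hard_order` is hypothetical; the labels follow the `h20-2` style of the sample output and the range from twenty down to four comes from the text.

```bash
#!/bin/bash
# Print the order in which hard totals are analyzed: from twenty down to four,
# each against the ten possible dealer upcards. Higher totals come first, so
# any total reached after hitting has already been solved.
hard_order() {
  local hand upcard
  for hand in $(seq 20 -1 4); do
    for upcard in 2 3 4 5 6 7 8 9 T A; do
      echo "h${hand}-${upcard}"
    done
  done
}

hard_order | sed -n 1,3p   # the first three combinations
hard_order | wc -l         # seventeen totals times ten upcards
```

Going downwards is what removes the need for assumptions: hitting a hard total can only lead to a higher hard total or a bust, never to a lower one.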

### Soft hands

We can now switch to analyzing soft hands. Starting from soft twenty (i.e. an ace and a nine) we do the same as in the hard case. The only difference is that when hitting, we might end up either with another soft hand, which we have already analyzed because we start from twenty and go down, or with a hard hand, which we have also already analyzed, so we can play accordingly.
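
How a hit turns a soft hand into another soft hand or into a hard one can be sketched as follows. The helper `hit_hand` is hypothetical and only does the card arithmetic; aces are passed as 1 and ten-valued cards as 10.

```bash
#!/bin/bash
# Hypothetical helper: add one card to a hand and print "total soft".
#   $1 = current total, $2 = 1 if the hand is soft, $3 = card value (ace = 1)
hit_hand() {
  local total=$1 soft=$2 card=$3
  if [ "${card}" -eq 1 ] && [ "${soft}" -eq 0 ] && [ $((total + 11)) -le 21 ]; then
    # an ace counts as eleven when that does not bust the hand
    total=$((total + 11))
    soft=1
  else
    total=$((total + card))
  fi
  if [ "${total}" -gt 21 ] && [ "${soft}" -eq 1 ]; then
    # a soft hand over twenty-one becomes hard: its ace now counts as one
    total=$((total - 10))
    soft=0
  fi
  echo "${total} ${soft}"
}

hit_hand 18 1 2    # soft eighteen plus a two: soft twenty
hit_hand 18 1 5    # soft eighteen plus a five: hard thirteen
hit_hand 13 1 10   # soft thirteen plus a ten: hard thirteen
```

Both kinds of result are covered by then: the hard totals were all solved in the previous step, and the higher soft totals earlier in this one.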

### Pairs

When dealing with pairs, we have to decide whether to split or not. When we do not split, we end up in one of the already-analyzed cases: either a soft twelve or an even hard hand. When we split, we might end up with a hard or soft hand (already analyzed) or with a new pair. But since the new pair can only be the same pair we are analyzing, we treat it like the first one, either splitting it or not, so we know how to deal with it.

### Number of hands

The output is the expected value\ $e$ of the bankroll, which is a random variable with an associated uncertainty\ $\Delta e$ (i.e. a certain number of standard deviations). For example, if we received only blackjacks, the expected value would be 1.5 (provided blackjacks pay\ 3 to\ 2, of course). If we busted all of our hands without doubling or splitting, the expected value would be -1. In order to say that the best strategy is, say, standing rather than hitting or doubling down, we have to make sure that $e_s-\Delta e_s > e_h+\Delta e_h$ and $e_s-\Delta e_s > e_d+\Delta e_d$. If no play gives a better expected value than the other two once the uncertainties are taken into account, then we have to play more hands in order to reduce the random uncertainty.
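
The comparison and the growth of the number of hands can be sketched like this. The function names `clearly_better` and `sample_sizes` are hypothetical; the numbers fed to them are taken from the sample output above (pair of twos against an eight) and from the values of `n0`, the factor of four and `n_max` used in `run.sh`.

```bash
#!/bin/bash
# Hypothetical helper: print 1 if play A is better than play B beyond the
# uncertainties, i.e. if e_a - delta_a > e_b + delta_b, and 0 otherwise.
clearly_better() {
  printf "%s %s %s %s\n" "$1" "$2" "$3" "$4" |
    awk '{ print (($1 - $2) > ($3 + $4)) }'
}

# Hypothetical helper: list the numbers of hands tried for one combination,
# starting at n0 and multiplying by a factor while n does not exceed n_max.
sample_sizes() {
  local n=$1 factor=$2 n_max=$3
  while [ "${n}" -le "${n_max}" ]; do
    echo "${n}"
    n=$((n * factor))
  done
}

# not splitting versus splitting, values in percent
clearly_better -16.33 1.0 -17.57 2.0   # at 8e+04 hands
clearly_better -16.10 0.5 -17.88 1.0   # at 3e+05 hands
sample_sizes 80000 4 9000000
```

The first comparison prints 0 (the intervals overlap, so the result is uncertain) and the second prints 1. The statistical uncertainty scales as the inverse square root of the number of hands, so multiplying the hands by four halves it, which is consistent with the two `p2-8` lines of the sample output.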


## Implementation

The steps above can be written in a [Bash](https://en.wikipedia.org/wiki/Bash_%28Unix_shell%29) script that

* loops over hands and upcards,
* creates a strategy file for each possible play: hit, double or stand (or split or not, for pairs),
* runs [Libre Blackjack](https://www.seamplex.com/blackjack),
* checks the results and picks the best play,
* updates the strategy file.

```bash
include(run.sh)
```

case_nav

+ 35
- 0
players/20-basic-strategy/html_cell.awk

@@ -0,0 +1,35 @@
#function abs(v) {return v < 0 ? -v : v}
function ceil(x, y){y=int(x); return(x>y?y+1:y)}
{
ev=1e-2*$1
error=1e-2*$2

if (ev < -1)
x=-1
else if (ev > 1)
x=1
else
x=ev
# r=0.5-0.5*x
# g=0.5+0.5*x
# b=1-abs(x)

pi = atan2(0, -1)
r=cos((x+1)*pi/4)
g=cos((x-1)*pi/4)
b=0.4+0.2*cos(x*pi/2)

if (error < 1e-6) {
error=1e-4;
}
precision = (ceil(-log(error)/log(10)))-2;
printf("<div class=\"text-center %s\" style='background-color: rgb(%d,%d,%d)'>", (ev<0)?"text-white":"", 255*r, 255*g, 255*b);
printf(sprintf("%%+.%df", precision), 100*ev);
printf("(%.0g)", 10^(precision+2) * error);
printf("</div>");
}

+ 2
- 1
players/20-basic-strategy/options.conf

@@ -1,5 +1,6 @@
decks = -1
decks = 0 # infinite decks
flat_bet = 1
no_insurance = true
&error_standard_deviations = 2
; hit_soft_17 = 0
; rng_seed = 1

+ 79
- 41
players/20-basic-strategy/run.sh

@@ -1,8 +1,18 @@
#!/bin/bash

n_max=9999999
n0=80000
n_max=9000000

for i in grep awk; do
RED="\033[0;31m"
GREEN="\033[0;32m"

BROWN="\033[0;33m"
MAGENTA="\e[0;35m"
CYAN="\e[0;36m"

NC="\033[0m" # No Color

for i in grep awk printf blackjack; do
if [ -z "$(which $i)" ]; then
echo "error: $i not installed"
exit 1
@@ -18,16 +28,16 @@ declare -A min
min["hard"]=4 # from 20 to 4 in hards
min["soft"]=12 # from 20 to 12 in softs

rm -f hard.html soft.html pair.html
rm -f table.md hard.html soft.html pair.html

# --------------------------------------------------------------
# start with standing
cp hard-stand.txt hard.txt
cp soft-stand.txt soft.txt

cat << EOF > table.md
| Hand | \$n\$ | Stand | Double | Hit |
| ---- | ----- | ----- | ------ | --- |
cat << EOF >> table.md
| Hand | \$n\$ | Stand [%] | Double [%] | Hit [%] | Play |
|:------:|:-----:|:-----------:|:------------:|:--------:|:---------:|
EOF


@@ -70,12 +80,12 @@ EOF
upcard_n=$(($upcard))
fi
n=10000 # start with n hands
n=${n0} # start with n0 hands
best="x" # x means don't know what to do, so play
while [ "${best}" = "x" ]; do
# tell the user which combination we are trying and how many we will play
echo -ne "${t}${hand}-${upcard} ($card1 $card2)\t"$(printf %.0e ${n})
echo -ne "${t}${hand}-${upcard} ($card1 $card2)\t"$(printf %.1e ${n})
for play in s d h; do
# start with options.conf as a template and add some custom stuff
@@ -147,9 +157,9 @@ EOF
if [ ${n} -le ${n_max} ]; then
# if we still have room, take into account errors
error_s=$(echo ${error[${t}${hand},${upcard},s]} | awk '{printf("%+.1f", 100*$1)}')
error_d=$(echo ${error[${t}${hand},${upcard},d]} | awk '{printf("%+.1f", 100*$1)}')
error_h=$(echo ${error[${t}${hand},${upcard},h]} | awk '{printf("%+.1f", 100*$1)}')
error_s=$(echo ${error[${t}${hand},${upcard},s]} | awk '{printf("%.1f", 100*$1)}')
error_d=$(echo ${error[${t}${hand},${upcard},d]} | awk '{printf("%.1f", 100*$1)}')
error_h=$(echo ${error[${t}${hand},${upcard},h]} | awk '{printf("%.1f", 100*$1)}')
else
# instead of running infinite hands, above a threshold assume errors are zero
error_s=0
@@ -163,35 +173,51 @@ EOF
if (( $(echo ${ev_s} ${error_s} ${ev_d} ${error_d} | awk '{print (($1-$2) > ($3+$4))}') )) &&
(( $(echo ${ev_s} ${error_s} ${ev_h} ${error_h} | awk '{print (($1-$2) > ($3+$4))}') )); then
best="s"
echo -e "\tstand"
color=${BROWN}
best_string="stand"
elif (( $(echo ${ev_d} ${error_d} ${ev_s} ${error_s} | awk '{print (($1-$2) > ($3+$4))}') )) &&
(( $(echo ${ev_d} ${error_d} ${ev_h} ${error_h} | awk '{print (($1-$2) > ($3+$4))}') )); then
best="d"
echo -e "\tdouble"
color=${CYAN}
best_string="double"
elif (( $(echo ${ev_h} ${error_h} ${ev_s} ${error_s} | awk '{print (($1-$2) > ($3+$4))}') )) &&
(( $(echo ${ev_h} ${error_h} ${ev_d} ${error_d} | awk '{print (($1-$2) > ($3+$4))}') )); then
best="h"
echo -e "\thit"
color=${MAGENTA}
best_string="hit"
else
best="x"
n=$((${n} * 10))
echo -e "\tuncertain"
color=${NC}
best_string="uncertain"
n=$((${n} * 4))
fi
echo -e ${color}"\t"${best_string}${NC}
done

strategy[${t}${hand},${upcard}]=${best}
# echo "| ${t}${hand}-${upcard} | ${n} | ${ev_s} (${error_s}) | ${ev_h} (${error_h}) | ${ev_d} (${error_d}) |" >> table.md
#
# echo " <!-- ${upcard} -->" >> ${type}.html
# echo " <td>" >> ${type}.html
# echo ${ev_s} ${error_s} | awk -f cell.awk >> ${type}.html
# echo ${ev_h} ${error_h} | awk -f cell.awk >> ${type}.html
# echo ${ev_d} ${error_d} | awk -f cell.awk >> ${type}.html
# echo " </td>" >> ${type}.html
echo "| ${t}${hand}-${upcard} | $(printf %.1e ${n}) | ${ev_s} (${error_s}) | ${ev_d} (${error_d}) | ${ev_h} (${error_h}) | ${best_string} | " >> table.md
echo " <!-- ${upcard} -->" >> ${type}.html
echo " <td>" >> ${type}.html
echo ${ev_s} ${error_s} | awk -f html_cell.awk >> ${type}.html
echo ${ev_h} ${error_h} | awk -f html_cell.awk >> ${type}.html
echo ${ev_d} ${error_d} | awk -f html_cell.awk >> ${type}.html
echo " </td>" >> ${type}.html
# save the strategy again with the best strategy
@@ -213,7 +239,7 @@ EOF
done
done
echo "</tr>" >> ${type}.html
# echo "</tr>" >> ${type}.html
done
done
@@ -222,8 +248,8 @@ done
cat << EOF >> table.md


| Hand | \$n\$ | Yes | No |
| ---- | ----- | ----- | ---- |
| Hand | \$n\$ | Yes [%] | No [%] | Split |
|:------:|:-------:|:----------:|:----------:|:---------:|
EOF

# --------------------------------------------------------------------
@@ -259,7 +285,7 @@ for hand in A T $(seq 9 -1 2); do
upcard_n=$(($upcard))
fi
n=10000 # start with n hands
n=${n0} # start with n0 hands
best="x" # x means don't know what to do, so play
while [ "${best}" = "x" ]; do
@@ -328,8 +354,8 @@ EOF
if [ $n -le ${n_max} ]; then
# if we still have room, take into account errors
error_y=$(echo ${error[${t}${hand},${upcard},y]} | awk '{printf("%+.1f", 100*$1)}')
error_n=$(echo ${error[${t}${hand},${upcard},n]} | awk '{printf("%+.1f", 100*$1)}')
error_y=$(echo ${error[${t}${hand},${upcard},y]} | awk '{printf("%.1f", 100*$1)}')
error_n=$(echo ${error[${t}${hand},${upcard},n]} | awk '{printf("%.1f", 100*$1)}')
else
# instead of running infinite hands, above a threshold assume errors are zero
error_y=0
@@ -340,25 +366,37 @@ EOF
echo -ne "\t${ev_n}\t(${error_n})"
if (( $(echo ${ev_y} ${error_y} ${ev_n} ${error_n} | awk '{print (($1-$2) > ($3+$4))}') )); then
best="y"
echo -e "\tyes"
color=${GREEN}
best_string="yes"
elif (( $(echo ${ev_n} ${error_n} ${ev_y} ${error_y} | awk '{print (($1-$2) > ($3+$4))}') )); then
best="n"
echo -e "\tno"
color=${RED}
best_string="no"
else
best="x"
n=$((${n} * 10))
echo -e "\tuncertain"
color=${NC}
best_string="uncertain"
n=$((${n} * 4))
fi
echo -e ${color}"\t"${best_string}${NC}
done

echo "| ${t}${hand}-${upcard} | ${n} | ${ev_y} (${error_y}) | ${ev_n} (${error_n}) |" >> table.md
echo "| ${t}${hand}-${upcard} | $(printf %.1e ${n}) | ${ev_y} (${error_y}) | ${ev_n} (${error_n}) | ${best_string} | " >> table.md
# echo " <!-- ${upcard} -->" >> ${type}.html
# echo " <td>" >> ${type}.html
# echo ${ev_y} ${error_y} | awk -f cell.awk >> ${type}.html
# echo ${ev_n} ${error_n} | awk -f cell.awk >> ${type}.html
# echo " </td>" >> ${type}.html
echo " <!-- ${upcard} -->" >> ${type}.html
echo " <td>" >> ${type}.html
echo ${ev_y} ${error_y} | awk -f html_cell.awk >> ${type}.html
echo ${ev_n} ${error_n} | awk -f html_cell.awk >> ${type}.html
echo " </td>" >> ${type}.html
strategy[${t}${hand},${upcard}]=${best}
@@ -377,7 +415,7 @@ done

cat header.txt hard.txt header.txt soft.txt header.txt pair.txt > bs.txt
rm -f blackjack.conf
rm -f hard.txt soft.txt pair.txt blackjack.conf
if [ "${debug}" == "0" ]; then
rm -f *.yaml
rm -f *.str

+ 3
- 3
players/README.md

@@ -6,9 +6,9 @@ title: Example players for LibreBlackjack

The subdirectory `players` contains some automatic players that play against LibreBlackjack. These players are coded in different languages and communicate with LibreBlackjack in a variety of ways in order to illustrate the design basis:

* [`00-internal`](00-internal) uses the internal player that defaults to playing one million hands of basic strategy
* [`00-internal`](00-internal) uses the internal player (coded in C++) that defaults to playing one million hands of basic strategy
* [`02-always-stand`](02-always-stand), using the UNIX tool `yes` this player always says “stand” into the standard output (which is piped to libreblackjack’s standard input) no matter what the cards are
* [`05-no-bust`](05-no-bust) is a PERL-based player does not bust (i.e. hits if the hard total is less than twelve) that receives tha cards through the standard input but draws or stands using a FIFO to talk back to the dealer
* [`05-no-bust`](05-no-bust) is a Perl-based player that does not bust (i.e. hits if the hard total is less than twelve). It receives the cards through the standard input but draws or stands using a FIFO to talk back to the dealer. There are also implementations in AWK (similar speed) and in a pure Bash script (10x slower)
* [`08-mimic-the-dealer`](08-mimic-the-dealer) does the same as the dealer does (hits soft seventeens). It is implemented in AWK using two FIFOs.
* [`20-basic-strategy`](20-basic-strategy) derives the basic strategy from scratch in less than one minute.
* [`20-basic-strategy`](20-basic-strategy) derives the basic strategy from scratch in less than one minute by calling the internal player successively with different strategy files from a shell script.


+ 8
- 0
src/conf.cpp

@@ -218,8 +218,16 @@ bool Configuration::set(int *value, std::list<std::string> key) {
}

bool Configuration::set(unsigned int *value, std::list<std::string> key) {
// check for negative values
for (auto it : key) {
if (exists(*(&it))) {
int tmp = std::stoi(data[*(&it)]);
if (tmp < 0) {
std::cerr << "key " << *(&it) << " cannot be negative" << std::endl;
exit(-1);
}
*value = std::stoi(data[*(&it)]);
return true;
}
