MST by Kruskal & Prim @sed

2008-03-21 (Fri) 23:16
プログラミング ( sed/wake/awk )

追いコンで「Primの方が単純ではないか」という指摘をもらったので試しに sed でも prim を実装することにした．

とりあえずオリジナルの Kruskal バージョン．大半がソートに取られている．もっと簡単で速いソートがほしい．

# input:
#  a set of edges
#   each line: v_in v_out weight
#
# output:
#  a set of edges marked with use-flag
#   eachline: useflag v_in v_out weight
#
#   useflag = T    if the edge is used
#             F    if the edge is not used
#  
#   the order of edges in the output is not the same as the input
#   edges are sorted by weights
#
 
 
H
$!d
s/.*//
x
s/\n//
 
h
 
# make a set of vertices
 
s/^\(\w* \w* \).*$/\1/gm
 
:Z
s/\b\(\w\+\)\b\(.*\b\1\b\)/\2/
tZ
 
s/[ \n]\+/\n/g
s/^\n*\|\n*$//g
 
x
 
# here,
#   hold space: vertices
#   pattern space: edges
 
 
# rearrange to weight,v_in,v_out
s/^\(\w*\) \(\w*\) \(.*\)$/\3,\1,\2/gm
 
#=================== start of merge sort by key ==============
# input: key1,value11,value12,...value1n\nkey2,value21,value22,...value2n\n...
# output: sorted items
#
# a key is a number (i.e., key = [0-9]\+)
# a value does not contain symbols (i.e., value = \w*)
#
 
 
s/$/ /gm
 
 
:a
 
/\n/!bq
 
# each line is sorted: key1,value11,value12,...,value1n key2,value21,...
#   consequtive items are separated by a space
#   each item has a key and values
#     key is on the head, followed by values
 
s/^\(.*\)\n\(.*\)$/@\1: %\2/gm
 
# each line: @item11 item12 ... item1n : %item21 item22 ... item2m
# ':' is a separator of two lists
# '@' is the head of non-merged part of the first list
# '%' is the head of non-merged part of the second list
 
:b
 
# each line: sortedpart @item1i ... item1n : %item2j ... item2m
 
# already done 
s/^[^%@:\n]*$/~&/gm                  
 
 
# done (2nd list is consumed completely)
s/^\(.*\)@\(.*\): % *$/~\1\2/gm  
 
# done (1st list is consumed completely)    
s/^\(.*\)@: \(.*\)%\(.*\)/~\1\2\3/gm
 
/@/!be
x
s/$/\n==mergesort_by_num_key==/
x
H
 
 
# ignore merged lists
s/^~.*$/~/gm         
 
# prepare to comparation
s/^.*@\(\w*\)\(,\w*\)*.*%\(\w*\)\(,\w*\)*.*$/\1 \3/gm  
 
#================= start of num_comparator ================
#
#  num11 num12\nnum21 num22\n...\n~\n...\nnumn1 numn2
# 
#  each line is replaced with a relation
#    numk1 < numk2 -> < 
#    numk1 > numk2 -> > 
#    numk1 = numk2 -> = 
#  a line of '~' is ignored
 
# preserve the hold space
x
s/$/\n==num_comparator==/
x
H
 
# comparation by lengths
 
:1
s/^ $/=/gm
s/^ .*/</gm
s/.* $/>/gm
s/^[0-9]\|\b[0-9]//gm
t1
s/\n//g
G
s/\n.*==num_comparator==//
x
s/\n==num_comparator==.*//
x
 
:2
s/\`=\(.*\)\n\(.*\)$/\1#\2/m
s/\`<\(.*\)\n\(.*\)$/\1#</m
s/\`>\(.*\)\n\(.*\)$/\1#>/m
s/\`~\(.*\)\n.*$/\1#~/m
t2
# comparation by values
 
 
s/#/\n/g
s/\n//
s/^/#/gm
:3
s/#\(.\)\(.*\) \(.\)\(.*\)/\1\3 #\2 \4/gm
t3
s/#//g
:4
s/00/=/g
s/0[1-9]/</g
s/[1-9]0/>/g
y/123456789/012345678/
/[0-9]/b4
 
s/ //gm
s/=*\(.\).*/\1/gm
 
#================= end of num_comparator ================
 
s/\n//g
G
s/\n.*==mergesort_by_num_key==/#/
x
s/\n==mergesort_by_num_key==.*//
x                     
 
 
# [<>=~]*#\nmergedpart @rest_of_list1 %rest_of_list2\n... 
 
# according to the results of comparations held in the head,
# move a head of either the rest of list1 or list2 to the tail of merged_part.
 
:m
s/^[<=]\(.*\)\n\(.*\)@\(\(\w\|,\)*\) \(.*\)%\(\(\w\|,\)*\)/\1-\2\3 @\5%\6/m
s/^>\(.*\)\n\(.*\)@\(\(\w\|,\)*\)\(.*\)%\(\(\w\|,\)*\) /\1-\2\6 @\3\5%/m
s/^~\([^\n]*\)\n~/\1-/
tm
s/-/\n/g
s/#\n//
bb
 
:e
s/~//g
 
 
ba
 
:q
#=================== end of merge sort by key ==============
 
s/ $//
s/ /\n/g
 
# restore the order
s/^\(.*\),\(\w*\),\(\w*\)$/\2 \3 \1/gm
x
 
#   hold space: sorted edges
#   pattern space: vertices
 
# make singleton sets
 
s/^.*$/& &/gm
 
x
s/^/#/
s/$/\n/
x
 
# main loop
# 
# use marked edge ('#' on the head) to union sets 
 
:Y
 
# finished?
x
/#$/bX
x
 
s/$/\n==end_of_vertices==/
 
G
s/==end_of_vertices==.*#\(\w* \w*\) .*/\1/
 
s/\(.*\)\n\(.*\)/\2\n\1\n/
 
#============== start of unionfind ========================
#
# input: namex namey\nname1 parent1\nname2 parent2\n...\n
#
# output: root flag\nname1 parent1\n...\nnamex root\n...\nnamey root\n...
#    where root is the representive name of the set
#          parents of namex and namey are set to root
#          flag is F if namex and namey have already been in the same set
#                  T otherwise (i.e., sets are unioned)
#
 
# find the root of the first element
:A
s/^\(\w\+\) \(.*\n\1 \(\w\+\)\b\)\([^#]\)/\3 \2#\4/
tA
 
# find the root of the second element
:B
s/^\(\w\+\) \(\w\+\)\b\(.*\n\2 \(\w\+\)\b\)\([^#]\)/\1 \4\3#\5/
tB
 
/^\(\w*\) \1\b/{s//\1 F/;bD}
/^\w\+ \(\w\+\)\b.*\n\1 \1#/{s/^\(\w*\) \w*\b/\1 T/;bD}
s/^\(\w*\) \w*\b/\1 F/
:D
 
# refine the parent-relation
 
:E
s/^\(\w*\)\( .* \)\w*#/\1\2\1/
tE
 
# add flags 
 
s/\n$//
 
#============== end of unionfind ========================
 
# marking the edge
 
x
s/$/\n==end_of_edges==/
 
G
s/#\([^\n]*\n\)\(.*\)\n==end_of_edges==\n\w* \(.\).*/\3 \1#\2/
 
x
 
# remove return value of unonfind
s/.*\n//m
 
bY
 
:X
s///
s/\n$//

これのコメントをなくしてホールドスペースの保護用のマークを小さくすれば 1600B 弱になる．

次に，Kruskal のエッジソート後の処理を変えて作った Prim の実装．

# input:
#  a set of edges
#   each line: v_in v_out weight
#
# output:
#  a set of edges marked with use-flag
#   eachline: useflag v_in v_out weight
#
#   useflag = T    if the edge is used
#             F    if the edge is not used
#  
#   the order of edges in the output is not the same as the input
#   edges are sorted by weights
#
 
 
H
$!d
s/.*//
x
s/\n//
 
h
 
# make a set of vertices
 
s/^\(\w* \w* \).*$/\1/gm
 
:Z
s/\b\(\w\+\)\b\(.*\b\1\b\)/\2/
tZ
 
s/[ \n]\+/ /g
s/^ *\| *$//g
 
x
 
# here,
#   hold space: vertices
#   pattern space: edges
 
 
# rearrange to weight,v_in,v_out
s/^\(\w*\) \(\w*\) \(.*\)$/\3,\1,\2/gm
 
#=================== start of merge sort by key ==============
# input: key1,value11,value12,...value1n\nkey2,value21,value22,...value2n\n...
# output: sorted items
#
# a key is a number (i.e., key = [0-9]\+)
# a value does not contain symbols (i.e., value = \w*)
#
 
 
s/$/ /gm
 
 
:a
 
/\n/!bq
 
# each line is sorted: key1,value11,value12,...,value1n key2,value21,...
#   consequtive items are separated by a space
#   each item has a key and values
#     key is on the head, followed by values
 
s/^\(.*\)\n\(.*\)$/@\1: %\2/gm
 
# each line: @item11 item12 ... item1n : %item21 item22 ... item2m
# ':' is a separator of two lists
# '@' is the head of non-merged part of the first list
# '%' is the head of non-merged part of the second list
 
:b
 
# each line: sortedpart @item1i ... item1n : %item2j ... item2m
 
# already done 
s/^[^%@:\n]*$/~&/gm                  
 
 
# done (2nd list is consumed completely)
s/^\(.*\)@\(.*\): % *$/~\1\2/gm  
 
# done (1st list is consumed completely)    
s/^\(.*\)@: \(.*\)%\(.*\)/~\1\2\3/gm
 
/@/!be
x
s/$/\n==mergesort_by_num_key==/
x
H
 
 
# ignore merged lists
s/^~.*$/~/gm         
 
# prepare to comparation
s/^.*@\(\w*\)\(,\w*\)*.*%\(\w*\)\(,\w*\)*.*$/\1 \3/gm  
 
#================= start of num_comparator ================
#
#  num11 num12\nnum21 num22\n...\n~\n...\nnumn1 numn2
# 
#  each line is replaced with a relation
#    numk1 < numk2 -> < 
#    numk1 > numk2 -> > 
#    numk1 = numk2 -> = 
#  a line of '~' is ignored
 
# preserve the hold space
x
s/$/\n==num_comparator==/
x
H
 
# comparation by lengths
 
:1
s/^ $/=/gm
s/^ .*/</gm
s/.* $/>/gm
s/^[0-9]\|\b[0-9]//gm
t1
s/\n//g
G
s/\n.*==num_comparator==//
x
s/\n==num_comparator==.*//
x
 
:2
s/\`=\(.*\)\n\(.*\)$/\1#\2/m
s/\`<\(.*\)\n\(.*\)$/\1#</m
s/\`>\(.*\)\n\(.*\)$/\1#>/m
s/\`~\(.*\)\n.*$/\1#~/m
t2
# comparation by values
 
 
s/#/\n/g
s/\n//
s/^/#/gm
:3
s/#\(.\)\(.*\) \(.\)\(.*\)/\1\3 #\2 \4/gm
t3
s/#//g
:4
s/00/=/g
s/0[1-9]/</g
s/[1-9]0/>/g
y/123456789/012345678/
/[0-9]/b4
 
s/ //gm
s/=*\(.\).*/\1/gm
 
#================= end of num_comparator ================
 
s/\n//g
G
s/\n.*==mergesort_by_num_key==/#/
x
s/\n==mergesort_by_num_key==.*//
x                     
 
 
# [<>=~]*#\nmergedpart @rest_of_list1 %rest_of_list2\n... 
 
# according to the results of comparations held in the head,
# move a head of either the rest of list1 or list2 to the tail of merged_part.
 
:m
s/^[<=]\(.*\)\n\(.*\)@\(\(\w\|,\)*\) \(.*\)%\(\(\w\|,\)*\)/\1-\2\3 @\5%\6/m
s/^>\(.*\)\n\(.*\)@\(\(\w\|,\)*\)\(.*\)%\(\(\w\|,\)*\) /\1-\2\6 @\3\5%/m
s/^~\([^\n]*\)\n~/\1-/
tm
s/-/\n/g
s/#\n//
bb
 
:e
s/~//g
 
 
ba
 
:q
#=================== end of merge sort by key ==============
 
s/ $//
s/ /\n/g
 
# restore the order
s/^\(.*\),\(\w*\),\(\w*\)$/\2 \3 \1/gm
 
#   pattern space: sorted edges
#   hold space: vertices
 
s/^/F /gm
 
s/$/%/
G
s/%./\n% /
# the first vertex
s/ \w*$//
 
 
 
#======== start of main loop
:Y
s/#//
 
/%$/bX
 
s/F/#F/
:V
 
# marked with # is the current
 
# used v1, unused v2 
/#F \(\w\+\)\b.*%.* \1\b/!{/#F\( \w* \(\w\+\)\b.*%.*\) \2\b/{s//#T\1/
bY}
bW}
# used v2 unused v1
/#F \w* \(\w\+\)\b.*%.* \1\b/!{/#F\( \(\w\+\)\b.*%.*\) \2\b/{s//#T\1/
bY}}
:W
 
/#\(.[^F]*\)F/!bY
s//\1#F/
 
bV
 
#======== end of main loop
 
:X
 
s/.%//

こいつのコメントなどの無駄を省くと 1350B強．

ということで，Prim のほうが短く出来た．union/find なんか使わずに単純な処理で追加するエッジが選べるので短い．

が，しかし，上のPrimの実装はKruskalに比べて２倍近く遅い．ソート部分が同じなので，エッジの選択だけだと３～４倍は遅い．おかげで timeout をくらい提出できず．まだまだ改良の余地があるけれど，打数が伸びそうなので放っておくことにする．