String#underscore を Haskell で

cf. HaHaHa! - ハイフンで区切られた文字をキャピタライズ
cf. 趣味的にっき - ハイフンで区切られた文字をキャピタライズ

↑ここら辺を見て思い出したのが,Ruby on Rails (より正確には ActiveSupport)にある String#underscore。
String#underscore は大文字/小文字/数字からなる文字列を小文字/数字/アンダースコアからなる文字列に変換するメソッドで……つまりこんな感じ。

 D:\>irb -rrubygems -ractive_support --simple-prompt
 >> [ "Railes",
 ?> "ActiveSupport",
 ?> "Active1Support",
 ?> "Active1support",
 ?> "ToDoList",
 ?> "XML",
 ?> "XML2",
 ?> "XMLData",
 ?> "XMLdata",
 ?> "XML2Data",
 ?> "Iso2022jpMailer"
 >> ].each do |s| p s.underscore end
 "railes"
 "active_support"
 "active1_support"
 "active1support"
 "to_do_list"
 "xml"
 "xml2"
 "xml_data"
 "xm_ldata"
 "xml2_data"
 "iso2022jp_mailer"
 => ["Railes", "ActiveSupport", "Active1Support", "Active1support", "ToDoList", "
 XML", "XML2", "XMLData", "XMLdata", "XML2Data", "Iso2022jpMailer"]

これを Haskell でやってみた。


'_' は大文字の直前に入るんだけど,前後の文字によって入ったり入らなかったりで,ちょっと複雑。なので State モナドを使ってみた。

 import Control.Monad.State
 import Data.Char
 import System.Environment
 
 
 
 data CharCase = UpperB4Upper | UpperB4Lower | UpperB4Number | UpperAtEnd | Lower | Number | None
 
 
 cState :: Char -> CharCase
 cState c | isUpper c = UpperB4Upper
          | isLower c = Lower
          | otherwise = Number
 
 
 procChar :: Char -> State CharCase [Char]
 procChar c = get >>= p
   where
     p UpperB4Upper  = do put $ cState c
                          return [toLower c]
     p UpperB4Lower  = do put $ cState c
                          return $ "_" ++ [toLower c]
     p UpperB4Number = do put $ cState c
                          if isUpper c then return [toLower c]
                                       else return $ "_" ++ [c]
     p UpperAtEnd    = do put $ cState c
                          if isUpper c then return [toLower c]
                                       else return $ "_" ++ [c]
     p Lower  | isUpper c = do put (UpperB4Lower)
                               return [toLower c]
              | otherwise = do put $ cState c
                               return [c]
     p Number | isUpper c = do put (UpperB4Number)
                               return [toLower c]
              | otherwise = do put $ cState c
                               return [c]
     p None   = do put $ if isUpper c then UpperAtEnd
                                      else cState c
                   return [toLower c]
 
 
 underScoreR :: String -> State CharCase String
 underScoreR s = do strs <- mapM procChar $ reverse s
                    return $ reverse $ concat strs
 
 
 underScore :: String -> String
 underScore s = evalState (underScoreR s) None
 

UpperAtEnd は UpperB4Number と扱いが同じだから無くてもいいかも。


チェック用関数。

 samples = [ ( "Rails"           , "rails"            )
           , ( "ActiveSupport"   , "active_support"   )
           , ( "Active1Support"  , "active1_support"  )
           , ( "Active1support"  , "active1support"   )
           , ( "ToDoList"        , "to_do_list"       )
           , ( "XML"             , "xml"              )
           , ( "XML2"            , "xml2"             )
           , ( "XMLData"         , "xml_data"         )
           , ( "XMLdata"         , "xm_ldata"         )
           , ( "XML2Data"        , "xml2_data"        )
           , ( "Iso2022jpMailer" , "iso2022jp_mailer" )
           ]
 
 
 checkSamples = mapM_ (putStrLn . check) samples
 
 check s = (show $ underScore org == uds) ++ "    " ++ org ++ " => " ++ uds
   where
     org = fst s
     uds = snd s


実際にやってみると

 *Main> checkSamples
 Loading package mtl-1.0 ... linking ... done.
 True    Rails => rails
 True    ActiveSupport => active_support
 True    Active1Support => active1_support
 True    Active1support => active1support
 True    ToDoList => to_do_list
 True    XML => xml
 True    XML2 => xml2
 True    XMLData => xml_data
 True    XMLdata => xm_ldata
 True    XML2Data => xml2_data
 True    Iso2022jpMailer => iso2022jp_mailer

OKみたい。