Abstract: Current origin-destination (OD) mining methods commonly suffer from three problems: they cannot analyze multiple lines in parallel, they are inefficient, and their prediction rate is low. Because Hive offers strong query performance on massive data, a Hive-based OD mining method is proposed to overcome these problems. Boarding stations are matched using a time threshold; records that fail to match are matched again based on the number of boarding passengers. Alighting stations are matched with a trip-chaining method implemented through table joins; records that fail prediction are matched a second time based on probability. IC card consumption data and scheduling data from Shijiazhuang city, covering January 1, 2018, to March 27, 2018, were used for OD mining, and 11,270,037 OD records were mined from 11,312,505 cleaned trip records. The matching rate reached 99.6%, and the travel and attraction checks indicated high-quality results. With Hive parallel execution enabled, the method ran in 17,829.04 s. The results show that the method satisfies the business requirements of offline OD mining in a production environment.
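As a rough illustration of the time-threshold idea for boarding-station matching, the following Python sketch assigns a card swipe to the station whose scheduled arrival time is closest, and leaves the record unmatched when the gap exceeds a threshold. It is not the Hive implementation described here: the function name `match_boarding_station`, the 120-second default threshold, the nearest-arrival rule, and the `(time, station)` layout are all assumptions made for the example.

```python
from bisect import bisect_left
from typing import Optional, Sequence, Tuple


def match_boarding_station(
    swipe_time: float,
    arrivals: Sequence[Tuple[float, str]],
    threshold_s: float = 120.0,  # hypothetical threshold, not taken from the paper
) -> Optional[str]:
    """Return the station whose arrival time is nearest to swipe_time.

    arrivals holds (arrival_time_in_seconds, station_name) pairs for one
    vehicle run, sorted by time. Returns None when the nearest arrival is
    more than threshold_s away, so the record can go to a fallback step.
    """
    if not arrivals:
        return None

    times = [t for t, _ in arrivals]
    i = bisect_left(times, swipe_time)

    # Only the arrivals just before and just after the swipe can be nearest.
    candidates = arrivals[max(i - 1, 0): i + 1]
    best_time, best_station = min(
        candidates, key=lambda a: abs(a[0] - swipe_time)
    )

    if abs(best_time - swipe_time) <= threshold_s:
        return best_station
    return None


# Hypothetical schedule for one bus run: seconds since departure.
schedule = [(0.0, "A"), (300.0, "B"), (600.0, "C")]
matched = match_boarding_station(310.0, schedule)    # close to station B
unmatched = match_boarding_station(450.0, schedule)  # 150 s from B and C
```

A record that comes back as `None` corresponds to a failed match, which in the method above would be handled by the second pass based on the number of boarding passengers.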