Re: W train returning when Phase I of the 2nd Av subway opens in December 2016?. (1377947) | |||
Home > SubChat | |||
[ Read Responses | Post a New Response | Return to the Index ] |
|
Re: W train returning when Phase I of the 2nd Av subway opens in December 2016?. |
|
Posted by Stephen Bauman on Sun Dec 20 21:36:52 2015, in response to Re: W train returning when Phase I of the 2nd Av subway opens in December 2016?., posted by BusRider on Sun Dec 20 19:32:43 2015. I placed the csv files into a database (Postgresql). That's what's needed to analyze the data in a meaningful way.Here are the SQL statements I used to generate the results. First, I need to figure out what the id's are of the individual stations. select distinct * from stops where sq_seq = 7 order by stop_name, stop_id This just reads the stops file and sorts it alphabetically by name. N.B. I've been keeping track of a lot of data. Each one has its own sequence index. The latest one is 7. This gives me the id for the Lex/59th station, which is what I'm interested in. It's R11S for the southbound direction. Next I want to figure out the headways and arrange them in ascending order. That way I can spot the exceptional values. Here's the SQL. select aa.* from (select a.arrival_time, b.route_id, b.trip_headsign, d.departure_time, d.stop_id, e.stop_name, case when d.departure_time > a.arrival_time then a.arrival_time - d.departure_time + '24 hour' else a.arrival_time - d.departure_time end as travel_time , a.arrival_time - lag(a.arrival_time) over (order by a.arrival_time) as headway from stop_times a join trips b on (a.sq_seq = b.sq_seq and a.trip_id = b.trip_id) join calendar c on (b.sq_seq = c.sq_seq and b.service_id = c.service_id) join stop_times d on (a.sq_seq = d.sq_seq and a.trip_id = d.trip_id) join stops e on (d.sq_seq = e.sq_seq and d.stop_id = e.stop_id) where a.sq_seq = 7 and c.wednesday is true and d.stop_sequence = 1 and a.stop_id = 'R11S' and b.route_id in ('N', 'Q', 'R'))aa order by aa.headway, aa.arrival_time This is an nested SQL statement so, it's better to work from the inside out. I'm scanning the stop_times table for the 'R11S' station. Each entry has a trip id, which I use to link to the specific entry in the trips table. The trips table has the route id and the destination sign. I limit the search only to the N, Q and R routes. The trips table also has an entry to the calendar table via the service id. I further limit the search to weekdays (namely Wednesdays). Next I go back to the stop_times table to pick up where the train originated. That's linked by the trip id. The stop where the train originated has a stop sequence of 1. That allows me to pick up the train's departure time and calculate the travel time. I'd like to know the name of that first station, so I link to the stops table via the first station's stop id. Finally, I need to calculate the difference in arrival times between trains at Lex/59 to calculate the headway. That's where the lag function comes in. The lag function orders a particular field in a given order and returns the previous row's value. Subtracting that value from the current row's arrival time gives me the headway. I sort the table by headway and station arrival time to give me hints where to look. I run the same query again and order them only by the arrival time to get the data I presented. I nested the SQL so that I could use the order statement with the column names I created in the inner SQL. |