
Skyscanner
Tools:
Python, SQL, Database, Automation, Data Pipeline, Google Cloud
An automated data pipeline and visualization of flights over Las Vegas, NV
Goal: Use a raspberry pi and the PiAware (https://flightaware.com/adsb/piaware/) software to scan planes flying over Las Vegas, NV, store the data using a relational database, and visualize the movement of planes using Power BI.
Data Collection: The PiAware software had a live JSON site on the local network that listed overhead aircraft by callsign and included flight statistics such as altitude, latitude, and longitude. A python script was used to automatically access this JSON site every 60 seconds, scrape the data, reformat as a dataframe, and input into the database. Separate tables were used for flights that had a callsign and ones without a callsign (this was used as a “dump” table). Five separate insert statements were used to populate various tables with the scanned data.
Additional scripts were developed using the AeroDataBox API from RapidAPI (https://rapidapi.com/aedbx-aedbx/api/aerodatabox/) to get additional information about the flight based on the flight’s callsign and departure times. As the same callsign might have upwards of 3-4 different flights per day, the callsign was used to pull data from the API and the scanned time of the flight was used in conjunction with the arrival and departure times from the API to identify the correct flight. API data that was retrieved included airline, aircraft, and departure and arrival airport codes. This data was then used to further ping the API for additional information about the airports, airlines, and aircraft types.
The API pulls were done manually to closely monitor the API limits. Cost constraints limited the number of flights that additional data could be pulled for.
Data Storage: A relational database was designed and deployed using the Google Cloud Platform. The database schema can be downloaded below.
Data Cleaning: Data cleaning for this project was minimal as data from the JSON site and APIs was relatively clean. Any data cleaning that was necessary was done prior to uploading to the database.
Data Visualization: As needed, the scanned flights were pulled from the database and the data was input into a Power BI dashboard. The purpose of the dashboard was to monitor the performance of the skyscanner. The visuals that were developed were intended to do two things:
1. Identify anomalies in data, identify instances where the skyscanner was down, and identify any blind spots in the skyscanner’s abilities.
· Anomalies: A small number of flights were scanned with altitudes over 60k feet and upwards of 100k feet. Commercial flights will not fly over 50k feet, which indicates these flights may have been bad scans. However, the number of flights was small enough on a percentage basis to disregard.
· Scanner Outage: The skyscanner was down on December 24, 2021, for an unknown reason.
· Blind spots: No apparent blind spots were identified
2. Identify flight trends based on volume of flights overhead
· Unsurprisingly, the Friday after thanksgiving had the highest number of flights.
The Power BI visualization dashboard and the last data pull can be downloaded below.
Current Status: The skyscanner was shut down due to both a move away from Las Vegas and operating costs to maintain the Google Cloud database.
Project Gallery




