THM day 2 - Advent of Cyber
The second day is about log analysis. I take a look at data science and its application in cyber security. The first part is an introduction to Jupyter Notebooks, Python, Pandas, and Matplotlib.
The data
I get access to a preconfigured Jupyter Notebook environment, which contains the following files I need for the challenges:
Network_traffic.csv
Workbook.ipynb
The challenges
- How many packets were captured (looking at the PacketNumber)?
- What IP address sent the most amount of traffic during the packet capture?
- What was the most frequent protocol?
Challenge 1
df.count()
shows the number of non-null values in each column; the value for PacketNumber is the number of captured packets.
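To show what this looks like in practice, here is a minimal sketch on a small in-memory sample. The rows are made up for illustration and only the column names PacketNumber, Source, and Protocol come from the challenge; in the real notebook the DataFrame would presumably be loaded from Network_traffic.csv instead.

```python
import io
import pandas as pd

# Made-up sample rows for illustration; not the real capture data.
sample_csv = io.StringIO(
    "PacketNumber,Source,Protocol\n"
    "1,10.0.0.1,TCP\n"
    "2,10.0.0.2,DNS\n"
    "3,10.0.0.1,TCP\n"
)
df = pd.read_csv(sample_csv)

# count() returns the number of non-null values per column.
print(df.count())

# Selecting one column gives a single number for the challenge.
packet_count = df["PacketNumber"].count()
print(packet_count)
```

Counting a single column keeps the output to one number, which is easier to read than the per-column table.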
Challenge 2
df.groupby(['Source']).size()
solves the challenge by counting the packets per source IP address; the address with the highest count is the answer. To find the column name, I looked in the CSV file.
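The grouping step can be sketched on a small sample as follows. The IP addresses and rows are invented for illustration, and idxmax() is my own addition to pick out the top sender directly instead of reading it from the printed table.

```python
import io
import pandas as pd

# Made-up sample rows for illustration; not the real capture data.
sample_csv = io.StringIO(
    "PacketNumber,Source,Protocol\n"
    "1,10.0.0.1,TCP\n"
    "2,10.0.0.2,DNS\n"
    "3,10.0.0.1,TCP\n"
    "4,10.0.0.1,HTTP\n"
)
df = pd.read_csv(sample_csv)

# size() counts the rows (packets) in each Source group.
packets_per_source = df.groupby(["Source"]).size()
print(packets_per_source.sort_values(ascending=False))

# idxmax() returns the index label (the IP address) with the highest count.
top_source = packets_per_source.idxmax()
print(top_source)
```

Sorting in descending order puts the busiest source at the top, which helps when the capture contains many addresses.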
Challenge 3
df['Protocol'].value_counts()
solves this challenge: it lists every protocol with its number of packets, most frequent first. I used this Stack Overflow question for an explanation: https://stackoverflow.com/questions/35523635/extract-values-in-pandas-value-counts
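As a rough sketch of how to extract the top entry from value_counts(), the example below again uses invented sample rows. Only the Protocol column name comes from the challenge.

```python
import io
import pandas as pd

# Made-up sample rows for illustration; not the real capture data.
sample_csv = io.StringIO(
    "PacketNumber,Source,Protocol\n"
    "1,10.0.0.1,TCP\n"
    "2,10.0.0.2,DNS\n"
    "3,10.0.0.1,TCP\n"
    "4,10.0.0.3,TCP\n"
    "5,10.0.0.2,HTTP\n"
)
df = pd.read_csv(sample_csv)

# value_counts() sorts by frequency in descending order by default.
protocol_counts = df["Protocol"].value_counts()
print(protocol_counts)

# The first index label is the most frequent protocol,
# and the first value is how often it occurs.
top_protocol = protocol_counts.index[0]
top_count = protocol_counts.iloc[0]
print(top_protocol, top_count)
```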
Takeaways
- Jupyter Notebooks are great: easy to share and execute, and easy to use for demonstrating PoCs
- Pandas is good for analysing CSV files
- Python is the best
- Basics of Matplotlib