THM day 2 - Advent of Cyber

The second day is about log analysis. I take a look at data science and its application in cyber security. The first part is an introduction to Jupyter Notebooks, Python, Pandas, and Matplotlib.

I get access to a pre-configured Jupyter Notebook environment, which contains the following files I need for the challenges.

Network_traffic.csv

Workbook.ipynb

  1. How many packets were captured (looking at the PacketNumber)?
  2. What IP address sent the most amount of traffic during the packet capture?
  3. What was the most frequent protocol?

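Running df.count() shows the number of captured packets. It returns the number of non-null values in each column, so the count for the PacketNumber column is the number of packets.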
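A minimal sketch of the first step, runnable outside the notebook. Only the column names PacketNumber, Source and Protocol come from the challenge; the rows are made up, and io.StringIO stands in for the real Network_traffic.csv file.

```python
import io

import pandas as pd

# Hypothetical sample rows: the column names match the challenge,
# the values are invented for illustration.
SAMPLE_CSV = """PacketNumber,Source,Protocol
1,10.10.1.7,HTTP
2,10.10.1.10,ICMP
3,10.10.1.7,DNS
4,10.10.1.4,HTTP
"""

# In the notebook this would read the real file instead, e.g.
# df = pd.read_csv("Network_traffic.csv")  (relative path is an assumption)
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# count() returns the number of non-null values for every column.
print(df.count())

# The PacketNumber entry is the number of captured packets.
packet_count = int(df["PacketNumber"].count())
print(packet_count)  # → 4
```

As long as no PacketNumber value is missing, len(df) gives the same number.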

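df.groupby(['Source']).size() answers the second question: it counts the packets sent by each source IP address, and the address with the highest count sent the most traffic. To find the column name, I looked at the header row of the CSV file.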
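A minimal sketch of the second step, using made-up sample rows (the IP addresses are hypothetical). It ranks senders by packet count, as the groupby does, and uses idxmax() to pick out the busiest source instead of reading it off the printed list.

```python
import pandas as pd

# Hypothetical stand-in for Network_traffic.csv; values are invented.
df = pd.DataFrame(
    {
        "PacketNumber": [1, 2, 3, 4, 5],
        "Source": ["10.10.1.7", "10.10.1.10", "10.10.1.7", "10.10.1.4", "10.10.1.7"],
        "Protocol": ["HTTP", "ICMP", "DNS", "HTTP", "HTTP"],
    }
)

# Number of packets sent by each source IP address.
packets_per_source = df.groupby(["Source"]).size()

# Show the busiest senders first.
print(packets_per_source.sort_values(ascending=False))

# idxmax() returns the index label (the IP address) with the highest count.
top_talker = packets_per_source.idxmax()
print(top_talker)  # → 10.10.1.7
```

This measures traffic in packets. If the capture also had a per-packet size column, summing it per source would rank senders by bytes instead.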

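df['Protocol'].value_counts() answers the third question: it counts how often each protocol appears and sorts the result in descending order, so the first entry is the most frequent protocol. I used the following Stack Overflow link for an explanation: https://stackoverflow.com/questions/35523635/extract-values-in-pandas-value-counts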
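Along the lines of the Stack Overflow answer, the most frequent protocol can be pulled out of the value_counts() result directly. A minimal sketch with hypothetical sample rows:

```python
import pandas as pd

# Hypothetical stand-in for Network_traffic.csv; values are invented.
df = pd.DataFrame(
    {
        "PacketNumber": [1, 2, 3, 4, 5],
        "Protocol": ["HTTP", "ICMP", "DNS", "HTTP", "HTTP"],
    }
)

# Occurrences of each protocol, sorted from most to least frequent.
protocol_counts = df["Protocol"].value_counts()
print(protocol_counts)

# The index holds the protocol names, so the first label is the answer.
most_frequent = protocol_counts.index[0]
print(most_frequent)  # → HTTP
```

Because value_counts() sorts in descending order by default, index[0] is the top protocol; protocol_counts.idxmax() returns the same label.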

My takeaways from today:
  • Jupyter Notebooks are great: they are easy to share and execute, and make it easy to demonstrate PoCs
  • Pandas is good for analysing CSV files
  • Python is the best
  • The basics of Matplotlib