THM day 2 - Advent of Cyber

Contents
The second day is about log analysis. I take a look at data science and its application in cyber security. The first part is an introduction to Jupyter Notebooks, Python, Pandas, and Matplotlib.
The data
I get access to pre-configured Jupyter notebooks, which contain the following data I need for the challenges.
Network_traffic.csv

Workbook.ipynb

The challenges
- How many packets were captured (looking at the PacketNumber)?
- What IP address sent the most amount of traffic during the packet capture?
- What was the most frequent protocol?
Challenge 1
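df.count() shows the number of captured packets: it returns the count of non-null values for each column, so the value listed for PacketNumber is the answer.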
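A minimal sketch of this step, assuming a small in-memory DataFrame in place of the real capture. The rows below are invented for illustration; only the column names come from the challenge data. In the notebook the frame would presumably be loaded with pd.read_csv from Network_traffic.csv.

```python
import pandas as pd

# Invented sample rows standing in for Network_traffic.csv (assumption);
# only the column names are taken from the write-up.
df = pd.DataFrame({
    "PacketNumber": [1, 2, 3, 4],
    "Source": ["10.10.1.1", "10.10.1.2", "10.10.1.1", "10.10.1.3"],
    "Protocol": ["TCP", "DNS", "TCP", "ICMP"],
})

# count() returns the number of non-null values for every column
per_column = df.count()

# The entry for PacketNumber is the number of captured packets
packet_count = per_column["PacketNumber"]
print(packet_count)
```

Counting a single column with df["PacketNumber"].count() gives the same number without listing the other columns.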

Challenge 2
df.groupby(['Source']).size() solves the challenge by counting the packets per source IP address; the address with the highest count is the answer. To find the column name, I looked in the CSV file.
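The grouping can be sketched as follows, again on invented sample rows (the IP addresses are placeholders, not values from the real capture). Sorting the result or calling idxmax() is an optional extra that saves scanning the output by eye.

```python
import pandas as pd

# Invented sample rows (assumption); "Source" is the column name from the CSV.
df = pd.DataFrame({
    "PacketNumber": [1, 2, 3, 4, 5],
    "Source": ["10.10.1.1", "10.10.1.2", "10.10.1.1", "10.10.1.3", "10.10.1.1"],
})

# Number of rows (packets) per source IP address
packets_per_source = df.groupby(["Source"]).size()

# Largest senders first, so the answer is on the top line
ranked = packets_per_source.sort_values(ascending=False)
print(ranked)

# idxmax() returns the index label (the IP address) with the highest count
top_source = packets_per_source.idxmax()
print(top_source)
```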

Challenge 3
df['Protocol'].value_counts() solves this challenge. The result is sorted in descending order, so the first entry is the most frequent protocol. I used the following Stack Overflow link for an explanation: https://stackoverflow.com/questions/35523635/extract-values-in-pandas-value-counts
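A short sketch of how the top entry can be pulled out of the value_counts() result, using invented protocol values rather than the real capture:

```python
import pandas as pd

# Invented sample values (assumption); "Protocol" is the column name from the CSV.
df = pd.DataFrame({"Protocol": ["TCP", "DNS", "TCP", "ICMP", "TCP", "DNS"]})

# value_counts() counts each distinct value and sorts by count, descending
protocol_counts = df["Protocol"].value_counts()
print(protocol_counts)

# The first index label is the most frequent protocol,
# and the first value is how often it occurs
most_frequent = protocol_counts.index[0]
occurrences = protocol_counts.iloc[0]
print(most_frequent, occurrences)
```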

Takeaways
- Jupyter Notebooks are great: easy to share and execute, easy to demonstrate PoCs
- Pandas is good for analysing CSV files
- Python is the best
- Basics of Matplotlib