Towards in-situ Psychological Profiling of Cybercriminals Using Dynamically Generated Deception Environments
Abstract.
Cybercrime is estimated to cost the global economy almost $10 trillion annually, and with businesses and governments reporting an ever-increasing number of successful cyber-attacks there is a growing demand to rethink the strategy towards cyber security. The traditional, perimeter security approach to cyber defence has so far proved inadequate to combat the growing threat of cybercrime. Cyber deception offers a promising alternative by creating a dynamic defence environment. Deceptive techniques aim to mislead attackers, diverting them from critical assets whilst simultaneously gathering cyber threat intelligence on the threat actor. This article presents a proof-of-concept (POC) cyber deception system that has been developed to capture the profile of an attacker in-situ, in real time, during a simulated cyber-attack. By dynamically and autonomously generating deception material based on the observed attacker behaviour, and analysing how the attacker interacts with that material, the system outputs a prediction of the attacker's motive. The article also explores how this POC could be expanded to infer other features of the attacker's profile, such as psychological characteristics.
1. Introduction
Cyber Deception is emerging as a promising defensive strategy that can augment traditional solutions (Zhang and Thing, 2021). Instead of targeting the attacker’s actions, Cyber Deception targets their perception. By confusing or misleading attackers using decoy assets and information, attacks can be delayed and disrupted, increasing the cost to the adversary. Cyber Deception also offers an opportunity to engage with the attacker and manipulate activity in order to gain unique outcomes such as long-term deterrence and collection of precise threat intelligence (Ferguson-Walter et al., 2021b). This has also opened up the cyber defence domain to other lines of scientific study, particularly in the field of psychology, (Ferguson-Walter et al., 2021a) where studies investigating the application of human factors (Ferguson-Walter et al., 2021a) and game theory (Ferguson-Walter et al., 2019) show the rich, multidisciplinary nature of Cyber Deception and the potential for knowledge in other domains to be applied to cyber security.
Typically, Cyber Deception campaigns deploy static deception assets such as honeypots or honeytokens, which can serve as alarms to intrusions or distract attackers, consuming time and resources (Rowe and Rrushi, 2016). While these techniques can be useful to disrupt cyber-attacks, Cyber Deception can perform potentially more valuable activities such as threat intelligence collection (Cho et al., 2020). Cyber threat intelligence has emerged over the past few years to help security practitioners recognise the indicators of cyber-attacks, extract information about the attack methods, and consequently respond to the attack with greater accuracy and speed (Conti et al., 2018).
This work presents a Proof-of-Concept (POC) system that employs a novel cyber deception technique to capture cyber threat intelligence. The POC has been built and deployed on cloud infrastructure and basic testing has been performed successfully. The impacts of these results and the potential avenues for future research are also discussed.
The rest of the article is organised as follows: Section 2 evaluates current methods for profiling cybercriminals. Section 3 presents the approach to achieving the research goals, including the attack scenario, system design, architecture and workflow. Section 4 discusses the outcomes of the work along with its constraints, Section 5 outlines future work, and Section 6 concludes. Appendices with supporting figures and tables are found at the end of the article.
2. Related Work
There exists a range of techniques to collect threat intelligence, but it is generally done by analysing large datasets of previous cyber-attacks or open-source information on the internet to detect attack approaches, behaviours and patterns (Sun et al., 2023). Deception environments, or honeypots, have also been used to capture raw data to then process into threat intelligence, but again the threat intelligence is only developed after the attack has occurred on the deception environment (Almohannadi et al., 2018; Sokol et al., [n. d.]). There has never been an attempt to collect this data using deception in-situ, whilst the attack is ongoing, and crucially, leverage that information to achieve the goals of the live deception campaign.
A key element of the threat intelligence picture is the criminal profile. This is a key tool for investigators and is used to narrow the range of suspects and evaluate the likelihood of a suspect committing a crime. It consists of a set of characteristics likely to be shared by criminals who commit a particular type of crime and combines: personal traits of the cybercriminal, behavioural patterns, demographic data, motivations and psychological traits (Martineau et al., 2023; Rogers, 2016). It is typically used in criminal investigations for attribution but can also improve network defence by developing a more mature threat model, and therefore implementing more targeted defences based on the skills and motivations of the threat actor.
There are many methods to profile a cybercriminal which approach the task from a variety of angles; however, one key factor is common to them all: they are all conducted after the attack has taken place (Bada and Nurse, 2021). This is understandable given that the traditional approach to cyber incident response eradicates the threat before any forensic activities take place (Lickiewicz, 2011).
Another gap in the literature is the extent to which psychological data is captured in the cybercriminal profile. Psychological profiling is important in criminal investigations to determine a relationship between the attacker's personality and the crime committed. This leads to an inference of the modus operandi, a key piece of information used in law enforcement to attribute crimes to individuals or groups (Rogers, 2016). A number of articles describe the importance of this and propose frameworks and methodologies demonstrating how psychological profiling can be conducted using forensics collected from the scene of the cybercrime (Lickiewicz, 2011; Chng et al., 2022); however, few collected primary data and none use data gathered from deception environments as the primary data source (Bada and Nurse, 2021). Usually, this data is gathered through interviews, questionnaires and other psychometric tests with known cyber offenders; however, as cyber criminals are notoriously difficult to arrest, there is a distinct lack of opportunity to collect psychological data through these means (Rogers et al., 2006).
There have been several studies investigating how to programmatically profile an attacker based on the behaviour captured across a network. Examples include fuzzy inference (Mallikarjunan et al., 2018), hidden Markov models (Katipally et al., 2011) and attack graph analysis (Casey, 2007) used to infer characteristics of the attacker; these demonstrate a burgeoning body of knowledge in how different mathematical models can be applied to various datasets to achieve reliable predictions of the attacker's profile. Honeypots have also been used to capture threat actor behaviour which is then analysed to infer the profile (Fraunholz et al., 2017). The results of these studies are promising; however, they all use static captures of the malicious behaviour and do not respond to the threat in real time.
3. Approach
Given the lack of opportunity to profile cybercriminals by traditional means, it would be beneficial if this information could be captured at the most critical opportunity to interact with attackers: during a live attack. This proof-of-concept explores the possibility of such in-situ data collection by delivering targeted deception material to the attacker; the way in which the attacker interacts with this material elucidates the attacker's profile. This real-time behavioural analysis could provide a unique and highly valuable method to collect intelligence on attackers.
A crucial addition to the methodology proposed here is the dynamic generation of deceptive content based on the information captured throughout the attack. The system automatically deploys the new deceptive content to a new environment and the process iterates (Figure 1). By iterating over several deception environments, the intelligence gained is refined and reinforced. This iterative, dynamic and automated approach to in-situ threat intelligence collection is a core concept of this research and is unexplored in previous academic work. There may exist the possibility to collect richer psychological data with this method, such as biases, personality traits, or even affective states such as confusion or self-doubt; this is discussed in Section 5. However, for the purposes of this proof-of-concept (POC), the scope of the criminal profile is restricted to motive (Martineau et al., 2023).

3.1. Attack Scenario
To begin developing this deceptive system that reacts to malicious activity, the scope was narrowed to target a specific attacker behaviour. To do this, a particular attack scenario was selected to drive the design of the simulated environment. The scenario proceeds as follows: the attacker has gained access to a corporate network and is moving laterally through the system to conduct discovery and/or collection activities. During this lateral movement, the attacker accesses a subnet consisting of a set of fileshare servers, which host a variety of Word documents containing different types of information on the victim organisation. Specifically, the types of information held on these fileshares are Financial, Operational, Human Resources (HR), Legal and Information Technology (IT).
The attacker then initiates a systematic exploration of the fileshares, examining each one in sequence, aiming to identify information within the Word documents pertinent to their underlying motive. In this attack scenario, the subnet accessed by the attacker is a deception environment where the fileshare hosts are instrumented, so that all attacker activity on the host is captured without the knowledge of the attacker. It is when the attacker accesses this environment and begins exploring the Word documents that the predictive system is activated.
3.2. System Design
The objective of the system is to generate a prediction of the attacker motive based on the type of information that is accessed. To enable this inference, a relationship between the types of documents present on the system and threat actor motives must first be established. These relationships are mapped in Table 1. One of the constraints in the system design of this POC is the 1-to-1 relationship between motive and document type. A more rigorous and realistic approach would be to establish a 1-to-many relationship between motive and document type with weightings to reflect the relative value of each document type to a threat actor with a particular motive. Due to a lack of evidence in the literature describing such relationships, and to enable the timely development of the POC, the mapping between motive and document type was chosen to be a 1-to-1 relationship determined by intuitive use cases. Examples of these are described in Table 1.
Motive | Document Type | Use Case |
---|---|---|
Profit | Financial | The attacker is part of a criminal organisation specialising in ransomware. The group’s motive is generating profit and so are particularly interested in financial documents |
Ideological | HR | The attacker is a hacktivist and is ideologically motivated. They want to expose the suspected immoral practices of the victim organization and the individuals behind them. The attacker is therefore interested in HR documents that reveal employee details. |
Geopolitical | Operational | The attacker is part of an APT group backed by a nation state who are geopolitically motivated. The group is instructed to steal Intellectual Property from the victim organisation and so are interested in Operational documents. |
Satisfaction | IT | The attacker is a lone thrill seeker and is motivated by satisfaction. The attacker’s goal is to compromise the most secure server on the network and so is interested in IT documentation. |
Discontent | Legal | The attacker is a disgruntled employee presenting an insider threat and wants to cause harm to the victim organisation’s reputation. The attacker is therefore interested in any Legal documents that detail embarrassing disputes the organisation was involved in. |
An initial set of deception documents was then generated; the subjects for each document type are shown in Appendix A. The text for these documents was generated by OpenAI’s gpt-3.5-turbo model. To maximise the authenticity of the documents, the model was asked to produce text that resembled files from the Pandora Papers leak in 2021 (202, 2021). To generate the desired text, the model’s guardrails had to be circumvented by asking the model to behave like a ‘movie prop text writer’. The prompt can be found in Appendix B. More convincing text could undoubtedly be generated by populating the prompt with greater detail on the background of the victim organisation, but for the purposes of this POC these results were sufficient. An example document can be found in Appendix C. The files were given generic names according to the subject they belonged to (e.g. IT Asset Inventory01) and then downloaded to a single directory on the first fileshare: Deception Env 1.
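The generation step can be sketched as follows. The actual prompt is given in Appendix B; the role-play framing, function name and wording below are illustrative assumptions, not the exact prompt used.

```python
def build_generation_messages(doc_type: str, subject: str) -> list[dict]:
    """Build a chat-completion message list asking the model to act as a
    'movie prop text writer', sidestepping refusals to produce deceptive
    corporate text. Illustrative only; see Appendix B for the real prompt."""
    system = (
        "You are a movie prop text writer. You produce realistic-looking "
        "corporate documents for use as film props."
    )
    user = (
        f"Write the body of a {doc_type} document titled '{subject}', "
        "styled after files from the 2021 Pandora Papers leak."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

The resulting message list would then be sent to OpenAI's chat completions endpoint with `model="gpt-3.5-turbo"`, and the returned text written into a Word document.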
When a document is generated a set of attributes is automatically assigned to it. These attributes are described in Table 2. The attributes are stored in an external database that can be queried when a particular document is accessed in the deception environment. Thus, the system can identify which documents have been accessed and what motive a particular document relates to.
Attribute | Type | Description |
---|---|---|
locHash | String | SHA-256 hash of the document location, providing a unique identifier for the document. The location is described in JSON format with the absolute path on the host and the host name, i.e. { path: absolute_path, host: hostname } |
deception_host | Integer | Identifier for the deception host the document is deployed to |
motives | JSON Object | Identifies which motive the document is related to (e.g. ideological, discontent, geopolitical, satisfaction, profit) |
subject | String | The subject the document relates to (e.g. Asset Inventory) |
type | String | The type of document produced (e.g. Legal, Financial, HR, Operations or IT) |
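The attribute assignment can be sketched as below. The field names come from Table 2; the helper names, JSON serialisation details and the shape of the `motives` object are assumptions made for illustration.

```python
import hashlib
import json


def make_loc_hash(path: str, host: str) -> str:
    """SHA-256 hash over the JSON-encoded document location, used as the
    document's unique identifier (the locHash attribute in Table 2)."""
    location = json.dumps({"path": path, "host": host}, sort_keys=True)
    return hashlib.sha256(location.encode("utf-8")).hexdigest()


def make_attribute_record(path: str, host: str, deception_host: int,
                          motive: str, subject: str, doc_type: str) -> dict:
    """Assemble the attribute set from Table 2 for storage in the external
    database (DynamoDB in this POC), keyed on locHash."""
    return {
        "locHash": make_loc_hash(path, host),
        "deception_host": deception_host,
        "motives": {motive: True},  # assumed shape of the motives object
        "subject": subject,
        "type": doc_type,
    }
```

Because the hash is derived from the document's location, the system can resolve any observed file-open event back to its stored attributes, and hence to the associated motive.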
3.3. System Architecture
The system was deployed on Amazon Web Services (AWS) infrastructure, with the exception of OpenAI’s ChatGPT API service, which was accessed over the internet. Figure 2 depicts the architecture of the system, with the individual components summarised in Table 3. The system leverages proprietary software from CounterCraft to instrument the deception environments; this selection was due to the availability of and familiarity with the software. Other monitoring tools (e.g. ELK, Splunk) could easily be substituted to perform the same function.

Component | Deployment | Function |
---|---|---|
Deception Environments | AWS EC2 (Windows server) | Hosting deception material |
Deception Director (CounterCraft software) | AWS EC2 (Ubuntu server) | Instrumentation of deception environments. Monitoring for specific events. Hosting logic to send SNS message to Lambda function containing event data. Hosting logic to download deception material to deception environments. |
Lambda Function | AWS Lambda | Hosting logic to analyse threat actor behaviour and perform programmatic profiling of threat actor |
DynamoDB | AWS DynamoDB | Storage of deception material attributes |
ChatGPT API | OpenAI Servers | Generation of deceptive text |
S3 Bucket | AWS S3 | Storage of deceptive files |
3.4. Workflow
Phase 1: Monitor
As mentioned, the fileshares are instrumented so that every event on the deception host is monitored by the Deception Director. A rule is deployed in the Deception Director which sends an SNS message to trigger a Lambda function any time a Word document is opened on the deception host. The details of the event, including the absolute path of the file opened, are contained in the SNS message.
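A minimal sketch of the triggered Lambda function's entry point is shown below. The outer `Records`/`Sns`/`Message` structure is the standard AWS SNS event format; the JSON body carried inside the message is an assumption, since the Deception Director's exact payload is not reproduced here.

```python
import json


def lambda_handler(event, context):
    """Entry point invoked by the Deception Director's SNS notification.
    Extracts the opened document's location from the event payload."""
    for record in event["Records"]:
        # Assumed message body: {"path": "<absolute path>", "host": "<hostname>"}
        message = json.loads(record["Sns"]["Message"])
        path, host = message["path"], message["host"]
        # ...Phase 2 would now look up the document's attributes by its
        # locHash and score the associated motive...
        return {"path": path, "host": host}
```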
Phase 2: Behaviour Analysis
The Lambda function then performs a series of steps to identify which motive is assigned to that document. A score is given to the motive by examining the sequence in which the documents are accessed in a given attack on a particular deception environment (e.g. 1st document accessed: motive score = 100; 2nd document accessed: motive score = 80; etc.). For this POC, the attack on the deception environment is considered complete when 6 documents have been accessed. This was an arbitrary decision, and the number of documents accessed before the next phase of the workflow is triggered can be easily reconfigured. Once the 6th document is accessed in the deception environment, the motive scores are aggregated and ranked. An example of the scoring is shown in Table 4, with the final aggregated scores shown in Table 5.
Position | Document Subject | Document Type | Associated Motive | Score |
---|---|---|---|---|
1/6 | Annual Budget | Financial | Profit | 100 |
2/6 | IT Asset Inventory | IT | Satisfaction | 80 |
3/6 | Standard Operating Procedures | Operational | Geopolitical | 60 |
4/6 | Corporate Governance Documents | Legal | Discontent | 40 |
5/6 | Tax Document | Financial | Profit | 20 |
6/6 | Employment Contracts | HR | Ideological | 0 |
Motive | Aggregated Score |
---|---|
Profit | 120 |
Satisfaction | 80 |
Geopolitical | 60 |
Discontent | 40 |
Ideological | 0 |
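The positional scoring and aggregation shown in Tables 4 and 5 can be sketched as a single function; the function name, signature and parameter defaults are illustrative.

```python
def score_accesses(accessed_motives: list[str],
                   start: int = 100, step: int = 20) -> dict[str, int]:
    """Score each document access by its position in the sequence
    (1st = 100, 2nd = 80, ...) and aggregate the scores per motive."""
    scores: dict[str, int] = {}
    for position, motive in enumerate(accessed_motives):
        scores[motive] = scores.get(motive, 0) + (start - step * position)
    return scores
```

Applied to the access sequence of Table 4 (Profit, Satisfaction, Geopolitical, Discontent, Profit, Ideological), this reproduces the aggregated scores of Table 5.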
Phase 3: Motive Elimination and Document Generation
In this phase, the system ranks the motives according to their score and eliminates the lowest scoring motive from the motive list. This revised list is then used to generate a new set of documents in the same way as described in Section 3.2. This set of documents will not contain any document types associated with the lowest scoring motive. In the example given above this would be HR documents associated with the Ideological motive. These newly generated documents will then be deployed to the subsequent deception environment.
Phase 4: Iteration
The previous steps now repeat for the remaining deception environments. After each iteration is complete, a motive is eliminated from the motive list and new documents are deployed. The workflow ends when only one motive remains. This is the system’s final prediction of the attacker’s motive.
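Phases 3 and 4 together amount to an iterative elimination loop. In the sketch below, `run_environment` is a hypothetical stand-in for deploying a deception environment for the surviving motives, observing six document accesses, and returning the aggregated motive scores; the function names are illustrative.

```python
def eliminate_lowest(aggregated: dict[str, int]) -> list[str]:
    """Phase 3: drop the lowest-scoring motive; the survivors drive the
    next round of document generation."""
    lowest = min(aggregated, key=aggregated.get)
    return [m for m in aggregated if m != lowest]


def predict_motive(initial_motives: list[str], run_environment) -> str:
    """Phase 4: iterate over deception environments, eliminating one
    motive per round, until a single motive remains."""
    motives = list(initial_motives)
    while len(motives) > 1:
        scores = run_environment(motives)  # deploy docs, observe 6 accesses
        motives = eliminate_lowest(scores)
    return motives[0]
```

Starting from the five motives of Table 1, the loop runs four rounds and returns the single surviving motive as the prediction.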

3.5. Testing
Informal testing of the system was undertaken to investigate the basic functionality. In this activity, five individuals were asked to read a selected use case from Table 1 and play the role of the attacker with the profile described in the use case. The participants were then set the task of exploring the fileshares for information pertinent to the goals of the attacker role they assumed. This activity was supervised to ensure the participants proceeded through the exercise in the way that it was intended. Within a particular fileshare, the participants were allowed to access the files in any order. Once they had completed exploring a fileshare, they were guided by the supervisor to the next deception environment, where new deception material had just been deployed. This continued until all deception environments had been explored. Pleasingly, the system predicted the correct motive in every exercise. This result is not indicative of the system’s potential success in realistic attack scenarios, but it does demonstrate that the POC functions as intended in dynamically generating targeted deception material and providing an inference on attacker motive.
4. Discussion
4.1. Outcomes
This POC has demonstrated a method to map low-level attacker behaviour (file access) to higher-level attacker characteristics (motive). This is a challenging task even with advanced machine learning analytics (Kotenko et al., 2023). The novel approach presented in this paper, analysing attacker behaviour over several iterations enabled by deception, has been shown to be a promising method for maximising the probability of an accurate inference of higher-level attacker characteristics. Additionally, the system is entirely automated, removing the need for human interaction and thus allowing a response to occur in real time, as the attack proceeds.
4.2. Constraints
This POC system is highly constrained and not reflective of a realistic attack. These constraints were imposed to allow for initial development; should the system be developed further towards a production-ready application, they will need to be addressed. The most significant constraints identified are described below.
This deceptive system currently only targets a very narrow range of the cyber kill chain (Hutchins et al., 2011), namely Actions on Objectives, or the Collection phase in the MITRE ATT&CK framework (MIT, [n. d.]). At this point the attacker has conducted most of their activities and is usually about to exfiltrate the selected data and end the attack. To be an effective countermeasure against cyber-attacks, the scope should be expanded to disrupt earlier stages of the kill chain.
In addition, this POC relies on specific attacker behaviour, i.e. moving from one deception environment to the next in sequence. In reality, the way in which attackers access environments may be unpredictable; this means that the deception campaign must be carefully planned to maximise the chances of the attacker behaving in the intended way. One method for this would be to plant breadcrumbs in initial deception environments that contain information (e.g. credentials) that leads the attacker to a desired deception environment. Other challenges, such as the latency associated with generating and deploying the deception material and the authenticity of the deception material, also need to be investigated.
Finally, the predictive function used to infer the motive behind the attack is a rudimentary process of elimination method. This will require a significant revision to reflect the much more complicated relationship between the types of information accessed by the attacker and the underlying motive. Approaches to addressing these limitations are discussed in more detail in the following section.
5. Future Work
Given this system is a POC, there exist many avenues for further development. In this section, the core areas of future research are explored. While this is not an exhaustive list, completion of the following work would provide a much clearer perspective on which direction to focus future research.
5.1. Expansion of system to include network services
As described, the POC only examines attacker interaction with Word documents. For any utility in a real attack, the system must contain deceptive versions of the services and assets expected in an enterprise network. This may include deceptive email and web services, databases, workstations and active directories, all of which need to be monitored for activity. Importantly, these services need to be deployed dynamically and populated with deceptive content based on the observed attacker behaviour. As such, an API should be developed with the capability to facilitate this dynamic deployment. As with the research presented earlier in this article, the deceptive content could be generated by ChatGPT.
5.2. System validation and data harvesting
To fully validate the effectiveness of the deception strategy, exercises must be carried out with human participants playing the role of the attacker. The participants, unaware of the deceptive nature of the system, would explore the system for information pertinent to the motive of their assigned threat actor. This could reveal which actions the attacker is likely to take and in what order, providing insights into how to manipulate attacker behaviour to ensure the desired outcomes are reached.
As with the POC, a matured system with deceptive network services would react to the behaviour of the participants and deploy targeted deception assets in continually evolving deception environments. The primary goal of these exercises would be to generate data to inform further development. Thus, the entirety of the participant behaviour across the network throughout the exercise should be captured. This dataset would provide a clear link between malicious activity on a network and the threat actor motive.
5.3. Development of model for behaviour analysis
Undoubtedly the predictive power of this system would be significantly enhanced by the application of a machine learning model. The data captured in the validation phase could be used to train this model which could then output predictions of the attacker motive.
The use of neural networks to profile attackers based on network behaviour has been attempted before by Kotenko et al. (Kotenko et al., 2023). This research only yielded moderate results due to the lack of appropriate datasets; however, it demonstrates the feasibility of a machine learning approach. Future work here could use similar techniques as described by Kotenko but would benefit from having a labelled dataset of network behaviour generated specifically for training the model. This should maximise the potential performance of any model developed and deliver a robust prediction of the profile of the attacker. Additionally, this prediction can be reinforced as the attacker iterates over multiple deception environments.
5.4. Psychological profiling
The most impactful and interesting application of this system would be the investigation into how psychological features of the attacker’s profile can be inferred through the dynamic deployment of deceptive content and the attacker’s interaction with it. To enable this investigation of psychological profiling, psychological information should be captured at certain points in the data gathering exercises described above. Capturing psychological information when the participant is exposed to a particular deceptive artefact will provide an association between their cognitive state and the network behaviour exhibited on the system. Capturing this information through Cyber Task Questionnaires has been demonstrated before in the literature (Ferguson-Walter et al., 2021a). Questionnaires can also be completed after the exercises to capture the experiences of the participants (Gutzwiller et al., 2019). The aim is to map psychological states and/or biases to network behaviour so that a network behaviour dataset, labelled with psychological information, can be developed. These datasets could then be leveraged in the development of a model to output predictions on the psychological profile of the attacker.
Some exploratory investigations into the future work described here have been conducted, but much of the research remains aspirational. The question of whether attackers are susceptible to decision-making biases, and whether interaction with specific deception artefacts reveals these cognitive biases, remains unanswered. Answering this fascinating question requires a multidisciplinary approach and must combine knowledge in behavioural science with programming and data science techniques.
6. Conclusion
A fully automated and responsive cyber deception system has been developed. This system analyses attacker behaviour and generates deception material based on what is observed. Through iterating this activity over several deception environments, the system can produce a refined prediction of the underlying motive of the attacker.
At its most fundamental level, what the system presents is a communicative interface with the attacker. The selection of deceptive material by the attacker is comparable to a multiple-choice questionnaire that with each selection reveals more about the human behind the attack. This communication with the attackers could open the door to a whole raft of potential experiments, bringing in knowledge and techniques from psychology, criminology and game theory.
The challenges with implementing this research are not underestimated; validating the effectiveness of this deception technique would need careful experimental design. In addition, translating this highly constrained system to real-world networks would require significant effort. Nevertheless, it is the author’s hope that the work described, and the proposals contained in the article, will provide a rich ground for discussion in attempting to move beyond purely technical responses to attacks and usher in a more psychological focus to cyber security.
Acknowledgements.
I would like to thank my colleague Darren Lawrence for his invaluable support and guidance. I would also like to thank the CounterCraft team for providing a license to their software free of charge for this research. Finally, I would like to thank my wife Dejana for her unwavering love and support.
References
- MIT ([n. d.]) [n. d.]. MITRE ATT&CK®. https://attack.mitre.org/ [Online; accessed 2024-05-16].
- 202 (2021) 2021. Pandora Papers - ICIJ. https://www.icij.org/investigations/pandora-papers/ [Online; accessed 2024-05-16].
- Almohannadi et al. (2018) Hamad Almohannadi, Irfan Awan, Jassim Al Hamar, Andrea Cullen, Jules Pagan Disso, and Lorna Armitage. 2018. Cyber Threat Intelligence from Honeypot Data Using Elasticsearch. 2018 IEEE 32nd International Conference on Advanced Information Networking and Applications (AINA), 900–906. https://doi.org/10.1109/AINA.2018.00132 ISSN: 2332-5658.
- Bada and Nurse (2021) Maria Bada and Jason R.C. Nurse. 2021. Profiling the Cybercriminal: A Systematic Review of Research. 2021 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 1–8. https://doi.org/10.1109/CyberSA52016.2021.9478246
- Casey (2007) Timothy Casey. 2007. Threat Agent Library Helps Identify Information Security Risks. https://doi.org/10.13140/RG.2.2.30094.46406
- Chng et al. (2022) Samuel Chng, Han Yu Lu, Ayush Kumar, and David Yau. 2022. Hacker types, motivations and strategies: A comprehensive framework. Computers in Human Behavior Reports 5 (2022), 100167. https://doi.org/10.1016/j.chbr.2022.100167
- Cho et al. (2020) Jin-Hee Cho, Dilli P. Sharma, Hooman Alavizadeh, Seunghyun Yoon, Noam Ben-Asher, Terrence J. Moore, Dong Seong Kim, Hyuk Lim, and Frederica F. Nelson. 2020. Toward Proactive, Adaptive Defense: A Survey on Moving Target Defense. IEEE Communications Surveys & Tutorials 22, 1 (2020), 709–745. https://doi.org/10.1109/COMST.2019.2963791
- Conti et al. (2018) Mauro Conti, Tooska Dargahi, and Ali Dehghantanha. 2018. Cyber Threat Intelligence: Challenges and Opportunities. In Cyber Threat Intelligence, Ali Dehghantanha, Mauro Conti, and Tooska Dargahi (Eds.). Springer International Publishing, Cham, 1–6. https://doi.org/10.1007/978-3-319-73951-9_1
- Ferguson-Walter et al. (2019) Kimberly Ferguson-Walter, Sunny Fugate, Justin Mauger, and Maxine Major. 2019. Game theory for adaptive defensive cyber deception. Proceedings of the 6th Annual Symposium on Hot Topics in the Science of Security, 1–8. https://doi.org/10.1145/3314058.3314063
- Ferguson-Walter et al. (2021a) Kimberly J. Ferguson-Walter, Robert S. Gutzwiller, Dakota D. Scott, and Craig J. Johnson. 2021a. Oppositional Human Factors in Cybersecurity: A Preliminary Analysis of Affective States. 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW), 153–158. https://doi.org/10.1109/ASEW52652.2021.00040
- Ferguson-Walter et al. (2021b) Kimberly J. Ferguson-Walter, Maxine M. Major, Chelsea K. Johnson, and Daniel H. Muhleman. 2021b. Examining the Efficacy of Decoy-based and Psychological Cyber Deception. In 30th USENIX Security Symposium (USENIX Security 21), 1127–1144. https://www.usenix.org/conference/usenixsecurity21/presentation/ferguson-walter
- Fraunholz et al. (2017) Daniel Fraunholz, Daniel Krohmer, Simon Duque Antón, and Hans Dieter Schotten. 2017. YAAS – On the Attribution of Honeypot Data. International Journal on Cyber Situational Awareness 2, 1 (2017), 31–48. https://doi.org/10.22619/IJCSA.2017.100113
- Gutzwiller et al. (2019) Robert S. Gutzwiller, Kimberly J. Ferguson-Walter, and Sunny J. Fugate. 2019. Are Cyber Attackers Thinking Fast and Slow? Exploratory Analysis Reveals Evidence of Decision-Making Biases in Red Teamers. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 63, 1 (2019), 427–431. https://doi.org/10.1177/1071181319631096
- Hutchins et al. (2011) Eric M. Hutchins, Michael J. Cloppert, and Rohan M. Amin. 2011. Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains. Leading Issues in Information Warfare & Security Research 1, 1 (2011), 80.
- Katipally et al. (2011) Rajeshwar Katipally, Li Yang, and Anyi Liu. 2011. Attacker behavior analysis in multi-stage attack detection system. Proceedings of the Seventh Annual Workshop on Cyber Security and Information Intelligence Research, 1–1. https://doi.org/10.1145/2179298.2179369
- Kotenko et al. (2023) Igor Kotenko, Elena Fedorchenko, Evgenia Novikova, and Ashish Jha. 2023. Cyber Attacker Profiling for Risk Analysis Based on Machine Learning. Sensors 23, 4 (2023), 2028. https://doi.org/10.3390/s23042028
- Lickiewicz (2011) Jakub Lickiewicz. 2011. Cyber crime psychology - Proposal of an offender psychological profile. Problems of Forensic Science 87 (2011), 239–252.
- Mallikarjunan et al. (2018) K. Narasimha Mallikarjunan, S. Mercy Shalinie, and G. Preetha. 2018. Real Time Attacker Behavior Pattern Discovery and Profiling Using Fuzzy Rules. Journal of Internet Technology 19, 5 (2018), 1567–1575.
- Martineau et al. (2023) Melissa Martineau, Elena Spiridon, and Mary Aiken. 2023. A Comprehensive Framework for Cyber Behavioral Analysis Based on a Systematic Review of Cyber Profiling Literature. Forensic Sciences 3, 3 (2023), 452–477. https://doi.org/10.3390/forensicsci3030032
- Rogers (2016) Marcus K. Rogers. 2016. Chapter 3 - Psychological profiling as an investigative tool for digital forensics. In Digital Forensics, John Sammons (Ed.). Syngress, Boston, 45–58. https://doi.org/10.1016/B978-0-12-804526-8.00003-4
- Rogers et al. (2006) Marcus K. Rogers, Kathryn Seigfried, and Kirti Tidke. 2006. Self-reported computer criminal behavior: A psychological analysis. Digital Investigation 3 (2006), 116–120. https://doi.org/10.1016/j.diin.2006.06.002
- Rowe and Rrushi (2016) Neil C. Rowe and Julian Rrushi. 2016. Introduction to Cyberdeception. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-41187-3
- Sokol et al. ([n. d.]) Pavol Sokol, Patrik Pekarčík, and Tomáš Bajtoš. [n. d.]. Data Collection and Data Analysis in Honeypots and Honeynets.
- Sun et al. (2023) Nan Sun, Ming Ding, Jiaojiao Jiang, Weikang Xu, Xiaoxing Mo, Yonghang Tai, and Jun Zhang. 2023. Cyber Threat Intelligence Mining for Proactive Cybersecurity Defense: A Survey and New Perspectives. IEEE Communications Surveys & Tutorials 25, 3 (2023), 1748–1774. https://doi.org/10.1109/COMST.2023.3273282
- Zhang and Thing (2021) Li Zhang and Vrizlynn L. L. Thing. 2021. Three decades of deception techniques in active cyber defense - Retrospect and outlook. Computers & Security 106 (2021), 102288. https://doi.org/10.1016/j.cose.2021.102288
Appendix A: Subjects of document types used
Financial | HR | IT | Legal | Operational |
---|---|---|---|---|
General Ledger | Time and Attendance Records | IT Asset Inventory | Non-Disclosure Agreements (NDAs) | Safety Procedures |
Tax Documents | Employee Benefit Documents | IT Policies and Procedures | Compliance Documentation | Standard Operating Procedures (SOPs) |
Financial Contracts | Training and Development Plans | Security Policies and Procedures | Corporate Governance Documents | Change Request Forms |
Payroll Documents | Employee Handbook | Vendor Contracts and Service Level Agreements | Litigation and Legal Proceedings Documents | Inventory and Stock Control Documents |
Compliance and Regulatory Documents | Employee Records | Disaster Recovery and Business Continuity Plans | Legal Opinions and Memoranda | Incident Reports |
Budgets | Exit Interview Forms | System Documentation | Policies and Procedures | Performance Metrics and Dashboards |
Financial Statements | Performance Appraisal Forms | Change Management Documents | Regulatory Filings | Maintenance and Equipment Manuals |
Financial Reports | Offer Letters | IT Project Documentation | Legal Research and Case Law | Quality Control Documents |
Audited Financial Statements | Employment Contracts | Incident and Problem Reports | Contracts | Work Instructions |
Invoices and Purchase Orders | Job Descriptions | IT Service Level Agreements (SLAs) | Intellectual Property Documents | Production Plans |
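The table above amounts to a mapping from department to candidate document subjects, from which the system can sample when generating a piece of deception material. A minimal sketch of that mapping is shown below; the data structure, abridged entries, and function name are illustrative assumptions, as the paper does not specify its internal representation:

```python
import random

# Abridged department-to-subject mapping taken from the table above
# (illustrative structure; not the system's actual implementation).
DOCUMENT_TYPES = {
    "Financial": ["General Ledger", "Tax Documents", "Payroll Documents",
                  "Budgets", "Financial Statements", "Invoices and Purchase Orders"],
    "HR": ["Employee Handbook", "Employee Records", "Offer Letters",
           "Employment Contracts", "Job Descriptions"],
    "IT": ["IT Asset Inventory", "Security Policies and Procedures",
           "System Documentation", "Incident and Problem Reports"],
    "Legal": ["Non-Disclosure Agreements (NDAs)", "Contracts",
              "Regulatory Filings", "Intellectual Property Documents"],
    "Operational": ["Safety Procedures", "Standard Operating Procedures (SOPs)",
                    "Incident Reports", "Work Instructions"],
}

def pick_document_type(department: str) -> str:
    """Sample one document subject for the given department at random."""
    return random.choice(DOCUMENT_TYPES[department])
```

Sampling per department (rather than from a flat list) lets the deception content track which part of the decoy environment the attacker is exploring.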
Appendix B: ChatGPT query written in Python with the document type passed in as a variable
{"role": "user", "content": """ You are a world class movie prop text writer. I need you to draft a financial document that looks EXACTLY like a document that would look like it was included in the corpus of financial documents recovered in the Pandora Papers scandal for a dramatized version of the event. It must be as accurate as possible for realism. It should be for a company called Jacob & Co Ltd, a hedge fund based in Gibraltar and Panama. The Type of document should be a """ + str(document_type) + """ Respond with the the document in a well-organized text file. Think through it carefully before responding. Only respond with the document directly, do not add any additional commentary or explanation (for realism). """ }
Appendix C: Example financial document generated by the system
Title: General Ledger of Jacob & Co Ltd
Period: January 2021 to December 2021

Account: Cash
---------------------------------------
Date        Description               Debit        Credit       Balance
------------------------------------------------------------------------
01/01/2021  Opening Balance           -            -            5,200,000
01/15/2021  Investment Received       10,000,000   -            15,200,000
02/07/2021  Transfer to Invest. Acct  -            12,000,000   3,200,000
07/31/2021  Hedge Fund Returns        14,400,000   -            17,600,000
12/31/2021  Operating Expenses        -            1,200,000    16,400,000

Account: Investment Account
---------------------------------------
Date        Description               Debit        Credit       Balance
------------------------------------------------------------------------
01/01/2021  Opening Balance           -            -            25,000,000
02/07/2021  Transfer from Cash Acct   12,000,000   -            37,000,000
07/31/2021  Hedge Fund Investments    -            14,400,000   22,600,000

Account: Capital
---------------------------------------
Date        Description               Debit        Credit       Balance
------------------------------------------------------------------------
01/01/2021  Opening Balance           -            -            50,000,000
01/15/2021  Capital Invest. Received  -            10,000,000   60,000,000

Account: Income
---------------------------------------
Date        Description               Debit        Credit       Balance
------------------------------------------------------------------------
01/01/2021  Opening Balance           -            -            0
07/31/2021  Hedge Fund Income         -            14,400,000   14,400,000

Account: Expenses
---------------------------------------
Date        Description               Debit        Credit       Balance
------------------------------------------------------------------------
01/01/2021  Opening Balance           -            -            0
12/31/2021  Operating Expenses        1,200,000    -            1,200,000

=-=-=-=-=-=-=-=-=-= End of General Ledger =-=-=-=-=-=-=-=-=-=
Appendix D: Model of end-to-end workflow
[Figure: model of the system's end-to-end workflow]