Automating Cost Concerns of Fortis Redesigns

Abhijit Paul

<2024-09-17 মঙ্গল>

We completed 3 approaches before, involving UML to LTL parsing, converting big formally defined systems into UML and finding more case studies. We document the next approaches here.

Approach 1: Update Optimization function to address cost concerns

I am trying to change codebase of Fortis - a research paper made by a PhD student in CMU. So I need to run that codebase for that, firstly.

We mainly encounter these issues after somehow resolving dependency issue.

Error type
Unresolved reference
Cannot infer a type for this parameter
Not enough information to infer type variable
Overload resolution ambiguity

And they occur from:

Unresolved reference: X Count
Word 306
NFA 108
Alphabets 34
Util 108
automaton 120
variable S 70
AutomatonBuilders 22
TSTraversal 976

These classes are in net.automatalib. It imports properly and we have also tried with v0.9.0-0.11.0. And most importantly, it does not show dependency error that it can't find a library. It just says:

Dependency not found!

These dependencies are not available in maven repository. So they must be available as jar files online. But I could not find any such jar files! And I checked .gitignore file of the fortis-core repository and it excluded bin folder that includes the jar files.

Possible reasons:

  1. It is available somewhere online and I cannot find it, yet.
  2. It is an internal file of cmu.soda
  3. Remove those dependencies and see what happens: Many other issues are still there, sigh

common package not found!

As you can see in the list of packages, there is no common package. I checked last 30 commits randomly and did not find this folder in ANY of the commits! Its also not in .gitignore

Possible solutions:

  1. My friend Radium found this fortis-api repository. It has THE common package! It worked!
  2. And here is LTL Learning repository.
  3. And here is automatalib-core repository. It did not work so we found this.
  4. Also, automatalib-common may also be needed.

Their pom file is also incomplete! We had to manually add 1 dependency, but it still shows error. The eternally long list of errors can be found here

Approach 2: Use ChatGPT to generate costs

The goal is to generate costs and use that to generate graphs. Then we will inspect how effective those generations are.

The input and prompts can be found in the dataset. Now let us consutrct a dataset.

Dataset Size Result ChatGPT
Promptv1 5 systems Failed to find a solution. chatGPT logs

We skip totalcost for now. It makes the process complex. Now we need to run Fortis to understand how good the cost assignments are.

Approach 3: Eye gaze tracking Literature Review

There are existing works that uses eye gaze tracking to assess the human comprehension complexity of a diagram.

Experiment 1: Creating Eye Gaze Tracking validation dataset

We now start collecting data for evaluating it. Our possible data sources are:

Dataset source N Challenge
Fortis Codebase 7 systems Big and complex
LTSA tool examples 97 examples Algorithmic examples
2 papers using LTL systems as part of process 1, 2 2 systems Extremely complex

But now we do have a a confusion. How do express the LTL codebase as a graph? Perhaps Finite State Machine translations can help. We found a solution! LTSA tool has a feature of exporting LTL code to aut format, which is basically transitions between states. Using that, we can draw the UML diagrams. The graph itself is enough!

Pairwise Comparison Method

Techniques from psychometrics and cognitive psychology, such as the paired comparison method, can be utilized. This approach has been used to assess preferences or perceived complexity between design alternatives. Participants are asked to assess which item in each pair is preferred, better, or in this context, more complex. For each design, count how many times it was preferred over others. This count will be the design's score. Rank the designs based on their scores from highest to lowest. The higher the score, the more complex the design is perceived to be.

A/B Testing in Design

We can use it to estimate design comprehension of users - what do they understand by reading the problem details and the design?

A/B testing, also known as split testing, is a method used to compare two or more variations of a design (such as a website, user interface, or product feature) to determine which one performs better based on specific metrics.Hypothesis here can be: "Changing the button color from blue to green will increase click-through rates." We can use it in our case with the hypothesis, "Which design takes longer to understand?". Annotators will annotate and note down their understanding. Using original problem definition (hidden) and user's understanding, we will analyze how well they understood the design within that time. If a design takes long time to understand but still the user understands it wrong, then its probably a bad design.

Planning

Now we need to formalize the annotation process.

  1. Pairwise Comparison Methods or A/B Testing in Design?
  2. Show 2 design and user will select which he finds easier. So it will be fast but need to create a tool for it.
  3. For the second one, we first need to scrape example definition from the book for increasing problem definition accuracy. But I can't seem to download the book from libgen! :(

References


  1. R. Wang, Y. Zhou, S. Chen, S. Qadeer, D. Evans, and Y. Gurevich, “Explicating sdks: Uncovering assumptions underlying secure authenti- cation and authorization,” in Proceedings of the 22th USENIX Security Symposium, Washington, DC, USA, August 14-16, 2013, 2013, pp. 399– 314.↩︎

  2. S. Sun and K. Beznosov, “The devil is in the (implementation) details: an empirical analysis of OAuth SSO systems,” in the ACM Conference on Computer and Communications Security, CCS’12, Raleigh, NC, USA, October 16-18, 2012, 2012, pp. 378–390.↩︎