Introduction to AI
Project 6: Second learning project
Summary
Note: After LOTS of discussion in Slack, this project is now EXTRA CREDIT ONLY! You have two options: do the full project and use it to replace a lower project grade, or do the smaller extra-credit-only project.
This is the second of two projects where you will bring machine learning into your agents. This project is estimated to take 10-15 hours of coding, plus additional time to actually produce the learning curves. As with all of the projects, graduate students have additional requirements.
By the end of this project, you will have accomplished the following objectives.
- Use machine learning inside your spacesettlers agent effectively
- Reuse your code from the previous projects as needed (primarily to navigate around the environment)
Project 6 tasks
By popular request, we stay in the regular spacesettlers environment for this task. This project will focus on reinforcement learning (module 9).
- Your job is to create a spacewar agent that uses reinforcement learning to control some aspect of its behavior. You can choose the task; here are some example ideas to get you started.
- Learn to navigate efficiently from point A to point B
- Learn when and how to shoot at an opponent
- Learn which high level actions to take to maximize game score
- To keep you from going astray on something that likely will not work, there are two due dates: the first is for proposing the task you intend to solve (including the state space and reward function), and the second is the regular project deadline.
- Note: RL is easiest with a discrete state space, so your proposal should describe how you will discretize the state space. This approach can be very simple (gridding the variables you need for your chosen task) or it can rely on other methods such as clustering.
- First due date is Nov 19 11:59pm: you MUST propose your project idea by this date
- Second due date is Dec 3 11:59pm: regular project deadline
- Note that CS 5013 students must either create an agent that integrates methods from Project 5 with RL or implement two RL tasks.
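One simple way to grid continuous variables, as suggested above, is to clamp each variable to a range, split it into equal-width bins, and combine the bin indices into a single integer state. The sketch below is a minimal illustration under assumptions: `GridDiscretizer` is a hypothetical helper, not part of spacesettlers, and the ranges and bin counts are placeholders you would choose for your own task.

```java
/**
 * Hypothetical helper (not part of spacesettlers) that grids several
 * continuous variables into one discrete state index. Each variable is
 * clamped to [min, max] and split into equal-width bins.
 */
class GridDiscretizer {
    private final double[] mins;
    private final double[] maxs;
    private final int[] binsPerVariable;

    GridDiscretizer(double[] mins, double[] maxs, int[] binsPerVariable) {
        if (mins.length != maxs.length || mins.length != binsPerVariable.length) {
            throw new IllegalArgumentException("all arrays must have the same length");
        }
        this.mins = mins.clone();
        this.maxs = maxs.clone();
        this.binsPerVariable = binsPerVariable.clone();
    }

    /** Total number of discrete states (product of the bin counts). */
    int numStates() {
        int total = 1;
        for (int bins : binsPerVariable) {
            total *= bins;
        }
        return total;
    }

    /** Bin of one variable; out-of-range values are clamped into the edge bins. */
    private int binOf(int variable, double value) {
        double fraction = (value - mins[variable]) / (maxs[variable] - mins[variable]);
        int bin = (int) Math.floor(fraction * binsPerVariable[variable]);
        return Math.max(0, Math.min(binsPerVariable[variable] - 1, bin));
    }

    /** Mixed-radix encoding of the per-variable bins (first variable most significant). */
    int discretize(double[] values) {
        if (values.length != binsPerVariable.length) {
            throw new IllegalArgumentException("expected " + binsPerVariable.length + " values");
        }
        int state = 0;
        for (int v = 0; v < values.length; v++) {
            state = state * binsPerVariable[v] + binOf(v, values[v]);
        }
        return state;
    }
}
```

For example, `new GridDiscretizer(new double[]{0, 0}, new double[]{100, 10}, new int[]{4, 5})` yields 20 states, and the input `{30, 2.5}` falls in bin 1 of each variable, giving state 1 * 5 + 1 = 6.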
Implementing learning
Several important notes for you to make your learning successful:
- You can implement the learning offline, outside of the spacesettlers system, as long as you can read your model back into your agent and use it.
- In order to learn anything, you will need to collect a LOT of data. Use the initialize function to open a file handle, save whatever data you need for learning during the agent’s lifetime, and then use the shutdown function to close the file handle.
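The data-collection pattern above (open a file in initialize, write during the game, close in shutdown) might look like the following minimal sketch. `TransitionLogger`, its CSV layout, and the log file path are hypothetical; the assumption is that you call `open()` from your agent's initialize method, `record(...)` wherever you observe a transition, and `close()` from its shutdown method.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical logger for (state, action, reward, nextState) transitions,
 * written one CSV line per transition.
 */
class TransitionLogger {
    private final Path logFile;
    private BufferedWriter writer;

    TransitionLogger(Path logFile) {
        this.logFile = logFile;
    }

    /** Open in append mode so data from many games accumulates in one file. */
    void open() {
        try {
            writer = Files.newBufferedWriter(logFile, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Append one transition as a CSV line. */
    void record(int state, int action, double reward, int nextState) {
        if (writer == null) {
            throw new IllegalStateException("open() must be called first");
        }
        try {
            writer.write(state + "," + action + "," + reward + "," + nextState);
            writer.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Flush and close the file; safe to call more than once. */
    void close() {
        if (writer == null) {
            return;
        }
        try {
            writer.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer = null;
        }
    }
}
```

Buffering keeps per-step logging cheap, but it also means data only reliably reaches disk once `close()` runs, which is why closing in shutdown matters.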
Heuristic agents
Since we move back to the regular non-CTF environment, the heuristic agents return to the original heuristics described in Projects 1-3.
Multiple agents
While not required, you are allowed to have multiple ships this project if you want!
Class-wide ladders – Extra credit
The extra credit ladders remain the same as in Projects 1 through 3. You are welcome to choose a different ladder path than you chose for any of the previous projects. The class-wide ladders will start on Nov 18, 2021.
Extra credit
The extra credit opportunities for being creative and finding bugs remain the same as in Project 1. Remember you have to document it in your writeup to get the extra credit!
How to download and turn in your project
- Update your code from the last project. You can update your code at the command line with "git pull". If you did not check out the code for Project 0, follow the instructions in Project 0 to check it out.
- Note: the config file directories change BACK to the ones we used for Projects 1-3 for this project, as do the build targets you will want to run! Change the SpaceSettlersConfig.xml file in spacesettlers/config/heuristicCompetitive or heuristicCooperative to point to your agent in src/4×4. The detailed instructions for this are in Project 0. Make sure to copy a spacesettlersinit.xml into the src/4×4 directory so your agent knows how to start. In spacesettlersinit.xml, change "Random Client" in the line <ladderName>Random Client</ladderName> to the team name you chose in Canvas.
- Write your learning code as described above
- Build and test your code using the ant compilation system within Eclipse, or using ant on the command line if you are not using Eclipse (we highly recommend Eclipse or another IDE!). Make sure you use the spacesettlers.graphics system to draw information on the screen that helps you debug your learning. You can write your own graphics as well, but the provided classes should let you draw what you need quickly.
- Submit your project on spacesettlers.cs.ou.edu using the submit script as described below. You can submit as many times as you want and we will only grade the last submission.
- Submit ONLY the writeup to the correct Project 6 assignment on Canvas. To submit your code on spacesettlers.cs.ou.edu:
- Copy your code from your laptop to spacesettlers.cs.ou.edu using the account that was created for you for this class (your username is your 4×4 and your password is the one you chose in Project 0). You can copy using scp, WinSCP, or pscp.
- ssh into spacesettlers.cs.ou.edu
- Make sure your working directory contains all the files you want to turn in. All files should live in the package 4×4. Note: The spacesettlersinit.xml file is required to run your client!
- Submit your files using one of the following commands (be sure your Java files come last). You can submit to only ONE ladder. If you submit to both, small green monsters will track you down and deal with you appropriately.
/home/spacewar/bin/submit --config_file spacesettlersinit.xml \
    --project project6_coop \
    --java_files *.java

/home/spacewar/bin/submit --config_file spacesettlersinit.xml \
    --project project6_compete \
    --java_files *.java
- After the project deadline, the above commands will not accept submissions. If you want to turn in your project late, use:
/home/spacewar/bin/submit --config_file spacesettlersinit.xml \
    --project project6_coop_late \
    --java_files *.java

/home/spacewar/bin/submit --config_file spacesettlersinit.xml \
    --project project6_compete_late \
    --java_files *.java
Rubric – Part 1 Due Nov 19 11:59pm
- If you are doing the full project to replace a project grade, this part is still worth 10 points. For the extra-credit project, no proposal is needed (see below). Note that you can NOT propose the same task as the extra-credit project listed below!
- First due date: Nov 19 11:59pm, 10 points for the project proposal
- 10 points for turning in a ONE-paragraph project proposal on Canvas
- 0 points for not turning in a proposal
Rubric for full project replacement – Part 2 Due Dec 3 11:59pm
- Reinforcement learning
- 20 points for correctly implementing the RL method that you proposed and got feedback on (if you were told to choose a different method, you need to implement the method you were told to switch to). A correct learner uses learning to improve performance, and the learning must be demonstrated in the writeup with a learning curve (the curve itself is graded separately). Learning code should be well documented to receive full credit.
- 15 points if there is only one minor mistake.
- 10 points if there are several minor mistakes or if documentation is missing.
- 5 points if you have one major mistake.
- State space
- 10 points for a state space representation that is appropriate to the task being solved and is correctly implemented
- 5 points for bugs
- Reward function
- 10 points for a reward function that is appropriate to the task being solved and is correctly implemented
- 5 points for bugs
- Graphics
- 10 points for correctly drawing graphics (or using printouts) that enable you to debug your learning and that help us to grade it.
- 7 points for drawing something useful for debugging and grading but with bugs in it
- 3 points for major graphical/printing bugs
- CS 5013 students only: You must EITHER implement a second RL task OR integrate something from your Project 5 agent into your Project 6 RL agent. Whichever you choose must be documented in the writeup.
- 20 points for correctly implementing a second RL task or integrating a learning method from project 5 into project 6 and documenting with a learning curve and paragraph describing it in the writeup
- 10 points if you implement it but do not give a second learning curve
- 5 points for bugs
- Good coding practices: We will randomly choose one of the following good coding practices to grade for these 10 points. Note that this will be included on every project. Are your files well commented? Are your variable names descriptive (or are they all i, j, and k)? Do you make good use of classes and methods, or is the entire project in one big flat file? This will be graded as follows:
- 10 points for well-commented code, descriptive variable names, or good use of classes and methods
- 5 points if you have partially commented code, semi-descriptive variable names, or partial use of classes and methods
- 0 points if you have no comments in your code, variables are obscurely named, or all your code is in a single flat method
- Writeup: 30 points total. Your writeup is limited to 2 pages maximum. Any writeup over 2 pages will automatically receive a 0. Turn your writeup in to Canvas and your code in to spacesettlers.
- 20 points for collecting data and demonstrating learning using a learning curve (in the writeup). For full credit, make sure you explain why it is learning or not learning (if it isn’t learning, you will not lose your points if you can explain WHY it is not learning)
- 10 points for describing your RL approach including your state space and reward function in a paragraph or two
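If the method you propose is tabular Q-learning (one common option, not a requirement for the full project), its core pieces are an epsilon-greedy action choice and the update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). The sketch below assumes integer state and action indices (for example, from a discretizer); `QLearner` and its parameter names are hypothetical, and Java serialization is just one convenient way to write the table to disk so a trained model can be read back into your agent.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

/** Minimal tabular Q-learning sketch with epsilon-greedy exploration (hypothetical helper). */
class QLearner {
    private final double[][] qValues;      // qValues[state][action], initialized to 0
    private final double learningRate;     // alpha in (0, 1]
    private final double discount;         // gamma in [0, 1]
    private final double explorationRate;  // epsilon in [0, 1]
    private final Random random;

    QLearner(int numStates, int numActions, double learningRate,
             double discount, double explorationRate, long seed) {
        this.qValues = new double[numStates][numActions];
        this.learningRate = learningRate;
        this.discount = discount;
        this.explorationRate = explorationRate;
        this.random = new Random(seed);
    }

    double getQ(int state, int action) {
        return qValues[state][action];
    }

    /** Action with the highest Q-value in this state (ties go to the lowest index). */
    int greedyAction(int state) {
        double[] actionValues = qValues[state];
        int bestAction = 0;
        for (int action = 1; action < actionValues.length; action++) {
            if (actionValues[action] > actionValues[bestAction]) {
                bestAction = action;
            }
        }
        return bestAction;
    }

    /** With probability epsilon pick a uniformly random action, otherwise the greedy one. */
    int chooseAction(int state) {
        if (random.nextDouble() < explorationRate) {
            return random.nextInt(qValues[state].length);
        }
        return greedyAction(state);
    }

    /** Q-learning update; for a terminal transition the bootstrap term is dropped. */
    void update(int state, int action, double reward, int nextState, boolean terminal) {
        double target = reward;
        if (!terminal) {
            target += discount * qValues[nextState][greedyAction(nextState)];
        }
        qValues[state][action] += learningRate * (target - qValues[state][action]);
    }

    /** Save the table so a later game (or an offline learner) can reuse it. */
    void save(Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(qValues);
        }
    }

    /** Load a table written by save(); its dimensions must match this learner. */
    void load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            double[][] saved = (double[][]) in.readObject();
            if (saved.length != qValues.length || saved[0].length != qValues[0].length) {
                throw new IOException("Q-table dimensions do not match");
            }
            for (int state = 0; state < saved.length; state++) {
                qValues[state] = saved[state].clone();
            }
        }
    }
}
```

For a task where each decision ends immediately (such as judging a single shot), you might pass `terminal = true` so the update uses only the observed reward. Logging the reward per episode alongside these updates gives you the data for the learning curve.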
Rubric for extra credit project – Part 2 Due Dec 3 11:59pm
- Reinforcement learning
- 15 points for correctly implementing Q-learning on the task of learning how to orient your ship correctly to fire a bullet at another ship or base. A correct learner uses learning to improve performance, and the learning must be demonstrated in the writeup with a learning curve (the curve itself is graded separately). Learning code should be well documented to receive full credit.
- 10 points if there is only one minor mistake.
- 5 points if there are several minor mistakes or if documentation is missing.
- State space
- 5 points for implementing: Your state space should be a discretized set of relative angles between you and the other ship or base. Use bins no smaller than 5 degrees and no larger than 22.5 degrees, but do the math in radians (Java's trigonometry functions all use radians!).
- Reward function
- 2 points for implementing: Reward your agent +1 for hitting the ship, -1 for missing, and 0 if the bullet gets lost in some other way (e.g. hits a beacon or asteroid)
- Graphics
- 3 points for correctly drawing graphics (or using printouts) that enable you to debug your learning and that help us to grade it.
- Good coding practices: We will randomly choose one of the following good coding practices to grade for these 5 points. Note that this will be included on every project. Are your files well commented? Are your variable names descriptive (or are they all i, j, and k)? Do you make good use of classes and methods, or is the entire project in one big flat file? This will be graded as follows:
- 5 points for well-commented code, descriptive variable names, or good use of classes and methods
- Writeup: 5 points total. Your writeup is limited to 2 pages maximum. Any writeup over 2 pages will automatically receive a 0. Turn your writeup in to Canvas and your code in to spacesettlers.
- 5 points for collecting data and demonstrating learning using a learning curve (in the writeup). For full credit, make sure you explain why it is learning or not learning (if it isn’t learning, you will not lose your points if you can explain WHY it is not learning)
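The extra-credit state space and reward function above can be sketched as follows. This assumes you already have the shortest (toroidal) offset `dx`, `dy` from your ship to the target and your ship's orientation in radians; `AimingTask` and `ShotOutcome` are hypothetical names, and how your agent detects whether a bullet hit, missed, or was lost is left to you.

```java
/**
 * Hypothetical helpers for the extra-credit aiming task.
 * All angles are in radians, matching java.lang.Math.
 */
final class AimingTask {
    static final double MIN_BIN_WIDTH = Math.toRadians(5.0);
    static final double MAX_BIN_WIDTH = Math.toRadians(22.5);
    private static final double TOLERANCE = 1e-9;

    enum ShotOutcome { HIT_TARGET, MISSED, LOST_TO_OTHER_OBJECT }

    private AimingTask() {
    }

    /** Wrap any angle into [-PI, PI). */
    static double normalizeAngle(double angle) {
        double wrapped = (angle + Math.PI) % (2 * Math.PI);
        if (wrapped < 0) {
            wrapped += 2 * Math.PI;
        }
        return wrapped - Math.PI;
    }

    /**
     * Angle from the ship's heading to the target, in [-PI, PI).
     * dx and dy are the target position minus the ship position.
     */
    static double relativeAngle(double shipOrientation, double dx, double dy) {
        return normalizeAngle(Math.atan2(dy, dx) - shipOrientation);
    }

    /** Bins needed to cover the full circle (the last bin may be narrower). */
    static int numBins(double binWidth) {
        return (int) Math.ceil(2 * Math.PI / binWidth - TOLERANCE);
    }

    /** Discrete state: bin index in [0, numBins(binWidth)) for a relative angle. */
    static int angleBin(double relativeAngle, double binWidth) {
        if (binWidth < MIN_BIN_WIDTH - TOLERANCE || binWidth > MAX_BIN_WIDTH + TOLERANCE) {
            throw new IllegalArgumentException("bin width must be between 5 and 22.5 degrees");
        }
        double shifted = normalizeAngle(relativeAngle) + Math.PI; // now in [0, 2*PI)
        int bin = (int) Math.floor(shifted / binWidth);
        return Math.min(bin, numBins(binWidth) - 1);
    }

    /** +1 for a hit, -1 for a miss, 0 if the bullet is lost some other way. */
    static double reward(ShotOutcome outcome) {
        switch (outcome) {
            case HIT_TARGET:
                return 1.0;
            case MISSED:
                return -1.0;
            default:
                return 0.0;
        }
    }
}
```

With 10-degree bins, `numBins(Math.toRadians(10))` gives 36 states; `angleBin` rejects bin widths outside the 5 to 22.5 degree range required above.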