
CasperCheck – an auto-repair process for Casper agents

One issue I occasionally run into in my shop is that the Casper agent on individual Macs stops working properly. The affected Macs stop checking in with the Casper server, or check in but can’t run policies anymore. I’ve set up smart groups on my Casper server to help me identify these machines, but actually fixing them has not been an automated process.

While at the JAMF Nation User Conference in October 2013, I was fortunate enough to hear Mike Dodge and Ajay Chand talk about the challenges they faced at Facebook with keeping Casper agents working in an environment where users are encouraged to break down any obstacle that gets in their way (sometimes, the perceived obstacles included the Casper agent). As part of their talk, they mentioned they had a scripted way to verify that the Casper agent was running properly and automatically fix it if it wasn’t. This was a capability that I wanted to include in my own environment, so I asked them if this was going to be available at some point. They said it would be, so I waited to see what would be released.

At this point, the story fast forwards to March 2014, when the Facebook team released their code to GitHub and I was able to take a look at what they had done. I saw that I could adapt some of their work, but I would need to do additional work on my end to develop a solution that not only worked in my environment, but would also be relatively straightforward to adapt to others’.

After a lot of work and testing, I’m happy to announce the release of CasperCheck. This is a script-driven solution that will do the following:

A. Check to see if a Casper-managed Mac’s network connection is live

B. If the network is working, check to see if the machine is on a network where the Mac’s Casper JSS is accessible.

C. If both of the conditions above are true, check to see if the Casper agent on the machine can contact the JSS and run a policy.

D. If the Casper agent on the machine cannot run a policy, the appropriate functions run and repair the Casper agent on the machine.


As written currently, CasperCheck has several components that work together:

1. A Casper policy that runs when called by a manual trigger.

2. A zipped Casper QuickAdd installer package, available for download from a web server.

3. A LaunchDaemon, which triggers the CasperCheck script to run

4. The CasperCheck script

Here’s how the various parts are set up:

Casper policy

The Casper policy that the script checks for needs to be set up as follows:

  • Name: Casper Online
  • Scope: All Computers
  • Trigger: Manually triggered by “iscasperup” (no quotes)
  • Frequency: Ongoing
  • Plan: Run Script iscasperonline.sh

    The iscasperonline.sh script contains the following:

    #!/bin/sh
    
    echo "up"
    
    exit 0
    

    When run, the policy will return “Script result: up” among other output. The CasperCheck script verifies if it’s received the “Script result: up” output and will use that as the indicator that policies can be successfully run by the Casper agent.
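The check described above can be sketched roughly as follows. This is a minimal illustration rather than the exact code in CasperCheck: the helper name `policy_output_ok` is hypothetical, and the usage comment assumes the jamf binary lives at /usr/sbin/jamf.

```sh
#!/bin/sh

# Hypothetical helper: succeeds only if the policy output passed as the
# first argument contains the marker line produced by iscasperonline.sh.
policy_output_ok() {
    printf '%s\n' "$1" | grep -q "Script result: up"
}

# Assumed usage on a managed Mac (needs the jamf binary and root rights):
#   output=$(/usr/sbin/jamf policy -trigger iscasperup 2>&1)
#   if policy_output_ok "$output"; then echo "policies can run"; fi
```

Matching on the script's own output, rather than on the exit status of the jamf command alone, confirms that a policy was actually fetched from the JSS and executed.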

    Zipped QuickAdd installer posted to web server

    For the QuickAdd installer, I generated the package using Casper Recon. This is because QuickAdds made by Recon include an unlimited enrollment invitation, which means that the same package can be used to enroll multiple machines with the JSS in question.

    Once the QuickAdd package was created by Recon, I then used OS X’s built-in compression app to generate a zip archive of the QuickAdd installer.

    Once I had the zip file ready, I copied it to a location on my Casper server where it would be accessible for download via the Apache web service already running on my Casper server. The zipped QuickAdd could have been posted to any web server; it was just convenient for me to host the QuickAdd zip on the same server that hosted my JSS.

    LaunchDaemon

    As currently written, I have CasperCheck set to run on startup and then once every week. To facilitate this, I’m using a LaunchDaemon similar to the one below.

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
    	<key>Label</key>
    	<string>com.company.caspercheck</string>
    	<key>ProgramArguments</key>
    	<array>
    		<string>sh</string>
    		<string>/Library/Scripts/caspercheck.sh</string>
    	</array>
    	<key>RunAtLoad</key>
    	<true/>
    	<key>StartInterval</key>
    	<integer>604800</integer>
    </dict>
    </plist>
    

    The LaunchDaemon will run the following command on startup. After startup, the command will then run every seven days:

    sh /Library/Scripts/caspercheck.sh
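For anyone unsure how to put the LaunchDaemon in place, here is a rough sketch of the install step. It assumes the plist has been saved as /Library/LaunchDaemons/com.company.caspercheck.plist (a file name I made up to match the example label) and that the commands run as root. The `run` wrapper and the `install_caspercheck_daemon` function are hypothetical names for illustration.

```sh
#!/bin/sh

# Assumed locations; the label matches the example plist above.
daemon_plist="/Library/LaunchDaemons/com.company.caspercheck.plist"
script_path="/Library/Scripts/caspercheck.sh"

# Print each command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical install step: give both files to root, keep them
# non-writable by other users, then load the daemon.
install_caspercheck_daemon() {
    run chown root:wheel "$daemon_plist" "$script_path"
    run chmod 644 "$daemon_plist"
    run chmod 755 "$script_path"
    run launchctl load "$daemon_plist"
}

# Preview the commands without changing anything:
#   DRY_RUN=1; install_caspercheck_daemon
```

Keeping the script owned by root and not writable by other users matters here, because launchd runs it as root.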

    CasperCheck script

    The CasperCheck script includes functions to do the following:

    1. Check to verify that the Mac has a network connection that does not use a loopback address (like 127.0.0.1 or 0.0.0.0)

    2. Verify that it can resolve the JSS’s server address and that the appropriate network port is accepting connections.

    3. As needed, download and store new QuickAdd installers from the web server where the zipped QuickAdds are posted.

    4. Check to see if the JAMF binary is present. If not, reinstall using the QuickAdd installer stored on the Mac.

    5. If the JAMF binary is present, verify that it has the proper permissions and automatically fix any permissions that are incorrect.

    6. Check to see if the Mac can communicate with the JSS server using the “jamf checkJSSConnection” command. If not, reinstall using the QuickAdd installer stored on the Mac.

    7. Check to see if the Mac can run a specified policy using a manual trigger. If not, reinstall using the QuickAdd installer stored on the Mac.

    Assuming that you have set up the Casper Online policy described above on your JSS, you will need to edit the CasperCheck script to set the following variables before using it in your environment:

  • fileURL – put the complete address of the zipped Casper QuickAdd installer package
  • jss_server_address – put the complete fully qualified domain name address of your Casper server
  • jss_server_port – put the appropriate port number for your Casper server. This is usually 8443 or 443; change as appropriate.
  • log_location – put the preferred location of the log file for this script. If you don’t have a preference, using the default setting of /var/log/caspercheck.log should be fine.
    NOTE: Use caution when editing the functions or variables below the User-editable variables section of the script.
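As an illustration, a filled-in set of these variables might look like the following. The variable names come from the script; every value except the default log location is a placeholder for a made-up server, so substitute your own.

```sh
# Example values only; casper.example.com is a placeholder host name.
fileURL="https://casper.example.com/casper_quickadd.zip"
jss_server_address="casper.example.com"
jss_server_port="8443"
log_location="/var/log/caspercheck.log"
```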

    CasperCheck in operation

    There are a number of checks built into the CasperCheck script. Here’s how the script works in operation:

    1. The script will run a check to see if it has a network address that is not a loopback address (like 127.0.0.1 or 0.0.0.0). If needed, the script will wait up to 60 minutes for a network connection to become available which doesn’t use a loopback address.

    The network connection check will occur every 5 seconds until the 60 minute limit is reached. If no network connection is found within 60 minutes, the script will exit at that point.
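The wait-and-retry behavior can be sketched like this. It is an approximation under my own assumptions, not the script's actual code: the function names are hypothetical, and parsing `ifconfig` output is just one way to look for a non-loopback IPv4 address.

```sh
#!/bin/sh

# Succeeds if any interface reports an IPv4 address other than 127.x.x.x
# or 0.0.0.0. Parsing ifconfig output this way is an assumption.
has_non_loopback_ip() {
    ifconfig -a inet 2>/dev/null |
        awk '$1 == "inet" { print $2 }' |
        grep -v -e '^127\.' -e '^0\.0\.0\.0$' |
        grep -q .
}

# Poll until the check passes or max_attempts is used up.
# The defaults give 720 attempts x 5 seconds = 60 minutes.
wait_for_network() {
    max_attempts=${1:-720}
    interval=${2:-5}
    attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        if has_non_loopback_ip; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    return 1
}

# Assumed usage: give up quietly if no network appears within the hour.
#   wait_for_network || exit 0
```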

    2. Once a network connection is established that passes the initial connection check, the script then pauses for two minutes to allow WiFi connections and DNS to come online and begin working.

    3. A check is then run to ensure that the Mac is on the correct network by verifying that it can resolve the fully qualified domain name of the Casper server. If the verification check fails, the script will exit at that point.

    4. Once the “correct network” check is passed, a check is then run to verify that the JSS’s Tomcat service is responding via its port number.
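A rough sketch of these two checks, using the script's `jss_server_address` and `jss_server_port` variables. The function names are hypothetical, and the use of `host` and `nc` is my assumption for illustration; the real script may use different tools.

```sh
#!/bin/sh

# Succeeds if the given name resolves in DNS. Using host(1) is an assumption.
jss_resolves() {
    host "$1" >/dev/null 2>&1
}

# Succeeds if a TCP connection to host:port opens within 5 seconds.
jss_port_open() {
    nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# Assumed usage:
#   if ! jss_resolves "$jss_server_address"; then
#       echo "Not on a network where the JSS is visible"; exit 0
#   fi
#   if ! jss_port_open "$jss_server_address" "$jss_server_port"; then
#       echo "JSS Tomcat port is not accepting connections"
#   fi
```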

    5. Once the Tomcat service check is passed, a check is then run to verify that the latest available QuickAdd installer has been downloaded to the Mac. If not, a new QuickAdd installer is downloaded as a .zip file from the web server which hosts the zipped QuickAdd.

    Once downloaded, the zip file is then checked to see if it is a valid zip archive. If the zip file check fails, the script will exit at that point.
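The download and validation step might look something like the sketch below. The function names are hypothetical and the curl options are my choice, not necessarily the ones the script uses. In particular, `-L` tells curl to follow redirects, which is relevant because one reader in the comments traced a "corrupt zip" error to a web server redirecting http to https.

```sh
#!/bin/sh

# Download URL ($1) to a local path ($2), following HTTP redirects (-L)
# and treating HTTP error responses as failures (--fail).
download_quickadd() {
    curl -L --silent --fail --output "$2" "$1"
}

# Succeeds only if the file passes unzip's archive integrity test.
zip_is_valid() {
    unzip -tq "$1" >/dev/null 2>&1
}

# Assumed usage (the /tmp path is illustrative):
#   download_quickadd "$fileURL" /tmp/quickadd.zip
#   zip_is_valid /tmp/quickadd.zip || exit 0
```

If the downloaded file is only a few hundred bytes, it is worth opening it in a text editor; it may be an HTML redirect or error page rather than the archive.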

    If all of the checks described above are passed, the CasperCheck script has verified the following:

    A. It’s got a network connection
    B. It can actually see the Casper server
    C. The Tomcat web service used by the JSS for communication between the server and the Casper agent on the Mac is up and running.
    D. The current version of the QuickAdd installer is stored on the Mac

    At this point, the script will proceed with verifying whether the Casper agent on the Mac is working properly.

    6. A check is run to ensure that the JAMF binary used by the Casper agent is present. If not, the CasperCheck script will reinstall the Casper agent using the QuickAdd installer stored on the Mac.

    7. If the JAMF binary is present, the CasperCheck script runs commands to verify that it has the proper permissions and automatically fix any permissions that are incorrect.
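As a sketch of steps 6 and 7: the path /usr/sbin/jamf is the one that appears in the comments on this post, while the specific ownership and mode below (root:wheel, 755) are my assumption of what "proper permissions" means. The function names are hypothetical.

```sh
#!/bin/sh

jamf_binary="/usr/sbin/jamf"

# Succeeds if something exists at the given path.
binary_present() {
    [ -e "$1" ]
}

# Assumed "proper" permissions: owner root, group wheel, mode 755.
# Must be run as root.
fix_binary_permissions() {
    chown root:wheel "$1" && chmod 755 "$1"
}

# Assumed usage:
#   if binary_present "$jamf_binary"; then
#       fix_binary_permissions "$jamf_binary"
#   else
#       echo "jamf binary missing; reinstall from the stored QuickAdd"
#   fi
```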

    8. A check is run using the “jamf checkJSSConnection” command to make sure that the Casper agent can communicate with the JSS service. This check should usually succeed, but may fail in the following circumstances:

    A. The Casper agent on the machine was originally talking to the JSS at a different DNS address – In the event that the Casper server has moved to a different DNS address from the one that the Casper agent is expecting, this check will fail.

    B. The Casper agent is present but so broken that it cannot contact the JSS service using the checkJSSConnection function.

    If the check fails, the CasperCheck script will reinstall the Casper agent using the QuickAdd installer stored on the Mac.

    9. The final check verifies if the Mac can run the specified policy. If the check fails, the CasperCheck script will reinstall the Casper agent using the QuickAdd installer stored on the Mac.

    Assuming all of the checks are passed successfully, the script exits until the next weekly check.
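Put together, steps 6 through 9 amount to "run each check in order and reinstall if one fails." The sketch below is one simplified reading of that flow, not the script's actual structure: all function names are hypothetical, it stops at the first failure (the real script may keep going after a reinstall), and it assumes `jamf checkJSSConnection` exits non-zero when it cannot reach the JSS.

```sh
#!/bin/sh

# Stand-ins for the real checks and the QuickAdd reinstall. Each check
# returns 0 on success.
check_binary()     { [ -e /usr/sbin/jamf ]; }
check_connection() { /usr/sbin/jamf checkJSSConnection >/dev/null 2>&1; }
check_policy()     { /usr/sbin/jamf policy -trigger iscasperup 2>&1 | grep -q "Script result: up"; }
reinstall_agent()  { echo "reinstalling Casper agent from stored QuickAdd"; }

# Run the agent checks in order; reinstall at the first failure.
verify_agent() {
    for check in check_binary check_connection check_policy; do
        if ! "$check"; then
            reinstall_agent
            return 1
        fi
    done
    return 0
}

# Assumed usage:
#   verify_agent && echo "Casper agent is healthy"
```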

    Logging

    The log shows a distinct status for each of the following cases:

    • The check for a network connection with a non-loopback address fails
    • The check for the network with the appropriate Casper server fails
    • The Tomcat service check fails
    • The zip file check fails
    • The check for the jamf binary finds it missing
    • The check for the jamf binary finds it missing and a new QuickAdd needs to be downloaded
    • The JSS service communication check fails
    • The JSS service communication check fails and a new QuickAdd needs to be downloaded
    • The JSS policy check fails
    • The JSS policy check fails and a new QuickAdd needs to be downloaded
    • All the checks succeed and there’s nothing to fix
    • All the checks succeed and there’s nothing to fix, but a new QuickAdd needs to be downloaded

    The script and an example LaunchDaemon are available here on my GitHub repo:
    https://github.com/rtrouton/rtrouton_scripts/tree/master/rtrouton_scripts/Casper_Scripts/CasperCheck

     
    A Casper Extension Attribute for detecting if CasperCheck is installed is available here:
    https://github.com/rtrouton/rtrouton_scripts/tree/master/rtrouton_scripts/Casper_Extension_Attributes/CasperCheck

     
    The CasperCheck script is also available below:

    1. Manny
      April 23, 2014 at 6:53 pm

      Rich,

      Congratulations on a fine job of converting the Facebook code into something more general and usable.

      But I have to ask:
      1. what’s going on in your environment that might lead to Casper clients losing manageability in the first place?!
      2. Do your users have admin privileges, for example, and are they making changes that trigger loss of manageability?
      3. Or might software be running with leaks in them (e.g. memory leaks, process leaks) that eventually starve the clients of resources, which in turn causes loss of manageability?

      Whatever the cause, it seems to me that JAMF Software should be the one implementing functionality to detect loss of manageability and then restoring it.

      • April 23, 2014 at 8:24 pm

        Manny,

        If you haven’t watched the Facebook folks’ presentation from the 2013 JAMF Nation User Conference, I recommend it. I’ll just say that some of the challenges they’ve faced sound familiar to me.

      • Ben LeRoy
        April 25, 2014 at 1:33 pm

        I can’t speak for Rich, but in our environment this happens when an issue occurs with the keychain and the Casper CA. We would find clients in a catch-22:
        1) The certificate chain to the JSS fails for any number of reasons
        2) The client checks into the JSS
        3) The client does not trust the JSS so it fails to correct the issue so the process just repeats

        This self healing solution is great because it caches the solution on the client to correct the problem automatically.

    2. April 24, 2014 at 3:28 pm

      Rich what is the name of that talk? It doesn’t look like any of the videos on JAMF’s Youtube have Facebook in the title. 🙂

    3. May 2, 2014 at 9:15 pm

      Rich, this looks awesome! You did a great job generalizing the code. Would love to see people defining re-usable code libs though.
      The one thing I would add is, make sure that when you write the script to the machine, to set the permissions to be root : 500 otherwise anyone can add malicious code that would get run by root.

      Manny, I won’t lie, I know the docs on our git repo need work. I make a lot of assumptions, I hope to revisit it soon. But for more context watch this : http://www.ustream.tv/recorded/45958936

    4. Tom
      May 6, 2014 at 8:19 pm

      Hey Rich,

      I sent you an email earlier with some ideas. Let me know if you didn’t get it.

      Tom

    5. physh
      June 2, 2014 at 9:01 pm

      For what it’s worth, I have added a check in the /etc/hosts file since I have found users redirecting casper traffic to 127.0.0.1. Sigh.

      DeleteEtcHostEntries(){

      sudo sed -i "" "/casper*/d" /etc/hosts

      }

    6. dmueller
      June 4, 2014 at 1:54 am

      Hi Rich, I’ve run into a minor issue with the InstallCasper () section of your script. Modifying “-d” to “-f” allows the previously downloaded file to be found. For whatever reason, the resulting renamed casper.pkg came across as a file instead of a folder. Not sure if anyone else has seen this occur (on 10.8.5)

      • June 4, 2014 at 2:56 am

        What version of Casper are you using? It’s been my experience that Recon creates a bundle-style QuickAdd package, but a flat package would explain the issue you’re seeing.

      • June 4, 2014 at 3:22 am

        It just occurred to me what the issue may be. Do you sign your QuickAdd packages? If so, that would likely cause Recon to produce a flat package instead of the usual bundle-style package.

    7. dmueller
      June 4, 2014 at 3:25 am

      Using version 9.3.1. The Recon created package is indeed “bundle”, but between the curling of the zip and the move, it no longer behaves that way. Weird, but that modification in the script allows for it to properly install instead of constantly re-downloading. Seems to behave the same on my 10.9.x testers as well.

    8. dmueller
      June 4, 2014 at 3:33 am

      Yes, they are signed.

      • dmueller
        June 4, 2014 at 5:26 pm

        Confirmed that it is a flat package this morning. Thanks. I added some logic to the script to check for either condition.

        • June 5, 2014 at 10:18 am

          Would you mind sharing the logic you added? I may be able to add it to the script.

    9. dmueller
      June 5, 2014 at 4:33 pm

      Gladly. Here are the changes I made to the if statements in the InstallCasper function in order of appearance:

      if [[ ! -d "$quickadd_installer" ]] && [[ ! -f "$quickadd_installer" ]] ; then

      if [[ -d "$quickadd_installer" ]] || [[ -f "$quickadd_installer" ]] ; then

      Hopefully they come across ok. Thanks again.

    10. dmueller
      June 5, 2014 at 4:37 pm

      “-e” may also work well for both, just haven’t tried it with a bundle package.

      • June 7, 2014 at 6:32 pm

        It looks like using “-e” for the file test operator works fine for both the flat and bundle-style QuickAdd packages.

        I’ve now updated the script in this post and in my GitHub repo to include this change. Thanks for bringing this to my attention; hopefully this will save someone else some trouble going forward.

    11. June 17, 2014 at 1:43 pm

      Thank you for this! Quick question though, the quickadd_dir/_zip/installer/etc. My quickadd package after I zipped it, was quickadd.pkg.zip would I need to modify anything to those? Also, where does the “casper.pkg” come from?

      • June 17, 2014 at 8:06 pm

        John,

        I’d encourage you to read through the script. Hopefully the comments in the script should answer your questions.

    12. July 10, 2014 at 7:56 pm

      Excellent work here. Just tried out today, but getting a corrupted zip file on my compressed QuickAdd.pkg. Manually download/running works fine. Could this be related back to the issue brought up by dmueller? Tried both signed and unsigned version of the QuickAdd, same result.

      • July 11, 2014 at 11:15 am

        Cliff,

        When you’re manually downloading, are you using curl? If not, please use the curl command used by the script to manually download and see if you’re receiving any errors from curl.

    13. Hank
      July 21, 2014 at 5:45 pm

      Thanks for the work. I am getting ready to deploy CasperCheck, but working through I realized that we have not migrated our distribution point to the 9.x version, meaning scripts are still located on the distribution point.
      Our JSS is globally accessible, but not our DP (we get occasional errors as a policy fails because it cannot mount the DP, but we have mostly configured it so that policies requiring a DP only run internally)
      As a work-around, I might look at using 9.x’s “Execute Command” portion of a policy. Is there any reason you went with a script rather than just executing an “echo 'up'” command directly from the policy?
      Thanks again,
      Hank

      • July 21, 2014 at 6:56 pm

        Hank,

        Using a script was a design choice, and I also knew it worked because of the original work done by the Facebook team.

        If you want to go a different route, feel free to modify the script as you see fit.

    14. September 27, 2014 at 5:34 am

      Unfortunately I am getting a corrupt zip file as well. Tried zipping through Apple’s compression utility, then just tried to upload to CasperShare since the Windows DP zips it as well. In both cases it came back saying the file is corrupted. Any ideas why or how to fix? Thanks!

    15. JP
      March 2, 2015 at 8:29 pm

      I’m getting the same corrupted zip file error as well. Anyone know a way around this?

    16. Jason
      March 3, 2015 at 9:23 am

      Hey rich,

      “Once I had the zip file ready, I copied it to a location on my Casper server where it would be accessible for download via the Apache web service already running on my Casper server. The zipped QuickAdd could have been posted to any web server; it was just convenient for me to host the QuickAdd zip on the same server that hosted my JSS.”

      Would you mind including some quick instructions on how you did this on your existing JSS server? Sorry, I am just not too familiar with Apache web servers. I am running my JSS on a Windows server with Apache Tomcat.

    17. JP
      March 3, 2015 at 3:30 pm

      So, curl is downloading a 336 byte file, instead of a 3.2 MB file. Curl isn’t returning any errors otherwise. Is there something I’m missing using curl?

    18. JP
      March 3, 2015 at 4:17 pm

      Ok, I found our problem with the QuickAdd.zip giving the error message, at least on my set up. Our web server was redirecting http traffic to https. As a result, curl was downloading the literal alias file at the http address, rather than following the redirect (the 336 byte file was the hint). My solution was to edit the script to make the fileURL point to the https address. Now it is working perfectly!

    19. Matt
      April 30, 2015 at 9:27 pm

      Rich,

      Thanks for the awesome script! I’m having issues with the InstallCasper portion. The casper.pkg file returns that it successfully installs, but then the log shows that it’s immediately reinstalling because Casper can’t run policies. It tries to install 3 times and then finishes, but it never installs correctly.

      If I manually run the script in terminal I can see the error that the install failed when trying to run the postflight script. This is the error it gets each of the 3 times before it quits. If I double-click the quickadd zip file that curl downloads to /tmp to extract the pkg, and then move it into /var/root/quickadd, the script runs fine from terminal. However, if I restart to have the script run from the LaunchDaemon it fails after 3 tries again.

      It seems like it may be a permissions issue, but I’m kind of lost as to what to try next. Do you have any suggestions?

      • Matt
        May 1, 2015 at 4:41 pm

        My bad. I zipped the QuickAdd.pkg on my Windows machine instead of on the Mac. I re-uploaded the Mac zipped file to our web server and everything is working as intended.

    20. roy
      June 22, 2015 at 4:31 am

      hi rtrouton,

      since version 9, Casper changed from using trigger to event
      https://jamfnation.jamfsoftware.com/article.html?id=52

      old
      jamf_policy_chk=`/usr/sbin/jamf policy -trigger iscasperup | grep "Script result: up"`

      new
      jamf_policy_chk=`/usr/sbin/jamf policy -event iscasperup | grep "Script result: up"`

      Thanks for the solution.

    21. kheenan halvorson
      August 11, 2015 at 4:05 pm

      I have two questions. If it’s easier to direct me to a place where I can figure it out, that works too, but any help is appreciated.

      1. I see the launch daemon, i know what it is but how do i create? or place it in the correct place?
      2. How do i start up a webserver, id assume i would put it on my current jss since that is available via https. I am just not sure how i start up the webserver and how to place the zip file. Again any help would be greatly appreciated.

    22. John Klimeck
      September 12, 2015 at 5:47 pm

      rich, this a great piece of work, really saves the day. I have this working in our environment, but it only works on Mac reboot, not for the interval. I suspected permissions, and I fixed the permissions (apply them in postinstall script in a pkg made with Composer. I actually have tested this by altering the interval in the .plist file to 120 seconds, and sure enough every 120 seconds, the script runs. I want it to happen every 48 hours (172800 seconds) and it doesn’t execute. I suspected launchctl load, and I am running that command as well in the postnatal script. Would be great to get this to run every 48 hours. Thx in advance.

    23. John Klimeck
      September 12, 2015 at 5:49 pm

      corrected, postinstall script

    24. sdkorders
      November 25, 2015 at 5:40 pm

      This is a great script! I’ve tested and implemented this, however I am noticing when I moved it from testing to a wide implementation large amounts of systems keep re-enrolling when they shouldn’t need to and are getting a Device Signature error. Any ideas what could be causing that? Once they re-enroll it all works fine.

    25. adaman
      August 18, 2017 at 4:53 pm

      Is there any risk in leaving the quickadd package on a publicly accessible webserver? I’m just wondering if there’s any way a hacker could either reverse engineer it and get some form of access to the embedded management account credentials, or use it to enroll a bunch of random machines in the wild to the server. Thoughts?
