mathematics, and the core library is <1000 lines of code. Unlike existing implementations, JPC leverages ordinary differential equation (ODE) solvers to integrate the gradient flow inference dynamics of PCNs. JPC also provides some theoretical tools that can be used to study and potentially identify problems with PCNs.

The rest of the paper is structured as follows. After a brief review of PC (§2), we showcase some empirical results showing that a second-order ODE solver can achieve significantly faster runtimes than standard Euler integration of the gradient flow PC inference dynamics, with comparable performance on different datasets and networks (§3). We then explain the library's core implementation (§4), before concluding with possible extensions (§5).

2 Predictive coding: A primer

Here we include a minimal presentation of PC necessary to get started with JPC. The reader is referred to [14, 8, 7, 12] for reviews and to [1] for a more formal treatment.

PCNs are typically defined by an energy function which is a sum of squared prediction errors across layers, and which for a standard feedforward network takes the form

$$\mathcal{F} = \sum_{\ell=1}^{L} \lVert z_\ell - f_\ell(W_\ell z_{\ell-1}) \rVert^2 \qquad (1)$$

where $z_\ell$ is the activity of a given layer and $f_\ell$ is some activation function. We ignore multiple data points and biases for simplicity.

To train a PCN, the last layer is clamped to some data, $z_L := y$. This could be a label for classification or an image for generation, and these two settings are typically referred to as discriminative and generative PC. The first layer can also be fixed to some data serving as a "prior", $z_0 := x$, such as an image in a supervised task. In unsupervised training, this layer is left free to vary like any other hidden layer.

The energy (Eq.
1) is then minimised in a bi-level fashion, first w.r.t. the activities (inference) and then w.r.t. the weights (learning):

$$\text{Infer:} \quad \arg\min_{z_\ell} \mathcal{F} \qquad (2)$$

$$\text{Learn:} \quad \arg\min_{W_\ell} \mathcal{F} \qquad (3)$$

[Figure: a feedforward PCN with the first layer clamped to an input x and the last layer clamped to an output y.]

The inference dynamics are generally first run to convergence until $\Delta z_\ell \approx 0$. Then, at the reached equilibrium of the activities, the weights are updated via common neural network optimisers such as stochastic GD or Adam (Eq. 3). This process is repeated for every training step, typically for a given data batch. Inference is typically performed by standard GD on the energy, which can be seen as the Euler discretisation of the gradient system $\dot{z}_\ell = -\partial \mathcal{F} / \partial z_\ell$. JPC simply leverages well-tested ODE solvers to integrate this gradient flow.

3 Runtime efficiency

A comprehensive benchmarking of various types of PCN with GD as inference optimiser was recently performed by [10]. For this reason, here we focus on runtime efficiency, comparing standard Euler integration of the inference gradient flow dynamics with Heun, a second-order explicit Runge–Kutta method. Note that, as a second-order method, Heun has a higher computational cost than Euler; however, it could still be faster if it requires significantly fewer steps to converge.

The solvers were compared on feedforward networks trained to classify standard image datasets, with different numbers of hidden layers $H \in \{3, 5, 10\}$. Because our goal was to specifically
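As an aside, the primer's energy (Eq. 1), its gradient flow inference dynamics, and the Euler vs. Heun comparison can be sketched in a few lines of JAX. This is a minimal illustration under toy assumptions, not JPC's actual API: the network shape, weights, step size, and step count below are arbitrary choices for demonstration.

```python
import jax
import jax.numpy as jnp

def energy(zs, weights, f=jnp.tanh):
    # Eq. 1: F = sum_l || z_l - f(W_l z_{l-1}) ||^2 (one datapoint, no biases).
    return sum(
        jnp.sum((z - f(W @ z_prev)) ** 2)
        for W, z_prev, z in zip(weights, zs[:-1], zs[1:])
    )

grad_z = jax.grad(energy, argnums=0)  # dF/dz_l for every layer

def euler_step(zs, weights, dt):
    # Euler discretisation of the gradient flow dz/dt = -dF/dz.
    g = grad_z(zs, weights)
    # Keep the clamped layers z_0 and z_L fixed; update only hidden layers.
    return [zs[0]] + [z - dt * gz for z, gz in zip(zs[1:-1], g[1:-1])] + [zs[-1]]

def heun_step(zs, weights, dt):
    # Heun's method (second-order explicit Runge-Kutta): average the gradient
    # at the current point and at the Euler-predicted point.
    g1 = grad_z(zs, weights)
    zs_pred = [zs[0]] + [z - dt * g for z, g in zip(zs[1:-1], g1[1:-1])] + [zs[-1]]
    g2 = grad_z(zs_pred, weights)
    return [zs[0]] + [
        z - 0.5 * dt * (a + b)
        for z, a, b in zip(zs[1:-1], g1[1:-1], g2[1:-1])
    ] + [zs[-1]]

# Toy network: input (2) -> hidden (3) -> output (1), clamped at both ends.
key = jax.random.PRNGKey(0)
W1 = 0.1 * jax.random.normal(key, (3, 2))
W2 = 0.1 * jax.random.normal(key, (1, 3))
x, y = jnp.array([1.0, -1.0]), jnp.array([0.5])
zs = [x, jnp.zeros(3), y]

zs_e, zs_h = zs, zs
for _ in range(50):
    zs_e = euler_step(zs_e, [W1, W2], dt=0.1)
    zs_h = heun_step(zs_h, [W1, W2], dt=0.1)
```

Both solvers drive the energy down toward an activity equilibrium; the question §3 studies empirically is whether Heun's higher per-step accuracy buys enough fewer steps to outweigh its two gradient evaluations per step.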